One of the servers for one of our three search clusters experienced connectivity issues due to the underlying Amazon Web Services infrastructure. In the process of replacing this failing server, a set of invalid connection properties to the search cluster were distributed to the live application which caused all connections to this particular search cluster to fail. As soon as the error was detected, the connection properties were corrected and we initiated a restart of our application to clear the cache of the bad connection properties.
This was the first time we had replaced one of the servers for this search cluster and we have found that the process used for replacement was insufficient to prevent invalid connection properties from being distributed to the containers running the live Close.io application. We have revised our internal process to include an automatic validation of any modifications to the connection properties before they are implemented in the live application.