Close sincerely apologizes for the interruption of our service. We take the stability of a platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring.
All Close systems were inaccessible for all customers for 35 minutes between 16:12 and 16:47 UTC on February 26 2020.
The primary Close MongoDB database failed in a way that required manual intervention to resolve due to an upstream hardware failure.
The primary of one of our MongoDB replica sets experienced a drive failure of its data drive at 16:12 UTC. The Linux kernel was unable to read from the failed drive. Because the MongoDB API on the affected node was still operational the replica set did not recognize the node had failed and did not automatically perform a leader election. The Close Infrastructure Operations Team manually triggered a leader election, after which the incident was resolved.
To make sure this doesn’t happen again Close is taking the following steps: