Close sincerely apologizes for the interruption to our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent a similar interruption from occurring.
Close systems suffered degraded performance for approximately 3 hours, between 12:20 and 15:25 UTC on December 21, 2020.
The primary Close App suffered from degraded performance caused by a problem with our backend database, starting at 12:20 UTC. Close Engineering identified the cause at 15:07 UTC and deployed a fix at 15:25 UTC.
Dec 21 12:06 UTC - The first signs of inconsistent query execution occur on our MongoDB database.
Dec 21 12:20 UTC - Alerts begin firing, indicating degraded performance
Dec 21 12:32 UTC - Close Engineering identifies the affected database shard and triggers a failover
Dec 21 12:57 UTC - The issue recurs after the failover. Troubleshooting continues.
Dec 21 13:47 UTC - Close Engineering identifies the email sync service as the source of the issue
Dec 21 15:07 UTC - Close Engineering identifies a MongoDB query that intermittently uses an inappropriate index
Dec 21 15:25 UTC - Close Engineering deploys a fix to production
Dec 21 15:25 UTC - Close systems return to normal performance
To make sure this doesn’t happen again, Close is taking the following steps: