Slow app performance
Incident Report for Close
Postmortem

Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent similar interruptions in the future.

Close systems suffered from degraded performance for approximately 3 hours, between 12:20 and 15:25 UTC on December 21, 2020.

Root Cause

The primary Close App suffered from degraded performance starting at 12:20 UTC, caused by a MongoDB query on our backend database that intermittently used an inappropriate index. Close Engineering identified the query at 15:07 UTC and deployed a fix at 15:25 UTC.

Timeline

Dec 21 12:06 UTC - The first signs of inconsistent query execution occur on our MongoDB database.

Dec 21 12:20 UTC - Alerts begin firing, indicating degraded performance.

Dec 21 12:32 UTC - Close Engineering identifies the affected database shard and triggers a failover.

Dec 21 12:57 UTC - The issue recurs after the failover. Troubleshooting continues.

Dec 21 13:47 UTC - Close Engineering identifies the email sync service as the source of the issue.

Dec 21 15:07 UTC - Close Engineering identifies a MongoDB query that intermittently uses an inappropriate index.

Dec 21 15:25 UTC - Close Engineering deploys a fix to production.

Dec 21 15:25 UTC - Close systems return to normal performance.
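
For readers unfamiliar with how a query can be found to use an inappropriate index (the 15:07 UTC entry), the following is a minimal sketch of one way to check it, not the tooling Close used. It assumes an explain document shaped like MongoDB's unsharded `queryPlanner.winningPlan` output; sharded clusters nest per-shard plans differently and would need extra handling. The function names and the sample index names are hypothetical.

```python
from typing import Any, Dict, List


def indexes_used(plan: Dict[str, Any]) -> List[str]:
    """Collect the index name of every IXSCAN stage in a winning plan tree."""
    names: List[str] = []
    if plan.get("stage") == "IXSCAN" and "indexName" in plan:
        names.append(plan["indexName"])
    # Single-child stages use "inputStage"; multi-child stages use "inputStages".
    children = list(plan.get("inputStages", []))
    if "inputStage" in plan:
        children.append(plan["inputStage"])
    for child in children:
        names.extend(indexes_used(child))
    return names


def uses_expected_index(explain_output: Dict[str, Any], expected: str) -> bool:
    """Return True only if the winning plan scans exactly the expected index."""
    winning_plan = explain_output["queryPlanner"]["winningPlan"]
    return indexes_used(winning_plan) == [expected]


# Hypothetical explain output: a FETCH stage fed by an index scan.
sample_explain = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "account_1_date_-1"},
        }
    }
}

print(uses_expected_index(sample_explain, "account_1_date_-1"))
print(uses_expected_index(sample_explain, "account_1"))
```

Running a check like this periodically against the explain output of a critical query is one way to notice when the planner switches between indexes from one execution to the next.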

Next Steps

To make sure this doesn’t happen again, Close is taking the following steps:

  • Specify index hints for our highest-impact database queries to ensure consistent performance
  • Implement additional monitoring to detect inconsistent query execution
  • Implement additional monitoring to alert more aggressively on slow queries
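
As an illustration of the first step, here is a minimal sketch of pinning a query to a specific index with a hint. It is not Close's actual code. It assumes a PyMongo-style collection object, whose cursors provide `hint()` and `limit()`; the helper name, the filter, and the index name in the usage note are hypothetical.

```python
from typing import Any, Dict, List


def find_with_hint(
    collection: Any,
    query: Dict[str, Any],
    index_name: str,
    limit: int = 50,
) -> List[Dict[str, Any]]:
    """Run a find() that is forced to use the named index.

    `collection` is assumed to behave like a PyMongo Collection:
    find() returns a cursor, and the cursor supports hint() and limit().
    """
    # hint() tells the query planner which index to use instead of
    # letting it choose among candidate plans on each execution.
    cursor = collection.find(query).hint(index_name).limit(limit)
    return list(cursor)
```

A call might look like `find_with_hint(db.emails, {"account_id": account_id}, "account_id_1")`. The trade-off is that a hinted query keeps using the named index even if a better one is added later, so hints are best reserved for a small number of high-impact queries.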
Posted Dec 21, 2020 - 11:30 PST

Resolved
This incident has been resolved.
Posted Dec 21, 2020 - 07:44 PST
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Dec 21, 2020 - 07:36 PST
Update
We have isolated the issue to email syncing and are continuing to investigate.
Posted Dec 21, 2020 - 06:28 PST
Investigating
Some customers may be experiencing slow app performance. We are investigating a database issue.
Posted Dec 21, 2020 - 05:17 PST
This incident affected: Email (Syncing).