High database load causing app performance issues
Incident Report for Close
Postmortem

Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring. 

Impact

Between 13:40 and 16:55 UTC on Wednesday, September 25, 2024, the Close App and API experienced degraded performance. Some users may have noticed the App UI and API responding sluggishly.

Concurrently, between 14:33 and 19:30 UTC, background task processing inside the Close app was disrupted. During this time, Workflows and Email sending may not have occurred on schedule.

Root Cause and Resolution

At 13:14 UTC on Wednesday, September 25, 2024, Close Engineering deployed an updated version of our browser application. A bug in this new version caused a large increase in resource-intensive requests sent to our back end system. By 14:00 UTC the number of additional requests had grown to the point that our back end database was overloaded, causing poor application performance.

Close Engineering was able to revert the change to our browser application by 14:51 UTC. While waiting for all of our clients’ browsers to update to the fixed version of our app, Close Engineering took several steps between 14:30 UTC and 17:00 UTC to reduce the load on our overloaded database.
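
For readers unfamiliar with load shedding, the general idea can be sketched as follows. This is a hypothetical illustration only, not a description of the steps Close Engineering actually took: the priority names, the thresholds, and the `should_shed` function are all assumptions made for the example.

```python
from typing import Dict

# Hypothetical priority classes and the database load (fraction of capacity)
# at or above which requests of that class are rejected. Classes that are
# not listed here (for example "critical") are never shed.
SHED_THRESHOLDS: Dict[str, float] = {
    "background": 0.80,   # shed deferrable work first
    "interactive": 0.95,  # shed user-facing work only near saturation
}


def should_shed(priority: str, db_load: float) -> bool:
    """Return True if a request of this priority should be rejected.

    db_load is an assumed measure of database utilization, where 1.0
    means the database is fully saturated.
    """
    threshold = SHED_THRESHOLDS.get(priority)
    if threshold is None:
        # Unlisted priorities are always served.
        return False
    return db_load >= threshold


if __name__ == "__main__":
    for load in (0.50, 0.85, 0.97):
        decisions = {p: should_shed(p, load)
                     for p in ("critical", "interactive", "background")}
        print(f"load={load:.2f} -> {decisions}")
```

Rejecting the least important work first keeps the database responsive for the requests that matter most while the underlying cause is being fixed.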

Disruption during this time also degraded our ability to collect runtime metrics on our background task processing system. Without those metrics, the background task processing system concluded that it was not under load and scaled down. Close Engineering fixed the issue with metrics gathering by 18:20 UTC, at which point background task processing returned to normal operation.
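
The failure mode described above, where missing metrics are read as "no load", can be sketched as follows. This is a minimal illustration under stated assumptions, not Close's actual autoscaler: the function names, the queue-depth metric, and the worker limits are all hypothetical.

```python
from typing import Optional


def naive_desired_workers(queue_depth: Optional[int],
                          tasks_per_worker: int = 100,
                          min_workers: int = 1,
                          max_workers: int = 50) -> int:
    """Hypothetical scaler that treats a missing metric as zero load."""
    depth = queue_depth or 0  # None silently becomes 0, so we scale down
    needed = -(-depth // tasks_per_worker)  # ceiling division
    return max(min_workers, min(max_workers, needed))


def safe_desired_workers(current: int,
                         queue_depth: Optional[int],
                         tasks_per_worker: int = 100,
                         min_workers: int = 1,
                         max_workers: int = 50) -> int:
    """Hypothetical scaler that holds steady when the metric is missing."""
    if queue_depth is None:
        # Load is unknown, not zero: keep the current worker count.
        return current
    needed = -(-queue_depth // tasks_per_worker)  # ceiling division
    return max(min_workers, min(max_workers, needed))


if __name__ == "__main__":
    # With metrics unavailable, the naive version drops to the minimum
    # while the safe version keeps the existing capacity.
    print(naive_desired_workers(None))
    print(safe_desired_workers(20, None))
```

The design choice in the second function is to distinguish "no data" from "no load", so that an outage in metrics collection cannot by itself trigger a scale-down.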

To prevent another incident like this from occurring, Close Engineering will audit our growing data stores for opportunities to better distribute load and prevent the database from becoming overloaded. We will also implement a training regimen for our incident responders to ensure more timely and consistent communication during future incidents.

Timeline

  • 13:14 UTC - Close Engineering deploys an updated version of our browser application
  • 13:59 UTC - Close Engineering is alerted to degraded performance of our system
  • 14:30 UTC - Close Engineering identifies our back end database as overloaded
  • 14:30 UTC - Close Engineering begins load shedding operations to preserve system performance
  • 14:33 UTC - Disruption to background task processing begins
  • 14:51 UTC - Close Engineering reverts the change to our browser application
  • 17:31 UTC - Close Engineering begins to undo load shedding to restore normal operation
  • 18:20 UTC - Close Engineering begins manual operations to restore background task processing
  • 18:50 UTC - The back end database becomes overloaded once more
  • 19:30 UTC - Close Engineering scales up the back end database
  • 19:30 UTC - Background task processing returns to normal. All Close systems are functioning normally
Posted Sep 26, 2024 - 14:00 PDT

Resolved
This incident has been resolved.
Posted Sep 25, 2024 - 12:52 PDT
Update
The Close app is functioning normally. Some background task processing may be delayed. We are continuing to monitor for further issues.
Posted Sep 25, 2024 - 12:12 PDT
Update
We are continuing to monitor for any further issues.
Posted Sep 25, 2024 - 12:11 PDT
Update
The Close app is functioning normally. Some background task processing may be delayed. We are continuing to monitor for further issues.
Posted Sep 25, 2024 - 12:05 PDT
Update
We are continuing to monitor for any further issues.
Posted Sep 25, 2024 - 11:53 PDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Sep 25, 2024 - 11:31 PDT
Update
The app is recovering and we are continuing to monitor the situation. Some background tasks could still be delayed.
Posted Sep 25, 2024 - 11:28 PDT
Investigating
We are currently investigating this issue.
Posted Sep 25, 2024 - 07:40 PDT
This incident affected: Application UI, API, Search (Indexing), and Email (Sending).