Application Loading Issue

Incident Report for Close

Postmortem

Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring. 

Impact

Between 1900 and 2000 UTC on September 12, 2024, the Close app and API were severely degraded due to a partial outage of our back-end MongoDB database. The database was restored without data loss, and the Close app and API returned to normal operation, at approximately 2000 UTC.

Root Cause and Resolution

At 1900 UTC on September 12, 2024, components of our back-end MongoDB database in one of our data centers came under anomalous load and became unresponsive. This resulted in widespread intermittent disruption to the Close app and API. Once the affected components were identified and restarted, performance of the Close app and API returned to normal.

We are in the process of deploying a new architecture for this part of our system that will be more resilient to this class of failure. In the meantime, we are deploying additional monitoring that will reduce the time required to identify and mitigate such issues going forward.

Timeline

  • 1900 UTC: The Close app and API begin to experience elevated error rates
  • 1904 UTC: Close Engineering is alerted to the elevated error rate by automated monitoring
  • 1909 UTC: Close Engineering identifies a network issue affecting one of our data centers
  • 1945 UTC: Close Engineering identifies our back-end MongoDB database as being critically impaired
  • 1954 UTC: Close Engineering begins operating on the affected database to restore normal operation
  • 2001 UTC: Close Engineering completes operations on the affected database
  • 2001 UTC: The Close app and API return to normal operation
Posted Sep 13, 2024 - 09:06 PDT

Resolved

This incident has been resolved, and Close is fully operational.
Posted Sep 12, 2024 - 14:01 PDT

Monitoring

The issue has been identified, and our team is currently monitoring performance.
Posted Sep 12, 2024 - 13:06 PDT

Investigating

We are currently seeing intermittent issues loading Close. We will let you know as soon as this is fixed.
Posted Sep 12, 2024 - 12:29 PDT
This incident affected: Application UI.