Application not loading

Incident Report for Close

Postmortem

Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent a similar interruption from occurring again.

Impact

All Close systems were inaccessible to all customers for 35 minutes, between 16:12 and 16:47 UTC on February 26, 2020.

Root Cause

Due to an upstream hardware failure, the primary Close MongoDB database failed in a way that required manual intervention to resolve.

At 16:12 UTC, the data drive of the primary node in one of our MongoDB replica sets failed, and the Linux kernel was unable to read from it. Because the MongoDB API on the affected node was still responding, the replica set did not recognize that the node had failed and did not automatically hold a leader election. The Close Infrastructure Operations Team manually triggered a leader election, which resolved the incident.
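The failure mode above comes down to the difference between checking that a node answers and checking that it can actually serve data. The following is a minimal, simplified sketch of that distinction, not a model of MongoDB's real election logic. The `Node` class and the `liveness_healthy` and `deep_healthy` functions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical, simplified model of a replica set member."""
    name: str
    process_up: bool = True  # the database process still answers network requests
    disk_ok: bool = True     # the data drive can actually be read

    def ping(self) -> bool:
        # A liveness check only proves that the process is answering.
        return self.process_up

    def read_probe(self) -> bool:
        # A deeper check that only succeeds if a real read from the data drive works.
        return self.process_up and self.disk_ok


def liveness_healthy(node: Node) -> bool:
    # Hypothetical check that looks only at responsiveness.
    return node.ping()


def deep_healthy(node: Node) -> bool:
    # Hypothetical check that also requires the node to read from its drive.
    return node.ping() and node.read_probe()


# Drive has failed, but the process is still running and answering requests.
primary = Node("primary", disk_ok=False)
print(liveness_healthy(primary))  # True: a responsiveness-only check sees no problem
print(deep_healthy(primary))      # False: a read-based check catches the failure
```

In this simplified model, a monitor that relies only on responsiveness would treat the failed primary as healthy, which matches the situation described above where no automatic failover occurred.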

Next Steps

To make sure this doesn’t happen again, Close is taking the following steps:

Adding monitoring for the type of drive that failed, to reduce response time.
Adding internal documentation to reduce the time needed to resolve similar issues.
Launching an investigation with our upstream provider to understand exactly why the drive failed and what can be done to prevent it from happening in the future.

Posted Feb 27, 2020 - 13:35 PST

Resolved

This issue has been resolved and all systems are functioning normally.

Posted Feb 26, 2020 - 09:06 PST

Update

The Close Application and Close API are back up and running, and our engineers are monitoring our systems for any further issues.

Posted Feb 26, 2020 - 08:58 PST

Monitoring

The application is now loading and our engineers are monitoring the situation.

Posted Feb 26, 2020 - 08:52 PST

Identified

The application is now loading and our engineers are continuing to investigate the cause of the issue.

Posted Feb 26, 2020 - 08:48 PST

Investigating

Our engineers are currently investigating issues loading the Close application and pulling data from the Close API.

Posted Feb 26, 2020 - 08:28 PST

This incident affected: Application UI and API.