Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption.
The Close app and API were unavailable to all customers for 43 minutes, from 17:00 to 17:43 UTC on Wednesday, July 22, 2019, due to the failure of a backend database. Severe degradation began at 16:53 UTC. The database was recovered and all services were restored by 17:43 UTC.
One of our backend PostgreSQL databases became starved of available memory. This prevented the database from accepting new work, which interrupted service to the Close system. The issue was resolved by increasing the amount of memory available to the database.
15:57: First alarms begin to fire about delays in Email Sequences and PostgreSQL CPU usage
16:00: Close Infrastructure begins investigation
16:30: Memory pressure identified as the cause of alarms on the affected database
16:53: The affected database fails, causing the Close app and API to become unavailable
16:53: Decision made to scale the database to an instance class with more memory
17:04: The maintenance page is posted in preparation for the database scaling operation
17:04: Scaling operation begins on the affected database
17:32: Scaling operation completes
17:43: Application services are restored
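The intervals implied by the timeline above can be checked with a short calculation. This is only an illustrative sketch: the event labels and the helper name `minutes_between` are chosen here for convenience, and the timestamps are copied from the timeline on the assumption that they are all UTC on the same day.

```python
from datetime import datetime

# Timestamps copied from the timeline above (assumed UTC, same day).
# The keys are illustrative labels, not names used by Close.
TIMELINE = {
    "first_alarm": "15:57",
    "database_failed": "16:53",
    "scaling_started": "17:04",
    "scaling_completed": "17:32",
    "services_restored": "17:43",
}


def minutes_between(start: str, end: str) -> int:
    """Return the whole minutes from start to end, both given as HH:MM."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Time from the first alarm until the database failed.
warning_window = minutes_between(
    TIMELINE["first_alarm"], TIMELINE["database_failed"]
)

# Duration of the scaling operation itself.
scaling_duration = minutes_between(
    TIMELINE["scaling_started"], TIMELINE["scaling_completed"]
)

# Time from scaling completion until application services were restored.
restore_duration = minutes_between(
    TIMELINE["scaling_completed"], TIMELINE["services_restored"]
)

print(f"Alarm to failure: {warning_window} min")
print(f"Scaling operation: {scaling_duration} min")
print(f"Scaling complete to restored: {restore_duration} min")
```

Under these assumptions, the scaling operation accounts for most of the time between the maintenance page going up and services being restored.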
To ensure that events such as this do not occur in the future, we are taking the following actions: