Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring.
On October 20, from 5:30 AM until 6:40 AM EDT, email syncing into the Close application was delayed for some customers, after which email sync functionality was restored. No data was lost during this incident.
The root cause of this incident was a bug in a vendor’s software that was introduced during scheduled maintenance on 10/18/2022. The impact of this bug was detected on 10/20, and the Close engineering team rolled the affected component back to a known working version to resolve the incident.
To prevent this from happening in the future, the Close engineering team has added monitoring that alerts the team when container workloads are not being scheduled properly.
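As a rough illustration of the kind of check such monitoring might perform, here is a minimal Python sketch. It is not Close's actual monitoring: the `Workload` record, the state names, the five-minute threshold, and both function names are assumptions made up for this example. The idea is simply to flag workloads that have waited too long to be scheduled and to alert once any are found.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Workload:
    """Hypothetical record for one container workload (not a real platform API)."""
    name: str
    state: str            # assumed states: "pending", "running", "failed"
    created_at: datetime  # when the workload was submitted for scheduling


def find_stuck_workloads(workloads, now, max_pending=timedelta(minutes=5)):
    """Return names of workloads that have been pending longer than max_pending."""
    return [
        w.name
        for w in workloads
        if w.state == "pending" and now - w.created_at > max_pending
    ]


def should_alert(workloads, now, max_stuck=0, max_pending=timedelta(minutes=5)):
    """Alert when more than max_stuck workloads are stuck waiting to be scheduled."""
    return len(find_stuck_workloads(workloads, now, max_pending)) > max_stuck


if __name__ == "__main__":
    # Illustrative data only; the names and timestamps are invented.
    now = datetime(2022, 10, 20, 10, 0, tzinfo=timezone.utc)
    sample = [
        Workload("email-sync-1", "pending", now - timedelta(minutes=30)),
        Workload("email-sync-2", "running", now - timedelta(hours=1)),
    ]
    print(find_stuck_workloads(sample, now), should_alert(sample, now))
```

A check like this would typically run on a short interval and page the on-call engineer when it fires, so that a slow rise in scheduling failures is caught before customers notice delayed work.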
10/18 10:56 PM EDT: The Close engineering team upgrades components of our container platform during a scheduled maintenance window. Error rates for scheduling containers begin to slowly increase.
10/20 5:30 AM EDT: Close support escalates issues with email syncing to the engineering team.
10/20 6:10 AM EDT: Close engineering identifies an issue with scheduling container workloads.
10/20 6:22 AM EDT: The Close engineering team begins manual intervention to lessen customer impact.
10/20 6:38 AM EDT: Email sync functionality is largely restored.
10/20 6:39 AM EDT: The Close engineering team isolates the issue to the interface between our secret store and containers which consume secrets.
10/20 9:19 AM EDT: The root cause is identified as a bug in the version of our secret store connector that was installed during the 10/18 scheduled maintenance.
10/20 9:25 AM EDT: The Close engineering team rolls back the secret store connector to a known working version, resolving the incident.