Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent a similar interruption in the future.
Between 16:47 UTC and 17:18 UTC on July 26, 2023, Close users may have experienced degraded performance and an elevated error rate when using the Close application or API.
This incident began when our internal metrics service became degraded at 16:47 UTC on July 26, 2023. Our compute platform uses this service to automatically determine the amount of resources needed to run the Close application. While the metrics service was degraded, the compute platform could not determine the correct amount and, over the next several minutes, de-provisioned resources, causing the system to become overloaded and perform poorly. When the internal metrics service was restored at 17:18 UTC, our compute platform provisioned the correct amount of resources and performance returned to normal.
Our compute platform has a built-in fail-safe intended to prevent exactly this situation, but it did not function. We are investigating why. We are also deploying improvements to our internal metrics service to make it more stable and to alert us sooner if it becomes unstable.
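
For readers curious about what a fail-safe of this kind generally looks like, here is a minimal, hypothetical sketch. It is not the code or configuration of our compute platform. The function name, parameters, and scaling rule are all assumptions made for illustration. The idea is that when no metric reading is available, the scaler holds the current capacity instead of scaling down.

```python
import math
from typing import Optional


def desired_replicas(
    current: int,
    load_per_replica: Optional[float],
    target_load: float,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Return how many replicas to run (hypothetical scaling rule).

    load_per_replica is None when the metrics service gave no usable reading.
    """
    if load_per_replica is None:
        # Fail-safe: with no metrics, hold current capacity rather than
        # treating "no data" as "no load" and scaling down.
        return max(current, min_replicas)

    # Proportional scaling: size the fleet so each replica sits near target_load.
    raw = math.ceil(current * load_per_replica / target_load)

    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, raw))
```

In this sketch, a call with `load_per_replica=None` leaves a fleet of 10 replicas at 10, while a genuine low-load reading is still allowed to scale the fleet down to the configured minimum.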