Premature Sending Of Sequence Emails

Incident Report for Close

Postmortem

Close sincerely apologizes for the incorrect behavior of our service. We take the reliability and correctness of our platform very seriously, and the issues you experienced are not representative of the level of service we aim to provide. Below is an explanation of what happened and how we will prevent such mistakes from occurring in the future.

Impact

On 11/24/2020, between 2:48pm UTC and 3:39pm UTC (51 minutes), Sequence Subscriptions that were due to send a single email instead sent all of their remaining emails, with only a 5-minute delay between steps (instead of the user-defined delay specified in days).

Root Cause & Resolution

During the incident, the Close application successfully sent Sequence emails but, due to a code change in how we persist data in our database, failed to update the relevant Sequence records with the date & time when the Sequence should be processed next. As a result, during the next processing iteration for a given Subscription (5 minutes later), the next step in the Sequence was sent prematurely.

The issue was caused by an attempt to fix some of the session management issues that caused the outage on 11/23. The new code prematurely detached Sequence records from the database session, so changes to these records were not persisted. The date & time when the Sequence should be processed next was among the data that failed to update.
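The failure mode described above can be illustrated with a small simulation. This is a toy model, not Close's code: the function name simulate, the persist_next_run flag, and the example delays are all hypothetical, and the 5-minute polling interval is taken from the description above. It shows how a scheduler that sends a step but never stores the next processing time will send every remaining step on consecutive iterations.

```python
from datetime import datetime, timedelta

# Processing interval mentioned in the report (5 minutes between iterations).
POLL_INTERVAL = timedelta(minutes=5)


def simulate(delays_between_steps, start, ticks, persist_next_run):
    """Return the times at which steps are sent over `ticks` iterations.

    delays_between_steps[i] is the user-defined delay between step i and
    step i + 1. When persist_next_run is False, the updated next-run time
    is never stored, which stands in for the detached-record bug.
    """
    total_steps = len(delays_between_steps) + 1
    next_run_at = start  # stands in for the value stored in the database
    sent = []
    now = start
    for _ in range(ticks):
        if len(sent) < total_steps and now >= next_run_at:
            sent.append(now)  # the email itself is sent successfully
            more_steps = len(sent) < total_steps
            if persist_next_run and more_steps:
                # Healthy path: store when the next step becomes due.
                next_run_at = now + delays_between_steps[len(sent) - 1]
        now += POLL_INTERVAL
    return sent


if __name__ == "__main__":
    start = datetime(2020, 11, 24, 14, 50)
    delays = [timedelta(days=2), timedelta(days=3)]
    print("persisted:    ", simulate(delays, start, 12, True))
    print("not persisted:", simulate(delays, start, 12, False))
```

With the next-run time persisted, only the step that is due goes out within the hour. Without it, all three steps go out five minutes apart, which matches the behavior described in the Impact section.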

Timeline

Nov 24 14:48 UTC – Deployed change that caused data persistence issues.
Nov 24 14:56 UTC – Engineers became aware of the persistence problem (though not its end-user impact) and started reverting the problematic code change.
Nov 24 15:30 UTC – The revert was deployed.
Nov 24 15:46 UTC – We received the first (delayed) reports of this issue causing Sequences to send some steps prematurely. We investigated these reports and connected them with this incident.

Next Steps

Prioritize refactoring our older code to use the new, safer session management logic.
Add further warnings and protections to our Sequence Scheduling Service around scheduling the next step in a Sequence.
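As an illustration of the kind of protection the second item could involve, here is a minimal sketch of a pre-send check. The report does not describe the actual safeguards, so everything here is an assumption: the names PrematureStepError and assert_step_is_due are hypothetical, and the check simply refuses to send a step when less time has passed since the previous step than the user-defined delay requires.

```python
from datetime import datetime, timedelta
from typing import Optional


class PrematureStepError(Exception):
    """Raised when a step would be sent before its configured delay has elapsed."""


def assert_step_is_due(
    last_sent_at: Optional[datetime],
    required_delay: timedelta,
    now: datetime,
) -> None:
    """Hypothetical guard: raise if the next step is not yet due.

    last_sent_at is when the previous step was sent (None for the first step).
    required_delay is the user-defined delay before the next step.
    """
    if last_sent_at is None:
        return  # first step in the Sequence, nothing to compare against
    elapsed = now - last_sent_at
    if elapsed < required_delay:
        raise PrematureStepError(
            f"next step requires a delay of {required_delay}, "
            f"but only {elapsed} has elapsed"
        )


if __name__ == "__main__":
    now = datetime(2020, 11, 24, 15, 0)
    try:
        # Previous step went out 5 minutes ago, but the delay is 2 days.
        assert_step_is_due(now - timedelta(minutes=5), timedelta(days=2), now)
    except PrematureStepError as exc:
        print("blocked:", exc)
```

A check like this compares against the time the previous email was actually sent rather than against the stored next-run time, so it would still hold if the stored value failed to update.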

Posted Nov 27, 2020 - 04:47 PST

Resolved

We are currently investigating this issue.

Posted Nov 24, 2020 - 06:00 PST