Degraded Dialer Performance

Incident Report for Close

Postmortem

Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring.

Impact

Dialer functionality was impaired for 58 minutes, from 15:20 UTC to 16:18 UTC on March 10th, 2025. During this time, the Dialer feature could get stuck in the “connecting” state.

Root Cause and Resolution

The issue was triggered at 15:20 UTC by a service rebalance that caused a number of client connections to close simultaneously. When these clients attempted to reconnect, the resulting spike in traffic, which coincided with peak traffic conditions, exceeded system limits, leading to service disruptions.
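As an illustration only, the following sketch models how a mass reconnect against a fixed rate limit can keep recurring when dropped clients all retry after the same delay. Everything here is an assumption made for the example: the function name `simulate_fixed_retry`, the client count, the limit, and the retry interval are hypothetical and are not figures from this incident.

```python
from typing import List


def simulate_fixed_retry(clients: int, limit_per_tick: int,
                         retry_after: int, ticks: int) -> List[int]:
    """Return the number of connection attempts arriving at each tick.

    Hypothetical model: all clients reconnect at tick 0. The server accepts
    at most `limit_per_tick` attempts per tick and drops the rest. Every
    dropped client retries exactly `retry_after` ticks later (no jitter).
    """
    attempts = [0] * ticks
    attempts[0] = clients
    for t in range(ticks):
        dropped = max(0, attempts[t] - limit_per_tick)
        # Dropped clients all come back at the same later tick.
        if dropped and t + retry_after < ticks:
            attempts[t + retry_after] += dropped
    return attempts


if __name__ == "__main__":
    # 1000 clients, 300 accepted per tick, retry every 3 ticks.
    print(simulate_fixed_retry(1000, 300, 3, 12))
    # [1000, 0, 0, 700, 0, 0, 400, 0, 0, 100, 0, 0]
```

In this toy model the load arrives in bursts separated by idle ticks, so the rate limit is exceeded repeatedly even though the total demand would fit comfortably if it were spread out.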

Our team quickly identified the cause and worked to stabilize the system. We restored normal operations by 16:18 UTC.

To prevent similar incidents in the future, we are reviewing system thresholds and improving our ability to handle sudden increases in demand.
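The report does not describe the specific changes being made. One widely used, general technique for absorbing reconnect spikes is exponential backoff with random jitter on the client side, so that clients disconnected at the same moment do not retry at the same moment. The sketch below is a generic example of that technique, not a description of Close's implementation; the function name `reconnect_delay` and the `base` and `cap` values are hypothetical.

```python
import random
from typing import Optional


def reconnect_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                    rng: Optional[random.Random] = None) -> float:
    """Return how many seconds to wait before reconnect attempt `attempt`.

    "Full jitter" backoff: the upper bound doubles with each attempt
    (capped at `cap`), and the actual delay is drawn uniformly from
    0 up to that bound. All parameter values are illustrative.
    """
    source = rng if rng is not None else random
    ceiling = min(cap, base * (2 ** attempt))
    return source.uniform(0.0, ceiling)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    for attempt in range(5):
        print(attempt, round(reconnect_delay(attempt, rng=rng), 2))
```

Because each client draws its own random delay, a burst of simultaneous disconnects turns into reconnect attempts spread across the backoff window rather than a single synchronized wave.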

Timeline

  • 15:20 UTC - a service rebalance occurs, triggering a wave of new connection attempts
  • 15:21 UTC - a portion of requests starts being dropped due to rate limits
  • 15:28 UTC - alerts trigger and our response team begins identifying the root cause
  • 15:36 UTC - the rate of dropped requests subsides, but soon increases again due to a wave-like pattern of retries
  • 16:18 UTC - the final wave of increased errors ends and the situation returns to normal operational levels
Posted Mar 12, 2025 - 10:39 PDT

Resolved

This incident has been resolved.
Posted Mar 10, 2025 - 09:28 PDT

Monitoring

Our Dialer system is now functioning normally. We are continuing to investigate the cause of the degraded performance and are monitoring the system.
Posted Mar 10, 2025 - 09:21 PDT

Investigating

We've become aware of degraded performance of our Dialer service. We are investigating the issue. Updates will be posted as they become available.
Posted Mar 10, 2025 - 09:02 PDT
This incident affected: Phone (Dialer).