Description: Flexera One - RightScale Self-Service - Shard 3 - NA - Slow Response/Errors
Timeframe: March 14th, 7:06 PM to March 14th, 7:59 PM PDT
Incident Summary:
On March 14th at 7:06 PM PDT, an incident began affecting RightScale Self-Service in Shard 3, causing slow response times and server errors for customers using the service portal. This impaired their ability to manage resources and may have delayed or disrupted their operations.
The root cause was traced to an unusual failure within a critical system component, which remained operational but at significantly reduced performance. Recovering from this partially functional state required replacing the component.
To restore normal service, the faulty component was replaced at 7:59 PM PDT. After health checks and a period of monitoring confirmed that the service was stable, the incident was resolved.
Root Cause:
A critical system component failed in an unusual way: it continued to function, but at significantly reduced performance. This degraded state caused the slow response times and server errors that affected customers' ability to access and manage resources via the service portal.
Contributing Cause:
The monitoring system did not detect the component's partially functional state and did not alert the team. Without timely detection and intervention, the issue persisted and continued to affect customers' ability to access and manage resources via the service portal.
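As an illustration of this kind of detection gap, the following is a minimal Python sketch of a health check that tells a component that is fully down apart from one that still responds, but slowly or with errors. The names (`Sample`, `classify`) and the thresholds are hypothetical, chosen only for the example. They do not describe the monitoring actually in place for this service.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """One probe result (hypothetical shape, for illustration only)."""
    latency_ms: float
    ok: bool


def classify(samples, p95_limit_ms=1000.0, error_rate_limit=0.05):
    """Return "healthy", "degraded", or "down" for a window of probe samples.

    The thresholds are illustrative assumptions, not production values.
    """
    # No data, or no successful probe at all, is treated as a hard failure.
    successes = [s for s in samples if s.ok]
    if not successes:
        return "down"

    error_rate = 1 - len(successes) / len(samples)

    # Nearest-rank 95th percentile of the successful probes' latency.
    latencies = sorted(s.latency_ms for s in successes)
    p95 = latencies[max(0, math.ceil(0.95 * len(latencies)) - 1)]

    # The component answers, but too slowly or with too many errors:
    # this is the partially functional state a plain up/down check misses.
    if error_rate > error_rate_limit or p95 > p95_limit_ms:
        return "degraded"
    return "healthy"
```

A check that only asks whether the component responds would report both the healthy and the degraded case as up. Alerting on the "degraded" result is one way to surface a slow but still running component.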
Corrective Actions: