Description: IT Visibility - NA - Data Processing was Delayed
Timeframe: January 22nd, 6:23 PM to January 23rd, 4:40 AM PST
Incident Summary
On Sunday, January 22nd, at 6:23 PM PST, we experienced a service interruption in the IT Visibility data streaming service, which caused a data processing backlog in the NA environment. As a result, some customers may have experienced delays in accessing the most recent inventory data in IT Visibility.
During routine health checks at 1:30 AM PST, technical staff found that one of the data streaming service's pods was in an unhealthy state and was repeatedly failing and restarting. Further investigation identified unknown data sent by one of the orgs as the cause of resource contention within the pod. This contention also delayed data processing for the other orgs served by the same pod.
At 2:40 AM PST, technical staff rebalanced incoming traffic across the other pods in the cluster to reduce the load on the impacted pod. At 4:39 AM PST, technical staff temporarily moved the unknown data to an alternate pod, after which data processing resumed normally.
At 4:40 AM PST, the backlog had been fully processed. After a period of additional monitoring, the incident was declared resolved.
Root Cause
According to the preliminary investigation, unknown data sent by one of the orgs caused resource contention within a data streaming service pod. This contention in turn delayed data processing for the other orgs served by the same pod.
Corrective Action