Description: Flexera One - IT Visibility - NA - Service Disruption
Timeframe: September 18th, 11:43 AM to September 18th, 2:04 PM PDT
Incident Summary
On Monday, September 18th, 2023, at 11:43 AM PDT, we experienced a service disruption affecting our IT Visibility Platform, potentially impacting customers in North America. During this period, customers may have been unable to access Flexera One UI Data Dashboards, Data Exports, the API, and the ServiceNow integration.
Our preliminary investigation showed that widespread pod failures were a contributing factor. Our technical team also ran health checks to verify that other regions remained accessible, confirming that the problem was isolated to North America. All attempts to restart the affected internal services were unsuccessful.
At 12:11 PM PDT, further investigation pointed to a possible network outage, leading our teams to suspect a problem with our service provider. By 12:18 PM PDT, we confirmed that the issue originated from a network problem on the service provider's side.
After contacting the service provider, we learned that they were experiencing an issue affecting the availability of multiple zones within their US region: network mappings were not being properly propagated to the underlying hardware.
We closely monitored the service provider's troubleshooting efforts, and by 2:04 PM PDT they had restored the services our operations depend on. Our team then confirmed that all internal services, including IT Visibility Dashboards and data, were fully operational and accessible to customers.
We continued to monitor the environment for several more hours while the service provider worked toward full recovery, so that any recurrence could be detected and addressed quickly. After this extended monitoring period, our team verified that our services remained operational with no further issues, and the incident was declared resolved.
Root Cause
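The disruption was caused by a network issue at our service provider, where network mappings were not properly propagated to the underlying hardware, affecting the availability of multiple zones in their US region.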
Remediation Actions
Future Preventative Measure
Obtain Post-Mortem from the Service Provider: We will request a post-mortem report from the service provider to understand the root cause of the incident and the steps they have taken to prevent recurrence. If necessary, we will work with the provider to strengthen preventive measures.