Flexera One - IT Visibility & Common Spend Analytics Dashboards - EU - Data Processing Delayed
Incident Report for Flexera System Status Dashboard
Postmortem

Description: Flexera One - IT Visibility & Common Spend Analytics Dashboards - EU - Data Processing Delayed

Timeframe: October 31, 2024, 1:31 AM PDT to November 1, 2024, 2:13 AM PDT

Incident Summary

On Thursday, October 31, 2024, at 1:31 AM PDT, a data processing delay was detected, impacting IT Visibility (ITV) and Common Spend Analytics (CSA) dashboards in the EU region. Although the dashboards remained accessible, they did not display the most recent data. Other ITV services were unaffected and operated normally.

We raised a support ticket with the service provider as soon as we detected the issue. The issue was traced to an infrastructure problem within our service provider's data center in the EU, caused by an unexpected hardware failure. Although the initial problem was mitigated, a secondary issue emerged, disrupting ETL processes and report computations for some workspaces in the EU region. A support ticket was opened with the service provider, who began restoring impacted workspaces.

By 7:11 AM PDT, the service provider confirmed they were making good progress on restoring several affected workspaces. To maintain stability during this process, we temporarily paused specific data loading functions and closely monitored the situation until full restoration was achieved. By the end of the incident, all services were fully operational. A complete data load and end-to-end checks were conducted to verify that processes were functioning correctly, and the incident was marked as resolved on November 1, 2024, at 2:13 AM PDT.

Root Cause

The root cause of this incident was an unexpected hardware failure within our service provider's data center in Europe. This failure affected critical infrastructure supporting ETL processes and report computations in the EU region. Although the initial issue was mitigated, residual effects from the hardware failure led to delays in data processing and updates for some workspaces.

Remediation Actions

· Service Provider Coordination: Collaborated with the service provider to address the infrastructure issue and ensure the complete restoration of affected data processing functions.

· Stability Measures: Temporarily paused specific data loading functions to stabilize the environment during the restoration process.

· Verification of Full Restoration: Conducted a full data reload and end-to-end checks to confirm all services were fully restored and functioning as expected.

Future Preventative Measures

· Enhanced Infrastructure Monitoring: Work with the service provider to implement additional monitoring for critical infrastructure components to detect and address potential hardware failures proactively.

· Regular Risk Assessment: Conduct regular risk assessments of the service provider’s data centers and recovery protocols, focusing on reducing dependencies that could lead to service delays.

· Request for Service Provider RCA: Requested a detailed root cause analysis (RCA) from the service provider, along with their planned mitigation actions, to prevent recurrence of similar infrastructure issues.
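
As an illustration only, the kind of data-freshness check that could surface a delay like the one described in this report might look like the sketch below. It flags workspaces whose last successful load is older than a threshold. The function name, the workspace IDs, the timestamps, and the 6-hour threshold are all hypothetical and are not taken from Flexera's or the service provider's systems.

```python
from datetime import datetime, timedelta, timezone


def find_stale_workspaces(last_loaded, now, max_age=timedelta(hours=6)):
    """Return IDs of workspaces whose last load is older than max_age, stalest first.

    last_loaded maps a (hypothetical) workspace ID to the timezone-aware
    timestamp of its last successful data load.
    """
    stale = [(now - ts, ws) for ws, ts in last_loaded.items() if now - ts > max_age]
    # Sort by age descending so the most out-of-date workspace comes first.
    return [ws for _, ws in sorted(stale, reverse=True)]


# Hypothetical sample data; 14:11 UTC corresponds to 07:11 PDT on October 31, 2024.
now = datetime(2024, 10, 31, 14, 11, tzinfo=timezone.utc)
last_loaded = {
    "ws-eu-1": now - timedelta(hours=9),
    "ws-eu-2": now - timedelta(hours=2),
    "ws-eu-3": now - timedelta(hours=7),
}

print(find_stale_workspaces(last_loaded, now))  # ['ws-eu-1', 'ws-eu-3']
```

A check of this shape reports on data age rather than on the underlying hardware, so it would complement, not replace, infrastructure monitoring on the provider side.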

Posted Nov 11, 2024 - 02:16 PST

Resolved
All services are now fully operational, and normal functionality has been restored across the environment. A full data load and end-to-end checks were completed to verify that all processes are working as expected. The incident has been resolved.
Posted Nov 01, 2024 - 05:47 PDT
Monitoring
The restoration of affected workspaces is complete, and we have re-enabled our data processes. A full data load is currently in progress to ensure all information is fully up-to-date.

We will provide a final update once this process is complete.
Posted Oct 31, 2024 - 18:34 PDT
Update
Our monitoring indicates that restoration of affected workspaces is still in progress. To support stability during this process, we have temporarily paused certain data loading functions. We are closely monitoring the situation and will provide updates as restoration continues.
Posted Oct 31, 2024 - 12:06 PDT
Update
Our service provider is actively restoring data in the EU region, with several workspaces already reviewed and functioning as expected. Once the full restoration is complete, we will run additional updates to ensure all information is current. Further updates will follow as the process progresses.
Posted Oct 31, 2024 - 08:29 PDT
Identified
Our service provider has confirmed they are in the process of restoring several workspaces in the EU region. As this work continues, data for some organizations will be 6 to 9 hours or more out of date. We are awaiting further information on the next steps and will provide updates as we receive more details.
Posted Oct 31, 2024 - 07:11 PDT
Investigating
Incident Description: We are currently experiencing a data processing delay affecting IT Visibility (ITV) and Common Spend Analytics (CSA) Dashboards in the EU region. While the dashboards are accessible, they may not display the most recent data. All other ITV services remain fully operational and unaffected.

Priority: P2

Restoration Activity: The issue has been identified on our service provider’s side, and they have informed us that they are investigating an infrastructure-related problem that may still be affecting data updates. We are actively working with them to resolve this as soon as possible and will provide further updates as more information becomes available.
Posted Oct 31, 2024 - 03:44 PDT
This incident affected: Flexera One - IT Visibility - Europe (IT Visibility EU).