Flexera One - IT Visibility - NAM - PowerBI reports and "GraphQL" APIs are unavailable
Incident Report for Flexera System Status Dashboard
Postmortem

Description: Flexera One – IT Visibility – NA – PowerBI reports and GraphQL APIs were unavailable

Timeframe: September 3rd, 2024, 4:30 AM to September 3rd, 2024, 11:48 AM PDT

Incident Summary

On September 3, 2024, at 4:30 AM PDT, we experienced a service disruption that impacted customers in the NA region using the IT Visibility platform. During the incident, customers may have been unable to access PowerBI reports and GraphQL APIs, which could have caused delays in reporting and operational inefficiencies for some users. The issue was isolated to the NA region, with no reported impact in other regions.

The disruption occurred when an unintended operation was executed during routine development work. This action led to the temporary unavailability of services related to PowerBI and GraphQL APIs. Upon detecting the issue, our technical teams immediately began remediation measures to address the disruption.

By 8:07 AM PDT, the majority of affected users had their services restored, and our team continued working diligently to resolve the issue for the remaining customers. Throughout the incident, our teams closely monitored the situation and conducted multiple health checks to ensure platform stability. Full restoration of services was confirmed by 11:48 AM PDT, after which normal operations resumed.

Root Cause

The disruption was caused by an unintended operation during routine development that temporarily disabled key functionality, resulting in the loss of access to PowerBI reports and GraphQL APIs for customers in the NA region.

Remediation Actions

  1. Issue Identification and Investigation: The issue was identified at 4:30 AM PDT, immediately after the unintended operation during development work impacted PowerBI reports and GraphQL APIs. The team began remediation measures at once to address the disruption.
  2. Service Restoration Efforts: The technical team quickly implemented corrective measures to restore access by re-enabling the affected functionality. By 8:07 AM PDT, services were restored for the majority of impacted customers. The team continued working diligently to resolve the issue for remaining customers.
  3. Diagnostics and Health Checks: Throughout the restoration process, extensive diagnostics and health checks were performed to monitor the stability of the platform and ensure that restored services were functioning as expected.
  4. Full Service Restoration and Validation: By 11:48 AM PDT, service was fully restored. Additional validations and system checks were conducted to confirm that the platform was performing normally.
  5. Post-Restoration Monitoring: Following the restoration, extended monitoring was conducted to confirm that the issue had been fully resolved and that no further disruptions occurred.

Future Preventative Measures

In response to this incident, a retrospective meeting was conducted to analyze the incident and identify areas for improvement. As a result, Flexera is considering several key measures to prevent similar incidents in the future:

  1. Access Controls and Automation: We are reviewing access control mechanisms and evaluating the automation of critical tasks to ensure operations are handled with the appropriate level of access, reducing the likelihood of human error and unintended actions in production environments.
  2. Enhanced Monitoring and Alerts: While the issue was identified promptly, we are exploring the implementation of advanced alert systems to further enhance real-time detection and ensure swift action on any potential future disruptions.
  3. Peer Review for High-Risk Operations: We are considering a peer review process so that critical tasks are reviewed by multiple team members before execution.
Posted Sep 18, 2024 - 20:30 PDT

Resolved
The recovery process is now complete, and all impacted organizations have been fully restored.

Please note: This issue specifically affected the IT Visibility component within the Technology Intelligence platform, leading to disruptions in accessing "PowerBI" reports and "GraphQL" APIs. The issue has been resolved, and all services are fully operational.
Posted Sep 03, 2024 - 12:13 PDT
Update
The recovery process is progressing well, with a significant number of organizations already restored. The team is actively identifying and processing the remaining organizations to ensure full recovery. We will continue to provide updates as progress is made.
Posted Sep 03, 2024 - 08:44 PDT
Identified
Our teams are currently investigating an issue impacting some customers in the NAM region. Affected users may be unable to view "PowerBI" reports or access "GraphQL" APIs.

Priority: P2

Restoration activity:
Our technical teams are actively investigating this issue. They have identified the root cause and are working to restore the services.
Posted Sep 03, 2024 - 05:43 PDT
This incident affected: Flexera One - IT Visibility - North America (IT Visibility US).