Flexera One UI - NA - Experiencing Performance Delays
Incident Report for Flexera System Status Dashboard
Postmortem

Description: Flexera One UI - NA - Platform Experienced Performance Delays
Timeframe: April 18th, 2024, 1:11 PM to April 18th, 2024, 3:36 PM PDT

Incident Summary

On April 18th, 2024, at 2:29 PM PDT, the major incident management team received an alert about an issue affecting the Flexera One platform in the NA region. The issue caused noticeable performance degradation in the UI and slower response times for users.
We began investigating immediately and scaled up resources to mitigate the issue. By 3:01 PM PDT, performance had returned to normal levels. However, the underlying cause of the increased resource demand had not yet been identified.

At 3:32 PM PDT, log analysis showed a significant surge in requests to one of the critical services, which placed a heavy load on system resources. Further analysis attributed the surge primarily to backend tasks associated with child policies generated from meta parent policies. The influx of requests degraded the critical service and, in turn, the resources associated with it.

At 3:36 PM PDT, our technical teams confirmed that the affected service had returned to its normal state, and we scaled the resources back to their usual levels. A further round of health checks then confirmed sustained stability, marking the resolution of the incident.

Root Cause

A surge in backend tasks associated with child policies generated from meta parent policies sharply increased requests to one of the critical services. The added load strained system resources and degraded that service.

Remediation Actions

  1. Swift Resource Engagement: Engaged resources as soon as the issue was detected.
  2. Proactive Measures and Resource Scaling: Mitigated the impact of the incident by scaling up resources, which restored performance to normal levels.
  3. Continuous Monitoring and Health Checks: Monitored the service continuously and ran follow-up health checks to confirm resolution and sustained stability.

Future Preventative Measures

  1. Auto-scaling Mechanisms Implementation: Implement auto-scaling mechanisms that dynamically adjust resource allocation based on demand, maintaining performance and resource utilization without manual intervention.
  2. Streamlined Data Access: Implement caching mechanisms to optimize data retrieval and reduce the frequency of system calls, improving overall performance.
  3. Enhanced Monitoring Capabilities: Add automated alerts for faster incident detection, allowing intervention before potential issues escalate.
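
As a rough illustration of the caching measure in item 2, the sketch below shows a small time-based cache that reuses a fetched value until it expires, so repeated backend tasks do not each call the underlying service. This is not Flexera's implementation: the class name `TTLCache`, the method `get_or_fetch`, the cache key, and the 60-second lifetime are all hypothetical and chosen only for illustration.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Hypothetical time-based cache: reuse a value until it is older than ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be exercised without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: Hashable, fetch: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling fetch() only on a miss or after expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # still fresh: no call to the backing service
        value = fetch()      # miss or expired: one call to the backing service
        self._entries[key] = (now, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=60)  # assumed lifetime, not a documented setting
    backend_calls = 0

    def load_policy_data() -> str:
        # Stand-in for a request to the critical service (hypothetical).
        global backend_calls
        backend_calls += 1
        return "policy-data"

    for _ in range(1000):  # many child-policy tasks asking for the same data
        cache.get_or_fetch("parent-policy-1", load_policy_data)
    print(backend_calls)
```

In this sketch, many tasks requesting the same key within the lifetime share a single backend call. The trade-off is that callers may see data up to `ttl_seconds` old, so the lifetime would need to match how quickly the underlying data changes.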
Posted May 03, 2024 - 19:25 PDT

Resolved
This incident has been resolved.
Posted Apr 18, 2024 - 16:50 PDT
Update
To alleviate the situation quickly, we have scaled up resources, and performance appears to have returned to normal levels. We are continuing to investigate the root cause of the additional load that made this scaling necessary.
Posted Apr 18, 2024 - 16:03 PDT
Investigating
Incident Description: We are currently experiencing service degradation affecting the Flexera One UI in the US region, resulting in slower response times for users. This issue is specific to the US region, with no impact on other regions.

Priority: P2

Restoration Activity: Our technical teams are actively involved and are evaluating the situation. Additionally, we are exploring potential solutions to rectify the issue as quickly as possible.
Posted Apr 18, 2024 - 14:37 PDT
This incident affected: Flexera One - IT Asset Management - North America (IT Asset Management - US Beacon Communication, IT Asset Management - US Inventory Upload, IT Asset Management - US Login Page, IT Asset Management - US Batch Processing System, IT Asset Management - US Business Reporting, IT Asset Management - US SaaS Manager, IT Asset Management - US Restful APIs), Flexera One - IT Visibility - North America (IT Visibility US), Flexera One - APIs - North America (api.flexera.com), and Flexera One - Cloud Management - North America (Cloud Cost Optimization - US, Cloudscape).