Flexera One - IT Asset Management - EU - Web UI slowness

Incident Report for Flexera System Status Dashboard

Postmortem

Description: Flexera One - IT Asset Management - EU - Web UI slowness

Timeframe: February 3, 2025, at 12:04 AM PST to February 3, 2025, at 10:10 AM PST

Incident Summary

On February 3rd at 12:04 AM PST, we received multiple reports from customers in the EU region who were experiencing slowness and errors on the Flexera One IT Asset Management (ITAM) Web UI, caused by a high number of open or blocked sessions. Our technical teams were promptly engaged to investigate and remediate. During the investigation, they found that a database server was failing to retrieve stored procedures efficiently, resulting in cross-database locking and degraded UI performance. To mitigate the issue, a failover was performed, which significantly improved performance by 4:42 AM PST; the UI loaded properly, although session counts remained relatively elevated. Extended monitoring confirmed that cross-database blocking had ceased after the failover, and the issue was declared resolved at 10:10 AM PST. The elevated session counts were attributed to backlogged resolver processing, and no underlying product issue was identified.

Root Cause

The root cause of the incident was identified as a database performance bottleneck triggered by the following chain of events:

·        Stored Procedure Retrieval Failure: The database server repeatedly failed to retrieve stored procedures efficiently, which led to frequent repeat requests.

·        System Procedure Blocking: These frequent requests caused a system procedure to become blocked.

·        Cross-Database Locking: The blocked system procedure resulted in cross-database locking, which significantly impacted the ITAM Web UI's performance, leading to slowness and errors.

Contributing Factors:

·        High Session Counts: The locking and recompilation issues contributed to a large number of open and blocked sessions, further exacerbating the performance problems.

Remediation Actions

·        Initial Investigation and Session Termination: The team began investigating the high session count and attempted to terminate the sessions causing the most blocking. However, new blocking sessions continued to emerge.

·        Failover: Because terminating sessions did not stop the blocking, a failover was performed. This significantly improved the system's responsiveness by 4:42 AM PST and reduced the impact on customers.

·        Post-Failover Monitoring: The team closely monitored the system after the failover. While the session count remained elevated, it stabilized, and no further cross-database blocking was observed.

·        Extended Monitoring and Restoration Declaration: After extended monitoring confirmed the system's stability and the resolution of the performance issues, the incident was declared resolved at 10:10 AM PST.
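
The session-termination step above ran into new blockers appearing as fast as old ones were removed, which is typical when only the visibly blocked sessions are targeted and not the session at the head of the chain. As a minimal sketch only: this report does not name the database engine or the tooling used, so the snapshot format below (a mapping from each session id to the id of the session blocking it) and the function name `find_head_blockers` are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Optional


def find_head_blockers(blocked_by: Dict[int, Optional[int]]) -> Counter:
    """Count how many sessions are stuck behind each head blocker.

    blocked_by maps a session id to the id of the session blocking it,
    or None when the session is not blocked. This snapshot format is
    hypothetical; it is not taken from the incident report.
    """
    victims: Counter = Counter()
    for session, blocker in blocked_by.items():
        if blocker is None:
            continue  # this session is not waiting on anyone
        seen = {session}
        head = blocker
        # Walk up the chain until reaching a session that is not blocked.
        # The `seen` set stops the walk if the chain loops back on itself.
        while blocked_by.get(head) is not None and head not in seen:
            seen.add(head)
            head = blocked_by[head]
        victims[head] += 1
    return victims


# Hypothetical snapshot: 51 blocks 52 and 53, and 53 in turn blocks 54.
snapshot = {51: None, 52: 51, 53: 51, 54: 53, 60: None}
print(find_head_blockers(snapshot).most_common())
```

In this made-up snapshot, all three waiting sessions trace back to session 51, so that is the one worth examining first. If new head blockers keep appearing after each termination, as happened here, the cause likely sits below the session level, which is consistent with the decision to fail over.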

Future Preventative Measures

·        Enhanced Monitoring: Implement more granular monitoring, for example of session counts and cross-database blocking, to detect similar issues proactively.

·        Capacity Planning and Scalability: Review and update capacity planning for the database cluster to ensure it can handle unexpected spikes in activity.

·        Server Performance Optimization: Conduct a thorough review of database server performance metrics and identify potential bottlenecks.

·        Resolver Processing Review: Investigate the backlogged resolver processing that caused elevated session counts after the failover and implement improvements to avoid future backlogs.
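
One way the enhanced monitoring measure could look in practice is an alert that fires only when the blocked-session count stays high for several consecutive samples, so that short spikes do not page anyone. The sketch below is illustrative only: the function name, the threshold of 25, the three-sample window, and the sample values are all assumptions, not figures from this incident.

```python
from typing import Iterable, List


def sustained_breaches(
    samples: Iterable[int], threshold: int, min_consecutive: int
) -> List[int]:
    """Return the sample indexes at which an alert would fire.

    An alert fires once when the blocked-session count has exceeded
    `threshold` for `min_consecutive` samples in a row. It cannot fire
    again until the count has dropped back to the threshold or below.
    """
    alerts: List[int] = []
    streak = 0
    for index, count in enumerate(samples):
        if count > threshold:
            streak += 1
            if streak == min_consecutive:
                alerts.append(index)
        else:
            streak = 0  # the breach ended, so start counting afresh
    return alerts


# Hypothetical blocked-session counts sampled at a fixed interval.
blocked_counts = [2, 3, 40, 45, 50, 48, 4, 2]
print(sustained_breaches(blocked_counts, threshold=25, min_consecutive=3))
```

Requiring a sustained breach trades a little detection latency for fewer false alarms. The right threshold and window would have to come from the baseline session counts of the actual environment.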

Posted Mar 06, 2025 - 22:21 PST

Resolved

We have monitored the system for an extended period and continue to observe positive outcomes, with performance remaining stable. Additionally, database activity has returned to normal levels. This incident is now considered resolved.
Posted Feb 03, 2025 - 10:27 PST

Update

Further investigation has linked the issue to database performance delays, resulting in slowness and errors. A failover was performed, improving system response. While activity levels remain elevated due to a potential backlog, performance is stable. Monitoring continues, and we will provide further updates as we progress.
Posted Feb 03, 2025 - 06:34 PST

Identified

The issue has been identified and a fix is being implemented.
Posted Feb 03, 2025 - 04:55 PST

Investigating

Incident Description:
Our teams are currently investigating an issue affecting some customers using the IT Asset Management Web UI in the EU region. Impacted customers may experience slowness or errors when accessing the Flexera One ITAM Web UI.

Priority: P2

Restoration activity:
Our technical team is actively working on identifying the root cause and restoring services. Further updates will be provided as we continue our efforts to resolve the incident.
Posted Feb 03, 2025 - 02:27 PST
This incident affected: Flexera One - IT Asset Management - Europe (IT Asset Management - EU Beacon Communication, IT Asset Management - EU Inventory Upload, IT Asset Management - EU Login Page, IT Asset Management - EU Batch Processing System, IT Asset Management - EU Business Reporting, IT Asset Management - EU SaaS Manager, IT Asset Management - EU Restful APIs).