Description: Flexera One - IT Asset Management - NA - Reconciliation and Inventory Delay Issues
Timeframe: March 24, 2025, 12:53 PM PST to March 25, 2025, 11:37 AM PST
Incident Summary
On March 24, 2025, at 12:53 PM PST, we identified performance issues in the North America (NA) region of Flexera One IT Asset Management (ITAM). Affected customers may have experienced delays in reconciliation runs and inventory updates, as well as slow UI performance. Library imports were also impacted due to a growing processing backlog.
Our teams determined the root cause to be an overloaded database server within the affected cluster, resulting in high strain on resources and delayed job execution. Initial remediation steps included restarting the scheduler and performing a failover, which helped stabilize processing.
As a further remediation action, the team identified a configuration change that allows the database to use more memory. The maximum memory allocation was increased, and this setting was standardized across all database servers to ensure consistency and reduce the likelihood of similar issues.
In parallel, a new server instance was provisioned in the NA region to accommodate the onboarding of new tenants. This measure is intended to reduce strain on the existing cluster by distributing future workload growth more effectively.
After verifying platform stability and confirming with impacted customers, the issue was declared resolved on March 25, 2025, at 11:37 AM PST.
Root Cause
Upon investigation, our teams determined that a database server in the affected cluster became overloaded, with memory and other system resources exhausted. As a result, scheduled jobs such as reconciliations and library imports failed or were severely delayed. UI responsiveness was also degraded due to the increased strain on backend processes.
Remediation Actions
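Initial Investigation and Containment:
· Identified an overloaded database server in the affected cluster as the source of the delays.
· Restarted the scheduler as an initial stabilization step.
Failover and Recovery:
· Performed a failover, which helped stabilize processing.
System Configuration Changes:
· Increased the maximum memory allocation for the database.
· Standardized this setting across all database servers.
Capacity Planning:
· Provisioned a new server instance in the NA region to accommodate the onboarding of new tenants.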
Future Preventative Measures
Proactive Resource Monitoring:
· Enhance monitoring to detect early signs of resource exhaustion on database servers.
· Implement automated alerts for high memory or processing load thresholds.
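As an illustration only, the threshold alerting described above could be sketched as follows. This is not the monitoring used in production: the threshold values, the `Sample` record, the `find_alerts` function, and the host names are hypothetical and chosen for the example. The sketch raises an alert only when a host stays at or above a threshold for several consecutive samples, so that short spikes do not trigger it.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical thresholds; real values would come from performance baselines.
MEMORY_LIMIT_PCT = 85.0
CPU_LIMIT_PCT = 90.0
SUSTAINED_SAMPLES = 3  # consecutive samples required before alerting


@dataclass(frozen=True)
class Sample:
    """One resource reading for a database server (hypothetical record)."""
    host: str
    memory_pct: float
    cpu_pct: float


def find_alerts(
    samples: Iterable[Sample],
    memory_limit: float = MEMORY_LIMIT_PCT,
    cpu_limit: float = CPU_LIMIT_PCT,
    sustained: int = SUSTAINED_SAMPLES,
) -> List[str]:
    """Return alert messages for hosts whose most recent `sustained`
    samples are all at or above a threshold."""
    # Group readings per host, preserving arrival order.
    by_host = defaultdict(list)
    for sample in samples:
        by_host[sample.host].append(sample)

    alerts = []
    for host, history in by_host.items():
        recent = history[-sustained:]
        if len(recent) < sustained:
            continue  # not enough data to call the load sustained
        if all(s.memory_pct >= memory_limit for s in recent):
            alerts.append(
                f"{host}: memory at or above {memory_limit:.0f}% "
                f"for {sustained} consecutive samples"
            )
        if all(s.cpu_pct >= cpu_limit for s in recent):
            alerts.append(
                f"{host}: CPU at or above {cpu_limit:.0f}% "
                f"for {sustained} consecutive samples"
            )
    return alerts


if __name__ == "__main__":
    readings = [
        Sample("db-1", 90.0, 50.0),
        Sample("db-1", 92.0, 60.0),
        Sample("db-1", 95.0, 70.0),
    ]
    for message in find_alerts(readings):
        print(message)
```

Requiring consecutive breaches is one possible design choice; a time-window average or a rate-of-change check would be reasonable alternatives, depending on how quickly resource exhaustion develops.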
Scalability Improvements:
· Review cluster resource allocation strategies to better handle traffic growth and workload spikes.
Infrastructure Readiness:
· Expand capacity planning to ensure redundancy and scalability in each region.
· Regularly assess cluster health and performance baselines for early intervention.