Description: Flexera One - IT Asset Management - NA - Inventory Data Delayed
Timeframe: March 25, 2025, 12:00 PM PDT to March 28, 2025, 5:30 AM PDT
Incident Summary
On Tuesday, March 25, 2025, at 12:00 PM PDT, our teams identified an issue affecting inventory data uploads in the North America (NA) region. The disruption created a backlog in IT Asset Management (ITAM) inventory processing across NA, delaying customers' access to up-to-date inventory data on the platform. One customer formally reported outdated inventory data through a support case, which initiated an active investigation by our engineering and support teams.
While a similar situation was noted in the EU environment, the backlog levels there remained minimal and within acceptable operating thresholds. Continuous monitoring confirmed that EU operations were stable, with no reported impact on customers in that region.
The investigation indicated that five customers were primarily affected, with two of them contributing significantly to the backlog. The backlog arose from a surge in inventory data processing demand that the processing infrastructure in the NA region struggled to manage efficiently, resulting in delays in data updates within the platform. Our teams made several infrastructure enhancements that increased processing throughput and reduced the backlog for all customers. The backlog was cleared for all customers except one. The issue was declared restored on March 28, 2025, at 5:30 AM PDT, while our support teams engaged the remaining customer directly to continue the investigation. Once the system reached sufficient stability, we reverted the infrastructure changes, and the affected customers are no longer hosted on their individual servers.
Root Cause
The backlog arose from a surge in inventory data processing demand, primarily driven by several high-volume customers. The shared processing infrastructure in the NA region struggled to efficiently manage the increased load, resulting in delays in data updates within the platform. Processing queues became overwhelmed, and the current prioritization mechanisms proved inadequate to alleviate the congestion during peak load periods.
Remediation Actions
· Infrastructure Segregation: The largest contributing customer was migrated to dedicated infrastructure with additional servers configured for redundancy. This change approximately doubled inventory processing throughput for that customer.
· Expansion of Dedicated Resources: Dedicated infrastructure groups were provisioned for the remaining three high-volume customers to isolate their workloads from the shared environment.
· Configuration Tuning: Processing prioritization rules were adjusted to focus resources on the top three customers with the most significant backlogs.
· Capacity Scaling: An additional processing server was added to the largest contributor's dedicated infrastructure group after backlog growth persisted despite prior changes.
· Product Enhancement Implementation: Our team implemented a product enhancement to improve processing times. A hotfix was deployed and significantly improved processing performance.
· Monitoring and Validation: Ongoing monitoring confirmed a steady decline in backlog volumes across all five major impacted customers. The backlog in NA was eventually cleared for all customers except one, whose processing continued to be monitored closely.
· Customer Communication: Support engaged with the remaining affected customer directly and provided timely updates.
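As an illustration of the Configuration Tuning step above, the following is a minimal sketch of how processing capacity might be divided among the tenants with the largest backlogs. It is not the platform's actual prioritization logic: the function name `allocate_workers`, the tenant names, and all numbers are hypothetical and chosen only to show proportional allocation.

```python
def allocate_workers(backlogs: dict[str, int], total_workers: int, top_n: int = 3) -> dict[str, int]:
    """Split workers among the top_n tenants, in proportion to backlog size.

    Hypothetical helper for illustration; tenants outside the top_n get no
    entry in the result and would keep using the shared pool.
    """
    # Rank tenants by pending items, largest backlog first.
    top = sorted(backlogs.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    total_pending = sum(pending for _, pending in top)
    if total_pending == 0:
        return {tenant: 0 for tenant, _ in top}

    # Integer share per tenant, rounded down.
    shares = {tenant: (pending * total_workers) // total_pending for tenant, pending in top}

    # Rounding down can leave a few workers unassigned; give them to the
    # largest backlogs first.
    leftover = total_workers - sum(shares.values())
    for tenant, _ in top[:leftover]:
        shares[tenant] += 1
    return shares


# Hypothetical backlog sizes (pending inventory files per tenant).
example = allocate_workers(
    {"tenant_a": 600, "tenant_b": 300, "tenant_c": 100, "tenant_d": 50},
    total_workers=8,
)
print(example)
```

With these made-up inputs, the three largest backlogs share all eight workers and the fourth tenant is left on the shared pool, which mirrors the idea of focusing resources on the top three customers.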
Future Preventative Measures
· Infrastructure Scaling Strategy: Evaluate and implement a scalable infrastructure model that allows dynamic allocation of dedicated resources to high-volume tenants.
· Enhanced Processing Prioritization: Develop a more adaptive prioritization framework that responds to real-time backlog growth patterns and tenant processing needs.
· Capacity Planning Improvements: Introduce proactive capacity alerts and automatic workload redistribution to avoid overloading shared infrastructure.
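A proactive capacity alert of the kind described above could, under simple assumptions, project backlog growth from recent samples and fire before a threshold is crossed. The sketch below is illustrative only: `BacklogSample`, `should_alert`, the linear projection, and the thresholds are assumptions, not a description of the monitoring that will be implemented.

```python
from dataclasses import dataclass


@dataclass
class BacklogSample:
    minute: float   # time of the sample, in minutes since some reference point
    pending: int    # items waiting to be processed at that time


def backlog_growth_per_minute(samples: list[BacklogSample]) -> float:
    """Average change in pending items per minute between first and last sample."""
    first, last = samples[0], samples[-1]
    elapsed = last.minute - first.minute
    if elapsed <= 0:
        raise ValueError("samples must span a positive time interval")
    return (last.pending - first.pending) / elapsed


def should_alert(samples: list[BacklogSample], max_pending: int, horizon_minutes: float) -> bool:
    """Alert if a linear projection exceeds max_pending within the horizon."""
    rate = backlog_growth_per_minute(samples)
    projected = samples[-1].pending + rate * horizon_minutes
    return projected > max_pending


# Hypothetical readings: the backlog grows by 500 items in 10 minutes.
growing = [BacklogSample(0, 1000), BacklogSample(10, 1500)]
print(should_alert(growing, max_pending=4000, horizon_minutes=60))
```

A linear projection is the simplest possible choice; a production alert would likely need smoothing and per-tenant thresholds, which this sketch leaves out.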