Flexera One - IT Asset Management - NA - Reconciliation and Inventory Delay Issues

Incident Report for Flexera System Status Dashboard

Postmortem

Description: Flexera One - IT Asset Management - NA - Reconciliation and Inventory Delay Issues

Timeframe: March 24, 2025, 12:53 PM PDT to March 25, 2025, 11:37 AM PDT

Incident Summary

On March 24, 2025, at 12:53 PM PDT, we identified performance issues in the North America (NA) region of Flexera One IT Asset Management (ITAM). Affected customers may have experienced delayed reconciliation runs and inventory updates, as well as slow UI performance. Library imports were also impacted by a growing processing backlog.

Our teams determined the root cause to be an overloaded database server within the affected cluster, resulting in high strain on resources and delayed job execution. Initial remediation steps included restarting the scheduler and performing a failover, which helped stabilize processing.

As a remediation action, the team identified a configuration change that would allow the database to use more memory. The maximum memory allocation was increased, and this setting was standardized across all database servers to ensure consistency and help prevent similar issues in the future.

In parallel, a new server instance was provisioned in the NA region to accommodate the onboarding of new tenants. This measure is intended to reduce strain on the existing cluster by distributing future workload growth more effectively.

After verifying platform stability and confirming with impacted customers, the team declared the issue resolved on March 25, 2025, at 11:37 AM PDT.

Root Cause

Upon investigation, our teams determined that a database server in the affected cluster had become overloaded, exhausting its memory and other system resources. As a result, scheduled jobs such as reconciliations and library imports failed or were severely delayed. UI responsiveness was also degraded by the increased strain on backend processes.

Remediation Actions

Initial Investigation and Containment:

  • Identified a backlog of reconciliation runs and import jobs affecting performance.
  • Restarted the scheduler in an attempt to resume library import processing.

Failover and Recovery:

  • Performed a failover of the affected environment to shift workloads.
  • Observed that library imports began processing successfully and within expected timeframes after the failover.

System Configuration Changes:

  • Increased the maximum memory allocation for the database to handle higher processing loads.
  • Standardized this memory configuration across all database servers to ensure consistency.

Capacity Planning:

  • Set up a new server instance in the NA region.
  • Updated the provisioning strategy to onboard new tenants onto the new instance, preventing additional strain on the existing one.

Future Preventative Measures

Proactive Resource Monitoring:

  • Enhance monitoring to detect early signs of resource exhaustion on database servers.
  • Implement automated alerts for high memory or processing load thresholds.

Scalability Improvements:

  • Review cluster resource allocation strategies to better handle traffic growth and workload spikes.

Infrastructure Readiness:

  • Expand capacity planning to ensure redundancy and scalability in each region.
  • Regularly assess cluster health and performance baselines for early intervention.

Posted Apr 09, 2025 - 07:06 PDT

Resolved

We have completed our investigation and resolved the issues impacting IT Asset Management (ITAM) services for a subset of customers in the North America region. Reconciliation, inventory processing, and UI performance have returned to normal, and backlog processing is complete.

Any lingering issues with downstream services identified during this incident have been determined to have a separate cause and will be tracked and addressed independently. This incident is now considered resolved.
Posted Mar 25, 2025 - 11:58 PDT

Update

We are continuing to monitor system performance. While improvements have been observed, some backlog processing remains in progress.

In parallel, we are investigating delays related to downstream inventory data processing. Our teams are actively working on remediation and will share further updates as we move forward.
Posted Mar 25, 2025 - 09:30 PDT

Update

Inventory processing has returned to normal following the resolution of earlier issues. Backlog processing is progressing, but the backlog may take additional time to clear fully.

We are continuing to monitor system performance and are evaluating next steps to improve processing efficiency in related downstream services. Further updates will be shared as progress continues.
Posted Mar 24, 2025 - 21:26 PDT

Update

System performance has recovered, and services are currently operational. Processing of the remaining backlog is ongoing and may take additional time to complete.

As a long-term measure, we are discussing allocating additional resources to support improved processing capacity and reduce the risk of similar delays in the future.
Posted Mar 24, 2025 - 16:33 PDT

Update

Our teams continue to monitor processing performance and are working through backlog recovery. We are also assessing whether the ongoing service degradation is impacting additional customers.
Posted Mar 24, 2025 - 14:18 PDT

Identified

Incident Description: We are currently addressing disruptions in our IT Asset Management (ITAM) services affecting a subset of customers in the North America region. These issues are impacting reconciliation processes, inventory updates, and user interface performance. Customers may encounter delays in these areas, including in Application Recognition Library (ARL) imports.

Priority: P2

Restoration Activity: Immediate measures, including failovers and scheduler restarts, have been implemented to mitigate these disruptions, resulting in improved processing times. Our teams are actively working to ensure continued stability and smooth operation of the services. We will continue to provide updates as we make further progress.
Posted Mar 24, 2025 - 13:08 PDT
This incident affected: Flexera One - IT Asset Management - North America (IT Asset Management - US Inventory Upload, IT Asset Management - US Batch Processing System).