Flexera One - IT Asset Management - NA - Batch Job Processing Delays

Incident Report for Flexera System Status Dashboard

Postmortem

Description: Flexera One – IT Asset Management – NA – Batch Job Processing Delays

Timeframe: July 29, 2025, 9:00 PM PDT to July 30, 2025, 2:05 PM PDT

Incident Summary
On July 29, 2025, at 9:00 PM PDT, Flexera One IT Asset Management in the North America region experienced a disruption in batch job processing. While the platform remained accessible, certain scheduled tasks such as inventory ingestion and data exports were delayed.

The disruption followed a scheduled production release and was identified when job queues began to grow unexpectedly. Two batch processors had become unresponsive, so queued jobs were not being processed. During this time, precalculation tasks continued to be submitted, adding further load to the system.

To avoid further complications, tasks were not manually retriggered. Technical teams focused on restoring processing capacity and implemented a temporary configuration to block new precalculation task submissions. The unresponsive processors were replaced, and task distribution across active processors resumed.

Processing throughput steadily improved, and full job execution was confirmed by July 30, 2025, at 2:05 PM PDT. No data loss occurred, and validations confirmed that scheduled jobs resumed as expected.

Root Cause

Primary Root Cause

The disruption was caused by two batch processors becoming unresponsive following a scheduled release. This halted scheduled job execution and led to a backlog of unprocessed tasks.

Contributing Factors:

• Unresponsive processors prevented jobs from being executed as scheduled
• Task queues grew during the disruption, increasing overall processing delays
• Ongoing precalculation task submissions contributed additional load
• Recovery depended on replacing the unresponsive processors and mitigating queue growth
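
To illustrate how these factors compound, the following is a minimal sketch of a queue whose arrival rate stays constant while processing capacity drops. All names and numbers are hypothetical and chosen only for illustration; they are not measurements from this incident.

```python
def simulate_backlog(arrivals_per_min, per_processor_rate, healthy_processors,
                     minutes, start_backlog=0):
    """Return the queue depth after each minute.

    Assumes a fixed arrival rate and a fixed per-processor service rate
    (both hypothetical; real workloads vary over time).
    """
    backlog = start_backlog
    history = []
    capacity = per_processor_rate * healthy_processors
    for _ in range(minutes):
        # The backlog grows by whatever arrives beyond current capacity
        # and can never drop below zero.
        backlog = max(0, backlog + arrivals_per_min - capacity)
        history.append(backlog)
    return history


# With three healthy processors, capacity (12/min) exceeds arrivals (10/min).
print(simulate_backlog(10, 4, 3, minutes=3))
# With one healthy processor, capacity (4/min) is below arrivals (10/min).
print(simulate_backlog(10, 4, 1, minutes=3))
```

In this model, the backlog grows linearly for as long as submissions outpace the remaining capacity. That is why restoring processors and limiting new submissions both shorten recovery.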

Remediation Actions

  1. Processor Replacement: The unresponsive batch processors were terminated and replaced with healthy instances to restore execution capacity
  2. Scheduler Validation: The job scheduler was confirmed to be distributing tasks properly across all active processors
  3. Temporary Precalc Submission Block: A configuration change was applied to prevent new precalculation tasks from being submitted during the recovery process
  4. Precalc Cleanup Activity: The number of existing precalculation tasks was reduced to lower queue pressure and accelerate recovery
  5. Post-Recovery Validation: Teams monitored system behavior to confirm job processing returned to normal and that no data loss occurred
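
Steps 3 and 4 above can be sketched as a configuration-driven gate in front of the task queue. This is a hedged illustration only: `SubmissionGate`, `blocked_types`, and `prune` are hypothetical names and do not describe the actual Flexera One implementation.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SubmissionGate:
    """Hypothetical gate that rejects blocked task types before they are queued."""
    blocked_types: set = field(default_factory=set)
    queue: deque = field(default_factory=deque)

    def submit(self, task_type: str, payload: Any) -> bool:
        # Reject submissions for any task type currently blocked by configuration.
        if task_type in self.blocked_types:
            return False
        self.queue.append((task_type, payload))
        return True

    def prune(self, task_type: str, keep: int = 0) -> int:
        """Drop queued tasks of one type, keeping the oldest `keep`; return the count removed."""
        kept, removed = 0, 0
        remaining = deque()
        for item in self.queue:
            if item[0] == task_type and kept >= keep:
                removed += 1
                continue
            if item[0] == task_type:
                kept += 1
            remaining.append(item)
        self.queue = remaining
        return removed


gate = SubmissionGate()
gate.submit("precalc", {"id": 1})
gate.submit("export", {"id": 2})
gate.submit("precalc", {"id": 3})

gate.blocked_types.add("precalc")            # temporary block during recovery
accepted = gate.submit("precalc", {"id": 4})  # rejected while the block is active
removed = gate.prune("precalc", keep=1)       # reduce existing precalc tasks
```

Keeping the block in configuration rather than in code means it can be lifted once the backlog clears, without a further release.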

Future Preventative Measures

  1. Automated Processor Health Detection: Enhancements are planned to improve detection of unresponsive processors and support faster recovery during disruptions
  2. Queue Behavior Monitoring: Additional observability will be introduced to monitor queue growth trends in real time and provide early warning of processing delays
  3. Post-Release Job Validation Enhancements: Improvements to post-deployment monitoring will help identify issues with job execution earlier and reduce time to mitigation
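
As an illustration of measures 1 and 2, the following is a minimal sketch of heartbeat-based processor health detection and a queue growth check. It assumes each processor reports a heartbeat timestamp and that queue depth is sampled periodically. The function names, timeout, and threshold are hypothetical and are not taken from the planned enhancements.

```python
def find_unresponsive(last_heartbeat, now, timeout_s):
    """Return IDs of processors whose last heartbeat is older than timeout_s seconds."""
    return sorted(pid for pid, ts in last_heartbeat.items() if now - ts > timeout_s)


def queue_growth_rate(samples):
    """Least-squares slope of queue depth over time, in tasks per second.

    `samples` is a list of (timestamp_seconds, queue_depth) pairs.
    Returns 0.0 when there is too little data to estimate a trend.
    """
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_d = sum(d for _, d in samples) / n
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    if denominator == 0:
        return 0.0
    numerator = sum((t - mean_t) * (d - mean_d) for t, d in samples)
    return numerator / denominator


def should_alert(samples, max_growth_per_s):
    """Raise an early warning when the queue grows faster than the allowed rate."""
    return queue_growth_rate(samples) > max_growth_per_s


# Hypothetical data: processor "b" last reported 70 seconds ago.
heartbeats = {"a": 100.0, "b": 40.0}
stale = find_unresponsive(heartbeats, now=110.0, timeout_s=30.0)

# Hypothetical samples: the queue gains 60 tasks every 60 seconds.
depth_samples = [(0, 10), (60, 70), (120, 130)]
rate = queue_growth_rate(depth_samples)
alert = should_alert(depth_samples, max_growth_per_s=0.5)
```

Alerting on the growth trend rather than on absolute queue depth can surface a stalled processor shortly after a release, before the backlog becomes large.
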
Posted Aug 12, 2025 - 19:28 PDT

Resolved

Job processing has stabilized and the backlog has cleared. All tasks are completing as expected, and services have returned to normal.

This incident has been resolved.
Posted Jul 30, 2025 - 14:33 PDT

Update

Processing throughput continues to improve, and backlog levels are steadily decreasing. Batch jobs are catching up, and we will continue to monitor closely until job processing returns to normal.
Posted Jul 30, 2025 - 12:46 PDT

Monitoring

Incident Description: We are currently addressing an issue impacting batch job processing for IT Asset Management (ITAM) services in the North America region. While the system remains accessible, some customer tasks, including data exports and inventory processing, may experience delays in completion.

Priority: P2

Restoration Activity: Our technical team has identified the issue and applied mitigations to improve processing throughput. Backlog levels are steadily decreasing, and job execution is progressing. We estimate that processing will return to normal within the next few hours. We will continue to monitor the situation closely and provide further updates until the backlog has cleared and services are fully restored.
Posted Jul 30, 2025 - 11:25 PDT
This incident affected: Flexera One - IT Asset Management - North America (IT Asset Management - US Batch Processing System).