Description: Flexera One - ITAM - EU - WebUI Access/Login Issues
Timeframe: December 9, 2025, 12:57 AM PST to December 9, 2025, 1:08 AM
Incident Summary
On December 9, 2025, at approximately 12:57 AM, monitoring systems detected a disruption impacting access to the Web UI for IT Asset Management in the EU region. During the incident window, affected users encountered login error messages when attempting to access the service.
Initial triage confirmed that the issue was isolated to the login layer, while all downstream and backend services remained fully operational. Further investigation identified repeated failures of liveness and readiness probes associated with the authentication service. These probe failures caused the relevant service components to restart continuously, preventing successful login requests.
The disruption lasted approximately 11 minutes and auto-resolved once the restarts stabilized and the services returned to a healthy state. Although the incident self-recovered, our teams continued root cause analysis to understand the underlying load conditions that triggered the probe failures and to prevent recurrence.
Root Cause
Initial diagnostics indicated issues related Service components supporting authentication repeatedly failed liveness and readiness health checks due to sustained high CPU and load conditions. This resulted in continuous pod restarts, preventing successful login processing during the incident window.
Contributing Factors:
High Load Bursts
Insufficient Resource Buffer
Remediation Actions
· Automatic Recovery - The platform’s self-healing mechanisms resolved the issue without the need for manual intervention.
· Service Stabilization - Affected service components stabilized once system load subsided, and restart activity ceased.
· Authentication Health Restored - Authentication services returned to a healthy and stable operating state.
· Full Service Availability Confirmed - Login functionality was fully restored and verified by 1:08 AM.
Future Preventative Measures
· Proactive Load & Failure Analysis - Conduct detailed analysis of service logs and metrics to identify recurring load patterns and conditions that lead to health-check probe failures, enabling earlier detection of risk scenarios.
· Health Check Optimization - Review and tune liveness and readiness probe configurations, including timeout and threshold values, to ensure services can tolerate short-lived load spikes without unnecessary restarts.
· Capacity & Resource Buffer Enhancement - Increase CPU and memory allocations for authentication-related services to provide sufficient headroom during sudden traffic surges and prevent resource exhaustion.