Software Vulnerability Research - All regions - Intermittent login issues

Incident Report for Flexera System Status Dashboard

Postmortem

Description: Software Vulnerability Research - All regions - Intermittent login issues

Timeframe: June 7, 2025, 07:57 PM PST to June 8, 2025, 12:50 AM PST

Incident Summary 

On Saturday, June 7, 2025, at 07:57 PM PST, our teams detected an issue impacting the Software Vulnerability Research (SVR) application in all regions. Although the application remained operational, some users intermittently experienced issues while attempting to log in. Initial diagnostics indicated that endpoint health checks were passing and application pages were auto-resolving as expected. However, the application itself was unresponsive in several instances.

Our technical teams initiated an investigation and identified that the unresponsiveness was likely caused by memory utilization on the underlying messaging service reaching critical levels, which led to system instability.

In collaboration with our cloud service provider, we determined that the messaging service was overloaded due to a buildup of excessive and stale active connections. This overload led to communication delays within the system, which impacted the application's responsiveness and user experience.

To mitigate the issue, our technical teams created and executed a shutdown script to stop active connections and clear stale sessions from the messaging service. After monitoring the application and confirming that stability had been restored, the incident was declared resolved on June 8, 2025, at 12:50 AM PST.

Root Cause 

The root cause of the incident was an overloaded messaging service, which had accumulated a large number of stale and active connections. This overload hindered internal message flow and degraded application performance, despite endpoint health checks returning successful results.
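The gap described above, where endpoint health checks pass while the application is unresponsive, can be illustrated with a health check that also exercises the messaging dependency. The following is a minimal sketch, not the checks actually used for SVR: `deep_health_check`, `endpoint_ok`, `broker_round_trip`, and the 2-second latency limit are all hypothetical names and values chosen for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class HealthResult:
    healthy: bool
    detail: str


def deep_health_check(
    endpoint_ok: Callable[[], bool],
    broker_round_trip: Callable[[], None],
    max_latency_s: float = 2.0,  # assumed threshold, tune per service
) -> HealthResult:
    """Report healthy only if the endpoint responds AND a probe message
    completes a round trip through the messaging service in time."""
    if not endpoint_ok():
        return HealthResult(False, "endpoint check failed")

    start = time.monotonic()
    try:
        # Hypothetical callable: publish a probe message and block until
        # it is received back; it should raise on error or timeout.
        broker_round_trip()
    except Exception as exc:
        return HealthResult(False, f"messaging probe failed: {exc}")

    elapsed = time.monotonic() - start
    if elapsed > max_latency_s:
        return HealthResult(False, f"messaging probe slow: {elapsed:.2f}s")
    return HealthResult(True, "ok")
```

A check of this shape would report the service as unhealthy when the messaging layer is slow or failing, even if the endpoint itself still answers.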

Remediation Actions 

·        Our technical teams developed and executed a shutdown script to:

  1. Stop excessive active connections.
  2. Clear stale sessions in the messaging service.

·        Monitored the application post-remediation to ensure stable functionality.

·        Declared the incident resolved after sustained stability and no recurrence.
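The two cleanup steps above can be sketched as a selection routine that decides which connections to close. This is an illustrative sketch only and does not reproduce the script our teams ran: the `Connection` record, `plan_cleanup`, the idle threshold, and the connection cap are assumptions, and the actual call that closes a connection depends on the messaging service in use.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class Connection:
    conn_id: str
    last_activity: datetime


def plan_cleanup(
    connections: Iterable[Connection],
    now: datetime,
    max_idle: timedelta,    # assumed: idle time after which a session is stale
    max_connections: int,   # assumed: cap on connections left open
) -> List[str]:
    """Return IDs of connections to close: all stale ones first, then the
    least recently active ones until the cap is respected."""
    conns = list(connections)

    # Step 1: anything idle longer than max_idle is treated as stale.
    stale = [c for c in conns if now - c.last_activity > max_idle]
    stale_ids = {c.conn_id for c in stale}

    # Step 2: if the remaining connections still exceed the cap,
    # drop the least recently active ones.
    remaining = sorted(
        (c for c in conns if c.conn_id not in stale_ids),
        key=lambda c: c.last_activity,
    )
    excess = remaining[: max(0, len(remaining) - max_connections)]

    return [c.conn_id for c in stale + excess]


# Usage example with made-up data.
now = datetime(2025, 6, 8, tzinfo=timezone.utc)
sample = [
    Connection("a", now - timedelta(hours=2)),
    Connection("b", now - timedelta(minutes=10)),
    Connection("c", now - timedelta(minutes=1)),
    Connection("d", now - timedelta(minutes=5)),
]
to_close = plan_cleanup(sample, now, timedelta(hours=1), max_connections=2)
```

Separating the selection from the close operation makes it possible to review the planned list before any connection is terminated.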

Future Preventative Measures

 

·        New Messaging Service Instance - Provisioned a new messaging service instance with optimized configuration settings.

·        Traffic Redirection - Redirected all application traffic to the new instance to ensure consistent performance.

·        Old Instance Isolation - Retained the previous messaging service instance in isolation for further investigation.

·        Provider Postmortem Engagement - Engaged with the cloud provider for a detailed analysis of the root cause and long-term prevention strategies.

Posted Jun 13, 2025 - 05:13 PDT

Resolved

Stopping and clearing stale and excessive connections on the messaging server has resolved the issue. Our team continues to monitor the service to ensure stability.
Posted Jun 08, 2025 - 00:55 PDT

Identified

Our technical teams have identified an overloaded messaging service as the likely cause of the issue. They are actively working to stop and clear stale and excessive connections that are contributing to the application disruption.
Posted Jun 07, 2025 - 23:24 PDT

Update

Our teams suspect that the issue may be related to a messaging server and are actively coordinating with our cloud service provider to investigate further. We are working closely with them to identify and resolve the underlying cause.
Posted Jun 07, 2025 - 21:20 PDT

Investigating

Incident Description:
We are currently investigating an issue affecting the Software Vulnerability Research application. While the application is operational, some users may experience intermittent issues when attempting to log in.

Priority: P2

Restoration Activity: Our technical team is actively investigating the issue and working to fully restore functionality. They have observed unusual load on our database and are currently analyzing the root cause.
Posted Jun 07, 2025 - 19:36 PDT
This incident affected: Software Vulnerability Research.