Flexera One - UI - NA - Service Disruption and Slow Load Times
Incident Report for Flexera System Status Dashboard
Postmortem

Description: Flexera One UI - NA - Slow Load Times and Service Disruption

Timeframe: September 4th, 2024, 1:38 PM to September 4th, 2024, 1:45 PM PDT

Incident Summary

On Wednesday, September 4th, 2024, at 1:38 PM PDT, we experienced a brief service disruption affecting the Flexera One platform in the NA region. During this time, customers may have encountered intermittent errors and delays while accessing the platform.

The disruption was triggered by a security update made in response to an internal issue with a sensitive security component. The update was applied after a comprehensive review and was not expected to cause downtime. However, unexpected system behavior prevented the new security configuration from being applied correctly, which made the platform temporarily unavailable. Our technical team acted swiftly to revert the update, and normal operations were fully restored by 1:45 PM PDT.

Further investigation confirmed that the sensitive components involved in the update were handled within a controlled environment. Our security team conducted a thorough review and verified that there was no external exposure and no risk to sensitive data. Because the issue was contained within internal systems, the potential for a broader impact was reduced.

To prevent future disruptions, the update will undergo extensive testing in a staging environment to ensure smooth integration. Once validated, the update will be re-applied during a scheduled maintenance window to avoid further service interruptions.

Following the resolution, extended monitoring and internal health checks confirmed that the platform was functioning normally, after which the incident was officially declared resolved.

Root Cause

The disruption was caused by an issue during a security update intended to replace a sensitive internal component. Specifically, the update was meant to rotate a security token that had been mistakenly pushed to an internal repository.

During the update process, the system encountered a synchronization error, which prevented the new security configuration from applying correctly. This led to temporary unavailability of the platform.

No external exposure occurred, and the problem was isolated within internal systems.

Remediation Actions

  1. Update Reversion: Upon identifying the issue during the security update, the technical team quickly reverted the update to restore normal platform functionality. This immediate action resolved the platform unavailability and allowed customers to regain access.
  2. Security Scope Validation: The sensitive security component involved in the update was confirmed to have been handled entirely within a controlled internal environment, ensuring that no external exposure or data breach occurred.
  3. Extended Monitoring: Following the reversion of the update, extended monitoring was initiated to closely observe the platform’s performance and ensure there were no further disruptions or lingering effects of the issue.

Future Preventative Measures

  1. Comprehensive Staging Environment Testing: To prevent a recurrence, we have performed thorough testing in a staging environment that mirrors the live environment. We are now preparing a production deployment for the near future, with careful coordination to avoid any further disruptions.
  2. Transition to Problem Management and Ongoing Collaboration: The incident has been transitioned to problem management for investigation. We are actively collaborating with relevant teams to gather insights and refine our processes. Lessons learned from this incident will guide improvements in both security update and change management processes.
  3. Enhance Security Checks: We will implement stronger security checks to ensure sensitive information is properly managed during updates.
  4. Automate Token Rotation: The token rotation process will be fully automated to reduce the risk of service disruptions in future updates.
  5. Standardize Deployment Process: A standardized deployment checklist has been implemented to ensure consistent and thorough validation across all platforms.
  6. Improve Token Management: We will implement a new, more secure method of managing access tokens to ensure better security and reduce manual handling.
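
As an illustration of the automated token rotation measure above, the following is a minimal sketch and not a description of Flexera's actual tooling. It assumes a hypothetical in-memory `TokenStore` standing in for a real secret store, and a caller-supplied `health_check` function. The idea shown is to apply the new token, verify it, and automatically revert to the previous token if verification fails.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TokenStore:
    """Hypothetical in-memory stand-in for a real secret store."""
    active: str


def rotate_token(
    store: TokenStore,
    new_token: str,
    health_check: Callable[[str], bool],
) -> bool:
    """Apply new_token, verify it, and roll back if verification fails.

    Returns True if the new token is active and healthy, False if the
    rotation was reverted to the previous token.
    """
    previous = store.active
    store.active = new_token
    try:
        healthy = health_check(new_token)
    except Exception:
        # Treat any error during verification as a failed rotation.
        healthy = False
    if not healthy:
        # Revert so that services keep working with the old token.
        store.active = previous
    return healthy


# Example: a failing health check leaves the old token in place.
store = TokenStore(active="old-token")
rotated = rotate_token(store, "new-token", lambda token: False)
```

In this example `rotated` is `False` and `store.active` is still `"old-token"`. A real rotation would also need to revoke the old token once the new one is confirmed healthy, which this sketch leaves out.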
Posted Oct 10, 2024 - 09:47 PDT

Resolved
Incident Description: We experienced a brief period of service disruption affecting the Flexera One UI in the NA region. During this time, customers may have encountered slow loading times when accessing the Flexera One UI or difficulties accessing certain features within the UI.

Priority: P1

Impact Start Time: September 4, 2024, 1:38 PM PDT
Impact End Time: September 4, 2024, 1:45 PM PDT

Restoration Activity: Our technical team has identified that the disruption was caused by an urgent internal update necessary to enhance system reliability. This update was not anticipated to result in any downtime, which is why an outage notice was not issued. Unfortunately, it inadvertently affected system performance, leading to temporary access issues.

We apologize for any inconvenience this may have caused. The issue has been resolved, and health checks confirm that all services are stable and the Flexera One UI is fully accessible. We are reviewing our processes to prevent similar incidents in the future.
Posted Sep 04, 2024 - 14:25 PDT
This incident affected: Flexera One - IT Asset Management - North America (IT Asset Management - US Beacon Communication, IT Asset Management - US Inventory Upload, IT Asset Management - US Login Page, IT Asset Management - US Batch Processing System, IT Asset Management - US Business Reporting, IT Asset Management - US SaaS Manager, IT Asset Management - US Restful APIs), Flexera One - IT Visibility - North America (IT Visibility US), Flexera One - APIs - North America (api.flexera.com), and Flexera One - Cloud Management - North America (Cloud Cost Optimization - US, Cloudscape).