SVM2019 Intermittency
Incident Report for Flexera System Status Dashboard
Postmortem

Description:

Software Vulnerability Manager 2019 (SVM 2019) customers may have experienced intermittent degraded performance when accessing SVM 2019 after the migration to Amazon Web Services (AWS).

Timeframe: August 18th 20:08 to August 18th 23:52 PDT

Root Cause

Investigations by Flexera engineers found that the load testing conducted prior to the migration to AWS did not reveal several code-level scaling issues in SVM 2019.

Incident Summary

On Friday, August 16th at 7pm PDT, Flexera migrated SVM2019 from our datacenter in Copenhagen to multiple availability zones in the Ireland region of the AWS cloud.

This migration introduced a new scalable and immutable infrastructure with far greater capacity than the previous infrastructure. However, load testing conducted prior to the migration failed to reveal several code-level scaling issues, which surfaced on Sunday, August 18th at 20:08 PDT once SVM2019 was under regular customer load.

To protect customer data and minimize the impact of the scaling issues, the SRE team intentionally reduced the amount of customer traffic able to reach the SVM2019 application. This resulted in periods of intermittency while fixes were tested and then applied to the application in production.

On Sunday, August 18th at 23:52 PDT, application service was confirmed restored.

Corrective Actions

· The SVM 2019 database was further optimized for the AWS environment

· Web servers were scaled up to utilize a smaller number of more powerful AWS servers

Posted Aug 26, 2019 - 14:54 PDT

Resolved
Support teams have optimised the SVM database and application to increase performance. Health checks have been completed successfully and performance has been stable for the past 30 minutes.
Posted Aug 18, 2019 - 23:52 PDT
Identified
Support are currently investigating a possible database issue that appears to be impacting performance.
Posted Aug 18, 2019 - 21:25 PDT
Monitoring
Support have identified a configuration issue and have remediated it. Health checks and post-restoration monitoring are being performed to ensure services are stable.
Posted Aug 18, 2019 - 20:54 PDT
Update
We are continuing to investigate this issue.
Posted Aug 18, 2019 - 20:37 PDT
Investigating
Software Vulnerability Manager 2019 customers are currently experiencing intermittent degraded performance. Support have been engaged and are currently investigating.
Posted Aug 18, 2019 - 20:36 PDT
This incident affected: Software Vulnerability Manager (Software Vulnerability Manager Web Portal, Software Vulnerability Manager Agent Interface).