Description:
Following the FNMS EU Prod upgrade to R1.4, customers experienced severe performance issues in the FNMS User Interface with pages loading slowly or timing out.
Timeframe: July 18th 8:55 AM to July 19th 6:10 PM PDT
Summary:
On Thursday 07-18-2019, customers reported that severely degraded performance was rendering FNMS EU unusable following the release of R1.4 to the EU Production environment. SRE investigated and found that the Web UI servers in the EU Production environment were exhibiting extremely high CPU and memory utilisation.
At around 14:00 PDT on 07-18, SRE scaled up the memory configuration of the two Web servers from 4 GB to 16 GB. This appeared to improve performance, but CPU and memory utilisation continued to exceed expected levels.
On Friday 07-19 at around 03:30 PDT, a code issue was identified and development of a hotfix began. The hotfix was successfully deployed to EU Production at 06:00 PDT, which returned Web server CPU utilisation, memory utilisation and performance to expected levels.
Root Cause:
This incident was caused by enhancements in Release 1.4 that made the Web UI rendering component use excessive memory. Server resources were quickly exhausted, which degraded overall performance.
Corrective Actions:
The testing infrastructure is to be enhanced with extremely large data sets so that potential performance issues are identified during testing.