Description: FNP Activation request causing performance issue
Reported On: August 24, 2019
Tracking #: SSRE-328
Between August 24th and 31st, the system experienced periods of degraded or unresponsive performance. This was the result of unexpected client behavior during license activation. The activity caused excessively high CPU load on a primary database, which slowed processing on all services relying on it.
System alerts began triggering around 2 AM Pacific on the 24th, indicating a database problem. Restarts were performed in an effort to clear and reset connections to the system, and traffic was moved to a secondary (replica) server. Neither action helped: the problem persisted after the restarts and followed the traffic to the secondary server. Some high-volume traffic was then rerouted to the disaster recovery site. This proved effective for that traffic but did not resolve the high CPU condition in the primary data center.
Further analysis indicated that a single SQL query was consuming the majority of the CPU. A review of the code showed that the query in question is dynamically generated based on the activity of the client attempting to license itself using FNP Activation. In this case, the client activity resulted in the creation and execution of an unreasonable and non-performant SQL statement.
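The report does not include the actual query or the code that builds it, so the following is only an illustrative sketch of the general risk: a SQL statement whose size depends on client-driven data can grow without limit unless the builder enforces a bound. The table name `activation_record`, the column names, the function `build_lookup_query`, and the cap `MAX_TERMS` are all hypothetical and are not taken from the FNO code base.

```python
# Hypothetical sketch: bound the size of a dynamically generated SQL statement.
# None of these names come from the actual application.

MAX_TERMS = 100  # assumed cap on how many terms a single statement may carry


def build_lookup_query(record_ids, max_terms=MAX_TERMS):
    """Return (sql, params) for a parameterized IN query.

    Duplicate ids are dropped, and the builder refuses to emit a statement
    with more than max_terms placeholders rather than sending an oversized
    query to the database.
    """
    ids = list(dict.fromkeys(record_ids))  # de-duplicate, preserving order
    if not ids:
        raise ValueError("no record ids supplied")
    if len(ids) > max_terms:
        raise ValueError(
            f"refusing to build a query with {len(ids)} terms (cap is {max_terms})"
        )
    placeholders = ", ".join(["?"] * len(ids))
    sql = f"SELECT id, state FROM activation_record WHERE id IN ({placeholders})"
    return sql, ids
```

With a guard of this kind, an unexpected client pattern would surface as a rejected request that can be logged and investigated, instead of as a statement that monopolizes database CPU.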
Investigation of the FNP client activity revealed anomalous behavior. The client is expected to increment a sequence number with each request. In this case, that was not occurring. The FNO server application code did not adequately account for that scenario when generating the SQL statement. The intermittent nature of the service disruption can be tied to the activity of this client.
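As a minimal sketch of the expectation described above, a server-side check could flag any request whose sequence number fails to advance past the last one seen for that client. The class `SequenceTracker`, its in-memory storage, and the string client identifiers are assumptions made for illustration; the report does not describe how FNO stores or validates sequence numbers.

```python
# Hypothetical sketch: detect a client whose sequence number does not advance.
# The in-memory dict stands in for whatever storage a real server would use.

class SequenceTracker:
    def __init__(self):
        self._last_seen = {}  # client id -> highest sequence number accepted

    def check(self, client_id, sequence_number):
        """Return True if the sequence number advanced, False otherwise.

        A repeated or lower number is treated as anomalous and does not
        update the stored value.
        """
        last = self._last_seen.get(client_id)
        if last is not None and sequence_number <= last:
            return False
        self._last_seen[client_id] = sequence_number
        return True


tracker = SequenceTracker()
for seq in (1, 2, 2, 2):
    if not tracker.check("client-a", seq):
        print(f"anomalous request: sequence {seq} did not advance")
```

A check like this would not fix the query generation itself, but it would allow the anomalous client to be identified early and handled before its requests reach the database.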
Anomalous client behavior in the FNP Trusted Storage Activation exposed an application design flaw where the application dynamically created a SQL statement that consumed all the database CPU when executed.
The immediate resolution was to alter the data related to the misbehaving client. This prevented the generation and execution of the problematic SQL statement.
An updated version of the application was also deployed on September 12th, with new SQL that accounts for similar scenarios.