Customers using Shard 3 were experiencing lengthy delays during Launches via Self Service.
January 24th 03:30 AM to January 24th 04:15 PM PST
On Friday 01-24-2020 customers on Shard 3 reported Launches from Self Service were taking substantially longer to complete than normal.
Investigations found that the service manager scheduling adapter container, although deployed, had failed on a critical syslog error and therefore was not accepting any more requests. To restore functionality quickly, the containers were re-deployed.
Monitoring the redeployed service confirmed that launches from Self Service were now proceeding within normal timeframes. Services were confirmed fully restored at 4:15pm am on the 25th of January PST.
The cause of this incident was found to be a Scheduling Adapter had stopped leasing tasks from scheduling service resulting in a high error rate and poor performance.
Alerting will be added to the Scheduling Adapter to ensure future issues of this nature are detected before they impact customers.