Description: Snow Atlas - West Europe & East US 2 - SAM Core Permission Errors
Timeframe: November 11, 2022, from 6:00 AM PDT to 7:00 AM PDT
Incident Summary
On Monday, November 11, 2022, starting at 6:00 AM PDT, multiple tenants in the West Europe and East US 2 regions of the Snow Atlas platform encountered permission errors that restricted access to SAM Core functionality. Other regions were unaffected, and no customer-reported incidents were received during this time. The issue was traced to a scheduled maintenance event by our cloud service provider at around 6:00 AM PDT. By 6:15 AM PDT, all affected services had restarted and were fully operational. Full platform stabilization was delayed until 7:00 AM PDT due to the high volume of services attempting to reconnect simultaneously.
Root Cause
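The issue stemmed from scheduled maintenance by our cloud service provider at approximately 6:00 AM PDT. During this event, the affected services lost their connections and restarted, and the high volume of services attempting to reconnect simultaneously delayed full stabilization.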
Remediation Actions
· Service Restarts: All affected services were automatically restarted, restoring their operational state by 6:15 AM PDT.
· Traffic Monitoring: Continuous monitoring was performed to ensure no further connection losses occurred during the recovery process.
· Full Stabilization: By 7:00 AM PDT, the platform was stabilized, and normal operations resumed.
Future Preventative Measures
· Our teams are investigating the impact of concurrent reconnections during recovery scenarios to reduce delays and improve future stabilization times.
· Mitigation strategies will be explored to prevent similar disruptions during scheduled maintenance events.
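For illustration only, one widely used way to reduce the load from many services reconnecting at once is exponential backoff with random jitter. The sketch below is not the Snow Atlas implementation, and no specific mitigation has been chosen; the names backoff_delays and reconnect, and all parameter values, are hypothetical.

```python
import random
import time


def backoff_delays(max_attempts, base=1.0, cap=60.0, rng=None):
    """Return one randomized wait (in seconds) per reconnection attempt.

    Uses "full jitter": each wait is drawn uniformly from
    [0, min(cap, base * 2**attempt)], so services that lose their
    connection at the same moment do not all retry at the same moment.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0.0, ceiling))
    return delays


def reconnect(connect, max_attempts=6, base=1.0, cap=60.0,
              sleep=time.sleep, rng=None):
    """Call connect() until it succeeds or max_attempts is reached.

    connect is a hypothetical callable that raises ConnectionError on
    failure. A jittered delay is applied between failed attempts.
    """
    last_error = None
    delays = backoff_delays(max_attempts, base, cap, rng)
    for attempt, delay in enumerate(delays, start=1):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            if attempt < max_attempts:
                sleep(delay)  # wait before the next attempt
    raise ConnectionError(
        f"gave up after {max_attempts} attempts"
    ) from last_error
```

Because each service draws its own random wait, retries are spread across the backoff window instead of arriving together, which is the effect the first preventative measure is aiming for. The cap keeps the longest wait bounded.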