Description: Snow Atlas - All Regions - Service Unavailability
Timeframe: October 29, 2025, 8:45 AM PST, to October 29, 2025, 1:25 PM PST
Incident Summary
On Wednesday, October 29, 2025, at 8:45 AM PST, Snow Atlas experienced a service disruption affecting access to the Snow Atlas portal (https://www.snowsoftware.io/) and related services. During this period, customers were unable to log in or use associated functions.
Initial investigation indicated that the issue originated from an outage within our cloud service provider’s global delivery network, which affected multiple customers across different hosted environments. The disruption was traced to a fault within the cloud service provider’s traffic management and content delivery component, which Snow Atlas uses to route and distribute portal traffic globally.
The cloud service provider later confirmed that an inadvertent configuration change within this component caused widespread routing failures, leading to loss of availability across several hosted portals. As part of their mitigation, the provider blocked all further configuration changes and initiated a rollback to the last known good configuration state.
Once the service provider completed redeployment of the stable configuration and began restoring healthy routing paths, our teams confirmed that all environments were fully accessible and functioning as expected. Normal login and service operations were verified across all four production regions (America, Australia, Europe, and UK South) by 1:25 PM PST.
Root Cause
The incident was caused by an inadvertent configuration change within the cloud service provider’s traffic management and content delivery system. This change disrupted global routing tables, resulting in connectivity failures for portals utilizing the affected delivery nodes.
Since Snow front-end services rely on this external traffic management component for request routing and global distribution, the configuration fault led to loss of access across all customer-facing portals.
Remediation Actions
· The cloud service provider identified the erroneous configuration change within the affected delivery network component.
· The service provider temporarily blocked all new configuration changes to prevent further propagation of the fault.
· A rollback to the last known good configuration was initiated, and redeployment of the stable configuration was completed by 10:35 AM PST.
· Healthy nodes and routing paths were then gradually restored, and network traffic was rerouted over stable paths.
· Our teams continuously monitored service health and validated progress until portal accessibility was fully restored.
· Our teams verified successful logins and stable operations across all production regions. After extended monitoring confirmed stability, the incident was declared resolved at 1:25 PM PST.
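The per-region verification described above can be sketched as a simple availability probe. This is a minimal illustration only, not the actual tooling used during the incident: the endpoint URLs are hypothetical placeholders, and real checks would also exercise the login flow rather than a bare health endpoint.

```python
# Hedged sketch of a per-region availability check (hypothetical endpoints).
from urllib.request import urlopen
from urllib.error import URLError

# Placeholder health-check URLs -- the real regional endpoints are internal.
REGION_ENDPOINTS = {
    "America": "https://us.example-atlas.io/health",
    "Australia": "https://au.example-atlas.io/health",
    "Europe": "https://eu.example-atlas.io/health",
    "UK South": "https://uks.example-atlas.io/health",
}

def probe(url, timeout=5):
    """Return the HTTP status code, or None if the endpoint is unreachable."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except URLError:
        return None

def classify(results):
    """Given {region: status_code or None}, return the unhealthy regions."""
    return sorted(r for r, code in results.items() if code != 200)

def check_regions(endpoints=REGION_ENDPOINTS):
    """Probe every region and list those that are not serving traffic."""
    return classify({region: probe(url) for region, url in endpoints.items()})
```

An empty result from `classify` corresponds to the "all four production regions verified" state reported above; any listed region would warrant continued monitoring before declaring resolution.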
Future Preventative Measures
· Provider Coordination: Maintain close collaboration with the cloud service provider to ensure prompt updates and detailed post-incident reports for external outages.
· Post-Incident Follow-up: Review the provider’s post-incident summary to understand the corrective measures implemented to prevent similar configuration issues in the future.