Description: Snow Atlas – APAC & East US 2 – Daily Update Job Failures
Timeframe:
Incident Summary
On July 29, 2025, at 4:00 AM PDT, Snow Atlas experienced a disruption in its scheduled daily update jobs (DUJs) in the Asia‑Pacific (APAC) region. While customers were able to log in and access the platform, updated data was not available during this period.
Later the same day, similar failures occurred in the East US 2 (EUS2) region, where scheduled updates also did not complete. Some customers in North America missed two consecutive updates on July 29 and July 30.
During the disruption, technical teams confirmed that the failures were affecting a large number of tenants. To avoid further complications, the update jobs were not manually retriggered, and the focus remained on identifying and implementing a permanent fix.
The failures were traced to a mismatch between two dependent service components introduced during a recent deployment. In some cases, prior configuration overrides in the EUS2 region prevented tenants from immediately receiving the corrected version, contributing to a second missed update.
The issue was resolved on July 30, 2025, at 12:45 AM PDT, once the dependent services were realigned and the configuration overrides were cleared. Daily update jobs then resumed successfully: the APAC region was confirmed to be running normally by 4:00 AM PDT the same day, and the EUS2 region by 10:00 PM PDT. From that point forward, daily updates continued without interruption. No customer data was lost, and validations confirmed that updates were fully restored.
Root Cause
Primary Root Cause
The disruption was caused by dependent service updates that were not fully synchronized during a recent deployment. This misalignment prevented scheduled daily update jobs (DUJs) from completing successfully. In the EUS2 region, prior configuration overrides delayed the application of the corrected version, contributing to a second missed update for some North American customers.
Contributing Factors
Remediation Actions
Post-Recovery Validations: Comprehensive checks were performed to confirm that daily update jobs resumed successfully and that customer data remained intact.
Future Preventative Measures