Atlassian | 2022-04-05T00:00:00Z
Atlassian 2022 Outage: Site Deletion and Long-Tail Recovery
Atlassian's 2022 outage happened when a maintenance script deleted customer cloud sites, forcing a careful restoration process that lasted far longer than the original action.
Incident answer
Impact: A subset of Atlassian Cloud customers lost access to Jira, Confluence, and related products, in some cases for up to 14 days.
Root cause: A maintenance script intended for a small app cleanup was run against site IDs rather than app IDs, so it deleted entire customer sites.
Lesson: Destructive automation needs explicit blast-radius limits, dry runs, approvals, and restore procedures tested before production use.
Quick Summary
In April 2022, a maintenance script deleted the cloud sites of a subset of Atlassian customers, causing a major outage. Atlassian's incident update and FAQ describe the deletion mistake and the long restoration process.
The incident is memorable because the trigger was brief, but recovery took days. Destructive changes can be fast to execute and slow to undo.
Why It Mattered
Atlassian products often hold critical engineering and business workflows: tickets, runbooks, project plans, documentation, and incident coordination. Losing access to those tools during an outage can block teams from doing their own work.
The case is also a strong reminder that customer trust depends on recovery maturity, not just prevention.
Root Cause Pattern
The pattern was destructive automation without enough blast-radius control. A script intended for targeted cleanup affected whole customer sites.
Warning signs in this class:
- The operation can delete or disable customer-owned resources.
- The script runs across many tenants.
- The operator cannot easily preview exact affected objects.
- Restore procedures exist but are slow, manual, or untested at scale.
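
One way to address the preview problem in the list above is to resolve every input ID to a concrete resource before anything destructive runs, and to reject the batch if any ID resolves to the wrong kind of object. The sketch below is illustrative only: `Kind`, `Resource`, `preview_targets`, and the `INVENTORY` lookup are hypothetical names, not part of any Atlassian tooling.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    APP = "app"
    SITE = "site"


@dataclass(frozen=True)
class Resource:
    resource_id: str
    kind: Kind
    tenant: str


class TargetMismatch(Exception):
    """Raised when an ID is unknown or resolves to an unexpected kind."""


def preview_targets(ids, expected_kind, inventory):
    """Resolve each ID and return the exact resources that would be affected.

    The whole batch is refused if any ID is unknown or has the wrong kind,
    so a cleanup aimed at apps cannot silently operate on sites.
    """
    resolved = []
    for rid in ids:
        resource = inventory.get(rid)
        if resource is None:
            raise TargetMismatch(f"unknown id: {rid}")
        if resource.kind is not expected_kind:
            raise TargetMismatch(
                f"{rid} is a {resource.kind.value}, expected {expected_kind.value}"
            )
        resolved.append(resource)
    return resolved


# Hypothetical inventory: one app and the site that hosts it.
INVENTORY = {
    "app-1": Resource("app-1", Kind.APP, "tenant-a"),
    "site-1": Resource("site-1", Kind.SITE, "tenant-a"),
}
```

Calling `preview_targets(["site-1"], Kind.APP, INVENTORY)` raises `TargetMismatch` instead of returning a plan, which gives the operator an explicit stop before any delete call is made.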
Remediation Themes
Practical lessons:
- Require dry runs and explicit affected-resource lists.
- Add tenant-level and global deletion caps.
- Use multi-person approval for high-impact destructive operations.
- Test restore workflows as part of production readiness.
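
The first three lessons can be combined in a single wrapper around the delete call. This is a minimal sketch under stated assumptions: the cap values, the two-approver rule, and the names `Limits`, `run_deletion`, and `delete_fn` are all hypothetical, and a real system would verify approvals through an identity provider rather than a list of strings.

```python
from dataclasses import dataclass


class GuardrailError(Exception):
    """Raised when a deletion plan violates a safety limit."""


@dataclass(frozen=True)
class Limits:
    max_per_tenant: int = 5      # assumed tenant-level cap
    max_total: int = 20          # assumed global cap
    required_approvers: int = 2  # assumed multi-person approval rule


def run_deletion(targets, approvers, delete_fn, limits=Limits(), dry_run=True):
    """Check caps and approvals, then delete only when dry_run is False.

    targets: list of (tenant, resource_id) pairs.
    Returns the sorted, de-duplicated list of targets that were
    (or, in a dry run, would be) deleted.
    """
    plan = sorted(set(targets))
    if len(plan) > limits.max_total:
        raise GuardrailError(
            f"{len(plan)} targets exceeds global cap {limits.max_total}"
        )

    per_tenant = {}
    for tenant, _ in plan:
        per_tenant[tenant] = per_tenant.get(tenant, 0) + 1
    for tenant, count in per_tenant.items():
        if count > limits.max_per_tenant:
            raise GuardrailError(f"{tenant}: {count} targets exceeds tenant cap")

    if not dry_run:
        # Real deletions need enough distinct approvers; dry runs do not.
        if len(set(approvers)) < limits.required_approvers:
            raise GuardrailError("not enough distinct approvers")
        for tenant, resource_id in plan:
            delete_fn(tenant, resource_id)
    return plan
```

Making `dry_run=True` the default means the destructive path has to be requested explicitly, and the returned plan doubles as the affected-resource list that reviewers sign off on.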
What Engineers Should Practice
When reviewing operational tooling, ask what happens if the tool is pointed at the wrong target. The best guardrails make dangerous operations hard to run accidentally and easy to stop early.
The lesson is blunt: if deletion is automated, restoration must be engineered too.
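
One common way to engineer restoration alongside deletion is a soft delete: the delete call moves a record to a tombstone area, and a separate purge step removes it only after a retention window. The sketch below is an in-memory illustration, not a description of Atlassian's systems; `SoftDeleteStore` and the 30-day retention value are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


class SoftDeleteStore:
    """In-memory stand-in for a store that tombstones records before purging."""

    def __init__(self, retention=timedelta(days=30)):  # assumed window
        self.retention = retention
        self.live = {}
        self.tombstones = {}  # key -> (value, deleted_at)

    def put(self, key, value):
        self.live[key] = value

    def delete(self, key, now=None):
        """Move the record out of the live set but keep it recoverable."""
        now = now or datetime.now(timezone.utc)
        self.tombstones[key] = (self.live.pop(key), now)

    def restore(self, key):
        """Undo a delete that has not been purged yet."""
        value, _ = self.tombstones.pop(key)
        self.live[key] = value

    def purge_expired(self, now=None):
        """Permanently drop tombstones older than the retention window."""
        now = now or datetime.now(timezone.utc)
        expired = [
            key
            for key, (_, deleted_at) in self.tombstones.items()
            if now - deleted_at >= self.retention
        ]
        for key in expired:
            del self.tombstones[key]
        return expired
```

A restore drill then becomes an ordinary test: delete a record, call `restore`, and check that the live data matches what was there before. Running that drill at realistic scale is what turns a restore procedure from a document into a tested capability.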