Cloudflare | 2025-11-18T00:00:00Z
Cloudflare 2025 Outage: Bot Management Feature File Failure
Cloudflare's November 18, 2025 outage was triggered by a database permissions change that caused the Bot Management feature-file generator to emit duplicate metadata rows, pushing the file past a runtime limit in the proxy.
Incident at a Glance
Impact: Core CDN and security services returned elevated rates of HTTP 5xx errors; Turnstile, Workers KV, Access, dashboard login, and other services were also affected.
Root cause: A ClickHouse metadata query began returning duplicate column data after a permissions change; the generated Bot Management feature file doubled in size and exceeded a proxy module limit.
Lesson: Generated configuration files need explicit schema contracts, enforced size limits, pre-publish validation, staged propagation, and fast rollback, just like application code.
Quick Summary
On November 18, 2025, Cloudflare experienced a major outage affecting core network traffic. Cloudflare's postmortem says the incident was not a cyberattack. It was triggered by a database permissions change that caused duplicate entries in a Bot Management feature file.
That generated file was then propagated across Cloudflare's network. Because the file exceeded a limit built into the proxy module that consumes it, the module failed, and requests passing through that part of the proxy returned HTTP 5xx errors.
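The consumer side of this failure can be illustrated with a module that reserves a fixed number of feature slots. The Python sketch below is hypothetical: `FeatureModule`, `CAPACITY`, and `ConfigError` are invented names, and the capacity value is illustrative rather than Cloudflare's. It shows one design choice, rejecting an oversized file and continuing to serve the previously loaded one, instead of letting the error take down the request path.

```python
class ConfigError(Exception):
    """Raised when a feature file violates the module's assumptions."""


class FeatureModule:
    # Hypothetical fixed capacity, standing in for a preallocated buffer.
    CAPACITY = 200

    def __init__(self, initial: list[str]) -> None:
        self._features: list[str] = []
        self.load(initial)  # the initial load is expected to succeed

    def load(self, features: list[str]) -> None:
        """Replace the active feature set, or raise without changing it."""
        if len(features) > self.CAPACITY:
            raise ConfigError(
                f"{len(features)} features exceeds capacity {self.CAPACITY}"
            )
        self._features = list(features)

    def try_reload(self, features: list[str]) -> bool:
        """Fail safe: keep serving the last known-good set on a bad file."""
        try:
            self.load(features)
            return True
        except ConfigError as exc:
            # A real system would emit an alert here rather than print.
            print(f"rejected new feature file, keeping previous: {exc}")
            return False

    @property
    def active_count(self) -> int:
        return len(self._features)


# Usage: an oversized update is rejected and the old set stays active.
module = FeatureModule([f"f{i}" for i in range(150)])
module.try_reload([f"f{i}" for i in range(300)])
print(module.active_count)  # 150
```

Whether falling back to stale data is acceptable depends on the module; for scoring features it usually beats failing every request.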
Why It Mattered
Cloudflare sits in front of a large slice of the internet. A failure in core proxy traffic handling can make many unrelated customer sites look broken at the same time.
The incident is also notable because the triggering change was not made to the proxy itself. It was a permissions change in a database used to generate configuration, and it altered the output of a metadata query that the generator depended on.
Root Cause Pattern
The pattern was generated config with hidden assumptions. The Bot Management feature-file generator assumed that a metadata query would return columns from only one database. The query filtered by table name but not by database name, so once a ClickHouse permissions change made the underlying tables' metadata visible as well, it returned duplicate rows and produced a larger file.
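The duplication mechanism can be reproduced in miniature. This sketch uses Python's built-in `sqlite3` module, with a hypothetical `system_columns` table standing in for ClickHouse's `system.columns`; the database names `default` and `r0`, the table name, and the feature columns are illustrative assumptions. It shows how a query that filters only on table name doubles its output once a second database becomes visible, and how pinning the database restores the expected shape.

```python
import sqlite3

# Hypothetical stand-in for ClickHouse's system.columns: one row per
# (database, table, column) that the querying user is allowed to see.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE system_columns (db_name TEXT, table_name TEXT, name TEXT, type TEXT)"
)
features = [("feature_a", "Float32"), ("feature_b", "Float32"), ("feature_c", "UInt8")]

# Before the permissions change: only the `default` database is visible.
conn.executemany(
    "INSERT INTO system_columns VALUES ('default', 'http_requests_features', ?, ?)",
    features,
)

# The query filters on table name only; the database is an implicit assumption.
UNFILTERED = (
    "SELECT name, type FROM system_columns "
    "WHERE table_name = 'http_requests_features' ORDER BY name"
)
before = conn.execute(UNFILTERED).fetchall()

# After the change: the underlying tables in `r0` also become visible,
# with the same column names and types.
conn.executemany(
    "INSERT INTO system_columns VALUES ('r0', 'http_requests_features', ?, ?)",
    features,
)
after = conn.execute(UNFILTERED).fetchall()

# Making the assumption explicit: filter on the database as well.
FILTERED = (
    "SELECT name, type FROM system_columns "
    "WHERE db_name = 'default' AND table_name = 'http_requests_features' "
    "ORDER BY name"
)
fixed = conn.execute(FILTERED).fetchall()

print(len(before), len(after), len(fixed))  # 3 6 3
```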
Warning signs:
- Runtime systems consume generated files without strict validation.
- The generator depends on database metadata shape.
- A file is regenerated frequently and pushed globally.
- Size limits exist in runtime code but are not enforced before propagation.
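One way to address the last warning sign is to enforce the consumer's limits inside the generator, so a malformed file is rejected before it propagates. This is a minimal Python sketch under stated assumptions: the file is modelled as a list of `(name, type)` rows, and `MAX_FEATURES`, `ALLOWED_TYPES`, and `validate_feature_file` are hypothetical names and values, not Cloudflare's actual format or limits.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical limits. Ideally they come from the same source the runtime
# uses, so the generator and the consumer cannot drift apart.
MAX_FEATURES = 200
ALLOWED_TYPES = {"Float32", "Float64", "UInt8", "UInt32"}


@dataclass
class ValidationResult:
    ok: bool
    errors: list[str] = field(default_factory=list)


def validate_feature_file(rows: list[tuple[str, str]]) -> ValidationResult:
    """Check generated (name, type) rows against the consumer's assumptions."""
    errors: list[str] = []
    if len(rows) > MAX_FEATURES:
        errors.append(f"{len(rows)} features exceeds limit of {MAX_FEATURES}")
    counts = Counter(name for name, _ in rows)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        errors.append(f"duplicate feature names: {duplicates}")
    bad_types = sorted({t for _, t in rows if t not in ALLOWED_TYPES})
    if bad_types:
        errors.append(f"unexpected column types: {bad_types}")
    return ValidationResult(ok=not errors, errors=errors)


# Usage: a clean file passes; the same rows duplicated are rejected
# before anything is published.
rows = [("feature_a", "Float32"), ("feature_b", "UInt8")]
print(validate_feature_file(rows).ok)  # True
print(validate_feature_file(rows * 2).errors)
```

Catching duplicates separately from the size check matters: a duplicated file can stay under the limit and still violate the consumer's assumptions.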
Remediation Themes
Practical lessons:
- Validate generated configuration before publishing it globally.
- Treat database metadata queries as contracts, not incidental implementation details.
- Roll out config and data-plane dependencies in stages.
- Keep a known-good file rollback path for generated artifacts.
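The staging and rollback lessons can be combined in a publisher that promotes a file ring by ring and reverts to the last known-good version when a ring looks unhealthy. This Python sketch is illustrative only: the ring names, the `deploy` and `error_rate` callbacks, and the threshold are hypothetical, and a real system would wait for telemetry between stages rather than read it synchronously.

```python
from typing import Callable, Sequence

# Hypothetical deployment rings, from smallest blast radius to largest.
RINGS: Sequence[str] = ("canary", "region-1", "global")
MAX_ERROR_RATE = 0.01  # hypothetical promotion threshold


def staged_publish(
    new_file: bytes,
    known_good: bytes,
    deploy: Callable[[str, bytes], None],
    error_rate: Callable[[str], float],
) -> bytes:
    """Roll new_file out ring by ring; return the version left active."""
    deployed: list[str] = []
    for ring in RINGS:
        deploy(ring, new_file)
        deployed.append(ring)
        if error_rate(ring) > MAX_ERROR_RATE:
            # Revert every ring that received the bad file, newest first.
            for r in reversed(deployed):
                deploy(r, known_good)
            return known_good
    return new_file


# Usage with in-memory fakes: the bad file never leaves the canary ring.
state: dict[str, bytes] = {}


def fake_deploy(ring: str, blob: bytes) -> None:
    state[ring] = blob


def fake_error_rate(ring: str) -> float:
    return 0.5 if state[ring] == b"bad" else 0.0


active = staged_publish(b"bad", b"good", fake_deploy, fake_error_rate)
print(active, state)  # b'good' {'canary': b'good'}
```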
What Engineers Should Practice
When a config file changes frequently, monitor its shape as a production signal. Size, row count, schema, and parse success should each be tracked as health checks.
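Tracking a generated file's shape can be as simple as recording a few properties per version and flagging abrupt changes between consecutive versions. The sketch below assumes a JSON-lines file where every line is a JSON object, and a hypothetical `max_growth` ratio of 1.5; both are illustrative, not part of Cloudflare's system. A sudden doubling like the one described above would trip the growth checks even though the file still parses.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FileShape:
    size_bytes: int
    row_count: int
    schema_hash: str  # short hash of the sorted keys seen across all rows
    parsed: bool


def measure(raw: bytes) -> FileShape:
    """Summarize a JSON-lines config file into shape metrics."""
    try:
        rows = [json.loads(line) for line in raw.splitlines() if line.strip()]
    except json.JSONDecodeError:
        return FileShape(len(raw), 0, "", parsed=False)
    if not all(isinstance(row, dict) for row in rows):
        return FileShape(len(raw), len(rows), "", parsed=False)
    keys = sorted({key for row in rows for key in row})
    schema_hash = hashlib.sha256(",".join(keys).encode()).hexdigest()[:12]
    return FileShape(len(raw), len(rows), schema_hash, parsed=True)


def shape_alerts(prev: FileShape, cur: FileShape, max_growth: float = 1.5) -> list[str]:
    """Compare consecutive versions and describe suspicious changes."""
    if not cur.parsed:
        return ["file failed to parse"]
    alerts: list[str] = []
    if prev.row_count and cur.row_count / prev.row_count > max_growth:
        alerts.append(f"row count jumped {prev.row_count} -> {cur.row_count}")
    if prev.size_bytes and cur.size_bytes / prev.size_bytes > max_growth:
        alerts.append(f"size jumped {prev.size_bytes} -> {cur.size_bytes} bytes")
    if prev.parsed and cur.schema_hash != prev.schema_hash:
        alerts.append("schema changed")
    return alerts


# Usage: the same rows emitted twice still parse, but the shape check fires.
v1 = b'{"name": "feature_a", "type": "Float32"}\n{"name": "feature_b", "type": "UInt8"}\n'
v2 = v1 * 2
alerts = shape_alerts(measure(v1), measure(v2))
print(alerts)  # reports the row-count and size jumps
```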
The practical takeaway: configuration is code once production software executes it.