Incident Response
Triage
- Confirm scope (all users vs specific cohorts)
- Check deploy history
- Inspect web/worker logs
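
The scope check can be sketched as a per-cohort error-rate comparison. This is a minimal illustration, not a prescribed tool: the (cohort, is_error) record shape, the function names, and the 5% threshold are all assumptions, and real records would come from whatever log or metrics store is in use.

```python
from collections import defaultdict


def error_rate_by_cohort(records):
    """Return {cohort: error_fraction} for an iterable of (cohort, is_error) pairs.

    The record shape is hypothetical; adapt it to the fields your logs expose.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for cohort, is_error in records:
        totals[cohort] += 1
        if is_error:
            errors[cohort] += 1
    return {cohort: errors[cohort] / totals[cohort] for cohort in totals}


def affected_cohorts(records, threshold=0.05):
    """List cohorts whose error rate exceeds the (assumed) threshold."""
    rates = error_rate_by_cohort(records)
    return sorted(cohort for cohort, rate in rates.items() if rate > threshold)


# Made-up sample data: only the "free" cohort sees errors.
sample = [("free", True), ("free", False), ("paid", False), ("paid", False)]
print(affected_cohorts(sample))  # ['free']
```

If every cohort appears in the result, the incident is likely global; if only some do, the difference between cohorts (plan, region, client version) is a good place to look next.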
Mitigation
- Roll back deploy if necessary
- Disable heavy background jobs
- Communicate status to stakeholders
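
One way to make "disable heavy background jobs" a quick, reversible step is a kill switch that workers consult before running a job. The sketch below assumes a comma-separated environment variable named DISABLED_JOBS; that name, the job names, and the helper functions are hypothetical and not part of any particular job framework.

```python
import os

# Hypothetical variable name; use whatever your deployment tooling supports.
DISABLED_JOBS_ENV = "DISABLED_JOBS"


def disabled_jobs(environ=None):
    """Parse the comma-separated kill-switch list into a set of job names."""
    environ = os.environ if environ is None else environ
    raw = environ.get(DISABLED_JOBS_ENV, "")
    return {name.strip() for name in raw.split(",") if name.strip()}


def should_run(job_name, environ=None):
    """Return False when the job has been switched off for the incident."""
    return job_name not in disabled_jobs(environ)


# Made-up job names, with the environment passed in explicitly.
env = {"DISABLED_JOBS": "report_export, reindex"}
print(should_run("report_export", env))  # False
print(should_run("send_email", env))     # True
```

Reading the switch at job start rather than at process start means a change takes effect without redeploying, which matters when the deploy pipeline itself is part of the incident.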
Postmortem
- Document timeline
- Identify root cause and assign follow-ups
- Add alerts/tests to prevent recurrence
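
As an illustration of the kind of alert a postmortem might add, here is a minimal sliding-window error-rate check. The class name, window size, and threshold are assumptions chosen for the example; a production alert would normally live in the monitoring system rather than in application code.

```python
from collections import deque


class ErrorRateAlert:
    """Fire when the error fraction over the last `window` requests exceeds `threshold`."""

    def __init__(self, window=100, threshold=0.05):
        self.window = deque(maxlen=window)
        self.threshold = threshold

    def record(self, is_error):
        """Record one request outcome and return whether the alert is firing."""
        self.window.append(bool(is_error))
        return self.firing()

    def firing(self):
        # Stay quiet until the window is full, to avoid noisy alerts on startup.
        if len(self.window) < self.window.maxlen:
            return False
        return sum(self.window) / len(self.window) > self.threshold


# Tiny window so the behaviour is easy to follow by hand.
alert = ErrorRateAlert(window=4, threshold=0.25)
print([alert.record(e) for e in [False, True, False, False, True]])
# [False, False, False, False, True]
```

The same sequence of outcomes doubles as a regression test: replaying the error pattern from the incident should make the alert fire.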