Restore from backup
What we back up
| What | How | Retention |
|---|---|---|
| Prod Postgres | Cloud SQL automated backups + PITR (WAL archiving) | 35 days |
| Audit chain snapshots | Daily dump to Cloud Storage bucket tappass-audit-archive | 7 years |
| Secret Manager | Version history (disabled ≠ deleted) | Indefinite |
| Terraform state | Versioned GCS bucket | Indefinite |
Point-in-time restore (PITR)
If you need to roll the DB back to a specific moment:

    # List backups
    gcloud sql backups list --instance=tappass-prod-pg

    # Restore to a new instance
    gcloud sql instances clone tappass-prod-pg tappass-restore-$(date +%s) \
      --point-in-time='2026-04-18T12:00:00Z'

Never restore over the prod instance. Always clone to a new instance, verify, then swap.

- Bring the clone up and run integrity checks (see below)
- Update Cloud Run connection string to point at the clone
- Roll the prod revision
- Verify `/audit/integrity` still reports `intact`
- Snapshot the previous prod instance before deleting it
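
The clone step can be wrapped in a small guard so that a bad timestamp is rejected before anything is run. The following is a minimal Python sketch, not part of the existing tooling: `build_clone_command` and `PITR_RETENTION` are hypothetical names, and it assumes the 35-day retention from the table also bounds the PITR window. It only builds the argument list for the same `gcloud sql instances clone` call shown above and never targets the source instance itself.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the 35-day retention in the table also bounds the PITR window.
PITR_RETENTION = timedelta(days=35)


def build_clone_command(source, point_in_time, now=None, target_prefix="tappass-restore"):
    """Return the gcloud argv that clones `source` to a NEW instance at `point_in_time`."""
    if point_in_time.tzinfo is None:
        raise ValueError("point_in_time must be timezone-aware")
    now = now or datetime.now(timezone.utc)
    if point_in_time > now:
        raise ValueError("point_in_time is in the future")
    if now - point_in_time > PITR_RETENTION:
        raise ValueError("point_in_time is older than the retention window")

    # Same naming scheme as tappass-restore-$(date +%s) in the shell example.
    target = f"{target_prefix}-{int(now.timestamp())}"
    if target == source:
        raise ValueError("refusing to restore over the source instance")

    stamp = point_in_time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return [
        "gcloud", "sql", "instances", "clone", source, target,
        f"--point-in-time={stamp}",
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once an operator has reviewed it; the swap and verification steps stay manual.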
Integrity checks on a restore

    -- 1. Hash chain integrity
    SELECT audit_id, prev_hash, current_hash
    FROM audit_events
    WHERE current_hash != encode(sha256(... || prev_hash || ...), 'hex')
    LIMIT 10;
    -- expect 0 rows

    -- 2. No gaps in the chain
    SELECT COUNT(*) FROM audit_events
    WHERE prev_hash IS NOT NULL
      AND prev_hash NOT IN (SELECT current_hash FROM audit_events);
    -- expect 0

    -- 3. Latest event timestamp
    SELECT MAX(ts) FROM audit_events;

Audit chain reconstruction
If PITR isn’t enough and you need to re-ingest from cold storage:

    gsutil cp gs://tappass-audit-archive/2026-04-17.jsonl.zst .
    zstd -d 2026-04-17.jsonl.zst
    # feed into the replay tool
    python -m tappass.tools.replay_audit --input 2026-04-17.jsonl

The replay tool re-computes hashes, re-signs with the current key, and inserts in order. Customers see a re-issued `audit_replayed` event with a reference to the original `audit_id`.
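
To make the "re-computes hashes ... inserts in order" step concrete, here is a minimal Python sketch of rebuilding and verifying a hash chain. It is an illustration only, not the code of `tappass.tools.replay_audit`. The hash formula in the runbook is elided, so `event_hash` uses an assumed formula (SHA-256 over the previous hash plus the canonical JSON payload); the function names and the event dictionary layout are also hypothetical, and signing is left out entirely.

```python
import hashlib
import json


def event_hash(prev_hash, payload):
    """ASSUMED formula: sha256(prev_hash + canonical JSON). The real one is elided in the runbook."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(((prev_hash or "") + canonical).encode("utf-8")).hexdigest()


def rebuild_chain(payloads):
    """Recompute prev_hash/current_hash for payloads, in the order given."""
    events, prev = [], None
    for payload in payloads:
        current = event_hash(prev, payload)
        events.append({"payload": payload, "prev_hash": prev, "current_hash": current})
        prev = current
    return events


def find_breaks(events):
    """Return the indexes of events whose stored hash or link to the previous event fails."""
    bad, prev = [], None
    for i, ev in enumerate(events):
        link_ok = ev["prev_hash"] == prev
        hash_ok = ev["current_hash"] == event_hash(ev["prev_hash"], ev["payload"])
        if not (link_ok and hash_ok):
            bad.append(i)
        prev = ev["current_hash"]
    return bad
```

`find_breaks` mirrors the first two SQL checks in "Integrity checks on a restore": an empty result corresponds to both queries returning zero rows.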
When to declare a SEV1
- Customer-visible outage of `/v1/chat/completions` > 5 min
- Any suspicion of data loss
- Any break in the audit chain integrity check — compliance event, must be disclosed to affected customers
See Incident response for the flow.
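
The three triggers above can also be written as a single predicate, for example in an alerting hook. This is a sketch under assumptions: `should_declare_sev1` and its parameters are hypothetical names, and how each input is measured (outage duration, suspicion of data loss, result of the integrity check) is left to the caller.

```python
from datetime import timedelta

# Threshold taken from the list above: outage of /v1/chat/completions > 5 min.
OUTAGE_THRESHOLD = timedelta(minutes=5)


def should_declare_sev1(outage_duration, data_loss_suspected, audit_chain_intact):
    """Return True if any one of the SEV1 triggers from the runbook applies."""
    return (
        outage_duration > OUTAGE_THRESHOLD
        or data_loss_suspected
        or not audit_chain_intact
    )
```

Any single trigger is sufficient; a broken audit chain declares a SEV1 even with no outage at all.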