Restore from backup

What we back up

What	How	Retention
Prod Postgres	Cloud SQL automated backups + PITR (WAL archiving)	35 days
Audit chain snapshots	Daily dump to Cloud Storage bucket `tappass-audit-archive`	7 years
Secret Manager	Version history (disabled ≠ deleted)	Indefinite
Terraform state	Versioned GCS bucket	Indefinite

Point-in-time restore (PITR)

If you need to roll the DB back to a specific moment:

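```bash
# List backups
gcloud sql backups list --instance=tappass-prod-pg

# Restore to a new instance: PITR clone of prod at the target moment (UTC, RFC 3339).
# Keep the generated name; the swap steps need it.
RESTORE_INSTANCE="tappass-restore-$(date +%s)"
gcloud sql instances clone tappass-prod-pg "$RESTORE_INSTANCE" \
  --point-in-time='2026-04-18T12:00:00Z'
```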

Never restore over the prod instance. Always clone to a new instance, verify, then swap.
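To make the PITR clone harder to get wrong, the command can be assembled by a small script that refuses targets in the future or outside the retention window. This is a minimal sketch: `build_clone_cmd` is a hypothetical helper. The 35-day window is an assumption taken from the retention table, so confirm it against the instance's actual transaction-log retention. The script only builds the argv and does not call gcloud.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

SOURCE_INSTANCE = "tappass-prod-pg"
# ASSUMPTION: PITR window taken from the retention table; confirm against
# the instance's actual transaction-log retention before relying on it.
PITR_WINDOW = timedelta(days=35)


def build_clone_cmd(point_in_time: datetime,
                    now: Optional[datetime] = None) -> List[str]:
    """Hypothetical helper: build (not run) the gcloud PITR clone command."""
    if point_in_time.tzinfo is None:
        raise ValueError("point_in_time must be timezone-aware")
    now = now or datetime.now(timezone.utc)
    if point_in_time > now:
        raise ValueError("point_in_time is in the future")
    if now - point_in_time > PITR_WINDOW:
        raise ValueError("outside the PITR window; use audit chain reconstruction")
    # Always a fresh, timestamped target (mirrors `date +%s`), never the source.
    target = f"tappass-restore-{int(now.timestamp())}"
    ts = point_in_time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return ["gcloud", "sql", "instances", "clone",
            SOURCE_INSTANCE, target, f"--point-in-time={ts}"]
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it. The target name at index 5 is the instance the swap should point at.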

Swap

1. Bring up the clone and run the integrity checks (see Integrity checks on a restore below)
2. Update the Cloud Run service's connection string to point at the clone
3. Roll out a new prod revision so the change takes effect
4. Verify that /audit/integrity still reports intact
5. Snapshot the previous prod instance before deleting it
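The /audit/integrity verification can be scripted as a gate that runs before anyone snapshots and deletes the old instance. This is a minimal standard-library sketch. The response shape `{"status": "intact"}` and the helper names are assumptions, because this runbook does not document the endpoint's format. Any authentication the endpoint requires is left out.

```python
import json
import urllib.request


def integrity_ok(payload: dict) -> bool:
    # ASSUMPTION: the endpoint returns JSON shaped like {"status": "intact"}.
    return payload.get("status") == "intact"


def check_integrity(base_url: str, timeout: float = 10.0) -> bool:
    """Hypothetical gate: fetch /audit/integrity and report whether it is intact."""
    url = base_url.rstrip("/") + "/audit/integrity"
    # urlopen raises HTTPError on 4xx/5xx responses, so failures surface loudly.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return integrity_ok(json.load(resp))
```

If the check returns False or raises after the revision rolls, stop before the snapshot-and-delete step.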

Integrity checks on a restore

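```sql
-- 1. Hash chain integrity: the stored hash must match a recomputation.
--    "..." stands for the hashed event fields (not spelled out here).
--    Note that `x || NULL` is NULL, so the genesis row (prev_hash IS NULL)
--    is never flagged by this query; check it separately.
SELECT audit_id, prev_hash, current_hash
FROM audit_events
WHERE current_hash != encode(sha256(... || prev_hash || ...), 'hex')
LIMIT 10;
-- expect 0 rows

-- 2. No gaps in the chain: every prev_hash must match some event's current_hash.
--    Use NOT EXISTS, not NOT IN: if the subquery yields any NULL current_hash,
--    NOT IN matches no rows at all and silently hides gaps.
SELECT COUNT(*) FROM audit_events e
WHERE e.prev_hash IS NOT NULL
  AND NOT EXISTS (
    SELECT 1 FROM audit_events p WHERE p.current_hash = e.prev_hash
  );
-- expect 0

-- 3. Latest event timestamp: should not be later than the --point-in-time
--    you restored to, and is usually close to it.
SELECT MAX(ts) FROM audit_events;
```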
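The same checks can run offline against an export of `audit_events`. This version also catches forks (two events naming the same parent) and a second genesis event, which the SQL does not look for. This is a minimal sketch. The SQL leaves the hashed fields as `...`, so the code substitutes a single hypothetical `payload` string and assumes `current_hash = sha256(payload || prev_hash)` in hex, with an empty string for the genesis row's missing `prev_hash`. Swap in the real field layout before trusting the output.

```python
from __future__ import annotations

import hashlib
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    audit_id: str
    prev_hash: Optional[str]
    current_hash: str
    payload: str  # HYPOTHETICAL stand-in for the hashed fields elided as "..."


def compute_hash(payload: str, prev_hash: Optional[str]) -> str:
    # ASSUMPTION: sha256(payload || prev_hash) as hex; genesis uses "".
    return hashlib.sha256((payload + (prev_hash or "")).encode("utf-8")).hexdigest()


def verify_chain(events: list[Event]) -> dict[str, list[str]]:
    """Return the problems found by each check; every list should be empty."""
    known = {e.current_hash for e in events}
    parents = Counter(e.prev_hash for e in events if e.prev_hash is not None)
    genesis = [e.audit_id for e in events if e.prev_hash is None]
    return {
        # Check 1: stored hash differs from a recomputation.
        "bad_hash": [e.audit_id for e in events
                     if e.current_hash != compute_hash(e.payload, e.prev_hash)],
        # Check 2: prev_hash matches no event in the set (a gap).
        "dangling": [e.audit_id for e in events
                     if e.prev_hash is not None and e.prev_hash not in known],
        # Extra: two or more events claim the same parent (a fork).
        "forks": [h for h, n in parents.items() if n > 1],
        # Extra: more than one event without a parent.
        "multiple_genesis": genesis if len(genesis) > 1 else [],
    }
```

Run it over the full table. On a partial slice, the first event's parent lies outside the set, so that event will show up as dangling.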

Audit chain reconstruction

If PITR isn't enough and you need to re-ingest from cold storage:

```bash
gsutil cp gs://tappass-audit-archive/2026-04-17.jsonl.zst .
zstd -d 2026-04-17.jsonl.zst   # writes 2026-04-17.jsonl
# feed into the replay tool
python -m tappass.tools.replay_audit --input 2026-04-17.jsonl
```

The replay tool recomputes hashes, re-signs each event with the current key, and inserts the events in order. Customers see a re-issued `audit_replayed` event that references the original `audit_id`.
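The replay steps described above can be sketched as follows to make the ordering and re-linking concrete. This is not the `tappass.tools.replay_audit` implementation. Several details are assumptions: the `ts` sort key, the hashed serialization, and HMAC-SHA256 as a stand-in for "re-signs with the current key". The database insert is also left out; the function returns the rows it would insert.

```python
import hashlib
import hmac
import json
from typing import Iterable, List, Optional


def replay(lines: Iterable[str], prev_hash: Optional[str], key: bytes) -> List[dict]:
    """Hypothetical replay: re-chain archived events as audit_replayed rows."""
    events = [json.loads(line) for line in lines if line.strip()]
    # ASSUMPTION: "in order" means by event timestamp. Consistently formatted
    # ISO 8601 UTC strings sort correctly as text; the sort is stable for ties.
    events.sort(key=lambda ev: ev["ts"])
    rows = []
    for ev in events:
        # ASSUMPTION: hash a canonical JSON encoding of the original event.
        body = json.dumps(ev, sort_keys=True, separators=(",", ":"))
        current = hashlib.sha256((body + (prev_hash or "")).encode("utf-8")).hexdigest()
        # Stand-in signature: HMAC-SHA256 over the new hash with the current key.
        signature = hmac.new(key, current.encode("utf-8"), hashlib.sha256).hexdigest()
        rows.append({
            "type": "audit_replayed",
            "original_audit_id": ev["audit_id"],  # the reference customers see
            "prev_hash": prev_hash,
            "current_hash": current,
            "signature": signature,
        })
        prev_hash = current  # link the next replayed event to this one
    return rows
```

In this sketch, `prev_hash` should be the `current_hash` of the last intact event in the restored table. That way the replayed events extend the existing chain rather than starting a new one.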

When to declare a SEV1

- Customer-visible outage of /v1/chat/completions lasting more than 5 minutes
- Any suspected data loss
- Any failure of the audit chain integrity checks. This is a compliance event and must be disclosed to affected customers.
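For paging or alerting automation, the SEV1 criteria above reduce to a small predicate. This is a sketch only. The `Signals` fields are hypothetical inputs that a monitoring layer would have to supply. The thresholds mirror the list: an outage must exceed 5 minutes, so exactly 5 minutes does not trigger.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Signals:
    # HYPOTHETICAL inputs; a monitoring layer would have to populate these.
    chat_completions_outage_min: float = 0.0  # customer-visible outage length
    suspected_data_loss: bool = False
    audit_chain_broken: bool = False


def sev1_reasons(s: Signals) -> list[str]:
    """Return the SEV1 criteria that are met; declare SEV1 if any are."""
    reasons = []
    if s.chat_completions_outage_min > 5:
        reasons.append("outage of /v1/chat/completions > 5 min")
    if s.suspected_data_loss:
        reasons.append("suspected data loss")
    if s.audit_chain_broken:
        # Compliance event: must be disclosed to affected customers.
        reasons.append("audit chain integrity failure")
    return reasons
```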

See Incident response for the flow.