Roll back Cloud Run

Cloud Run retains the earlier revisions of a service, up to its per-service revision limit, so rolling back to one that was ready is a single update-traffic call: no redeploy, no image rebuild.

Time to rollback: ~30 seconds once you know the target revision.

When to roll back

Fresh deploy is showing 5xx / latency spikes.
New revision OOMs on startup (see also OOM / crashloop).
Customer reports a regression you can correlate to the last deploy.
You need to revert a config change (env var, memory, concurrency) applied via gcloud run services update.

Decide what to roll back to

1. List recent revisions newest-first

gcloud run revisions list --service=tappass \
  --project=tappass-prod --region=europe-west1 \
  --limit=10 \
  --format='table(name,active,status.conditions[0].lastTransitionTime.date("%Y-%m-%d %H:%M"),spec.containers[0].image.basename())'

The ACTIVE column shows which revision currently has traffic.
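
To double-check the ACTIVE column, the live traffic split can also be read straight from the service. This is a minimal sketch, assuming the same service, project, and region as the command above; the `show_traffic` helper name is illustrative and not part of any existing tooling.

```shell
# Hypothetical helper: print the traffic split Cloud Run is serving right now.
# Service, project, and region are the values used elsewhere in this runbook.
show_traffic() {
  gcloud run services describe tappass \
    --project=tappass-prod --region=europe-west1 \
    --format='yaml(status.traffic)'
}
```

Running `show_traffic` before and after the rollback gives a quick before/after record for the incident notes.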

2. Identify the last-known-good

Two signals to pick the target:

Last revision with zero 5xx in the logs for its active window.
Released SHA you know works — check in #deploys Slack or git history for the last stable commit.

Cross-check against the revision's lastTransitionTime so you know it was actually in service (not just a no-traffic probe revision).
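
The "zero 5xx" signal can be checked per revision by adding the revision name to the log filter. The sketch below assumes the same log filter used later in the verification step; `count_5xx` is a hypothetical helper name, and the count is capped by `--limit`, so read 1000 as "at least 1000".

```shell
# Hypothetical helper: count recent 5xx log entries for one revision.
# $1 = revision name
# $2 = look-back window in gcloud --freshness syntax (default: 1d)
count_5xx() {
  gcloud logging read \
    "resource.type=cloud_run_revision AND resource.labels.service_name=tappass AND resource.labels.revision_name=\"$1\" AND httpRequest.status>=500" \
    --project=tappass-prod --limit=1000 --freshness="${2:-1d}" \
    --format='value(timestamp)' | wc -l
}
```

For example, `count_5xx "$CANDIDATE" 7d` (with `CANDIDATE` set to a name from the revisions list) should print 0 for a revision worth rolling back to.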

Execute the rollback

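# Target: replace the placeholder with a revision name from the list above.
# Quoted so the shell does not treat < and > as redirections if pasted as-is.
TARGET='tappass-direct-<sha>-<suffix>'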

gcloud run services update-traffic tappass \
  --project=tappass-prod --region=europe-west1 \
  --to-revisions="$TARGET=100"

The command prints the full traffic table — confirm 100% on your target.
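
During an incident it is easy to paste the command with the placeholder still in place. The following is a sketch of a guard around the same update-traffic call; `rollback_to` is a hypothetical wrapper name, and the check only catches an empty target or leftover angle brackets, not a mistyped revision name.

```shell
# Hypothetical wrapper around the update-traffic call above.
# Refuses an empty target or one that still contains placeholder brackets.
rollback_to() {
  target="$1"
  case "$target" in
    ''|*'<'*|*'>'*)
      echo "refusing to roll back: target '$target' is empty or still a placeholder" >&2
      return 1
      ;;
  esac
  gcloud run services update-traffic tappass \
    --project=tappass-prod --region=europe-west1 \
    --to-revisions="$target=100"
}
```

Call it as `rollback_to "$TARGET"`; on success it prints the same traffic table as the raw command.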

Verify the rollback stuck

# 1. Served release matches the target revision's SHA
curl -s https://eu.tappass.ai/app -H 'User-Agent: Mozilla/5.0' | \
  grep -oE 'release.*"[a-f0-9]+"' | head -1

# 2. 5xx flattens out
gcloud logging read 'resource.type=cloud_run_revision AND resource.labels.service_name=tappass AND httpRequest.status>=500' \
  --project=tappass-prod --limit=30 --freshness=2m \
  --format='value(timestamp)' | wc -l
# Expect 0 after 1–2 min (instances drain)

# 3. Health probe recovers
for i in 1 2 3 4 5; do
  curl -s -o /dev/null -w "%{http_code} %{time_total}s\n" \
    https://eu.tappass.ai/api/health/live
done
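
The loop above fires five requests and leaves the judgement to you. If you would rather block until the probe is stable, something like the sketch below could work; `wait_healthy` and its default thresholds are assumptions for illustration, not existing tooling.

```shell
# Hypothetical helper: poll a URL until it returns HTTP 200 several times in a row.
# $1 = URL
# $2 = consecutive 200s required (default 5)
# $3 = maximum attempts (default 30)
# $4 = seconds to sleep between attempts (default 2)
wait_healthy() {
  url="$1"; need="${2:-5}"; max="${3:-30}"; pause="${4:-2}"
  ok=0; tries=0; code=""
  while [ "$tries" -lt "$max" ]; do
    # curl prints 000 when the connection itself fails; treat that as unhealthy.
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)
    if [ "$code" = "200" ]; then ok=$((ok + 1)); else ok=0; fi
    if [ "$ok" -ge "$need" ]; then return 0; fi
    tries=$((tries + 1))
    sleep "$pause"
  done
  echo "still unhealthy after $max attempts (last status: $code)" >&2
  return 1
}
```

For example, `wait_healthy https://eu.tappass.ai/api/health/live` returns 0 once five consecutive probes succeed and non-zero if that never happens within 30 attempts.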

Post-rollback

Leave the broken revision alone — don't delete it. You may need its logs for the postmortem.
Open an incident — see Incident response.
Root-cause before re-deploying — the same SHA will break the same way unless the underlying bug is fixed.

Staging version

Same commands, swap tappass-prod → tappass-staging. Staging has min_instances=0 so the rollback target may scale to zero between tests; the first request after an idle period will cold-start.
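
One way to avoid hand-editing the project in every command is to pass it as an argument. This is a sketch only: `list_revisions` is a hypothetical helper, and it assumes staging uses the same service name and region as production, as "same commands" above suggests.

```shell
# Hypothetical helper: list recent revisions for a given project.
# $1 = project ID (default: tappass-prod)
list_revisions() {
  gcloud run revisions list --service=tappass \
    --project="${1:-tappass-prod}" --region=europe-west1 \
    --limit=10
}
```

`list_revisions tappass-staging` then targets staging, while a bare `list_revisions` keeps the production default.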

Gotchas

Symptom	Cause	Fix
`update-traffic` succeeds but users still see the pre-rollback behaviour	Cloudflare edge cache on static assets	Purge the Cloudflare cache for `eu.tappass.ai` or wait ~5 min for the TTL to expire
Target revision's `Ready` condition in `status.conditions` is `False`	Revision failed a readiness check long ago	Pick an older revision or deploy a new one; traffic can't be routed to a non-ready revision
Session-bound behaviour persists after rollback	JWT issued before the rollback is still valid	Expected; sessions rotate on next login. Don't break users mid-flow
`--to-revisions` rejects your target name	Typo; revision names are long	Tab-complete or copy-paste from `gcloud run revisions list` output
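
To check for the non-ready case before shifting traffic, a revision's conditions can be printed directly. This is a minimal sketch, assuming the production project and region used above; `show_conditions` is an illustrative name.

```shell
# Hypothetical helper: print a revision's status conditions so a non-ready
# target can be spotted before any traffic is routed to it.
# $1 = revision name
show_conditions() {
  gcloud run revisions describe "$1" \
    --project=tappass-prod --region=europe-west1 \
    --format='yaml(status.conditions)'
}
```

In the output, look for the entry whose `type` is `Ready` and confirm its `status` is `'True'`.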

Also see

Deploy core server — manual forward deploy when rollback isn't the right move.
Incident response — the rollback is usually step 1 of an incident, not the whole response.
OOM / crashloop — when the forward deploy OOMs, rollback first, then diagnose.