Roll back Cloud Run
Cloud Run keeps every revision that was ever ready. Rolling back is
one update-traffic call — no redeploy, no image rebuild.
Time to roll back: ~30 seconds once you know the target revision.
When to roll back
- Fresh deploy is showing 5xx / latency spikes.
- New revision OOMs on startup (see also OOM / crashloop).
- Customer reports a regression you can correlate to the last deploy.
- You need to revert a config change (env var, memory, concurrency)
applied via gcloud run services update.
Decide what to roll back to
1. List recent revisions newest-first

    gcloud run revisions list --service=tappass \
      --project=tappass-prod --region=europe-west1 \
      --limit=10 \
      --format='table(name,active,status.conditions[0].lastTransitionTime.date("%Y-%m-%d %H:%M"),spec.containers[0].image.basename())'

The ACTIVE column shows which revision currently has traffic.
2. Identify the last-known-good
Two signals to pick the target:
- Last revision with zero 5xx in the logs for its active window.
- Released SHA you know works: check #deploys in Slack or git history
for the last stable commit.
Cross-check against the revision's lastTransitionTime so you know
it was actually in service (not just a no-traffic probe revision).
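One way to apply the first signal is to count 5xx log entries per candidate revision. The sketch below assumes Cloud Run request logs carry the standard resource.labels.revision_name label; the function name count_5xx and the 24h default window are illustrative and not part of this runbook.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count 5xx request-log entries for one revision.
# Usage: count_5xx <revision-name> [freshness, default 24h]
count_5xx() {
  local revision="$1" freshness="${2:-24h}"
  # One timestamp is printed per matching entry; grep -c . counts the
  # non-empty lines. "|| true" keeps the exit status 0 when the count is 0.
  gcloud logging read \
    "resource.type=cloud_run_revision AND resource.labels.service_name=tappass AND resource.labels.revision_name=${revision} AND httpRequest.status>=500" \
    --project=tappass-prod --limit=1000 --freshness="$freshness" \
    --format='value(timestamp)' | grep -c . || true
}
```

Call it once per candidate, for example count_5xx tappass-direct-<sha>-<suffix> 7d. The window is relative to now, so widen it until it covers the period when the revision was serving traffic. The result is capped by --limit, so a value of 1000 means "at least 1000".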
Execute the rollback
    # Target: replace the placeholder with the revision name you picked
    TARGET=tappass-direct-<sha>-<suffix>

    gcloud run services update-traffic tappass \
      --project=tappass-prod --region=europe-west1 \
      --to-revisions="$TARGET=100"

The command prints the full traffic table — confirm 100% on your
target.
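Because revision names are long and easy to mistype, a small wrapper can check the name against the revision list before shifting traffic. This is a minimal sketch under assumptions: rollback_to is a hypothetical function name, and metadata.name is assumed to be the field that holds the revision name in gcloud run revisions list output.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: refuse to shift traffic to a revision name that
# does not appear in the service's revision list.
# Usage: rollback_to <revision-name>
rollback_to() {
  local target="$1"
  if [ -z "$target" ]; then
    echo "usage: rollback_to <revision-name>" >&2
    return 2
  fi
  # -x matches the whole line, -F treats the name as a literal string,
  # so a prefix of a real revision name is rejected.
  if ! gcloud run revisions list --service=tappass \
        --project=tappass-prod --region=europe-west1 \
        --format='value(metadata.name)' | grep -qxF -- "$target"; then
    echo "revision not found: $target" >&2
    return 1
  fi
  gcloud run services update-traffic tappass \
    --project=tappass-prod --region=europe-west1 \
    --to-revisions="${target}=100"
}
```

The wrapper only guards against unknown names. It does not check that the revision is ready, so a non-ready target will still be rejected by update-traffic itself.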
Verify the rollback stuck
    # 1. Served release matches the target revision's SHA
    curl -s https://eu.tappass.ai/app -H 'User-Agent: Mozilla/5.0' | \
      grep -oE 'release.*"[a-f0-9]+"' | head -1

    # 2. 5xx flattens out
    gcloud logging read 'resource.type=cloud_run_revision AND resource.labels.service_name=tappass AND httpRequest.status>=500' \
      --project=tappass-prod --limit=30 --freshness=2m \
      --format='value(timestamp)' | wc -l
    # Expect 0 after 1–2 min (instances drain)

    # 3. Health probe recovers
    for i in 1 2 3 4 5; do
      curl -s -o /dev/null -w "%{http_code} %{time_total}s\n" \
        https://eu.tappass.ai/api/health/live
    done

Post-rollback
- Leave the broken revision alone — don't delete it. You may need its logs for the postmortem.
- Open an incident — see Incident response.
- Root-cause before re-deploying — the same SHA will break the same way unless the underlying bug is fixed.
Staging version
Same commands, swap tappass-prod → tappass-staging. Staging has
min_instances=0 so the rollback target may scale to zero between
tests; the first request after an idle period will cold-start.
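Because of that cold start, the first health probe on staging can be slow without indicating a problem. A minimal sketch that sends one untimed warm-up request before the timed probes; probe_after_warmup is a hypothetical name, and the health URL is passed as an argument because this page does not list the staging hostname.

```shell
#!/usr/bin/env bash
# Hypothetical helper: send one warm-up request (absorbs the cold start),
# then print status code and latency for N timed probes.
# Usage: probe_after_warmup <health-url> [count, default 5]
probe_after_warmup() {
  local url="$1" count="${2:-5}" i=1
  # Warm-up request; generous timeout because an instance may be starting.
  curl -s -o /dev/null --max-time 60 "$url" || true
  while [ "$i" -le "$count" ]; do
    curl -s -o /dev/null -w '%{http_code} %{time_total}s\n' "$url"
    i=$((i + 1))
  done
}
```

If the timed probes are still slow after the warm-up, the latency is more likely coming from the revision itself than from scale-to-zero.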
Gotchas
| Symptom | Cause | Fix |
|---|---|---|
| update-traffic succeeds but users still see old behaviour | Cloudflare edge cache on static assets | Purge CF cache for eu.tappass.ai or wait ~5 min for TTL |
| Target revision shows status.condition = False | Revision failed a readiness check long ago | Pick an older revision or deploy a new one — can't route traffic to a non-ready revision |
| Session-bound behaviour persists after rollback | JWT signed by old revision still valid | Expected — sessions rotate on next login; don't break users mid-flow |
| --to-revisions rejects your target name | Typo; revision names are long | Tab-complete or copy-paste from gcloud run revisions list output |
Also see
- Deploy core server — manual forward deploy when rollback isn't the right move.
- Incident response — the rollback is usually step 1 of an incident, not the whole response.
- OOM / crashloop — when the forward deploy OOMs, rollback first, then diagnose.