Roll back Cloud Run

Cloud Run keeps every revision that was ever ready. Rolling back is one update-traffic call — no redeploy, no image rebuild.

Time to rollback: ~30 seconds once you know the target revision.

Roll back when:

  • A fresh deploy is showing 5xx errors or latency spikes.
  • A new revision OOMs on startup (see also OOM / crashloop).
  • A customer reports a regression you can correlate to the last deploy.
  • You need to revert a config change (env var, memory, concurrency) applied via gcloud run services update.

# List up to 10 revisions: name, active flag, last status transition time, image
gcloud run revisions list --service=tappass \
  --project=tappass-prod --region=europe-west1 \
  --limit=10 \
  --format='table(name,active,status.conditions[0].lastTransitionTime.date("%Y-%m-%d %H:%M"),spec.containers[0].image.basename())'

The ACTIVE column shows which revision currently has traffic.

Two signals to pick the target:

  • Last revision with zero 5xx in the logs for its active window.
  • A released SHA you know works: check the #deploys Slack channel or git history for the last stable commit.

Cross-check against the revision's lastTransitionTime so you know it was actually in service (not just a no-traffic probe revision).
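
The first signal can be checked per revision from the logs. The sketch below is one possible way to do it, not an established part of this runbook: count_5xx is a hypothetical helper name, the 7d default look-back and the 1000-entry cap are assumptions, and it relies on the revision_name label that Cloud Run request logs carry.

```shell
# Hypothetical helper: count logged 5xx responses for a single revision.
# Usage: count_5xx <revision-name> [freshness]   (freshness defaults to 7d, an assumption)
count_5xx() {
  local revision="$1" freshness="${2:-7d}"
  # Same filter as the post-rollback check, narrowed to one revision.
  # The count is capped by --limit, so 1000 means "1000 or more".
  gcloud logging read \
    "resource.type=cloud_run_revision AND resource.labels.service_name=tappass AND resource.labels.revision_name=${revision} AND httpRequest.status>=500" \
    --project=tappass-prod --limit=1000 --freshness="${freshness}" \
    --format='value(timestamp)' | wc -l | tr -d ' '
}
```

Run it as count_5xx followed by a revision name copied from the list output; a result of 0 over the window in which the revision served traffic makes it a reasonable candidate.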

# Target: replace <sha> and <suffix> with the values from the revision name in the list output
TARGET='tappass-direct-<sha>-<suffix>'
gcloud run services update-traffic tappass \
  --project=tappass-prod --region=europe-west1 \
  --to-revisions="$TARGET=100"

The command prints the full traffic table — confirm 100% on your target.
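
Because a mistyped revision name is the most common way for this step to fail, it can help to check that the revision exists before shifting traffic. The following is a minimal sketch under that assumption; rollback_to is a hypothetical wrapper name and is not part of the existing tooling.

```shell
# Hypothetical wrapper: verify the revision exists, then route 100% of traffic to it.
# Usage: rollback_to <revision-name>
rollback_to() {
  local target="$1"
  if [ -z "$target" ]; then
    echo "usage: rollback_to <revision-name>" >&2
    return 2
  fi
  # Fail early on a typo instead of letting update-traffic reject the name.
  if ! gcloud run revisions describe "$target" \
      --project=tappass-prod --region=europe-west1 >/dev/null 2>&1; then
    echo "revision not found: ${target}" >&2
    return 1
  fi
  gcloud run services update-traffic tappass \
    --project=tappass-prod --region=europe-west1 \
    --to-revisions="${target}=100"
}
```

The describe call only proves the revision exists; it does not prove the revision is ready, so still confirm the traffic table that update-traffic prints.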

# 1. Served release matches the target revision's SHA
curl -s https://eu.tappass.ai/app -H 'User-Agent: Mozilla/5.0' | \
  grep -oE 'release.*"[a-f0-9]+"' | head -1
# 2. 5xx flattens out
gcloud logging read 'resource.type=cloud_run_revision AND resource.labels.service_name=tappass AND httpRequest.status>=500' \
  --project=tappass-prod --limit=30 --freshness=2m \
  --format='value(timestamp)' | wc -l
# Expect 0 after 1–2 min (instances drain)
# 3. Health probe recovers
for i in 1 2 3 4 5; do
  curl -s -o /dev/null -w "%{http_code} %{time_total}s\n" \
    https://eu.tappass.ai/api/health/live
done
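
If the health probe needs to gate a script rather than be read by eye, the loop above can be turned into a pass/fail check. This is a sketch only: health_ok is a hypothetical helper name, and it assumes the liveness endpoint answers 200 when healthy, which the runbook does not state explicitly.

```shell
# Hypothetical helper: probe the liveness endpoint N times (default 5).
# Succeeds only if every probe returns HTTP 200 (assumed healthy status).
health_ok() {
  local attempts="${1:-5}" i=0 code
  while [ "$i" -lt "$attempts" ]; do
    code=$(curl -s -o /dev/null -w '%{http_code}' https://eu.tappass.ai/api/health/live)
    if [ "$code" != "200" ]; then
      echo "probe $((i + 1)): got ${code}" >&2
      return 1
    fi
    i=$((i + 1))
  done
}
```

For example, health_ok 5 && echo "rollback verified" prints the message only when all five probes succeed.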

After the rollback:

  1. Leave the broken revision alone; don't delete it. You may need its logs for the postmortem.
  2. Open an incident (see Incident response).
  3. Root-cause before re-deploying: the same SHA will break the same way unless the underlying bug is fixed.

For staging, use the same commands but swap tappass-prod for tappass-staging. Staging has min_instances=0, so the rollback target may scale to zero between tests; the first request after an idle period will cold-start.
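
To keep a cold start from skewing a latency check on staging, one option is to send a throwaway request first and measure only the second one. The sketch below assumes this approach; warm_then_probe is a hypothetical helper name, and the staging URL is passed in as an argument because it is not given here.

```shell
# Hypothetical helper: absorb a possible cold start, then time a second request.
# Usage: warm_then_probe <url>
warm_then_probe() {
  local url="$1"
  # Warm-up request: may pay the cold-start cost; its result is discarded.
  curl -s -o /dev/null "$url" || true
  # Measured request: prints "<status> <seconds>s" like the health probe loop.
  curl -s -o /dev/null -w '%{http_code} %{time_total}s\n' "$url"
}
```

If the instance scales to zero again between the two requests the second one will still cold-start, so treat a single slow result on staging with caution.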

| Symptom | Cause | Fix |
| --- | --- | --- |
| update-traffic succeeds but users still see old behaviour | Cloudflare edge cache on static assets | Purge CF cache for eu.tappass.ai or wait ~5 min for TTL |
| Target revision shows status.condition = False | Revision failed a readiness check long ago | Pick an older revision or deploy a new one; you can't route traffic to a non-ready revision |
| Session-bound behaviour persists after rollback | JWT signed by old revision still valid | Expected: sessions rotate on next login; don't break users mid-flow |
| --to-revisions rejects your target name | Typo; revision names are long | Tab-complete or copy-paste from gcloud run revisions list output |

See also:

  • Deploy core server: manual forward deploy when rollback isn't the right move.
  • Incident response: the rollback is usually step 1 of an incident, not the whole response.
  • OOM / crashloop: when the forward deploy OOMs, rollback first, then diagnose.