Deploy core server (Cloud Run)
The core server (tappass/tappass repo) normally ships via the
release.yml GitHub Actions workflow on push to main. This runbook
is the manual path for:
- Hotfixes that can't wait for CI,
- Verifying an infra-level change against prod without merging,
- Bootstrapping after a CI outage or token rotation.
Use CI when CI works.
Prerequisites (one-time)
    gcloud auth login
    gcloud auth application-default login  # only needed for terraform-adjacent steps
    gcloud config set project tappass-staging

You also need crane for image mirroring between projects:

    go install github.com/google/go-containerregistry/cmd/crane@latest
    # Installs to ~/go/bin/crane

Why Cloud Build (not gcloud builds submit --tag=...)
The core server's Sentry release tag is derived from the GIT_SHA
Docker build-arg. gcloud builds submit --tag=... uses an
auto-generated build that does not thread build-args through, so
TAPPASS_BUILD_SHA ends up as the literal string dev and every
deploy collapses onto the same Sentry release — breaking the
issue → commit → Linear chain.
The cloudbuild.yaml at the repo root exists exactly to thread the
build-arg. Always use it for manual builds.
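One way to catch the collapsed-release case early is to compare the release string in the served HTML against the SHA you built. The sketch below assumes the page embeds the tag in the `release: "<sha>"` shape that the verification grep in this runbook matches; `extract_release` and `assert_release` are hypothetical helper names, not part of the repo.

```shell
# Hypothetical helpers (not part of the tappass repo).

# Print the first release value found in HTML read from stdin.
# Assumes the tag appears as: release: "<value>" or release:"<value>".
extract_release() {
  grep -oE 'release[^"]*"[A-Za-z0-9._-]+"' | head -1 | sed -E 's/.*"([^"]+)"$/\1/'
}

# Succeed only when the served release equals the expected SHA.
assert_release() {
  local expected="$1" actual
  actual="$(extract_release)"
  if [ "$actual" = "dev" ]; then
    echo "release is \"dev\": the GIT_SHA build-arg was not threaded through" >&2
    return 1
  fi
  if [ "$actual" != "$expected" ]; then
    echo "release mismatch: expected $expected, got ${actual:-<none>}" >&2
    return 1
  fi
  echo "release OK: $actual"
}
```

A possible use after a staging cutover would be `curl -s https://staging.tappass.ai/app -H 'User-Agent: Mozilla/5.0' | assert_release "$SHA"`, which exits non-zero when the page still reports `dev` or a different SHA.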
Full flow
The deploy is five steps: commit → build → deploy to staging → mirror → deploy to prod.
1. Commit and push
    cd ~/tappass/tappass
    git status
    # … stage and commit as usual …
    git push origin main
    SHA=$(git rev-parse --short HEAD)
    echo "Deploying $SHA"

2. Trigger Cloud Build
Build against tappass-staging's Artifact Registry; we mirror to
prod only after staging validation.
    gcloud builds submit --config=cloudbuild.yaml \
      --substitutions=_IMAGE=europe-west1-docker.pkg.dev/tappass-staging/tappass/tappass:sha-$SHA,_GIT_SHA=$SHA \
      --project=tappass-staging --region=europe-west1 --timeout=1800 --async .

The command prints a build ID; grab it. Builds typically take 10–15 minutes end-to-end.

    BUILD_ID=<id-from-above>

    # Poll until the build leaves the QUEUED and WORKING states
    while :; do
      STATUS=$(gcloud builds describe "$BUILD_ID" --project=tappass-staging --region=europe-west1 --format='value(status)' 2>/dev/null)
      case "$STATUS" in
        QUEUED|WORKING) sleep 30 ;;
        *) break ;;
      esac
    done
    gcloud builds describe "$BUILD_ID" --project=tappass-staging --region=europe-west1 --format='value(status)'
    # → SUCCESS (or FAILURE; the logs link is in the describe output)

3. Deploy to staging
Deploy with --no-traffic so you can verify the revision on its
tagged URL before sending real users to it.
    SHA=<short-sha>
    STAGE_REV="direct-$SHA-$(date +%s)"
    gcloud run deploy tappass --project=tappass-staging --region=europe-west1 \
      --image="europe-west1-docker.pkg.dev/tappass-staging/tappass/tappass:sha-$SHA" \
      --revision-suffix="${STAGE_REV:0:63}" \
      --tag="direct-$SHA" \
      --no-traffic

Smoke-test the tagged URL (bypasses the CDN but hits the real revision):

    curl -sI "https://direct-$SHA---tappass-hwau24hloq-ew.a.run.app/api/health/live"
    # → HTTP/2 200

When happy, cut 100% traffic:

    gcloud run services update-traffic tappass --project=tappass-staging --region=europe-west1 \
      --to-revisions="tappass-${STAGE_REV:0:63}=100"

Verify the release tag at the edge:

    curl -s https://staging.tappass.ai/app -H 'User-Agent: Mozilla/5.0' | \
      grep -oE 'release.*"[a-f0-9]+"' | head -1
    # → release: "<your-sha>"

4. Mirror the image to prod registry
Staging and prod each have their own Artifact Registry. crane cp
streams the image across without pulling it locally.
    DIGEST=$(gcloud artifacts docker images list \
      europe-west1-docker.pkg.dev/tappass-staging/tappass/tappass \
      --filter="tags:sha-$SHA" --format='value(DIGEST)' --include-tags | head -1)

    ~/go/bin/crane cp \
      "europe-west1-docker.pkg.dev/tappass-staging/tappass/tappass@$DIGEST" \
      "europe-west1-docker.pkg.dev/tappass-prod/tappass/tappass:sha-$SHA"

5. Deploy to prod
Same shape as staging. Always --no-traffic first, verify, then
cutover — do not use the default behaviour that routes 100% to
the new revision.
    SHA=<short-sha>
    PROD_REV="direct-$SHA-$(date +%s)"
    gcloud run deploy tappass --project=tappass-prod --region=europe-west1 \
      --image="europe-west1-docker.pkg.dev/tappass-prod/tappass/tappass:sha-$SHA" \
      --revision-suffix="${PROD_REV:0:63}" \
      --tag="direct-$SHA" \
      --no-traffic

    # Smoke-test the new revision at its tag URL
    curl -sI "https://direct-$SHA---tappass-mglrrjkirq-ew.a.run.app/api/health/live"

    # Cut traffic
    gcloud run services update-traffic tappass --project=tappass-prod --region=europe-west1 \
      --to-revisions="tappass-${PROD_REV:0:63}=100"

    # Verify
    curl -s https://eu.tappass.ai/app -H 'User-Agent: Mozilla/5.0' | \
      grep -oE 'release.*"[a-f0-9]+"' | head -1

Verify the deploy landed cleanly

    # 1. Active revision matches your SHA
    gcloud run services describe tappass --project=tappass-prod --region=europe-west1 \
      --format='value(status.traffic)' | tr ';' '\n' | grep "'percent': 100"

    # 2. Sentry backend initialised with your release tag
    gcloud logging read 'resource.type=cloud_run_revision AND resource.labels.service_name=tappass AND jsonPayload.event="sentry_initialized"' \
      --project=tappass-prod --limit=1 --freshness=5m \
      --format='value(jsonPayload.release)'
    # → <your-sha>

    # 3. No 5xx spike in the last 5 min
    gcloud logging read 'resource.type=cloud_run_revision AND resource.labels.service_name=tappass AND httpRequest.status>=500' \
      --project=tappass-prod --limit=20 --freshness=5m \
      --format='value(timestamp,httpRequest.status,httpRequest.requestUrl)'

Common failure modes
| Symptom | Diagnosis | Fix |
|---|---|---|
| release:"dev" in served HTML | Used gcloud builds submit --tag=… (wrong invocation) | Rebuild with --config=cloudbuild.yaml; see the Sentry release trap |
| Container fails to start with OOM | Memory limit too low for the new revision | OOM / crashloop |
| --no-traffic deploy succeeded but prod still serves old revision | You forgot the update-traffic step (expected behaviour) | Run the update-traffic command above |
| Image sha-<X> not found on deploy | Mirror step skipped | Re-run the crane cp step |
| Creating Revision… hangs >10 min | Secret Manager binding or VPC connector permission drift | Check status.conditions on the revision, see ops/infrastructure |
| oauth2: "invalid_grant" from terraform | ADC expired | gcloud auth application-default login |
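The "Image sha-<X> not found on deploy" row can be caught before running the prod deploy. The sketch below is one possible preflight, assuming `crane digest` (which resolves an image reference to its digest and exits non-zero when it cannot) is available at the path used earlier in this runbook; `image_exists` and `preflight_deploy` are hypothetical names.

```shell
# Hypothetical preflight (not part of the tappass repo).
# CRANE can be overridden; it defaults to the install path used in this runbook.
CRANE="${CRANE:-$HOME/go/bin/crane}"

# Succeed only if the image reference resolves in its registry.
image_exists() {
  local ref="$1"
  "$CRANE" digest "$ref" >/dev/null 2>&1
}

# Check that sha-<sha> exists in the given project's registry before deploying.
preflight_deploy() {
  local project="$1" sha="$2"
  local ref="europe-west1-docker.pkg.dev/${project}/tappass/tappass:sha-${sha}"
  if image_exists "$ref"; then
    echo "found $ref"
  else
    echo "missing $ref (re-run the crane cp step)" >&2
    return 1
  fi
}
```

For example, `preflight_deploy tappass-prod "$SHA" && gcloud run deploy ...` would skip the deploy when the mirror step has not run.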
Rollback
If the fresh deploy looks wrong, revert traffic to the previous ready revision; no redeploy is needed. See Roll back Cloud Run.
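As a rough illustration of what "revert traffic" involves, the sketch below picks the newest revision that is not the one being rolled back. It is an assumption-laden outline, not the procedure from the Roll back Cloud Run runbook: `previous_revision` is a hypothetical helper, it expects revision names newest-first on stdin, and it does not check that the chosen revision is ready.

```shell
# Hypothetical helper (not part of the tappass repo).
# Reads revision names, newest first, one per line on stdin and prints the
# first one that differs from the revision being rolled back.
previous_revision() {
  local current="$1" name
  while IFS= read -r name; do
    if [ -n "$name" ] && [ "$name" != "$current" ]; then
      printf '%s\n' "$name"
      return 0
    fi
  done
  return 1
}

# Possible usage, left commented out because it touches prod:
# PREV=$(gcloud run revisions list --service=tappass \
#          --project=tappass-prod --region=europe-west1 \
#          --sort-by='~metadata.creationTimestamp' \
#          --format='value(metadata.name)' \
#        | previous_revision "tappass-${PROD_REV:0:63}")
# gcloud run services update-traffic tappass \
#   --project=tappass-prod --region=europe-west1 \
#   --to-revisions="$PREV=100"
```

Confirm the selected revision's status before shifting traffic; the dedicated rollback runbook remains the authoritative path.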
Also see
- Deployments overview: CI/CD map per service.
- Roll back Cloud Run: traffic revert when the deploy goes wrong.
- OOM / crashloop: memory-limit diagnosis and fix pattern.