
Deploy core server (Cloud Run)

The core server (tappass/tappass repo) normally ships via the release.yml GitHub Actions workflow on push to main. This runbook is the manual path for:

  • Hotfixes that can't wait for CI,
  • Verifying an infra-level change against prod without merging,
  • Bootstrapping after a CI outage or token rotation.

Use CI when CI works.

gcloud auth login
gcloud auth application-default login # only needed for terraform-adjacent steps
gcloud config set project tappass-staging

You also need crane for image mirroring between projects:

go install github.com/google/go-containerregistry/cmd/crane@latest
# Installs to ~/go/bin/crane with the default GOPATH (or to $GOBIN if set)

Why Cloud Build (not gcloud builds submit --tag=...)


The core server's Sentry release tag is derived from the GIT_SHA Docker build-arg. gcloud builds submit --tag=... generates a default build config that passes no build-args, so TAPPASS_BUILD_SHA ends up as the literal string dev and every deploy collapses onto the same Sentry release. That breaks the issue → commit → Linear chain.

The cloudbuild.yaml at the repo root exists specifically to pass this build-arg through. Always use it for manual builds.

The deploy runs commit → build → deploy to staging → mirror → deploy to prod.

cd ~/tappass/tappass
git status
# … stage and commit as usual …
git push origin main
SHA=$(git rev-parse --short HEAD)
echo "Deploying $SHA"
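gcloud builds submit uploads your local working directory as the build context (minus anything excluded by .gcloudignore), not the commit you pushed, so uncommitted edits would ship under a SHA that doesn't contain them. A minimal pre-flight sketch for bash; preflight_check is a hypothetical helper, not part of the repo:

```shell
# Hypothetical pre-flight guard; run from the repo root before building.
# Fails if the working tree is dirty or HEAD is not yet on origin/main,
# otherwise prints the short SHA to use for the build.
preflight_check() {
  if [ -n "$(git status --porcelain)" ]; then
    echo "working tree is dirty; commit or stash before building" >&2
    return 1
  fi
  # Refresh origin/main so the ancestry check sees the latest push.
  git fetch --quiet origin main || return 1
  if ! git merge-base --is-ancestor HEAD origin/main; then
    echo "HEAD is not on origin/main; push first" >&2
    return 1
  fi
  git rev-parse --short HEAD
}

# Usage, in place of the plain rev-parse line:
# SHA=$(preflight_check) || exit 1
```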

Build into tappass-staging's Artifact Registry; the image is mirrored to prod only after it has been validated on staging.

gcloud builds submit --config=cloudbuild.yaml \
--substitutions=_IMAGE=europe-west1-docker.pkg.dev/tappass-staging/tappass/tappass:sha-$SHA,_GIT_SHA=$SHA \
--project=tappass-staging --region=europe-west1 --timeout=1800 --async .

The command prints a build ID; copy it for the polling step. Builds typically take 10–15 minutes end to end.
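Rather than copying the build ID by hand, you can have gcloud print only the id field of the submitted build. A sketch, assuming the same flags as the build command above; submit_build is a hypothetical wrapper:

```shell
# Hypothetical wrapper around the build command above. With --async,
# --format='value(id)' prints just the build's id on stdout; progress
# messages go to stderr, so command substitution captures only the id.
submit_build() {
  local sha="$1"
  gcloud builds submit --config=cloudbuild.yaml \
    --substitutions="_IMAGE=europe-west1-docker.pkg.dev/tappass-staging/tappass/tappass:sha-${sha},_GIT_SHA=${sha}" \
    --project=tappass-staging --region=europe-west1 --timeout=1800 --async \
    --format='value(id)' .
}

# Usage: BUILD_ID=$(submit_build "$SHA")
```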

BUILD_ID=<id-from-above>
# Poll every 30 s until the build leaves PENDING/QUEUED/WORKING.
# An empty status (a transient describe error, printed to stderr) also keeps polling.
while :; do
  STATUS=$(gcloud builds describe "$BUILD_ID" --project=tappass-staging --region=europe-west1 --format='value(status)')
  case "$STATUS" in
    PENDING|QUEUED|WORKING|"") sleep 30 ;;
    *) break ;;
  esac
done
echo "$STATUS"
# → SUCCESS (or FAILURE/TIMEOUT/CANCELLED; run describe without --format to get the logUrl)
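Before deploying, you can confirm the build-arg actually reached the image. This sketch assumes the Dockerfile bakes GIT_SHA into the image as ENV TAPPASS_BUILD_SHA (check yours; if the SHA is written to a file instead, this won't find it) and uses crane config, which prints the image's config JSON. image_build_sha and check_image_sha are hypothetical helpers:

```shell
# Assumption: the Dockerfile exposes the build-arg as ENV TAPPASS_BUILD_SHA.
CRANE="${CRANE:-$HOME/go/bin/crane}"

# Print the TAPPASS_BUILD_SHA value from an image's config JSON.
image_build_sha() {
  "$CRANE" config "$1" | grep -o 'TAPPASS_BUILD_SHA=[^"]*' | head -1 | cut -d= -f2
}

# Fail unless the image was built with the expected SHA (catches "dev").
check_image_sha() {
  local image="$1" want="$2" got
  got=$(image_build_sha "$image")
  if [ "$got" != "$want" ]; then
    echo "image has TAPPASS_BUILD_SHA='$got', expected '$want'" >&2
    return 1
  fi
}

# Usage:
# check_image_sha "europe-west1-docker.pkg.dev/tappass-staging/tappass/tappass:sha-$SHA" "$SHA"
```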

Deploy to staging with --no-traffic so you can verify the revision on its tagged URL before sending real users to it.

SHA=<short-sha>
STAGE_REV="direct-$SHA-$(date +%s)"
gcloud run deploy tappass --project=tappass-staging --region=europe-west1 \
--image="europe-west1-docker.pkg.dev/tappass-staging/tappass/tappass:sha-$SHA" \
--revision-suffix="${STAGE_REV:0:63}" \
--tag="direct-$SHA" \
--no-traffic

Smoke-test the tagged URL (bypasses the CDN but hits the real revision):

curl -sI "https://direct-$SHA---tappass-hwau24hloq-ew.a.run.app/api/health/live"
# → HTTP/2 200
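A fresh revision sometimes needs a moment before it answers. A hypothetical retrying variant of the smoke test (smoke_test is not an existing script; it issues a GET rather than the HEAD request above and treats only HTTP 200 as healthy):

```shell
# Hypothetical helper: probe a health URL up to N times, 5 s apart.
smoke_test() {
  local url="$1" attempts="${2:-5}" code i
  for ((i = 1; i <= attempts; i++)); do
    # -w prints only the status code; curl reports 000 on connection failure.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")
    if [ "$code" = "200" ]; then
      echo "ok: $url"
      return 0
    fi
    echo "attempt $i/$attempts: HTTP $code" >&2
    [ "$i" -lt "$attempts" ] && sleep 5
  done
  return 1
}

# Usage:
# smoke_test "https://direct-$SHA---tappass-hwau24hloq-ew.a.run.app/api/health/live"
```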

Once the smoke test passes, send 100% of traffic to the new revision:

gcloud run services update-traffic tappass --project=tappass-staging --region=europe-west1 \
--to-revisions="tappass-${STAGE_REV:0:63}=100"
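Cloud Run can also route traffic by tag, which avoids depending on STAGE_REV (or PROD_REV) still being set in the current shell. A sketch using --to-tags; cutover_by_tag is a hypothetical helper, and it sends traffic to whichever revision currently holds the direct-<sha> tag:

```shell
# Hypothetical helper: move 100% of traffic to the revision tagged direct-<sha>.
cutover_by_tag() {
  local project="$1" sha="$2"
  gcloud run services update-traffic tappass --project="$project" --region=europe-west1 \
    --to-tags="direct-${sha}=100"
}

# Usage: cutover_by_tag tappass-staging "$SHA"
```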

Verify the release tag at the edge:

curl -s https://staging.tappass.ai/app -H 'User-Agent: Mozilla/5.0' | \
grep -oE 'release[": ]+[a-z0-9]+"' | head -1
# → release: "<your-sha>" (release:"dev" means the build-arg was not passed; see the Sentry release trap)
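To turn the edge check into a pass/fail, a sketch that extracts the first release value and compares it with the SHA you deployed. It assumes the value appears quoted in the served page, as in release:"abc1234" or release: "abc1234"; served_release and check_served_release are hypothetical helpers:

```shell
# Read HTML on stdin and print the first quoted release value.
# The bounded pattern avoids greedy matches across a minified bundle,
# and [a-z0-9] (not just hex) lets a "dev" release show up.
served_release() {
  grep -oE 'release[": ]+[a-z0-9]+"' | head -1 | sed -E 's/^release[": ]+//; s/"$//'
}

# Fetch a page and fail unless it serves the expected release.
check_served_release() {
  local url="$1" want="$2" got
  got=$(curl -s "$url" -H 'User-Agent: Mozilla/5.0' | served_release)
  case "$got" in
    "$want") echo "ok: $url serves release $got" ;;
    dev) echo "release is 'dev': image was built without cloudbuild.yaml" >&2; return 1 ;;
    *) echo "release is '$got', expected '$want'" >&2; return 1 ;;
  esac
}

# Usage: check_served_release https://staging.tappass.ai/app "$SHA"
```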

Once staging checks out, mirror the image. Staging and prod each have their own Artifact Registry; crane cp copies the image registry-to-registry without pulling it into a local Docker daemon.

DIGEST=$(gcloud artifacts docker images list \
europe-west1-docker.pkg.dev/tappass-staging/tappass/tappass \
--filter="tags:sha-$SHA" --format='value(DIGEST)' --include-tags | head -1)
~/go/bin/crane cp \
"europe-west1-docker.pkg.dev/tappass-staging/tappass/tappass@$DIGEST" \
"europe-west1-docker.pkg.dev/tappass-prod/tappass/tappass:sha-$SHA"
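crane cp copies the manifest as-is, so the prod tag should resolve to the same digest as the staging image you validated. A sketch using crane digest; verify_mirror is a hypothetical helper, and CRANE defaults to the ~/go/bin path used above:

```shell
CRANE="${CRANE:-$HOME/go/bin/crane}"

# Hypothetical check: staging and prod tags must point at the same digest.
verify_mirror() {
  local sha="$1" src dst
  src=$("$CRANE" digest "europe-west1-docker.pkg.dev/tappass-staging/tappass/tappass:sha-$sha") || return 1
  dst=$("$CRANE" digest "europe-west1-docker.pkg.dev/tappass-prod/tappass/tappass:sha-$sha") || return 1
  if [ -z "$src" ] || [ "$src" != "$dst" ]; then
    echo "digest mismatch: staging=$src prod=$dst" >&2
    return 1
  fi
  echo "$dst"
}

# Usage: verify_mirror "$SHA"
```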

The prod deploy has the same shape as staging. Always deploy with --no-traffic first, verify, then cut over; never rely on the default behaviour, which routes 100% of traffic to the new revision.

SHA=<short-sha>
PROD_REV="direct-$SHA-$(date +%s)"
gcloud run deploy tappass --project=tappass-prod --region=europe-west1 \
--image="europe-west1-docker.pkg.dev/tappass-prod/tappass/tappass:sha-$SHA" \
--revision-suffix="${PROD_REV:0:63}" \
--tag="direct-$SHA" \
--no-traffic
# Smoke-test the new revision at its tag URL
curl -sI "https://direct-$SHA---tappass-mglrrjkirq-ew.a.run.app/api/health/live"
# Cut traffic
gcloud run services update-traffic tappass --project=tappass-prod --region=europe-west1 \
--to-revisions="tappass-${PROD_REV:0:63}=100"
# Verify
curl -s https://eu.tappass.ai/app -H 'User-Agent: Mozilla/5.0' | \
grep -oE 'release[": ]+[a-z0-9]+"' | head -1
# → release: "<your-sha>"
After cutover, run the post-deploy checks:
# 1. Active revision matches your SHA
gcloud run services describe tappass --project=tappass-prod --region=europe-west1 \
--format='value(status.traffic)' | tr ';' '\n' | grep "'percent': 100"
# 2. Sentry backend initialised with your release tag
gcloud logging read 'resource.type=cloud_run_revision AND resource.labels.service_name=tappass AND jsonPayload.event="sentry_initialized"' \
--project=tappass-prod --limit=1 --freshness=5m \
--format='value(jsonPayload.release)'
# → <your-sha>
# 3. No 5xx spike in the last 5 min
gcloud logging read 'resource.type=cloud_run_revision AND resource.labels.service_name=tappass AND httpRequest.status>=500' \
--project=tappass-prod --limit=20 --freshness=5m \
--format='value(timestamp,httpRequest.status,httpRequest.requestUrl)'
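Checks 2 and 3 can be folded into one pass/fail command. A sketch; post_deploy_check is hypothetical, and both the 5xx threshold (default 0) and the --limit=1000 cap on counted entries are assumptions to tune:

```shell
# Hypothetical wrapper around post-deploy checks 2 and 3.
post_deploy_check() {
  local sha="$1" max="${2:-0}" release errors
  # Check 2: the Sentry release logged at startup must equal the deployed SHA.
  release=$(gcloud logging read 'resource.type=cloud_run_revision AND resource.labels.service_name=tappass AND jsonPayload.event="sentry_initialized"' \
    --project=tappass-prod --limit=1 --freshness=5m --format='value(jsonPayload.release)')
  if [ "$release" != "$sha" ]; then
    echo "Sentry release is '$release', expected '$sha'" >&2
    return 1
  fi
  # Check 3: count 5xx entries (one line each) from the last 5 minutes.
  errors=$(gcloud logging read 'resource.type=cloud_run_revision AND resource.labels.service_name=tappass AND httpRequest.status>=500' \
    --project=tappass-prod --limit=1000 --freshness=5m --format='value(timestamp)' | grep -c . || true)
  if [ "$errors" -gt "$max" ]; then
    echo "$errors 5xx responses in the last 5 min (threshold $max)" >&2
    return 1
  fi
  echo "ok: release $sha, $errors 5xx in the last 5 min"
}

# Usage: post_deploy_check "$SHA" 5
```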

| Symptom | Diagnosis | Fix |
| --- | --- | --- |
| release:"dev" in served HTML | Used gcloud builds submit --tag=… (wrong invocation) | Rebuild with --config=cloudbuild.yaml; see the Sentry release trap |
| Container fails to start with OOM | Memory limit too low for the new revision | See OOM / crashloop |
| --no-traffic deploy succeeded but prod still serves old revision | You forgot the update-traffic step (expected behaviour) | Run the update-traffic command above |
| Image sha-<X> not found on deploy | Mirror step skipped | Re-run the crane cp step |
| Creating Revision… hangs >10 min | Secret Manager binding or VPC connector permission drift | Check status.conditions on the revision; see ops/infrastructure |
| oauth2: "invalid_grant" from terraform | ADC expired | gcloud auth application-default login |

If the fresh deploy looks wrong, route traffic back to the previous ready revision; no redeploy is needed. See Roll back Cloud Run.
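Rollback is quickest when you record the serving revision before cutover. A sketch that reuses the parsing from post-deploy check 1 (it assumes the value(status.traffic) output format that check relies on); serving_revision and rollback_to are hypothetical helpers:

```shell
# Print the revision currently receiving 100% of traffic.
serving_revision() {
  local project="$1"
  gcloud run services describe tappass --project="$project" --region=europe-west1 \
    --format='value(status.traffic)' | tr ';' '\n' | grep "'percent': 100" |
    sed -n "s/.*'revisionName': '\([^']*\)'.*/\1/p" | head -1
}

# Route all traffic back to a known-good revision; no redeploy.
rollback_to() {
  local project="$1" rev="$2"
  [ -n "$rev" ] || { echo "no revision given" >&2; return 1; }
  gcloud run services update-traffic tappass --project="$project" --region=europe-west1 \
    --to-revisions="$rev=100"
}

# Usage:
# PREV_REV=$(serving_revision tappass-prod)   # before update-traffic
# rollback_to tappass-prod "$PREV_REV"        # if the new revision misbehaves
```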