
Deployment architecture

This is the runtime layout of the platform — how Cloudflare, Cloud Run, Cloud SQL, Memorystore, KMS, Secret Manager, and GCS fit together on a real request.

CI/CD (how code gets here) lives in Ops → Deployments. This page is about where code runs.

flowchart TB
  U["End user / customer agent"]

  subgraph CF["Cloudflare"]
    direction LR
    CFnet["DNS + WAF + edge cache
Access on internal-docs
Pages for tappass.ai, docs, trust"]
  end

  subgraph CR["Cloud Run — europe-west1"]
    direction LR
    TP["tappass container
cpu=2, 4Gi, workers=2
FastAPI + uvicorn"]
    OPA["opa sidecar
cpu=1, 1Gi
localhost:8181"]
    TP <--> OPA
  end

  subgraph VPC["Private VPC — europe-west1"]
    direction LR
    DB["Cloud SQL Postgres
private IP + unix socket"]
    RD["Memorystore Redis
basic tier, 1Gi"]
  end

  subgraph CTRL["Control / state"]
    direction LR
    SM["Secret Manager
tappass-*"]
    KMS["KMS
vault DEK KEK"]
    GCS["GCS
audit archive, migrations"]
  end

  U -->|TLS 1.3| CF
  CF -->|TLS + X-Origin-Verify| CR
  CR -->|VPC connector
PRIVATE_RANGES_ONLY| VPC
  CR -->|Google API| CTRL

  classDef user fill:#3a1f1f,stroke:#8a4646,color:#f5d5d5
  classDef edge fill:#3a2e1b,stroke:#8a7240,color:#f5e7c7
  classDef compute fill:#1f3a2c,stroke:#468a68,color:#c7f5d7
  classDef data fill:#1b2a3a,stroke:#406f8a,color:#c7dff5
  class U user
  class CFnet edge
  class TP,OPA compute
  class DB,RD,SM,KMS,GCS data

Prod lives in project tappass-prod. Staging is the same shape in project tappass-staging with lighter sizing and scale-to-zero.

Scaling (prod): min_instances=1, max_instances=20, concurrency=40.

A single POST /v1/chat/completions traverses the whole stack:

  1. DNS (eu.tappass.ai → Cloudflare anycast): 1–10 ms, cached.
  2. Cloudflare edge: WAF + bot-management, optional cache for static paths. Adds X-Origin-Verify: <HMAC> header signed with the shared secret.
  3. Cloud Run ingress: terminates TLS, routes to an instance of the tappass service based on concurrency (40/instance) and health.
  4. tappass container (uvicorn worker): middleware pipeline runs — request ID, security headers, tenant resolver, auth.
  5. OPA sidecar (localhost:8181): authz query for the caller.
  6. VPC connector → Cloud SQL: agent / pipeline / policy lookups over the private unix socket mount.
  7. Pipeline runner: 49 steps, each may call Redis (rate-limit, session cache), Secret Manager (rare, mostly at cold start), or the vault.
  8. Vault decrypt: row ciphertext → DEK unwrap via KMS → AEAD decrypt → LLM provider key in memory for the duration of the call.
  9. External LLM provider (Anthropic / OpenAI / Azure / …): outbound over Cloud Run's default internet egress. Under egress=PRIVATE_RANGES_ONLY, this traffic does not pass through the VPC connector.
  10. Audit write: back through Cloud SQL. Hash-chained, signed.
  11. Response stream back up through Cloud Run → Cloudflare → caller.
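Step 2's origin check might look like the sketch below on the tappass side. The header format is an assumption: this page only says the value is an HMAC signed with a shared secret. The sketch therefore assumes the edge sends `<unix_ts>.<hex HMAC-SHA256 of unix_ts>` and that the origin rejects values outside a short freshness window. `sign_origin_header`, `verify_origin_header`, and the 300 s window are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac
import time

# Hypothetical header format: "<unix_ts>.<hex HMAC-SHA256(secret, unix_ts)>".
MAX_SKEW_SECONDS = 300  # assumed freshness window


def sign_origin_header(secret: bytes, now: int | None = None) -> str:
    """What the edge would attach (included so the check can be exercised)."""
    ts = str(int(time.time()) if now is None else now)
    mac = hmac.new(secret, ts.encode(), hashlib.sha256).hexdigest()
    return f"{ts}.{mac}"


def verify_origin_header(value: str | None, secret: bytes, now: int | None = None) -> bool:
    """Accept only a well-formed, fresh header whose HMAC matches."""
    if not value or "." not in value:
        return False
    ts, _, mac = value.partition(".")
    if not (ts.isascii() and ts.isdigit()):
        return False
    current = int(time.time()) if now is None else now
    if abs(current - int(ts)) > MAX_SKEW_SECONDS:
        return False  # stale (or from the future): reject
    expected = hmac.new(secret, ts.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison so timing does not reveal how much matched.
    return hmac.compare_digest(expected.encode(), mac.encode())


if __name__ == "__main__":
    secret = b"example-shared-secret"  # placeholder; the real value lives in Secret Manager
    header = sign_origin_header(secret, now=1_700_000_000)
    print(verify_origin_header(header, secret, now=1_700_000_010))  # True
    print(verify_origin_header(header, secret, now=1_700_001_000))  # False (stale)
```

The timestamp keeps a leaked header from being replayed indefinitely. A static shared value would not provide that protection.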

Warm p50: ~200–500 ms spent in the platform before the LLM call. Cold start adds 8–12 s to the first request after scale-from-zero (rare in prod because min_instances=1).

| Cloudflare role | Why here |
|---|---|
| DNS | Single source of truth; TTL 300s for fast failover |
| WAF + DDoS | L7 filtering before traffic hits Cloud Run's bill |
| Edge cache | JS/CSS chunks with hashed filenames: max-age=86400, immutable. HTML is no-cache (it carries injected runtime config) |
| Access | internal-docs.tappass.ai only; Google Workspace SSO, restricted to @tappass.ai accounts |
| Pages | Static marketing + docs sites; separate pipeline from the core server (see Deploy static sites manually) |
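The edge-cache row can be expressed as a small header policy, whether edge rules or the origin enforce it. Only the two header values come from the table. The hashed-filename pattern (e.g. app.3f9a1c2b.js) and the fallback for other assets are assumptions, and `cache_control_for` is a hypothetical helper.

```python
import re

# Assumed naming scheme for content-hashed bundles, e.g. "assets/app.3f9a1c2b.js".
HASHED_ASSET = re.compile(r"\.[0-9a-f]{8,}\.(?:js|css)$")

IMMUTABLE = "max-age=86400, immutable"  # hashed JS/CSS, per the table
NO_CACHE = "no-cache"                   # HTML carries injected runtime config
DEFAULT = "public, max-age=300"         # hypothetical fallback for everything else


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control value for a static path."""
    if HASHED_ASSET.search(path):
        # The filename changes whenever the content does, so long caching is safe.
        return IMMUTABLE
    if path.endswith((".html", "/")):
        # Revalidate each time so a new deploy's runtime config is picked up.
        return NO_CACHE
    return DEFAULT
```

Both halves of the policy are needed. Immutable chunks are only safe because the HTML that references them is never served stale.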

Blast radius if Cloudflare is down: the .run.app URL is still reachable but our WAF/DDoS and edge cache are gone. Static sites go down. DNS recovery is typically <5 min even in severe CF outages because the anycast network has many independent failure domains.

Two containers in one pod:

| Container | Sizing | Role |
|---|---|---|
| tappass | cpu=2, memory=4Gi, --workers 2 | FastAPI + uvicorn, the main app |
| opa | cpu=1, memory=1Gi | Policy engine on localhost:8181 |

Scaling

  • min_instances=1 — one instance always warm (prod). Set to 2 briefly during the demo-burst window; reverted for cost.
  • max_instances=20 — headroom for parallel-fetch bursts.
  • concurrency=40 per instance, lower than Cloud Run's default of 80, so each request gets more CPU and the autoscaler adds instances sooner.
  • cpu-throttling=false — always-on CPU so warm instances respond in ms.
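These settings imply a hard ceiling of 20 instances × 40 concurrent requests = 800 in-flight requests before further requests start to wait. The sketch below is a simplified, concurrency-only model of how many instances a given load needs. Real Cloud Run autoscaling also weighs CPU utilization and startup time, so treat this as back-of-envelope arithmetic. `ScalingConfig` and `instances_needed` are hypothetical names, and the per-worker figure assumes requests spread evenly across uvicorn workers.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingConfig:
    min_instances: int = 1   # prod values from this page
    max_instances: int = 20
    concurrency: int = 40    # max simultaneous requests per instance
    workers: int = 2         # uvicorn workers per instance

    @property
    def max_in_flight(self) -> int:
        """Ceiling on simultaneous requests across the whole service."""
        return self.max_instances * self.concurrency

    @property
    def per_worker(self) -> float:
        """Average in-flight requests per worker at full concurrency (even spread assumed)."""
        return self.concurrency / self.workers


def instances_needed(cfg: ScalingConfig, in_flight: int) -> int:
    """Concurrency-only estimate, clamped to [min_instances, max_instances]."""
    raw = math.ceil(in_flight / cfg.concurrency)
    return max(cfg.min_instances, min(cfg.max_instances, raw))


if __name__ == "__main__":
    prod = ScalingConfig()
    print(prod.max_in_flight)            # 800
    print(prod.per_worker)               # 20.0
    print(instances_needed(prod, 0))     # 1 (min_instances keeps one warm)
    print(instances_needed(prod, 130))   # 4
    print(instances_needed(prod, 5000))  # 20 (capped; excess requests wait)
```

With min_instances=0, as on staging, the same function returns 0 at idle. That is the scale-to-zero case.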

Why a sidecar, not a separate service? OPA decisions are sub-ms and on the hot path of every request. Inter-container loopback is free; a separate Cloud Run hop would add 10–30 ms to every call and double our scaling problem (two services to right-size).
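Because OPA runs on loopback, each decision is one local HTTP call against OPA's Data API: POST /v1/data/<path> with an {"input": ...} body, answered by {"result": ...}. The sketch below assumes a hypothetical rule path tappass/authz/allow that evaluates to a boolean and a 50 ms client timeout. It fails closed on network errors and on an undefined result.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Any

OPA_URL = "http://localhost:8181"
POLICY_PATH = "tappass/authz/allow"  # hypothetical package/rule path


def build_opa_request(path: str, input_doc: dict[str, Any]) -> urllib.request.Request:
    """POST /v1/data/<path> with {"input": ...}, per OPA's Data API."""
    body = json.dumps({"input": input_doc}).encode()
    return urllib.request.Request(
        f"{OPA_URL}/v1/data/{path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_decision(raw: bytes) -> bool:
    """Fail closed: anything other than an explicit boolean true is a deny."""
    try:
        doc = json.loads(raw)
    except ValueError:
        return False
    # OPA omits "result" entirely when the rule is undefined for this input.
    return isinstance(doc, dict) and doc.get("result") is True


def is_allowed(input_doc: dict[str, Any], timeout: float = 0.05) -> bool:
    """Query the sidecar over loopback; any transport error is a deny."""
    req = build_opa_request(POLICY_PATH, input_doc)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return parse_decision(resp.read())
    except OSError:  # URLError, HTTPError and socket timeouts all subclass OSError
        return False
```

The timeout budget can be this tight because the hop never leaves the instance. Across a separate Cloud Run service, the same budget would be unrealistic.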

Blast radius if Cloud Run is down: entire platform down. This is the one piece with no graceful degradation — if this is down, we declare SEV1 and page.

Cloud Run is serverless and has no VPC access by default. A Serverless VPC Access connector is the bridge:

  • Private IP only for Cloud SQL + Memorystore (no public exposure).
  • egress=PRIVATE_RANGES_ONLY for VPC-internal traffic; public traffic (LLM providers, Sentry, PostHog) routes via Cloud Run's default egress.
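  • A NAT'd source IP from a static range (so LLM providers can IP-allow if needed) only applies to traffic routed through the connector. Under PRIVATE_RANGES_ONLY, public egress leaves from Cloud Run's shared IPs, so IP allowlisting would require all-traffic egress through the connector plus Cloud NAT.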
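Under PRIVATE_RANGES_ONLY, the routing decision is made per destination address. The sketch below approximates it with the three RFC 1918 blocks. Google's documented set of internal ranges may include more (for example RFC 6598), so treat the range list as an assumption. `egress_route` is a hypothetical helper, and the Memorystore-style IP in the usage lines is made up.

```python
import ipaddress

# Approximation of what PRIVATE_RANGES_ONLY sends through the connector (RFC 1918).
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def egress_route(dest_ip: str) -> str:
    """Return which egress path traffic to dest_ip would take."""
    addr = ipaddress.ip_address(dest_ip)
    # Membership is simply False when IP versions differ (IPv6 vs these IPv4 nets).
    if any(addr in net for net in PRIVATE_RANGES):
        return "vpc-connector"      # Cloud SQL / Memorystore private IPs
    return "cloud-run-default"      # LLM providers, Sentry, PostHog


if __name__ == "__main__":
    print(egress_route("10.20.0.3"))     # vpc-connector (hypothetical Memorystore IP)
    print(egress_route("203.0.113.10"))  # cloud-run-default
```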

Blast radius if connector is down: DB and Redis unreachable → fail-closed responses. Historically stable. The single point of failure is mitigated because Google manages and autoscales the connector's instances on our behalf.

| Cloud SQL setting | Value |
|---|---|
| Instance | tappass-db (tier db-g1-small) |
| Region | europe-west1 |
| HA | Disabled in prod today; cost trade-off, revisit once revenue justifies it |
| Backups | Automated daily + 7-day PITR |
| Connection | Cloud SQL unix socket mount (/cloudsql/<instance>/); no TCP exposure |
| Auth | MD5, user tappass, password in Secret Manager |
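With the unix-socket mount, the app never opens a TCP connection to Postgres. libpq treats a host value that starts with / as a socket directory. The sketch below builds such a connection string under several assumptions: the standard project:region:instance connection name (assembled from the values on this page), a hypothetical database name tappass, and a hypothetical DB_PASSWORD env var fed from Secret Manager.

```python
from __future__ import annotations

import os

# project:region:instance, assembled from the values on this page.
INSTANCE_CONNECTION_NAME = "tappass-prod:europe-west1:tappass-db"


def _quote(value: str) -> str:
    """Quote a libpq keyword/value entry: wrap in '...', escape backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_dsn(password: str, dbname: str = "tappass", user: str = "tappass") -> str:
    """libpq DSN that connects over the Cloud SQL unix socket, not TCP."""
    params = {
        # A host starting with "/" is a socket directory; libpq appends .s.PGSQL.5432.
        "host": f"/cloudsql/{INSTANCE_CONNECTION_NAME}",
        "dbname": dbname,  # hypothetical database name
        "user": user,
        "password": password,
    }
    return " ".join(f"{k}={_quote(v)}" for k, v in params.items())


if __name__ == "__main__":
    # DB_PASSWORD is a hypothetical env var populated from Secret Manager.
    print(build_dsn(os.environ.get("DB_PASSWORD", "example")))
```

The resulting string can be passed as the conninfo to a libpq-based driver such as psycopg. Because no port or TCP host is present, a misconfigured mount fails loudly instead of silently falling back to the network.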

Blast radius if Postgres is down: everything stops. Auth, policy, vault, audit all go through Postgres. See Restore from backup for the recovery procedure.

| Memorystore setting | Value |
|---|---|
| Tier | Basic (single-node) |
| Size | 1 GiB |
| Region | europe-west1 |
| Auth | Password in Secret Manager, TLS within VPC |
| Purpose | Rate-limit counters, session cache, leader-lock for background workers |

Blast radius if Redis is down: rate-limit falls back to in-memory (per-instance), background worker leader-lock degrades to per-instance execution (duplicate work but correct). The app keeps serving.
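Both degradations can be sketched against a minimal client interface. The sketch assumes a fixed-window limiter (INCR + EXPIRE) and a SET NX PX leader lock. These are common Redis patterns, not confirmed as what tappass does. `RateLimiter`, `try_lead`, the key names, and the 30 s lock TTL are hypothetical. The method names match redis-py's `incr`, `expire`, and `set`. With redis-py, pass its own exception classes as `errors`, because `redis.exceptions.ConnectionError` is not the builtin `ConnectionError`.

```python
from __future__ import annotations

import time
from typing import Protocol

# Builtin exceptions stand in for "Redis unreachable". With redis-py, pass
# (redis.exceptions.ConnectionError, redis.exceptions.TimeoutError) instead.
DEFAULT_ERRORS: tuple[type[Exception], ...] = (ConnectionError, TimeoutError)


class RedisLike(Protocol):
    """The subset of redis-py's client API this sketch uses."""

    def incr(self, name: str) -> int: ...
    def expire(self, name: str, time: int) -> bool: ...
    def set(self, name: str, value: str, nx: bool = False, px: int | None = None) -> bool | None: ...


class RateLimiter:
    """Fixed-window counter in Redis, falling back to per-instance memory."""

    def __init__(self, redis: RedisLike, limit: int, window_s: int = 60,
                 errors: tuple[type[Exception], ...] = DEFAULT_ERRORS) -> None:
        self.redis, self.limit, self.window_s, self.errors = redis, limit, window_s, errors
        self._local: dict[str, tuple[int, int]] = {}  # key -> (window, count)

    def allow(self, key: str, now: float | None = None) -> bool:
        window = int((time.time() if now is None else now) // self.window_s)
        bucket = f"rl:{key}:{window}"
        try:
            count = self.redis.incr(bucket)
            if count == 1:
                self.redis.expire(bucket, self.window_s)  # let old windows age out
        except self.errors:
            # Redis down: count in this instance only, so limits stop being global.
            seen_window, seen = self._local.get(key, (window, 0))
            count = (seen if seen_window == window else 0) + 1
            self._local[key] = (window, count)
        return count <= self.limit


def try_lead(redis: RedisLike, job: str, instance_id: str, ttl_ms: int = 30_000,
             errors: tuple[type[Exception], ...] = DEFAULT_ERRORS) -> bool:
    """Leader lock via SET NX PX; without Redis, every instance runs the job."""
    try:
        return bool(redis.set(f"lock:{job}", instance_id, nx=True, px=ttl_ms))
    except errors:
        return True  # duplicate work across instances, but nothing is skipped
```

The in-memory fallback stores one (window, count) pair per key. Its memory therefore stays bounded by the number of distinct callers, not the number of windows seen.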

Secret Manager holds every runtime secret (see Vendors → reference for the full list). Cloud Run references them through secretKeyRef env bindings, so values are injected at cold start rather than baked into images.
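Because secretKeyRef values arrive as environment variables at cold start, the app can validate them once at startup and refuse to boot if any are missing. In this minimal sketch, the env var names are hypothetical placeholders, not the real binding names.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Hypothetical names; the real list lives in Vendors → reference.
REQUIRED_SECRETS = ("DB_PASSWORD", "REDIS_AUTH", "ORIGIN_VERIFY_SECRET")


class MissingSecrets(RuntimeError):
    """Raised at startup when a secret binding is absent or empty."""


def load_secrets(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Read secret env vars once and fail fast if any are missing or empty."""
    missing = [name for name in REQUIRED_SECRETS if not env.get(name)]
    if missing:
        # Crashing here makes a broken binding fail the instance's startup
        # instead of surfacing later as errors on live requests.
        raise MissingSecrets(f"missing secret env vars: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_SECRETS}
```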

Blast radius if Secret Manager is down: new instances can't start. Running instances continue serving until they're evicted.

The KMS key projects/tappass-prod/locations/europe-west1/keyRings/tappass-vault/cryptoKeys/vault-dek acts as the KEK. It wraps the vault's per-org DEKs, and its key material never leaves KMS.

Blast radius if KMS is down: no new vault decrypts → LLM calls fail for any agent whose provider key hasn't been cached by the pre-warm loop. This is the highest-value single dependency on the request path. Google SLA is 99.95%.
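The pre-warm behaviour can be sketched as a TTL cache in front of the KMS unwrap. The sketch makes several assumptions: the cache holds unwrapped DEKs keyed by org, the TTL matches the ~10 min cached-DEK figure in the failure-mode table, and `unwrap` stands in for the real KMS decrypt call. `DekCache` and its methods are hypothetical names.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Callable

DEK_TTL_S = 600.0  # assumption, matching the "~10 min" in the failure-mode table


@dataclass
class DekCache:
    """Per-org cache of unwrapped DEKs so brief KMS outages don't block decrypts."""

    unwrap: Callable[[bytes], bytes]  # stands in for a KMS decrypt call on the KEK
    ttl_s: float = DEK_TTL_S
    clock: Callable[[], float] = time.monotonic
    _entries: dict[str, tuple[bytes, float]] = field(default_factory=dict, init=False, repr=False)

    def _refresh(self, org_id: str, wrapped_dek: bytes) -> bytes:
        dek = self.unwrap(wrapped_dek)  # raises if KMS is unreachable
        self._entries[org_id] = (dek, self.clock())
        return dek

    def get(self, org_id: str, wrapped_dek: bytes) -> bytes:
        hit = self._entries.get(org_id)
        if hit is not None and self.clock() - hit[1] < self.ttl_s:
            return hit[0]  # fresh: no KMS round trip
        return self._refresh(org_id, wrapped_dek)

    def prewarm(self, wrapped: dict[str, bytes]) -> list[str]:
        """Refresh every org ahead of expiry; return the orgs whose unwrap failed."""
        failed = []
        for org_id, wrapped_dek in wrapped.items():
            try:
                self._refresh(org_id, wrapped_dek)
            except Exception:
                failed.append(org_id)  # old entry (if any) stays valid until its TTL
        return failed
```

A failed pre-warm leaves the previous entry in place. The outage budget is therefore whatever remains of that entry's TTL, after which `get` raises and the call fails.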

Two GCS buckets:

  • tappass-prod-migrations — hosts schema.sql + numbered SQL migrations the Cloud Run Job pulls on apply.
  • tappass-prod-audit-archive — daily cold-storage copy of the audit chain for DPA retention.

Neither is on the request path; both are append-only.
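Since the Cloud Run Job pulls numbered migrations from tappass-prod-migrations, the selection step reduces to "numeric order, skip what's applied". The sketch assumes object names like migrations/0001_init.sql and a set of applied version numbers read from the database. Both the naming scheme and `pending_migrations` are hypothetical.

```python
from __future__ import annotations

import re

# Assumed naming: "0001_init.sql", "0002_add_audit_index.sql", ... (optionally under a prefix).
MIGRATION_NAME = re.compile(r"^(\d+)_[\w-]+\.sql$")


def pending_migrations(object_names: list[str], applied: set[int]) -> list[str]:
    """Return not-yet-applied migrations in numeric order; ignore schema.sql and others."""
    numbered = []
    for name in object_names:
        m = MIGRATION_NAME.match(name.rsplit("/", 1)[-1])  # match on the basename
        if m:
            numbered.append((int(m.group(1)), name))
    numbered.sort()
    versions = [v for v, _ in numbered]
    if len(versions) != len(set(versions)):
        raise ValueError("duplicate migration number")
    return [name for v, name in numbered if v not in applied]
```

Sorting on the parsed integer, not the string, keeps 10_x.sql after 9_x.sql if numbers are ever left unpadded. Rejecting duplicates stops two branches that picked the same number from both being applied.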

| Traffic | Path | Encryption |
|---|---|---|
| User browser → app | Internet → Cloudflare → Cloud Run | TLS 1.3 both hops |
| Cloud Run ↔ OPA | localhost:8181 in-pod loopback | None (trust boundary = pod) |
| Cloud Run ↔ Cloud SQL | VPC → unix socket | Google-internal encrypted |
| Cloud Run ↔ Memorystore | VPC → private IP | TLS |
| Cloud Run ↔ Secret Manager / KMS | Google APIs (not via the VPC connector) | TLS (Google-managed) |
| Cloud Run → LLM provider | Cloud Run default egress → internet | TLS 1.3 |
| Cloud Run → Sentry / PostHog / Resend | Cloud Run default egress → internet | TLS 1.3 |
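The table records TLS 1.3 on the outbound hops. If that ever needs to be enforced rather than observed, an outbound client can refuse older protocol versions. This minimal stdlib sketch shows one option and does not describe the current code. `tls13_client_context` is a hypothetical helper.

```python
import ssl


def tls13_client_context() -> ssl.SSLContext:
    """Client context that verifies certificates and refuses anything below TLS 1.3."""
    ctx = ssl.create_default_context()            # CERT_REQUIRED + hostname checking
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3  # handshake fails against TLS 1.2-only peers
    return ctx
```

The context can be passed to `urllib.request.urlopen(url, context=...)` or to any client that accepts an `ssl.SSLContext`. The trade-off is that a provider temporarily serving only TLS 1.2 would become a hard failure.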

Ordered by how bad a full outage is, worst first:

| Service | Severity | Graceful fallback? |
|---|---|---|
| Cloud SQL | SEV1 | None; all reads/writes go here |
| Cloud Run (service) | SEV1 | None; this is the service |
| VPC connector | SEV1 | None; breaks DB + Redis paths |
| KMS | SEV1-ish | Cached DEK survives ~10 min; then calls fail |
| Secret Manager | SEV2 | Running instances keep serving; new instances can't start |
| Cloudflare | SEV2 | .run.app URL still works; marketing/docs down |
| Memorystore | SEV3 | In-memory fallback for rate-limit; leader-lock degrades |
| Sentry / PostHog | SEV4 | Silent; telemetry lost, app unaffected |
| Resend | SEV4 | Onboarding email queued (fire-and-forget), credentials still served in-UI |

Staging is the same shape, but:

  • min_instances=0 (scale-to-zero) — first request after idle cold-starts.
  • memory_limit=2Gi on tappass (smaller dataset, --workers 2 fits).
  • Vault runs the plain-DEK path (no KMS envelope) for cheaper tests.
  • Uptime checks bumped to 30s timeout on both envs to absorb cold-start tails.
  • Sentry projects are separate from prod (staging-tappass-backend / staging-tappass-frontend) so staging noise doesn't pollute prod release health. See Ops → Deployments for the split.
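One way to keep "same shape, lighter sizing" honest is to derive staging from prod and spell out only the overrides. The sketch below uses the values listed on this page. `EnvConfig` and its field names are hypothetical; the real settings live in infrastructure config not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EnvConfig:
    project: str
    min_instances: int
    memory_limit: str      # tappass container
    workers: int
    vault_uses_kms: bool   # False = plain-DEK path
    uptime_timeout_s: int


PROD = EnvConfig(
    project="tappass-prod",
    min_instances=1,
    memory_limit="4Gi",
    workers=2,
    vault_uses_kms=True,
    uptime_timeout_s=30,
)

# Staging = prod plus the listed overrides; everything else is inherited.
STAGING = replace(
    PROD,
    project="tappass-staging",
    min_instances=0,       # scale-to-zero
    memory_limit="2Gi",
    vault_uses_kms=False,
)
```

With `dataclasses.replace`, any new prod setting flows to staging automatically unless it is explicitly overridden. The set of differences between the two environments stays exactly the overrides written above.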

| Date | Change | Why |
|---|---|---|
| 2026-04-22 | memory=4Gi on prod tappass | --workers 2 OOMed at 2Gi |
| 2026-04-22 | concurrency=40 (from 80) | Dashboard fan-out saturated max=10 instances |
| 2026-04-22 | Staging Sentry split from prod | Release-health signal |
| 2026-04-22 | Frontend DSN via Secret Manager | Rotation without terraform apply |
| 2026-04-23 | min_instances=1 (from 2) | Cut always-on bill ~$170/mo; cold-start risk absorbed by 30s uptime timeout |