Deployment architecture

This is the runtime layout of the platform — how Cloudflare, Cloud Run, Cloud SQL, Memorystore, KMS, Secret Manager, and GCS fit together on a real request.

CI/CD (how code gets here) lives in Ops → Deployments. This page is about where code runs.

Topology

flowchart TB
  U["End user / customer agent"]

  subgraph CF["Cloudflare"]
    direction LR
    CFnet["DNS + WAF + edge cache<br/>Access on internal-docs<br/>Pages for tappass.ai, docs, trust"]
  end

  subgraph CR["Cloud Run — europe-west1"]
    direction LR
    TP["tappass container<br/>cpu=2, 4Gi, workers=2<br/>FastAPI + uvicorn"]
    OPA["opa sidecar<br/>cpu=1, 1Gi<br/>localhost:8181"]
    TP <--> OPA
  end

  subgraph VPC["Private VPC — europe-west1"]
    direction LR
    DB["Cloud SQL Postgres<br/>private IP + unix socket"]
    RD["Memorystore Redis<br/>basic tier, 1Gi"]
  end

  subgraph CTRL["Control / state"]
    direction LR
    SM["Secret Manager<br/>tappass-*"]
    KMS["KMS<br/>vault DEK KEK"]
    GCS["GCS<br/>audit archive, migrations"]
  end

  U -->|TLS 1.3| CF
  CF -->|TLS + X-Origin-Verify| CR
  CR -->|VPC connector<br/>PRIVATE_RANGES_ONLY| VPC
  CR -->|Google API| CTRL

  classDef user fill:#3a1f1f,stroke:#8a4646,color:#f5d5d5
  classDef edge fill:#3a2e1b,stroke:#8a7240,color:#f5e7c7
  classDef compute fill:#1f3a2c,stroke:#468a68,color:#c7f5d7
  classDef data fill:#1b2a3a,stroke:#406f8a,color:#c7dff5
  class U user
  class CFnet edge
  class TP,OPA compute
  class DB,RD,SM,KMS,GCS data

Prod lives in project tappass-prod. Staging is the same shape in project tappass-staging with lighter sizing and scale-to-zero.

Scaling (prod): min_instances=1, max_instances=20, concurrency=40.

Request lifecycle, physical view

A single POST /v1/chat/completions traverses the whole stack:

DNS (eu.tappass.ai → Cloudflare anycast): 1–10 ms, cached.
Cloudflare edge: WAF + bot-management, optional cache for static paths. Adds X-Origin-Verify: <HMAC> header signed with the shared secret.
Cloud Run ingress: terminates TLS, routes to an instance of the tappass service based on concurrency (40/instance) and health.
tappass container (uvicorn worker): middleware pipeline runs — request ID, security headers, tenant resolver, auth.
OPA sidecar (localhost:8181): authz query for the caller.
VPC connector → Cloud SQL: agent / pipeline / policy lookups over the private unix socket mount.
Pipeline runner: 49 steps, each may call Redis (rate-limit, session cache), Secret Manager (rare, mostly at cold start), or the vault.
Vault decrypt: row ciphertext → DEK unwrap via KMS → AEAD decrypt → LLM provider key in memory for the duration of the call.
External LLM provider (Anthropic / OpenAI / Azure / …): outbound via Cloud Run's default egress. With PRIVATE_RANGES_ONLY, public destinations do not pass through the VPC connector.
Audit write: back through Cloud SQL. Hash-chained, signed.
Response stream back up through Cloud Run → Cloudflare → caller.
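
The "hash-chained" part of the audit write can be illustrated with a minimal sketch. It assumes SHA-256 over a canonical JSON encoding and a fixed genesis value; the field names, the `GENESIS` constant, and the helper functions are hypothetical, and the signing step is left out entirely. It shows the general technique, not the platform's actual audit schema.

```python
import hashlib
import json

# Hypothetical anchor used as the "previous hash" of the first record.
GENESIS = "0" * 64


def chain_hash(prev_hash: str, record: dict) -> str:
    """Hash a record together with the hash of its predecessor."""
    # Canonical JSON so the same record always produces the same bytes.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(chain: list, record: dict) -> None:
    """Append a record, linking it to the current tail of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev": prev, "hash": chain_hash(prev, record)})


def verify(chain: list) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != chain_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

Because each hash covers the previous one, changing any earlier record invalidates every later entry, which is what makes a daily archived copy useful as a tamper check.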

Warm p50: ~200–500 ms of platform overhead before the LLM call. A cold start adds 8–12 s to the first request after scale-from-zero (rare in prod because min_instances=1).

Per-component detail

Cloudflare

Role	Why here
DNS	Single source of truth; TTL 300s for fast failover
WAF + DDoS	L7 filtering before traffic hits Cloud Run's bill
Edge cache	JS/CSS chunks with hashed filenames — `max-age=86400, immutable`. HTML is no-cache (injected runtime config)
Access	`internal-docs.tappass.ai` only — Google Workspace SSO, `@tappass.ai` only
Pages	Static marketing + docs sites — separate pipeline from the core server, see Deploy static sites manually

Blast radius if Cloudflare is down: the .run.app URL is still reachable but our WAF/DDoS and edge cache are gone. Static sites go down. DNS recovery is typically <5 min even in severe CF outages because the anycast network has many independent failure domains.
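
Since the .run.app URL stays reachable, the origin needs a way to tell requests that came through Cloudflare from ones that bypassed it. The sketch below shows one way the X-Origin-Verify header could be checked. This page does not say what the HMAC covers, so the sketch assumes a hex HMAC-SHA256 over the request host; `sign_origin` and `origin_verified` are hypothetical names.

```python
import hashlib
import hmac
from typing import Optional


def sign_origin(secret: bytes, host: str) -> str:
    """Edge side: compute the header value (assumed: HMAC-SHA256 over the host)."""
    return hmac.new(secret, host.encode("utf-8"), hashlib.sha256).hexdigest()


def origin_verified(secret: bytes, host: str, header_value: Optional[str]) -> bool:
    """Origin side: accept only requests carrying a valid X-Origin-Verify value."""
    if not header_value:
        return False
    expected = sign_origin(secret, host)
    # Constant-time comparison on bytes avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))
```

In a FastAPI app a check like this would typically sit in middleware ahead of the tenant resolver, so unsigned direct hits on the origin are rejected before any database work.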

Cloud Run service `tappass`

Two containers in one pod:

Container	Sizing	Role
`tappass`	cpu=2, memory=4Gi, `--workers 2`	FastAPI + uvicorn, the main app
`opa`	cpu=1, memory=1Gi	Policy engine on `localhost:8181`

Scaling

min_instances=1 — one instance always warm (prod). Set to 2 briefly during the demo-burst window; reverted for cost.
max_instances=20 — headroom for parallel-fetch bursts.
concurrency=40 per instance — lower than Cloud Run's 80 default so each request gets more CPU and the scheduler spins up sooner.
cpu-throttling=false — always-on CPU so warm instances respond in ms.
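
The arithmetic behind these settings, as a rough sketch. It models only the concurrency target and ignores the CPU-utilization signal the Cloud Run autoscaler also uses; `instances_needed` is a hypothetical helper.

```python
def instances_needed(in_flight: int, concurrency: int,
                     min_instances: int, max_instances: int) -> int:
    """Instances required to hold `in_flight` requests, clamped to the limits."""
    wanted = -(-in_flight // concurrency)  # ceiling division
    return max(min_instances, min(wanted, max_instances))


MIN_INSTANCES, MAX_INSTANCES, CONCURRENCY = 1, 20, 40

# Ceiling on simultaneous in-flight requests at full scale-out.
print(MAX_INSTANCES * CONCURRENCY)  # 800

# The same 500 in-flight requests at the current setting vs the 80 default.
print(instances_needed(500, 40, MIN_INSTANCES, MAX_INSTANCES))  # 13
print(instances_needed(500, 80, MIN_INSTANCES, MAX_INSTANCES))  # 7
```

Halving concurrency roughly doubles the instance count for the same load, which is the "scheduler spins up sooner" effect, at the cost of reaching max_instances at 800 in-flight requests rather than 1600.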

Why a sidecar, not a separate service? OPA decisions are sub-ms and on the hot path of every request. Inter-container loopback is free; a separate Cloud Run hop would add 10–30 ms to every call and double our scaling problem (two services to right-size).
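
For illustration, a loopback authz query against the sidecar could look like the sketch below. It uses OPA's Data API (POST /v1/data/<path> with an `input` document). The policy path, the shape of the input, the timeout, and the fail-closed choice are all assumptions, not taken from the tappass codebase.

```python
import json
import urllib.request

OPA_URL = "http://localhost:8181"
POLICY_PATH = "tappass/authz/allow"  # hypothetical policy path


def build_opa_request(input_doc: dict) -> urllib.request.Request:
    """Build the Data API request; OPA expects the query under an "input" key."""
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        f"{OPA_URL}/v1/data/{POLICY_PATH}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_decision(response_body: bytes) -> bool:
    """Allow only on an explicit true; an undefined rule omits "result"."""
    return json.loads(response_body).get("result") is True


def is_allowed(input_doc: dict, timeout_s: float = 0.5) -> bool:
    """Query the sidecar over loopback; deny if it is unreachable (assumption)."""
    try:
        with urllib.request.urlopen(build_opa_request(input_doc), timeout=timeout_s) as resp:
            return parse_decision(resp.read())
    except (OSError, ValueError):
        # Covers connection errors, timeouts, and malformed JSON.
        return False
```

Treating anything other than an explicit `true` as a deny keeps a missing or misnamed rule from silently granting access.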

Blast radius if Cloud Run is down: entire platform down. This is the one piece with no graceful degradation — if this is down, we declare SEV1 and page.

VPC Access Connector

Cloud Run is serverless and has no VPC by default. The connector is the bridge:

Private IP only for Cloud SQL + Memorystore (no public exposure).
egress=PRIVATE_RANGES_ONLY for VPC-internal traffic; public traffic (LLM providers, Sentry, PostHog) routes via Cloud Run's default egress.
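NAT'd source IP from a static range so LLM providers can IP-allow if needed. This holds only for traffic that actually leaves through the connector: under PRIVATE_RANGES_ONLY, public destinations use Cloud Run's default egress, so a static source IP for provider calls would require routing all egress through the VPC behind Cloud NAT.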

Blast radius if connector is down: DB and Redis unreachable → fail-closed responses. Historically stable; single-point-of-failure mitigated by Cloud Run re-creating connector instances on demand.

Cloud SQL (Postgres)

Setting	Value
Instance	`tappass-db` (tier `db-g1-small`)
Region	`europe-west1`
HA	disabled in prod today — cost trade-off, revisit once revenue justifies
Backups	Automated daily + 7-day PITR
Connection	Cloud SQL unix socket mount (`/cloudsql/<instance>/`) — no TCP exposure
Auth	MD5, user `tappass`, password in Secret Manager

Blast radius if Postgres is down: everything stops. Auth, policy, vault, audit all go through Postgres. See Restore from backup for the recovery procedure.

Memorystore (Redis)

Setting	Value
Tier	Basic (single-node)
Size	1 GiB
Region	`europe-west1`
Auth	Password in Secret Manager, TLS within VPC
Purpose	Rate-limit counters, session cache, leader-lock for background workers

Blast radius if Redis is down: rate-limit falls back to in-memory (per-instance), background worker leader-lock degrades to per-instance execution (duplicate work but correct). The app keeps serving.
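
The rate-limit fallback can be sketched as a limiter that tries the shared counter first and drops to a process-local one when it is unreachable. The `incr(key, window_s)` interface, the fixed-window scheme, and the class names are assumptions for illustration, not the platform's actual limiter.

```python
import time
from collections import defaultdict


class InMemoryCounter:
    """Fixed-window counter local to one process (so limits become per-instance)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._counts = defaultdict(int)

    def incr(self, key: str, window_s: int) -> int:
        # Bucket by window index; old buckets are never evicted in this sketch.
        bucket = (key, int(self._clock() // window_s))
        self._counts[bucket] += 1
        return self._counts[bucket]


class FallbackRateLimiter:
    """Prefer the shared (Redis-backed) counter; fall back to local on failure."""

    def __init__(self, shared, local: InMemoryCounter, limit: int, window_s: int):
        self._shared = shared
        self._local = local
        self._limit = limit
        self._window_s = window_s

    def allow(self, key: str) -> bool:
        try:
            count = self._shared.incr(key, self._window_s)
        except ConnectionError:
            # A real Redis client raises its own exception type; adapt as needed.
            count = self._local.incr(key, self._window_s)
        return count <= self._limit
```

During an outage each instance enforces the limit independently, so the effective global limit becomes roughly the configured limit times the number of running instances.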

Secret Manager

Holds every runtime secret (see Vendors → reference for the full list). Referenced from Cloud Run via secretKeyRef env bindings so values are injected at cold start, not baked into images.

Blast radius if Secret Manager is down: new instances can't start. Running instances continue serving until they're evicted.

KMS

projects/tappass-prod/locations/europe-west1/keyRings/tappass-vault/cryptoKeys/vault-dek is the key-encryption key (KEK), despite its name: it wraps the vault's per-org DEKs. The KEK never leaves KMS.

Blast radius if KMS is down: no new vault decrypts → LLM calls fail for any agent whose provider key hasn't been cached by the pre-warm loop. This is the highest-value single dependency on the request path. Google SLA is 99.95%.
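
The caching behaviour during a KMS outage can be sketched as a small TTL cache in front of the unwrap call. The ~10 minute TTL matches the figure in the failure map; the `DekCache` class, the `unwrap` callable, and keying by org ID are assumptions. The unwrap function is injected so the sketch stays independent of any KMS client library.

```python
import time
from typing import Callable, Dict, Tuple


class DekCache:
    """Keep unwrapped DEKs in memory for a bounded time."""

    def __init__(self, unwrap: Callable[[str], bytes],
                 ttl_s: float = 600.0, clock=time.monotonic):
        self._unwrap = unwrap  # hypothetical: calls KMS to unwrap an org's DEK
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[str, Tuple[bytes, float]] = {}

    def get(self, org_id: str) -> bytes:
        now = self._clock()
        hit = self._entries.get(org_id)
        if hit is not None and now < hit[1]:
            return hit[0]
        # Miss or expired: this raises if KMS is unavailable, so the call fails.
        dek = self._unwrap(org_id)
        self._entries[org_id] = (dek, now + self._ttl_s)
        return dek
```

In this sketch an expired entry is never served, so an outage longer than the TTL fails every decrypt. Serving stale keys would extend availability but also lengthen the window in which a revoked key stays usable.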

GCS

Two buckets:

tappass-prod-migrations — hosts schema.sql + numbered SQL migrations the Cloud Run Job pulls on apply.
tappass-prod-audit-archive — daily cold-storage copy of the audit chain for DPA retention.

Neither is on the request path; both are append-only.

Networking cheat sheet

Traffic	Path	Encryption
User browser → app	Internet → Cloudflare → Cloud Run	TLS 1.3 both hops
Cloud Run ↔ OPA	`localhost:8181` in-pod loopback	None (trust boundary = pod)
Cloud Run ↔ Cloud SQL	VPC → unix socket	Google-internal encrypted
Cloud Run ↔ Memorystore	VPC → private IP	TLS
Cloud Run ↔ Secret Manager / KMS	VPC → Google API	TLS (Google-managed)
Cloud Run → LLM provider	Cloud Run default egress → internet	TLS 1.3
Cloud Run → Sentry / PostHog / Resend	Cloud Run default egress → internet	TLS 1.3

Per-service failure map

Ordered by how bad a full outage is, worst first:

Service	Severity	Graceful fallback?
Cloud SQL	SEV1	None: all reads/writes go here
Cloud Run (service)	SEV1	None: this is the service
VPC connector	SEV1	None: breaks DB + Redis paths
KMS	SEV1-ish	Cached DEK survives ~10 min; then calls fail
Secret Manager	SEV2	Running instances keep serving; new instances can't start
Cloudflare	SEV2	`.run.app` URL still works; marketing/docs down
Memorystore	SEV3	In-memory fallback for rate-limit; leader-lock degrades
Sentry / PostHog	SEV4	Silent: telemetry lost, app unaffected
Resend	SEV4	Onboarding email queued (fire-and-forget), credentials still served in-UI

Staging differences

Staging is the same shape, but:

min_instances=0 (scale-to-zero) — first request after idle cold-starts.
memory_limit=2Gi on tappass (smaller dataset, --workers 2 fits).
Vault runs the plain-DEK path (no KMS envelope) for cheaper tests.
Uptime checks bumped to 30s timeout on both envs to absorb cold-start tails.
Sentry projects are separate from prod (staging-tappass-backend / staging-tappass-frontend) so staging noise doesn't pollute prod release health. See Ops → Deployments for the split.

Recent topology changes worth knowing

Date	Change	Why
2026-04-22	`memory=4Gi` on prod tappass	`--workers 2` OOMed at 2Gi
2026-04-22	`concurrency=40` (from 80)	Dashboard fan-out saturated max=10 instances
2026-04-22	Staging Sentry split from prod	Release-health signal
2026-04-22	Frontend DSN via Secret Manager	Rotation without terraform apply
2026-04-23	`min_instances=1` (from 2)	Cut always-on bill ~$170/mo; cold-start risk absorbed by 30s uptime timeout

Also see

Ops → Deployments — CI/CD map (how code gets here), not runtime topology (this page).
Ops → Infrastructure — Terraform module and variable reference.
Security architecture — encryption layers and trust zones that sit on top of this topology.
Runbooks → OOM / crashloop — memory sizing decisions in context.
Runbooks → Cold-start / uptime — what to do when this topology's cold-start tail trips the probe.