Deployment architecture
This is the runtime layout of the platform — how Cloudflare, Cloud Run, Cloud SQL, Memorystore, KMS, Secret Manager, and GCS fit together on a real request.
CI/CD (how code gets here) lives in Ops → Deployments. This page is about where code runs.
Topology
```mermaid
flowchart TB
  U["End user / customer agent"]
  subgraph CF["Cloudflare"]
    direction LR
    CFnet["DNS + WAF + edge cache<br/>Access on internal-docs<br/>Pages for tappass.ai, docs, trust"]
  end
  subgraph CR["Cloud Run — europe-west1"]
    direction LR
    TP["tappass container<br/>cpu=2, 4Gi, workers=2<br/>FastAPI + uvicorn"]
    OPA["opa sidecar<br/>cpu=1, 1Gi<br/>localhost:8181"]
    TP <--> OPA
  end
  subgraph VPC["Private VPC — europe-west1"]
    direction LR
    DB["Cloud SQL Postgres<br/>private IP + unix socket"]
    RD["Memorystore Redis<br/>basic tier, 1Gi"]
  end
  subgraph CTRL["Control / state"]
    direction LR
    SM["Secret Manager<br/>tappass-*"]
    KMS["KMS<br/>vault DEK KEK"]
    GCS["GCS<br/>audit archive, migrations"]
  end
  U -->|TLS 1.3| CF
  CF -->|TLS + X-Origin-Verify| CR
  CR -->|"VPC connector<br/>PRIVATE_RANGES_ONLY"| VPC
  CR -->|Google API| CTRL
  classDef user fill:#3a1f1f,stroke:#8a4646,color:#f5d5d5
  classDef edge fill:#3a2e1b,stroke:#8a7240,color:#f5e7c7
  classDef compute fill:#1f3a2c,stroke:#468a68,color:#c7f5d7
  classDef data fill:#1b2a3a,stroke:#406f8a,color:#c7dff5
  class U user
  class CFnet edge
  class TP,OPA compute
  class DB,RD,SM,KMS,GCS data
```
Prod lives in project tappass-prod. Staging is the same shape in
project tappass-staging with lighter sizing and scale-to-zero.
Scaling (prod): min_instances=1, max_instances=20,
concurrency=40.
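The scaling numbers above imply a simple capacity envelope. The following is a minimal sketch of that arithmetic; the `ScalingConfig` class and its method names are illustrative and not part of the platform's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingConfig:
    """Illustrative holder for the Cloud Run scaling settings quoted above."""
    min_instances: int
    max_instances: int
    concurrency: int

    def warm_capacity(self) -> int:
        # In-flight requests that can be served without starting a new instance.
        return self.min_instances * self.concurrency

    def peak_capacity(self) -> int:
        # In-flight requests at full scale-out; beyond this, requests queue or fail.
        return self.max_instances * self.concurrency


PROD = ScalingConfig(min_instances=1, max_instances=20, concurrency=40)

print(PROD.warm_capacity())  # 40
print(PROD.peak_capacity())  # 800
```

With the prod values, one warm instance absorbs up to 40 concurrent requests, and the service tops out at 800 concurrent requests across 20 instances.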
Request lifecycle, physical view
A single POST /v1/chat/completions traverses the whole stack:
- DNS (eu.tappass.ai → Cloudflare anycast): 1–10 ms, cached.
- Cloudflare edge: WAF + bot-management, optional cache for static paths. Adds an X-Origin-Verify: <HMAC> header signed with the shared secret.
- Cloud Run ingress: terminates TLS, routes to an instance of the tappass service based on concurrency (40/instance) and health.
- tappass container (uvicorn worker): middleware pipeline runs — request ID, security headers, tenant resolver, auth.
- OPA sidecar (localhost:8181): authz query for the caller.
- VPC connector → Cloud SQL: agent / pipeline / policy lookups over the private unix socket mount.
- Pipeline runner: 49 steps, each may call Redis (rate-limit, session cache), Secret Manager (rare, mostly at cold start), or the vault.
- Vault decrypt: row ciphertext → DEK unwrap via KMS → AEAD decrypt → LLM provider key in memory for the duration of the call.
- External LLM provider (Anthropic / OpenAI / Azure / …): outbound via Cloud Run's default egress; with egress=PRIVATE_RANGES_ONLY, public destinations do not traverse the VPC connector.
- Audit write: back through Cloud SQL. Hash-chained, signed.
- Response stream back up through Cloud Run → Cloudflare → caller.
Warm p50: ~200–500 ms before the LLM. Cold start adds 8–12 s for the
first request after scale-from-zero (rare in prod because
min_instances=1).
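The Cloudflare edge step above attaches an X-Origin-Verify header that the origin can check before doing any work. This page does not say which bytes are signed or how the value is encoded, so the sketch below assumes HMAC-SHA256 over a caller-supplied message with hex encoding. The function names are hypothetical.

```python
import hashlib
import hmac
from typing import Optional


def sign_origin(secret: bytes, message: bytes) -> str:
    """Assumed scheme: hex-encoded HMAC-SHA256 of the message."""
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_origin(secret: bytes, message: bytes, header_value: Optional[str]) -> bool:
    """Return True only if the header matches the expected signature."""
    if not header_value:
        # Missing header: the request did not come through the edge.
        return False
    expected = sign_origin(secret, message)
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature were correct.
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))
```

A request that reaches the .run.app URL directly would lack a valid header and can be rejected early in the middleware pipeline.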
Per-component detail
Cloudflare
| Role | Why here |
|---|---|
| DNS | Single source of truth; TTL 300s for fast failover |
| WAF + DDoS | L7 filtering before traffic hits Cloud Run's bill |
| Edge cache | JS/CSS chunks with hashed filenames — max-age=86400, immutable. HTML is no-cache (injected runtime config) |
| Access | internal-docs.tappass.ai only — Google Workspace SSO, @tappass.ai only |
| Pages | Static marketing + docs sites — separate pipeline from the core server, see Deploy static sites manually |
Blast radius if Cloudflare is down: the .run.app URL is still
reachable but our WAF/DDoS and edge cache are gone. Static sites go
down. DNS recovery is typically <5 min even in severe CF outages
because the anycast network has many independent failure domains.
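The edge-cache row in the table above boils down to a per-path decision. Here is a minimal sketch of that rule, assuming a hashed filename means a dot or dash followed by at least eight hex characters before the extension. That pattern and the function name are assumptions, not the site's actual build convention.

```python
import re
from typing import Optional

# Assumed shape of a content-hashed asset name, e.g. app.3f9a1c2b.js
HASHED_ASSET = re.compile(r"[.-][0-9a-f]{8,}\.(?:js|css)$", re.IGNORECASE)


def cache_control_for(path: str) -> Optional[str]:
    """Return the Cache-Control value for a path, or None if unspecified."""
    if HASHED_ASSET.search(path):
        # The content hash changes whenever the file changes, so the URL
        # can be cached for a full day and marked immutable.
        return "max-age=86400, immutable"
    if path.endswith(".html") or path.endswith("/"):
        # HTML carries injected runtime config and must be revalidated.
        return "no-cache"
    return None
```

Paths that match neither rule return None, since the table does not state a policy for them.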
Cloud Run service tappass
Two containers in one pod:
| Container | Sizing | Role |
|---|---|---|
| tappass | cpu=2, memory=4Gi, --workers 2 | FastAPI + uvicorn, the main app |
| opa | cpu=1, memory=1Gi | Policy engine on localhost:8181 |
Scaling
- min_instances=1 — one instance always warm (prod). Set to 2 briefly during the demo-burst window; reverted for cost.
- max_instances=20 — headroom for parallel-fetch bursts.
- concurrency=40 per instance — lower than Cloud Run's 80 default so each request gets more CPU and the scheduler spins up sooner.
- cpu-throttling=false — always-on CPU so warm instances respond in ms.
Why a sidecar, not a separate service? OPA decisions are sub-ms and on the hot path of every request. Inter-container loopback is free; a separate Cloud Run hop would add 10–30 ms to every call and double our scaling problem (two services to right-size).
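To make the loopback call concrete, here is a rough sketch of an authz query against the sidecar using OPA's REST Data API (POST /v1/data/<path> with an "input" document). The policy path, the input fields, the timeout, and the fail-closed behaviour are all assumptions for illustration.

```python
import json
import urllib.request

OPA_URL = "http://localhost:8181"  # sidecar address from the table above


def build_opa_request(policy_path: str, input_doc: dict) -> urllib.request.Request:
    """Build a Data API query; policy_path is a hypothetical rule path."""
    clean_path = policy_path.strip("/")
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        url=f"{OPA_URL}/v1/data/{clean_path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_allowed(policy_path: str, input_doc: dict, timeout_s: float = 0.25) -> bool:
    """Ask OPA for a boolean decision; any error is treated as a deny."""
    request = build_opa_request(policy_path, input_doc)
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            decision = json.load(response)
    except (OSError, ValueError):
        # Assumption: fail closed if the sidecar is unreachable or replies
        # with something that is not JSON.
        return False
    # OPA omits "result" when the rule is undefined, so only an explicit
    # true counts as allow.
    return isinstance(decision, dict) and decision.get("result") is True
```

Because both containers share a network namespace, this call never leaves the instance, which is what keeps the decision off the network latency budget.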
Blast radius if Cloud Run is down: entire platform down. This is the one piece with no graceful degradation — if this is down, we declare SEV1 and page.
VPC Access Connector
Cloud Run is serverless and has no VPC by default. The connector is the bridge:
- Private IP only for Cloud SQL + Memorystore (no public exposure).
- egress=PRIVATE_RANGES_ONLY for VPC-internal traffic; public traffic (LLM providers, Sentry, PostHog) routes via Cloud Run's default egress.
- NAT'd source IP from a static range so LLM providers can IP-allow if needed.
Blast radius if connector is down: DB and Redis unreachable → fail-closed responses. Historically stable; single-point-of-failure mitigated by Cloud Run re-creating connector instances on demand.
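The PRIVATE_RANGES_ONLY split described above can be illustrated with a small classifier. This is a sketch that assumes the setting covers the RFC 1918 and RFC 6598 ranges; the return labels are made up for the example, and the real routing decision is made by Cloud Run, not by application code.

```python
import ipaddress

# Assumed coverage of "private ranges": RFC 1918 plus RFC 6598 (shared address space).
PRIVATE_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "100.64.0.0/10")
]


def egress_path(dest_ip: str) -> str:
    """Classify a destination IP as connector-routed or default egress."""
    address = ipaddress.ip_address(dest_ip)
    if any(address in network for network in PRIVATE_RANGES):
        return "vpc-connector"
    return "default-egress"
```

For example, a Memorystore private IP in 10.0.0.0/8 would take the connector, while an LLM provider's public address would not.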
Cloud SQL (Postgres)
| Setting | Value |
|---|---|
| Instance | tappass-db (tier db-g1-small) |
| Region | europe-west1 |
| HA | disabled in prod today — cost trade-off, revisit once revenue justifies |
| Backups | Automated daily + 7-day PITR |
| Connection | Cloud SQL unix socket mount (/cloudsql/<instance>/) — no TCP exposure |
| Auth | MD5, user tappass, password in Secret Manager |
Blast radius if Postgres is down: everything stops. Auth, policy, vault, audit all go through Postgres. See Restore from backup for the recovery procedure.
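For the unix socket connection in the table above, a libpq-style URL can carry the socket directory in the host query parameter. The helper below is a sketch: the database name and the instance connection name are placeholders supplied by the caller, since this page does not spell them out.

```python
from urllib.parse import quote


def cloudsql_socket_url(
    user: str,
    password: str,
    database: str,
    instance_connection_name: str,
    socket_dir: str = "/cloudsql",
) -> str:
    """Build a postgresql:// URL that connects over the mounted unix socket."""
    socket_path = f"{socket_dir}/{instance_connection_name}"
    # Percent-encode credentials so characters like '@' or '/' cannot be
    # mistaken for URL delimiters.
    enc_user = quote(user, safe="")
    enc_password = quote(password, safe="")
    enc_database = quote(database, safe="")
    enc_host = quote(socket_path, safe="/:")
    # Empty network location plus host=<dir> tells libpq to use the socket.
    return f"postgresql://{enc_user}:{enc_password}@/{enc_database}?host={enc_host}"
```

In practice the password would be read from the Secret Manager binding at startup rather than passed as a literal.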
Memorystore (Redis)
| Setting | Value |
|---|---|
| Tier | Basic (single-node) |
| Size | 1 GiB |
| Region | europe-west1 |
| Auth | Password in Secret Manager, TLS within VPC |
| Purpose | Rate-limit counters, session cache, leader-lock for background workers |
Blast radius if Redis is down: rate-limit falls back to in-memory (per-instance), background worker leader-lock degrades to per-instance execution (duplicate work but correct). The app keeps serving.
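The rate-limit fallback described above might look roughly like the fixed-window limiter below. It is a sketch, not the platform's implementation: the class name, key format, and window size are assumptions, and the client only needs `incr` and `expire` methods shaped like those in redis-py.

```python
import time
from typing import Callable, Dict, Tuple


class RateLimiter:
    """Fixed-window limiter that degrades to per-instance counting."""

    def __init__(self, redis_client, limit: int, window_s: int = 60,
                 clock: Callable[[], float] = time.time):
        self._redis = redis_client
        self._limit = limit
        self._window_s = window_s
        self._clock = clock
        # key -> (window index, count); one entry per key, so old windows
        # are overwritten instead of accumulating.
        self._local: Dict[str, Tuple[int, int]] = {}

    def _local_incr(self, key: str, window: int) -> int:
        prev_window, count = self._local.get(key, (window, 0))
        count = count + 1 if prev_window == window else 1
        self._local[key] = (window, count)
        return count

    def allow(self, key: str) -> bool:
        window = int(self._clock() // self._window_s)
        bucket = f"rl:{key}:{window}"
        try:
            count = self._redis.incr(bucket)
            if count == 1:
                # First hit in this window: let Redis expire the counter.
                self._redis.expire(bucket, self._window_s)
        except Exception:
            # Redis unreachable: count locally. A real client would catch
            # its own connection error type rather than bare Exception.
            count = self._local_incr(key, window)
        return count <= self._limit
```

During an outage each instance enforces the limit independently, so the effective global limit is multiplied by the number of running instances.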
Secret Manager
Holds every runtime secret (see Vendors →
reference for the full list). Referenced from
Cloud Run via secretKeyRef env bindings so values are injected at
cold start, not baked into images.
Blast radius if Secret Manager is down: new instances can't start. Running instances continue serving until they're evicted.
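Since secrets arrive as environment variables at cold start, the app can validate them once at boot and fail fast if a binding is missing. A minimal sketch follows; the variable names are hypothetical and stand in for whatever the real bindings are called.

```python
import os
from typing import Dict, Mapping, Optional, Sequence

# Hypothetical names; the real list lives in the vendor reference.
REQUIRED_SECRETS = ("DATABASE_PASSWORD", "REDIS_PASSWORD")


def load_secrets(
    env: Optional[Mapping[str, str]] = None,
    required: Sequence[str] = REQUIRED_SECRETS,
) -> Dict[str, str]:
    """Read required secrets from the environment or raise at startup."""
    source = os.environ if env is None else env
    missing = [name for name in required if not source.get(name)]
    if missing:
        # Crashing here surfaces a broken binding as a failed deploy
        # instead of as errors on the first live request.
        raise RuntimeError("missing secrets: " + ", ".join(missing))
    return {name: source[name] for name in required}
```

Because values are read only at startup, a running instance keeps working through a Secret Manager outage, which matches the blast radius noted above.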
KMS
projects/tappass-prod/locations/europe-west1/keyRings/tappass-vault/cryptoKeys/vault-dek
wraps the vault's per-org DEK. The KEK never leaves KMS.
Blast radius if KMS is down: no new vault decrypts → LLM calls fail for any agent whose provider key hasn't been cached by the pre-warm loop. This is the highest-value single dependency on the request path. Google SLA is 99.95%.
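The caching behaviour that softens a KMS outage can be sketched as a small TTL cache around the unwrap call. Everything here is illustrative: the class name, the `unwrap` callable (standing in for the real KMS decrypt call), and the 600-second TTL, which is only an assumption based on the roughly 10-minute figure in the failure map.

```python
import time
from typing import Callable, Dict, Tuple


class DekCache:
    """Keeps unwrapped DEKs in memory for a bounded time."""

    def __init__(self, unwrap: Callable[[str], bytes], ttl_s: float = 600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._unwrap = unwrap  # hypothetical: calls KMS to unwrap an org's DEK
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, org_id: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(org_id)
        if entry is not None and entry[0] > now:
            # Fresh entry: no KMS round trip needed.
            return entry[1]
        # Missing or expired: this raises if KMS is unavailable, which is
        # the point at which LLM calls start failing.
        dek = self._unwrap(org_id)
        self._entries[org_id] = (now + self._ttl_s, dek)
        return dek
```

A pre-warm loop would call `get` for active orgs ahead of traffic, so that a short KMS outage is invisible to requests whose DEK is still cached.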
GCS
Two buckets:
- tappass-prod-migrations — hosts schema.sql + numbered SQL migrations the Cloud Run Job pulls on apply.
- tappass-prod-audit-archive — daily cold-storage copy of the audit chain for DPA retention.
Neither is on the request path; both are append-only.
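The audit chain is described on this page as hash-chained and signed. The sketch below shows only the hash-chaining idea, so that an archived copy can be checked for tampering; the entry layout, the SHA-256 choice, and the function names are assumptions, and signing is left out.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_HASH = "0" * 64  # assumed placeholder for the first entry's predecessor


def entry_hash(prev_hash: str, payload: Dict) -> str:
    """Hash the previous hash together with a canonical JSON payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append_entry(chain: List[Dict], payload: Dict) -> None:
    """Append a payload, linking it to the current tail of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": entry_hash(prev_hash, payload),
    })


def verify_chain(chain: List[Dict]) -> bool:
    """Recompute every link; any edited or reordered entry breaks it."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Changing any archived payload invalidates that entry's hash and every later link, which is what makes an append-only archive verifiable after the fact.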
Networking cheat sheet
| Traffic | Path | Encryption |
|---|---|---|
| User browser → app | Internet → Cloudflare → Cloud Run | TLS 1.3 both hops |
| Cloud Run ↔ OPA | localhost:8181 in-pod loopback | None (trust boundary = pod) |
| Cloud Run ↔ Cloud SQL | VPC → unix socket | Google-internal encrypted |
| Cloud Run ↔ Memorystore | VPC → private IP | TLS |
| Cloud Run ↔ Secret Manager / KMS | VPC → Google API | TLS (Google-managed) |
| Cloud Run → LLM provider | VPC egress → internet | TLS 1.3 |
| Cloud Run → Sentry / PostHog / Resend | VPC egress → internet | TLS 1.3 |
Per-service failure map
Ordered by how bad a full outage is, worst first:
| Service | Severity | Graceful fallback? |
|---|---|---|
| Cloud SQL | SEV1 | None — all reads/writes go here |
| Cloud Run (service) | SEV1 | None — this is the service |
| VPC connector | SEV1 | None — breaks DB + Redis paths |
| KMS | SEV1-ish | Cached DEK survives ~10 min; then calls fail |
| Secret Manager | SEV2 | Running instances keep serving; new instances can't start |
| Cloudflare | SEV2 | .run.app URL still works; marketing/docs down |
| Memorystore | SEV3 | In-memory fallback for rate-limit; leader-lock degrades |
| Sentry / PostHog | SEV4 | Silent — telemetry lost, app unaffected |
| Resend | SEV4 | Onboarding email queued (fire-and-forget), credentials still served in-UI |
Staging differences
Staging is the same shape, but:
- min_instances=0 (scale-to-zero) — first request after idle cold-starts.
- memory_limit=2Gi on tappass (smaller dataset, --workers 2 fits).
- Vault runs the plain-DEK path (no KMS envelope) for cheaper tests.
- Uptime checks bumped to 30s timeout on both envs to absorb cold-start tails.
- Sentry projects are separate from prod (staging-tappass-backend / staging-tappass-frontend) so staging noise doesn't pollute prod release health. See Ops → Deployments for the split.
Recent topology changes worth knowing
| Date | Change | Why |
|---|---|---|
| 2026-04-22 | memory=4Gi on prod tappass | --workers 2 OOMed at 2Gi |
| 2026-04-22 | concurrency=40 (from 80) | Dashboard fan-out saturated max=10 instances |
| 2026-04-22 | Staging Sentry split from prod | Release-health signal |
| 2026-04-22 | Frontend DSN via Secret Manager | Rotation without terraform apply |
| 2026-04-23 | min_instances=1 (from 2) | Cut always-on bill ~$170/mo; cold-start risk absorbed by 30s uptime timeout |
Also see
- Ops → Deployments — CI/CD map (how code gets here), not runtime topology (this page).
- Ops → Infrastructure — Terraform module and variable reference.
- Security architecture — encryption layers and trust zones that sit on top of this topology.
- Runbooks → OOM / crashloop — memory sizing decisions in context.
- Runbooks → Cold-start / uptime — what to do when this topology's cold-start tail trips the probe.