Security architecture

TapPass's security posture has four pillars. They stack — the outer layers protect the inner ones, and every layer fails closed. This page is the map; the sub-pages are the detail.

  1. Identity — every actor (human, service, agent, customer SDK) has a verifiable identity before they touch the request path.
  2. Trust zones — hard boundaries between planes. Crossings are explicit, logged, and rate-limited.
  3. Encryption — data is encrypted at rest under a KMS-wrapped DEK; secrets never leave Google Secret Manager in plaintext; transport is TLS to the edge and mTLS or signed within the VPC.
  4. Audit — every decision lands in a hash-chained, Ed25519-signed trail that's tamper-evident under offline verification.

Four concentric zones. Each crossing is a boundary with its own auth mechanism; no request skips zones.

flowchart TB
  subgraph Z1["Zone 1 — Public internet"]
    U["End users / customer agents"]
  end
  subgraph Z2["Zone 2 — Cloudflare edge"]
    CF["WAF + DDoS<br/>+ Access for internal-docs<br/>+ Pages hosting"]
  end
  subgraph Z3["Zone 3 — Cloud Run pod"]
    TP["tappass container<br/>(FastAPI, uvicorn workers)"]
    OPA["OPA sidecar<br/>localhost:8181"]
    TP <-->|loopback<br/>no TLS| OPA
  end
  subgraph Z4["Zone 4 — Private VPC"]
    DB["Cloud SQL Postgres<br/>unix socket + MD5"]
    RD["Memorystore Redis<br/>password + TLS"]
    SM["Secret Manager<br/>IAM-bound"]
    KMS["KMS<br/>IAM-bound"]
    GCS["GCS<br/>IAM-bound"]
  end

  U -->|TLS 1.3| CF
  CF -->|TLS + X-Origin-Verify HMAC| TP
  TP -->|VPC connector<br/>PRIVATE_RANGES_ONLY| DB
  TP -->|VPC connector| RD
  TP -->|Google API| SM
  TP -->|Google API| KMS
  TP -->|Google API| GCS

  classDef zone1 fill:#3a1f1f,stroke:#8a4646,color:#f5d5d5
  classDef zone2 fill:#3a2e1b,stroke:#8a7240,color:#f5e7c7
  classDef zone3 fill:#1f3a2c,stroke:#468a68,color:#c7f5d7
  classDef zone4 fill:#1b2a3a,stroke:#406f8a,color:#c7dff5
  class U zone1
  class CF zone2
  class TP,OPA zone3
  class DB,RD,SM,KMS,GCS zone4

A request passes through these zones once, in order. Every step is a boundary:

  • Edge → Cloud Run: Cloudflare must sign each request with the X-Origin-Verify HMAC header, or AuthMiddleware rejects it. This stops anyone who discovers the .run.app URL from bypassing the WAF.
  • Cloud Run → VPC: egress is PRIVATE_RANGES_ONLY — the container can't reach the public internet except for allow-listed vendor APIs (Resend, Sentry, PostHog, LLM providers) that go out via 0.0.0.0/0 on named egress.
  • VPC → data: Cloud SQL uses a unix-socket mount (no TCP exposure), Memorystore is password-auth + private IP, Secret Manager is IAM-bound, KMS is IAM-bound.
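
The edge-to-origin check in the first bullet can be sketched as follows. This is a minimal illustration, not the AuthMiddleware code: the page does not say what the HMAC covers, so signing the request path with HMAC-SHA256 is an assumption, and `sign_origin`, `origin_verified`, and `EXEMPT_PREFIXES` are hypothetical names.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical allow-list; the page only states that /health* paths are exempt.
EXEMPT_PREFIXES = ("/health",)


def sign_origin(secret: bytes, path: str) -> str:
    """Edge side: hex HMAC-SHA256 over the request path (assumed message format)."""
    return hmac.new(secret, path.encode("utf-8"), hashlib.sha256).hexdigest()


def origin_verified(secret: bytes, path: str, header_value: Optional[str]) -> bool:
    """Origin side: return True only if the request may proceed."""
    if path.startswith(EXEMPT_PREFIXES):
        return True  # allow-listed health checks skip the header check
    if not header_value:
        return False  # missing header: fail closed (the caller would answer 401)
    expected = sign_origin(secret, path)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))
```

A request that reaches the .run.app URL directly carries no valid header, so `origin_verified` returns False and the request never reaches the handlers.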

TapPass has three distinct identity primitives, each for a different actor. Confusing them is the most common security mistake.

sequenceDiagram
  participant A as Agent (customer)
  participant GW as Gateway
  participant DB as developer_keys (hashed)
  participant RD as Redis
  A->>GW: Authorization: Bearer tp_dev_…
  GW->>GW: api_key.py — Argon2id hash
  GW->>DB: lookup hash
  DB-->>GW: agent_id, org_id
  GW->>RD: rate-limit check
  GW->>GW: request.state.account = Account
  • Format: tp_dev_<random> (developer key) or tp_live_<random> (scoped deploy key). Stored hashed (Argon2id) in developer_keys.
  • Rotation: customer-initiated via dashboard; old key revoked, not deleted. See Rotate API keys.
  • Scope: one key → one agent_id. A key can never escalate across orgs.
  • Transport: Authorization: Bearer tp_... header over TLS.
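
As a rough sketch of the first step in that flow, the snippet below parses the Authorization header and classifies the key by the two prefixes listed above. It deliberately stops before hashing and lookup; `parse_bearer` and `ParsedKey` are hypothetical names, not part of api_key.py.

```python
from typing import NamedTuple, Optional

# Prefixes taken from the key format described above.
KEY_PREFIXES = {"tp_dev_": "developer", "tp_live_": "deploy"}


class ParsedKey(NamedTuple):
    kind: str   # "developer" or "deploy"
    token: str  # the full key, prefix included


def parse_bearer(authorization: Optional[str]) -> Optional[ParsedKey]:
    """Return the parsed key, or None if the header is not a well-formed tp_ bearer token."""
    if not authorization:
        return None
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer":
        return None
    token = token.strip()
    for prefix, kind in KEY_PREFIXES.items():
        # Require at least one character after the prefix.
        if token.startswith(prefix) and len(token) > len(prefix):
            return ParsedKey(kind, token)
    return None
```

Returning None for anything malformed lets the caller answer 401 without ever touching the database.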

2. Session JWTs — for humans on the dashboard

sequenceDiagram
  participant H as Human (browser)
  participant IdP as SSO IdP (Google / Azure)
  participant API as /sso/callback
  participant DB as auth_sessions

  H->>IdP: OIDC / SAML sign-in
  IdP-->>H: id_token
  H->>API: /sso/callback?code=…
  API->>IdP: exchange code
  IdP-->>API: verified email + domain
  API->>API: mint session JWT (EdDSA)
  API->>DB: revoke prior sessions for this account
  API->>DB: insert new session
  API-->>H: Set-Cookie: tappass_session=… (HttpOnly, Secure, SameSite)
  • Issuer: TapPass backend; signing key in Secret Manager (TAPPASS_JWT_SECRET / TAPPASS_TOKEN_KEY_FILE).
  • Rotation: every login revokes prior sessions (identity/signup.py:318-325). A stolen session JWT has a blast radius bounded by the next login.
  • Storage: HttpOnly + Secure cookie; never in localStorage.
  • Allowed domains: gated by TAPPASS_SSO_ALLOWED_DOMAINS so the SSO provider can't log in arbitrary Gmail users.
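
The domain gate in the last bullet could look roughly like this. The sketch assumes TAPPASS_SSO_ALLOWED_DOMAINS holds a comma-separated list, which the page does not confirm; `parse_allowed_domains` and `email_domain_allowed` are hypothetical helpers.

```python
from typing import Set


def parse_allowed_domains(raw: str) -> Set[str]:
    """Split a comma-separated domain list (assumed format) into a normalised set."""
    return {d.strip().lower() for d in raw.split(",") if d.strip()}


def email_domain_allowed(email: str, allowed: Set[str]) -> bool:
    """Accept the IdP-verified email only if its domain is on the allow-list."""
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        return False  # not a usable address
    if not allowed:
        return False  # empty allow-list: fail closed rather than admit everyone
    return domain.lower() in allowed
```

With `parse_allowed_domains("example.com, example.org")`, an address at gmail.com is rejected even though the IdP authenticated it successfully.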

3. SPIFFE / JWT-SVID — for inter-service workload identity

sequenceDiagram
  participant WL as Customer workload / SDK
  participant SPIRE as Customer SPIRE Agent
  participant GW as tappass backend
  participant B as SPIFFE bundle (pinned)

  WL->>SPIRE: fetch JWT-SVID
  SPIRE-->>WL: JWT-SVID (short-lived)
  WL->>GW: Authorization: Bearer <JWT-SVID>
  GW->>B: resolve signing bundle
  B-->>GW: trust anchor for this SPIFFE ID
  GW->>GW: verify signature + audience + expiry
  GW->>GW: request.state.workload = SpiffeId

Used by customer-side SDK deployments that want mTLS-equivalent identity without the cert dance — the SDK obtains a JWT-SVID from their SPIRE Agent and we verify it against a pinned bundle.
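
The audience and expiry checks in the diagram can be illustrated on already-decoded claims. This sketch assumes signature verification against the pinned bundle has happened beforehand with a JWT library, and it relies only on the JWT-SVID convention that `sub` carries the SPIFFE ID; `check_svid_claims` is a hypothetical name.

```python
import time
from typing import Any, Mapping, Optional
from urllib.parse import urlsplit


def check_svid_claims(
    claims: Mapping[str, Any],
    expected_audience: str,
    trust_domain: str,
    now: Optional[float] = None,
) -> str:
    """Validate decoded JWT-SVID claims and return the SPIFFE ID, or raise ValueError.

    The signature must already have been verified; this covers claims only.
    """
    now = time.time() if now is None else now

    sub = claims.get("sub")
    if not isinstance(sub, str):
        raise ValueError("missing sub claim")
    parts = urlsplit(sub)
    if parts.scheme != "spiffe" or parts.netloc != trust_domain or not parts.path.strip("/"):
        raise ValueError("sub is not a workload SPIFFE ID in the expected trust domain")

    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if expected_audience not in audiences:
        raise ValueError("audience mismatch")

    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        raise ValueError("token expired or exp missing")

    return sub
```

Any ValueError maps to a rejected request, which keeps this boundary fail-closed like the others.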

A fail-closed boundary rejects the request when its own data source is unavailable. Fail-open does the opposite. We fail closed on everything that matters:

| Boundary | Behaviour on dependency failure |
|---|---|
| Auth middleware | Cannot verify tp_ key → 401 |
| OPA policy engine | Timeout (>500ms) → deny (opa_authz_unavailable_denied) |
| Vault (credential access) | Vault unreachable → 503 to agent, no fallback |
| Audit writer | DB unavailable → retry with backoff, not silent drop |
| X-Origin-Verify | Missing header → 401 unless whitelisted path (/health*) |
| Rate limiter | Redis unavailable → in-memory fallback (documented trade-off) |
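
To make the OPA row concrete, here is a minimal sketch of a fail-closed policy call. It assumes the policy query is an awaitable that returns a boolean; `authorize` and the `query_opa` callable are hypothetical, and only the 500 ms budget and the `opa_authz_unavailable_denied` reason come from the table.

```python
import asyncio
from typing import Awaitable, Callable, Tuple

OPA_TIMEOUT_SECONDS = 0.5  # the 500 ms budget from the table
DENY_UNAVAILABLE = "opa_authz_unavailable_denied"


async def authorize(
    query_opa: Callable[[], Awaitable[bool]],
    timeout: float = OPA_TIMEOUT_SECONDS,
) -> Tuple[bool, str]:
    """Return (allowed, reason). Any failure to get an answer is a deny."""
    try:
        allowed = await asyncio.wait_for(query_opa(), timeout=timeout)
    except asyncio.TimeoutError:
        return False, DENY_UNAVAILABLE  # too slow: deny
    except Exception:
        # Connection errors, bad responses, and so on: also deny.
        return False, DENY_UNAVAILABLE
    return (True, "allowed") if allowed else (False, "denied_by_policy")
```

The point of the shape is that there is no code path where an exception or a timeout ends in `True`; a fail-open variant would need an explicit, commented branch.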

Two layers that stack. The row ciphertext never leaves Postgres decryptable on its own — an attacker with DB access alone can't read a provider key.

flowchart BT
  Row["Row in Postgres<br/>vault_llm_keys.ciphertext"]
  DEK["DEK (data-encryption key)<br/>per-org, in memory"]
  Seed["DEK seed<br/>Secret Manager<br/>(staging + compose path)"]
  KMS["KMS KEK<br/>projects/.../cryptoKeys/vault-dek<br/>(prod path)"]

  Row -->|AEAD-decrypt with DEK| DEK
  DEK -->|unwrap| KMS
  DEK -.->|or PBKDF2-derive| Seed

  classDef data fill:#1b2a3a,stroke:#406f8a,color:#c7dff5
  classDef key fill:#1f3a2c,stroke:#468a68,color:#c7f5d7
  classDef kms fill:#3a2e1b,stroke:#8a7240,color:#f5e7c7
  class Row data
  class DEK,Seed key
  class KMS kms

Two paths, toggled by TAPPASS_VAULT_KEY_KMS:

  • Empty / unset: DEK seed comes straight from TAPPASS_VAULT_KEY in Secret Manager; PBKDF2 derives the AEAD key per row.
  • Set to a KMS key URI: DEK is generated per-org, wrapped by KMS at rest, unwrapped at request time. The KMS KEK never leaves Google's HSMs.

Prod runs the KMS path (shipped in commit 2a1843f). Staging runs the seed path because it keeps local tests cheap.

See tappass/vault/crypto.py for the implementation.
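
For orientation only, the seed path's per-row derivation might resemble the sketch below. It is not taken from tappass/vault/crypto.py: the per-row salt, the SHA-256 PRF, the iteration count, and the 32-byte key length are all assumptions, and `derive_row_key` is a hypothetical name.

```python
import hashlib
import os

PBKDF2_ITERATIONS = 600_000  # assumed work factor, not from the source
KEY_LEN = 32                 # assumed: 256-bit AEAD key


def derive_row_key(seed: bytes, row_salt: bytes, iterations: int = PBKDF2_ITERATIONS) -> bytes:
    """Derive a per-row AEAD key from the DEK seed and a salt stored with the row."""
    return hashlib.pbkdf2_hmac("sha256", seed, row_salt, iterations, dklen=KEY_LEN)


def new_row_salt() -> bytes:
    """Fresh random salt for a new row (16 bytes is an assumed size)."""
    return os.urandom(16)
```

Because the salt differs per row, two rows encrypted under the same seed still get unrelated keys, and the seed itself never has to be written next to the ciphertext.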

flowchart LR
  OP["1Password<br/>Engineering vault"]
  TF["terraform tfvars<br/>gitignored"]
  SM["GCP Secret Manager<br/>tappass-*-dsn, *-api-key, *-vault-*"]
  CR["Cloud Run<br/>env var via secretKeyRef"]

  OP -->|human eyes,<br/>break-glass| TF
  TF -->|terraform apply| SM
  SM -->|secretKeyRef at<br/>container start| CR

  classDef human fill:#3a2e1b,stroke:#8a7240,color:#f5e7c7
  classDef code fill:#1b2a3a,stroke:#406f8a,color:#c7dff5
  class OP human
  class TF,SM,CR code

  • Every runtime secret is in Secret Manager; code reaches it via tappass.secrets.get(name) which checks env vars first, then the configured backend.
  • Humans access the same values via 1Password for break-glass; 1Password is the master — Secret Manager is the deployment target.
  • tfvars files are gitignored; the .example versions are committed with placeholders.
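
The lookup order described for tappass.secrets.get(name) (environment first, then the configured backend) can be sketched like this. The real function's signature is not shown on this page, so `get_secret`, its `backend` callable, and the `env` parameter are hypothetical stand-ins.

```python
import os
from typing import Callable, Mapping, Optional


def get_secret(
    name: str,
    backend: Callable[[str], Optional[str]],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Resolve a secret: environment variable first, then the backend, else raise."""
    env = os.environ if env is None else env
    value = env.get(name)
    if value:  # an empty string is treated as unset
        return value
    value = backend(name)
    if value is None:
        # No default value: a missing secret is an error, never a silent fallback.
        raise KeyError(f"secret {name!r} not found in env or backend")
    return value
```

On Cloud Run the environment branch is the common case, since secretKeyRef injects the value at container start; the backend branch covers local and compose setups.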

Full detail: Secret management.

Every decision writes an AuditEvent. Each event hashes together the previous event's hash + the new event's payload, then signs the result with the audit chain's Ed25519 key. Tampering with any event breaks the chain for every subsequent event.

  • Detection: the integrity-check background worker (observability/background.py:integrity_check_worker) re-verifies the chain every 4 hours and fires a SEV alert on mismatch.
  • Retention: audit events are the one object exempt from GDPR erasure — see Customer data export for how we anonymise the user_id while keeping the event.
  • Export: a daily cold-storage copy goes to GCS, to meet the SLA in the DPA.
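
The chaining and re-verification described above can be sketched with a plain SHA-256 chain. The Ed25519 signature over each hash is left out because it needs a third-party library, and the field names, JSON canonicalisation, and genesis value are assumptions rather than the AuditEvent schema.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # assumed placeholder for "no previous event"


def event_hash(prev_hash: str, payload: Dict[str, Any]) -> str:
    """Hash the previous event's hash together with a canonical form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + body).hexdigest()


def append_event(chain: List[Dict[str, Any]], payload: Dict[str, Any]) -> None:
    """Append an event linked to the current tail of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev_hash": prev, "hash": event_hash(prev, payload)})


def first_broken_index(chain: List[Dict[str, Any]]) -> int:
    """Re-verify the whole chain; return the index of the first bad event, or -1."""
    prev = GENESIS
    for i, event in enumerate(chain):
        if event["prev_hash"] != prev or event["hash"] != event_hash(prev, event["payload"]):
            return i
        prev = event["hash"]
    return -1
```

Editing the payload of one stored event makes `first_broken_index` point at that event, and recomputing its hash to hide the edit would break the link to every later event, which is what a periodic integrity check relies on.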

Full detail: Audit trail internals.

| Attacker | Primary defence | Secondary |
|---|---|---|
| Scanner hitting .run.app URL directly | X-Origin-Verify HMAC | Rate limiter |
| Leaked customer tp_ key | Hashed storage + rotation on self-serve | Per-agent scoping — one key ≠ org takeover |
| Compromised session cookie | Login-rotation of session JWT | SameSite + Secure + HttpOnly |
| Rogue SDK flooding /hooks | Rate limiter (Redis) | Per-agent cost envelope |
| Insider with DB read | Row-level crypto under KMS DEK | No plaintext LLM keys in DB |
| Vendor breach (Sentry, PostHog) | send_default_pii=False; no secrets in events | Scrub middleware (_scrub_event) |

We deliberately do not defend against a fully-compromised GCP project — if an attacker has GCP Organization Admin, the game is over and our DPA recovery SLA kicks in.

Do

  • Land new secrets in Secret Manager via terraform, with a corresponding secretKeyRef in cloudrun.tf.
  • Emit an AuditEvent for every new decision path.
  • Fail closed by default; make fail-open explicit with a comment that justifies it.

Don't

  • Log secrets or PII (the _scrub_event hook catches common cases but it's not a safety net for sloppy logger.info).
  • Add middleware that bypasses AuthMiddleware — extend it or add a new allow-list entry.
  • Cache credentials. The vault is the source of truth, every call.
  • Introduce a new identity primitive. If tp_, session JWT, or SPIFFE doesn't fit the use case, escalate before writing code.