Skip to content

Data model

Postgres 15+ via Cloud SQL in prod. Alembic for migrations. Every table has created_at, updated_at, and a UUIDv7 primary key (time-ordered — helps indexing).

TablePurpose
tenantsOne row per customer org
usersHumans with SSO identity
agentsRegistered agent identities (one tp_ key maps to one agent)
api_keysHash-stored tp_ keys; one row per issued key
sessionsBrowser sessions for SSO-authenticated humans
oauth_identitiesLinks users to their IdP subject claims
capability_tokensIssued capability tokens (metadata; signature not stored)

The key rule: everything in the data path is scoped by tenant_id at the top of the request. Cross-tenant reads are a bug.

TablePurpose
policy_profilesNamed config blobs (alpha / beta / default / per-customer)
step_configsPer-tenant overrides for individual steps
provider_configsPer-tenant LLM provider bindings (model → provider mapping)
vault_secretsEncrypted provider keys (the “vault backend: postgres” storage)
TablePurpose
audit_eventsThe append-only, hash-chained, signed event stream
audit_checkpointsSigned checkpoint every N events
audit_archivesPointer rows for GCS cold-storage dumps

audit_events is the single most important table. See Audit trail internals.

TablePurpose
export_configsPer-tenant SIEM / webhook destinations
export_deliveriesDelivery attempts (for DLQ + retry)
TablePurpose
agent_health_snapshotsDaily rollup per agent — score, grade, dimensions
trust_eventsFine-grained signals feeding the score

Some fields are intentionally denormalised for query speed. Don’t “fix” them:

  • audit_events.agent_label — copied from agents.label so an agent rename doesn’t rewrite history
  • audit_events.model — copied because provider_configs can be updated live
  • agent_health_snapshots.score — computed, not a view, so Grafana queries stay fast

alembic/versions/ — one file per migration, numbered + slug.

Rules:

  1. Forward-only. We don’t use downgrade() in prod. Writing one is fine for local testing; never rely on it.
  2. Zero-downtime. Two-phase pattern for any non-additive change. See Database & migrations.
  3. Never drop a column in the same migration that stops writing to it. Add → stop-write → remove, three releases.
  4. Index creation uses CREATE INDEX CONCURRENTLY. Alembic supports it via op.execute (not op.create_index, which locks).
  • audit_events (tenant_id, ts DESC) — powers the dashboard
  • audit_events (agent_id, ts DESC) — powers health scoring
  • audit_events (event_kind, ts DESC) WHERE severity >= 'warn' — partial index for alerts
  • api_keys (key_hash) — lookup on authentication hot path
  • capability_tokens (holder_public_key) — capability verification

These intentionally don’t live in the main DB:

  • Secrets. Vault table rows hold ciphertext; keys live in Google Secret Manager.
  • Archived audit. > 90 days goes to GCS (gs://tappass-audit-archive/).
  • Metrics. Live in Prometheus / Grafana Cloud, not DB.
  • Session tokens for customer agents. Agents are stateless; no server-side session.

Get an agent’s last 50 blocks:

SELECT audit_id, ts, detections, policy_result
FROM audit_events
WHERE tenant_id = :tenant
AND agent_id = :agent
AND policy_result->>'verdict' = 'deny'
ORDER BY ts DESC
LIMIT 50;

Daily cost per tenant last 7 days:

SELECT date_trunc('day', ts) AS day,
SUM((cost->>'usd')::numeric) AS usd
FROM audit_events
WHERE tenant_id = :tenant
AND ts > now() - interval '7 days'
GROUP BY 1
ORDER BY 1;

PII detections by category last 30 days:

SELECT jsonb_array_elements(detections)->>'category' AS category,
COUNT(*)
FROM audit_events
WHERE tenant_id = :tenant
AND ts > now() - interval '30 days'
AND detections IS NOT NULL
GROUP BY 1
ORDER BY 2 DESC;