Key flows
This page is the "show me a real trace" counterpart to Domain objects. Three flows that cover 95% of what the platform actually does.
Each flow has a Mermaid sequence diagram, a step-by-step narrative, and the objects as they look at each stage.
Flow 1 — Life of a governed LLM call
The hot path. The thing the platform exists to do.
Starting point: a customer's agent wants to call Anthropic Claude.
It holds a tp_dev_<key> issued at onboarding.
```mermaid
sequenceDiagram
    autonumber
    participant Agent as Customer agent
    participant CF as Cloudflare
    participant CR as Cloud Run tappass
    participant OPA as OPA sidecar
    participant DB as Cloud SQL
    participant V as Vault
    participant LLM as Anthropic API
    Agent->>CF: POST /v1/messages with Bearer tp_dev
    CF->>CR: adds X-Origin-Verify HMAC
    CR->>DB: lookup Agent by tp_ hash
    CR->>OPA: authz agent and call_llm action
    OPA-->>CR: allow
    CR->>CR: build PipelineContext
    loop 49 pipeline steps
        CR->>CR: step.execute ctx produces detections
    end
    CR->>OPA: evaluate detections
    OPA-->>CR: Decision outcome allow, mandate JWS
    CR->>V: decrypt provider key for org
    V->>DB: read vault_llm_keys ciphertext
    V-->>CR: plaintext Claude key
    CR->>LLM: POST /v1/messages with Bearer sk-ant
    LLM-->>CR: stream tokens
    CR->>DB: write AuditEvent, hash-chained + Ed25519 signed
    CR-->>CF: stream tokens back
    CF-->>Agent: stream tokens back
```
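Step 3 of the diagram looks the agent up by a hash of its `tp_` key rather than by the key itself. The following is a minimal sketch of that idea, not the platform's implementation: it assumes a SHA-256 hex digest and uses an in-memory dict in place of Cloud SQL. `AgentRow`, `hash_key`, `AGENTS_BY_HASH`, and `lookup_agent` are hypothetical names introduced for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentRow:
    """Hypothetical, trimmed-down stand-in for the Agent record."""
    agent_id: str
    org_id: str
    active: bool


def hash_key(raw_key: str) -> str:
    # Assumption: SHA-256 hex digest. The real hashing scheme is not documented here.
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


# Stand-in for the database table: only the digest is stored, never the raw key.
AGENTS_BY_HASH = {
    hash_key("tp_dev_example"): AgentRow("claude-code-jens", "tappass-6ab653", True),
}


def lookup_agent(authorization_header: str) -> Optional[AgentRow]:
    """Return the agent for a 'Bearer tp_...' header, or None if unknown or inactive."""
    scheme, _, token = authorization_header.partition(" ")
    if scheme != "Bearer" or not token.startswith("tp_"):
        return None
    agent = AGENTS_BY_HASH.get(hash_key(token))
    if agent is None or not agent.active:
        return None
    return agent
```

Storing only the digest means a leaked table does not directly expose usable bearer keys, which is the usual motivation for this pattern.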
What each object looks like at each stage
After step 3 (agent lookup):

```python
ctx.agent = Agent(
    agent_id="claude-code-jens",
    agent_uuid="ag_N4M1FG3_",
    org_id="tappass-6ab653",
    framework="claude-code",
    active=True,
    metadata={"public_key": "…", "intended_use": "engineering"},
)
```

After the pipeline loop (step 7):

```python
ctx.detections = [
    Detection(
        category="pii.email",
        severity="low",
        label="email_address",
        score=0.94,
        text="jens.bontinck@tappass.ai",
        backend="llm_guard",
    ),
    # …
]
ctx.tokens_consumed = 1_240  # running counter
ctx.cost_cents = 8
```

After OPA evaluation (step 9):

```python
Decision(
    outcome="allow",
    reason="policy 'dev_team' permits email exposure for this agent",
    behavior_id="call_llm",
    pipeline_id="agent-claude-code-jens",
    mandate="eyJhbGciOi…",  # JWS, decodes to a Mandate
)
```

The embedded Mandate:

```python
Mandate(
    subject="spiffe://tappass-6ab653.tappass.eu/agent/claude-code-jens",
    issuer="tappass-prod",
    capabilities=(
        MandateCapability(action="call_llm", resource="anthropic", qualifier="claude-opus-4-7"),
    ),
    mandate_id="mnd_7b4a…",
    issued_at=…,
    expires_at=…,  # typically 60s after issued_at
)
```

After step 15 (audit write):

```python
AuditEvent(
    event_id="ae_9f3c…",
    timestamp=…,
    org_id="tappass-6ab653",
    event_type="llm_call_completed",
    agent_id="claude-code-jens",
    session_id="sess_1Q2w…",
    details={
        "provider": "anthropic",
        "model": "claude-opus-4-7",
        "input_tokens": 1240,
        "output_tokens": 880,
        "cost_cents": 14,
        "decision": "allow",
        "detections": [...],
        "mandate_id": "mnd_7b4a…",
    },
    # Internal, not API-visible
    prev_hash=b"\x8f...",
    hash=b"\xa2...",
    signature=b"\x7e...",
)
```

Where it breaks

| Step | If it fails | Symptom |
|---|---|---|
| 1–2 | X-Origin-Verify mismatch | 401 from AuthMiddleware |
| 3 | Agent not found / suspended | 403 with agent_inactive |
| 4 | OPA unreachable (>500ms) | 500 + opa_authz_unavailable_denied (fail closed) |
| 6–7 | Step raises | Decision=block, reason = exception class |
| 10 | Vault decrypt fails | 503 to agent, no retry; typically caused by key rotation or a KMS outage |
| 13 | Provider 5xx | Circuit breaker opens for this provider; 502 with passthrough |
| 15 | Audit write fails | Retries (audit is the one place we don't fail-silent) |
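
The "Step raises" row (steps 6–7) describes a fail-closed loop: an exception in any step becomes a block decision whose reason is the exception class. A rough sketch of that behaviour follows. `Context`, `run_pipeline`, `detect_email`, and `broken_step` are hypothetical stand-ins, and the `allow` result at the end is a placeholder, since in the real flow the allow/block decision comes from OPA.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Detection:
    category: str
    severity: str


@dataclass
class Context:
    """Hypothetical, minimal stand-in for PipelineContext."""
    prompt: str
    detections: List[Detection] = field(default_factory=list)


@dataclass
class Decision:
    outcome: str
    reason: str


Step = Callable[[Context], List[Detection]]


def run_pipeline(ctx: Context, steps: List[Step]) -> Decision:
    """Run every step; any exception turns into a block decision (fail closed)."""
    for step in steps:
        try:
            ctx.detections.extend(step(ctx))
        except Exception as exc:
            # The reason carries the exception class name, mirroring the table row.
            return Decision(outcome="block", reason=type(exc).__name__)
    # Placeholder only: the real decision is made by OPA from ctx.detections.
    return Decision(outcome="allow", reason="no step failed")


def detect_email(ctx: Context) -> List[Detection]:
    return [Detection("pii.email", "low")] if "@" in ctx.prompt else []


def broken_step(ctx: Context) -> List[Detection]:
    raise TimeoutError("backend did not answer")
```

Calling `run_pipeline(Context("x@y"), [detect_email, broken_step])` returns a block decision with reason `TimeoutError`, while the detection found before the failure stays on the context.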
Flow 2 — Life of a policy change
An operator realises their agent is leaking AWS keys. They want to block the pattern end-to-end in under a minute.
Starting point: the operator is signed into the dashboard and has the ORG_ADMIN role.
```mermaid
sequenceDiagram
    autonumber
    participant UI as Dashboard
    participant API as Backend control plane
    participant Store as Policy store in Postgres
    participant OPA as OPA sidecar
    participant Cache as Per-worker TTL cache
    UI->>API: GET /pipelines/constraints/catalog
    API->>Cache: cached?
    Cache-->>API: hit, 60s TTL
    API-->>UI: list of constraints
    UI->>API: POST /pipelines/constraints/compile with ids block_aws_keys
    API-->>UI: compiled pipeline preview
    UI->>API: PUT /pipelines/id with new pipeline state body
    API->>Store: write new Pipeline + audit
    API->>OPA: reload data for new Pipeline
    API->>Cache: invalidate known-tools and presets
    API-->>UI: 200
    Note over UI,OPA: Next request hits the new pipeline within ~1s. OPA reload is hot.
```
What changes where
In Postgres, the pipelines row is updated and an audit event is written:

```sql
UPDATE pipelines
SET categories = jsonb_set(
        categories,
        '{call,route_and_execute,steps,detect_secrets}',
        '{}'
    ),
    version = version + 1,
    updated_at = NOW()
WHERE id = 'pip_…' AND org_id = '…';

INSERT INTO audit_events (event_type, details, …)
VALUES (
    'pipeline_updated',
    '{"agent_id": "…", "added_steps": ["detect_secrets"], "changed_by": "operator@customer.com"}',
    …
);
```

In OPA, the policy data document for this org gets a new pipelines.<pipeline_id>.categories.call.route_and_execute.steps.detect_secrets entry. OPA reloads in <50ms (no recompile, just a data-document refresh).

In the TTL cache, known-tools, presets, and constraints/catalog are invalidated so the next dashboard fetch reflects the new shape.
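To make the effect of the `jsonb_set` call concrete, here is a small Python sketch that mimics its behaviour on a nested dict. It is an illustration only: `set_path` is a hypothetical helper, it handles JSON objects but not arrays, and the sample `categories` document (including the `detect_pii` step) is invented for the example.

```python
import copy
from typing import Any, Dict, Sequence


def set_path(doc: Dict[str, Any], path: Sequence[str], value: Any) -> Dict[str, Any]:
    """Return a copy of doc with value set at path.

    Like jsonb_set with its default create_missing=true, only the final key
    may be new. If an earlier key is missing, the copy is returned unchanged.
    """
    result = copy.deepcopy(doc)
    node: Any = result
    for key in path[:-1]:
        if not isinstance(node, dict) or key not in node:
            return result  # nothing has been modified yet
        node = node[key]
    if not isinstance(node, dict):
        return result
    node[path[-1]] = value
    return result


# Invented sample document; the real categories shape may differ.
categories = {"call": {"route_and_execute": {"steps": {"detect_pii": {}}}}}
updated = set_path(
    categories,
    ["call", "route_and_execute", "steps", "detect_secrets"],
    {},
)
```

After the call, `updated` contains both `detect_pii` and `detect_secrets` under `steps`, and the original `categories` dict is left untouched.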
Why it's fast end-to-end
| Link | Cost |
|---|---|
| Constraint catalog load | <5ms (cached) |
| Compile preview | ~10ms (pure function; no DB) |
| Policy write + audit | ~80ms (2 DB writes in a transaction) |
| OPA reload | ~50ms |
| Cache invalidation | <1ms (in-process dict) |
Taken together, the links add up to roughly 150ms, so the operator round-trip stays under a second from "click save" to "next request uses new policy."
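The cached catalog load and the sub-millisecond invalidation both follow from the cache being a plain in-process dict with a time-to-live. A minimal sketch under that assumption is below. `TTLCache`, `get_or_load`, and `invalidate` are hypothetical names, and the injectable `clock` exists only to make the behaviour easy to test.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Per-process dict cache with a fixed time-to-live (illustrative sketch)."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: the loader is not called
        value = loader()
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, *keys: str) -> None:
        # Invalidation is just a dict pop per key, hence the very low cost.
        for key in keys:
            self._entries.pop(key, None)
```

Because a cache like this lives inside one worker process, an `invalidate` call only clears the worker that handled the request; how other workers are refreshed is not covered on this page.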
Flow 3 — Life of an audit export
A compliance officer at a customer needs to ship the last 30 days of policy decisions to their SIEM for quarterly review.
Starting point: the customer has an Enterprise plan and a configured Splunk HEC endpoint. Their workspace is set up for scheduled exports.
```mermaid
sequenceDiagram
    autonumber
    participant Sched as Scheduler cron
    participant API as Backend
    participant DB as Cloud SQL
    participant GCS as GCS cold archive
    participant Hook as Export worker
    participant SIEM as Customer Splunk HEC
    Sched->>API: trigger export for org over 30d
    API->>DB: SELECT from audit_events by org_id + since
    DB-->>API: stream events
    API->>API: verify hash chain
    alt Chain intact
        API->>GCS: write org events jsonl archive
        API->>Hook: enqueue events chunk
        Hook->>SIEM: POST /services/collector in batches of 1000
        SIEM-->>Hook: 200 OK
        API-->>Sched: done
    else Chain broken
        API->>API: alert on integrity failure
        API-->>Sched: export aborted
    end
```
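The "batches of 1000" step can be sketched as a chunking generator plus a body builder. This is an illustration rather than the export worker's code: `chunked` and `hec_body` are hypothetical helpers, and the body format assumes the common HEC convention of concatenated JSON objects, each wrapped in an `event` envelope. No request is sent here.

```python
import json
from typing import Dict, Iterable, Iterator, List


def chunked(events: Iterable[Dict], size: int = 1000) -> Iterator[List[Dict]]:
    """Yield lists of at most `size` events, preserving order."""
    batch: List[Dict] = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


def hec_body(batch: List[Dict]) -> str:
    # Assumption: one request body holds several {"event": ...} objects back to back.
    return "".join(json.dumps({"event": e}, sort_keys=True) for e in batch)
```

With 2,500 events, `chunked` yields batches of 1000, 1000, and 500, so the worker would issue three POSTs.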
Why hash-chain verification runs first
Exports are legal evidence. If the chain has been tampered with, shipping it as-is to a customer SIEM would let an attacker's modified history become the record of truth downstream. We fail the export loudly instead.
The periodic integrity_check_worker (every 4 hours; see observability/background.py) catches most tampering before an export runs, but verify-on-export is a belt-and-braces check.
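The verification itself can be pictured as a single walk over the events, recomputing each hash from its predecessor. The sketch below assumes SHA-256 over the previous hash plus canonical JSON of the payload, and an all-zero genesis value; the actual hashing scheme is not specified on this page, and the Ed25519 signature check is left out. All names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List, Optional

GENESIS = b"\x00" * 32  # assumed prev_hash of the very first event


def event_hash(prev_hash: bytes, payload: Dict) -> bytes:
    # Assumption: SHA-256 over prev_hash followed by canonical JSON of the payload.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(prev_hash + body).digest()


def append_event(chain: List[Dict], payload: Dict) -> None:
    """Append payload to the chain, linking it to the previous event's hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev_hash": prev, "hash": event_hash(prev, payload)})


def first_broken_index(chain: List[Dict]) -> Optional[int]:
    """Return the index of the first event that fails verification, or None."""
    prev = GENESIS
    for i, event in enumerate(chain):
        if event["prev_hash"] != prev or event["hash"] != event_hash(prev, event["payload"]):
            return i
        prev = event["hash"]
    return None
```

Editing any stored payload changes its recomputed hash, so `first_broken_index` points at the modified event. A hash chain alone does not stop someone who can rewrite every later hash as well, which is where the per-event signature matters.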
What the customer actually sees
Each AuditEvent arrives at their SIEM as a JSON object of this shape:

```json
{
  "event_id": "ae_9f3c…",
  "timestamp": "2026-04-23T14:22:01.044Z",
  "org_id": "acme",
  "event_type": "llm_call_completed",
  "agent_id": "support-bot",
  "session_id": "sess_…",
  "details": {
    "provider": "openai",
    "model": "gpt-4.1",
    "decision": "block",
    "reason": "policy_denied: pii.ssn",
    "detections": [
      { "category": "pii.ssn", "severity": "high", "score": 0.99 }
    ]
  },
  "hash_hex": "a2f3…",
  "signature_hex": "7e8c…"
}
```

The hash_hex and signature_hex fields are included so the customer can re-verify chain integrity against our public audit-signing key.
GDPR carve-out
The export runs through the AuditEvent retention policy: records flagged by a prior Art. 17 request have their user_id replaced with an anonymised pseudonym. The event itself is retained (per the DPA) but no longer personally identifies the erased subject. See Customer data export for the full GDPR flow.
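As a rough picture of the carve-out, the replacement can be thought of as a lookup applied to each event on its way out. The sketch assumes a top-level `user_id` field and a precomputed mapping from erased user IDs to pseudonyms; neither is specified on this page, and `apply_erasure` is a hypothetical helper.

```python
import copy
from typing import Dict


def apply_erasure(event: Dict, pseudonyms: Dict[str, str]) -> Dict:
    """Return the event with user_id swapped for its pseudonym, if the user was erased."""
    user_id = event.get("user_id")
    if user_id not in pseudonyms:
        return event  # not subject to an erasure request: export as-is
    redacted = copy.deepcopy(event)  # never mutate the stored record
    redacted["user_id"] = pseudonyms[user_id]
    return redacted
```

The event keeps every other field, which matches the description above: the record is retained, but it no longer names the erased subject.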
How these flows connect
```mermaid
flowchart LR
    Op[Operator] -->|configures| Pipeline
    Pipeline -->|drives| Flow1[Flow 1<br/>LLM call]
    Flow1 -->|writes| AE[AuditEvent]
    AE -->|read by| Flow3[Flow 3<br/>audit export]
    AE -->|read by| Dash[Dashboard insights]
    Op -->|changes policy| Flow2[Flow 2<br/>policy change]
    Flow2 -->|updates| Pipeline
```
Flow 1 is the heartbeat. Flow 2 is the control loop. Flow 3 is the compliance-evidence loop. Every incident this week involved one of the three breaking at a specific link — the runbooks under Incident response map directly onto these diagrams.
Also see
- Domain objects: the classes these flows operate on.
- Pipeline step anatomy: zoom into one box of Flow 1.
- Deployment architecture: the physical topology these flows run on.
- Codebase tour: read the code that implements Flow 1.