Key flows

This page is the "show me a real trace" counterpart to Domain objects. It walks through three flows that together cover 95% of what the platform actually does.

Each flow has a Mermaid sequence diagram, a step-by-step narrative, and the objects as they look at each stage.

Flow 1 — Life of a governed LLM call

The hot path. The thing the platform exists to do.

Starting point: a customer's agent wants to call Anthropic Claude. It has a tp_dev_<key> that was issued at onboarding.

sequenceDiagram
  autonumber
  participant Agent as Customer agent
  participant CF as Cloudflare
  participant CR as Cloud Run tappass
  participant OPA as OPA sidecar
  participant DB as Cloud SQL
  participant V as Vault
  participant LLM as Anthropic API

  Agent->>CF: POST /v1/messages with Bearer tp_dev
  CF->>CR: adds X-Origin-Verify HMAC
  CR->>DB: lookup Agent by tp_ hash
  CR->>OPA: authz agent and call_llm action
  OPA-->>CR: allow
  CR->>CR: build PipelineContext
  loop 49 pipeline steps
    CR->>CR: step.execute ctx produces detections
  end
  CR->>OPA: evaluate detections
  OPA-->>CR: Decision outcome allow, mandate JWS
  CR->>V: decrypt provider key for org
  V->>DB: read vault_llm_keys ciphertext
  V-->>CR: plaintext Claude key
  CR->>LLM: POST /v1/messages with Bearer sk-ant
  LLM-->>CR: stream tokens
  CR->>DB: write AuditEvent, hash-chained + Ed25519 signed
  CR-->>CF: stream tokens back
  CF-->>Agent: stream tokens back

What each object looks like at each stage

After step 3 (agent lookup):

ctx.agent = Agent(
    agent_id="claude-code-jens",
    agent_uuid="ag_N4M1FG3_",
    org_id="tappass-6ab653",
    framework="claude-code",
    active=True,
    metadata={"public_key": "…", "intended_use": "engineering"},
)
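
Steps 2 and 3, which lead up to this lookup, can be sketched roughly as below. This is an illustration under stated assumptions, not the platform's code: the shared secret, the choice to sign the request path, SHA-256 as the key hash, and the in-memory `AGENTS_BY_KEY_HASH` table are all hypothetical.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical values for illustration only.
ORIGIN_SECRET = b"shared-secret-between-edge-and-backend"
AGENTS_BY_KEY_HASH = {
    hashlib.sha256(b"tp_dev_example").hexdigest(): "claude-code-jens",
}


def origin_header(path: str) -> str:
    """What the edge would attach as X-Origin-Verify (assumed scheme)."""
    return hmac.new(ORIGIN_SECRET, path.encode(), hashlib.sha256).hexdigest()


def verify_origin(path: str, header: str) -> bool:
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(origin_header(path), header)


def lookup_agent(bearer_token: str) -> Optional[str]:
    """Look the agent up by the hash of its key, never by the raw key."""
    key_hash = hashlib.sha256(bearer_token.encode()).hexdigest()
    return AGENTS_BY_KEY_HASH.get(key_hash)
```

Storing only the hash means a leaked lookup table does not reveal usable keys.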

After the pipeline loop (step 7):

ctx.detections = [
    Detection(category="pii.email",
              severity="low",
              label="email_address",
              score=0.94,
              text="jens.bontinck@tappass.ai",
              backend="llm_guard"),
    # …
]
ctx.tokens_consumed = 1_240  # running counter
ctx.cost_cents = 8
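
A minimal sketch of the loop that fills `ctx.detections`. The step signature (a callable that returns detections), the `failed_step` field, and the `detect_email` step are assumptions made for illustration; the real `step.execute` interface may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Detection:
    category: str
    severity: str
    score: float


@dataclass
class PipelineContext:
    prompt: str
    detections: List[Detection] = field(default_factory=list)
    failed_step: Optional[str] = None  # hypothetical: set when a step raises


# A step is modelled here as a plain callable returning detections.
Step = Callable[[PipelineContext], List[Detection]]


def detect_email(ctx: PipelineContext) -> List[Detection]:
    if "@" in ctx.prompt:
        return [Detection("pii.email", "low", 0.94)]
    return []


def run_pipeline(ctx: PipelineContext, steps: List[Step]) -> PipelineContext:
    for step in steps:
        try:
            ctx.detections.extend(step(ctx))
        except Exception as exc:
            # A raising step stops the loop; the exception class name is
            # kept so it can become the reason of a block decision.
            ctx.failed_step = type(exc).__name__
            break
    return ctx
```

Stopping at the first raising step keeps the sketch fail-closed: nothing downstream runs on a context that a step could not finish inspecting.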

After OPA evaluation:

Decision(
    outcome="allow",
    reason="policy 'dev_team' permits email exposure for this agent",
    behavior_id="call_llm",
    pipeline_id="agent-claude-code-jens",
    mandate="eyJhbGciOi…",  # JWS, decodes to a Mandate
)

The embedded Mandate:

Mandate(
    subject="spiffe://tappass-6ab653.tappass.eu/agent/claude-code-jens",
    issuer="tappass-prod",
    capabilities=(
        MandateCapability(action="call_llm",
                          resource="anthropic",
                          qualifier="claude-opus-4-7"),
    ),
    mandate_id="mnd_7b4a…",
    issued_at=…,
    expires_at=…,  # typically 60s
)
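
How a consumer might check a decoded Mandate before acting on it, as a sketch. JWS signature verification is deliberately left out, and the exact-match rule for capabilities and the `permits` helper are assumptions, not the platform's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Tuple


@dataclass(frozen=True)
class MandateCapability:
    action: str
    resource: str
    qualifier: str


@dataclass(frozen=True)
class Mandate:
    subject: str
    capabilities: Tuple[MandateCapability, ...]
    issued_at: datetime
    expires_at: datetime


def permits(m: Mandate, action: str, resource: str, qualifier: str,
            now: datetime) -> bool:
    """True only inside the validity window and for a listed capability."""
    if not (m.issued_at <= now < m.expires_at):
        return False
    return MandateCapability(action, resource, qualifier) in m.capabilities


issued = datetime(2026, 1, 1, tzinfo=timezone.utc)
mandate = Mandate(
    subject="spiffe://tappass-6ab653.tappass.eu/agent/claude-code-jens",
    capabilities=(MandateCapability("call_llm", "anthropic", "claude-opus-4-7"),),
    issued_at=issued,
    expires_at=issued + timedelta(seconds=60),  # the typical 60s lifetime
)
```

With a 60-second window, a mandate captured from one call is of little use for replay a minute later.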

After step 15 (audit write):

AuditEvent(
    event_id="ae_9f3c…",
    timestamp=…,
    org_id="tappass-6ab653",
    event_type="llm_call_completed",
    agent_id="claude-code-jens",
    session_id="sess_1Q2w…",
    details={
        "provider": "anthropic",
        "model": "claude-opus-4-7",
        "input_tokens": 1240,
        "output_tokens": 880,
        "cost_cents": 14,
        "decision": "allow",
        "detections": [...],
        "mandate_id": "mnd_7b4a…",
    },
    # Internal — not API-visible
    prev_hash=b"\x8f...",
    hash=b"\xa2...",
    signature=b"\x7e...",
)
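
The `prev_hash` and `hash` fields form a hash chain. A minimal sketch of appending to such a chain follows; SHA-256, canonical JSON as the hashed payload, and an all-zero genesis value are assumptions. The Ed25519 signature is omitted because it needs a third-party library.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = b"\x00" * 32  # assumed prev_hash for the first event


def event_hash(prev_hash: bytes, body: Dict) -> bytes:
    # Canonical JSON (sorted keys, no whitespace) so equal bodies hash equally.
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(prev_hash + payload).digest()


def append_event(chain: List[Dict], body: Dict) -> Dict:
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {
        "body": body,
        "prev_hash": prev_hash,
        "hash": event_hash(prev_hash, body),
    }
    chain.append(entry)
    return entry


chain: List[Dict] = []
append_event(chain, {"event_type": "llm_call_completed", "decision": "allow"})
append_event(chain, {"event_type": "pipeline_updated"})
```

Because each hash covers the previous one, editing any stored event changes every hash after it.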

Where it breaks

Step	If it fails	Symptom
1–2	X-Origin-Verify mismatch	401 from AuthMiddleware
3	Agent not found / suspended	403 with `agent_inactive`
4	OPA unreachable (>500ms)	500 + `opa_authz_unavailable_denied` (fail closed)
6–7	Step raises	Decision=`block`, `reason` = exception class
10	Vault decrypt fails	503 to agent, no retry; likely cause is key rotation or a KMS outage
13	Provider 5xx	Circuit breaker opens for this provider; 502 with passthrough
15	Audit write fails	Retries (audit is the one place we don't fail-silent)
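
The per-provider circuit breaker in the table can be pictured with a sketch like the one below. The failure threshold, the cooldown, and the class itself are hypothetical; the platform's breaker may use different rules.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; retries after `cooldown` seconds."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        # After the cooldown, let a trial request through (half-open).
        return self.clock() - self.opened_at >= self.cooldown

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()
```

One breaker instance per provider keeps an outage at one provider from blocking calls to the others.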

Flow 2 — Life of a policy change

An operator realises their agent is leaking AWS keys. They want to block the pattern end-to-end in under a minute.

Starting point: operator is signed into the dashboard, has ORG_ADMIN role.

sequenceDiagram
  autonumber
  participant UI as Dashboard
  participant API as Backend control plane
  participant Store as Policy store in Postgres
  participant OPA as OPA sidecar
  participant Cache as Per-worker TTL cache

  UI->>API: GET /pipelines/constraints/catalog
  API->>Cache: cached?
  Cache-->>API: hit, 60s TTL
  API-->>UI: list of constraints

  UI->>API: POST /pipelines/constraints/compile with ids block_aws_keys
  API-->>UI: compiled pipeline preview

  UI->>API: PUT /pipelines/id with new pipeline state body
  API->>Store: write new Pipeline + audit
  API->>OPA: reload data for new Pipeline
  API->>Cache: invalidate known-tools and presets
  API-->>UI: 200

  Note over UI,OPA: Next request hits the new pipeline within ~1s. OPA reload is hot.

What changes where

In Postgres — pipelines row is updated:

UPDATE pipelines
  SET categories = jsonb_set(categories, '{call,route_and_execute,steps,detect_secrets}', '{}'),
      version = version + 1,
      updated_at = NOW()
WHERE id = 'pip_…' AND org_id = '…';

INSERT INTO audit_events (event_type, details, …)
VALUES ('pipeline_updated',
        '{"agent_id": "…", "added_steps": ["detect_secrets"], "changed_by": "operator@customer.com"}',
        …);

In OPA — the policy data document for this org gets a new pipelines.<pipeline_id>.categories.call.route_and_execute.steps.detect_secrets entry. OPA reloads in <50ms (no recompile — just a data-document refresh).

In the TTL cache — known-tools, presets, constraints/catalog are invalidated so the next dashboard fetch reflects the new shape.
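
A per-worker TTL cache with explicit invalidation might look like the sketch below. The class and its method names are assumptions; only the 60s TTL and the in-process dict come from this page.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-process dict cache; each worker would hold its own instance."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._items: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, load: Callable[[], Any]) -> Any:
        hit = self._items.get(key)
        if hit is not None and self.clock() - hit[0] < self.ttl:
            return hit[1]
        value = load()  # miss or expired: reload and restamp
        self._items[key] = (self.clock(), value)
        return value

    def invalidate(self, *keys: str) -> None:
        for key in keys:
            self._items.pop(key, None)
```

After a pipeline save, a call such as `cache.invalidate("known-tools", "presets", "constraints/catalog")` would force the next fetch to reload.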

Why it's fast end-to-end

Link	Cost
Constraint catalog load	<5ms (cached)
Compile preview	~10ms (pure function; no DB)
Policy write + audit	~80ms (2 DB writes in a transaction)
OPA reload	~50ms
Cache invalidation	<1ms (in-process dict)

Operator round-trip is under a second from "click save" to "next request uses new policy."

Flow 3 — Life of an audit export

A compliance officer at a customer needs to ship the last 30 days of policy decisions to their SIEM for quarterly review.

Starting point: the customer has an Enterprise plan and a configured Splunk HEC endpoint. Their workspace is set up for scheduled exports.

sequenceDiagram
  autonumber
  participant Sched as Scheduler cron
  participant API as Backend
  participant DB as Cloud SQL
  participant GCS as GCS cold archive
  participant Hook as Export worker
  participant SIEM as Customer Splunk HEC

  Sched->>API: trigger export for org over 30d
  API->>DB: SELECT from audit_events by org_id + since
  DB-->>API: stream events
  API->>API: verify hash chain
  alt Chain intact
    API->>GCS: write org events jsonl archive
    API->>Hook: enqueue events chunk
    Hook->>SIEM: POST /services/collector in batches of 1000
    SIEM-->>Hook: 200 OK
    API-->>Sched: done
  else Chain broken
    API->>API: alert on integrity failure
    API-->>Sched: export aborted
  end

Why hash-chain verification runs first

Exports are legal evidence. If the chain has been tampered with, shipping it as-is to a customer SIEM would let an attacker's modified history become the record of truth downstream. We fail the export loudly instead.

The periodic integrity_check_worker (every 4 hours; see observability/background.py) catches most tampering before an export runs, but verify-on-export is a belt-and-braces check.

What the customer actually sees

Each AuditEvent arrives at their SIEM as JSON with the following shape:

{
  "event_id": "ae_9f3c…",
  "timestamp": "2026-04-23T14:22:01.044Z",
  "org_id": "acme",
  "event_type": "llm_call_completed",
  "agent_id": "support-bot",
  "session_id": "sess_…",
  "details": {
    "provider": "openai",
    "model": "gpt-4.1",
    "decision": "block",
    "reason": "policy_denied: pii.ssn",
    "detections": [
      { "category": "pii.ssn", "severity": "high", "score": 0.99 }
    ]
  },
  "hash_hex": "a2f3…",
  "signature_hex": "7e8c…"
}

The hash_hex and signature_hex fields are included so the customer can re-verify chain integrity against our public audit-signing key.

The export runs through the AuditEvent retention policy — records flagged by a prior Art. 17 request have their user_id replaced with the anonymised pseudonym. The event itself is retained (per DPA) but no longer personally identifies the erased subject. See Customer data export for the full GDPR flow.
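
The pseudonym substitution could be sketched like this. How the platform actually derives the pseudonym is not stated here; the keyed HMAC, the `anon_` prefix, and the `erased_user_ids` set are all hypothetical.

```python
import hashlib
import hmac
from typing import Dict, Set

PSEUDONYM_KEY = b"hypothetical-per-org-secret"


def pseudonym(user_id: str) -> str:
    # Keyed hash: stable for the same user, not reversible without the key.
    digest = hmac.new(PSEUDONYM_KEY, user_id.encode(), hashlib.sha256).hexdigest()
    return "anon_" + digest[:16]


def apply_retention(event: Dict, erased_user_ids: Set[str]) -> Dict:
    """Return a copy of the event; pseudonymise user_id if the subject was erased."""
    out = dict(event)
    if out.get("user_id") in erased_user_ids:
        out["user_id"] = pseudonym(out["user_id"])
    return out
```

A stable pseudonym keeps events from the same erased subject correlatable in the SIEM without naming the person.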

How these flows connect

flowchart LR
  Op[Operator] -->|configures| Pipeline
  Pipeline -->|drives| Flow1[Flow 1<br/>LLM call]
  Flow1 -->|writes| AE[AuditEvent]
  AE -->|read by| Flow3[Flow 3<br/>audit export]
  AE -->|read by| Dash[Dashboard insights]
  Op -->|changes policy| Flow2[Flow 2<br/>policy change]
  Flow2 -->|updates| Pipeline

Flow 1 is the heartbeat. Flow 2 is the control loop. Flow 3 is the compliance-evidence loop. Every incident this week involved one of the three breaking at a specific link — the runbooks under Incident response map directly onto these diagrams.

Also see

Domain objects — the classes these flows operate on.
Pipeline step anatomy — zoom into one box of Flow 1.
Deployment architecture — the physical topology these flows run on.
Codebase tour — read the code that implements Flow 1.