Intent → Policy: a smart, declarative mapping (no if/else trees)
The problem
The pipeline-creation Step 2 asks the operator a high-level question:
What data does it handle?

    ☐ Customer PII
    ☐ Payment data
    ☐ Source code & secrets
    ☐ Internal docs only
    ☐ External communications
    ☐ Health data
    ☐ EU residents

We need to translate those checkboxes into:
- Pipeline steps (which detectors to enable, in what mode)
- Tool constraints (per-tool parameter rules)
- OPA Rego policies (instantiated from templates)
A naïve `if pii: enable_pii_detector` chain doesn't scale: 7 categories × N regulations × M tools × outcomes is combinatorial. Hardcoded mappings calcify the moment a new category, regulation, or tool lands.
The pattern: two layers of indirection
    Categories      →  Concerns          →  Mitigations
    (UI labels)        (risk/reg model)     (steps + constraints + Rego)

    "Customer PII"     data_leak            detect_pii (block)
    "Payment data"     pci_dss              detect_secrets (block)
    "EU residents"     gdpr_required        tool: send_email exclude *.us
                       audit_required       scan_output (block)
                       …                    …

Why two layers, not one direct map:
- Dedup is automatic. Customer PII and Health data both trigger `data_leak`. With a direct mapping, `detect_pii` would be enabled twice (once per category) and we'd need conflict-resolution logic in the UI. With concerns held as a `set`, dedup falls out for free.
- Categories vs. Concerns vs. Mitigations evolve at different rates. One layer per rate-of-change:
  - Categories change when product UX is redesigned (every 6 months)
  - Concerns change when regulations land (HIPAA, DORA, EU AI Act; every few years)
  - Mitigations change when detectors / tools / policies ship (every sprint)
- Composability. New category → one entry pointing at existing concerns. New regulation → one concerns entry pointing at mitigations. New mitigation → registered once, every concern that needs it picks it up.
- The NLP-interview future works without modification (see § "Forward: NLP → policy" below).
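The dedup argument can be seen in a few lines. The sketch below uses a trimmed-down, hand-written copy of the catalog (the dict names `CATEGORY_TRIGGERS` and `CONCERN_STEPS` are hypothetical, not part of the design) and shows that ticking two categories that share a concern yields each concern and each step exactly once.

```python
# Hypothetical, trimmed-down catalog used only for illustration.
CATEGORY_TRIGGERS = {
    "customer_pii": ["data_leak", "audit_required"],
    "health_data": ["hipaa", "data_leak", "access_control_strict", "audit_required"],
}
CONCERN_STEPS = {
    "data_leak": ["detect_pii", "scan_output"],
    "audit_required": ["audit_signing"],
    "hipaa": ["detect_pii", "classify_data"],
    "access_control_strict": ["require_approval"],
}


def concerns_for(categories):
    """Layer 1: categories -> set of concerns (shared concerns collapse)."""
    concerns = set()
    for category in categories:
        concerns.update(CATEGORY_TRIGGERS[category])
    return concerns


def steps_for(concerns):
    """Layer 2: concerns -> set of pipeline steps (shared steps collapse)."""
    steps = set()
    for concern in concerns:
        steps.update(CONCERN_STEPS[concern])
    return steps


selected = ["customer_pii", "health_data"]
concerns = concerns_for(selected)
steps = steps_for(concerns)

print(sorted(concerns))  # data_leak and audit_required appear once each
print(sorted(steps))     # detect_pii appears once, although two concerns need it
```

Neither layer needs to know how many categories pointed at the same concern; the set absorbs the overlap.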
The catalog shape
Two YAML files, hand-curated by domain experts (compliance + security):
tappass/policy/intent_catalog.yaml
    categories:
      customer_pii:
        label: "Customer PII"
        hint: "Names, emails, addresses, phone numbers"
        triggers: [data_leak, audit_required]

      payment_data:
        label: "Payment data"
        hint: "Card numbers, IBANs, transaction IDs"
        triggers: [pci_dss, fraud_risk, data_leak, audit_required]

      source_code_secrets:
        label: "Source code & secrets"
        hint: "Code snippets, API keys, credentials"
        triggers: [secrets_leak, code_exec_risk, ip_protection]

      internal_docs_only:
        label: "Internal docs only"
        hint: "No customer or regulated data"
        triggers: []  # No concerns; lowers the floor (see § "Conflict resolution")

      external_comms:
        label: "External communications"
        hint: "Sends email / messages outside the org"
        triggers: [exfiltration_risk, recipient_validation, audit_required]

      health_data:
        label: "Health data"
        hint: "Medical records, diagnoses, test results"
        triggers: [hipaa, data_leak, access_control_strict, audit_required]

      eu_residents:
        label: "EU residents"
        hint: "GDPR applies; EU data residency required"
        triggers: [gdpr_required, data_residency]

tappass/policy/concerns.yaml
    concerns:
      data_leak:
        summary: "Sensitive data leaving the org via any channel"
        pipeline_steps:
          detect_pii: { enabled: true, on_detection: block }
          scan_output: { enabled: true, on_detection: block }
        tool_constraints: {}
        rego_templates:
          - block_tool_when_pii_detected:
              target_tool: send_email

      secrets_leak:
        summary: "API keys / credentials in agent input or output"
        pipeline_steps:
          detect_secrets: { enabled: true, on_detection: block }
        tool_constraints:
          Bash:
            command: { not_contains: ["~/.aws", "~/.ssh/id_", "AWS_SECRET"] }
        rego_templates: []

      pci_dss:
        summary: "PCI-DSS scope: card numbers, transaction IDs"
        pipeline_steps:
          detect_pii: { enabled: true, on_detection: block }  # additive; dedups
          detect_secrets: { enabled: true, on_detection: block }
          classify_data: { enabled: true }
        tool_constraints: {}
        rego_templates: []

      hipaa:
        summary: "Protected Health Information"
        pipeline_steps:
          detect_pii: { enabled: true, on_detection: block }  # phi reuses pii detector for now
          classify_data: { enabled: true }
        tool_constraints:
          send_email:
            to: { exclude_pattern: "^(?!.*@(acme|hospital)\\.(com|org)).*" }
        rego_templates: []

      gdpr_required:
        summary: "EU data subjects in scope"
        pipeline_steps:
          detect_pii: { enabled: true, on_detection: block }
          classify_data: { enabled: true }
        tool_constraints:
          send_email:
            to: { exclude: ["*@*.us", "*@*.cn"] }
        rego_templates: []

      data_residency:
        summary: "Data must remain in a specified region"
        pipeline_steps: {}
        tool_constraints: {}
        rego_templates:
          - block_egress_outside_region:
              allowed_regions: [eu]

      exfiltration_risk:
        summary: "External destinations reachable via tools"
        pipeline_steps:
          detect_exfiltration: { enabled: true, on_detection: block }
        tool_constraints: {}
        rego_templates: []

      recipient_validation:
        summary: "Validate destination domains for outbound messaging"
        pipeline_steps: {}
        tool_constraints:
          send_email:
            to: { match: "^[^@]+@(allowed-domain-1|allowed-domain-2)\\." }
        rego_templates: []

      code_exec_risk:
        summary: "Tool calls that execute code"
        pipeline_steps:
          detect_code_exec: { enabled: true, on_detection: block }
        tool_constraints:
          Bash:
            command: { not_contains: ["rm -rf", "sudo", "eval $", "curl | sh"] }
        rego_templates: []

      audit_required:
        summary: "Regulatory audit trail required"
        pipeline_steps:
          audit_signing: { enabled: true }
        tool_constraints: {}
        rego_templates: []

      access_control_strict:
        summary: "Step-up auth / approval for sensitive actions"
        pipeline_steps:
          require_approval: { enabled: true, on_detection: block }
        tool_constraints: {}
        rego_templates: []

      fraud_risk:
        summary: "High-value transaction patterns"
        pipeline_steps:
          detect_anomaly: { enabled: true, on_detection: notify }
        tool_constraints:
          transfer_funds:
            amount: { max: 10000 }
        rego_templates: []

      ip_protection:
        summary: "Source code / IP must not leak externally"
        pipeline_steps:
          detect_secrets: { enabled: true, on_detection: block }
        tool_constraints:
          Read:
            file_path: { not_contains: ["/repo/", ".env"] }
        rego_templates: []

The catalog is data, not code. A compliance officer with no Python knowledge can add a category or concern by editing YAML.
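Because the catalog is hand-edited by people who may never run the resolver, a small consistency check at load time seems worthwhile. The following is a minimal sketch, assuming the two YAML files have already been parsed into plain dicts; `lint_catalog` and the sample data are hypothetical and only illustrate the idea of catching a typo'd or orphaned concern id.

```python
def lint_catalog(categories: dict, concerns: dict) -> list[str]:
    """Return human-readable problems; an empty list means the catalog is consistent."""
    problems = []
    referenced = set()

    # Every trigger must name a concern that actually exists.
    for cat_id, cat in categories.items():
        for concern_id in cat.get("triggers", []):
            referenced.add(concern_id)
            if concern_id not in concerns:
                problems.append(
                    f"category {cat_id!r} triggers unknown concern {concern_id!r}"
                )

    # A concern nobody triggers is probably a leftover or a misspelling.
    for concern_id in concerns:
        if concern_id not in referenced:
            problems.append(f"concern {concern_id!r} is not triggered by any category")

    return problems


# Deliberately inconsistent sample: data_residency is missing, orphan_concern is unused.
sample_categories = {
    "customer_pii": {"label": "Customer PII", "triggers": ["data_leak", "audit_required"]},
    "eu_residents": {"label": "EU residents", "triggers": ["gdpr_required", "data_residency"]},
}
sample_concerns = {
    "data_leak": {},
    "audit_required": {},
    "gdpr_required": {},
    "orphan_concern": {},
}

problems = lint_catalog(sample_categories, sample_concerns)
for problem in problems:
    print(problem)
```

Running a check like this in CI would let a compliance officer's YAML edit fail fast, before it reaches the resolver.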
The resolver (one function, ~40 lines)
    from dataclasses import dataclass, field
    from typing import Any

    # Catalog, _merge_step_cfg and _merge_constraint live in the same module
    # (the Catalog loader and the merge helpers).


    @dataclass
    class ResolvedPolicy:
        pipeline_steps: dict[str, dict] = field(default_factory=dict)
        tool_constraints: dict[str, dict] = field(default_factory=dict)
        rego_template_instances: list[dict] = field(default_factory=list)
        # Audit trail: which categories / concerns produced each line.
        provenance: dict[str, list[str]] = field(default_factory=dict)


    def resolve(category_ids: list[str], catalog: Catalog) -> ResolvedPolicy:
        """Categories → concerns → mitigations, deduplicated and merged."""
        concerns: set[str] = set()
        cat_to_concerns: dict[str, list[str]] = {}
        for cid in category_ids:
            cat = catalog.categories[cid]
            cat_to_concerns[cid] = cat.triggers
            concerns.update(cat.triggers)

        out = ResolvedPolicy()
        # Sorted so the provenance lists come out in a stable order across runs.
        for concern_id in sorted(concerns):
            c = catalog.concerns[concern_id]

            # Pipeline steps: strictest wins (block > log; lower threshold wins).
            for step, cfg in c.pipeline_steps.items():
                existing = out.pipeline_steps.get(step, {})
                out.pipeline_steps[step] = _merge_step_cfg(existing, cfg)
                out.provenance.setdefault(f"step:{step}", []).append(concern_id)

            # Tool constraints: union per (tool, param).
            for tool, rules in c.tool_constraints.items():
                existing = out.tool_constraints.get(tool, {})
                out.tool_constraints[tool] = _merge_constraint(existing, rules)
                out.provenance.setdefault(f"tool:{tool}", []).append(concern_id)

            # Rego templates: instantiate each, dedup by (template_id, params).
            for tmpl in c.rego_templates:
                if tmpl not in out.rego_template_instances:
                    out.rego_template_instances.append(tmpl)
                out.provenance.setdefault(
                    f"rego:{tmpl.template_id}", []
                ).append(concern_id)

        return out

That's the entire engine. Forty lines. The catalog does the heavy lifting; the resolver is a fold.
Conflict resolution rules (built into _merge_*)
When two concerns disagree on a step or constraint, the stricter wins. No exceptions, and no operator override at this layer. (An operator can still override afterwards by editing the resulting pipeline directly; that is a separate, audited action.)
| Field | Strict-wins rule |
|---|---|
| `enabled` | `True` wins over `False` |
| `on_detection` | `block` > `notify` > `log` |
| `max` (numeric ceiling) | lower wins |
| `min` (numeric floor) | higher wins |
| `contains` (must contain) | union (logical AND across all sources) |
| `not_contains` (must not contain) | union |
| `exclude` / `not_match` | union |
This rule is the opinionated part. It should be documented prominently so operators understand that "ticking 7 boxes never relaxes anything."
The "Internal docs only" exception
This is the one negative assertion in the catalog. It triggers no concerns, meaning a pipeline with only `internal_docs_only` checked gets the slim proxy with no detection enabled.
But the moment the operator also ticks customer_pii (or anything else), the negative is overridden because concerns are additive and strictness wins. So internal_docs_only is effectively a "lower the floor" signal, never an override.
Operator-facing explainability
Once `resolve()` returns a `ResolvedPolicy`, the UI shows a preview before commit:
    Based on what you ticked, we'll enable:

    ✓ detect_pii (block)
        Because: customer_pii, payment_data, eu_residents, health_data
    ✓ detect_secrets (block)
        Because: source_code_secrets, payment_data
    ✓ detect_exfiltration (block)
        Because: external_comms
    ✓ scan_output (block)
        Because: customer_pii …

    ✓ Tool constraint: Bash.command not_contains: ["rm -rf", "sudo", "eval $"]
        Because: source_code_secrets

    ✓ OPA template: block_egress_outside_region (allowed: [eu])
        Because: eu_residents

    13 steps · 6 tool constraints · 2 OPA policies

Every line has a "because" trail. This solves the "why is this step here?" question that operators ask three months later when they want to relax something.
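The provenance map records concerns, while the preview names categories, so the "because" trail needs one hop back through the category → concern mapping. The sketch below assumes that mapping (the `cat_to_concerns` dict built inside `resolve()`) is made available to the renderer; `because` and `render_preview` are hypothetical names, and the sample data is hand-written.

```python
def because(provenance_key: str, provenance: dict, cat_to_concerns: dict) -> list[str]:
    """Categories whose triggered concerns contributed to this line, in tick order."""
    contributing = set(provenance.get(provenance_key, []))
    return [
        category
        for category, triggers in cat_to_concerns.items()
        if contributing.intersection(triggers)
    ]


def render_preview(provenance: dict, cat_to_concerns: dict) -> list[str]:
    """One preview line per provenance entry, each with its category trail."""
    lines = []
    for key in sorted(provenance):
        name = key.split(":", 1)[1]          # drop the "step:" / "tool:" / "rego:" prefix
        categories = because(key, provenance, cat_to_concerns)
        lines.append(f"✓ {name}  Because: {', '.join(categories)}")
    return lines


# Hand-written sample, consistent with the catalog entries shown earlier.
cat_to_concerns = {
    "customer_pii": ["data_leak", "audit_required"],
    "payment_data": ["pci_dss", "fraud_risk", "data_leak", "audit_required"],
    "source_code_secrets": ["secrets_leak", "code_exec_risk", "ip_protection"],
}
provenance = {
    "step:detect_pii": ["data_leak", "pci_dss"],
    "step:detect_secrets": ["ip_protection", "pci_dss", "secrets_leak"],
    "step:detect_anomaly": ["fraud_risk"],
}

preview = render_preview(provenance, cat_to_concerns)
for line in preview:
    print(line)
```

The same two lookups, run in the opposite direction, would answer "what breaks if I untick this box?".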
Forward: NLP → policy (the interview)
The same resolver runs. The LLM never sees Rego, never sees step names. It produces a category set:

    Operator says: "We're a Belgian fintech, our agents send transactional
    emails to EU customers about wire transfers."

    LLM extracts (via tool call to a small classifier):
      - geography: EU (Belgium)
      - industry: fintech
      - data: payment, customer PII
      - tools used: outbound email
      - regulated: yes (PCI, GDPR)

    LLM proposes categories with confidence scores:
      - eu_residents (confident: explicit "EU customers")
      - customer_pii (high: customers + emails)
      - payment_data (high: "wire transfers")
      - external_comms (high: "send emails")
      - source_code_secrets (low: not mentioned, leave unticked)

    UI shows the proposal; operator confirms or flips boxes.
    Same resolve() runs. Same ResolvedPolicy lands.

Why this composes:
- The catalog stays declarative and human-curated
- The LLM is a typed-selection helper from prose → categories (a small, well-bounded task)
- No LLM-generated Rego ever runs (huge safety win — Rego from natural language is dangerous)
- Confidence scores let the UI distinguish "we read this directly" from "we inferred this"
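"Typed selection" can be made concrete: whatever the classifier returns is checked against the known category ids before it reaches the UI. This is a minimal sketch under stated assumptions; `ProposedCategory`, `preselect`, the three confidence labels, and the rule that only "confident" and "high" proposals are pre-ticked are all hypothetical choices, not part of the design above.

```python
from dataclasses import dataclass

# The seven category ids from intent_catalog.yaml.
KNOWN_CATEGORIES = {
    "customer_pii", "payment_data", "source_code_secrets", "internal_docs_only",
    "external_comms", "health_data", "eu_residents",
}
PRETICK_CONFIDENCE = {"confident", "high"}   # assumed threshold for pre-ticking a box


@dataclass(frozen=True)
class ProposedCategory:
    category_id: str
    confidence: str   # "confident", "high", or "low" (hypothetical scale)
    evidence: str     # the phrase the classifier relied on


def preselect(proposals, known=KNOWN_CATEGORIES):
    """Split classifier output into pre-ticked ids and rejected (unknown) ids."""
    ticked, rejected = [], []
    for proposal in proposals:
        if proposal.category_id not in known:
            rejected.append(proposal.category_id)    # never forwarded to resolve()
        elif proposal.confidence in PRETICK_CONFIDENCE:
            ticked.append(proposal.category_id)
    return ticked, rejected


proposals = [
    ProposedCategory("eu_residents", "confident", 'explicit "EU customers"'),
    ProposedCategory("customer_pii", "high", "customers + emails"),
    ProposedCategory("payment_data", "high", '"wire transfers"'),
    ProposedCategory("external_comms", "high", '"send emails"'),
    ProposedCategory("source_code_secrets", "low", "not mentioned"),
    ProposedCategory("crypto_assets", "high", "made-up id, not in the catalog"),
]

ticked, rejected = preselect(proposals)
print(ticked)
print(rejected)
```

The classifier's output space is a closed set of seven ids, so an invented category is dropped at this boundary rather than turned into policy.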
The interview can iterate:

    "I see you ticked health data. Are EU residents in scope?"
    "Will any of these tools post to webhooks or external APIs?"

Each question fills a category cell. Three to five questions usually cover most pipelines.
Catalog as knowledge graph: a follow-up "explain this rule" view can walk backwards — show "detect_pii is enabled because of these concerns, which were triggered by these categories, which the LLM derived from these phrases in the interview." Every decision is traceable to a phrase the operator said.
What ships in the smallest viable cut
Two YAML files, one resolver module, one route change, and one UI panel:

- `tappass/policy/intent_catalog.yaml` (categories, 7 entries)
- `tappass/policy/concerns.yaml` (concerns, 13 entries)
- `tappass/policy/intent_resolver.py` (the 40-line resolve function + merge helpers + a Catalog loader)
- Extend `OnboardAgentRequest` (and / or `PipelineCreate`) with `categories: list[str]`. On submit, run `resolve()`, write the resulting steps + constraints + Rego templates into the per-tenant pipeline, and emit one `policy_intent_applied` audit event with the full provenance map
- Frontend Step 3: render the preview from the resolver's response

The NLP layer is a phase-2: a separate /agents/onboard/from-prose endpoint that takes a string, calls a classifier, returns the proposed category set. Same resolver runs after operator confirms.
What this concept does NOT prescribe
- Specific catalog values: the YAML above is illustrative. Compliance/security folks will tune them.
- The Rego template implementations: those live in `tappass/policy/templates/builtins.py` (already shipped: 3 templates; needs ~5 more to cover the catalog above).
- The classifier model: could be a small fine-tuned local model, an LLM with structured output, or a rules-based parser. Decoupled from this design.
- Whether categories are global or per-org — start global; per-org overrides are a phase-3.
Questions to resolve before we build
- Who owns catalog edits? A compliance role in the dashboard, or YAML in source control? (Recommend: YAML in source for v1; admin UI in v2 with audit on every catalog edit.)
- Do operator overrides get persisted as catalog deltas, or as direct pipeline edits? (Recommend: direct pipeline edits — the catalog stays clean, the operator's override is visible in the pipeline diff.)
- What happens when the catalog changes after agents are onboarded? (Recommend: do nothing automatically — surface a "your pipeline diverges from the current catalog defaults" advisory in the dashboard. Re-applying is an explicit, audited action.)
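The "diverges from the current catalog defaults" advisory from the last question could be computed by re-running the resolver against the new catalog and diffing the result with the stored pipeline. A minimal sketch of the diff step, assuming both sides are plain `{step: config}` dicts; `diff_steps` and the sample data are hypothetical.

```python
def diff_steps(current: dict, catalog_default: dict) -> dict:
    """Compare a tenant's stored steps with what the current catalog would produce."""
    current_ids, default_ids = set(current), set(catalog_default)
    return {
        # In the catalog defaults but absent from the pipeline.
        "missing": sorted(default_ids - current_ids),
        # In the pipeline but no longer produced by the catalog.
        "extra": sorted(current_ids - default_ids),
        # Present on both sides with different configuration.
        "changed": sorted(
            step for step in current_ids & default_ids
            if current[step] != catalog_default[step]
        ),
    }


stored_pipeline = {
    "detect_pii": {"enabled": True, "on_detection": "notify"},   # operator relaxed this
    "audit_signing": {"enabled": True},
}
fresh_defaults = {
    "detect_pii": {"enabled": True, "on_detection": "block"},
    "audit_signing": {"enabled": True},
    "classify_data": {"enabled": True},                          # added to the catalog later
}

advisory = diff_steps(stored_pipeline, fresh_defaults)
print(advisory)
```

The diff only reports; in line with the recommendation above, re-applying the defaults stays an explicit, audited operator action.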