Intent → Policy: a smart, declarative mapping (no if/else trees)

The pipeline-creation Step 2 asks the operator a high-level question:

What data does it handle?
  ☐ Customer PII
  ☐ Payment data
  ☐ Source code & secrets
  ☐ Internal docs only
  ☐ External communications
  ☐ Health data
  ☐ EU residents

We need to translate those checkboxes into:

  • Pipeline steps (which detectors to enable, in what mode)
  • Tool constraints (per-tool parameter rules)
  • OPA Rego policies (instantiated from templates)

A naïve if pii: enable_pii_detector chain doesn't scale: 7 categories × N regulations × M tools × outcomes is combinatorial. Hardcoded mappings calcify the moment a new category, regulation, or tool lands.

```text
Categories        →   Concerns          →   Mitigations
(UI labels)           (risk/reg model)      (steps + constraints + Rego)

"Customer PII"        data_leak             detect_pii (block)
"Payment data"        pci_dss               detect_secrets (block)
"EU residents"        gdpr_required         tool: send_email exclude *.us
                      audit_required        scan_output (block)
…                     …                     …
```

Why two layers, not one direct map:

  1. Dedup is automatic. Customer PII and Health Data both trigger data_leak. With direct mapping, detect_pii would be enabled twice (once per category) and we'd need conflict-resolution logic in the UI. With concerns as a set, dedup falls out for free.

  2. Categories vs. Concerns vs. Mitigations evolve at different rates.

    • Categories change when the product UX is redesigned (every 6 months)
    • Concerns change when regulations land (HIPAA, DORA, EU AI Act — every few years)
    • Mitigations change when detectors / tools / policies ship (every sprint)
    • One layer per rate-of-change.
  3. Composability. New category → one entry pointing at existing concerns. New regulation → one concerns entry pointing at mitigations. New mitigation → registered once, every concern that needs it picks it up.

  4. The NLP-interview future works without modification (see § "Forward: NLP → policy" below).
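The two hops can be sketched in a few lines of Python. The dictionaries are hypothetical, trimmed excerpts of the catalog rather than the real files; because each hop collects into a set, a concern or step shared by several categories comes out once, and a category with no triggers contributes nothing.

```python
# Hypothetical, trimmed excerpts of the two catalog layers (not the real files).
CATEGORY_TRIGGERS: dict[str, list[str]] = {
    "customer_pii": ["data_leak", "audit_required"],
    "health_data": ["hipaa", "data_leak", "access_control_strict", "audit_required"],
    "internal_docs_only": [],
}
CONCERN_STEPS: dict[str, list[str]] = {
    "data_leak": ["detect_pii", "scan_output"],
    "hipaa": ["detect_pii", "classify_data"],
    "access_control_strict": ["require_approval"],
    "audit_required": ["audit_signing"],
}


def concerns_for(categories: list[str]) -> set[str]:
    """Hop 1: categories -> concerns. A set, so shared concerns collapse."""
    return {concern for cat in categories for concern in CATEGORY_TRIGGERS[cat]}


def steps_for(concerns: set[str]) -> set[str]:
    """Hop 2: concerns -> mitigations. Again a set, so shared steps collapse."""
    return {step for concern in concerns for step in CONCERN_STEPS[concern]}
```

Ticking Customer PII and Health data yields `data_leak` once and `detect_pii` once, even though both categories (and two concerns) lead to it.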

Two YAML files, hand-curated by domain experts (compliance + security):

tappass/policy/intent_catalog.yaml

```yaml
categories:
  customer_pii:
    label: "Customer PII"
    hint: "Names, emails, addresses, phone numbers"
    triggers: [data_leak, audit_required]
  payment_data:
    label: "Payment data"
    hint: "Card numbers, IBANs, transaction IDs"
    triggers: [pci_dss, fraud_risk, data_leak, audit_required]
  source_code_secrets:
    label: "Source code & secrets"
    hint: "Code snippets, API keys, credentials"
    triggers: [secrets_leak, code_exec_risk, ip_protection]
  internal_docs_only:
    label: "Internal docs only"
    hint: "No customer or regulated data"
    triggers: []  # No concerns; lowers the floor (see § "Conflict resolution")
  external_comms:
    label: "External communications"
    hint: "Sends email / messages outside the org"
    triggers: [exfiltration_risk, recipient_validation, audit_required]
  health_data:
    label: "Health data"
    hint: "Medical records, diagnoses, test results"
    triggers: [hipaa, data_leak, access_control_strict, audit_required]
  eu_residents:
    label: "EU residents"
    hint: "GDPR applies; EU data residency required"
    triggers: [gdpr_required, data_residency]
```

tappass/policy/concerns.yaml

```yaml
concerns:
  data_leak:
    summary: "Sensitive data leaving the org via any channel"
    pipeline_steps:
      detect_pii: { enabled: true, on_detection: block }
      scan_output: { enabled: true, on_detection: block }
    tool_constraints: {}
    rego_templates:
      - block_tool_when_pii_detected:
          target_tool: send_email
  secrets_leak:
    summary: "API keys / credentials in agent input or output"
    pipeline_steps:
      detect_secrets: { enabled: true, on_detection: block }
    tool_constraints:
      Bash:
        command: { not_contains: ["~/.aws", "~/.ssh/id_", "AWS_SECRET"] }
    rego_templates: []
  pci_dss:
    summary: "PCI-DSS scope: card numbers, transaction IDs"
    pipeline_steps:
      detect_pii: { enabled: true, on_detection: block }  # additive — dedups
      detect_secrets: { enabled: true, on_detection: block }
      classify_data: { enabled: true }
    tool_constraints: {}
    rego_templates: []
  hipaa:
    summary: "Protected Health Information"
    pipeline_steps:
      detect_pii: { enabled: true, on_detection: block }  # phi reuses pii detector for now
      classify_data: { enabled: true }
    tool_constraints:
      send_email:
        to: { exclude_pattern: "^(?!.*@(acme|hospital)\\.(com|org)).*" }
    rego_templates: []
  gdpr_required:
    summary: "EU data subjects in scope"
    pipeline_steps:
      detect_pii: { enabled: true, on_detection: block }
      classify_data: { enabled: true }
    tool_constraints:
      send_email:
        to: { exclude: ["*@*.us", "*@*.cn"] }
    rego_templates: []
  data_residency:
    summary: "Data must remain in a specified region"
    pipeline_steps: {}
    tool_constraints: {}
    rego_templates:
      - block_egress_outside_region:
          allowed_regions: [eu]
  exfiltration_risk:
    summary: "External destinations reachable via tools"
    pipeline_steps:
      detect_exfiltration: { enabled: true, on_detection: block }
    tool_constraints: {}
    rego_templates: []
  recipient_validation:
    summary: "Validate destination domains for outbound messaging"
    pipeline_steps: {}
    tool_constraints:
      send_email:
        to: { match: "^[^@]+@(allowed-domain-1|allowed-domain-2)\\." }
    rego_templates: []
  code_exec_risk:
    summary: "Tool calls that execute code"
    pipeline_steps:
      detect_code_exec: { enabled: true, on_detection: block }
    tool_constraints:
      Bash:
        command: { not_contains: ["rm -rf", "sudo", "eval $", "curl | sh"] }
    rego_templates: []
  audit_required:
    summary: "Regulatory audit trail required"
    pipeline_steps:
      audit_signing: { enabled: true }
    tool_constraints: {}
    rego_templates: []
  access_control_strict:
    summary: "Step-up auth / approval for sensitive actions"
    pipeline_steps:
      require_approval: { enabled: true, on_detection: block }
    tool_constraints: {}
    rego_templates: []
  fraud_risk:
    summary: "High-value transaction patterns"
    pipeline_steps:
      detect_anomaly: { enabled: true, on_detection: notify }
    tool_constraints:
      transfer_funds:
        amount: { max: 10000 }
    rego_templates: []
  ip_protection:
    summary: "Source code / IP must not leak externally"
    pipeline_steps:
      detect_secrets: { enabled: true, on_detection: block }
    tool_constraints:
      Read:
        file_path: { not_contains: ["/repo/", ".env"] }
    rego_templates: []
```

The catalog is data, not code. A compliance officer with no Python knowledge can add a category or concern by editing YAML.
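A minimal loader sketch for the two files, assuming they have already been parsed into plain dicts (for example with PyYAML's `yaml.safe_load`). The dataclass names (`Category`, `Concern`, `RegoTemplate`, `Catalog`) and the dangling-trigger check are illustrative assumptions, not shipped code. The check exists because a typo in a category's `triggers` would otherwise surface only as a `KeyError` at resolve time, long after the compliance officer's edit was merged.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RegoTemplate:
    template_id: str
    params: dict[str, Any] = field(default_factory=dict)


@dataclass
class Category:
    label: str
    hint: str
    triggers: list[str]


@dataclass
class Concern:
    summary: str
    pipeline_steps: dict[str, dict] = field(default_factory=dict)
    tool_constraints: dict[str, dict] = field(default_factory=dict)
    rego_templates: list[RegoTemplate] = field(default_factory=list)


@dataclass
class Catalog:
    categories: dict[str, Category]
    concerns: dict[str, Concern]


def _parse_template(entry: dict[str, Any]) -> RegoTemplate:
    # Each YAML list item is a one-key mapping: {template_id: {param: value}}.
    if len(entry) != 1:
        raise ValueError(f"template entry must have exactly one key: {entry!r}")
    [(template_id, params)] = entry.items()
    return RegoTemplate(template_id, dict(params or {}))


def load_catalog(categories_doc: dict[str, Any], concerns_doc: dict[str, Any]) -> Catalog:
    """Build a Catalog from the parsed categories and concerns documents."""
    concerns = {
        cid: Concern(
            summary=raw["summary"],
            pipeline_steps=raw.get("pipeline_steps") or {},
            tool_constraints=raw.get("tool_constraints") or {},
            rego_templates=[_parse_template(t) for t in raw.get("rego_templates") or []],
        )
        for cid, raw in concerns_doc["concerns"].items()
    }
    categories = {
        cid: Category(raw["label"], raw.get("hint", ""), list(raw.get("triggers") or []))
        for cid, raw in categories_doc["categories"].items()
    }
    # Fail fast on typos: every trigger must name a defined concern.
    for cid, cat in categories.items():
        unknown = [t for t in cat.triggers if t not in concerns]
        if unknown:
            raise ValueError(f"category {cid!r} triggers unknown concerns: {unknown}")
    return Catalog(categories, concerns)
```

Running this in CI against every catalog change turns "edit YAML" into "edit YAML and get a failing check if a trigger points nowhere".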

tappass/policy/intent_resolver.py

```python
from __future__ import annotations  # Catalog is defined by the catalog loader

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ResolvedPolicy:
    pipeline_steps: dict[str, dict] = field(default_factory=dict)
    tool_constraints: dict[str, dict] = field(default_factory=dict)
    # Template objects from the catalog (template_id + params), deduplicated.
    rego_template_instances: list[Any] = field(default_factory=list)
    # Audit trail: which concerns produced each line, keyed e.g. "step:detect_pii".
    provenance: dict[str, list[str]] = field(default_factory=dict)
    # Which concerns each ticked category triggered, to walk provenance back to categories.
    category_concerns: dict[str, list[str]] = field(default_factory=dict)


def resolve(category_ids: list[str], catalog: Catalog) -> ResolvedPolicy:
    """Categories → concerns → mitigations, deduplicated and merged.

    _merge_step_cfg and _merge_constraint are the merge helpers in this module
    that implement the conflict-resolution rules.
    """
    concerns: set[str] = set()
    cat_to_concerns: dict[str, list[str]] = {}
    for cid in category_ids:
        cat = catalog.categories[cid]
        cat_to_concerns[cid] = list(cat.triggers)
        concerns.update(cat.triggers)  # a set: shared concerns collapse here
    out = ResolvedPolicy(category_concerns=cat_to_concerns)
    # Sorted so provenance and template order don't depend on set iteration order.
    for concern_id in sorted(concerns):
        c = catalog.concerns[concern_id]
        # Pipeline steps: strictest wins (block > notify > log; lower threshold wins).
        for step, cfg in c.pipeline_steps.items():
            existing = out.pipeline_steps.get(step, {})
            out.pipeline_steps[step] = _merge_step_cfg(existing, cfg)
            out.provenance.setdefault(f"step:{step}", []).append(concern_id)
        # Tool constraints: union per (tool, param), strict merge per rule.
        for tool, rules in c.tool_constraints.items():
            existing = out.tool_constraints.get(tool, {})
            out.tool_constraints[tool] = _merge_constraint(existing, rules)
            out.provenance.setdefault(f"tool:{tool}", []).append(concern_id)
        # Rego templates: dedup by (template_id, params) via equality.
        for tmpl in c.rego_templates:
            if tmpl not in out.rego_template_instances:
                out.rego_template_instances.append(tmpl)
            # Record provenance even for duplicates, so every concern is credited.
            out.provenance.setdefault(f"rego:{tmpl.template_id}", []).append(concern_id)
    return out
```

That's the entire engine. Forty lines. The catalog does the heavy lifting; the resolver is a fold.

Conflict resolution rules (built into _merge_*)

When two concerns disagree on a step or constraint, the stricter wins. No exceptions, and no operator override at this layer: the operator can override afterwards by editing the resulting pipeline directly, which is a separate, audited action.

| Field | Strict-wins rule |
|---|---|
| enabled | True wins over False |
| on_detection | block > notify > log |
| max (numeric ceiling) | lower wins |
| min (numeric floor) | higher wins |
| contains (must contain) | union (logical AND across all sources) |
| not_contains (must not contain) | union |
| exclude / not_match | union |

This rule is the opinionated part. It is documented prominently so operators understand that ticking 7 boxes never relaxes anything.

internal_docs_only is the one negative assertion in the catalog. It triggers no concerns, so a pipeline with only internal_docs_only checked gets the slim proxy with no detection enabled.

But the moment the operator also ticks customer_pii (or anything else), the negative is overridden because concerns are additive and strictness wins. So internal_docs_only is effectively a "lower the floor" signal, never an override.

Once resolve() returns a ResolvedPolicy, the UI shows a preview before commit:

```text
Based on what you ticked, we'll enable:

  ✓ detect_pii (block)
      Because: customer_pii, payment_data, eu_residents, health_data
  ✓ detect_secrets (block)
      Because: source_code_secrets, payment_data
  ✓ detect_exfiltration (block)
      Because: external_comms
  ✓ scan_output (block)
      Because: customer_pii, payment_data, health_data
  ✓ Tool constraint: Bash.command not_contains: ["~/.aws", "~/.ssh/id_", "AWS_SECRET", "rm -rf", "sudo", "eval $", "curl | sh"]
      Because: source_code_secrets
  ✓ OPA template: block_egress_outside_region (allowed: [eu])
      Because: eu_residents

  13 steps · 6 tool constraints · 2 OPA policies
```

Every line has a because trail. This solves the "why is this step here?" question that operators ask three months later when they want to relax something.
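A sketch of how the preview could derive its Because lines. It assumes `resolve()` hands back, next to `provenance` (which records concerns per line), the category → concerns map that the resolver listing builds as `cat_to_concerns`; the function names `because` and `render_steps` are hypothetical. A category is credited for a line when any concern it triggered contributed to that line.

```python
def because(provenance: dict[str, list[str]],
            cat_to_concerns: dict[str, list[str]]) -> dict[str, list[str]]:
    """For each provenance key (e.g. "step:detect_pii"), list the ticked categories behind it."""
    trail: dict[str, list[str]] = {}
    for line_key, concerns in provenance.items():
        sources = set(concerns)
        # Dict order is the order the operator ticked the categories.
        trail[line_key] = [cat for cat, triggers in cat_to_concerns.items()
                           if sources & set(triggers)]
    return trail


def render_steps(steps: dict[str, dict], trail: dict[str, list[str]]) -> list[str]:
    """Render the step part of the preview: one ✓ line and one Because line per step."""
    lines: list[str] = []
    for step, cfg in steps.items():
        mode = f" ({cfg['on_detection']})" if "on_detection" in cfg else ""
        lines.append(f"✓ {step}{mode}")
        lines.append("    Because: " + ", ".join(trail.get(f"step:{step}", [])))
    return lines
```

Crediting through concerns, rather than storing categories per line, keeps the resolver's provenance small and lets the same data answer both "which concern?" and "which checkbox?".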

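Forward: NLP → policy

When the operator describes the agent in prose instead of ticking boxes, the same resolver runs. The LLM never sees Rego and never sees step names; it only produces a category set: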

```text
Operator says: "We're a Belgian fintech, our agents send transactional
                emails to EU customers about wire transfers."

LLM extracts (via tool call to a small classifier):
  - geography: EU (Belgium)
  - industry: fintech
  - data: payment, customer PII
  - tools used: outbound email
  - regulated: yes (PCI, GDPR)

LLM proposes categories with confidence scores:
  - eu_residents        (confident: explicit "EU customers")
  - customer_pii        (high: customers + emails)
  - payment_data        (high: "wire transfers")
  - external_comms      (high: "send emails")
  - source_code_secrets (low: not mentioned, leave unticked)

UI shows the proposal — operator confirms or flips boxes.
Same resolve() runs. Same ResolvedPolicy lands.
```

Why this composes:

  • The catalog stays declarative and human-curated
  • The LLM is a typed-selection helper from prose → categories (a small, well-bounded task)
  • No LLM-generated Rego ever runs (huge safety win — Rego from natural language is dangerous)
  • Confidence scores let the UI distinguish "we read this directly" from "we inferred this"
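One way to keep the LLM a typed-selection helper is to validate its output against the catalog before anything reaches the UI. This sketch assumes the classifier returns a numeric confidence per proposed category; the `Proposal` type, the 0.7 pre-tick threshold, and `to_checkboxes` are all hypothetical. Ids outside the catalog are dropped, so the worst a bad proposal can do is pre-tick or leave unticked a real checkbox, which the operator sees and can flip.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    category_id: str   # only counts if it is one of the catalog's category ids
    confidence: float  # hypothetical 0.0-1.0 score from the classifier


def to_checkboxes(proposals: list[Proposal], known_ids: set[str],
                  pre_tick_at: float = 0.7) -> dict[str, bool]:
    """Turn classifier output into initial checkbox state for the operator to confirm."""
    boxes = {cid: False for cid in sorted(known_ids)}
    for p in proposals:
        if p.category_id not in known_ids:
            continue  # anything outside the catalog is dropped, never acted on
        boxes[p.category_id] = boxes[p.category_id] or p.confidence >= pre_tick_at
    return boxes
```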

The interview can iterate:

  • "I see you ticked health data — are EU residents in scope?"
  • "Will any of these tools post to webhooks or external APIs?"

Each question fills a category cell. Three to five questions usually cover most pipelines.

Catalog as knowledge graph: a follow-up "explain this rule" view can walk backwards — show "detect_pii is enabled because of these concerns, which were triggered by these categories, which the LLM derived from these phrases in the interview." Every decision is traceable to a phrase the operator said.

Three files + one route + one UI panel:

  1. tappass/policy/intent_catalog.yaml (categories — 7 entries)
  2. tappass/policy/concerns.yaml (concerns — 13 entries)
  3. tappass/policy/intent_resolver.py (the 40-line resolve function + merge helpers + a Catalog loader)
  4. Extend OnboardAgentRequest (and / or PipelineCreate) with categories: list[str]. On submit, run resolve(), write the resulting steps + constraints + Rego templates into the per-tenant pipeline + emit one policy_intent_applied audit event with the full provenance map
  5. Frontend Step 3: render the preview from the resolver's response
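For item 4, a sketch of the `policy_intent_applied` payload, assuming a flat dict shape; the field names and the `tenant_id` parameter are hypothetical. Sorting categories and provenance and serializing with `sort_keys=True` makes two applications of the same intent produce byte-identical events, which keeps audit diffs meaningful.

```python
import json
from typing import Any


def policy_intent_applied_event(tenant_id: str, categories: list[str],
                                provenance: dict[str, list[str]]) -> dict[str, Any]:
    """Build the single audit event emitted when an intent is applied (hypothetical shape)."""
    return {
        "type": "policy_intent_applied",
        "tenant_id": tenant_id,
        "categories": sorted(categories),
        # Full provenance map: line key -> concerns that produced it.
        "provenance": {key: sorted(srcs) for key, srcs in sorted(provenance.items())},
    }


def canonical_json(event: dict[str, Any]) -> str:
    # Stable key order and separators: same intent in, same bytes out.
    return json.dumps(event, sort_keys=True, separators=(",", ":"))
```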

The NLP layer is phase 2: a separate /agents/onboard/from-prose endpoint that takes a string, calls a classifier, and returns the proposed category set. The same resolver runs after the operator confirms.

Out of scope for this design:

  • Specific catalog values — the YAML above is illustrative. Compliance/security folks will tune them.
  • The Rego template implementations — those live in tappass/policy/templates/builtins.py (already shipped: 3 templates; needs ~5 more to cover the catalog above).
  • The classifier model — could be a small fine-tuned local model, an LLM with structured output, or a rules-based parser. Decoupled from this design.
  • Whether categories are global or per-org — start global; per-org overrides are phase 3.
Open questions:

  1. Who owns catalog edits? A compliance role in the dashboard, or YAML in source control? (Recommend: YAML in source for v1; admin UI in v2 with audit on every catalog edit.)
  2. Do operator overrides get persisted as catalog deltas, or as direct pipeline edits? (Recommend: direct pipeline edits — the catalog stays clean, the operator's override is visible in the pipeline diff.)
  3. What happens when the catalog changes after agents are onboarded? (Recommend: do nothing automatically — surface a "your pipeline diverges from the current catalog defaults" advisory in the dashboard. Re-applying is an explicit, audited action.)
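For question 3, the recommended advisory needs only a read-only comparison between what is deployed and what `resolve()` would produce from today's catalog. A sketch, assuming both sides are available as step → config dicts; `diverges` is a hypothetical name, and nothing here re-applies anything.

```python
from typing import Any, Optional

Cfg = Optional[dict[str, Any]]


def diverges(deployed: dict[str, dict[str, Any]],
             catalog_default: dict[str, dict[str, Any]]) -> dict[str, tuple[Cfg, Cfg]]:
    """Per-step differences between the deployed pipeline and today's catalog defaults.

    Returns {step: (deployed_cfg, default_cfg)}; None marks a step absent on that side.
    Read-only: the result feeds an advisory and never re-applies anything.
    """
    diff: dict[str, tuple[Cfg, Cfg]] = {}
    for step in sorted(deployed.keys() | catalog_default.keys()):
        have, want = deployed.get(step), catalog_default.get(step)
        if have != want:
            diff[step] = (have, want)
    return diff
```

An empty result means no advisory; a non-empty one lists exactly what an explicit, audited re-apply would change, including operator overrides it would undo.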