Pre-deployment evaluator
What it does: Runs 50+ adversarial probes against the agent before it ships and produces a quantitative pass/fail report against the policy.
1. Vision context
The architecture's runtime spine governs what the agent does once deployed. But the procurement gate question, "is the agent safe to ship?", needs a pre-deployment answer. This is what Giskard centers on, what Enoki includes, and what TapPass needs to be competitive on.
The evaluator runs adversarial probes against the agent under the candidate policy. Probes test for prompt injection, data disclosure, excessive agency, over-refusal, jailbreaks, tool misuse, and compliance-specific failures. The output is a quantitative report the operator gates the release on (or declines to gate on, shipping anyway with eyes open).
Crucially: the evaluator uses the same tappass-agent SDK and the same keyring derivation as production. There is no divergence between what we tested and what we deployed. See architecture §8 for full design.
2. Functional specification
CLI: tappass eval run --agent <pkg> --policy <id> --packs <list> --probe-suite <ver> [--gate fail-on=critical]
Behavior:
- Derive a temporary SandboxConfig for the candidate policy (calls policy-to-sandbox-config-builder with mode=evaluation).
- Spawn an ephemeral sandbox with that config.
- For each probe in the suite, drive the agent through the probe scenario.
- Collect: did the agent emit denied tools? did detections fire? did loop_guard trigger? did the agent leak PII? did it follow prompt-injection bait? did it over-refuse?
- Aggregate per-probe pass/fail; compute pack-level scores.
- Emit JUnit XML + TapPass trace bundle.
- Optionally exit non-zero based on --gate.
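The aggregation and gating steps above can be sketched as follows. This is a minimal illustration, not the actual tappass implementation: the ProbeResult fields, the severity ladder, and the reading of fail-on=critical as "fail if any failed probe is at or above that severity" are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical severity ladder; the spec only names "critical".
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass(frozen=True)
class ProbeResult:
    """One probe outcome (field names are illustrative assumptions)."""
    probe_id: str
    pack: str
    severity: str
    passed: bool


def pack_scores(results):
    """Return {pack: fraction of that pack's probes that passed}."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for r in results:
        totals[r.pack] += 1
        passes[r.pack] += int(r.passed)
    return {pack: passes[pack] / totals[pack] for pack in totals}


def gate_exit_code(results, fail_on=None):
    """Return 1 if a failed probe meets the fail_on severity, else 0.

    With fail_on=None the run is advisory and always returns 0.
    """
    if fail_on is None:
        return 0
    threshold = SEVERITY_ORDER[fail_on]
    for r in results:
        if not r.passed and SEVERITY_ORDER[r.severity] >= threshold:
            return 1
    return 0


results = [
    ProbeResult("inj-001", "prompt-injection", "critical", True),
    ProbeResult("inj-002", "prompt-injection", "high", False),
    ProbeResult("pii-001", "data-disclosure", "critical", True),
]
print(pack_scores(results))
print(gate_exit_code(results, fail_on="critical"))
```

Keeping scoring separate from gating means the same results can feed both the report and the exit code, and an advisory run simply omits the gate.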
Output: structured artifacts persisted to the eval_runs table; dashboard surface for review.
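As a rough illustration of the JUnit XML artifact, the sketch below serializes per-probe outcomes with Python's standard xml.etree.ElementTree. Mapping a probe pack to the testcase classname and a probe id to the testcase name is an assumption; the spec does not define the exact layout.

```python
import xml.etree.ElementTree as ET


def to_junit_xml(suite_name, probe_outcomes):
    """Serialize probe outcomes as a JUnit-style <testsuite> string.

    probe_outcomes is a list of (probe_id, pack, failure_message) tuples,
    where failure_message is None for a passing probe (assumed shape).
    """
    failures = sum(1 for _, _, msg in probe_outcomes if msg is not None)
    suite = ET.Element(
        "testsuite",
        name=suite_name,
        tests=str(len(probe_outcomes)),
        failures=str(failures),
    )
    for probe_id, pack, msg in probe_outcomes:
        case = ET.SubElement(suite, "testcase", classname=pack, name=probe_id)
        if msg is not None:
            # A <failure> child marks the testcase as failed for CI tooling.
            failure = ET.SubElement(case, "failure", message=msg)
            failure.text = msg
    return ET.tostring(suite, encoding="unicode")


report = to_junit_xml(
    "core-v1",
    [
        ("inj-001", "prompt-injection", None),
        ("pii-001", "data-disclosure", "agent leaked PII"),
    ],
)
print(report)
```

Most CI systems render JUnit XML natively, so failed probes can appear alongside ordinary test failures without a custom viewer.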
3. Technical design
Lives at tappass/eval/. The driver shells out to the agent package's run-task CLI per probe, collects audit events from the running ephemeral sandbox, and correlates them by trace id.
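The trace-id correlation could look something like the sketch below. The event and run dictionaries, and their trace_id and probe_id keys, are hypothetical shapes chosen for illustration; the real audit event schema is not specified here.

```python
from collections import defaultdict


def correlate_by_trace(probe_runs, audit_events):
    """Group audit events under the probe whose run shares their trace id.

    Returns (events_by_probe, unmatched_events). Events with a missing or
    unknown trace id are kept in unmatched_events rather than dropped.
    """
    trace_to_probe = {run["trace_id"]: run["probe_id"] for run in probe_runs}
    grouped = defaultdict(list)
    unmatched = []
    for event in audit_events:
        probe_id = trace_to_probe.get(event.get("trace_id"))
        if probe_id is None:
            unmatched.append(event)
        else:
            grouped[probe_id].append(event)
    return dict(grouped), unmatched


probe_runs = [
    {"probe_id": "inj-001", "trace_id": "t-1"},
    {"probe_id": "pii-001", "trace_id": "t-2"},
]
audit_events = [
    {"trace_id": "t-1", "type": "tool_denied"},
    {"trace_id": "t-2", "type": "detection_fired"},
    {"trace_id": "t-1", "type": "loop_guard"},
    {"trace_id": "t-9", "type": "tool_denied"},
]
by_probe, unmatched = correlate_by_trace(probe_runs, audit_events)
print(by_probe)
print(unmatched)
```

Surfacing unmatched events separately is a deliberate choice in this sketch: an event with an unknown trace id may indicate activity outside any probe scenario, which is worth reviewing.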
4. Definition of done
- All acceptance_criteria pass.
- Evaluation against collibra-reference-agent produces a complete report.
- CI integration: GitHub Action template provided that gates merge on --gate fail-on=critical.
- Evaluation history queryable per agent / per policy version.
5. Coordination notes
With policy-to-sandbox-config-builder: we call derive(mode=evaluation), a pure function with no persist and no audit emit. The builder must support this mode.
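One way to picture the contract with the builder is the sketch below. It only illustrates "evaluation mode has no side effects". The SandboxConfig fields, the policy dictionary shape, the "production" mode name, and the persist / emit_audit hooks are all hypothetical and not the builder's real interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxConfig:
    """Illustrative config; the real fields are defined by the builder."""
    policy_id: str
    allowed_tools: tuple
    ephemeral: bool


def derive(policy, mode, persist=None, emit_audit=None):
    """Derive a SandboxConfig from a policy.

    In evaluation mode the function is pure: the result depends only on the
    inputs, and the persist / emit_audit hooks are never called.
    """
    config = SandboxConfig(
        policy_id=policy["id"],
        allowed_tools=tuple(sorted(policy["allowed_tools"])),
        ephemeral=(mode == "evaluation"),
    )
    if mode == "evaluation":
        return config
    # Any other mode (assumed here to be "production") performs side effects.
    if persist is not None:
        persist(config)
    if emit_audit is not None:
        emit_audit({"type": "sandbox_config_derived", "policy_id": config.policy_id})
    return config


policy = {"id": "pol-42", "allowed_tools": ["search", "read"]}
persisted, audited = [], []
eval_config = derive(policy, "evaluation", persisted.append, audited.append)
print(eval_config, persisted, audited)
```

Because the derivation logic is shared between modes, the evaluated config differs from the deployed one only in the fields the mode explicitly controls, which is what keeps "what we tested" aligned with "what we deployed".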
With agent-client-sdk: evaluator drives the agent via its own CLI; agent uses the SDK normally; we observe externally.
With behavior-drift-monitor: evaluation runs establish the baseline tool-call distribution; the drift monitor compares production reality against this.
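To make "baseline tool-call distribution" concrete, the sketch below normalizes tool-call counts into a distribution and compares two distributions with total variation distance. The choice of metric is an assumption for illustration; the drift monitor may use a different comparison.

```python
from collections import Counter


def tool_call_distribution(tool_calls):
    """Return {tool_name: relative frequency} for a list of tool names."""
    counts = Counter(tool_calls)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tool: n / total for tool, n in counts.items()}


def total_variation_distance(p, q):
    """Half the L1 distance between two distributions (0 = same, 1 = disjoint)."""
    tools = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in tools)


# Baseline from an evaluation run versus calls observed in production.
baseline = tool_call_distribution(["search", "search", "read", "write"])
production = tool_call_distribution(["search", "read", "read", "read"])
print(baseline)
print(total_variation_distance(baseline, production))
```

A monitor built this way would alert when the distance exceeds a threshold chosen by the operator; the threshold itself is outside what this section specifies.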
Open questions:
- (Q) Should evaluation be a gate (block deploy on failure) or advisory (report and let operator decide)? Lean: configurable per cascade level. Org admins can require eval-pass at org floor for high-risk functions; advisory by default.
- (Q) Probe-author SDK for customers who want to add custom probes? Lean: ship core suite v1; custom-probe SDK in v2.
6. Out of scope
- Quality / hallucination benchmarking beyond what's needed for policy conformance: separate concern. This evaluator checks whether hallucinations cross policy boundaries, not whether they're factually accurate.
- Model fine-tuning to fix probe failures — orthogonal.
- Continuous in-production red-teaming — would extend this component into a runtime mode; v2.