Whitepaper & Engine Foundations
Prompt Gate DLP engine — design and mathematics. Revision 2026-05-31.
This document describes the detection engine precisely enough to reason about and reproduce its behaviour. Every measured figure is cross-linked to the Security and Performance reports, which regenerate the numbers from tests.
1. Problem
Pattern-based DLP fails in two directions at once. Naïve regex over-fires on documentation keys and placeholders, training users to dismiss warnings; tuned-down regex misses obfuscated secrets. Prompt Gate is engineered precision-first: keep the false-positive rate at zero on a realistic negative corpus, then raise recall by adding patterns, never the reverse.
The engine is a pure function of its input,

$$\text{verdict} = f(\text{content}),$$

with no I/O and no per-event persistence (§8). This purity is what makes it testable, fuzzable, and privacy-safe.
2. Pipeline
Content flows through five stages. Each stage can only reduce the chance of a false block relative to raw regex:
content
   │  C0  Normalization            canonicalize evasion forms
   ▼
   │  C1  Public-example filter    suppress known-safe sample values
   ▼
   │  C2  Aho-Corasick prematch    single-pass literal hotword scan
   ▼
   │  C3  Pattern match + entropy  regex on survivors, Shannon entropy
   ▼
   │  C4  Scoring + threshold      weighted evidence vs severity bar
   ▼
verdict
3. C0 — Normalization
Let N(·) be the normalizer. It folds homoglyphs to their ASCII skeleton,
strips zero-width / format characters (U+200B, U+2060, …), lowercases, and
expands base64 segments. The key property is idempotence:

$$N(N(s)) = N(s) \quad \text{for all } s,$$

so matching on N(s) is stable, and an attacker cannot gain anything by
pre-applying any transform the normalizer already inverts. This is why the
obfuscated_must_trigger corpus scores 6/6 (homoglyph А→A, zero-width
splits, base64-wrapped keys all collapse to the same skeleton before matching).
Cost: 158 ns (ASCII) to 765 ns (homoglyph fold) per call — see
Performance.
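The idempotence property can be checked on a toy normalizer. The following is a minimal sketch, not the engine's implementation: the `homoglyphs` table and the `Normalize` name are hypothetical, the fold table covers only a handful of Cyrillic look-alikes, and base64 expansion is left out entirely.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// homoglyphs is a tiny illustrative fold table (an assumption for this
// sketch; a production table would be far larger). Keys are Cyrillic
// look-alikes, values are the ASCII letters they imitate.
var homoglyphs = map[rune]rune{
	'\u0410': 'A', '\u0430': 'a', // Cyrillic A / a
	'\u0415': 'E', '\u0435': 'e', // Cyrillic IE / ie
	'\u041E': 'O', '\u043E': 'o', // Cyrillic O / o
}

// Normalize drops Unicode format characters (category Cf, which includes
// U+200B and U+2060), folds the known homoglyphs to ASCII, and lowercases.
// Base64 expansion is deliberately omitted from this sketch.
func Normalize(s string) string {
	var b strings.Builder
	b.Grow(len(s))
	for _, r := range s {
		if unicode.Is(unicode.Cf, r) {
			continue // zero-width / format character: strip
		}
		if ascii, ok := homoglyphs[r]; ok {
			r = ascii
		}
		b.WriteRune(unicode.ToLower(r))
	}
	return b.String()
}

func main() {
	// Cyrillic U+0410 in place of the leading "A", plus a zero-width space.
	obfuscated := "\u0410KIA\u200bIOSFODNN7EXAMPLE"
	once := Normalize(obfuscated)
	twice := Normalize(once)
	fmt.Println(once)          // akiaiosfodnn7example
	fmt.Println(once == twice) // true
}
```

Because the output of one pass contains no format characters, no table keys, and no uppercase letters, a second pass has nothing left to change, which is the fixed-point behaviour the text describes.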
4. C1 — Public-example suppression
A curated set H holds SHA-256 digests of well-known public sample values
(AWS docs key AKIAIOSFODNN7EXAMPLE, Stripe test keys, PCI test PANs, RFC 4122
documentation UUIDs, the jwt.io canonical token):
H = { SHA256(norm(v)) : v ∈ public examples }
isPublicExample(x) = SHA256(norm(x)) ∈ H // O(1) average
norm(·) strips spaces, dashes, and underscores and lowercases, so
4111 1111 1111 1111, 4111-1111-1111-1111, and 4111111111111111 collide to
one digest. This is an exact-membership set — only digests are stored, never
the raw values — so it has zero false positives by construction (no
probabilistic Bloom error term). It is the layer that keeps documentation keys
out of the block path, contributing directly to the measured 0 % FP rate.
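The exact-membership check can be sketched in a few lines of Go. This is an illustration under assumptions: the `PublicExamples` type and its constructor are hypothetical names, and the two seeded values are just the examples quoted in this section, not the engine's curated list.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"strings"
)

// separators removes spaces, dashes, and underscores.
var separators = strings.NewReplacer(" ", "", "-", "", "_", "")

// norm strips separators and lowercases, as described for norm(·).
func norm(v string) string {
	return strings.ToLower(separators.Replace(v))
}

// PublicExamples holds only SHA-256 digests, never the raw sample values.
type PublicExamples map[[sha256.Size]byte]struct{}

// NewPublicExamples digests each normalized value into the set.
func NewPublicExamples(values ...string) PublicExamples {
	h := make(PublicExamples, len(values))
	for _, v := range values {
		h[sha256.Sum256([]byte(norm(v)))] = struct{}{}
	}
	return h
}

// IsPublicExample is an exact map lookup: O(1) on average and with no
// probabilistic error term.
func (h PublicExamples) IsPublicExample(x string) bool {
	_, ok := h[sha256.Sum256([]byte(norm(x)))]
	return ok
}

func main() {
	h := NewPublicExamples("AKIAIOSFODNN7EXAMPLE", "4111 1111 1111 1111")
	fmt.Println(h.IsPublicExample("4111-1111-1111-1111")) // true
	fmt.Println(h.IsPublicExample("4111111111111112"))    // false
}
```

A Go map keyed by the fixed-size digest array gives the exact-set semantics directly; a value is suppressed only if its normalized form hashes to a stored digest.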
5. C2/C3 — Matching and entropy
Aho-Corasick. Literal hotwords across all patterns are compiled once into a single automaton (24.4 µs build, amortized at load). Scanning is one pass over the content:

$$T_{\text{scan}} = O(n + z),$$

where n is the content length in bytes and z is the number of hotword occurrences reported, i.e. independent of the number of patterns at scan time: adding patterns grows the automaton, not the per-scan cost. This is why 165 patterns scan in 15.9 µs.
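To make the single-pass property concrete, here is a compact textbook Aho-Corasick matcher. It is a sketch for illustration only: the `Automaton`, `Build`, and `Scan` names are hypothetical, transitions use maps rather than a tuned table, and the hotwords in `main` are made up.

```go
package main

import "fmt"

// node is one automaton state: goto edges, a failure link, and the indexes
// of the hotwords that end at this state (directly or via failure links).
type node struct {
	next map[byte]int
	fail int
	out  []int
}

// Automaton is a byte-level Aho-Corasick matcher.
type Automaton struct{ nodes []node }

// Build inserts every hotword into a trie, then adds failure links with a
// breadth-first pass so shallower states are always finished first.
func Build(words []string) *Automaton {
	a := &Automaton{nodes: []node{{next: map[byte]int{}}}}
	for i, w := range words {
		cur := 0
		for j := 0; j < len(w); j++ {
			nxt, ok := a.nodes[cur].next[w[j]]
			if !ok {
				nxt = len(a.nodes)
				a.nodes = append(a.nodes, node{next: map[byte]int{}})
				a.nodes[cur].next[w[j]] = nxt
			}
			cur = nxt
		}
		a.nodes[cur].out = append(a.nodes[cur].out, i)
	}
	var queue []int
	for _, child := range a.nodes[0].next {
		queue = append(queue, child) // depth-1 states fail to the root (0)
	}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for c, child := range a.nodes[cur].next {
			f := a.nodes[cur].fail
			for f != 0 {
				if _, ok := a.nodes[f].next[c]; ok {
					break
				}
				f = a.nodes[f].fail
			}
			if t, ok := a.nodes[f].next[c]; ok {
				f = t
			}
			a.nodes[child].fail = f
			// Inherit matches that end at the failure target.
			a.nodes[child].out = append(a.nodes[child].out, a.nodes[f].out...)
			queue = append(queue, child)
		}
	}
	return a
}

// Scan walks content once and returns the index of each hotword occurrence.
func (a *Automaton) Scan(content []byte) []int {
	var hits []int
	state := 0
	for _, c := range content {
		for state != 0 {
			if _, ok := a.nodes[state].next[c]; ok {
				break
			}
			state = a.nodes[state].fail
		}
		if t, ok := a.nodes[state].next[c]; ok {
			state = t
		}
		hits = append(hits, a.nodes[state].out...)
	}
	return hits
}

func main() {
	a := Build([]string{"akia", "sk_live", "kia"})
	fmt.Println(a.Scan([]byte("key=akiaiosfodnn7example")))
}
```

The scan loop never iterates over the hotword list; the only per-byte work is following goto and failure edges, which is where the independence from pattern count comes from.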
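Shannon entropy. For survivors, the engine measures byte-level entropy to distinguish high-randomness secrets (API keys, tokens) from prose:

$$H(s) = -\sum_{b=0}^{255} p_b \log_2 p_b, \qquad p_b = \frac{\#\{i : s_i = b\}}{|s|},$$

with 0 ≤ H(s) ≤ 8 bits/byte (terms with p_b = 0 contribute 0). Low-entropy English prose sits well below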
random-looking key material; H feeds the scoring step as a boost or penalty.
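Byte-level Shannon entropy is short to compute. The function below is a minimal sketch under assumptions: the `ShannonEntropy` name is hypothetical and the sample strings in `main` are arbitrary, chosen only to contrast prose with key-like material.

```go
package main

import (
	"fmt"
	"math"
)

// ShannonEntropy returns the byte-level entropy of s in bits per byte,
// in the range [0, 8]. An empty input is defined here as 0.
func ShannonEntropy(s []byte) float64 {
	if len(s) == 0 {
		return 0
	}
	var counts [256]int
	for _, b := range s {
		counts[b]++
	}
	n := float64(len(s))
	h := 0.0
	for _, c := range counts {
		if c == 0 {
			continue // p = 0 contributes nothing
		}
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

func main() {
	fmt.Printf("%.2f\n", ShannonEntropy([]byte("the quick brown fox jumps over the lazy dog")))
	fmt.Printf("%.2f\n", ShannonEntropy([]byte("q9Zx7LmP2vR8tYc4WbN6kJd1HsFg3UaE")))
}
```

Note that the maximum attainable value for a string of length n is log2(n) bits, so short candidates can never reach 8 bits/byte; any high/low cut-off has to account for candidate length.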
6. C4 — Scoring and threshold
Each candidate accumulates an integer score from additive evidence weights
(defaults, tunable in dlp_config):
| Signal | Weight |
|---|---|
| Hotword present (HotwordBoost) | +2 |
| High entropy (EntropyBoost) | +1 |
| Low entropy (EntropyPenalty) | −2 |
| Exclusion term present (ExclusionPenalty) | −3 |
| Corroborating second match (MultiMatchBoost) | +1 |
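A pattern carries a severity; the verdict is

$$\text{verdict} = \begin{cases} \text{block} & \text{if } \text{score} \ge \theta(\text{severity}) \\ \text{allow} & \text{otherwise,} \end{cases}$$

where θ(severity) is the blocking threshold for that severity. More severe categories have a lower bar (block on weaker evidence); low-severity categories require corroboration (score ≥ 4) before blocking. Unknown severities fall back to the highest threshold, so unrecognized input errs toward allow: surprise blocks are rare by design.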
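The additive scoring and the severity bar can be sketched as follows. The weights are the defaults listed in this section; everything else is an assumption for illustration: the type and function names are hypothetical, and only the low-severity threshold of 4 comes from the text (the `critical`, `high`, and `medium` values are placeholders).

```go
package main

import "fmt"

// Weights holds the signed evidence weights.
type Weights struct {
	HotwordBoost     int
	EntropyBoost     int
	EntropyPenalty   int
	ExclusionPenalty int
	MultiMatchBoost  int
}

// DefaultWeights uses the default values given in this section.
var DefaultWeights = Weights{
	HotwordBoost: 2, EntropyBoost: 1, EntropyPenalty: -2,
	ExclusionPenalty: -3, MultiMatchBoost: 1,
}

// Evidence records which signals fired for one candidate.
type Evidence struct {
	Hotword, HighEntropy, LowEntropy, Exclusion, MultiMatch bool
}

// Score sums the weights of every signal that fired.
func Score(e Evidence, w Weights) int {
	s := 0
	if e.Hotword {
		s += w.HotwordBoost
	}
	if e.HighEntropy {
		s += w.EntropyBoost
	}
	if e.LowEntropy {
		s += w.EntropyPenalty
	}
	if e.Exclusion {
		s += w.ExclusionPenalty
	}
	if e.MultiMatch {
		s += w.MultiMatchBoost
	}
	return s
}

// thresholds maps severity to its blocking bar. Only "low": 4 is taken from
// the text; the other entries are ASSUMED placeholder values.
var thresholds = map[string]int{"critical": 1, "high": 2, "medium": 3, "low": 4}

// Threshold returns the bar for a severity; unknown severities fall back to
// the highest configured threshold, so they err toward allow.
func Threshold(severity string) int {
	if t, ok := thresholds[severity]; ok {
		return t
	}
	highest := 0
	for _, t := range thresholds {
		if t > highest {
			highest = t
		}
	}
	return highest
}

// ShouldBlock applies the verdict rule: block iff score >= threshold.
func ShouldBlock(severity string, score int) bool {
	return score >= Threshold(severity)
}

func main() {
	e := Evidence{Hotword: true, HighEntropy: true, MultiMatch: true}
	s := Score(e, DefaultWeights)
	fmt.Println(s, ShouldBlock("low", s)) // 4 true
}
```

With the default weights the maximum reachable score is 2 + 1 + 1 = 4, so under the stated low-severity bar a low-severity candidate blocks only when hotword, high entropy, and a corroborating match are all present.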
7. Accuracy
With TP, FP, TN, FN from the labelled corpus:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 · Precision · Recall / (Precision + Recall)
Measured on agent/internal/dlp/testdata/fp_corpus/ (16 TP, 0 FP, 77 TN, 6 FN):
Precision = 16 / (16 + 0) = 100 %
Recall = 16 / (16 + 6) ≈ 72.7 %
F1 = 2 · 1 · 0.727 / (1 + 0.727) ≈ 84.2 %
The precision-first stance is explicit: the 6 false negatives are footer-only or truncated secrets; none of the 77 negatives produced a false block. See the Security report to regenerate these figures.
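The metrics follow mechanically from the four counts. The sketch below plugs in the corpus counts quoted in this section; the `Confusion` type is a hypothetical helper, and the false-positive-rate definition FP / (FP + TN) is the standard one rather than something this document spells out.

```go
package main

import "fmt"

// Confusion holds the four cells of a binary confusion matrix.
type Confusion struct{ TP, FP, TN, FN int }

// ratio returns num/den, or 0 when the denominator is zero.
func ratio(num, den int) float64 {
	if den == 0 {
		return 0
	}
	return float64(num) / float64(den)
}

// Precision = TP / (TP + FP).
func (c Confusion) Precision() float64 { return ratio(c.TP, c.TP+c.FP) }

// Recall = TP / (TP + FN).
func (c Confusion) Recall() float64 { return ratio(c.TP, c.TP+c.FN) }

// FalsePositiveRate = FP / (FP + TN).
func (c Confusion) FalsePositiveRate() float64 { return ratio(c.FP, c.FP+c.TN) }

// F1 is the harmonic mean of precision and recall.
func (c Confusion) F1() float64 {
	p, r := c.Precision(), c.Recall()
	if p+r == 0 {
		return 0
	}
	return 2 * p * r / (p + r)
}

func main() {
	c := Confusion{TP: 16, FP: 0, TN: 77, FN: 6}
	fmt.Printf("precision=%.3f recall=%.3f f1=%.3f fpr=%.3f\n",
		c.Precision(), c.Recall(), c.F1(), c.FalsePositiveRate())
}
```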
8. Privacy model
The engine is a pure function (§1); the surrounding agent upholds a
zero-persistence invariant. Let D be the on-disk state. The invariant is
D ⊆ { policy config, aggregate integer counters, rule files,
salted allowlist hashes, opt-in consent-gated block events }
and, crucially, for any per-event content c, domain d, IP a, or user
identifier u:

$$c \notin D, \quad d \notin D, \quad a \notin D, \quad u \notin D.$$
This is asserted by TestPrivacy_*, which sweeps every text column of the
SQLite store after running scans and fails if any forbidden value appears (see
Security report → Privacy invariant).
Scanned content lives only in memory and becomes eligible for garbage collection once the verdict is returned.
9. Reproducibility
Every claim here is backed by a test or benchmark:
cd agent
go test -race ./... # correctness (§2–§6)
go test ./internal/dlp/ -run 'TestFPCorpus' -v # accuracy (§7)
go test ./internal/store/ -run TestPrivacy -v # privacy (§8)
go test ./internal/dlp/ -run='^$' -bench=. -benchmem # performance (§3,§5)
go test ./internal/dlp/ -run='^$' -fuzz=FuzzPipelineScan -fuzztime=30s # robustness