Performance & Stress Report

Generated 2026-05-31 · Go 1.26.3 · darwin/arm64 · Apple M1 Max · GOMAXPROCS=10. All numbers are `go test -bench` output and can be reproduced with the commands under "How to reproduce".

Summary

Workload	Result
Typical scan (`BenchmarkPipelineScan`)	15.9 µs/op · 1262 B · 14 allocs
Throughput (single core, typical input)	≈ 62,700 scans/sec
Large input (`BenchmarkPipelineScanLarge`)	2.66 ms/op · 40.8 MB/s
Large input, concurrent eval	2.60 ms/op · 41.7 MB/s
Scan-cache hit (`BenchmarkScanCache_Hit`)	255 ns/op · 1 alloc
Content normalization (ASCII)	158 ns/op · 0 allocs
Content normalization (homoglyph fold)	765 ns/op · 3 allocs
Content normalization (base64 decode)	175 ns/op · 3 allocs
Aho-Corasick automaton build	24.4 µs (one-time, at load)
Fuzzing	~537k execs, 0 crashers

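Typical inputs fit the product's stated <1 ms scan budget with a wide margin: 15.9 µs is roughly 1/60th of the budget. Large inputs exceed it (2.66 ms/op in `BenchmarkPipelineScanLarge`) but sustain ~41 MB/s. Two thresholds apply: content above 50 KB is classed as large, and evaluation switches to concurrent mode above 10 KB. On this benchmark concurrent evaluation changes the result only marginally (2.60 ms vs 2.66 ms).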
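
The headline ratios follow from the raw figures by simple arithmetic. The sketch below is illustrative only: the constants are copied from this report's benchmark output, while the helper names (`opsPerSecond`, `budgetHeadroom`, `impliedBytes`, `summary`) are made up for the example. It relies on `go test` reporting MB/s in units of 10^6 bytes per second.

```go
package main

import "fmt"

// Figures copied from the raw benchmark output in this report.
const (
	scanNsPerOp     = 15938.0   // BenchmarkPipelineScan
	cacheHitNsPerOp = 255.5     // BenchmarkScanCache_Hit
	largeNsPerOp    = 2657469.0 // BenchmarkPipelineScanLarge
	largeMBPerSec   = 40.78     // BenchmarkPipelineScanLarge
	budgetNs        = 1e6       // the stated 1 ms scan budget
)

// opsPerSecond converts a ns/op figure into single-core operations per second.
func opsPerSecond(nsPerOp float64) float64 { return 1e9 / nsPerOp }

// budgetHeadroom reports how many operations of the given cost fit in the budget.
func budgetHeadroom(nsPerOp, budget float64) float64 { return budget / nsPerOp }

// impliedBytes recovers the per-operation input size from ns/op and MB/s,
// where 1 MB/s means 1e6 bytes per second.
func impliedBytes(nsPerOp, mbPerSec float64) float64 {
	return mbPerSec * 1e6 * nsPerOp / 1e9
}

// summary formats the derived figures for printing.
func summary() string {
	return fmt.Sprintf(
		"scans/sec=%.0f headroom=%.1fx cacheSpeedup=%.1fx largeInputBytes=%.0f",
		opsPerSecond(scanNsPerOp),
		budgetHeadroom(scanNsPerOp, budgetNs),
		scanNsPerOp/cacheHitNsPerOp,
		impliedBytes(largeNsPerOp, largeMBPerSec),
	)
}
```

Printing `summary()` from a `main` function gives roughly 62,700 scans/sec, about 63 typical scans per 1 ms budget, a cache hit about 62× faster than a cold scan, and an implied large-benchmark input of roughly 108 kB per operation.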

How to reproduce

cd agent
# Latency / throughput
go test ./internal/dlp/ -run='^$' -benchmem -benchtime=2s \
  -bench='BenchmarkPipelineScan|BenchmarkAhoCorasick|BenchmarkScanCache|BenchmarkNormalize'

# Robustness (no crashers expected)
go test ./internal/dlp/ -run='^$' -fuzz='FuzzPipelineScan' -fuzztime=30s
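
For readers unfamiliar with Go's native fuzzing, the target selected by `-fuzz` has roughly the shape sketched below. This is not the project's `FuzzPipelineScan`: the report does not show the pipeline's entry point, so `scanStub` is a hypothetical stand-in and `FuzzScanSketch` is an invented name. To be picked up by `go test -fuzz`, a target like this has to live in a file whose name ends in `_test.go`.

```go
package main

import (
	"bytes"
	"testing"
)

// scanStub is a HYPOTHETICAL stand-in for the real pipeline entry point,
// whose name and signature this report does not show. It only reports
// whether the input contains a marker string.
func scanStub(input []byte) bool {
	return bytes.Contains(input, []byte("SECRET"))
}

// FuzzScanSketch shows the general shape of a native Go fuzz target.
func FuzzScanSketch(f *testing.F) {
	// Seed corpus: a few starting inputs the fuzzer mutates from.
	f.Add([]byte("plain ascii text"))
	f.Add([]byte{0xff, 0xfe, 0x00}) // invalid UTF-8
	f.Add([]byte("U0VDUkVU"))       // base64 of "SECRET"

	f.Fuzz(func(t *testing.T, data []byte) {
		// A panic inside this call is reported as a crasher and the
		// failing input is written under testdata/fuzz/FuzzScanSketch/.
		first := scanStub(data)

		// Cheap extra property: the same input gives the same verdict.
		if second := scanStub(data); first != second {
			t.Fatalf("non-deterministic result for input %q", data)
		}
	})
}
```

The only property checked beyond "does not panic" is determinism, which is an assumption about what is worth asserting; a real target would assert whatever invariants the pipeline actually guarantees.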

Raw benchmark output

BenchmarkNormalizeContent/ascii_passthrough-10   14251724    157.9 ns/op      0 B/op    0 allocs/op
BenchmarkNormalizeContent/homoglyph_fold-10       3112471    764.5 ns/op    112 B/op    3 allocs/op
BenchmarkNormalizeContent/base64_decode-10       13690766    175.0 ns/op    112 B/op    3 allocs/op
BenchmarkPipelineScan-10                           150132  15938   ns/op   1262 B/op   14 allocs/op
BenchmarkPipelineScanLarge-10                         937 2657469  ns/op  40.78 MB/s 118727 B/op  17 allocs/op
BenchmarkPipelineScanLargeConcurrentEval-10           900 2604168  ns/op  41.65 MB/s 119209 B/op  17 allocs/op
BenchmarkAhoCorasickBuild-10                        99352  24356   ns/op  75304 B/op   54 allocs/op
BenchmarkScanCache_Hit-10                         9313563    255.5 ns/op    208 B/op    1 allocs/op
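
Each result line above has the same layout: benchmark name, iteration count, then (value, unit) pairs. A minimal parser for that layout can be sketched as follows, which is handy for turning the output into the summary table or a regression check. The type and function names are invented for this example; for comparing multiple runs statistically, the `benchstat` tool is the usual choice.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// benchResult holds one parsed line of `go test -bench` output.
type benchResult struct {
	Name       string             // e.g. "BenchmarkPipelineScan-10"
	Iterations int64              // number of iterations the run settled on
	Metrics    map[string]float64 // unit -> value, e.g. "ns/op" -> 15938
}

// parseBenchLine parses a single result line. After the name and the
// iteration count, the remaining whitespace-separated fields are read
// as (value, unit) pairs.
func parseBenchLine(line string) (benchResult, error) {
	fields := strings.Fields(line)
	if len(fields) < 4 || !strings.HasPrefix(fields[0], "Benchmark") {
		return benchResult{}, fmt.Errorf("not a benchmark result line: %q", line)
	}
	n, err := strconv.ParseInt(fields[1], 10, 64)
	if err != nil {
		return benchResult{}, fmt.Errorf("bad iteration count %q: %w", fields[1], err)
	}
	rest := fields[2:]
	if len(rest)%2 != 0 {
		return benchResult{}, fmt.Errorf("unpaired value/unit in %q", line)
	}
	res := benchResult{Name: fields[0], Iterations: n, Metrics: make(map[string]float64)}
	for i := 0; i < len(rest); i += 2 {
		v, err := strconv.ParseFloat(rest[i], 64)
		if err != nil {
			return benchResult{}, fmt.Errorf("bad value %q: %w", rest[i], err)
		}
		res.Metrics[rest[i+1]] = v
	}
	return res, nil
}
```

Because `strings.Fields` collapses runs of whitespace, the uneven column padding in the output above does not matter.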

Stress & robustness

Fuzzing. `FuzzPipelineScan` drives arbitrary and malformed byte input through the full pipeline. Across runs totalling ~537,000 executions and 181 interesting inputs, it produced zero crashers (no panics, no testdata/fuzz/ crash artifacts). In these runs the pipeline degraded gracefully on garbage input rather than faulting.
Concurrency. All packages pass under -race (see the QA report). Large inputs are evaluated concurrently above the 10 KB threshold with no race conditions detected.
Allocations. The hot path holds at 14 allocations and 1,262 B (about 1.2 KiB) per typical scan, and a cache hit costs a single 208 B allocation. Memory behaviour is bounded and predictable, which suits a constantly running desktop agent.
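
One way to keep allocation counts like these from drifting is a guard built on the standard library's `testing.AllocsPerRun`. The sketch below is an assumption about how such a guard could look, not something the project is known to contain: `checkAllocBudget` and `oneAlloc` are hypothetical names, and `oneAlloc` merely stands in for a real scan or cache lookup.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps results reachable so the allocation is not optimised away.
var sink []byte

// checkAllocBudget measures the average number of heap allocations per
// call of fn and returns an error if it exceeds budget.
func checkAllocBudget(name string, budget float64, fn func()) error {
	got := testing.AllocsPerRun(1000, fn)
	if got > budget {
		return fmt.Errorf("%s: %.0f allocs/op exceeds budget of %.0f", name, got, budget)
	}
	return nil
}

// oneAlloc is a HYPOTHETICAL workload that performs exactly one heap
// allocation per call, standing in for a cache hit.
func oneAlloc() {
	sink = make([]byte, 64)
}
```

With the figures from this report, a guard would use a budget of 14 for a typical scan and 1 for a cache hit, for example `checkAllocBudget("cache hit", 1, oneAlloc)`.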

Notes & honesty

Benchmarks are single-machine (Apple M1 Max). Absolute numbers vary by CPU; the ratios (scan ≪ 1 ms budget, cache hit ≈ 60× faster than a cold scan) are the durable claims.
A live HTTP load test against POST /api/dlp/scan is gated by the API's default rate limiter (≈100 req/s, returns 429 above that); the figures above measure the engine, which is the component under test. End-to-end HTTP throughput is bounded by the configured rate limit, not the engine.