scripts/generate_synthetic_pdfs.py builds real PDF/1.4 documents with
a hand-written xref so we can generate tens of thousands of ~2 KB
PDFs locally. Helvetica only covers latin-1, which is fine for a
load generator (throughput, not retrieval relevance); the docstring
calls this out so no one mistakes the output for a quality corpus.
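
For reference, a minimal sketch of the hand-written-xref approach. This is not the code in scripts/generate_synthetic_pdfs.py: build_pdf and the five-object layout are illustrative assumptions. The idea is to record each object's byte offset while appending it, then emit those offsets as fixed-width 20-byte xref entries.

```python
def build_pdf(text: str) -> bytes:
    """Return a one-page PDF/1.4 document with a hand-written xref table."""
    # Helvetica is a latin-1 class font here: replace anything outside
    # latin-1, then escape the PDF string delimiters.
    safe = text.encode("latin-1", "replace").decode("latin-1")
    for ch in ("\\", "(", ")"):
        safe = safe.replace(ch, "\\" + ch)
    stream = f"BT /F1 12 Tf 72 720 Td ({safe}) Tj ET".encode("latin-1")

    # Object numbers are 1-based and follow list order.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica "
        b"/Encoding /WinAnsiEncoding >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" starts
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0: head of the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

Writing build_pdf("doc 1") to disk should yield a file that a conforming reader opens without xref repair, since every offset is measured rather than estimated.
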
scripts/load_ingest.py drives POST /ingest/folder, then polls a
hypothetical /documents/stats endpoint every poll-interval seconds
to track terminal-state progression. Writes a JSON history report so
results can be diffed between runs.
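
A rough sketch of the poll-and-record loop, under stated assumptions: the state names in TERMINAL_STATES, the shape of the stats payload (a mapping of state to count), and the function names are all hypothetical, and the HTTP call is injected as fetch_stats so the loop can be exercised without a running service.

```python
import json
import time
from typing import Callable

TERMINAL_STATES = {"indexed", "failed"}  # hypothetical state names


def poll_until_done(
    fetch_stats: Callable[[], dict],
    total: int,
    poll_interval: float,
    timeout: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> list:
    """Sample stats until `total` docs are terminal or `timeout` elapses."""
    start = clock()
    history = []
    while True:
        counts = fetch_stats()  # assumed shape: {"pending": 3, "indexed": 7}
        done = sum(n for state, n in counts.items() if state in TERMINAL_STATES)
        elapsed = clock() - start
        history.append(
            {"elapsed_s": round(elapsed, 3), "counts": counts, "terminal": done}
        )
        if done >= total or elapsed >= timeout:
            return history
        sleep(poll_interval)


def write_report(history: list, path: str) -> None:
    """Write samples with sorted keys so two runs diff cleanly."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"samples": history}, fh, indent=2, sort_keys=True)
```

Sorted keys and a fixed indent keep the report stable from run to run, which is what makes a plain diff useful.
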
scripts/locustfile_search.py defines a SearchUser profile mixing
hybrid / lexical / semantic queries against POST /search plus a
health-check sampler. Asserts non-empty results so a "200 with
zero hits" regression surfaces as a failure rather than a green
percentile graph.
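
The zero-hit check can be sketched as a pure function, kept free of Locust imports so it is testable on its own. The response shape (a JSON object with a "hits" list) and the function name are assumptions, not the actual /search contract. In a Locust task, a request made with catch_response=True could pass the returned message to response.failure().

```python
from typing import Optional


def search_failure_reason(status_code: int, body: object) -> Optional[str]:
    """Return a failure message for a /search response, or None if it passes."""
    if status_code != 200:
        return f"unexpected status {status_code}"
    # Assumed payload shape: {"hits": [...]}.
    hits = body.get("hits") if isinstance(body, dict) else None
    if not isinstance(hits, list):
        return "response has no 'hits' list"
    if not hits:
        # The regression this guards against: a healthy-looking empty result.
        return "200 with zero hits"
    return None
```
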
RUNBOOK.md gains a Load testing section with CPU/GPU SLO tables for
both axes (sustained ingest docs/min, search latency p50/p95/p99).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- scripts/benchmark_reranker.py exercises the configured reranker
with synthetic queries or live OpenSearch samples and prints
p50/p95/p99 latency, mean latency, and pairs/sec throughput.
Supports --warmup, --candidates, --passage-length, --source, and a
--json-only mode for CI.
- app/indexing/reranker.py clips passages to 2048 characters before
scoring so a runaway chunk cannot push cross-encoder input past
bge-reranker-v2-m3's training window.
- RUNBOOK.md gains a Reranker benchmark section with CPU/GPU SLO
targets and a remediation ladder (lower top-K, raise batch size,
switch device, disable reranker) when measured p95 exceeds budget.
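
A minimal sketch of the clipping step and the latency summary described above. It is not the code in app/indexing/reranker.py or scripts/benchmark_reranker.py: the function names, the nearest-rank percentile method, and the assumption that each timed call scores a fixed number of pairs are all illustrative.

```python
import statistics

MAX_PASSAGE_CHARS = 2048  # clip length stated in this commit


def clip_passage(passage: str, limit: int = MAX_PASSAGE_CHARS) -> str:
    """Truncate by characters (not tokens) before the pair is scored."""
    return passage[:limit]


def summarize(latencies_ms: list, pairs_per_call: int) -> dict:
    """Summarize per-call latencies in milliseconds."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    n = len(ordered)

    def pct(p: int) -> float:
        # Nearest-rank percentile: smallest sample with >= p% at or below it.
        rank = max(1, (p * n + 99) // 100)  # integer ceil(p * n / 100)
        return ordered[rank - 1]

    mean_ms = statistics.fmean(ordered)
    return {
        "p50_ms": pct(50),
        "p95_ms": pct(95),
        "p99_ms": pct(99),
        "mean_ms": mean_ms,
        "pairs_per_sec": pairs_per_call / (mean_ms / 1000),
    }
```

Warmup iterations would be dropped before the samples reach summarize, so cold-start cost does not inflate the tail percentiles.
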
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>