perf: add ingest and search load-test harnesses

scripts/generate_synthetic_pdfs.py builds real PDF/1.4 documents with a
hand-written xref, so tens of thousands of ~2 KB PDFs can be generated
locally. Helvetica only covers latin-1, which is fine for a load generator
(it measures throughput, not retrieval relevance); the docstring calls this
out so no one mistakes the output for a quality corpus.

scripts/load_ingest.py drives POST /ingest/folder, then polls a hypothetical
/documents/stats endpoint once per poll interval to track how documents
progress to terminal states. It writes a JSON history report so results can
be diffed between runs.

scripts/locustfile_search.py defines a SearchUser profile that mixes hybrid,
lexical, and semantic queries against POST /search, plus a health-check
sampler. It asserts non-empty results so a "200 with zero hits" regression
surfaces as a failure rather than as a green percentile graph.

RUNBOOK gains a Load testing section with CPU/GPU SLO tables for both axes
(sustained docs/min; search latency p50/p95/p99).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RUNBOOK.md
@@ -151,6 +151,55 @@ If the measured p95 exceeds the budget, options in order of preference:
Passages are clipped to 2048 chars before being fed to the cross-encoder so a
runaway chunk cannot starve the budget.

## Load testing

Two complementary harnesses live under `scripts/`: one measures ingest
throughput, the other measures search latency.

### Ingest load

```bash
# Generate synthetic PDFs (~3 KB each, real PDF/1.4 with embedded text)
docker compose exec api python scripts/generate_synthetic_pdfs.py \
  --count 10000 --out /data/input/load

# Trigger ingest, sample status every 10 s, dump JSON history
docker compose exec api python scripts/load_ingest.py \
  --path /data/input/load \
  --api-url http://localhost:8000/api/v1 \
  --watch-seconds 1800 \
  --report-file /data/work/load_report.json
```
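The generator is described as emitting real PDF/1.4 with embedded text and a hand-written cross-reference table. The core of that technique can be sketched as below. This is not the contents of `scripts/generate_synthetic_pdfs.py`; the `make_pdf` name, the single-page layout, and the object numbering are assumptions for illustration.

```python
# Minimal one-page PDF/1.4 writer with a hand-built xref table.
# Illustrative sketch: make_pdf() and the object layout are assumptions,
# not the contents of scripts/generate_synthetic_pdfs.py.


def _escape(text: str) -> str:
    # Literal strings in a content stream must escape '\', '(' and ')'.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def make_pdf(text: str) -> bytes:
    # Helvetica is a standard-14 font, so no font file is embedded; the
    # latin-1 encode raises UnicodeEncodeError for anything outside latin-1.
    content = f"BT /F1 12 Tf 72 720 Td ({_escape(text)}) Tj ET".encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" begins
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    xref_at = len(out)
    size = len(objects) + 1  # +1 for the mandatory free entry 0
    out += b"xref\n0 %d\n" % size
    out += b"0000000000 65535 f \n"  # every xref entry is exactly 20 bytes
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % size
    out += b"startxref\n%d\n%%%%EOF\n" % xref_at
    return bytes(out)
```

Because the xref stores absolute byte offsets, writing the file in a single pass and recording each offset just before its object is emitted keeps the table correct by construction.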

Target SLOs at the 70k-document scale (subject to refinement once measured):

| Metric                          | CPU target | GPU target |
|---------------------------------|-----------:|-----------:|
| Sustained throughput (docs/min) | > 30 | > 200 |
| Failure rate | < 1 % | < 0.5 % |
| p95 per-document wall time | < 90 s | < 25 s |
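
The throughput and failure-rate rows can be checked mechanically from the polled history. A minimal sketch, assuming each sample in the JSON report carries an elapsed time and cumulative done/failed counts (the `Sample` field names are hypothetical, not the report's documented format). The p95 per-document wall time needs per-document timings, which these counts do not carry, so it is left out.

```python
# Sketch: summarize an ingest load report and compare it with the SLO table.
# The Sample fields (t, done, failed) are assumptions; the JSON written by
# scripts/load_ingest.py may use different names.
from dataclasses import dataclass


@dataclass
class Sample:
    t: float     # seconds since ingest was triggered (assumed field)
    done: int    # documents in a successful terminal state (assumed field)
    failed: int  # documents in a failed terminal state (assumed field)


# (minimum docs/min, maximum failure rate), copied from the table above.
SLO = {"cpu": (30.0, 0.01), "gpu": (200.0, 0.005)}


def summarize(history: list[Sample]) -> dict[str, float]:
    first, last = history[0], history[-1]
    minutes = (last.t - first.t) / 60
    if minutes <= 0:
        raise ValueError("need at least two samples spanning some time")
    finished = (last.done + last.failed) - (first.done + first.failed)
    terminal = last.done + last.failed
    return {
        "docs_per_min": finished / minutes,
        "failure_rate": last.failed / terminal if terminal else 0.0,
    }


def meets_slo(summary: dict[str, float], tier: str) -> bool:
    min_rate, max_failure = SLO[tier]
    return summary["docs_per_min"] > min_rate and summary["failure_rate"] < max_failure


def hours_to_drain(total_docs: int, docs_per_min: float) -> float:
    # Wall-clock hours to ingest a corpus at a constant sustained rate.
    return total_docs / docs_per_min / 60
```

At the floor rates in the table, `hours_to_drain(70_000, 30)` is about 38.9 hours on CPU and `hours_to_drain(70_000, 200)` about 5.8 hours on GPU. Measuring between the first and last sample includes any warm-up period; trimming early samples is a judgment call.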

### Search load

```bash
pip install locust  # one-time

locust -f scripts/locustfile_search.py \
  --host http://localhost:8000 \
  --headless --users 100 --spawn-rate 10 --run-time 10m \
  --html load_search.html
```
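`scripts/locustfile_search.py` mixes hybrid, lexical, and semantic queries against `POST /search` and treats a 200 with zero hits as a failure. That logic can be sketched independently of Locust, as below. The mode weights, query strings, request body shape, and the `results` response key are all assumptions for illustration.

```python
# Locust-independent sketch of the search sampler's request mix and
# response check. Mode weights, queries, request body shape and the
# "results" response key are all assumptions for illustration.
import json
import random
from typing import Optional

MODE_WEIGHTS = {"hybrid": 6, "lexical": 2, "semantic": 2}  # hypothetical mix
QUERIES = ["invoice total", "termination clause", "quarterly revenue"]  # hypothetical


def next_request(rng: random.Random) -> dict:
    # Pick a query and a weighted search mode for the next POST /search.
    modes, weights = zip(*MODE_WEIGHTS.items())
    return {"query": rng.choice(QUERIES), "mode": rng.choices(modes, weights)[0]}


def validate(status: int, body: str) -> Optional[str]:
    """Return a failure reason, or None when the response counts as a success."""
    if status != 200:
        return f"HTTP {status}"
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "body is not JSON"
    hits = payload.get("results") if isinstance(payload, dict) else None
    if not isinstance(hits, list):
        return "missing 'results' list"
    if not hits:
        # The regression this guards against: a fast 200 that found nothing.
        return "200 with zero hits"
    return None
```

In a locustfile, the reason would be reported through `response.failure(reason)` on a request issued with `self.client.post(..., catch_response=True)`, so zero-hit responses count as failures in Locust's statistics instead of as fast successes.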

Target SLOs for hybrid mode with the reranker enabled:

| Percentile | CPU target | GPU target |
|------------|-----------:|-----------:|
| p50 | 600 ms | 120 ms |
| p95 | 1500 ms | 300 ms |
| p99 | 3500 ms | 700 ms |
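
Locust's HTML report already shows response-time percentiles; to gate a run automatically, the table can be encoded as data and compared with measured latencies. A minimal sketch using the nearest-rank percentile (Locust may compute percentiles differently, so small discrepancies near a boundary are expected); how the raw latency samples are collected is left open.

```python
# Sketch: check raw search latencies (ms) against the hybrid+reranker budgets.
# Where the samples come from (e.g. an exported request log) is left open.
import math

BUDGET_MS = {  # copied from the table above
    "cpu": {50: 600, 95: 1500, 99: 3500},
    "gpu": {50: 120, 95: 300, 99: 700},
}


def percentile(samples_ms: list[float], p: int) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of data <= it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def over_budget(samples_ms: list[float], tier: str) -> dict[int, float]:
    """Map each percentile that misses its budget to the measured value."""
    misses = {}
    for p, limit_ms in BUDGET_MS[tier].items():
        measured = percentile(samples_ms, p)
        if measured > limit_ms:
            misses[p] = measured
    return misses
```

An empty result means every percentile is within budget; a non-empty one names exactly which rows of the table were missed.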

If staging numbers miss the budget, walk the reranker remediation ladder above
before chasing index sharding.

## Scaling notes (~70k PDFs)

- Workers scale horizontally: `docker compose up -d --scale worker=8`.