
Vadim Malanov a97d0bbcfd perf: add ingest and search load-test harnesses

scripts/generate_synthetic_pdfs.py builds real PDF/1.4 documents with
a hand-written xref so we can generate tens of thousands of ~2 KB
PDFs locally. Helvetica only covers latin-1, which is fine for a
load generator (throughput, not retrieval relevance); the docstring
calls this out so no one mistakes the output for a quality corpus.

scripts/load_ingest.py drives POST /ingest/folder, then polls a
hypothetical /documents/stats endpoint every poll-interval seconds
to track terminal-state progression. Writes a JSON history report so
results can be diffed between runs.

scripts/locustfile_search.py defines a SearchUser profile mixing
hybrid / lexical / semantic queries against POST /search plus a
health-check sampler. Asserts non-empty results so a "200 with
zero hits" regression surfaces as a failure rather than a green
percentile graph.

RUNBOOK gains a Load testing section with CPU/GPU SLO tables for
both axes (sustained docs/min, search latency p50/p95/p99).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-13 17:11:08 +03:00

7.9 KiB


LegacyHUB — Operational Runbook

Quick boot (dev)

cp .env.example .env
docker compose up -d --build
docker compose exec api python scripts/init_db.py
docker compose exec api python scripts/init_opensearch.py
docker compose exec api python scripts/init_qdrant.py
docker compose exec api python scripts/smoke_test.py

Verify:

curl -fsS http://localhost:8000/api/v1/health | jq .

Frontend dev:

cd frontend && cp .env.example .env && npm install && npm run dev
# http://localhost:5273

Production deploy

The production overlay (docker-compose.prod.yml) enables the OpenSearch security plugin, removes the default published ports, forces externally supplied credentials, and disables debug routes.

# 1. Ensure secrets exist
cp .env.prod.example .env.prod
$EDITOR .env.prod          # rotate every credential, never commit

# 2. Build + recreate
docker compose \
  -f docker-compose.yml -f docker-compose.prod.yml \
  --env-file .env.prod \
  up -d --build --force-recreate api worker

# 3. Migrations
docker compose -f docker-compose.yml -f docker-compose.prod.yml \
  --env-file .env.prod exec api python scripts/init_db.py

# 4. Health gate
docker compose -f docker-compose.yml -f docker-compose.prod.yml \
  --env-file .env.prod exec api python scripts/smoke_test.py
curl -fsS https://<host>/api/v1/health | jq -e '.status == "ok"'
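Right after --force-recreate, a one-shot curl can fail only because the API is still starting. Below is a minimal Python sketch of a retrying health gate. It assumes, as the jq check implies, that /api/v1/health returns a JSON object with a top-level status field. wait_for_health and is_healthy are hypothetical helpers, not scripts shipped in the repo.

```python
import json
import time
import urllib.request


def is_healthy(body: str) -> bool:
    """True only if the body is a JSON object whose "status" is "ok"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def wait_for_health(url: str, attempts: int = 10, delay_s: float = 3.0) -> bool:
    """Poll url until it reports healthy; give up after `attempts` tries."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                body = resp.read().decode("utf-8", errors="replace")
                if is_healthy(body):
                    return True
        except OSError:
            # URLError, HTTPError (non-2xx) and socket timeouts all derive
            # from OSError: treat them as "not ready yet".
            pass
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

Call it with the same URL as the curl gate and fail the deploy when it returns False. The attempts and delay_s defaults are illustrative.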

Hardening notes (mandatory for prod):

Rotate every credential in .env.prod from .env.prod.example placeholders.
Put OpenSearch behind TLS with an admin password. DISABLE_SECURITY_PLUGIN=true must not be set in prod; the overlay already removes it.
Front the API with a reverse proxy that performs auth + TLS termination.
Restrict CORS via CORS_ALLOWED_ORIGINS (comma-separated) — never * in prod.
MinIO root key/secret in prod must come from a secret store, not the repo.
Mount data/input and data/work from durable storage, not the workstation.
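The CORS rule can be enforced at startup instead of being left to review. This is a minimal sketch that assumes CORS_ALLOWED_ORIGINS is read as a plain comma-separated string. parse_allowed_origins and its production flag are hypothetical and are not taken from the LegacyHUB code.

```python
from __future__ import annotations


def parse_allowed_origins(raw: str, production: bool) -> list[str]:
    # Split on commas, trim whitespace, drop empty entries such as "a,,b".
    origins = [o.strip() for o in raw.split(",") if o.strip()]
    if production and "*" in origins:
        raise ValueError("CORS_ALLOWED_ORIGINS must not contain '*' in production")
    if production and not origins:
        raise ValueError("CORS_ALLOWED_ORIGINS is empty; set explicit origins")
    return origins
```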

Ingestion

# trigger from the API
curl -X POST http://localhost:8000/api/v1/ingest/folder \
  -H "Content-Type: application/json" \
  -d '{"path":"/data/input","recursive":true,"force":false}'

# or inline (no Celery)
docker compose exec api python scripts/ingest_folder.py \
  --path /data/input --recursive --mode inline

# re-index a single doc
docker compose exec api python scripts/reindex_document.py \
  --document-id <uuid>

Failure handling

Each stage writes a row to processing_events with a level and a data payload. To inspect the 50 most recent events:

docker compose exec postgres psql -U legacyhub -d legacyhub -c \
  "SELECT created_at, stage, level, message FROM processing_events
   ORDER BY created_at DESC LIMIT 50;"

Failure	Where to look	Fix
`OCR_FAILED`	`processing_events` → `OCR_STARTED` then error	Confirm the `tesseract-ocr-rus` package is installed; rerun `scripts/reindex_document.py`
`EXTRACTION_FAILED`	`processing_events` → Docling stage	Check the Docling timeout; verify the Docling version pin
Indexing stuck	OpenSearch + Qdrant health	Rerun `scripts/init_opensearch.py` and `scripts/init_qdrant.py`
Reranker disabled	API logs → `reranker.disabled`	Ensure `RERANKER_ENABLED=true` and the HF cache is mounted

Verification gates (per change)

python -m pytest tests/ -q (full unit suite, 19+ tests)
python -m compileall -q app scripts tests
docker compose config --quiet
Frontend: npx tsc --noEmit && npm run build
/api/v1/health returns {"status":"ok"}.
One smoke ingest of a known PDF; verify /search returns a result.

Rollback

Capture the deployed commit SHA before each deploy (git rev-parse HEAD).

To roll back the API/worker image only, check out the previously captured SHA and rebuild:

git checkout <previous-sha>
docker compose -f docker-compose.yml -f docker-compose.prod.yml \
  --env-file .env.prod up -d --build --force-recreate --no-deps \
  api worker  # keep PG/MinIO/OS/Qdrant intact

Data services (PostgreSQL, MinIO, OpenSearch, Qdrant) are stateful and should not be rolled back casually. Restore from backup via the standard TeamHUB Suite backup runbook.

Reranker benchmark

The reranker is the latency-defining stage of the hybrid search path. Run the benchmark on every hardware or configuration change (CPU vs GPU, instance type, batch size) before promoting the configuration.

# synthetic warmup + 32 queries x 40 candidates, ~700-char passages
docker compose exec api python scripts/benchmark_reranker.py \
  --queries 32 --candidates 40 --warmup 4

# real corpus sample (after some documents are indexed)
docker compose exec api python scripts/benchmark_reranker.py \
  --source opensearch --query "ГОСТ 21.501-93" --candidates 40

Target SLOs (subject to revision once staging numbers land):

Metric	CPU target	GPU target
p95 latency / query	< 700 ms	< 120 ms
Throughput	> 60 pairs/s	> 600 pairs/s
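A little arithmetic shows how the two rows of the table relate. This sketch assumes each query scores every candidate exactly once and that warmup queries are excluded from timing. The helpers are illustrative and are not the benchmark script's API.

```python
import math


def pairs_per_second(queries: int, candidates: int, elapsed_s: float) -> float:
    # Every query scores each of its candidates once: queries * candidates pairs.
    return queries * candidates / elapsed_s


def batches_per_query(candidates: int, batch_size: int) -> int:
    # Cross-encoder forward passes per query; the last batch may be partly empty.
    return math.ceil(candidates / batch_size)
```

At a steady 60 pairs/s, one query with 40 candidates costs about 40 / 60 ≈ 667 ms, just inside the 700 ms CPU p95 target. At 600 pairs/s the same query takes about 67 ms, against the 120 ms GPU target. Halving RERANK_CANDIDATES halves the pairs per query. The number of forward passes can drop by less, because the last batch may be partly empty: 40 candidates at batch size 16 need 3 batches, and 20 candidates still need 2.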

If the measured p95 exceeds the budget, options in order of preference:

Lower RERANK_CANDIDATES (default 40 — reducing to 20 roughly halves work).
Increase RERANKER_BATCH_SIZE (memory permitting).
Switch RERANKER_DEVICE=cuda and use a GPU-capable image.
Disable the reranker (RERANKER_ENABLED=false) and accept the raw reciprocal rank fusion (RRF) order. The API still returns useful results, and the reranked field in the response reports whether reranking was actually applied.
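When the reranker is off, results come back in RRF order. Below is a minimal sketch of RRF. It assumes the lexical (OpenSearch) and semantic (Qdrant) result lists are fused with k = 60, a common default in the RRF literature. LegacyHUB's actual constant, list depth and tie-breaking are not stated here and may differ.

```python
from __future__ import annotations

from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each list contributes 1 / (k + rank) for every id it contains (rank from 1).
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

With lexical ["a", "b", "c"] and semantic ["b", "c", "d"], the documents found by both retrievers (b, c) rise above those found by only one.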

Passages are clipped to 2048 chars before being fed to the cross-encoder so a runaway chunk cannot starve the budget.
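This is a minimal sketch of that clip. It assumes a plain character-prefix cut. Python slices strings by code point, so Cyrillic text is cut on character boundaries rather than bytes. MAX_PASSAGE_CHARS and clip_passages are illustrative names.

```python
from __future__ import annotations

MAX_PASSAGE_CHARS = 2048  # limit stated in the runbook


def clip_passages(query: str, passages: list[str], limit: int = MAX_PASSAGE_CHARS) -> list[tuple[str, str]]:
    # Build (query, passage) pairs for the cross-encoder, cutting each
    # passage to `limit` characters so one oversized chunk cannot dominate.
    return [(query, p[:limit]) for p in passages]
```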

Load testing

Two complementary harnesses live under scripts/:

Ingest load

# Generate synthetic PDFs (~3 KB each, real PDF/1.4 with embedded text)
docker compose exec api python scripts/generate_synthetic_pdfs.py \
  --count 10000 --out /data/input/load

# Trigger ingest, sample status every 10 s, dump JSON history
docker compose exec api python scripts/load_ingest.py \
  --path /data/input/load \
  --api-url http://localhost:8000/api/v1 \
  --watch-seconds 1800 \
  --report-file /data/work/load_report.json
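generate_synthetic_pdfs.py builds real PDF/1.4 files with a hand-written xref table. The sketch below shows that technique in isolation and is not the script's source. The writer records each object's byte offset as it writes the object, emits fixed 20-byte xref entries, and points startxref at the xref keyword. Text is drawn with the built-in Helvetica font, so the input must be latin-1.

```python
from __future__ import annotations


def _escape(text: str) -> str:
    # Backslash and parentheses must be escaped inside PDF literal strings.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_pdf(lines: list[str]) -> bytes:
    # Content stream: one text line per entry, 14 pt leading, Helvetica 11 pt.
    ops = ["BT", "/F1 11 Tf", "72 770 Td", "14 TL"]
    for line in lines:
        ops.append(f"({_escape(line)}) Tj T*")
    ops.append("ET")
    stream = "\n".join(ops).encode("latin-1")  # raises on non-latin-1 text

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" starts
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    xref_at = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0: head of the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_at
    return bytes(out)
```

To write a file, pass the result to Path.write_bytes. Any naming scheme for the generated files is up to the caller.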

Target SLOs at the 70k-document scale (subject to refinement once measured):

Metric	CPU target	GPU target
Sustained throughput (docs/min)	> 30	> 200
Failure rate	< 1 %	< 0.5 %
p95 per-document wall time	< 90 s	< 25 s
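The first two rows can be derived from the polled history. This is a minimal sketch that assumes a hypothetical sample shape of cumulative (elapsed_seconds, completed, failed) tuples. The real load_report.json schema is not documented here, so adapt the field access. Throughput counts successful completions only, which is the stricter reading of "sustained".

```python
from __future__ import annotations

Sample = tuple[float, int, int]  # hypothetical: (elapsed_s, completed, failed)


def sustained_docs_per_min(samples: list[Sample]) -> float:
    # Completed documents per minute across the whole sampled window.
    t0, completed0, _ = samples[0]
    t1, completed1, _ = samples[-1]
    if t1 <= t0:
        return 0.0
    return (completed1 - completed0) / ((t1 - t0) / 60.0)


def failure_rate(samples: list[Sample]) -> float:
    # Failed share of documents that reached a terminal state.
    _, completed, failed = samples[-1]
    done = completed + failed
    return failed / done if done else 0.0
```

To exclude ramp-up, drop the first few samples before calling sustained_docs_per_min.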

Search load

pip install locust  # one-time

locust -f scripts/locustfile_search.py \
       --host http://localhost:8000 \
       --headless --users 100 --spawn-rate 10 --run-time 10m \
       --html load_search.html
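locustfile_search.py asserts non-empty results, so a "200 with zero hits" regression counts as a failure instead of producing a green graph. The rule can live in a plain function that is testable without Locust. This sketch assumes hits arrive as a JSON list under a results key, which may not match the real response shape.

```python
from __future__ import annotations

import json


def search_failure_reason(status_code: int, body: str) -> str | None:
    """Return None for a good response, else a short failure reason."""
    if status_code != 200:
        return f"HTTP {status_code}"
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "invalid JSON"
    results = payload.get("results") if isinstance(payload, dict) else None
    if not isinstance(results, list):
        return "missing results list"
    if not results:
        return "200 with zero hits"
    return None
```

Inside a Locust task, it would be called within a `with self.client.post(..., catch_response=True) as resp:` block, calling resp.failure(reason) whenever the function returns a string.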

Target SLOs for hybrid mode with the reranker enabled:

Percentile	CPU target	GPU target
p50	< 600 ms	< 120 ms
p95	< 1500 ms	< 300 ms
p99	< 3500 ms	< 700 ms
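To compare raw latency samples from your own timing against this table, a nearest-rank percentile is enough. This is a minimal sketch. CPU_TARGETS_MS copies the CPU column, and Locust's own report may compute percentiles with a different method, so small differences are expected.

```python
from __future__ import annotations

import math

CPU_TARGETS_MS = {50: 600, 95: 1500, 99: 3500}  # CPU column, milliseconds


def percentile(samples_ms: list[float], p: float) -> float:
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # nearest-rank definition
    return ordered[max(rank, 1) - 1]


def slo_misses(samples_ms: list[float], targets: dict[int, float]) -> dict[int, float]:
    # Map each percentile that exceeds its budget to the measured value.
    misses = {}
    for p, budget in targets.items():
        measured = percentile(samples_ms, p)
        if measured > budget:
            misses[p] = measured
    return misses
```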

If staging numbers miss the budget, walk the reranker remediation ladder above before chasing index sharding.

Scaling notes (~70k PDFs)

Scale workers horizontally with docker compose up -d --scale worker=8
Set EMBEDDING_DEVICE=cuda on a GPU-capable worker image for roughly 10× embedding throughput.
A single OpenSearch shard suffices up to ~10M chunks; in prod, increase the shard count and add replicas.
A single-node Qdrant is fine up to ~5M vectors; switch to a cluster deployment beyond that.
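A quick estimate makes the single-shard and single-node thresholds easier to apply. In this minimal sketch, chunks_per_doc is a hypothetical input that you measure on the real corpus, for example OpenSearch _count divided by the number of indexed documents. The sketch also assumes one Qdrant vector per chunk. hours_to_ingest converts the ingest SLO throughput into wall time.

```python
from __future__ import annotations

# Rules of thumb copied from the scaling notes; not measured limits.
OPENSEARCH_SINGLE_SHARD_MAX_CHUNKS = 10_000_000
QDRANT_SINGLE_NODE_MAX_VECTORS = 5_000_000


def capacity_report(docs: int, chunks_per_doc: float) -> dict[str, bool]:
    # Assumes one Qdrant vector per indexed chunk.
    chunks = docs * chunks_per_doc
    return {
        "opensearch_single_shard_ok": chunks <= OPENSEARCH_SINGLE_SHARD_MAX_CHUNKS,
        "qdrant_single_node_ok": chunks <= QDRANT_SINGLE_NODE_MAX_VECTORS,
    }


def hours_to_ingest(docs: int, docs_per_min: float) -> float:
    # Wall time at a steady sustained throughput, ignoring ramp-up.
    return docs / docs_per_min / 60.0
```

At 70,000 documents, the CPU target of 30 docs/min implies roughly 39 hours of ingest, and the GPU target of 200 docs/min roughly 5.8 hours.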

Common one-liners

# count indexed chunks in OpenSearch
curl 'http://localhost:9200/legacy_chunks/_count' | jq .

# inspect Qdrant collection
curl 'http://localhost:6333/collections/legacy_chunks' | jq .

# list MinIO buckets
docker compose exec minio mc alias set local http://localhost:9000 \
  "$MINIO_ACCESS_KEY" "$MINIO_SECRET_KEY"
docker compose exec minio mc ls local

# how many docs reached INDEXING_COMPLETED
docker compose exec postgres psql -U legacyhub -d legacyhub -c \
  "SELECT status, COUNT(*) FROM documents GROUP BY status;"
