- scripts/benchmark_reranker.py exercises the configured reranker with synthetic queries or live OpenSearch samples and prints p50/p95/p99 latency, mean latency, and pairs/sec throughput. Supports --warmup, --candidates, --passage-length, --source, and a --json-only mode for CI.
- app/indexing/reranker.py clips passages to 2048 characters before scoring so a runaway chunk cannot starve the cross-encoder beyond bge-reranker-v2-m3's training window.
- RUNBOOK.md gains a Reranker benchmark section with CPU/GPU SLO targets and a remediation ladder (lower top-K, raise batch size, switch device, disable reranker) when measured p95 exceeds budget.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LegacyHUB — Operational Runbook
Quick boot (dev)
cp .env.example .env
docker compose up -d --build
docker compose exec api python scripts/init_db.py
docker compose exec api python scripts/init_opensearch.py
docker compose exec api python scripts/init_qdrant.py
docker compose exec api python scripts/smoke_test.py
Verify:
curl -fsS http://localhost:8000/api/v1/health | jq .
Frontend dev:
cd frontend && cp .env.example .env && npm install && npm run dev
# http://localhost:5273
Production deploy
The production overlay enables the OpenSearch security plugin, removes default ports, forces externally supplied credentials, and disables debug routes.
# 1. Ensure secrets exist
cp .env.prod.example .env.prod
$EDITOR .env.prod # rotate every credential, never commit
# 2. Build + recreate
docker compose \
-f docker-compose.yml -f docker-compose.prod.yml \
--env-file .env.prod \
up -d --build --force-recreate api worker
# 3. Migrations
docker compose -f docker-compose.yml -f docker-compose.prod.yml \
--env-file .env.prod exec api python scripts/init_db.py
# 4. Health gate
docker compose -f docker-compose.yml -f docker-compose.prod.yml \
--env-file .env.prod exec api python scripts/smoke_test.py
curl -fsS https://<host>/api/v1/health | jq -e '.status == "ok"'
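The health gate above can also be scripted for a deploy pipeline. The following is a minimal sketch, assuming the endpoint returns a JSON object with a status field as the jq check implies; the function names, retry count, and delay are hypothetical and not part of the repository.

```python
import json
import time
import urllib.request


def is_healthy(payload) -> bool:
    """Mirror the jq gate: pass only when status is exactly "ok"."""
    return isinstance(payload, dict) and payload.get("status") == "ok"


def wait_for_health(url, attempts=10, delay_s=3.0, fetch=None) -> bool:
    """Poll `url` until it reports healthy or the attempts run out.

    `fetch` is injectable for testing; by default it performs an HTTP GET
    and decodes the JSON body.
    """
    def default_fetch(target):
        with urllib.request.urlopen(target, timeout=5) as resp:
            return json.loads(resp.read().decode("utf-8"))

    fetch = fetch or default_fetch
    for attempt in range(attempts):
        try:
            if is_healthy(fetch(url)):
                return True
        except (OSError, ValueError):
            pass  # connection error or invalid JSON: service not ready yet
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

A pipeline step could call wait_for_health with the deployed host's /api/v1/health URL and fail the deploy when it returns False.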
Hardening notes (mandatory for prod):
- Rotate every credential in .env.prod from .env.prod.example placeholders.
- Put OpenSearch behind TLS and admin password. Remove DISABLE_SECURITY_PLUGIN=true (handled by overlay).
- Front the API with a reverse proxy that performs auth + TLS termination.
- Restrict CORS via CORS_ALLOWED_ORIGINS (comma-separated); never * in prod.
- MinIO root key/secret in prod must come from a secret store, not the repo.
- Mount data/input and data/work from durable storage, not the workstation.
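The CORS rule in the list above (comma-separated origins, no wildcard in prod) can be enforced at startup. This is a minimal sketch under assumptions: parse_cors_origins and its production flag are hypothetical names, and the real application may read and validate CORS_ALLOWED_ORIGINS differently.

```python
def parse_cors_origins(raw: str, production: bool = True) -> list[str]:
    """Split a comma-separated CORS_ALLOWED_ORIGINS value into origins.

    Blank entries are dropped. In production a wildcard entry is rejected,
    matching the hardening note. (Hypothetical helper, not the repo's code.)
    """
    origins = [item.strip() for item in raw.split(",") if item.strip()]
    if production and "*" in origins:
        raise ValueError("CORS_ALLOWED_ORIGINS must not contain '*' in prod")
    return origins
```

Failing fast at boot keeps a permissive dev value from silently reaching a production deploy.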
Ingestion
# trigger from the API
curl -X POST http://localhost:8000/api/v1/ingest/folder \
-H "Content-Type: application/json" \
-d '{"path":"/data/input","recursive":true,"force":false}'
# or inline (no Celery)
docker compose exec api python scripts/ingest_folder.py \
--path /data/input --recursive --mode inline
# re-index a single doc
docker compose exec api python scripts/reindex_document.py \
--document-id <uuid>
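For callers that trigger ingestion from code rather than curl, the same request can be built with the Python standard library. A minimal sketch: the endpoint path and JSON fields are taken from the curl example above, while build_ingest_request and its defaults are hypothetical.

```python
import json
import urllib.request


def build_ingest_request(base_url, path, recursive=True, force=False):
    """Build the POST request for /api/v1/ingest/folder (not sent here)."""
    body = json.dumps(
        {"path": path, "recursive": recursive, "force": force}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/ingest/folder",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends it; the response format is not documented in this runbook, so it is left to the caller to interpret.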
Failure handling
Each stage emits a row to processing_events with level and data. Inspect:
docker compose exec postgres psql -U legacyhub -d legacyhub -c \
"SELECT created_at, stage, level, message FROM processing_events
ORDER BY created_at DESC LIMIT 50;"
| Failure | Where to look | Fix |
|---|---|---|
| OCR_FAILED | processing_events → OCR_STARTED then error | Confirm tesseract-ocr-rus package; rerun scripts/reindex_document.py |
| EXTRACTION_FAILED | processing_events → Docling stage | Check timeout; verify Docling version pin |
| Indexing stuck | OpenSearch + Qdrant health | scripts/init_opensearch.py, scripts/init_qdrant.py |
| Reranker disabled | API logs → reranker.disabled | Ensure RERANKER_ENABLED=true; HF cache mounted |
Verification gates (per change)
- python -m pytest tests/ -q: full unit suite (19+ tests).
- python -m compileall -q app scripts tests.
- docker compose config --quiet.
- Frontend: npx tsc --noEmit && npm run build.
- /api/v1/health returns {"status":"ok"}.
- One smoke ingest of a known PDF; verify /search returns a result.
Rollback
- Capture deployed commit SHA before deploy (git rev-parse HEAD).
- To roll back the API/worker image only:
  docker compose -f docker-compose.yml -f docker-compose.prod.yml \
    --env-file .env.prod up -d --build --force-recreate api worker \
    --no-deps  # keep PG/MinIO/OS/Qdrant intact
- Data services (PostgreSQL, MinIO, OpenSearch, Qdrant) are stateful and should not be rolled back casually. Restore from backup via the standard TeamHUB Suite backup runbook.
Reranker benchmark
The reranker is the latency-defining stage of the hybrid search path. Run the benchmark on every hardware change (CPU vs GPU, instance type, batch size) before promoting the configuration.
# synthetic warmup + 32 queries x 40 candidates, ~700-char passages
docker compose exec api python scripts/benchmark_reranker.py \
--queries 32 --candidates 40 --warmup 4
# real corpus sample (after some documents are indexed)
docker compose exec api python scripts/benchmark_reranker.py \
--source opensearch --query "ГОСТ 21.501-93" --candidates 40
Target SLOs (subject to revision once staging numbers land):
| Metric | CPU target | GPU target |
|---|---|---|
| p95 latency / query | < 700 ms | < 120 ms |
| Throughput | > 60 pairs/s | > 600 pairs/s |
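To make the table's metrics concrete, here is a minimal sketch of how per-query latencies could be reduced to percentiles and throughput and then checked against a budget. It assumes nearest-rank percentiles and sequential queries (throughput = scored pairs divided by total latency); scripts/benchmark_reranker.py may compute these differently, and all function names here are hypothetical.

```python
import math


def percentile(sorted_ms, q):
    """Nearest-rank percentile of an ascending list; q is an integer in 1..100."""
    rank = math.ceil(q * len(sorted_ms) / 100)
    return sorted_ms[max(rank, 1) - 1]


def summarize(latencies_ms, candidates):
    """Reduce per-query latencies (ms) to the metrics the benchmark reports."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    total_s = sum(ordered) / 1000.0
    return {
        "p50_ms": percentile(ordered, 50),
        "p95_ms": percentile(ordered, 95),
        "p99_ms": percentile(ordered, 99),
        "mean_ms": sum(ordered) / len(ordered),
        # each query scores `candidates` (query, passage) pairs
        "pairs_per_s": len(ordered) * candidates / total_s,
    }


def meets_slo(summary, p95_budget_ms, min_pairs_per_s):
    """True when p95 is under budget and throughput is above the floor."""
    return (
        summary["p95_ms"] < p95_budget_ms
        and summary["pairs_per_s"] > min_pairs_per_s
    )
```

With the CPU targets from the table, the check would be meets_slo(summary, 700, 60); with the GPU targets, meets_slo(summary, 120, 600).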
If the measured p95 exceeds the budget, options in order of preference:
- Lower RERANK_CANDIDATES (default 40; reducing to 20 roughly halves work).
- Increase RERANKER_BATCH_SIZE (memory permitting).
- Switch RERANKER_DEVICE=cuda and use a GPU-capable image.
- Disable reranker (RERANKER_ENABLED=false) and accept raw RRF order; the API still returns useful results, and the reranked field reports whether reranking was applied.
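For context on the last option, the sketch below shows what falling back to raw RRF order means. It assumes RRF refers to reciprocal rank fusion of the lexical and vector result lists with the commonly used constant k=60; the project's actual fusion parameters and response shape are not given in this runbook, so rrf_fuse, order_results, and the dictionary keys are hypothetical.

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # highest fused score first; ties broken by ID for determinism
    return sorted(scores, key=lambda d: (-scores[d], d))


def order_results(rankings, rerank=None):
    """Return fused results, reranked only when a reranker is supplied."""
    fused = rrf_fuse(rankings)
    if rerank is None:
        return {"results": fused, "reranked": False}
    return {"results": rerank(fused), "reranked": True}
```

Because fusion uses only ranks, the fallback path needs no model inference, which is why disabling the reranker removes the latency-defining stage.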
Passages are clipped to 2048 characters before being fed to the cross-encoder so a runaway chunk cannot exhaust the latency budget.
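The clipping rule can be sketched as follows. This illustrates the behavior described above and is not the code in app/indexing/reranker.py; the constant and function names are hypothetical, and truncation is by character count, as the runbook states.

```python
MAX_PASSAGE_CHARS = 2048  # limit stated in this runbook


def clip_passage(text: str, max_chars: int = MAX_PASSAGE_CHARS) -> str:
    """Truncate a passage to at most `max_chars` characters."""
    return text if len(text) <= max_chars else text[:max_chars]


def build_pairs(query: str, passages: list[str]) -> list[tuple[str, str]]:
    """Build (query, passage) pairs for a cross-encoder, clipping each passage."""
    return [(query, clip_passage(p)) for p in passages]
```

Clipping bounds the per-pair input size, so the cost of one query scales with the candidate count rather than with the longest chunk in the index.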
Scaling notes (~70k PDFs)
- Workers horizontally scale: docker compose up -d --scale worker=8.
- Set EMBEDDING_DEVICE=cuda on a GPU-capable worker image for ~10× embedding throughput.
- OpenSearch single shard suffices to ~10M chunks; increase shards and add replicas in prod.
- Qdrant single-node OK for ~5M vectors; switch to cluster build beyond that.
Common one-liners
# count indexed chunks in OpenSearch
curl 'http://localhost:9200/legacy_chunks/_count' | jq .
# inspect Qdrant collection
curl 'http://localhost:6333/collections/legacy_chunks' | jq .
# list MinIO buckets
docker compose exec minio mc alias set local http://localhost:9000 \
"$MINIO_ACCESS_KEY" "$MINIO_SECRET_KEY"
docker compose exec minio mc ls local
# document counts per status (shows how many docs reached INDEXING_COMPLETED)
docker compose exec postgres psql -U legacyhub -d legacyhub -c \
"SELECT status, COUNT(*) FROM documents GROUP BY status;"