- scripts/benchmark_reranker.py exercises the configured reranker with synthetic queries or live OpenSearch samples and prints p50/p95/p99 latency, mean latency, and pairs/sec throughput. Supports --warmup, --candidates, --passage-length, --source, and a --json-only mode for CI.
- app/indexing/reranker.py clips passages to 2048 characters before scoring so a runaway chunk cannot starve the cross-encoder beyond bge-reranker-v2-m3's training window.
- RUNBOOK.md gains a Reranker benchmark section with CPU/GPU SLO targets and a remediation ladder (lower top-K, raise batch size, switch device, disable reranker) when measured p95 exceeds budget.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
181 lines
6.5 KiB
Markdown
# LegacyHUB — Operational Runbook

## Quick boot (dev)

```bash
cp .env.example .env
docker compose up -d --build
docker compose exec api python scripts/init_db.py
docker compose exec api python scripts/init_opensearch.py
docker compose exec api python scripts/init_qdrant.py
docker compose exec api python scripts/smoke_test.py
```

Verify:

```bash
curl -fsS http://localhost:8000/api/v1/health | jq .
```

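Right after `docker compose up` the API may need a moment before the health check passes. The following is a minimal sketch of a polling wrapper around the same endpoint. The function names, retry count, and delay are illustrative assumptions, not part of the repository; only the URL and the `{"status": "ok"}` payload come from this runbook.

```python
import json
import time
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:8000/api/v1/health"  # same URL as the curl check


def fetch_health(url: str = HEALTH_URL, timeout: float = 5.0) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def wait_for_health(
    fetch: Callable[[], dict] = fetch_health,
    attempts: int = 10,   # hypothetical default
    delay_s: float = 3.0,  # hypothetical default
) -> bool:
    """Return True once the payload reports status == "ok", else False after `attempts` tries."""
    for attempt in range(attempts):
        try:
            if fetch().get("status") == "ok":
                return True
        except (OSError, ValueError):
            # Connection refused, timeout, or invalid JSON: the API is probably still starting.
            pass
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

Calling `wait_for_health()` with no arguments polls the local dev API; passing a custom `fetch` callable makes the loop easy to exercise without a running stack.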
Frontend dev:

```bash
cd frontend && cp .env.example .env && npm install && npm run dev
# http://localhost:5273
```

## Production deploy

The production overlay enables the OpenSearch security plugin, removes default
ports, requires externally supplied credentials, and disables debug routes.

```bash
# 1. Ensure secrets exist
cp .env.prod.example .env.prod
$EDITOR .env.prod   # rotate every credential, never commit

# 2. Build + recreate
docker compose \
  -f docker-compose.yml -f docker-compose.prod.yml \
  --env-file .env.prod \
  up -d --build --force-recreate api worker

# 3. Migrations
docker compose -f docker-compose.yml -f docker-compose.prod.yml \
  --env-file .env.prod exec api python scripts/init_db.py

# 4. Health gate
docker compose -f docker-compose.yml -f docker-compose.prod.yml \
  --env-file .env.prod exec api python scripts/smoke_test.py
curl -fsS https://<host>/api/v1/health | jq -e '.status == "ok"'
```

Hardening notes (mandatory for prod):

- Rotate every credential in `.env.prod`; do not keep any `.env.prod.example`
  placeholder.
- Put OpenSearch behind TLS and an admin password. Remove
  `DISABLE_SECURITY_PLUGIN=true` (handled by the overlay).
- Front the API with a reverse proxy that performs auth + TLS termination.
- Restrict CORS via `CORS_ALLOWED_ORIGINS` (comma-separated); never use `*` in
  prod.
- The MinIO root key/secret in prod must come from a secret store, not the repo.
- Mount `data/input` and `data/work` from durable storage, not the workstation.

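The CORS rule above can be enforced at startup rather than by convention. This is a sketch under assumptions: `parse_cors_origins` and its `production` flag are hypothetical names, and how the API actually reads `CORS_ALLOWED_ORIGINS` is not shown in this runbook.

```python
def parse_cors_origins(raw: str, production: bool = True) -> list[str]:
    """Split a comma-separated origin list and reject the wildcard in production."""
    # Drop surrounding whitespace and empty entries such as a trailing comma.
    origins = [item.strip() for item in raw.split(",") if item.strip()]
    if production and "*" in origins:
        raise ValueError("CORS_ALLOWED_ORIGINS must not contain '*' in production")
    return origins
```

A typical call would be `parse_cors_origins(os.environ["CORS_ALLOWED_ORIGINS"])`, failing fast when a wildcard slips into `.env.prod`.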
## Ingestion

```bash
# trigger from the API
curl -X POST http://localhost:8000/api/v1/ingest/folder \
  -H "Content-Type: application/json" \
  -d '{"path":"/data/input","recursive":true,"force":false}'

# or inline (no Celery)
docker compose exec api python scripts/ingest_folder.py \
  --path /data/input --recursive --mode inline

# re-index a single doc
docker compose exec api python scripts/reindex_document.py \
  --document-id <uuid>
```

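For automation it can be handy to issue the same ingest call from Python. The sketch below only builds the request shown in the curl example; `build_ingest_request` is a hypothetical helper, and the response format is not documented here, so it is not parsed.

```python
import json
import urllib.request


def build_ingest_request(
    base_url: str = "http://localhost:8000",
    path: str = "/data/input",
    recursive: bool = True,
    force: bool = False,
) -> urllib.request.Request:
    """Build (but do not send) the POST request for /api/v1/ingest/folder."""
    payload = {"path": path, "recursive": recursive, "force": force}
    return urllib.request.Request(
        url=f"{base_url}/api/v1/ingest/folder",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is a single `urllib.request.urlopen(build_ingest_request())` call; keeping construction separate from sending makes the payload easy to inspect first.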
## Failure handling

Each pipeline stage writes a row to `processing_events` with a `level` and a
`data` payload. Inspect the most recent events:

```bash
docker compose exec postgres psql -U legacyhub -d legacyhub -c \
  "SELECT created_at, stage, level, message FROM processing_events
   ORDER BY created_at DESC LIMIT 50;"
```

| Failure             | Where to look                                  | Fix                                                                                      |
|---------------------|------------------------------------------------|------------------------------------------------------------------------------------------|
| `OCR_FAILED`        | `processing_events` → `OCR_STARTED` then error | Confirm the `tesseract-ocr-rus` package is installed; rerun `scripts/reindex_document.py` |
| `EXTRACTION_FAILED` | `processing_events` → Docling stage            | Check the timeout; verify the Docling version pin                                        |
| Indexing stuck      | OpenSearch + Qdrant health                     | Run `scripts/init_opensearch.py` and `scripts/init_qdrant.py`                            |
| Reranker disabled   | API logs → `reranker.disabled`                 | Ensure `RERANKER_ENABLED=true` and that the HF cache is mounted                          |

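The table above can double as a lookup when scanning event rows. The sketch below is illustrative only: it assumes the failure code appears in the `message` column, which this runbook does not confirm, and `triage` is a hypothetical helper rather than an existing script.

```python
# Remediation hints copied from the failure table.
FIXES = {
    "OCR_FAILED": "Confirm the tesseract-ocr-rus package; rerun scripts/reindex_document.py",
    "EXTRACTION_FAILED": "Check the timeout; verify the Docling version pin",
}


def triage(events: list[dict]) -> list[tuple[str, str]]:
    """Return (failure code, fix hint) pairs for events whose message names a known failure.

    Assumption: the code is embedded in `message`; adjust the lookup if the
    pipeline stores it in `stage` or `data` instead.
    """
    findings = []
    for event in events:
        message = event.get("message") or ""
        for code, fix in FIXES.items():
            if code in message:
                findings.append((code, fix))
    return findings
```

Feeding it the rows returned by the `psql` query (as dictionaries) yields one hint per recognized failure, in event order.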
## Verification gates (per change)

1. `python -m pytest tests/ -q` (full unit suite, 19+ tests).
2. `python -m compileall -q app scripts tests`.
3. `docker compose config --quiet`.
4. Frontend: `npx tsc --noEmit && npm run build`.
5. `/api/v1/health` returns `{"status":"ok"}`.
6. One smoke ingest of a known PDF; verify that `/search` returns a result.

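Gates 1 to 4 are plain commands, so they can be chained and stopped at the first failure. This is a sketch, not an existing script: the gate names, the `run_gates` helper, and running the frontend commands from `frontend/` are assumptions. Gates 5 and 6 need a running stack and are left manual.

```python
import shlex
import subprocess
from typing import Callable, Optional

# (name, command, working directory) for gates 1-4 of the list.
GATES = [
    ("unit tests", "python -m pytest tests/ -q", None),
    ("byte-compile", "python -m compileall -q app scripts tests", None),
    ("compose config", "docker compose config --quiet", None),
    ("frontend typecheck", "npx tsc --noEmit", "frontend"),
    ("frontend build", "npm run build", "frontend"),
]


def run_command(command: str, cwd: Optional[str]) -> int:
    """Run one command without a shell and return its exit code."""
    return subprocess.run(shlex.split(command), cwd=cwd).returncode


def run_gates(
    gates=GATES,
    runner: Callable[[str, Optional[str]], int] = run_command,
) -> Optional[str]:
    """Run gates in order; return the name of the first failing gate, or None if all pass."""
    for name, command, cwd in gates:
        if runner(command, cwd) != 0:
            return name
    return None
```

The injectable `runner` keeps the ordering logic testable without invoking pytest, Docker, or npm.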
## Rollback

1. Capture the deployed commit SHA before each deploy (`git rev-parse HEAD`).
2. To roll back the API/worker image only, check out the previously captured
   SHA and rebuild:
   ```bash
   docker compose -f docker-compose.yml -f docker-compose.prod.yml \
     --env-file .env.prod up -d --build --force-recreate --no-deps \
     api worker   # --no-deps keeps PG/MinIO/OS/Qdrant intact
   ```
3. Data services (PostgreSQL, MinIO, OpenSearch, Qdrant) are stateful and
   should not be rolled back casually. Restore from backup via the standard
   TeamHUB Suite backup runbook.

## Reranker benchmark

The reranker is the latency-defining stage of the hybrid search path. Run the
benchmark after every hardware or tuning change (CPU vs GPU, instance type,
batch size) before promoting the configuration.

```bash
# synthetic warmup + 32 queries x 40 candidates, ~700-char passages
docker compose exec api python scripts/benchmark_reranker.py \
  --queries 32 --candidates 40 --warmup 4

# real corpus sample (after some documents are indexed)
docker compose exec api python scripts/benchmark_reranker.py \
  --source opensearch --query "ГОСТ 21.501-93" --candidates 40
```

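To make the reported numbers easier to interpret, here is a minimal sketch of how per-query latencies could be reduced to percentiles, mean, and throughput. It is not the code of `scripts/benchmark_reranker.py`: the nearest-rank percentile method and the definition of throughput as total pairs over total scoring time are assumptions.

```python
import math
import statistics


def percentile(sorted_values: list[float], pct: float) -> float:
    """Nearest-rank percentile of an ascending list (pct in (0, 100])."""
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize(latencies_ms: list[float], candidates: int) -> dict:
    """Summarize per-query rerank latencies; each query scores `candidates` pairs."""
    ordered = sorted(latencies_ms)
    total_s = sum(ordered) / 1000.0
    return {
        "p50_ms": percentile(ordered, 50),
        "p95_ms": percentile(ordered, 95),
        "p99_ms": percentile(ordered, 99),
        "mean_ms": statistics.fmean(ordered),
        "pairs_per_sec": len(ordered) * candidates / total_s,
    }
```

With only 32 queries the p99 is effectively the slowest query, so p95 is the more stable number to compare between runs.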
Target SLOs (subject to revision once staging numbers land):

| Metric              | CPU target    | GPU target     |
|---------------------|--------------:|---------------:|
| p95 latency / query | < 700 ms      | < 120 ms       |
| Throughput          | > 60 pairs/s  | > 600 pairs/s  |

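The targets lend themselves to an automated check, for example in CI. The sketch below hard-codes the table values; the `p95_ms` and `pairs_per_sec` keys are hypothetical, since the JSON layout emitted by the benchmark script is not documented in this runbook.

```python
# Targets copied from the SLO table (subject to revision).
SLO_TARGETS = {
    "cpu": {"p95_ms": 700.0, "pairs_per_sec": 60.0},
    "gpu": {"p95_ms": 120.0, "pairs_per_sec": 600.0},
}


def slo_violations(summary: dict, device: str) -> list[str]:
    """Return human-readable violations; an empty list means the run is within budget."""
    target = SLO_TARGETS[device]
    problems = []
    if summary["p95_ms"] >= target["p95_ms"]:
        problems.append(
            f"p95 {summary['p95_ms']:.0f} ms is not below {target['p95_ms']:.0f} ms"
        )
    if summary["pairs_per_sec"] <= target["pairs_per_sec"]:
        problems.append(
            f"throughput {summary['pairs_per_sec']:.0f} pairs/s "
            f"is not above {target['pairs_per_sec']:.0f} pairs/s"
        )
    return problems
```

Both comparisons are strict, matching the `<` and `>` signs in the table, so a run that lands exactly on a target counts as a violation.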
If the measured p95 exceeds the budget, try these options in order of preference:

1. Lower `RERANK_CANDIDATES` (default 40; reducing it to 20 roughly halves the
   scoring work).
2. Increase `RERANKER_BATCH_SIZE` (memory permitting).
3. Switch to `RERANKER_DEVICE=cuda` and use a GPU-capable image.
4. Disable the reranker (`RERANKER_ENABLED=false`) and accept the raw RRF order.
   The API still returns useful results, and the `reranked` field reports
   whether reranking actually ran.

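For readers unfamiliar with the fallback in option 4, the sketch below shows reciprocal rank fusion, assuming that is what "RRF" refers to here. It is not the project's implementation: the function name is hypothetical, and `k = 60` is a commonly used constant rather than a value taken from the project configuration.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank), rank starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

Documents that appear in both the lexical and the vector ranking accumulate two terms, which is why the fused order remains useful even without the cross-encoder.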
Passages are clipped to 2048 characters before being fed to the cross-encoder,
so a runaway chunk cannot starve the latency budget.

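The clipping step amounts to a character-level truncation before query/passage pairs are built. This is a sketch of the idea, not the code in `app/indexing/reranker.py`; the helper names are hypothetical.

```python
MAX_PASSAGE_CHARS = 2048  # clip length stated in this runbook


def clip_passage(text: str, max_chars: int = MAX_PASSAGE_CHARS) -> str:
    """Truncate a passage to at most `max_chars` characters (no-op for short passages)."""
    return text if len(text) <= max_chars else text[:max_chars]


def build_pairs(query: str, passages: list[str]) -> list[tuple[str, str]]:
    """Pair the query with each clipped passage, the usual input shape for a cross-encoder."""
    return [(query, clip_passage(passage)) for passage in passages]
```

Note that the limit counts characters, not tokens, so the tokenizer may still truncate further depending on the text.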
## Scaling notes (~70k PDFs)

- Workers scale horizontally: `docker compose up -d --scale worker=8`.
- Set `EMBEDDING_DEVICE=cuda` on a GPU-capable worker image for ~10× embedding
  throughput.
- A single OpenSearch shard suffices up to ~10M chunks; increase the shard
  count and add replicas in prod.
- A single Qdrant node is fine up to ~5M vectors; switch to a cluster build
  beyond that.

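The two thresholds above can be turned into a quick back-of-the-envelope check. In this sketch `chunks_per_document` is a hypothetical input (the runbook gives no average), and one vector per chunk is assumed.

```python
OPENSEARCH_SINGLE_SHARD_MAX_CHUNKS = 10_000_000  # "~10M chunks" from the notes
QDRANT_SINGLE_NODE_MAX_VECTORS = 5_000_000       # "~5M vectors" from the notes


def scaling_advice(documents: int, chunks_per_document: int) -> dict:
    """Compare an estimated chunk count with the single-shard / single-node limits."""
    total_chunks = documents * chunks_per_document  # assumes one vector per chunk
    return {
        "total_chunks": total_chunks,
        "opensearch_needs_more_shards": total_chunks > OPENSEARCH_SINGLE_SHARD_MAX_CHUNKS,
        "qdrant_needs_cluster": total_chunks > QDRANT_SINGLE_NODE_MAX_VECTORS,
    }
```

For example, `scaling_advice(70_000, 50)` estimates 3.5M chunks, which stays under both limits, while 100 chunks per document would cross the Qdrant threshold first.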
## Common one-liners

```bash
# count indexed chunks in OpenSearch
curl 'http://localhost:9200/legacy_chunks/_count' | jq .

# inspect Qdrant collection
curl 'http://localhost:6333/collections/legacy_chunks' | jq .

# list MinIO buckets
docker compose exec minio mc alias set local http://localhost:9000 \
  "$MINIO_ACCESS_KEY" "$MINIO_SECRET_KEY"
docker compose exec minio mc ls local

# document counts per status (e.g. how many reached INDEXING_COMPLETED)
docker compose exec postgres psql -U legacyhub -d legacyhub -c \
  "SELECT status, COUNT(*) FROM documents GROUP BY status;"
```