scripts/generate_synthetic_pdfs.py builds real PDF/1.4 documents with a hand-written xref so we can generate tens of thousands of ~2 KB PDFs locally. Helvetica only covers latin-1, which is fine for a load generator (throughput, not retrieval relevance); the docstring calls this out so no one mistakes the output for a quality corpus.

scripts/load_ingest.py drives POST /ingest/folder, then polls a hypothetical /documents/stats endpoint every poll-interval seconds to track terminal-state progression. Writes a JSON history report so results can be diffed between runs.

scripts/locustfile_search.py defines a SearchUser profile mixing hybrid / lexical / semantic queries against POST /search plus a health-check sampler. Asserts non-empty results so a "200 with zero hits" regression surfaces as a failure rather than a green percentile graph.

RUNBOOK gains a Load testing section with CPU/GPU SLO tables for both axes (sustained docs/min, search latency p50/p95/p99).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# LegacyHUB — Operational Runbook

## Quick boot (dev)

```bash
cp .env.example .env
docker compose up -d --build
docker compose exec api python scripts/init_db.py
docker compose exec api python scripts/init_opensearch.py
docker compose exec api python scripts/init_qdrant.py
docker compose exec api python scripts/smoke_test.py
```

Verify:

```bash
curl -fsS http://localhost:8000/api/v1/health | jq .
```
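
Right after `docker compose up`, the API may need a few seconds before the health endpoint answers. The following is a minimal sketch of a retrying health gate in Python, assuming only that the endpoint returns JSON with a `status` field equal to `"ok"` when healthy. The function names, retry count, and delay are illustrative and not part of the repo.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(url: str, timeout: float = 5.0) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_healthy(
    url: str,
    attempts: int = 30,          # illustrative default, not a repo setting
    delay_s: float = 2.0,        # illustrative default, not a repo setting
    fetch: Callable[[str], dict] = fetch_health,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the endpoint reports status == "ok", else False."""
    for attempt in range(attempts):
        try:
            body = fetch(url)
            if isinstance(body, dict) and body.get("status") == "ok":
                return True
        except (OSError, ValueError):
            # Connection refused, timeout, HTTP error, or non-JSON body:
            # treat all of them as "not ready yet" and retry.
            pass
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

Calling `wait_until_healthy("http://localhost:8000/api/v1/health")` blocks for at most roughly `attempts * delay_s` seconds plus request time. The `fetch` and `sleep` parameters exist only so the loop can be exercised without a running stack.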

Frontend dev:

```bash
cd frontend && cp .env.example .env && npm install && npm run dev
# http://localhost:5273
```

## Production deploy

The production overlay enables the OpenSearch security plugin, removes default
ports, requires externally supplied credentials, and disables debug routes.

```bash
# 1. Ensure secrets exist
cp .env.prod.example .env.prod
$EDITOR .env.prod   # rotate every credential, never commit

# 2. Build + recreate
docker compose \
  -f docker-compose.yml -f docker-compose.prod.yml \
  --env-file .env.prod \
  up -d --build --force-recreate api worker

# 3. Migrations
docker compose -f docker-compose.yml -f docker-compose.prod.yml \
  --env-file .env.prod exec api python scripts/init_db.py

# 4. Health gate
docker compose -f docker-compose.yml -f docker-compose.prod.yml \
  --env-file .env.prod exec api python scripts/smoke_test.py
curl -fsS https://<host>/api/v1/health | jq -e '.status == "ok"'
```

Hardening notes (mandatory for prod):

- Replace every placeholder credential that `.env.prod` inherits from
  `.env.prod.example`.
- Put OpenSearch behind TLS and set an admin password. Remove
  `DISABLE_SECURITY_PLUGIN=true` (handled by the overlay).
- Front the API with a reverse proxy that performs auth + TLS termination.
- Restrict CORS via `CORS_ALLOWED_ORIGINS` (comma-separated); never use `*` in
  prod.
- MinIO root key/secret in prod must come from a secret store, not the repo.
- Mount `data/input` and `data/work` from durable storage, not the workstation.
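
The CORS rule above is easy to enforce at startup. The sketch below shows one way to parse a comma-separated `CORS_ALLOWED_ORIGINS` value and reject the wildcard in production. The function name and the `production` flag are hypothetical; how the application actually reads this setting is not shown in this runbook.

```python
def parse_cors_origins(raw: str, production: bool) -> list[str]:
    """Split a comma-separated origin list and refuse '*' in production.

    Hypothetical helper: the real settings loader may differ.
    """
    origins = [item.strip() for item in raw.split(",") if item.strip()]
    if production and "*" in origins:
        raise ValueError("CORS_ALLOWED_ORIGINS must not contain '*' in prod")
    return origins
```

For example, `parse_cors_origins("https://a.example, https://b.example", production=True)` yields two origins, while passing `"*"` with `production=True` raises `ValueError`, so a misconfigured deploy fails before serving traffic.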

## Ingestion

```bash
# trigger from the API
curl -X POST http://localhost:8000/api/v1/ingest/folder \
  -H "Content-Type: application/json" \
  -d '{"path":"/data/input","recursive":true,"force":false}'

# or inline (no Celery)
docker compose exec api python scripts/ingest_folder.py \
  --path /data/input --recursive --mode inline

# re-index a single doc
docker compose exec api python scripts/reindex_document.py \
  --document-id <uuid>
```
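
When the ingest trigger needs to be scripted rather than typed, the same request as the `curl` call above can be built with the Python standard library. This is a sketch only: `build_ingest_request` is a hypothetical helper, and the shape of the response body is not documented here, so the example stops at constructing the request.

```python
import json
import urllib.request


def build_ingest_request(
    api_url: str,
    path: str,
    recursive: bool = True,
    force: bool = False,
) -> urllib.request.Request:
    """Build the POST /ingest/folder request with the same JSON body as the curl call."""
    payload = {"path": path, "recursive": recursive, "force": force}
    return urllib.request.Request(
        url=f"{api_url.rstrip('/')}/ingest/folder",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(build_ingest_request("http://localhost:8000/api/v1", "/data/input"))`, which raises `urllib.error.HTTPError` on a non-2xx status.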

## Failure handling

Each stage emits a row to `processing_events` with `level` and `data`. Inspect:

```bash
docker compose exec postgres psql -U legacyhub -d legacyhub -c \
  "SELECT created_at, stage, level, message FROM processing_events
   ORDER BY created_at DESC LIMIT 50;"
```

| Failure             | Where to look                                  | Fix                                                                                   |
|---------------------|------------------------------------------------|---------------------------------------------------------------------------------------|
| `OCR_FAILED`        | `processing_events` → `OCR_STARTED` then error | Confirm the `tesseract-ocr-rus` package is installed; rerun `scripts/reindex_document.py` |
| `EXTRACTION_FAILED` | `processing_events` → Docling stage            | Check timeout; verify Docling version pin                                             |
| Indexing stuck      | OpenSearch + Qdrant health                     | Rerun `scripts/init_opensearch.py`, `scripts/init_qdrant.py`                          |
| Reranker disabled   | API logs → `reranker.disabled`                 | Ensure `RERANKER_ENABLED=true`; HF cache mounted                                      |

## Verification gates (per change)

1. `python -m pytest tests/ -q` (full unit suite, 19+ tests).
2. `python -m compileall -q app scripts tests`.
3. `docker compose config --quiet`.
4. Frontend: `npx tsc --noEmit && npm run build`.
5. `/api/v1/health` returns `{"status":"ok"}`.
6. One smoke ingest of a known PDF; verify `/search` returns a result.

## Rollback

1. Capture the deployed commit SHA before each deploy (`git rev-parse HEAD`).
2. To roll back the API/worker image only, check out the SHA captured in step 1
   (the command below rebuilds from the working tree), then run:

   ```bash
   docker compose -f docker-compose.yml -f docker-compose.prod.yml \
     --env-file .env.prod up -d --build --force-recreate api worker \
     --no-deps   # keep PG/MinIO/OS/Qdrant intact
   ```

3. Data services (PostgreSQL, MinIO, OpenSearch, Qdrant) are stateful and
   should not be rolled back casually. Restore from backup via the standard
   TeamHUB Suite backup runbook.

## Reranker benchmark

The reranker is the latency-defining stage of the hybrid search path. Run the
benchmark on every hardware change (CPU vs GPU, instance type, batch size)
before promoting the configuration.

```bash
# synthetic warmup + 32 queries x 40 candidates, ~700-char passages
docker compose exec api python scripts/benchmark_reranker.py \
  --queries 32 --candidates 40 --warmup 4

# real corpus sample (after some documents are indexed)
docker compose exec api python scripts/benchmark_reranker.py \
  --source opensearch --query "ГОСТ 21.501-93" --candidates 40
```

Target SLOs (subject to revision once staging numbers land):

| Metric              |   CPU target |    GPU target |
|---------------------|-------------:|--------------:|
| p95 latency / query |     < 700 ms |      < 120 ms |
| Throughput          | > 60 pairs/s | > 600 pairs/s |
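
To make the pass/fail decision against the table above mechanical, per-query latencies can be reduced to a p95 and a pairs-per-second figure. The sketch below assumes a nearest-rank percentile and defines throughput as scored (query, passage) pairs divided by total reranker time. Both are assumptions; `scripts/benchmark_reranker.py` may compute them differently.

```python
import math

# Targets copied from the SLO table; the "cpu"/"gpu" keys are illustrative.
SLO = {
    "cpu": {"p95_ms": 700.0, "pairs_per_s": 60.0},
    "gpu": {"p95_ms": 120.0, "pairs_per_s": 600.0},
}


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with >= pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_reranker_slo(latencies_ms, candidates, device):
    """Compare measured per-query latencies against the targets for `device`."""
    p95 = percentile(latencies_ms, 95)
    total_pairs = candidates * len(latencies_ms)
    pairs_per_s = total_pairs / (sum(latencies_ms) / 1000.0)
    target = SLO[device]
    return {
        "p95_ms": p95,
        "pairs_per_s": pairs_per_s,
        "ok": p95 < target["p95_ms"] and pairs_per_s > target["pairs_per_s"],
    }
```

As a worked case, 32 queries at a flat 500 ms with 40 candidates each score 1,280 pairs in 16 s, which is 80 pairs/s: inside the CPU targets, outside the GPU ones.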

If the measured p95 exceeds the budget, try these options in order of preference:

1. Lower `RERANK_CANDIDATES` (default 40; reducing to 20 roughly halves the work).
2. Increase `RERANKER_BATCH_SIZE` (memory permitting).
3. Set `RERANKER_DEVICE=cuda` and use a GPU-capable image.
4. Disable the reranker (`RERANKER_ENABLED=false`) and accept raw RRF order. The
   API still returns useful results, and the `reranked` field reports whether
   reranking was actually applied.

Passages are clipped to 2048 chars before being fed to the cross-encoder so a
runaway chunk cannot starve the budget.
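
The clipping rule can be illustrated in a few lines. This is a sketch of the idea rather than the project's implementation: the helper names are hypothetical, and the real code may cut at a word or token boundary instead of a raw character count.

```python
MAX_PASSAGE_CHARS = 2048  # limit stated in the runbook


def clip_passage(text: str, limit: int = MAX_PASSAGE_CHARS) -> str:
    """Truncate a passage to at most `limit` characters."""
    return text[:limit]


def build_pairs(query: str, passages: list[str], limit: int = MAX_PASSAGE_CHARS):
    """Build the (query, passage) pairs a cross-encoder would score."""
    return [(query, clip_passage(p, limit)) for p in passages]
```

With this in place, a single 50,000-character chunk contributes the same bounded input as any other candidate, so per-query cost scales with the candidate count rather than with the longest chunk.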

## Load testing

Two complementary harnesses live under `scripts/`:

### Ingest load

```bash
# Generate synthetic PDFs (~3 KB each, real PDF/1.4 with embedded text)
docker compose exec api python scripts/generate_synthetic_pdfs.py \
  --count 10000 --out /data/input/load

# Trigger ingest, sample status every 10 s, dump JSON history
docker compose exec api python scripts/load_ingest.py \
  --path /data/input/load \
  --api-url http://localhost:8000/api/v1 \
  --watch-seconds 1800 \
  --report-file /data/work/load_report.json
```
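
For readers unfamiliar with what "real PDF/1.4 with embedded text" involves, here is a minimal sketch of a single-page PDF written by hand, including the cross-reference table. It is not the code in `scripts/generate_synthetic_pdfs.py`; the function names, page size, and font size are assumptions chosen for illustration, and the output is smaller than the script's documents.

```python
def escape_pdf_text(text: str) -> bytes:
    """Encode as latin-1 and escape the characters special inside a PDF string."""
    raw = text.encode("latin-1", errors="replace")  # Helvetica covers latin-1 only
    return raw.replace(b"\\", b"\\\\").replace(b"(", b"\\(").replace(b")", b"\\)")


def build_pdf(text: str) -> bytes:
    """Return a one-page PDF/1.4 document showing `text` in 12 pt Helvetica."""
    stream = b"BT /F1 12 Tf 72 720 Td (" + escape_pdf_text(text) + b") Tj ET"
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica "
        b"/Encoding /WinAnsiEncoding >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", needed by the xref
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # object 0 is always the free-list head
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)
```

Writing `build_pdf("doc 42")` to a file with `open(path, "wb")` produces something a PDF text extractor can read. Recording offsets while appending is what keeps the xref table consistent without a PDF library.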

Target SLOs at the 70k-document scale (subject to refinement once measured):

| Metric                          | CPU target | GPU target |
|---------------------------------|-----------:|-----------:|
| Sustained throughput (docs/min) |       > 30 |      > 200 |
| Failure rate                    |      < 1 % |    < 0.5 % |
| p95 per-document wall time      |     < 90 s |     < 25 s |
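
The throughput targets translate directly into wall-clock time for a full pass. The sketch below does that arithmetic and also derives sustained docs/min from two progress samples. The `(elapsed_seconds, terminal_docs)` sample shape is an assumption for illustration, not the documented format of the JSON history report.

```python
CORPUS_DOCS = 70_000
TARGET_DOCS_PER_MIN = {"cpu": 30, "gpu": 200}  # floors from the SLO table


def hours_to_ingest(docs: int, docs_per_min: float) -> float:
    """Wall-clock hours needed to ingest `docs` at a steady rate."""
    return docs / docs_per_min / 60


def sustained_docs_per_min(samples):
    """Average rate between the first and last (elapsed_seconds, terminal_docs) sample."""
    (t0, d0), (t1, d1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (d1 - d0) / ((t1 - t0) / 60)


if __name__ == "__main__":
    for device, rate in TARGET_DOCS_PER_MIN.items():
        print(f"{device}: {hours_to_ingest(CORPUS_DOCS, rate):.1f} h")
```

At exactly the target floors, 70,000 documents take about 38.9 h on CPU and about 5.8 h on GPU. A 10,000-document batch would need roughly 333 docs/min to finish inside an 1800 s watch window, so a longer window may be needed to observe completion.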

### Search load

```bash
pip install locust   # one-time

locust -f scripts/locustfile_search.py \
  --host http://localhost:8000 \
  --headless --users 100 --spawn-rate 10 --run-time 10m \
  --html load_search.html
```

Target SLOs for hybrid mode with the reranker enabled:

| Percentile |     CPU |    GPU |
|------------|--------:|-------:|
| p50        |  600 ms | 120 ms |
| p95        | 1500 ms | 300 ms |
| p99        | 3500 ms | 700 ms |

If staging numbers miss the budget, walk the reranker remediation ladder above
before chasing index sharding.
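
Latency percentiles only mean something if the responses being timed are correct. A minimal sketch of the kind of check that turns a "200 with zero hits" response into a failure follows; the `results` key is an assumed response field, and `classify_search_response` is a hypothetical helper, not code from `scripts/locustfile_search.py`.

```python
import json


def classify_search_response(status_code: int, body_text: str):
    """Return (ok, reason) for a search response.

    Assumes hits arrive as a list under a "results" key (hypothetical).
    """
    if status_code != 200:
        return False, f"HTTP {status_code}"
    try:
        payload = json.loads(body_text)
    except ValueError:
        return False, "body is not JSON"
    results = payload.get("results") if isinstance(payload, dict) else None
    if not results:
        return False, "200 with zero hits"
    return True, "ok"
```

In a Locust task this kind of check would typically run inside a `with self.client.post(..., catch_response=True) as resp:` block, calling `resp.failure(reason)` when the check fails so the request is counted as an error rather than a fast success.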

## Scaling notes (~70k PDFs)

- Workers scale horizontally: `docker compose up -d --scale worker=8`.
- Set `EMBEDDING_DEVICE=cuda` on a GPU-capable worker image for ~10× embedding
  throughput.
- A single OpenSearch shard suffices up to ~10M chunks; increase shards and add
  replicas in prod.
- A single Qdrant node is fine up to ~5M vectors; switch to a cluster build
  beyond that.
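
The thresholds above can be turned into a per-document budget for the 70k corpus. The sketch below assumes one vector per chunk, which this runbook does not state, and uses hypothetical names throughout.

```python
CORPUS_DOCS = 70_000
OPENSEARCH_SINGLE_SHARD_CHUNKS = 10_000_000  # ~10M chunks, from the notes above
QDRANT_SINGLE_NODE_VECTORS = 5_000_000       # ~5M vectors, from the notes above


def max_units_per_doc(limit: int, docs: int = CORPUS_DOCS) -> int:
    """Largest whole number of chunks (or vectors) per document that stays under `limit`."""
    return limit // docs


def fits_single_node(docs: int, chunks_per_doc: int) -> dict:
    """Check a corpus against both thresholds, assuming one vector per chunk."""
    total = docs * chunks_per_doc
    return {
        "total_chunks": total,
        "opensearch_single_shard": total <= OPENSEARCH_SINGLE_SHARD_CHUNKS,
        "qdrant_single_node": total <= QDRANT_SINGLE_NODE_VECTORS,
    }
```

For 70,000 documents this gives a budget of 142 chunks per document before the OpenSearch figure is reached and 71 before the Qdrant figure is reached, so Qdrant is the first limit to watch as average document length grows.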

## Common one-liners

```bash
# count indexed chunks in OpenSearch
curl 'http://localhost:9200/legacy_chunks/_count' | jq .

# inspect Qdrant collection
curl 'http://localhost:6333/collections/legacy_chunks' | jq .

# list MinIO buckets
docker compose exec minio mc alias set local http://localhost:9000 \
  "$MINIO_ACCESS_KEY" "$MINIO_SECRET_KEY"
docker compose exec minio mc ls local

# document counts per status (e.g. how many reached INDEXING_COMPLETED)
docker compose exec postgres psql -U legacyhub -d legacyhub -c \
  "SELECT status, COUNT(*) FROM documents GROUP BY status;"
```