# LegacyHUB - Operational Runbook

## Quick boot (dev)

```bash
cp .env.example .env
docker compose up -d --build
docker compose exec api python scripts/init_db.py
docker compose exec api python scripts/init_opensearch.py
docker compose exec api python scripts/init_qdrant.py
docker compose exec api python scripts/smoke_test.py
```

Verify:

```bash
curl -fsS http://localhost:8000/api/v1/health | jq .
```

Frontend dev:

```bash
cd frontend && cp .env.example .env && npm install && npm run dev
# http://localhost:5273
```

## Production deploy

The production overlay enables the OpenSearch security plugin, removes default ports, forces externally supplied credentials, and disables debug routes.

```bash
# 1. Ensure secrets exist
cp .env.prod.example .env.prod
$EDITOR .env.prod   # rotate every credential, never commit

# 2. Build + recreate
docker compose \
  -f docker-compose.yml -f docker-compose.prod.yml \
  --env-file .env.prod \
  up -d --build --force-recreate api worker

# 3. Migrations
docker compose -f docker-compose.yml -f docker-compose.prod.yml \
  --env-file .env.prod exec api python scripts/init_db.py

# 4. Health gate
docker compose -f docker-compose.yml -f docker-compose.prod.yml \
  --env-file .env.prod exec api python scripts/smoke_test.py
curl -fsS https:///api/v1/health | jq -e '.status == "ok"'
```

Hardening notes (mandatory for prod):

- Rotate every credential in `.env.prod`; none of the `.env.prod.example` placeholders may survive.
- Put OpenSearch behind TLS and an admin password. Remove `DISABLE_SECURITY_PLUGIN=true` (handled by the overlay).
- Front the API with a reverse proxy that performs auth + TLS termination.
- Restrict CORS via `CORS_ALLOWED_ORIGINS` (comma-separated); never use `*` in prod.
- The MinIO root key/secret in prod must come from a secret store, not the repo.
- Mount `data/input` and `data/work` from durable storage, not the workstation.
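
The CORS rule in the hardening notes is easy to check mechanically before a deploy. The following is a minimal sketch, assuming `CORS_ALLOWED_ORIGINS` is a plain comma-separated string as described above. The helper name `parse_cors_origins` is hypothetical and is not part of the LegacyHUB codebase; the API may parse the variable differently.

```python
from urllib.parse import urlsplit


def parse_cors_origins(raw: str) -> list[str]:
    """Split a comma-separated origin list and reject values unsafe for prod.

    Hypothetical helper: it only illustrates the "never `*` in prod" rule.
    """
    origins = [item.strip() for item in raw.split(",") if item.strip()]
    if not origins:
        raise ValueError("CORS_ALLOWED_ORIGINS is empty")
    for origin in origins:
        if "*" in origin:
            raise ValueError(f"wildcard origin not allowed in prod: {origin!r}")
        parts = urlsplit(origin)
        # An origin is scheme + host (+ optional port), nothing else.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not a valid origin: {origin!r}")
        if parts.path or parts.query or parts.fragment:
            raise ValueError(f"origin must not contain a path: {origin!r}")
    return origins
```

Running it against the value in `.env.prod` as part of the pre-deploy checks would fail fast on a wildcard or a malformed entry, rather than leaving the mistake to be discovered in the browser.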

## Ingestion

```bash
# trigger from the API
curl -X POST http://localhost:8000/api/v1/ingest/folder \
  -H "Content-Type: application/json" \
  -d '{"path":"/data/input","recursive":true,"force":false}'

# or inline (no Celery)
docker compose exec api python scripts/ingest_folder.py \
  --path /data/input --recursive --mode inline

# re-index a single doc
docker compose exec api python scripts/reindex_document.py \
  --document-id
```

## Failure handling

Each stage writes a row to `processing_events` with `level` and `data`. Inspect:

```bash
docker compose exec postgres psql -U legacyhub -d legacyhub -c \
  "SELECT created_at, stage, level, message FROM processing_events ORDER BY created_at DESC LIMIT 50;"
```

| Failure             | Where to look                                   | Fix                                                                                      |
|---------------------|-------------------------------------------------|------------------------------------------------------------------------------------------|
| `OCR_FAILED`        | `processing_events` → `OCR_STARTED` then error  | Confirm the `tesseract-ocr-rus` package is installed; rerun `scripts/reindex_document.py` |
| `EXTRACTION_FAILED` | `processing_events` → Docling stage             | Check timeout; verify Docling version pin                                                |
| Indexing stuck      | OpenSearch + Qdrant health                      | Run `scripts/init_opensearch.py`, `scripts/init_qdrant.py`                               |
| Reranker disabled   | API logs → `reranker.disabled`                  | Ensure `RERANKER_ENABLED=true`; HF cache mounted                                         |

## Verification gates (per change)

1. `python -m pytest tests/ -q`: full unit suite (19+ tests).
2. `python -m compileall -q app scripts tests`.
3. `docker compose config --quiet`.
4. Frontend: `npx tsc --noEmit && npm run build`.
5. `/api/v1/health` returns `{"status":"ok"}`.
6. One smoke ingest of a known PDF; verify `/search` returns a result.

## Rollback

1. Capture the deployed commit SHA before each deploy (`git rev-parse HEAD`).
2. To roll back the API/worker image only, check out the captured SHA (since `--build` rebuilds from the working tree) and recreate the two services:
   ```bash
   docker compose -f docker-compose.yml -f docker-compose.prod.yml \
     --env-file .env.prod up -d --build --force-recreate \
     --no-deps api worker   # keep PG/MinIO/OS/Qdrant intact
   ```
3. Data services (PostgreSQL, MinIO, OpenSearch, Qdrant) are stateful and should not be rolled back casually. Restore from backup via the standard TeamHUB Suite backup runbook.

## Reranker benchmark

The reranker is the latency-defining stage of the hybrid search path. Run the benchmark on every hardware change (CPU vs GPU, instance type, batch size) before promoting the configuration.

```bash
# synthetic warmup + 32 queries x 40 candidates, ~700-char passages
docker compose exec api python scripts/benchmark_reranker.py \
  --queries 32 --candidates 40 --warmup 4

# real corpus sample (after some documents are indexed)
docker compose exec api python scripts/benchmark_reranker.py \
  --source opensearch --query "ГОСТ 21.501-93" --candidates 40
```

Target SLOs (subject to revision once staging numbers land):

| Metric              |   CPU target |    GPU target |
|---------------------|-------------:|--------------:|
| p95 latency / query |     < 700 ms |      < 120 ms |
| Throughput          | > 60 pairs/s | > 600 pairs/s |

If the measured p95 exceeds the budget, the options are, in order of preference:

1. Lower `RERANK_CANDIDATES` (default 40; reducing to 20 roughly halves the work).
2. Increase `RERANKER_BATCH_SIZE` (memory permitting).
3. Switch to `RERANKER_DEVICE=cuda` and use a GPU-capable image.
4. Disable the reranker (`RERANKER_ENABLED=false`) and accept raw RRF order. The API still returns useful results, and the `reranked` field reports whether reranking was actually applied.

Passages are clipped to 2048 chars before being fed to the cross-encoder, so a runaway chunk cannot exhaust the latency budget.

## Scaling notes (~70k PDFs)

- Workers scale horizontally: `docker compose up -d --scale worker=8`.
- Set `EMBEDDING_DEVICE=cuda` on a GPU-capable worker image for ~10× embedding throughput.
- A single OpenSearch shard suffices up to ~10M chunks; increase shards and add replicas in prod.
- A single Qdrant node is fine for ~5M vectors; switch to a cluster deployment beyond that.
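
The comparison against the reranker SLO table above can be scripted once per-query latencies are available. This is a minimal sketch under stated assumptions: the output format of `scripts/benchmark_reranker.py` is not documented here, so the function takes a plain list of per-query latencies in milliseconds; queries are assumed to run sequentially, so throughput is total pairs divided by total time; the names `p95`, `check_slo`, and the `"cpu"` device key are hypothetical.

```python
# Budgets copied from the SLO table in this runbook (subject to revision).
SLO = {
    "cpu": {"p95_ms": 700.0, "pairs_per_s": 60.0},
    "cuda": {"p95_ms": 120.0, "pairs_per_s": 600.0},
}


def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of the samples."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def check_slo(latencies_ms: list[float], candidates: int, device: str) -> dict:
    """Compare measured per-query latencies with the budget for one device.

    Assumes sequential queries, each scoring `candidates` query/passage pairs.
    """
    budget = SLO[device]
    total_s = sum(latencies_ms) / 1000.0
    if total_s <= 0:
        raise ValueError("total latency must be positive")
    measured_p95 = p95(latencies_ms)
    throughput = len(latencies_ms) * candidates / total_s
    return {
        "p95_ms": measured_p95,
        "pairs_per_s": throughput,
        "ok": measured_p95 < budget["p95_ms"]
        and throughput > budget["pairs_per_s"],
    }
```

With 32 queries, the nearest-rank p95 is the 31st smallest sample, so one slow outlier is tolerated but two are not. Treat a failing `ok` as the trigger for the ordered list of mitigations in the benchmark section.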

## Common one-liners

```bash
# count indexed chunks in OpenSearch
curl 'http://localhost:9200/legacy_chunks/_count' | jq .

# inspect Qdrant collection
curl 'http://localhost:6333/collections/legacy_chunks' | jq .

# list MinIO buckets
docker compose exec minio mc alias set local http://localhost:9000 \
  "$MINIO_ACCESS_KEY" "$MINIO_SECRET_KEY"
docker compose exec minio mc ls local

# how many docs reached INDEXING_COMPLETED
docker compose exec postgres psql -U legacyhub -d legacyhub -c \
  "SELECT status, COUNT(*) FROM documents GROUP BY status;"
```
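
The first two one-liners can be combined into a quick consistency check between the keyword and vector indexes. The sketch below assumes the standard response shapes (`count` in the OpenSearch `_count` body, `result.points_count` in the Qdrant collection-info body) and, as a labeled assumption about LegacyHUB itself, that each indexed chunk is stored as exactly one Qdrant point. All function names are hypothetical.

```python
import json


def opensearch_count(payload: str) -> int:
    """Read `count` from an OpenSearch `_count` response body."""
    return int(json.loads(payload)["count"])


def qdrant_points(payload: str) -> int:
    """Read `result.points_count` from a Qdrant collection-info body."""
    value = json.loads(payload)["result"].get("points_count")
    if value is None:
        raise ValueError("Qdrant did not report points_count")
    return int(value)


def index_drift(os_payload: str, qdrant_payload: str) -> int:
    """OpenSearch chunks minus Qdrant points; 0 means the two stores agree.

    Assumption: one Qdrant point per indexed chunk.
    """
    return opensearch_count(os_payload) - qdrant_points(qdrant_payload)
```

Feed it the raw bodies returned by the two `curl` commands. A non-zero result while ingestion is still running is expected; a non-zero result after the queue has drained suggests one index is behind and the affected documents may need re-indexing.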