Vadim Malanov 24282d1279 feat(api): optional API-key auth middleware
Adds defence-in-depth shared-secret auth that activates when API_KEY
is set. Behaviour:

- empty API_KEY (dev default): every request allowed, middleware is
  not even installed;
- non-empty API_KEY: every request under APP_API_PREFIX except
  /health must carry X-API-Key: <value> or
  Authorization: Bearer <value>. /, /docs, /redoc, /openapi.json and
  CORS preflight stay open. hmac.compare_digest is used for the
  constant-time comparison.

The middleware resolves settings lazily so test fixtures can reload
app.config and have the new API_KEY take effect on the next install.

Tests (tests/test_api_security.py, 5 cases):
- /health remains open;
- protected route rejects missing key (401);
- protected route accepts X-API-Key header;
- protected route accepts Authorization: Bearer header;
- protected route rejects a wrong key.

Frontend:
- VITE_API_KEY env reads the key and Axios injects it on every
  request, falling back to no header when empty so SSO/reverse-proxy
  deployments stay unchanged.
- vite-env.d.ts adds the new env entry.

Docs/ops:
- .env.example documents the dev-default empty key;
- .env.prod.example marks API_KEY as a required rotation point;
- docker-compose.yml forwards API_KEY (defaults to empty);
- docker-compose.prod.yml fails the stack with ?:required when API_KEY
  is missing;
- RUNBOOK gains an API authentication section with header examples
  and the reverse-proxy + key layering recommendation.

pytest -q: 33 passed (5 new security + 28 prior).
npx tsc --noEmit: clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 17:17:27 +03:00

# LegacyHUB — Operational Runbook
## Quick boot (dev)
```bash
cp .env.example .env
docker compose up -d --build
docker compose exec api python scripts/init_db.py
docker compose exec api python scripts/init_opensearch.py
docker compose exec api python scripts/init_qdrant.py
docker compose exec api python scripts/smoke_test.py
```
Verify:
```bash
curl -fsS http://localhost:8000/api/v1/health | jq .
```
Frontend dev:
```bash
cd frontend && cp .env.example .env && npm install && npm run dev
# http://localhost:5273
```
## Production deploy
The production overlay enables the OpenSearch security plugin, removes default ports,
forces externally supplied credentials, requires `API_KEY`, and disables debug routes.
```bash
# 1. Ensure secrets exist
cp .env.prod.example .env.prod
$EDITOR .env.prod # rotate every credential, never commit
# 2. Build + recreate
docker compose \
-f docker-compose.yml -f docker-compose.prod.yml \
--env-file .env.prod \
up -d --build --force-recreate api worker
# 3. Migrations
docker compose -f docker-compose.yml -f docker-compose.prod.yml \
--env-file .env.prod exec api python scripts/init_db.py
# 4. Health gate
docker compose -f docker-compose.yml -f docker-compose.prod.yml \
--env-file .env.prod exec api python scripts/smoke_test.py
curl -fsS https://<host>/api/v1/health | jq -e '.status == "ok"'
```
Hardening notes (mandatory for prod):
- Rotate every credential in `.env.prod` from `.env.prod.example` placeholders.
- Put OpenSearch behind TLS and an admin password. Remove
`DISABLE_SECURITY_PLUGIN=true` (the prod overlay handles this).
- Front the API with a reverse proxy that performs authentication and TLS
termination, and set `API_KEY` as a second layer (see API authentication).
- Restrict CORS via `CORS_ALLOWED_ORIGINS` (comma-separated) — never `*` in
prod.
- MinIO root key/secret in prod must come from a secret store, not the repo.
- Mount `data/input` and `data/work` from durable storage, not the workstation.
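`CORS_ALLOWED_ORIGINS` is a comma-separated list. A minimal sketch of parsing it and refusing a wildcard in production; `parse_cors_origins` and its `production` flag are hypothetical, not LegacyHUB's actual settings code:

```python
def parse_cors_origins(raw: str, production: bool) -> list[str]:
    """Hypothetical helper: split CORS_ALLOWED_ORIGINS and reject '*' in prod."""
    # Split on commas, trim whitespace, and drop empty entries (e.g. a trailing comma).
    origins = [item.strip() for item in raw.split(",") if item.strip()]
    if production and "*" in origins:
        raise ValueError("CORS_ALLOWED_ORIGINS must not contain '*' in production")
    return origins
```

For example, `parse_cors_origins("https://a.example, https://b.example", production=True)` yields the two origins with whitespace removed, while `"*"` raises at startup instead of silently opening the API to every site.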
## Ingestion
```bash
# trigger from the API (add -H "X-API-Key: $API_KEY" when API_KEY is set)
curl -X POST http://localhost:8000/api/v1/ingest/folder \
-H "Content-Type: application/json" \
-d '{"path":"/data/input","recursive":true,"force":false}'
# or inline (no Celery)
docker compose exec api python scripts/ingest_folder.py \
--path /data/input --recursive --mode inline
# re-index a single doc
docker compose exec api python scripts/reindex_document.py \
--document-id <uuid>
```
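The same ingest call can be scripted with the standard library. A minimal sketch, assuming only the endpoint and JSON body from the curl example; `build_ingest_request` is a hypothetical helper, and it adds `X-API-Key` only when a key is supplied, mirroring the empty-key dev default:

```python
import json
import urllib.request


def build_ingest_request(
    base_url: str,
    path: str,
    recursive: bool = True,
    force: bool = False,
    api_key: str = "",
) -> urllib.request.Request:
    """Hypothetical helper: build POST {base_url}/ingest/folder without sending it."""
    # Same JSON body as the curl example.
    body = json.dumps({"path": path, "recursive": recursive, "force": force}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    # Only attach the shared secret when one is configured.
    if api_key:
        headers["X-API-Key"] = api_key
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/ingest/folder",
        data=body,
        headers=headers,
        method="POST",
    )
```

Send it with `urllib.request.urlopen(req, timeout=30)`, passing `api_key=os.environ.get("API_KEY", "")` when the stack runs with a key.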
## Failure handling
Each stage emits a row to `processing_events` with `level` and `data`. Inspect:
```bash
docker compose exec postgres psql -U legacyhub -d legacyhub -c \
"SELECT created_at, stage, level, message FROM processing_events
ORDER BY created_at DESC LIMIT 50;"
```
| Failure | Where to look | Fix |
|----------------------|-----------------------------------------------------|----------------------------------|
| `OCR_FAILED` | `processing_events` → `OCR_STARTED`, then the error | Confirm `tesseract-ocr-rus` package; rerun `scripts/reindex_document.py` |
| `EXTRACTION_FAILED` | `processing_events` → Docling stage | Check timeout; verify Docling version pin |
| Indexing stuck | OpenSearch + Qdrant health | `scripts/init_opensearch.py`, `scripts/init_qdrant.py` |
| Reranker disabled | API logs → `reranker.disabled` | Ensure `RERANKER_ENABLED=true`; HF cache mounted |
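To see which stage fails most often, `processing_events` can be aggregated by `stage`. The sketch runs the query against an in-memory SQLite stand-in so it is self-contained; the `'ERROR'` level value, the stage names and the sample rows are assumptions. The SQL is plain enough to paste into the `psql -c` call above:

```python
import sqlite3

# In-memory stand-in for processing_events; only the columns used by the
# inspection query are modelled, and the rows are made-up examples.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processing_events (created_at TEXT, stage TEXT, level TEXT, message TEXT)"
)
conn.executemany(
    "INSERT INTO processing_events VALUES (?, ?, ?, ?)",
    [
        ("2026-05-13T10:00:00", "OCR_STARTED", "INFO", "start"),
        ("2026-05-13T10:00:05", "OCR_STARTED", "ERROR", "tesseract failed"),
        ("2026-05-13T10:01:00", "EXTRACTION", "ERROR", "docling timeout"),
        ("2026-05-13T10:02:00", "EXTRACTION", "ERROR", "docling timeout"),
    ],
)

# Error count per stage, worst first; ties ordered by stage name.
ERRORS_BY_STAGE = """
SELECT stage, COUNT(*) AS errors
FROM processing_events
WHERE level = 'ERROR'
GROUP BY stage
ORDER BY errors DESC, stage
"""
rows = conn.execute(ERRORS_BY_STAGE).fetchall()
print(rows)
```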
## API authentication
Two mechanisms layered together:
1. **Reverse proxy / SSO** (preferred). Front the API with nginx, Traefik, or
an OAuth gateway. The reverse proxy terminates TLS and authenticates the
caller; LegacyHUB never sees a raw user identity.
2. **Shared-secret API key** (defence in depth). Set `API_KEY` to a long
   random value (`openssl rand -hex 32`). Every request under `APP_API_PREFIX`
   except `/health` must then carry either:
   ```http
   X-API-Key: <key>
   ```
   or:
   ```http
   Authorization: Bearer <key>
   ```
   `/health` is intentionally exempt so external probes do not need the
   secret. `/`, `/docs`, `/redoc`, `/openapi.json` and CORS preflight
   (`OPTIONS`) requests also stay open. The presented key is checked with a
   constant-time comparison (`hmac.compare_digest`).
   In production the key is required (`docker-compose.prod.yml` fails the
   stack if `API_KEY` is empty). In development it is optional, and the
   default empty value disables the middleware entirely.
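The frontend reads `VITE_API_KEY` and injects the header on every Axios
request; when the variable is empty, no header is sent. `VITE_*` variables are
embedded in the built JavaScript bundle, so a key set here is readable by anyone
who can load the frontend. For SSO deployments leave `VITE_API_KEY` empty and let
the reverse proxy inject the header server-side.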
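The behaviour covered by `tests/test_api_security.py` can be summarised as a pure decision function. This is a hypothetical sketch, not the middleware's actual code: the function name is illustrative, and `/api/v1` as the value of `APP_API_PREFIX` is inferred from the URLs in this runbook.

```python
import hmac


def is_request_allowed(
    method: str,
    path: str,
    headers: dict[str, str],
    api_key: str,
    api_prefix: str = "/api/v1",  # assumed value of APP_API_PREFIX
) -> bool:
    """Hypothetical sketch: return True if the request may proceed."""
    if not api_key:
        return True  # empty API_KEY: auth disabled, the dev default
    if method.upper() == "OPTIONS":
        return True  # CORS preflight never carries the key
    under_prefix = path == api_prefix or path.startswith(api_prefix + "/")
    if not under_prefix:
        return True  # "/", "/docs", "/redoc", "/openapi.json" etc. stay open
    if path == api_prefix + "/health":
        return True  # probes must not need the secret
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    presented = lowered.get("x-api-key", "")
    authorization = lowered.get("authorization", "")
    if not presented and authorization.startswith("Bearer "):
        presented = authorization[len("Bearer "):]
    # Constant-time comparison so response timing does not leak the key.
    return hmac.compare_digest(presented.encode("utf-8"), api_key.encode("utf-8"))
```

Comparing UTF-8 bytes rather than `str` keeps `hmac.compare_digest` usable even if a key ever contains non-ASCII characters.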
## Verification gates (per change)
1. `python -m pytest tests/ -q`: full unit suite (33+ tests).
2. `python -m compileall -q app scripts tests`.
3. `docker compose config --quiet`.
4. Frontend: `npx tsc --noEmit && npm run build`.
5. `/api/v1/health` returns `{"status":"ok"}`.
6. One smoke ingest of a known PDF; verify `/search` returns a result.
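Gates 1 to 4 are plain commands, so they can be chained with an early exit on the first failure. A hypothetical sketch: the runner, the gate names, and running the frontend commands from `frontend/` are assumptions; gates 5 and 6 need a running stack and stay manual.

```python
import subprocess
import sys
from typing import Optional, Sequence

# (name, command, working directory); commands assumed to run from the repo root.
GATES: list[tuple[str, Sequence[str], Optional[str]]] = [
    ("pytest", [sys.executable, "-m", "pytest", "tests/", "-q"], None),
    ("compileall", [sys.executable, "-m", "compileall", "-q", "app", "scripts", "tests"], None),
    ("compose-config", ["docker", "compose", "config", "--quiet"], None),
    ("tsc", ["npx", "tsc", "--noEmit"], "frontend"),
    ("frontend-build", ["npm", "run", "build"], "frontend"),
]


def run_gates(gates) -> Optional[str]:
    """Run gates in order; return the name of the first failing gate, or None."""
    for name, cmd, cwd in gates:
        try:
            result = subprocess.run(cmd, cwd=cwd)
        except FileNotFoundError:
            return name  # missing tool or directory counts as a failure
        if result.returncode != 0:
            return name
    return None
```

Typical use from the repo root: `failed = run_gates(GATES)`, then stop the change if `failed` is not `None`.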
## Rollback
1. Record the deployed commit SHA before each deploy (`git rev-parse HEAD`).
2. To roll back the API/worker image only, check out the recorded SHA and
rebuild just those two services:
```bash
git checkout <previous-sha>
docker compose -f docker-compose.yml -f docker-compose.prod.yml \
--env-file .env.prod up -d --build --force-recreate --no-deps api worker
# --no-deps keeps PG/MinIO/OS/Qdrant intact
```
3. Data services (PostgreSQL, MinIO, OpenSearch, Qdrant) are stateful and
should not be rolled back casually. Restore from backup via the standard
TeamHUB Suite backup runbook.
## Reranker benchmark
The reranker is the latency-defining stage of the hybrid search path. Run the
benchmark on every hardware change (CPU vs GPU, instance type, batch size)
before promoting the configuration.
```bash
# synthetic warmup + 32 queries x 40 candidates, ~700-char passages
docker compose exec api python scripts/benchmark_reranker.py \
--queries 32 --candidates 40 --warmup 4
# real corpus sample (after some documents are indexed)
docker compose exec api python scripts/benchmark_reranker.py \
--source opensearch --query "ГОСТ 21.501-93" --candidates 40
```
Target SLOs (subject to revision once staging numbers land):
| Metric | CPU target | GPU target |
|---------------------|-----------:|-----------:|
| p95 latency / query | < 700 ms | < 120 ms |
| Throughput | > 60 pairs/s | > 600 pairs/s |
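`benchmark_reranker.py` reports its own numbers. To check an arbitrary list of per-query latencies against this table, a hypothetical sketch that uses the nearest-rank p95 and treats throughput as scored pairs divided by total sequential query time (both definitions are assumptions, not necessarily what the script uses):

```python
# Targets from the SLO table: p95 latency in ms, throughput in pairs/s.
SLO = {
    "cpu": {"p95_ms": 700.0, "pairs_per_s": 60.0},
    "gpu": {"p95_ms": 120.0, "pairs_per_s": 600.0},
}


def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank 95th percentile (integer arithmetic avoids float rounding)."""
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def check_reranker(latencies_ms: list[float], candidates: int, device: str) -> dict:
    # Each query scores `candidates` (query, passage) pairs, run back to back.
    pairs = candidates * len(latencies_ms)
    elapsed_s = sum(latencies_ms) / 1000.0
    result = {"p95_ms": p95(latencies_ms), "pairs_per_s": pairs / elapsed_s}
    target = SLO[device]
    result["ok"] = (
        result["p95_ms"] < target["p95_ms"] and result["pairs_per_s"] > target["pairs_per_s"]
    )
    return result
```

For 20 queries at 40 candidates where 19 take 100 ms and one takes 1000 ms, p95 is 100 ms and throughput is 800 pairs / 2.9 s ≈ 276 pairs/s: inside the CPU budget, outside the GPU one.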
If the measured p95 exceeds the budget, options in order of preference:
1. Lower `RERANK_CANDIDATES` (default 40; reducing it to 20 roughly halves the work, since the cross-encoder scores one pair per candidate).
2. Increase `RERANKER_BATCH_SIZE` (memory permitting).
3. Switch `RERANKER_DEVICE=cuda` and use a GPU-capable image.
4. Disable the reranker (`RERANKER_ENABLED=false`) and accept the raw RRF order. The
API still returns useful results, and the `reranked` field reports whether reranking was applied.
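Option 4 falls back to the fused order computed before reranking. Assuming LegacyHUB uses standard Reciprocal Rank Fusion over its lexical (OpenSearch) and vector (Qdrant) result lists with the commonly used constant k = 60 (both assumptions), the fallback order is computed like this:

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Documents that appear in both lists (here `b` and `c` for lexical `["a", "b", "c"]` and vector `["b", "c", "d"]`) rise to the top, which is why the unreranked order is still useful.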
Passages are clipped to 2048 chars before being fed to the cross-encoder so a
runaway chunk cannot starve the budget.
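A hypothetical sketch of how candidate trimming (`RERANK_CANDIDATES`) and the 2048-char clip combine before the cross-encoder; the helper name is illustrative and the real code may differ:

```python
MAX_PASSAGE_CHARS = 2048  # clip length stated in this runbook


def build_rerank_pairs(
    query: str, passages: list[str], rerank_candidates: int = 40
) -> list[tuple[str, str]]:
    # Keep only the top-N fused candidates, then clip each passage so one
    # oversized chunk cannot dominate cross-encoder time.
    return [(query, passage[:MAX_PASSAGE_CHARS]) for passage in passages[:rerank_candidates]]
```

The number of pairs, and therefore the rerank cost, grows linearly with `rerank_candidates`, while the clip bounds the cost of each individual pair.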
## Load testing
Two complementary harnesses live under `scripts/`:
### Ingest load
```bash
# Generate synthetic PDFs (~3 KB each, real PDF/1.4 with embedded text)
docker compose exec api python scripts/generate_synthetic_pdfs.py \
--count 10000 --out /data/input/load
# Trigger ingest, sample status every 10 s, dump JSON history
docker compose exec api python scripts/load_ingest.py \
--path /data/input/load \
--api-url http://localhost:8000/api/v1 \
--watch-seconds 1800 \
--report-file /data/work/load_report.json
```
Target SLOs at the 70k-document scale (subject to refinement once measured):
| Metric | CPU target | GPU target |
|---------------------------------|-----------:|-----------:|
| Sustained throughput (docs/min) | > 30 | > 200 |
| Failure rate | < 1 % | < 0.5 % |
| p95 per-document wall time | < 90 s | < 25 s |
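`load_ingest.py` samples status every 10 s, so the throughput and failure-rate rows can be derived from the first and last samples. The `Sample` shape below is hypothetical (the real `load_report.json` layout is not documented here), throughput counts only successfully completed documents, and p95 wall time needs per-document timings that this sketch does not model:

```python
from dataclasses import dataclass

# Targets from the table: (minimum docs/min, maximum failure rate).
SLO = {"cpu": (30.0, 0.01), "gpu": (200.0, 0.005)}


@dataclass
class Sample:
    elapsed_s: float  # seconds since the run started
    completed: int    # documents indexed so far
    failed: int       # documents failed so far


def summarize(samples: list[Sample]) -> dict:
    first, last = samples[0], samples[-1]
    minutes = (last.elapsed_s - first.elapsed_s) / 60.0
    done = last.completed - first.completed
    failed = last.failed - first.failed
    finished = done + failed
    return {
        "docs_per_min": done / minutes if minutes > 0 else 0.0,
        "failure_rate": failed / finished if finished else 0.0,
    }


def meets_slo(summary: dict, device: str) -> bool:
    min_rate, max_failures = SLO[device]
    return summary["docs_per_min"] > min_rate and summary["failure_rate"] < max_failures
```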
### Search load
```bash
pip install locust # one-time
locust -f scripts/locustfile_search.py \
--host http://localhost:8000 \
--headless --users 100 --spawn-rate 10 --run-time 10m \
--html load_search.html
```
Target SLOs for hybrid mode with the reranker enabled:
| Percentile | CPU | GPU |
|------------|-------:|------:|
| p50 | 600 ms | 120 ms |
| p95 | 1500 ms | 300 ms |
| p99 | 3500 ms | 700 ms |
If staging numbers miss the budget, walk the reranker remediation ladder above
before chasing index sharding.
## Scaling notes (~70k PDFs)
- Workers scale horizontally: `docker compose up -d --scale worker=8`.
- Set `EMBEDDING_DEVICE=cuda` on a GPU-capable worker image for ~10× embedding
throughput.
- A single OpenSearch shard suffices up to ~10M chunks; in prod, increase the
shard count and add replicas.
- A single-node Qdrant is fine up to ~5M vectors; switch to a cluster deployment beyond that.
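The thresholds above translate into a quick capacity check. The chunks-per-document figure is an assumption you must measure on your own corpus (for example with the `_count` one-liner after a sample ingest), and the sketch assumes one Qdrant vector per chunk:

```python
def capacity_check(docs: int, chunks_per_doc: float, docs_per_min: float) -> dict:
    """Hypothetical sizing helper based on the scaling notes."""
    chunks = docs * chunks_per_doc  # assumed: one Qdrant vector per chunk
    return {
        "chunks": chunks,
        "opensearch_single_shard_ok": chunks <= 10_000_000,
        "qdrant_single_node_ok": chunks <= 5_000_000,
        "ingest_hours": docs / docs_per_min / 60.0,
    }
```

With 70 000 documents, an illustrative 50 chunks per document and the CPU ingest target of 30 docs/min, this gives 3.5M chunks (within both single-node limits) and about 39 hours of ingest; at 100 chunks per document Qdrant would cross its single-node threshold.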
## Common one-liners
```bash
# count indexed chunks in OpenSearch
curl 'http://localhost:9200/legacy_chunks/_count' | jq .
# inspect Qdrant collection
curl 'http://localhost:6333/collections/legacy_chunks' | jq .
# list MinIO buckets
docker compose exec minio mc alias set local http://localhost:9000 \
"$MINIO_ACCESS_KEY" "$MINIO_SECRET_KEY"
docker compose exec minio mc ls local
# how many docs reached INDEXING_COMPLETED
docker compose exec postgres psql -U legacyhub -d legacyhub -c \
"SELECT status, COUNT(*) FROM documents GROUP BY status;"
```
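The two count one-liners can be compared to detect partial indexing (the "Indexing stuck" row in the failure table). A minimal sketch, assuming every chunk is written once to OpenSearch and once to Qdrant; it reads the `count` field of the OpenSearch `_count` response and `result.points_count` from Qdrant's collection info:

```python
import json


def index_drift(opensearch_count_json: str, qdrant_collection_json: str) -> int:
    """OpenSearch chunk count minus Qdrant point count (0 means in sync)."""
    os_count = json.loads(opensearch_count_json)["count"]
    qd_count = json.loads(qdrant_collection_json)["result"]["points_count"]
    return os_count - qd_count
```

A positive value means chunks reached OpenSearch but not Qdrant; a persistent non-zero drift is a cue to check both services' health before re-running the init scripts.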