
# LegacyHUB — Operational Runbook

## Quick boot (dev)

```bash
cp .env.example .env
docker compose up -d --build
docker compose exec api python scripts/init_db.py
docker compose exec api python scripts/init_opensearch.py
docker compose exec api python scripts/init_qdrant.py
docker compose exec api python scripts/smoke_test.py
```

Verify:

```bash
curl -fsS http://localhost:8000/api/v1/health | jq .
```
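For scripted boots, the same check can be polled until the stack settles. The sketch below uses only the standard library and is not part of the repository: `wait_for_health`, `is_healthy`, and the `attempts`/`delay` parameters are hypothetical names, and it assumes the endpoint returns a JSON object whose `status` is `"ok"`, as in the health gates elsewhere in this runbook.

```python
import json
import time
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:8000/api/v1/health"


def is_healthy(body: str) -> bool:
    """Return True only if the body is a JSON object with status == "ok"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def fetch(url: str, timeout: float = 5.0) -> str:
    """GET the URL and return the body; network and HTTP errors raise OSError."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def wait_for_health(
    url: str = HEALTH_URL,
    attempts: int = 30,
    delay: float = 2.0,
    get: Callable[[str], str] = fetch,
) -> bool:
    """Poll until the endpoint reports healthy or the attempts run out."""
    for attempt in range(attempts):
        try:
            if is_healthy(get(url)):
                return True
        except OSError:
            pass  # service still starting; retry
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_for_health()` from a boot script and exiting non-zero on `False` gives the same gate as the `curl` command, with retries.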

Frontend dev:

```bash
cd frontend && cp .env.example .env && npm install && npm run dev
# http://localhost:5273
```

## Production deploy

The production overlay enables the OpenSearch security plugin, removes default
ports, requires externally supplied credentials, and disables debug routes.

```bash
# 1. Ensure secrets exist
cp .env.prod.example .env.prod
$EDITOR .env.prod          # rotate every credential, never commit

# 2. Build + recreate
docker compose \
  -f docker-compose.yml -f docker-compose.prod.yml \
  --env-file .env.prod \
  up -d --build --force-recreate api worker

# 3. Migrations
docker compose -f docker-compose.yml -f docker-compose.prod.yml \
  --env-file .env.prod exec api python scripts/init_db.py

# 4. Health gate
docker compose -f docker-compose.yml -f docker-compose.prod.yml \
  --env-file .env.prod exec api python scripts/smoke_test.py
curl -fsS https://<host>/api/v1/health | jq -e '.status == "ok"'
```

Hardening notes (mandatory for prod):

- Replace every placeholder credential that `.env.prod` inherited from
  `.env.prod.example`.
- Put OpenSearch behind TLS and an admin password. `DISABLE_SECURITY_PLUGIN=true`
  must not be set (the overlay removes it).
- Front the API with a reverse proxy that performs auth + TLS termination.
- Restrict CORS via `CORS_ALLOWED_ORIGINS` (comma-separated) — never `*` in
  prod.
- MinIO root key/secret in prod must come from a secret store, not the repo.
- Mount `data/input` and `data/work` from durable storage, not the workstation.

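Several of the notes above can be checked mechanically before a deploy. The following is a minimal sketch, not a script that ships with the repository: `parse_env`, `preflight`, and the `PLACEHOLDER_MARKERS` list are hypothetical, and the placeholder strings are an assumption about what `.env.prod.example` contains. Only the `API_KEY`, `CORS_ALLOWED_ORIGINS`, and `DISABLE_SECURITY_PLUGIN` rules come from this runbook.

```python
from typing import Dict, List

# Hypothetical markers; adjust to whatever .env.prod.example actually uses.
PLACEHOLDER_MARKERS = ("changeme", "change-me", "placeholder")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def preflight(env: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; empty means the checks passed."""
    problems: List[str] = []
    if not env.get("API_KEY"):
        problems.append("API_KEY is empty")
    origins = [o.strip() for o in env.get("CORS_ALLOWED_ORIGINS", "").split(",")]
    if "*" in origins:
        problems.append("CORS_ALLOWED_ORIGINS contains '*'")
    if env.get("DISABLE_SECURITY_PLUGIN", "").lower() == "true":
        problems.append("DISABLE_SECURITY_PLUGIN=true is set")
    for key, value in env.items():
        if any(marker in value.lower() for marker in PLACEHOLDER_MARKERS):
            problems.append(f"{key} still looks like a placeholder")
    return problems
```

Feeding it the contents of `.env.prod` and refusing to deploy on a non-empty result keeps the checklist from depending on memory.
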
## Ingestion

```bash
# trigger from the API
curl -X POST http://localhost:8000/api/v1/ingest/folder \
  -H "Content-Type: application/json" \
  -d '{"path":"/data/input","recursive":true,"force":false}'

# or inline (no Celery)
docker compose exec api python scripts/ingest_folder.py \
  --path /data/input --recursive --mode inline

# re-index a single doc
docker compose exec api python scripts/reindex_document.py \
  --document-id <uuid>
```
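The API trigger can also be issued from Python, for instance from a scheduler. This is a sketch under stated assumptions: `build_ingest_request` and `trigger_ingest` are hypothetical helper names, the payload fields mirror the `curl` call above, and the response is assumed to be JSON of unspecified shape. The `X-API-Key` header is added only when a key is supplied.

```python
import json
import urllib.request
from typing import Optional

API_URL = "http://localhost:8000/api/v1"


def build_ingest_request(
    path: str,
    recursive: bool = True,
    force: bool = False,
    api_key: Optional[str] = None,
    api_url: str = API_URL,
) -> urllib.request.Request:
    """Build the POST request for /ingest/folder without sending it."""
    body = json.dumps(
        {"path": path, "recursive": recursive, "force": force}
    ).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["X-API-Key"] = api_key
    return urllib.request.Request(
        f"{api_url}/ingest/folder", data=body, headers=headers, method="POST"
    )


def trigger_ingest(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request; assumes the API answers with a JSON object."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without a running stack.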

## Failure handling

Each stage writes a row to `processing_events` with a `level` and a `data`
payload. To inspect the most recent events:

```bash
docker compose exec postgres psql -U legacyhub -d legacyhub -c \
  "SELECT created_at, stage, level, message FROM processing_events
   ORDER BY created_at DESC LIMIT 50;"
```

| Failure              | Where to look                                       | Fix                              |
|----------------------|-----------------------------------------------------|----------------------------------|
| `OCR_FAILED`         | `processing_events` → `OCR_STARTED` then error      | Confirm `tesseract-ocr-rus` package; rerun `scripts/reindex_document.py` |
| `EXTRACTION_FAILED`  | `processing_events` → Docling stage                 | Check timeout; verify Docling version pin |
| Indexing stuck       | OpenSearch + Qdrant health                          | `scripts/init_opensearch.py`, `scripts/init_qdrant.py` |
| Reranker disabled    | API logs → `reranker.disabled`                      | Ensure `RERANKER_ENABLED=true`; HF cache mounted |

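When many events scroll past, a per-stage tally helps show where failures cluster. A minimal sketch, assuming rows arrive as `(created_at, stage, level, message)` tuples in the column order of the query above (for example from a database cursor); `summarize` and `format_summary` are hypothetical names, and no particular set of `level` values is assumed.

```python
from collections import Counter
from typing import Iterable, Tuple

# (created_at, stage, level, message), matching the SELECT column order.
Row = Tuple[str, str, str, str]


def summarize(rows: Iterable[Row]) -> Counter:
    """Count events per (stage, level) pair."""
    return Counter((stage, level) for _created_at, stage, level, _message in rows)


def format_summary(counts: Counter) -> str:
    """Render the tally as aligned text, most frequent pair first."""
    return "\n".join(
        f"{stage:<24} {level:<8} {count}"
        for (stage, level), count in counts.most_common()
    )
```
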
## API authentication

Two mechanisms are layered together:

1. **Reverse proxy / SSO** (preferred). Front the API with nginx, Traefik, or
   an OAuth gateway. The reverse proxy terminates TLS and authenticates the
   caller; LegacyHUB never sees a raw user identity.
2. **Shared-secret API key** (defence in depth). Set `API_KEY` to a long
   random value (`openssl rand -hex 32`). Every request to `APP_API_PREFIX`
   except `/health` must then carry either:

   ```http
   X-API-Key: <key>
   ```
   or:
   ```http
   Authorization: Bearer <key>
   ```

   `/health` is intentionally exempt so external probes do not need the
   secret.

   In production this is required (`docker-compose.prod.yml` fails the
   stack if `API_KEY` is empty). In development the key is optional and
   the default empty value disables the middleware entirely.

   The frontend reads `VITE_API_KEY` and injects the header on every Axios
   request. For SSO deployments leave `VITE_API_KEY` empty and let the
   reverse proxy inject the header server-side.

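The shared-secret rules above can be summarised as a single decision function. This is an illustrative sketch, not the project's middleware: `is_authorized` is a hypothetical name, `/api/v1` is assumed as the value of `APP_API_PREFIX`, and treating paths outside the prefix as unprotected is an assumption. The constant-time comparison via `hmac.compare_digest` is a common precaution rather than something the source confirms.

```python
import hmac
from typing import Mapping


def is_authorized(
    path: str,
    headers: Mapping[str, str],
    api_key: str,
    api_prefix: str = "/api/v1",  # assumed value of APP_API_PREFIX
) -> bool:
    """Decide whether a request may proceed under the shared-secret scheme."""
    if not api_key:
        return True  # empty key disables the check (development default)
    if not path.startswith(api_prefix):
        return True  # assumption: only routes under the prefix are protected
    if path == f"{api_prefix}/health":
        return True  # probes do not need the secret

    lowered = {name.lower(): value for name, value in headers.items()}
    presented = lowered.get("x-api-key", "")
    if not presented:
        scheme, _, token = lowered.get("authorization", "").partition(" ")
        if scheme.lower() == "bearer":
            presented = token.strip()
    # Compare as bytes in constant time to avoid leaking key prefixes.
    return hmac.compare_digest(presented.encode("utf-8"), api_key.encode("utf-8"))
```
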
## Verification gates (per change)

1. `python -m pytest tests/ -q` — full unit suite (19+ tests).
2. `python -m compileall -q app scripts tests`.
3. `docker compose config --quiet`.
4. Frontend: `npx tsc --noEmit && npm run build`.
5. `/api/v1/health` returns `{"status":"ok"}`.
6. One smoke ingest of a known PDF; verify `/search` returns a result.

## Rollback

1. Before each deploy, record the commit SHA that is currently deployed
   (`git rev-parse HEAD`).
2. To roll back the API/worker image only, check out that SHA and rebuild:
   ```bash
   git checkout <previous-sha>   # the SHA recorded in step 1
   # --no-deps keeps PG/MinIO/OS/Qdrant intact
   docker compose -f docker-compose.yml -f docker-compose.prod.yml \
     --env-file .env.prod up -d --build --force-recreate --no-deps api worker
   ```
3. Data services (PostgreSQL, MinIO, OpenSearch, Qdrant) are stateful and
   should not be rolled back casually. Restore from backup via the standard
   TeamHUB Suite backup runbook.

## Reranker benchmark

The reranker is the latency-defining stage of the hybrid search path. Run the
benchmark on every hardware change (CPU vs GPU, instance type, batch size)
before promoting the configuration.

```bash
# synthetic warmup + 32 queries x 40 candidates, ~700-char passages
docker compose exec api python scripts/benchmark_reranker.py \
  --queries 32 --candidates 40 --warmup 4

# real corpus sample (after some documents are indexed)
docker compose exec api python scripts/benchmark_reranker.py \
  --source opensearch --query "ГОСТ 21.501-93" --candidates 40
```

Target SLOs (subject to revision once staging numbers land):

| Metric              | CPU target | GPU target |
|---------------------|-----------:|-----------:|
| p95 latency / query |  < 700 ms  |  < 120 ms  |
| Throughput          | > 60 pairs/s | > 600 pairs/s |

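To compare raw per-query timings against the table, the two metrics can be computed as below. A minimal sketch with hypothetical names (`percentile`, `pairs_per_second`, `meets_slo`): it uses the nearest-rank percentile method and assumes queries were timed one after another, which may differ from how `scripts/benchmark_reranker.py` reports its numbers.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def pairs_per_second(latencies_ms: Sequence[float], candidates: int) -> float:
    """Scored (query, passage) pairs per second, assuming sequential queries."""
    total_seconds = sum(latencies_ms) / 1000
    return len(latencies_ms) * candidates / total_seconds


def meets_slo(
    latencies_ms: Sequence[float],
    candidates: int,
    p95_budget_ms: float,
    min_pairs_per_s: float,
) -> bool:
    """True when both the p95 latency and the throughput targets are met."""
    return (
        percentile(latencies_ms, 95) < p95_budget_ms
        and pairs_per_second(latencies_ms, candidates) > min_pairs_per_s
    )


# Example: meets_slo(latencies, candidates=40, p95_budget_ms=700, min_pairs_per_s=60)
```
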
If the measured p95 exceeds the budget, options in order of preference:

1. Lower `RERANK_CANDIDATES` (default 40 — reducing to 20 roughly halves work).
2. Increase `RERANKER_BATCH_SIZE` (memory permitting).
3. Switch `RERANKER_DEVICE=cuda` and use a GPU-capable image.
4. Disable reranker (`RERANKER_ENABLED=false`) and accept raw RRF order — the
   API still returns useful results; the `reranked` field reports the truth.

Passages are clipped to 2048 characters before being fed to the cross-encoder,
so a single oversized chunk cannot consume the latency budget.

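The clipping step is small enough to show in full. A minimal sketch, assuming plain character-count truncation (the only detail stated here is the 2048-character limit; whether the real code trims on word boundaries is not specified); `clip_passage` and `build_pairs` are hypothetical names.

```python
from typing import List, Sequence, Tuple

MAX_PASSAGE_CHARS = 2048


def clip_passage(text: str, limit: int = MAX_PASSAGE_CHARS) -> str:
    """Truncate a passage to at most `limit` characters."""
    return text if len(text) <= limit else text[:limit]


def build_pairs(
    query: str, passages: Sequence[str], limit: int = MAX_PASSAGE_CHARS
) -> List[Tuple[str, str]]:
    """Pair the query with each clipped passage, ready for a cross-encoder."""
    return [(query, clip_passage(passage, limit)) for passage in passages]
```

Because the limit counts characters rather than bytes, Cyrillic passages are clipped at the same length as Latin ones.
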
## Load testing

Two complementary harnesses live under `scripts/`:

### Ingest load

```bash
# Generate synthetic PDFs (~3 KB each, real PDF/1.4 with embedded text)
docker compose exec api python scripts/generate_synthetic_pdfs.py \
  --count 10000 --out /data/input/load

# Trigger ingest, sample status every 10 s, dump JSON history
docker compose exec api python scripts/load_ingest.py \
  --path /data/input/load \
  --api-url http://localhost:8000/api/v1 \
  --watch-seconds 1800 \
  --report-file /data/work/load_report.json
```

Target SLOs at the 70k-document scale (subject to refinement once measured):

| Metric                          | CPU target | GPU target |
|---------------------------------|-----------:|-----------:|
| Sustained throughput (docs/min) | > 30       | > 200      |
| Failure rate                    | < 1 %      | < 0.5 %    |
| p95 per-document wall time      | < 90 s     | < 25 s     |

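The throughput floors translate directly into wall-clock time for a full run: at 30 docs/min, 70,000 documents take roughly 38.9 hours, and at 200 docs/min roughly 5.8 hours. The failure-rate ceilings correspond to fewer than 700 (CPU) and fewer than 350 (GPU) failed documents. The helpers below are a hypothetical sketch of that arithmetic, not part of `scripts/`.

```python
def ingest_eta_hours(documents: int, docs_per_minute: float) -> float:
    """Hours needed to ingest `documents` at a sustained rate."""
    if docs_per_minute <= 0:
        raise ValueError("docs_per_minute must be positive")
    return documents / docs_per_minute / 60


def max_failures(documents: int, failure_rate_percent: float) -> int:
    """Number of failed documents that corresponds to the given rate ceiling."""
    return int(documents * failure_rate_percent / 100)
```
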
### Search load

```bash
pip install locust  # one-time

locust -f scripts/locustfile_search.py \
       --host http://localhost:8000 \
       --headless --users 100 --spawn-rate 10 --run-time 10m \
       --html load_search.html
```

Target SLOs for hybrid mode with the reranker enabled:

| Percentile | CPU    | GPU   |
|------------|-------:|------:|
| p50        | 600 ms | 120 ms |
| p95        | 1500 ms | 300 ms |
| p99        | 3500 ms | 700 ms |

If staging numbers miss the budget, walk the reranker remediation ladder above
before chasing index sharding.

## Scaling notes (~70k PDFs)

- Workers scale horizontally: `docker compose up -d --scale worker=8`.
- Set `EMBEDDING_DEVICE=cuda` on a GPU-capable worker image for ~10× embedding
  throughput.
- A single OpenSearch shard suffices up to ~10M chunks; in prod, increase the
  shard count and add replicas.
- A single Qdrant node is fine up to ~5M vectors; switch to a cluster setup
  beyond that.

## Common one-liners

```bash
# count indexed chunks in OpenSearch
curl 'http://localhost:9200/legacy_chunks/_count' | jq .

# inspect Qdrant collection
curl 'http://localhost:6333/collections/legacy_chunks' | jq .

# list MinIO buckets ($MINIO_* are expanded by the host shell, so export them first)
docker compose exec minio mc alias set local http://localhost:9000 \
  "$MINIO_ACCESS_KEY" "$MINIO_SECRET_KEY"
docker compose exec minio mc ls local

# document counts per status (look for the INDEXING_COMPLETED row)
docker compose exec postgres psql -U legacyhub -d legacyhub -c \
  "SELECT status, COUNT(*) FROM documents GROUP BY status;"
```