LegacyHUB/RUNBOOK.md
Vadim Malanov 7f72171572 chore: bootstrap repository with governance docs
Initialize git, add Apache-2.0 LICENSE, .gitattributes (LF line
endings), AGENTS.md (entry points, stack, discovery order, baseline
checks), RUNBOOK.md (dev boot, prod deploy with overlay, ingestion,
failures, rollback, scaling notes), .env.prod.example with rotated
credential placeholders, and dev-only warnings on .env.example.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 16:41:50 +03:00


LegacyHUB — Operational Runbook

Quick boot (dev)

cp .env.example .env
docker compose up -d --build
docker compose exec api python scripts/init_db.py
docker compose exec api python scripts/init_opensearch.py
docker compose exec api python scripts/init_qdrant.py
docker compose exec api python scripts/smoke_test.py

Verify:

curl -fsS http://localhost:8000/api/v1/health | jq .

Frontend dev:

cd frontend && cp .env.example .env && npm install && npm run dev
# http://localhost:5273

Production deploy

The production overlay enables the OpenSearch security plugin, removes default ports, forces externally supplied credentials, and disables debug routes.

# 1. Ensure secrets exist
cp .env.prod.example .env.prod
$EDITOR .env.prod          # rotate every credential, never commit

# 2. Build + recreate
docker compose \
  -f docker-compose.yml -f docker-compose.prod.yml \
  --env-file .env.prod \
  up -d --build --force-recreate api worker

# 3. Migrations
docker compose -f docker-compose.yml -f docker-compose.prod.yml \
  --env-file .env.prod exec api python scripts/init_db.py

# 4. Health gate
docker compose -f docker-compose.yml -f docker-compose.prod.yml \
  --env-file .env.prod exec api python scripts/smoke_test.py
curl -fsS https://<host>/api/v1/health | jq -e '.status == "ok"'
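
The one-shot curl in step 4 can fail while containers are still starting. A minimal polling sketch, assuming only that the health endpoint returns JSON with a "status" field as shown above; the function names, attempt count, and delay are illustrative and not part of the repository:

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(url: str, timeout: float = 5.0) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_healthy(
    url: str,
    attempts: int = 10,          # assumed default, tune per deployment
    delay_s: float = 3.0,        # assumed default, tune per deployment
    fetch: Callable[[str], dict] = fetch_health,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True as soon as the endpoint reports status == "ok"."""
    for attempt in range(1, attempts + 1):
        try:
            payload = fetch(url)
            if isinstance(payload, dict) and payload.get("status") == "ok":
                return True
        except (OSError, ValueError):
            # Connection refused, timeout, HTTP error, or invalid JSON: retry.
            pass
        if attempt < attempts:
            sleep(delay_s)
    return False
```

A deploy script could call wait_until_healthy("https://<host>/api/v1/health") and abort the rollout when it returns False.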

Hardening notes (mandatory for prod):

  • Rotate every credential in .env.prod from .env.prod.example placeholders.
  • Put OpenSearch behind TLS and an admin password. Remove DISABLE_SECURITY_PLUGIN=true (the overlay handles this).
  • Front the API with a reverse proxy that performs auth + TLS termination.
  • Restrict CORS via CORS_ALLOWED_ORIGINS (comma-separated) — never * in prod.
  • MinIO root key/secret in prod must come from a secret store, not the repo.
  • Mount data/input and data/work from durable storage, not the workstation.
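
The CORS rule above can be enforced before the API starts. A sketch of such a check, assuming CORS_ALLOWED_ORIGINS holds a comma-separated string; parse_cors_origins is a hypothetical helper, and how the application actually reads the variable is not shown in this runbook:

```python
def parse_cors_origins(raw: str) -> list[str]:
    """Split a comma-separated origin list and reject unsafe values."""
    origins = [item.strip() for item in raw.split(",") if item.strip()]
    if not origins:
        raise ValueError("CORS_ALLOWED_ORIGINS is empty")
    if "*" in origins:
        # The runbook forbids the wildcard origin in production.
        raise ValueError("wildcard origin '*' is not allowed in production")
    return origins
```

Calling it with os.environ["CORS_ALLOWED_ORIGINS"] at startup would turn a misconfigured overlay into an immediate, visible failure.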

Ingestion

# trigger from the API
curl -X POST http://localhost:8000/api/v1/ingest/folder \
  -H "Content-Type: application/json" \
  -d '{"path":"/data/input","recursive":true,"force":false}'

# or inline (no Celery)
docker compose exec api python scripts/ingest_folder.py \
  --path /data/input --recursive --mode inline

# re-index a single doc
docker compose exec api python scripts/reindex_document.py \
  --document-id <uuid>
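
For scripted ingestion, the API trigger above can also be issued from Python. This sketch only builds the request with the same path, headers, and JSON fields as the curl example; build_ingest_request is a hypothetical helper, and the response format is not documented here, so none is assumed:

```python
import json
import urllib.request


def build_ingest_request(
    base_url: str,
    path: str,
    recursive: bool = True,
    force: bool = False,
) -> urllib.request.Request:
    """Build the POST request for /api/v1/ingest/folder."""
    body = json.dumps(
        {"path": path, "recursive": recursive, "force": force}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/ingest/folder",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is a matter of passing the result to urllib.request.urlopen, for example with build_ingest_request("http://localhost:8000", "/data/input").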

Failure handling

Each stage emits a row to processing_events with level and data. Inspect:

docker compose exec postgres psql -U legacyhub -d legacyhub -c \
  "SELECT created_at, stage, level, message FROM processing_events
   ORDER BY created_at DESC LIMIT 50;"

| Failure | Where to look | Fix |
|---|---|---|
| OCR_FAILED | processing_events → OCR_STARTED then error | Confirm tesseract-ocr-rus package; rerun scripts/reindex_document.py |
| EXTRACTION_FAILED | processing_events → Docling stage | Check timeout; verify Docling version pin |
| Indexing stuck | OpenSearch + Qdrant health | scripts/init_opensearch.py, scripts/init_qdrant.py |
| Reranker disabled | API logs → reranker.disabled | Ensure RERANKER_ENABLED=true; HF cache mounted |

Verification gates (per change)

  1. python -m pytest tests/ -q — full unit suite (19+ tests).
  2. python -m compileall -q app scripts tests.
  3. docker compose config --quiet.
  4. Frontend: npx tsc --noEmit && npm run build.
  5. /api/v1/health returns {"status":"ok"}.
  6. One smoke ingest of a known PDF; verify /search returns a result.
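
Gates 1 to 3 are plain commands, so they can be chained and stopped at the first failure. A sketch of that idea, assuming it runs from the repository root; the runner and its names are illustrative, and gates 4 to 6 are left out because they need the frontend directory or a running stack:

```python
import subprocess
from typing import Callable, Optional, Sequence, Tuple

Gate = Tuple[str, Sequence[str]]

# Commands copied from gates 1-3 above.
GATES: Sequence[Gate] = (
    ("unit tests", ["python", "-m", "pytest", "tests/", "-q"]),
    ("compile check", ["python", "-m", "compileall", "-q", "app", "scripts", "tests"]),
    ("compose config", ["docker", "compose", "config", "--quiet"]),
)


def run_command(cmd: Sequence[str]) -> int:
    """Run one gate command and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def run_gates(
    gates: Sequence[Gate] = GATES,
    run: Callable[[Sequence[str]], int] = run_command,
) -> Optional[str]:
    """Return the name of the first failing gate, or None if all pass."""
    for name, cmd in gates:
        if run(cmd) != 0:
            return name
    return None
```

A CI step could exit non-zero whenever run_gates() returns a name.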

Rollback

  1. Capture the deployed commit SHA before each deploy (git rev-parse HEAD).
  2. To roll back the API/worker image only, check out the previously captured SHA and rebuild:
    docker compose -f docker-compose.yml -f docker-compose.prod.yml \
      --env-file .env.prod up -d --build --force-recreate --no-deps \
      api worker  # --no-deps keeps PG/MinIO/OS/Qdrant intact
    
  3. Data services (PostgreSQL, MinIO, OpenSearch, Qdrant) are stateful and should not be rolled back casually. Restore from backup via the standard TeamHUB Suite backup runbook.

Scaling notes (~70k PDFs)

  • Workers scale horizontally: docker compose up -d --scale worker=8.
  • Set EMBEDDING_DEVICE=cuda on a GPU-capable worker image for ~10× embedding throughput.
  • A single OpenSearch shard suffices up to ~10M chunks; increase shards and add replicas in prod.
  • A single Qdrant node is fine up to ~5M vectors; switch to a cluster build beyond that.
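
To relate the ~70k PDF corpus to the thresholds above, a rough estimate helps. The sketch below takes the two limits from the notes; the average chunks per PDF and the one-vector-per-chunk ratio are assumptions that should be replaced with measured values:

```python
# Thresholds quoted in the scaling notes above.
OPENSEARCH_SINGLE_SHARD_MAX_CHUNKS = 10_000_000
QDRANT_SINGLE_NODE_MAX_VECTORS = 5_000_000


def estimate_index_load(pdf_count: int, avg_chunks_per_pdf: int) -> dict:
    """Estimate total chunks and compare them against both thresholds.

    Assumes one embedding vector per chunk (not confirmed by the runbook).
    """
    chunks = pdf_count * avg_chunks_per_pdf
    return {
        "chunks": chunks,
        "opensearch_single_shard_ok": chunks <= OPENSEARCH_SINGLE_SHARD_MAX_CHUNKS,
        "qdrant_single_node_ok": chunks <= QDRANT_SINGLE_NODE_MAX_VECTORS,
    }


# Hypothetical average of 50 chunks per PDF: 70_000 * 50 = 3_500_000 chunks,
# which stays under both limits.
print(estimate_index_load(70_000, 50))
```

At a hypothetical 100 chunks per PDF the same corpus reaches 7,000,000 chunks, which still fits one OpenSearch shard but exceeds the single-node Qdrant figure.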

Common one-liners

# count indexed chunks in OpenSearch
curl 'http://localhost:9200/legacy_chunks/_count' | jq .

# inspect Qdrant collection
curl 'http://localhost:6333/collections/legacy_chunks' | jq .

# list MinIO buckets
docker compose exec minio mc alias set local http://localhost:9000 \
  "$MINIO_ACCESS_KEY" "$MINIO_SECRET_KEY"
docker compose exec minio mc ls local

# document counts per status (check the INDEXING_COMPLETED row)
docker compose exec postgres psql -U legacyhub -d legacyhub -c \
  "SELECT status, COUNT(*) FROM documents GROUP BY status;"