The artifact-upsert helper was duplicated four times (scanner.py,
table_processor.py, figure_processor.py, pipeline.py) with slightly
different signatures. This change consolidates them into a single
keyword-only function keyed on (document_id, storage_key), the identity
the schema already enforces, so re-running the pipeline never creates
duplicate rows.
scanner / table_processor / figure_processor now import the shared
helper directly. pipeline.py keeps a thin local wrapper to preserve
the positional call sites at three artifact upsert points (OCR_PDF,
MARKDOWN, DOCLING_JSON).
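
A minimal sketch of the shape described above, not the actual helper.
It assumes an `artifacts` table with a unique constraint on
(document_id, storage_key) and uses an in-memory SQLite database so it
runs standalone. The names `upsert_artifact`, `_upsert`, `kind`, and
`size_bytes` are hypothetical.

```python
import sqlite3


def upsert_artifact(conn, *, document_id, storage_key, kind, size_bytes=None):
    """Insert or update the row identified by (document_id, storage_key).

    Keyword-only (note the bare `*`) so call sites cannot silently swap
    arguments. Column names here are illustrative assumptions.
    """
    conn.execute(
        """
        INSERT INTO artifacts (document_id, storage_key, kind, size_bytes)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (document_id, storage_key)
        DO UPDATE SET kind = excluded.kind, size_bytes = excluded.size_bytes
        """,
        (document_id, storage_key, kind, size_bytes),
    )


def _upsert(conn, document_id, storage_key, kind, size_bytes=None):
    """Thin positional wrapper, in the spirit of the one kept in pipeline.py."""
    return upsert_artifact(
        conn,
        document_id=document_id,
        storage_key=storage_key,
        kind=kind,
        size_bytes=size_bytes,
    )


def demo_row_count():
    """Upsert the same artifact twice and report (row count, stored size)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE artifacts (
            document_id TEXT NOT NULL,
            storage_key TEXT NOT NULL,
            kind        TEXT NOT NULL,
            size_bytes  INTEGER,
            UNIQUE (document_id, storage_key)
        )
        """
    )
    # Simulate two pipeline runs writing the same artifact.
    for size in (100, 250):
        _upsert(conn, "doc-1", "doc-1/ocr.pdf", "OCR_PDF", size)
    return conn.execute(
        "SELECT COUNT(*), MAX(size_bytes) FROM artifacts"
    ).fetchone()
```

Calling `demo_row_count()` leaves a single row holding the values from
the second run, which is the re-run behaviour the consolidation aims for.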
Tests: 24 passed (5 health + 19 original).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CORS:
- New setting CORS_ALLOWED_ORIGINS (comma-separated). Defaults cover
the three local Vite ports (5173, 5273, 4173); the production overlay
expects the real origin in .env.prod.
- main.py wires CORSMiddleware from settings.cors_origins. No wildcard
(*) origin in production; see RUNBOOK and .env.prod.example.
- docker-compose.yml forwards the variable to both api and worker.
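
A rough sketch of how a comma-separated CORS_ALLOWED_ORIGINS value could
be turned into an origin list. This is an illustration, not the code in
settings. The function names, the `production` flag, and the
`http://localhost` host in the defaults are assumptions; only the
variable name and the three ports come from this commit.

```python
import os

# Assumed defaults: the three local Vite ports on localhost.
DEFAULT_ORIGINS = (
    "http://localhost:5173",
    "http://localhost:5273",
    "http://localhost:4173",
)


def parse_cors_origins(raw):
    """Split a comma-separated string into a clean list of origins.

    Falls back to the local defaults when the value is unset or blank.
    """
    if raw is None or not raw.strip():
        return list(DEFAULT_ORIGINS)
    # Trim whitespace and trailing slashes, and drop empty entries.
    origins = [part.strip().rstrip("/") for part in raw.split(",") if part.strip()]
    return origins or list(DEFAULT_ORIGINS)


def cors_origins_from_env(env=None, production=False):
    """Read CORS_ALLOWED_ORIGINS and reject a wildcard in production."""
    env = os.environ if env is None else env
    origins = parse_cors_origins(env.get("CORS_ALLOWED_ORIGINS"))
    if production and "*" in origins:
        raise ValueError("wildcard origin is not allowed in production")
    return origins
```

The resulting list is the kind of value that would be handed to
`app.add_middleware(CORSMiddleware, allow_origins=...)` in main.py.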
Tests:
- tests/test_api_health.py uses FastAPI TestClient and monkeypatches
the five probe functions (postgres/minio/opensearch/qdrant/redis).
Verifies the all-ok, any-error, and degraded paths, that the root
endpoint reports the configured api prefix, and that the CORS
preflight echoes the allowed origin.
- pytest tests/test_api_health.py -q: 5 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>