- scripts/benchmark_reranker.py exercises the configured reranker
with synthetic queries or live OpenSearch samples and prints
p50/p95/p99 latency, mean latency, and pairs/sec throughput.
Supports --warmup, --candidates, --passage-length, --source, and a
--json-only mode for CI.
- app/indexing/reranker.py clips passages to 2048 characters before
scoring so a runaway chunk cannot tie up the cross-encoder or exceed
bge-reranker-v2-m3's training window.
- RUNBOOK.md gains a Reranker benchmark section with CPU/GPU SLO
targets and a remediation ladder (lower top-K, raise batch size,
switch device, disable reranker) when measured p95 exceeds budget.
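
For reference, the clipping and the reported statistics can be sketched
roughly as below. This is an illustration only, not the code in
scripts/benchmark_reranker.py or app/indexing/reranker.py: the names
clip_passage, percentile, and summarize are hypothetical, and the
nearest-rank percentile method is an assumption (the script may
interpolate instead).

```python
import math
from statistics import mean

MAX_PASSAGE_CHARS = 2048  # clip length named in this commit


def clip_passage(text: str, limit: int = MAX_PASSAGE_CHARS) -> str:
    """Truncate a passage to at most `limit` characters before scoring."""
    return text[:limit]


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile (assumed method, not confirmed by the script)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Multiply before dividing so whole-number percentiles stay exact.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms: list[float], pairs_per_call: int) -> dict[str, float]:
    """Summarize per-call latencies; each call scores `pairs_per_call` pairs."""
    total_s = sum(latencies_ms) / 1000
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "mean_ms": mean(latencies_ms),
        "pairs_per_sec": len(latencies_ms) * pairs_per_call / total_s,
    }
```

Warmup calls would be excluded from latencies_ms before summarizing, so
model load and cache effects do not skew the tail percentiles.
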
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- app/indexing/qdrant_client.py: removes the identity-only _qid()
helper and passes chunk_id straight to PointStruct (Qdrant accepts
the UUID string directly).
- services/types.ts: SearchHit gets an explicit, optional
ocr_confidence field so consumers can type the value instead of
casting through metadata.
- widgets/SearchResultCard.tsx: replaces the
(hit.metadata as { ocr_confidence? }) cast with the new field. No
behavior change when the backend omits it.
tsc --noEmit: clean.
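
Dropping _qid() relies on every chunk_id already being a UUID string. A
standard-library guard along these lines could check that assumption at
the call site; is_uuid_string is a hypothetical helper, not something
this change adds.

```python
import uuid


def is_uuid_string(value: str) -> bool:
    """Return True if `value` parses as a UUID (hypothetical helper)."""
    try:
        uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        # ValueError: malformed string; the others cover non-string input.
        return False
    return True
```
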
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>