LegacyHUB

Author	SHA1	Message	Date
Vadim Malanov	272cbf6d92	feat(dispatch): consume QMS asset events from the service inbox (C2) LegacyHUB is subscribed to AssetRegistered/AssetApproved from QMS but never polled its service inbox. New consumer maps the §7 payload to an AssetManifestEnvelope and runs the local knowledge-ingest, then confirms the delivery. Outcome policy avoids head-of-line blocking: Registered → ingest (accepted/duplicate/rejected all confirm), Approved/unknown → confirm without work, malformed → confirm as poison; only transient infrastructure failures (object_read_failed) leave the message pending and stop the run for the next scheduled retry. Thin CLI wrapper scripts/consume_dispatch_inbox.py (scripts/ ships in the image). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-07-12 12:39:44 +03:00
Vadim Malanov	79d5fa9fee	chore(guardrail): report-only size check and var/ compatibility note Add scripts/report_module_sizes.py - report-only LOC bands per layer from the TeamHUB module-size guardrail (CONVENTIONS.md, module contract p.6). Always exits 0: split-review stays a human decision. Wire it into CI as a non-blocking step and into the AGENTS.md baseline checks. Document data/input + data/work as the deliberate compatibility exception to the var/ rule of 18_REPO_LAYOUT_STANDARD.md in docs/structure-map.md (compose mounts and APP_*_DIR depend on the paths) and ignore var/ for new local runtime output. Verified: script runs clean on the repo (159 files in scope, zero flagged); ruff and compileall pass; adversarial review fixed two classification bugs (frontend shell under frontend/src/app/, app/api/security.py as infra helper). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-07-07 10:19:08 +03:00
Vadim Malanov	d27dd0ffbb	feat: align LegacyHUB with TeamHUB platform contract (D2/D4, assets, security) Close 12 audit-driven platform-compliance gaps on a single branch. - D4 dispatch: app/integrations/dispatch_client.py participant `legacyhub`, emits LegacyhubDocumentIndexed + AssetDerivativeReady after the indexing commit (idempotent uuid5), http_inbox route (reindex/tombstone) with audit-based dedupe; docs/dispatch-contract.md. Celery+Redis stays intra-module. - D2 SSO: app/integrations/identity.py validates X-TeamHub-* + role/scope mapper; security.py adds trusted-header enforcement (AUTH_REQUIRE_IDENTITY) and a scope check on /search; docker-compose.teamhub.yml (external teamhub_net + internal db net, api not host-published); RUNBOOK network/firewall section. - Asset standard: SearchHit/Citation carry asset_id/owner_module; buckets renamed teamhub-legacyhub-* (+quarantine/tmp/exports); purge-by-asset_id with legal-hold guard (app/indexing/projection.py); OCR-markdown derivative event. - audit_log model + Alembic 0003 + record_audit on writes (same transaction). - Secret masking: app/common/json_logger.py recursive mask wired into structlog (+ensure_ascii=False); event payloads redacted before persistence. - Service X-API-Key mandatory on ingest endpoints (defence-in-depth). - Port: host API 8000->8050 (collision with SalesHUB/MailHUB resolved), container still listens on 8000. - Config: no plaintext secret defaults; fail-loud in non-dev (no value leak). - Docs drift: README PG 5440, layered-auth note, 5173 removed from CORS; ingest/folder gated by ENABLE_FOLDER_INGEST (410 by default). - ADRs: layers mapping, shared-core extraction, UI locale (RU-first). Tests: 78 passing (ruff, compileall, pytest, tsc, vite build, compose config). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 11:44:15 +03:00
Vadim Malanov	2e76554188	ci: align backend lint checks	2026-06-14 20:14:49 +03:00
Vadim Malanov	2806aaf441	fix(ingestion): materialize scanner generator before inline processing The folder scanner commits each document's row inside a per-document session_scope that only finalizes when the generator advances past its yield. In inline mode scripts/ingest_folder.py invoked the pipeline (which opens a fresh session) before that commit landed, so process_document_id saw "document_missing" and the pipeline aborted at STORED_ORIGINAL. Draining the generator with list() forces every scanner commit before any pipeline run. Verified end to end: a generated PDF now progresses DISCOVERED -> ... -> CHUNKING_COMPLETED and is searchable. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 13:41:53 +03:00
Vadim Malanov	a97d0bbcfd	perf: add ingest and search load-test harnesses scripts/generate_synthetic_pdfs.py builds real PDF/1.4 documents with a hand-written xref so we can generate tens of thousands of ~2 KB PDFs locally. Helvetica only covers latin-1, which is fine for a load generator (throughput, not retrieval relevance); the docstring calls this out so no one mistakes the output for a quality corpus. scripts/load_ingest.py drives POST /ingest/folder, then polls a hypothetical /documents/stats endpoint every poll-interval seconds to track terminal-state progression. Writes a JSON history report so results can be diffed between runs. scripts/locustfile_search.py defines a SearchUser profile mixing hybrid / lexical / semantic queries against POST /search plus a health-check sampler. Asserts non-empty results so a "200 with zero hits" regression surfaces as a failure rather than a green percentile graph. RUNBOOK gains a Load testing section with CPU/GPU SLO tables for both axes (sustained docs/min, search latency p50/p95/p99). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 17:11:08 +03:00
Vadim Malanov	349f4ea838	perf(reranker): add benchmark harness and passage clipping - scripts/benchmark_reranker.py exercises the configured reranker with synthetic queries or live OpenSearch samples and prints p50/p95/p99 latency, mean latency, and pairs/sec throughput. Supports --warmup, --candidates, --passage-length, --source, and a --json-only mode for CI. - app/indexing/reranker.py clips passages to 2048 characters before scoring so a runaway chunk cannot starve the cross-encoder beyond bge-reranker-v2-m3's training window. - RUNBOOK.md gains a Reranker benchmark section with CPU/GPU SLO targets and a remediation ladder (lower top-K, raise batch size, switch device, disable reranker) when measured p95 exceeds budget. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 17:08:04 +03:00
Vadim Malanov	7f72171572	chore: bootstrap repository with governance docs Initialize git, add Apache-2.0 LICENSE, .gitattributes (LF line endings), AGENTS.md (entry points, stack, discovery order, baseline checks), RUNBOOK.md (dev boot, prod deploy with overlay, ingestion, failures, rollback, scaling notes), .env.prod.example with rotated credential placeholders, and dev-only warnings on .env.example. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 16:41:50 +03:00
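
The generator-ordering bug fixed in 2806aaf441 can be reproduced in miniature. The sketch below is an illustration only, not the repository's code: the `committed` list stands in for rows visible to a fresh session, and `scan`, `process`, `run_lazy`, and `run_materialized` are hypothetical names. It assumes, as the commit message describes, that each per-document commit lands only when the generator advances past its yield.

```python
from typing import Iterator, List

# Stand-in for rows already committed and visible to other sessions (assumption).
committed: List[int] = []


def scan(doc_ids: List[int]) -> Iterator[int]:
    """Hypothetical scanner: the 'commit' lands only after the consumer resumes us."""
    for doc_id in doc_ids:
        yield doc_id               # the consumer runs here, before the commit below
        committed.append(doc_id)   # finalizes only when the generator advances


def process(doc_id: int) -> str:
    """Hypothetical pipeline step that looks the document up in a fresh session."""
    return "ok" if doc_id in committed else "document_missing"


def run_lazy(doc_ids: List[int]) -> List[str]:
    """Processes each document while its commit is still pending."""
    committed.clear()
    return [process(d) for d in scan(doc_ids)]


def run_materialized(doc_ids: List[int]) -> List[str]:
    """Drains the generator with list() first, so every commit has landed."""
    committed.clear()
    pending = list(scan(doc_ids))
    return [process(d) for d in pending]
```

In this toy model `run_lazy` reports every document as missing, while `run_materialized` sees all of them, which mirrors the ordering problem the commit describes.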

8 Commits
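
The outcome policy described in 272cbf6d92 can be read as a small decision function. The following is a minimal sketch under stated assumptions, not the consumer's actual code: `Action`, `decide`, the `payload_ok` flag, and the way ingest outcomes are passed as strings are all hypothetical; only the event names and `object_read_failed` come from the commit message.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    CONFIRM = "confirm"                # acknowledge the delivery and move on
    CONFIRM_POISON = "confirm_poison"  # acknowledge a malformed message
    LEAVE_PENDING = "leave_pending"    # keep the message and stop the run


def decide(
    event_type: Optional[str],
    payload_ok: bool,
    ingest_outcome: Optional[str] = None,
) -> Action:
    """Map one inbox message to an action (hypothetical helper)."""
    if not payload_ok:
        # Malformed payloads are confirmed so they cannot block the queue.
        return Action.CONFIRM_POISON
    if event_type != "AssetRegistered":
        # AssetApproved and unknown event types: confirm without doing work.
        return Action.CONFIRM
    if ingest_outcome == "object_read_failed":
        # Transient infrastructure failure: retry on the next scheduled run.
        return Action.LEAVE_PENDING
    # accepted, duplicate, and rejected all confirm the delivery.
    return Action.CONFIRM
```

Confirming everything except transient failures is what keeps one bad message from causing head-of-line blocking: only a message that might succeed later is left in the inbox.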