Reference definitions for working with this search system.
Every document in the corpus has one of these statuses, visible at /completeness. The status has two independent dimensions: technical (is the document in the search DB?) and analytical (has it been explicitly read and documented?).
| Status | In DB? | Explicitly analysed? | Meaning | Cite in Q&A? |
|---|---|---|---|---|
| ✓ Indexed | Yes | Yes | Explicitly read; key findings captured in DOC-INDEX.md. Sections, IDs, and findings are understood and documented. | Yes — cite with confidence |
| ↻ Loaded | Yes | No | Chunks in DB, fully searchable. Nobody has explicitly read it to capture what's in it. May surface in search results. | Verify first — read the result before citing |
| ~ Partial | Sometimes | Partly | Structure known but content not fully extractable (e.g. diagram-only PDFs, DOCX where only structure was read). | With caution — gaps exist |
| ○ Unread | No | No | File exists on disk but has no chunks in the DB. Not yet processed. | No |
| ✗ Blocked | No | No | Cannot extract text (PPTX, diagram-only, corrupt). Will never be indexed until a new extraction method is added. | No |
| — Excluded | No | No | Deliberately excluded from the corpus. Commercial/pricing, HR/roles, or procedural documents that generate search noise without adding architectural value. Controlled via EXCLUDED_PREFIXES and SKIP_SHEETS in ingest.py. | No |
Promoting a document from Loaded to Indexed requires three things:
1. The document has been explicitly read.
2. Its key findings are captured in .md/DOC-INDEX.md in the document's detail section.
3. doc_manifest.py has been updated to "status": "INDEXED" and the container has been rebuilt.

A document that is only Loaded is searchable, but its results must be verified before being cited in a Q&A conclusion. An Indexed document can be cited directly.
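The citation rules in the status table can be expressed as a small lookup. The following is a minimal sketch, not the system's actual code: the `DocStatus` enum, the policy labels, and both helper functions are hypothetical names introduced for illustration; only the status names and the "Cite in Q&A?" values come from the table.

```python
from enum import Enum


class DocStatus(Enum):
    """Document statuses as listed in the status table (names are illustrative)."""
    INDEXED = "Indexed"
    LOADED = "Loaded"
    PARTIAL = "Partial"
    UNREAD = "Unread"
    BLOCKED = "Blocked"
    EXCLUDED = "Excluded"


# Citation policy per status, transcribed from the "Cite in Q&A?" column.
# The label strings are an assumption made for this sketch.
CITE_POLICY = {
    DocStatus.INDEXED: "yes",
    DocStatus.LOADED: "verify_first",
    DocStatus.PARTIAL: "with_caution",
    DocStatus.UNREAD: "no",
    DocStatus.BLOCKED: "no",
    DocStatus.EXCLUDED: "no",
}


def can_cite_directly(status: DocStatus) -> bool:
    """True only for documents that may be cited without further checks."""
    return CITE_POLICY[status] == "yes"


def needs_verification(status: DocStatus) -> bool:
    """True for documents that may be cited only after reading the result."""
    return CITE_POLICY[status] in {"verify_first", "with_caution"}
```

With this mapping, a Loaded document is citable only after verification, while Unread, Blocked, and Excluded documents are never citable.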
Each Q&A analysis concludes with one of these verdicts, based on comparing the RFP requirement against the BAFO response. Verdicts drive the recommended documentation action.
| Verdict | Documentation form | Meaning |
|---|---|---|
| COMPLIANT | Factual statement | DXC's proposed solution meets the requirement as stated in the RFP. No action required beyond recording the finding. |
| NON_COMPLIANT | Change request candidate | DXC's solution does not meet the requirement. The gap must be resolved — either DXC adapts the solution or the requirement is formally changed. |
| INCONSISTENCY | Clarification request / escalation | Two or more requirements or documents contradict each other, or the BAFO response conflicts with the RFP. Both sides identified, conflict described. Cannot be resolved without a decision. |
| RECOMMENDATION | Clarification request | The requirement as written should be revised — the current wording is ambiguous, overly broad, or technically impractical. A proposed change is included. |
| INSUFFICIENT_EVIDENCE | Clarification request | The available indexed documents do not contain enough evidence to reach a conclusion. Additional documents must be indexed or a direct question posed to the parties before a verdict can be given. |
Each Q&A entry moves through this lifecycle. Status is updated manually via the detail page.
| Status | Meaning | Typical next action |
|---|---|---|
| DRAFT | Claude analysis complete, not yet reviewed by the analyst. | Review evidence + conclusion → mark Concluded |
| CONCLUDED | Analyst has reviewed and agrees with the conclusion. | Share with client → mark Client Review |
| CLIENT_REVIEW | Sent to client (AO/DFB) for review and sign-off. | Client approves or rejects |
| APPROVED | Client has agreed the conclusion. Becomes part of the project record. | Reference in design documents |
| REJECTED | Client disagrees. Conclusion must be revised or escalated. | Revise → re-submit → new entry |
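The lifecycle above reads naturally as a small state machine. The sketch below is an illustration under assumptions: the text only says that status is updated manually via the detail page, so whether the application enforces these transitions is not known. `ALLOWED_TRANSITIONS` and `advance` are hypothetical names, and REJECTED is treated as terminal because the table says a revision is re-submitted as a new entry.

```python
# Allowed next statuses per current status, derived from the
# "Typical next action" column. Enforcement is an assumption of this sketch.
ALLOWED_TRANSITIONS = {
    "DRAFT": {"CONCLUDED"},
    "CONCLUDED": {"CLIENT_REVIEW"},
    "CLIENT_REVIEW": {"APPROVED", "REJECTED"},
    "APPROVED": set(),   # becomes part of the project record
    "REJECTED": set(),   # a revision is re-submitted as a new entry
}


def advance(current: str, new: str) -> str:
    """Return the new status if the transition is allowed, else raise ValueError."""
    if current not in ALLOWED_TRANSITIONS:
        raise ValueError(f"unknown status: {current}")
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"transition {current} -> {new} is not allowed")
    return new
```

For example, `advance("DRAFT", "CONCLUDED")` succeeds, while `advance("DRAFT", "APPROVED")` raises because the analyst review and client review steps would be skipped.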
The cross_refs JSONB column on each chunk links a BAFO answer to the
RFP question(s) it explicitly addresses. Two directions are surfaced in search results:
| Direction | UI label | Colour | Meaning |
|---|---|---|---|
| BAFO → RFP | 🔍 Beantwoordt vraag | Blue panel | This BAFO chunk was written to answer the linked RFP question/challenge. |
| RFP → BAFO | ✅ BAFO antwoord | Green panel | This RFP chunk has a BAFO answer — click to expand the DXC response. |
Cross-refs are set at ingest time and are explicit, not inferred.
A result without a cross-ref panel is not necessarily unanswered —
the answer may exist but not yet be wired. See
CUSTOM_INGESTS in ingest.py for the current wiring.
When you submit a question, the system runs the following steps in order. Claude is called exactly once per submission — all retrieval happens first, then a single prompt is sent with the full evidence cluster.
1. Code detection: the question is scanned for codes such as AR NNN, BR NNNN, T18, TOM T02, P6, DP2 etc. Auto-detected codes are added to the retrieval queue alongside any extra terms you typed in the "Additional search terms" field.
2. Embedding: the question is embedded with fastembed (paraphrase-multilingual-MiniLM-L12-v2, 384 dims).
3. Two RRF queries run in parallel:
   - Main search over the full corpus → top 15
   - DXC_3_architecturale vereisten.xlsx (B-03) → top 5 most relevant architectural requirements
4. Code lookups, one per detected or explicit code (e.g. AR 069):
   - Zero-padding variants are tried: AR 0069 → tries AR 0069, AR 069, AR 69
   - Direct WHERE UPPER(row_ref) IN (...) query, which bypasses BM25/vector entirely
5. Deduplication: duplicate (source_file, row_ref) pairs are removed, keeping the highest score. Result sorted by score descending.
6. Cross-ref resolution: for passages with cross_refs set, fetch the linked RFP question. Example: B-04 uitdaging-10 → R-17 Top 10 uitdagingen pain point.
7. Claude call: claude-sonnet-4-6 with the assembled prompt. Claude returns a JSON object.
8. Storage: the result is written to the qa_entries table: question, passages JSONB, analysis JSONB, extra_terms JSONB, status=DRAFT. Redirects to the detail page.

| What | Count | Notes |
|---|---|---|
| DB calls — main RRF search | 1 | Always — full corpus, top 15 |
| DB calls — AR supplement (B-03) | 1 | Always — top 5 most relevant architectural requirements |
| DB calls — code lookups | 0 – N | One per auto-detected or explicit code |
| DB calls — free-text extra searches | 0 – M | One per extra term (RRF, top 5) |
| DB calls — cross-ref resolution | ≤ 2 × P | P = unique passages; cached per key |
| DB calls — store result | 1 | INSERT into qa_entries |
| Claude API calls | 1 | Always exactly one, never streaming |
| Embedding calls | 1 + M | fastembed, runs in-process (no network) |
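Three of the retrieval steps described above (rank fusion, code lookup with zero-padding variants, and deduplication on (source_file, row_ref)) can be sketched in a few lines. This is an illustration, not the system's implementation: the function names, the passage dictionary shape, and the constant `k=60` (a conventional default for reciprocal rank fusion) are assumptions, and the real fusion may well run inside the database query.

```python
import re
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def code_variants(code):
    """Return the code as typed, then variants with leading zeros stripped one by one."""
    match = re.fullmatch(r"([A-Za-z]+)(\s*)(\d+)", code.strip())
    if not match:
        # Not a simple "letters + digits" code; use it unchanged.
        return [code.strip().upper()]
    prefix, sep, digits = match.group(1).upper(), match.group(2), match.group(3)
    variants = []
    while True:
        variants.append(f"{prefix}{sep}{digits}")
        if len(digits) > 1 and digits.startswith("0"):
            digits = digits[1:]
        else:
            break
    return variants


def dedupe(passages):
    """Keep the highest-scoring passage per (source_file, row_ref), sorted by score."""
    best = {}
    for passage in passages:
        key = (passage["source_file"], passage["row_ref"])
        if key not in best or passage["score"] > best[key]["score"]:
            best[key] = passage
    return sorted(best.values(), key=lambda p: p["score"], reverse=True)
```

Here `code_variants("AR 0069")` yields the three spellings listed in the lookup step, which would then be passed as parameters to the `WHERE UPPER(row_ref) IN (...)` query.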
Why one Claude call? All retrieval (search, lookup, cross-ref resolution) completes first. Claude receives the full evidence cluster in a single prompt and performs the entire staged analysis in one response. This keeps costs predictable (€0.04–0.06 per question), avoids iterative API round-trips, and makes the process auditable — the exact passages Claude saw are stored alongside the analysis.
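The call counts in the table add up to a simple upper bound per submission. The helper below is a hypothetical sketch that only restates the table's arithmetic; `call_budget` and its parameter names are not part of the system.

```python
def call_budget(n_codes: int, n_extra_terms: int, n_passages: int) -> dict:
    """Upper bound on calls for one submission, following the table above.

    n_codes       -- auto-detected or explicit codes (N)
    n_extra_terms -- free-text extra search terms (M)
    n_passages    -- unique passages after deduplication (P)
    """
    db_calls_max = (
        1                   # main RRF search
        + 1                 # AR supplement (B-03)
        + n_codes           # one lookup per code
        + n_extra_terms     # one RRF search per extra term
        + 2 * n_passages    # cross-ref resolution, at most 2 per passage
        + 1                 # INSERT into qa_entries
    )
    return {
        "db_calls_max": db_calls_max,
        "claude_calls": 1,                      # always exactly one
        "embedding_calls": 1 + n_extra_terms,   # question + each extra term
    }
```

For a question with 2 codes, 1 extra term, and 10 unique passages, this gives at most 26 database calls, 2 embedding calls, and 1 Claude call.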
Claude receives no conversation history and no previous Q&A entries. Every submission starts fresh with only the current question and the passages retrieved for it.
| What Claude sees | What Claude does NOT see |
|---|---|
| ✓ FOMO project context (system prompt) ✓ Current question ✓ Retrieved passages + cross-refs ✓ Auto-detected code rows (AR, T, BR…) | ✗ Previous Q&A entries and their verdicts ✗ Other questions asked today or before ✗ The analyst's comments or status updates ✗ Any conversation history |
Consequence: if you ask a follow-up question related to an earlier entry
(e.g. "given ARC-002's conclusion, does AR 069 change the VLABEL scope picture?"),
Claude will not know about ARC-002 unless you paste the relevant conclusion into the question
or the extra terms field. This is intentional — each analysis is independent and unbiased
by prior conclusions. Cross-referencing between Q&A entries is currently a manual step.
This is a deliberate design principle, not a limitation.
Including previous Q&A analyses in the prompt risks anchoring Claude on earlier conclusions —
including wrong ones — and propagating hallucinations across related questions.
Each verdict must be grounded solely in the indexed corpus evidence, not in prior outputs.
Cross-referencing between entries is the analyst's responsibility, not the model's.
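The stateless design can be illustrated with a prompt builder whose only inputs are the current submission. This is a sketch under assumptions: the `Passage` type, `build_prompt`, and the prompt layout are hypothetical, and the real prompt format is not described in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    """A retrieved passage (field names are assumptions for this sketch)."""
    source_file: str
    row_ref: str
    text: str


def build_prompt(question, passages, system_context):
    """Return (system, user) strings built only from the current submission.

    There is deliberately no parameter for conversation history or previous
    Q&A entries, so nothing from earlier analyses can reach the model.
    """
    blocks = []
    for index, passage in enumerate(passages, start=1):
        header = f"[{index}] {passage.source_file} / {passage.row_ref}"
        blocks.append(header + "\n" + passage.text)
    evidence = "\n\n".join(blocks)
    user = "Question:\n" + question + "\n\nEvidence:\n" + evidence
    return system_context, user
```

Under this shape, the only way to bring an earlier conclusion into a new analysis is for the analyst to paste it into the question text, which matches the manual cross-referencing step described above.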
Two mechanisms keep noise out of the search index:
- EXCLUDED_PREFIXES in ingest.py: file paths or folder prefixes that are skipped entirely at ingest time. Used for commercial/pricing docs, HR/roles docs, and procedural documents.
- SKIP_SHEETS in ingest.py: XLSX sheet names skipped across all files. Currently: Rollen, Instructies.

To add an exclusion: update ingest.py, run a targeted DELETE in psql, then deploy.sh db.
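A minimal sketch of how the two exclusion mechanisms might be applied at ingest time follows. The prefix values are hypothetical placeholders (the real contents of EXCLUDED_PREFIXES are not listed here), the two sheet names are the ones given above, and simple `startswith` matching is an assumption about how ingest.py compares paths.

```python
# Hypothetical example prefixes; the real list lives in ingest.py.
EXCLUDED_PREFIXES = ("commercial/", "hr/")

# Sheet names stated in the text as currently skipped.
SKIP_SHEETS = {"Rollen", "Instructies"}


def should_ingest_file(path: str) -> bool:
    """False if the path starts with any excluded prefix."""
    # str.startswith accepts a tuple and matches any of its members.
    return not path.startswith(EXCLUDED_PREFIXES)


def should_ingest_sheet(path: str, sheet_name: str) -> bool:
    """False if the file is excluded or the sheet is skipped in all files."""
    return should_ingest_file(path) and sheet_name not in SKIP_SHEETS
```

Because the check runs only at ingest time, chunks that are already in the database are unaffected by a new exclusion, which is why the targeted DELETE is part of the procedure.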