📚 FOMO Search — Wiki

Reference definitions for working with this search system.

Document scan statuses

Every document in the corpus has one of these statuses, visible at /completeness. The status has two independent dimensions: technical (is the document in the search DB?) and analytical (has it been explicitly read and documented?).

Status | In DB? | Explicitly analysed? | Meaning | Cite in Q&A?
✓ Indexed | Yes | Yes | Explicitly read; key findings captured in DOC-INDEX.md. Sections, IDs, and findings are understood and documented. | Yes, cite with confidence
↻ Loaded | Yes | No | Chunks in DB, fully searchable. Nobody has explicitly read it to capture what's in it. May surface in search results. | Verify first: read the result before citing
~ Partial | Sometimes | Partly | Structure known but content not fully extractable (e.g. diagram-only PDFs, DOCX where only structure was read). | With caution, because gaps exist
○ Unread | No | No | File exists on disk but has no chunks in the DB. Not yet processed. | No
✗ Blocked | No | No | Cannot extract text (PPTX, diagram-only, corrupt). Will not be indexed until a new extraction method is added. | No
— Excluded | No | No | Deliberately excluded from the corpus. Commercial/pricing, HR/roles, or procedural documents that generate search noise without adding architectural value. Controlled via EXCLUDED_PREFIXES and SKIP_SHEETS in ingest.py. | No

What "Indexed" means in practice

Promoting a document from Loaded to Indexed requires three things:

  1. The document has been fully read (all sections, not just skimmed).
  2. Key findings — AR/BR/TOM references, architectural decisions, gaps, ownership assignments — have been captured in .md/DOC-INDEX.md in the document's detail section.
  3. The document's entry in doc_manifest.py has been updated to "status": "INDEXED" and the container has been rebuilt.

A document that is only Loaded is searchable but its results must be verified before being cited in a Q&A conclusion. An Indexed document can be cited directly.
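
The citation rule can be written down as a small lookup. This is a minimal sketch, not code from the search system: the ScanStatus enum, the CITE_POLICY labels, and can_cite_directly are hypothetical names that simply mirror the status table.

```python
from enum import Enum


class ScanStatus(Enum):
    """Document scan statuses as listed on /completeness."""
    INDEXED = "Indexed"
    LOADED = "Loaded"
    PARTIAL = "Partial"
    UNREAD = "Unread"
    BLOCKED = "Blocked"
    EXCLUDED = "Excluded"


# Citation policy per status; the label strings are illustrative shorthand
# for the "Cite in Q&A?" column, not values used by the real system.
CITE_POLICY = {
    ScanStatus.INDEXED: "cite",
    ScanStatus.LOADED: "verify-first",
    ScanStatus.PARTIAL: "with-caution",
    ScanStatus.UNREAD: "no",
    ScanStatus.BLOCKED: "no",
    ScanStatus.EXCLUDED: "no",
}


def can_cite_directly(status: ScanStatus) -> bool:
    """Only Indexed documents may be cited without re-reading the result."""
    return CITE_POLICY[status] == "cite"
```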

Q&A verdict types

Each Q&A analysis concludes with one of these verdicts, based on comparing the RFP requirement against the BAFO response. Verdicts drive the recommended documentation action.

Verdict | Documentation form | Meaning
COMPLIANT | Factual statement | DXC's proposed solution meets the requirement as stated in the RFP. No action required beyond recording the finding.
NON_COMPLIANT | Change request candidate | DXC's solution does not meet the requirement. The gap must be resolved: either DXC adapts the solution or the requirement is formally changed.
INCONSISTENCY | Clarification request / escalation | Two or more requirements or documents contradict each other, or the BAFO response conflicts with the RFP. Both sides identified, conflict described. Cannot be resolved without a decision.
RECOMMENDATION | Clarification request | The requirement as written should be revised because the current wording is ambiguous, overly broad, or technically impractical. A proposed change is included.
INSUFFICIENT_EVIDENCE | Clarification request | The available indexed documents do not contain enough evidence to reach a conclusion. Additional documents must be indexed or a direct question posed to the parties before a verdict can be given.
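
The verdict table maps directly to a lookup. The sketch below assumes verdicts are handled as plain strings; DOCUMENTATION_FORM and documentation_form are hypothetical names, and only the verdict values and documentation forms come from the table.

```python
# Verdict -> documentation form, copied from the verdict table.
DOCUMENTATION_FORM = {
    "COMPLIANT": "Factual statement",
    "NON_COMPLIANT": "Change request candidate",
    "INCONSISTENCY": "Clarification request / escalation",
    "RECOMMENDATION": "Clarification request",
    "INSUFFICIENT_EVIDENCE": "Clarification request",
}


def documentation_form(verdict: str) -> str:
    """Return the documentation form for a verdict, rejecting unknown values."""
    try:
        return DOCUMENTATION_FORM[verdict]
    except KeyError:
        raise ValueError(f"Unknown verdict: {verdict!r}") from None
```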

Q&A status lifecycle

Each Q&A entry moves through this lifecycle. Status is updated manually via the detail page.

Status | Meaning | Typical next action
DRAFT | Claude analysis complete, not yet reviewed by the analyst. | Review evidence + conclusion → mark Concluded
CONCLUDED | Analyst has reviewed and agrees with the conclusion. | Share with client → mark Client Review
CLIENT_REVIEW | Sent to client (AO/DFB) for review and sign-off. | Client approves or rejects
APPROVED | Client has agreed the conclusion. Becomes part of the project record. | Reference in design documents
REJECTED | Client disagrees. Conclusion must be revised or escalated. | Revise → re-submit → new entry
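
Read as a state machine, the lifecycle looks roughly like the sketch below. ALLOWED_TRANSITIONS and can_transition are hypothetical names. The source says status is updated manually and does not say that transitions are enforced, so the check is an assumption. Treating REJECTED as terminal reflects the "new entry" note in the table.

```python
# Allowed next statuses per current status, read off the lifecycle table.
ALLOWED_TRANSITIONS = {
    "DRAFT": {"CONCLUDED"},
    "CONCLUDED": {"CLIENT_REVIEW"},
    "CLIENT_REVIEW": {"APPROVED", "REJECTED"},
    "APPROVED": set(),   # part of the project record
    "REJECTED": set(),   # a revised conclusion is re-submitted as a new entry
}


def can_transition(current: str, new: str) -> bool:
    """True if moving from `current` to `new` follows the lifecycle table."""
    return new in ALLOWED_TRANSITIONS.get(current, set())
```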

Cross-reference types

The cross_refs JSONB column on each chunk links a BAFO answer to the RFP question(s) it explicitly addresses. Two directions are surfaced in search results:

Direction | UI label | Colour | Meaning
BAFO → RFP | 🔍 Beantwoordt vraag ("Answers question") | Blue panel | This BAFO chunk was written to answer the linked RFP question/challenge.
RFP → BAFO | ✅ BAFO antwoord ("BAFO answer") | Green panel | This RFP chunk has a BAFO answer. Click to expand the DXC response.

Cross-refs are set at ingest time and are explicit, not inferred. A result without a cross-ref panel is not necessarily unanswered — the answer may exist but not yet be wired. See CUSTOM_INGESTS in ingest.py for the current wiring.

Q&A process — how it works

When you submit a question, the system runs the following steps in order. Claude is called exactly once per submission — all retrieval happens first, then a single prompt is sent with the full evidence cluster.

Step 1 — Code detection (0 DB calls)
Regex scans the question text for document codes: AR NNN, BR NNNN, T18, TOM T02, P6, DP2 etc. Auto-detected codes are added to the retrieval queue alongside any extra terms you typed in the "Additional search terms" field.
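
A minimal sketch of such a detector, assuming the code shapes named above (AR NNN, BR NNNN, TOM T02, T18, P6, DP2). The pattern and the detect_codes helper are illustrative and are not the regex used by the system.

```python
import re

# Hypothetical pattern covering the code shapes mentioned in Step 1.
# Longer alternatives come first so "TOM T02" is not split into "T02".
CODE_PATTERN = re.compile(
    r"\b(?:AR\s?\d{2,4}|BR\s?\d{2,4}|TOM\s?T\d{2}|DP\d+|T\d{2}|P\d+)\b"
)


def detect_codes(question: str) -> list[str]:
    """Return document codes found in the question, in order, without duplicates."""
    found: list[str] = []
    for match in CODE_PATTERN.finditer(question):
        code = " ".join(match.group(0).split())  # collapse internal whitespace
        if code not in found:
            found.append(code)
    return found


print(detect_codes("Does AR 069 conflict with TOM T02 and DP2? See AR 069."))
# ['AR 069', 'TOM T02', 'DP2']
```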

Step 2 — Main search + AR supplement (2 DB calls)
The question is embedded using fastembed (paraphrase-multilingual-MiniLM-L12-v2, 384 dims). Two RRF queries run in parallel:
  • Full corpus search: BM25 + pgvector cosine across all indexed documents → top 15 passages
  • AR supplement: same query filtered to DXC_3_architecturale vereisten.xlsx (B-03) → top 5 most relevant architectural requirements
The AR supplement runs for every Q&A call, so Claude always sees the most relevant ARs even when the question uses domain vocabulary that doesn't literally appear in the AR text.
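
For readers unfamiliar with RRF (reciprocal rank fusion): each ranked list gives a document a score of 1 / (k + rank), and the scores are summed across lists. The sketch below does the fusion in plain Python. The real system most likely does this inside the database query, and the constant k = 60 is a common default that is assumed here, not taken from the source.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60, top_n: int = 15) -> list[str]:
    """Fuse several ranked lists of passage IDs with reciprocal rank fusion."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, passage_id in enumerate(ranking, start=1):
            scores[passage_id] = scores.get(passage_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))[:top_n]


bm25_hits = ["a", "b", "c"]      # hypothetical BM25 ranking
vector_hits = ["b", "c", "d"]    # hypothetical cosine-similarity ranking
print(rrf_fuse([bm25_hits, vector_hits]))
# ['b', 'c', 'a', 'd']
```

Passages that appear in both lists ("b", "c") outrank passages that appear in only one, even when that one ranks them first.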

Step 3 — Targeted lookups (N DB calls)
For each detected or explicit code term (e.g. AR 069):
  • Digit-padding normalisation: AR 0069 → tries AR 0069, AR 069, AR 69
  • Direct WHERE UPPER(row_ref) IN (...) query — bypasses BM25/vector entirely
  • Guarantees the exact row is found regardless of semantic distance
  • Up to 5 rows per code term. Free-text extra terms use RRF (top 5 each).
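
The digit-padding step can be sketched as follows. padding_variants is a hypothetical helper that reproduces the documented example (AR 0069 → AR 0069, AR 069, AR 69) by stripping leading zeros one at a time. Whether the real code also pads upward (AR 69 → AR 069) is not stated, so this sketch does not.

```python
import re


def padding_variants(code: str) -> list[str]:
    """Return the code as typed plus variants with fewer leading zeros."""
    cleaned = code.strip().upper()
    match = re.fullmatch(r"([A-Z]+)(\s*)(\d+)", cleaned)
    if not match:
        return [cleaned]
    prefix, digits = match.group(1), match.group(3)
    separator = " " if match.group(2) else ""
    number = int(digits)
    min_width = len(str(number))
    # Widest form first, down to the unpadded number.
    return [
        f"{prefix}{separator}{number:0{width}d}"
        for width in range(len(digits), min_width - 1, -1)
    ]


print(padding_variants("AR 0069"))
# ['AR 0069', 'AR 069', 'AR 69']
```

The resulting list would feed the IN (...) clause of the direct row_ref lookup, ideally as bound query parameters, not as interpolated strings.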

Step 4 — Deduplication (in memory)
All rows from steps 2 and 3 are merged. Duplicates by (source_file, row_ref) are removed, keeping the highest score. Result sorted by score descending.
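
A minimal sketch of this merge, assuming each row is a dict with source_file, row_ref, and score keys (the dict shape is an assumption; the key columns and the keep-highest rule come from the description above).

```python
def dedupe(rows: list[dict]) -> list[dict]:
    """Keep the highest-scoring row per (source_file, row_ref), sorted by score."""
    best: dict[tuple, dict] = {}
    for row in rows:
        key = (row["source_file"], row["row_ref"])
        if key not in best or row["score"] > best[key]["score"]:
            best[key] = row
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)
```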

Step 5 — Cross-reference resolution (≤ 2 DB calls per unique passage)
For each unique passage in the cluster:
  • Forward (BAFO → RFP): if the chunk has cross_refs set, fetch the linked RFP question. Example: B-04 uitdaging-10 → R-17 Top 10 uitdagingen pain point.
  • Reverse (RFP → BAFO): check if any BAFO chunk points back to this RFP chunk. Example: R-17 uitdaging-10 ← B-04 data ontsluiting solution.
Results cached in memory within the request — each unique key resolved once.
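
The per-request cache can be sketched like this. fetch_forward and fetch_reverse stand in for the two DB lookups and are hypothetical callables, as is the passage dict shape. The point is only that each unique key triggers at most two lookups.

```python
from typing import Any, Callable


def resolve_cross_refs(
    passages: list[dict],
    fetch_forward: Callable[[Any], Any],
    fetch_reverse: Callable[[tuple], Any],
) -> dict[tuple, dict]:
    """Resolve forward and reverse cross-refs once per unique passage key."""
    cache: dict[tuple, dict] = {}
    for passage in passages:
        key = (passage["source_file"], passage["row_ref"])
        if key in cache:
            continue  # already resolved within this request
        refs = passage.get("cross_refs")
        cache[key] = {
            # Forward lookup only when the chunk has cross_refs set.
            "forward": fetch_forward(refs) if refs else None,
            # Reverse lookup: does any BAFO chunk point back at this key?
            "reverse": fetch_reverse(key),
        }
    return cache
```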

Step 6 — Prompt assembly (in memory)
Passages + cross-refs are formatted into a compact numbered list (content capped at 600 chars, cross-refs at 300 chars). The prompt includes the FOMO project context, party definitions, and instructions to return a structured JSON with stages 1–5.
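
A sketch of the capping and numbering, assuming passages are dicts with source_file, row_ref, content, and an optional list of resolved cross-ref texts. The field names and the line layout are illustrative; only the 600 and 300 character caps come from the description above.

```python
def clip(text: str, limit: int) -> str:
    """Truncate text to at most `limit` characters."""
    return text if len(text) <= limit else text[:limit]


def format_passages(passages: list[dict], content_cap: int = 600, xref_cap: int = 300) -> str:
    """Render passages as a compact numbered list for the prompt."""
    lines = []
    for number, passage in enumerate(passages, start=1):
        body = clip(passage["content"], content_cap)
        lines.append(f"[{number}] {passage['source_file']} {passage['row_ref']}: {body}")
        for xref_text in passage.get("cross_ref_texts", []):
            lines.append(f"    xref: {clip(xref_text, xref_cap)}")
    return "\n".join(lines)
```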

Step 7 — Single Claude API call (1 external call)
One request to claude-sonnet-4-6 with the assembled prompt. Claude returns a JSON object with:
  • Stage 1: distilled question + classification + challenges
  • Stage 2: evidence table (RFP vs BAFO, Dutch quote + English interpretation) + gaps
  • Stage 3: assumptions, related topics, Oracle platform notes
  • Stage 5: verdict + answer + documentation form + key artefact IDs
Typical duration: 10–20 seconds. Cost: ~€0.05 per call.

Step 8 — Storage (1 DB call)
Result stored in qa_entries table: question, passages JSONB, analysis JSONB, extra_terms JSONB, status=DRAFT. Redirects to the detail page.

What | Count | Notes
DB calls: main RRF search | 1 | Always; full corpus, top 15
DB calls: AR supplement (B-03) | 1 | Always; top 5 most relevant architectural requirements
DB calls: code lookups | 0 – N | One per auto-detected or explicit code
DB calls: free-text extra searches | 0 – M | One per extra term (RRF, top 5)
DB calls: cross-ref resolution | ≤ 2 × P | P = unique passages; cached per key
DB calls: store result | 1 | INSERT into qa_entries
Claude API calls | 1 | Always exactly one, never streaming
Embedding calls | 1 + M | fastembed, runs in-process (no network)
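
As a worked example of the call counts, the upper bound on DB calls is 3 + N + M + 2P. The helper below is a hypothetical illustration of that arithmetic only.

```python
def call_budget(n_codes: int, m_extra_terms: int, p_passages: int) -> dict[str, int]:
    """Upper bound on calls for one Q&A submission, per the counts table."""
    fixed_db = 1 + 1 + 1  # main RRF search, AR supplement, INSERT into qa_entries
    return {
        "db_calls_max": fixed_db + n_codes + m_extra_terms + 2 * p_passages,
        "embedding_calls": 1 + m_extra_terms,
        "claude_calls": 1,
    }


print(call_budget(n_codes=2, m_extra_terms=1, p_passages=20))
# {'db_calls_max': 46, 'embedding_calls': 2, 'claude_calls': 1}
```

With 2 detected codes, 1 extra term, and 20 unique passages, the worst case is 46 DB calls. Cross-ref resolution dominates, which is why it is cached per key.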

Why one Claude call? All retrieval (search, lookup, cross-ref resolution) completes first. Claude receives the full evidence cluster in a single prompt and performs the entire staged analysis in one response. This keeps costs predictable (€0.04–0.06 per question), avoids iterative API round-trips, and makes the process auditable — the exact passages Claude saw are stored alongside the analysis.

Each call is stateless — from scratch

Claude receives no conversation history and no previous Q&A entries. Every submission starts fresh with only the current question and the passages retrieved for it.

What Claude sees:
✓ FOMO project context (system prompt)
✓ Current question
✓ Retrieved passages + cross-refs
✓ Auto-detected code rows (AR, T, BR…)

What Claude does NOT see:
✗ Previous Q&A entries and their verdicts
✗ Other questions asked today or before
✗ The analyst's comments or status updates
✗ Any conversation history

Consequence: if you ask a follow-up question related to an earlier entry (e.g. "given ARC-002's conclusion, does AR 069 change the VLABEL scope picture?"), Claude will not know about ARC-002 unless you paste the relevant conclusion into the question or the extra terms field. Cross-referencing between Q&A entries is currently a manual step and the analyst's responsibility, not the model's.

This is a deliberate design principle, not a limitation. Including previous Q&A analyses in the prompt risks anchoring Claude on earlier conclusions, including wrong ones, and propagating hallucinations across related questions. Each verdict must be grounded solely in the indexed corpus evidence, not in prior outputs.

Corpus exclusion rules

Two mechanisms keep noise out of the search index: EXCLUDED_PREFIXES and SKIP_SHEETS, both set in ingest.py.

To add an exclusion: update ingest.py, run a targeted DELETE in psql, then deploy.sh db.
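
A sketch of how an exclusion check might look. Only the names EXCLUDED_PREFIXES and SKIP_SHEETS come from ingest.py as referenced above. The example values, the is_excluded helper, and the assumption that prefixes match file names and SKIP_SHEETS matches spreadsheet sheet names are all hypothetical.

```python
from typing import Optional

# Hypothetical example values; the real lists live in ingest.py.
EXCLUDED_PREFIXES = ("PRICING_", "HR_")
SKIP_SHEETS = {"Changelog"}


def is_excluded(filename: str, sheet: Optional[str] = None) -> bool:
    """True if a file (or one sheet of a spreadsheet) should be skipped at ingest."""
    if filename.startswith(EXCLUDED_PREFIXES):
        return True
    return sheet is not None and sheet in SKIP_SHEETS
```

A check like this only stops future ingests. Chunks already in the DB stay there until they are deleted, which is why the procedure above includes a targeted DELETE.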