FOMO Wiki

📚 FOMO Search — Wiki

Reference definitions for working with this search system.

Document scan statuses

Every document in the corpus has one of these statuses, visible at /completeness. The status has two independent dimensions: technical (is the document in the search DB?) and analytical (has it been explicitly read and documented?).

Status	In DB?	Explicitly analysed?	Meaning	Cite in Q&A?
✓ Indexed	Yes	Yes	Explicitly read; key findings captured in DOC-INDEX.md. Sections, IDs, and findings are understood and documented.	Yes — cite with confidence
↻ Loaded	Yes	No	Chunks in DB, fully searchable. Nobody has explicitly read it to capture what's in it. May surface in search results.	Verify first — read the result before citing
~ Partial	Sometimes	Partly	Structure known but content not fully extractable (e.g. diagram-only PDFs, DOCX where only structure was read).	With caution — gaps exist
○ Unread	No	No	File exists on disk but has no chunks in the DB. Not yet processed.	No
✗ Blocked	No	No	Cannot extract text (PPTX, diagram-only, corrupt). Cannot be indexed until a new extraction method is added.	No
— Excluded	No	No	Deliberately excluded from the corpus. Commercial/pricing, HR/roles, or procedural documents that generate search noise without adding architectural value. Controlled via `EXCLUDED_PREFIXES` and `SKIP_SHEETS` in `ingest.py`.	No

What "Indexed" means in practice

Promoting a document from Loaded to Indexed requires three things:

1. The document has been fully read (all sections, not just skimmed).
2. Key findings (AR/BR/TOM references, architectural decisions, gaps, ownership assignments) have been captured in .md/DOC-INDEX.md, in the document's detail section.
3. The document's entry in doc_manifest.py has been updated to "status": "INDEXED" and the container has been rebuilt.

A document that is only Loaded is searchable but its results must be verified before being cited in a Q&A conclusion. An Indexed document can be cited directly.
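
The citation rule from the status table can be sketched as a small lookup. This is an illustration only: the source confirms only the "INDEXED" manifest value, so the other status strings, the policy labels, and the helper name `can_cite_directly` are assumptions.

```python
from enum import Enum


class ScanStatus(Enum):
    # Only "INDEXED" is confirmed by the wiki; the other values are assumed spellings.
    INDEXED = "INDEXED"
    LOADED = "LOADED"
    PARTIAL = "PARTIAL"
    UNREAD = "UNREAD"
    BLOCKED = "BLOCKED"
    EXCLUDED = "EXCLUDED"


# Policy labels are hypothetical; they mirror the "Cite in Q&A?" column.
CITE_POLICY = {
    ScanStatus.INDEXED: "cite",
    ScanStatus.LOADED: "verify_first",
    ScanStatus.PARTIAL: "with_caution",
    ScanStatus.UNREAD: "do_not_cite",
    ScanStatus.BLOCKED: "do_not_cite",
    ScanStatus.EXCLUDED: "do_not_cite",
}


def can_cite_directly(status: ScanStatus) -> bool:
    """Return True only for documents that may be cited without verification."""
    return CITE_POLICY[status] == "cite"
```

For example, `can_cite_directly(ScanStatus("INDEXED"))` is true, while a Loaded document returns false and needs its search result read first.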

Q&A verdict types

Each Q&A analysis concludes with one of these verdicts, based on comparing the RFP requirement against the BAFO response. Verdicts drive the recommended documentation action.

Verdict	Documentation form	Meaning
COMPLIANT	Factual statement	DXC's proposed solution meets the requirement as stated in the RFP. No action required beyond recording the finding.
NON_COMPLIANT	Change request candidate	DXC's solution does not meet the requirement. The gap must be resolved — either DXC adapts the solution or the requirement is formally changed.
INCONSISTENCY	Clarification request / escalation	Two or more requirements or documents contradict each other, or the BAFO response conflicts with the RFP. Both sides are identified and the conflict is described. It cannot be resolved without a decision.
RECOMMENDATION	Clarification request	The requirement as written should be revised — the current wording is ambiguous, overly broad, or technically impractical. A proposed change is included.
INSUFFICIENT_EVIDENCE	Clarification request	The available indexed documents do not contain enough evidence to reach a conclusion. Additional documents must be indexed or a direct question posed to the parties before a verdict can be given.

Q&A status lifecycle

Each Q&A entry moves through this lifecycle. Status is updated manually via the detail page.

Status	Meaning	Typical next action
DRAFT	Claude analysis complete, not yet reviewed by the analyst.	Review evidence + conclusion → mark Concluded
CONCLUDED	Analyst has reviewed and agrees with the conclusion.	Share with client → mark Client Review
CLIENT_REVIEW	Sent to client (AO/DFB) for review and sign-off.	Client approves or rejects
APPROVED	Client has agreed the conclusion. Becomes part of the project record.	Reference in design documents
REJECTED	Client disagrees. Conclusion must be revised or escalated.	Revise → re-submit → new entry
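
The lifecycle above can be read as a small state machine. The sketch below is a reading of the table, not the application's code: status is updated manually, and whether the detail page enforces these transitions is not stated. It assumes APPROVED and REJECTED are terminal for an entry, since a rejected conclusion is re-submitted as a new entry.

```python
# Allowed next statuses, derived from the "Typical next action" column.
ALLOWED_TRANSITIONS = {
    "DRAFT": {"CONCLUDED"},
    "CONCLUDED": {"CLIENT_REVIEW"},
    "CLIENT_REVIEW": {"APPROVED", "REJECTED"},
    "APPROVED": set(),   # assumed terminal: becomes part of the project record
    "REJECTED": set(),   # assumed terminal: revision is re-submitted as a new entry
}


def advance(current: str, new: str) -> str:
    """Return the new status, or raise ValueError if the move is not in the table."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move from {current} to {new}")
    return new
```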

Cross-reference types

The cross_refs JSONB column on each chunk links a BAFO answer to the RFP question(s) it explicitly addresses. Two directions are surfaced in search results:

Direction	UI label	Colour	Meaning
BAFO → RFP	🔍 Beantwoordt vraag	Blue panel	This BAFO chunk was written to answer the linked RFP question/challenge.
RFP → BAFO	✅ BAFO antwoord	Green panel	This RFP chunk has a BAFO answer — click to expand the DXC response.

Cross-refs are set at ingest time and are explicit, not inferred. A result without a cross-ref panel is not necessarily unanswered — the answer may exist but not yet be wired. See CUSTOM_INGESTS in ingest.py for the current wiring.

Q&A process — how it works

When you submit a question, the system runs the following steps in order. Claude is called exactly once per submission — all retrieval happens first, then a single prompt is sent with the full evidence cluster.

Step 1 — Code detection (0 DB calls)

Regex scans the question text for document codes: AR NNN, BR NNNN,
T18, TOM T02, P6, DP2 etc.
Auto-detected codes are added to the retrieval queue alongside any extra terms
you typed in the "Additional search terms" field.
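
A minimal sketch of the code detection in Step 1. The actual regex is not shown on this page, so the pattern, the digit ranges, and the function name `detect_codes` are assumptions built from the example codes listed above.

```python
import re

# Hypothetical pattern covering the example code shapes: AR NNN, BR NNNN, TOM T02, DP2, T18, P6.
CODE_PATTERN = re.compile(
    r"\b(?:"
    r"(?:AR|BR)\s?\d{2,4}"   # AR 069, BR 0123
    r"|TOM\s?T\d{1,2}"       # TOM T02
    r"|DP\d{1,2}"            # DP2
    r"|[TP]\d{1,2}"          # T18, P6
    r")\b"
)


def detect_codes(question: str) -> list[str]:
    """Return the distinct codes found in the question, in order of appearance."""
    found: list[str] = []
    for match in CODE_PATTERN.finditer(question):
        code = re.sub(r"\s+", " ", match.group(0))  # collapse internal whitespace
        if code not in found:
            found.append(code)
    return found
```

No database access is needed here, which matches the "0 DB calls" note for this step.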

Step 2 — Main search + AR supplement (2 DB calls)

The question is embedded using fastembed (paraphrase-multilingual-MiniLM-L12-v2, 384 dims).
Two RRF queries run in parallel:

  • Full corpus search: BM25 + pgvector cosine across all indexed documents → top 15 passages

  • AR supplement: same query filtered to DXC_3_architecturale vereisten.xlsx (B-03) → top 5 most relevant architectural requirements

The AR supplement runs for every Q&A call, so Claude always sees the most relevant ARs
even when the question uses domain vocabulary that doesn't literally appear in the AR text.
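
For readers unfamiliar with RRF (Reciprocal Rank Fusion), the idea can be sketched in a few lines. This is a generic illustration, not the system's query: the real fusion presumably happens inside the database, and the constant k = 60 is a common convention rather than a value taken from this page.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=15):
    """Fuse ranked lists of chunk ids: each list adds 1 / (k + rank) to a chunk's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top_n]


# Hypothetical ids: one ranking from BM25, one from vector cosine similarity.
bm25_ranking = ["c1", "c2", "c3"]
vector_ranking = ["c2", "c4", "c1"]
fused = rrf_fuse([bm25_ranking, vector_ranking])
```

A chunk that ranks well in both lists (here c2 and c1) ends up ahead of a chunk that appears in only one.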

Step 3 — Targeted lookups (N DB calls)

For each detected or explicit code term (e.g. AR 069):

  • Digit-padding normalisation: AR 0069 → tries AR 0069, AR 069, AR 69

  • Direct WHERE UPPER(row_ref) IN (...) query — bypasses BM25/vector entirely

  • Guarantees the exact row is found regardless of semantic distance

  • Up to 5 rows per code term. Free-text extra terms use RRF (top 5 each).
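
The digit-padding normalisation and the direct lookup can be sketched as follows. The helper names, the `chunks` table name, and the `%s` placeholder style are assumptions; only the `UPPER(row_ref) IN (...)` condition, the variant example, and the limit of 5 come from the description above.

```python
import re


def code_variants(term: str) -> list[str]:
    """Strip leading zeros one at a time: 'AR 0069' gives AR 0069, AR 069, AR 69."""
    match = re.fullmatch(r"\s*([A-Za-z]+)\s*(\d+)\s*", term)
    if match is None:
        return [term.strip().upper()]
    prefix, digits = match.group(1).upper(), match.group(2)
    variants = []
    while True:
        variants.append(f"{prefix} {digits}")
        if len(digits) > 1 and digits.startswith("0"):
            digits = digits[1:]
        else:
            break
    return variants


def build_lookup(term: str, limit: int = 5):
    """Build a parameterised exact-match query (table name 'chunks' is hypothetical)."""
    variants = code_variants(term)
    placeholders = ", ".join(["%s"] * len(variants))
    sql = f"SELECT * FROM chunks WHERE UPPER(row_ref) IN ({placeholders}) LIMIT %s"
    return sql, variants + [limit]
```

Because the lookup is an exact match on row_ref, it returns the row even when the question's wording is semantically far from the row's text.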

Step 4 — Deduplication (in memory)

All rows from steps 2 and 3 are merged. Duplicates by (source_file, row_ref)
are removed, keeping the highest score. Result sorted by score descending.
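
A minimal sketch of this merge, assuming each row is a dict with source_file, row_ref, and score keys (the row shape is an assumption):

```python
def dedupe(rows):
    """Keep the highest-scoring row per (source_file, row_ref), sorted by score descending."""
    best = {}
    for row in rows:
        key = (row["source_file"], row["row_ref"])
        if key not in best or row["score"] > best[key]["score"]:
            best[key] = row
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)
```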

Step 5 — Cross-reference resolution (≤ 2 DB calls per unique passage)

For each unique passage in the cluster:

  • Forward (BAFO → RFP): if the chunk has cross_refs set, fetch the linked RFP question.
    Example: B-04 uitdaging-10 → R-17 Top 10 uitdagingen pain point.

  • Reverse (RFP → BAFO): check if any BAFO chunk points back to this RFP chunk.
    Example: R-17 uitdaging-10 ← B-04 data ontsluiting solution.

Results cached in memory within the request — each unique key resolved once.
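
The per-request cache can be illustrated with a sketch like the one below. The passage shape (a key plus a list of cross-ref target keys) and the two loader callbacks are hypothetical stand-ins for the real DB queries; the actual structure of the cross_refs column is not documented here.

```python
def resolve_cross_refs(passages, fetch_chunk, find_referrers):
    """Resolve forward and reverse links, calling each loader at most once per key."""
    cache = {}  # lives only for the duration of one request

    def once(kind, key, loader):
        if (kind, key) not in cache:
            cache[(kind, key)] = loader(key)
        return cache[(kind, key)]

    resolved = []
    for passage in passages:
        targets = passage.get("cross_refs") or []
        forward = [once("chunk", target, fetch_chunk) for target in targets]
        reverse = once("referrers", passage["key"], find_referrers)
        resolved.append({"key": passage["key"], "forward": forward, "reverse": reverse})
    return resolved
```

If two BAFO passages point at the same RFP question, the question is fetched once and reused.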

Step 6 — Prompt assembly (in memory)

Passages + cross-refs are formatted into a compact numbered list (content capped at 600 chars,
cross-refs at 300 chars). The prompt includes the FOMO project context, party definitions,
and instructions to return a structured JSON with stages 1–5.
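
The capping described above might look like this. The field names and the exact line layout are assumptions; only the 600 and 300 character caps come from this page.

```python
def format_passages(passages, content_cap=600, xref_cap=300):
    """Format passages as a compact numbered list with capped content and cross-refs."""
    lines = []
    for number, passage in enumerate(passages, start=1):
        content = passage["content"][:content_cap]
        lines.append(f"[{number}] {passage['source_file']} {passage['row_ref']}: {content}")
        for xref in passage.get("cross_refs_text", []):  # hypothetical field name
            lines.append(f"    xref: {xref[:xref_cap]}")
    return "\n".join(lines)
```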

Step 7 — Single Claude API call (1 external call)

One request to claude-sonnet-4-6 with the assembled prompt.
Claude returns a JSON object with:

  • Stage 1: distilled question + classification + challenges

  • Stage 2: evidence table (RFP vs BAFO, Dutch quote + English interpretation) + gaps

  • Stage 3: assumptions, related topics, Oracle platform notes

  • Stage 5: verdict + answer + documentation form + key artefact IDs

Typical duration: 10–20 seconds. Cost: ~€0.05 per call.

Step 8 — Storage (1 DB call)

Result stored in qa_entries table: question, passages JSONB, analysis JSONB,
extra_terms JSONB, status=DRAFT. Redirects to the detail page.

What	Count	Notes
DB calls — main RRF search	1	Always — full corpus, top 15
DB calls — AR supplement (B-03)	1	Always — top 5 most relevant architectural requirements
DB calls — code lookups	0 – N	One per auto-detected or explicit code
DB calls — free-text extra searches	0 – M	One per extra term (RRF, top 5)
DB calls — cross-ref resolution	≤ 2 × P	P = unique passages; cached per key
DB calls — store result	1	INSERT into qa_entries
Claude API calls	1	Always exactly one, never streaming
Embedding calls	1 + M	fastembed, runs in-process (no network)
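
As a worked example of the table above, the upper bound on calls per submission can be computed directly. The function name and return shape are illustrative only.

```python
def call_budget(codes: int, extra_terms: int, passages: int) -> dict:
    """Upper bound on calls for one submission, following the table row by row."""
    db_calls_max = (
        1                # main RRF search
        + 1              # AR supplement (B-03)
        + codes          # one exact lookup per detected or explicit code
        + extra_terms    # one RRF search per free-text extra term
        + 2 * passages   # cross-ref resolution, at most 2 per unique passage
        + 1              # INSERT into qa_entries
    )
    return {
        "db_calls_max": db_calls_max,
        "embedding_calls": 1 + extra_terms,
        "claude_calls": 1,
    }
```

With 2 detected codes, 1 extra term, and 20 unique passages, this gives at most 46 DB calls, 2 embedding calls, and 1 Claude call. Caching usually brings the cross-ref share below the bound.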

Why one Claude call? All retrieval (search, lookup, cross-ref resolution) completes first. Claude receives the full evidence cluster in a single prompt and performs the entire staged analysis in one response. This keeps costs predictable (€0.04–0.06 per question), avoids iterative API round-trips, and makes the process auditable — the exact passages Claude saw are stored alongside the analysis.

Each call is stateless — from scratch

Claude receives no conversation history and no previous Q&A entries. Every submission starts fresh with only the current question and the passages retrieved for it.

What Claude sees	What Claude does NOT see
✓ FOMO project context (system prompt)	✗ Previous Q&A entries and their verdicts
✓ Current question	✗ Other questions asked today or before
✓ Retrieved passages + cross-refs	✗ The analyst's comments or status updates
✓ Auto-detected code rows (AR, T, BR…)	✗ Any conversation history

Consequence: if you ask a follow-up question related to an earlier entry (e.g. "given ARC-002's conclusion, does AR 069 change the VLABEL scope picture?"), Claude will not know about ARC-002 unless you paste the relevant conclusion into the question or the extra terms field.

This is a deliberate design principle, not a limitation. Including previous Q&A analyses in the prompt risks anchoring Claude on earlier conclusions, including wrong ones, and propagating hallucinations across related questions. Each verdict must be grounded solely in the indexed corpus evidence, not in prior outputs. Cross-referencing between entries is currently a manual step and is the analyst's responsibility, not the model's.

Corpus exclusion rules

Two mechanisms keep noise out of the search index:

EXCLUDED_PREFIXES in ingest.py — file paths or folder prefixes that are skipped entirely at ingest time. Used for commercial/pricing docs, HR/roles docs, and procedural documents.
SKIP_SHEETS in ingest.py — XLSX sheet names skipped across all files. Currently: Rollen, Instructies.

To add an exclusion: update ingest.py, run a targeted DELETE in psql to remove any chunks already ingested from the excluded source, then run deploy.sh db.
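
The two exclusion mechanisms can be sketched as a single check. The SKIP_SHEETS values are the ones listed above; the prefixes and the function name `should_ingest` are hypothetical and do not reflect the real contents of ingest.py.

```python
# Hypothetical prefixes for illustration; the real list lives in ingest.py.
EXCLUDED_PREFIXES = ("commercial/", "hr/")
# Sheet names listed on this page as currently skipped.
SKIP_SHEETS = {"Rollen", "Instructies"}


def should_ingest(path, sheet=None):
    """Return False for excluded paths, or for skipped sheet names in any workbook."""
    if path.startswith(EXCLUDED_PREFIXES):  # str.startswith accepts a tuple of prefixes
        return False
    if sheet is not None and sheet in SKIP_SHEETS:
        return False
    return True
```

Note that a check like this only affects future ingests, which is why the targeted DELETE is still needed for chunks that are already in the DB.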