Reasoning & Search Service¶
The Reasoning & Search Service is the primary interface for natural language interaction with the Substrate knowledge graph. It answers any question about the architecture — structural, semantic, historical, or institutional — by retrieving evidence from the Unified Multimodal Knowledge Base (UMKB) through a layered pipeline called Hardened GraphRAG.
Responsibility¶
Translate natural language queries into precise graph traversals and vector searches, synthesise multi-source evidence into grounded answers with confidence scores, and expose the full reasoning capability of the UMKB through a REST and WebSocket API. The Reasoning Service is the query engine that makes the graph legible to humans and to other services.
Why "Hardened" GraphRAG¶
Baseline Microsoft GraphRAG as published has four well-documented production failure modes. Substrate's architecture addresses each one explicitly.
GraphRAG Failure Modes and Mitigations¶
| Failure Mode | Evidence | Substrate Mitigation |
|---|---|---|
| Hallucinated entities baked permanently into graph | AGRAG paper: LLM entity extraction fails to conform to expected formats; fabricated entities have no correction mechanism | Confidence scoring on all extracted entities; verification queue routes low-confidence items to human review before graph write |
| No temporal reasoning | GraphRAG treats a 2019 ADR as equally current as a 2026 one; no delta-analysis across time | Timestamped graph snapshots in PostgreSQL; temporal Cypher queries with AT(timestamp) semantics; staleness weights applied to retrieval scoring |
| 73–84% of errors are reasoning failures not retrieval failures | KET-RAG study: gold answer present in retrieved context but final answer still wrong in majority of cases | Cypher chain-of-thought structured prompting; context compression via BFS graph-walk; MoE Scout for multi-hop reasoning over community summaries |
| Incremental updates require full re-run | GitHub Issue #741 (1,200+ upvotes) | PR-scoped incremental delta: only changed subgraph re-embedded and re-summarised; Leiden re-run on affected community only |
The consequence of these mitigations is that Substrate's graph-backed answers are grounded in verified, timestamped evidence and supported by confidence scores that the caller can inspect. Answers are not generated from a frozen snapshot of the graph at index time.
Layered Retrieval Pipeline¶
Every query passes through a routing layer that selects one or more retrieval strategies based on query type, then combines their outputs via Hybrid RRF Fusion before injecting the final context into the generation model.
Strategy Table¶
| Strategy | What It Does | Best Suited For | Model |
|---|---|---|---|
| HyDE (Hypothetical Document Embeddings) | Generates a hypothetical answer document and embeds it instead of the raw query; bridges the gap between terse queries and verbose graph content | Short or vague NL queries where raw query similarity is below retrieval threshold | Dense 70B |
| RAPTOR Tree Retrieval | Recursive abstract summaries at multiple levels: nodes → module clusters → domain summaries → system overview; +20pp accuracy over flat community summaries | Multi-hop questions: "what are our top architectural risks?" | MoE Scout (Llama 4) |
| Local GraphRAG (entity traversal) | Seeds on query-matched entities; traverses CALLS/DEPENDS_ON edges 1–3 hops; prunes by edge weight | Dependency questions: "what does PaymentService depend on?" | Dense cypher-lora |
| Global GraphRAG (community map-reduce) | Map-reduce over pre-computed Leiden community summaries; reduces to answer with confidence scoring | Strategic questions: "how is the data platform structured?" | MoE Scout (Llama 4) |
| Hybrid RRF Fusion | RRF score = Σ(1/(k + rank_i)) with k=60 across vector, graph, and keyword candidates; +8% factual correctness over any single strategy | All queries — final ranking step before LLM context injection | bge-reranker-v2-m3 |
HyDE (RSN-03)¶
When the raw query's cosine similarity against the pgvector HNSW index falls below a configured threshold (default 0.55), HyDE activates. The Dense 70B model generates a hypothetical answer document — a paragraph that describes what the answer to the query might look like. This hypothetical document is then embedded via bge-m3 and used as the vector search seed instead of the raw query. The resulting retrieved candidates are typically richer and more relevant because they share the vocabulary and structure of the actual graph content.
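The gating logic above can be sketched in a few lines. This is a minimal illustration, not the service's actual code: `generate_hypothetical` and `embed` are injected stand-ins for the Dense 70B and bge-m3 calls, and the toy lambdas exist only to show the control flow.

```python
HYDE_SIMILARITY_THRESHOLD = 0.55  # default activation threshold from the text

def retrieval_seed(query, best_raw_similarity, generate_hypothetical, embed):
    """Pick the vector-search seed: the raw query, or a HyDE document.

    Only the gating decision is real here; generate_hypothetical and
    embed stand in for the model calls described above.
    """
    if best_raw_similarity >= HYDE_SIMILARITY_THRESHOLD:
        return embed(query)
    # Raw query is too terse for the index: embed a hypothetical answer instead.
    return embed(generate_hypothetical(query))

# Toy stand-ins to show the control flow.
embed = lambda text: f"vec({text})"
generate = lambda q: f"a plausible answer paragraph for: {q}"

seed = retrieval_seed("who owns checkout?", 0.31, generate, embed)
```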
RAPTOR Tree Retrieval (RSN-05)¶
RAPTOR builds a hierarchical summary tree over the graph content offline (rebuilt nightly and on incremental delta events). The tree has four levels:
- Node level: Individual function, service, and infrastructure node summaries
- Module cluster level: Abstract summaries of tightly connected subgraphs within a service
- Domain summary level: Cross-service summaries for each architectural domain (data, auth, payments, etc.)
- System overview level: A single top-level summary of the entire system's structure and key tensions
At query time, RAPTOR walks this tree top-down, retrieving the most relevant branch at each level and expanding only into branches with high relevance scores. This delivers +20 percentage points in accuracy over flat community summaries for multi-hop questions because it reasons at the correct level of abstraction rather than drowning the model in irrelevant leaf-level detail.
The MoE Scout (Llama 4 Scout on port 8000) is used for RAPTOR synthesis because its mixture-of-experts architecture handles the multi-hop reasoning required to synthesise across summary levels.
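The top-down walk can be sketched as a beam search over an in-memory tree. The `beam_width` and `threshold` parameters, the toy tree, and the keyword scorer are all hypothetical; in the real pipeline the relevance scores would come from embedding similarity and the synthesis from the MoE Scout.

```python
def raptor_walk(tree, score, beam_width=2, threshold=0.5):
    """Top-down walk of a RAPTOR summary tree (in-memory sketch).

    tree maps node id -> {"summary": str, "children": [ids]};
    score(summary) -> relevance in [0, 1]. At each level only the
    highest-scoring branches above the threshold are expanded, so the
    model reasons at the right abstraction level instead of reading
    every leaf.
    """
    context, frontier = [], ["root"]
    while frontier:
        scored = sorted(((score(tree[n]["summary"]), n) for n in frontier),
                        reverse=True)[:beam_width]
        kept = [n for s, n in scored if s >= threshold]
        context.extend(tree[n]["summary"] for n in kept)
        frontier = [child for n in kept for child in tree[n]["children"]]
    return context

# Hypothetical three-node tree: a system overview with two domain branches.
tree = {
    "root": {"summary": "system overview: payments and auth domains",
             "children": ["payments", "auth"]},
    "payments": {"summary": "payments domain summary", "children": []},
    "auth": {"summary": "auth domain summary", "children": []},
}
relevance = lambda s: 1.0 if "payments" in s else 0.1
context = raptor_walk(tree, relevance)
```

For a payments question, only the payments branch is expanded; the auth branch is pruned at the domain level.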
Local GraphRAG — Entity Traversal (RSN-04)¶
For dependency and ownership questions, the retrieval pipeline seeds on the entities matched by the NL→Cypher step and performs a BFS traversal of the Observed Graph outward along CALLS, DEPENDS_ON, IMPORTS, OWNS, and HOSTS edges. The traversal depth is 1–3 hops; edges are pruned by weight (low-weight edges representing transitive or inferred relationships are excluded unless the query explicitly asks about them).
The resulting subgraph is serialised into a structured context block and injected directly into the generation prompt, giving the model precise, graph-grounded evidence rather than retrieved prose.
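As an illustration of the traversal described above, here is a weight-pruned BFS over an in-memory adjacency list. The `min_weight` knob, the adjacency format, and the toy graph are assumptions for the sketch; the real traversal runs inside Neo4j.

```python
from collections import deque

TRAVERSAL_EDGES = {"CALLS", "DEPENDS_ON", "IMPORTS", "OWNS", "HOSTS"}

def traverse(adjacency, seeds, max_hops=3, min_weight=0.5):
    """Weight-pruned BFS over an in-memory view of the Observed Graph.

    adjacency: node -> [(edge_type, weight, neighbour)]. Returns the
    subgraph as (src, edge_type, dst) triples, ready to serialise into
    a structured context block.
    """
    triples, seen = set(), set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth >= max_hops:
            continue
        for edge, weight, nxt in adjacency.get(node, []):
            if edge not in TRAVERSAL_EDGES or weight < min_weight:
                continue  # drop inferred / transitive low-weight edges
            triples.add((node, edge, nxt))
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return triples

graph = {
    "PaymentService": [("DEPENDS_ON", 0.9, "LedgerDB"),
                       ("CALLS", 0.2, "LegacyBatch")],  # pruned by weight
    "LedgerDB": [("HOSTS", 0.8, "eu-west-1")],
}
subgraph = traverse(graph, ["PaymentService"])
```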
Global GraphRAG — Community Map-Reduce (RSN-14)¶
For corpus-wide strategic questions, the pipeline uses pre-computed Leiden community detection summaries stored in PostgreSQL. Each community summary is a paragraph describing the purpose, key services, and dominant relationships within a detected cluster of tightly connected graph nodes.
At query time, the model reads all community summaries (map step), scores their relevance to the query, then reduces the top-ranked summaries into a final answer (reduce step). The MoE Scout handles both the map scoring and the final reduce synthesis.
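The map and reduce steps reduce to a short sketch. `score` and `synthesize` stand in for the two MoE Scout calls; the summaries and scorer below are toy values.

```python
def global_graphrag(summaries, score, synthesize, top_k=3):
    """Map-reduce over Leiden community summaries (sketch).

    Map: score every community summary's relevance to the query.
    Reduce: synthesise the top-ranked summaries into one answer.
    """
    ranked = sorted(summaries, key=score, reverse=True)  # map + rank
    return synthesize(ranked[:top_k])                    # reduce

summaries = ["data platform community", "auth community", "billing community"]
score = lambda s: 1.0 if "data" in s else 0.0
answer = global_graphrag(summaries, score, lambda top: " | ".join(top), top_k=1)
```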
Community summaries are rebuilt incrementally: when an ingestion event changes a subgraph, only the Leiden communities that contain changed nodes are re-run and their summaries regenerated (RSN-15). This avoids a full corpus rebuild on every PR.
Hybrid RRF Fusion (RSN-06)¶
After all retrieval strategies have been invoked in parallel, their ranked candidate lists are merged using Reciprocal Rank Fusion:
RRF_score(d) = Σ_i 1 / (k + rank_i(d))
where k = 60 and rank_i(d) is the rank of document d in strategy i's result list. Candidates from vector search, graph traversal, and keyword (BM25) search are all eligible inputs.
The merged and ranked list is then passed to bge-reranker-v2-m3 (DGX Spark port 8004) for a final cross-encoder reranking pass before the top-k candidates are assembled into the LLM context. This two-stage approach — RRF merge followed by neural reranking — delivers +8% factual correctness over any single retrieval strategy.
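The RRF merge itself is a few lines of code. The candidate ids below are hypothetical; in the real pipeline the merged list would then go to bge-reranker-v2-m3 for the cross-encoder pass.

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Merge best-first candidate lists via Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)  # RRF_score(d) = Σ_i 1/(k + rank_i(d))
    return sorted(scores, key=scores.get, reverse=True)

vector_hits  = ["ADR-047", "PaymentService", "AuthService"]
graph_hits   = ["PaymentService", "LedgerDB"]
keyword_hits = ["PaymentService", "ADR-047"]
fused = rrf_fuse([vector_hits, graph_hits, keyword_hits])
```

`PaymentService` ranks first because it appears at rank 1 in two of the three lists, even though the vector list preferred `ADR-047`.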
Natural Language to Cypher (RSN-01, RSN-02)¶
The NL→Cypher pipeline translates user questions into valid Neo4j Cypher queries using the Dense cypher-lora adapter (Dense 70B + LoRA fine-tuned on Cypher generation from the Substrate schema). Target latency is under 1 second end-to-end including execution.
The translation is schema-aware: the cypher-lora adapter is trained on the Substrate node and edge taxonomy and produces queries that reference actual node labels and relationship types in the graph. Generated Cypher is executed against Neo4j in read-only mode; write operations are never exposed through the Reasoning Service API.
If the generated Cypher is syntactically invalid, the service falls back to the vector retrieval path and annotates the response with a low-confidence flag.
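The fallback path can be sketched as follows. The write-clause regex is a cheap defence-in-depth check of this sketch's own devising (real enforcement is Neo4j's read-only mode), and `run_cypher` / `vector_search` are injected stand-ins for the actual execution paths.

```python
import re

WRITE_CLAUSE = re.compile(r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP)\b",
                          re.IGNORECASE)

def is_read_only(cypher):
    """Cheap pre-flight check; real enforcement is Neo4j read-only mode."""
    return WRITE_CLAUSE.search(cypher) is None

def answer_with_fallback(cypher, run_cypher, vector_search):
    """Execute generated Cypher; on any failure, fall back to vector
    retrieval and mark the response low-confidence."""
    if cypher and is_read_only(cypher):
        try:
            return {"result": run_cypher(cypher), "low_confidence": False}
        except Exception:
            pass  # syntactically invalid or failed at execution time
    return {"result": vector_search(), "low_confidence": True}

ok = answer_with_fallback("MATCH (s:ServiceNode) RETURN s",
                          lambda c: "subgraph", lambda: "chunks")
```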
Temporal Graph Queries (RSN-07)¶
The Reasoning Service supports "point-in-time" queries: "What was the state of the graph on March 10th?" Substrate maintains timestamped graph snapshots in PostgreSQL. Each snapshot records the full set of node and edge states at a given point in time. Temporal Cypher queries use AT(timestamp) semantics to restrict traversal to the snapshot valid at the requested date.
This capability is critical for post-incident analysis: "What changed in the three days before last Friday's outage?" triggers a temporal diff between two snapshots, returning the delta as a structured list of added, removed, and changed nodes and edges.
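The temporal diff reduces to set operations over two snapshots. Each snapshot is simplified here to a dict of element id to state hash; the real snapshots are PostgreSQL rows of node and edge states, and the ids and hashes below are toy values.

```python
def snapshot_diff(before, after):
    """Structured delta between two timestamped graph snapshots."""
    before_ids, after_ids = set(before), set(after)
    return {
        "added":   sorted(after_ids - before_ids),
        "removed": sorted(before_ids - after_ids),
        "changed": sorted(e for e in before_ids & after_ids
                          if before[e] != after[e]),
    }

monday = {"PaymentService": "h1", "AuthService": "h2", "CronJob": "h9"}
friday = {"PaymentService": "h1", "AuthService": "h5", "RateLimiter": "h7"}
delta = snapshot_diff(monday, friday)
```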
Institutional Memory Retrieval (RSN-08)¶
The question "Why was this decision made?" is answered by the institutional memory retrieval pipeline. The pipeline queries:
- `DecisionNode` entries with `WHY` edges to the service in question
- `FailurePattern` nodes with `CAUSED_BY` or `AFFECTED` edges touching the same service
- `ExceptionNode` entries on policies that have been excepted for this service
- `MemoryNode` entries extracted from PR review comments, Slack threads, and post-mortems
All retrieved memory nodes are ranked by recency and relevance (using bge-reranker-v2-m3) before being assembled into the response. Each item carries a provenance link: the source document, author, and date.
Intent Mismatch Detection (RSN-09)¶
On each PR open event, the Reasoning Service computes the cosine similarity between:
- The bge-m3 embedding of the PR's code changes (summarised by the Ingestion Service delta)
- The bge-m3 embedding of the linked GitHub Projects v2 item description
If the similarity falls below 0.6, an intent mismatch is flagged. The flag is included in the Governance Service's PR comment as a soft advisory: "The code changes in this PR appear semantically distant from the linked project item. Confirm that this PR addresses the correct work item."
This catches common drift scenarios: a developer who branches off the wrong ticket, a PR that grew beyond its original scope, or a project item whose description was updated without updating the linked PR.
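The check itself is a single cosine similarity against the 0.6 threshold. The 3-d vectors below are toy stand-ins for bge-m3 embeddings, which are much higher-dimensional.

```python
import math

INTENT_MISMATCH_THRESHOLD = 0.6  # from RSN-09

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def intent_mismatch(code_vec, item_vec, threshold=INTENT_MISMATCH_THRESHOLD):
    """Flag a PR whose code-change embedding drifts from the linked item."""
    sim = cosine_similarity(code_vec, item_vec)
    return {"similarity": round(sim, 3), "mismatch": sim < threshold}

# Toy vectors: one aligned pair, one orthogonal (drifted) pair.
aligned = intent_mismatch([1.0, 0.2, 0.1], [0.9, 0.3, 0.2])
drifted = intent_mismatch([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```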
Service Ownership Queries (RSN-10)¶
"Who owns this service?" is answered by combining two sources:
- `OWNS` edges in the Observed Graph, connecting service nodes to `Developer` and `Team` nodes
- The parsed CODEOWNERS file, ingested by the GitHub connector as a set of path-to-owner mappings
The Reasoning Service merges these two sources and returns the owning team or developer(s) with their contact handles. If the two sources disagree, both are returned with a note indicating the discrepancy.
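The merge-and-flag behaviour can be sketched as a set union with a disagreement check; the team handles below are hypothetical.

```python
def resolve_ownership(owns_edge_owners, codeowners_owners):
    """Merge OWNS-edge owners with CODEOWNERS owners; flag disagreement."""
    graph_set, file_set = set(owns_edge_owners), set(codeowners_owners)
    return {
        "owners": sorted(graph_set | file_set),
        "discrepancy": graph_set != file_set,  # both sources returned, with a note
    }

agree = resolve_ownership(["@payments-team"], ["@payments-team"])
conflict = resolve_ownership(["@payments-team"], ["@platform-team"])
```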
Confidence Scoring (RSN-11)¶
Every response from the Reasoning Service carries a structured confidence block:
```json
{
  "confidence": 0.84,
  "evidence_nodes": ["ServiceNode:PaymentService", "DecisionNode:ADR-047"],
  "source_density": 3,
  "retrieval_strategies_used": ["local_graphrag", "hyDE", "rrf_fusion"],
  "low_confidence_warning": false
}
```
`source_density` is the count of independent source documents (graph nodes from different connectors) that support the answer. A response with `source_density >= 3` is considered well-supported. Responses with `source_density = 1` carry a visible low-confidence warning.
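One way to assemble this block, counting distinct connectors as a simplifying proxy for independent source documents (the evidence pairs and connector names are hypothetical):

```python
def confidence_block(evidence, strategies, score):
    """Assemble the structured confidence block.

    evidence: (node_id, connector) pairs supporting the answer;
    source_density counts distinct connectors.
    """
    density = len({connector for _, connector in evidence})
    return {
        "confidence": round(score, 2),
        "evidence_nodes": [node for node, _ in evidence],
        "source_density": density,
        "retrieval_strategies_used": strategies,
        "low_confidence_warning": density <= 1,  # single-source answers warn
    }

block = confidence_block(
    [("ServiceNode:PaymentService", "github"),
     ("DecisionNode:ADR-047", "confluence"),
     ("MemoryNode:PM-12", "slack")],
    ["local_graphrag", "rrf_fusion"], 0.84)
```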
Caching (RSN-13)¶
Frequent subgraph queries are cached in Redis with a TTL of 15 minutes. The cache key is a hash of the Cypher query or the structured retrieval parameters. On each graph update event published to NATS, the Reasoning Service evaluates which cached queries involve nodes in the updated subgraph and invalidates their cache entries. This ensures that cached answers do not serve stale data after an ingestion event modifies the graph.
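The keying and invalidation logic can be sketched without Redis. The key prefix and the side index mapping keys to touched node ids are assumptions of this sketch; in the real service that dependency index would live alongside the Redis entries.

```python
import hashlib
import json

CACHE_TTL_SECONDS = 900  # 15-minute TTL from the text

def cache_key(cypher=None, retrieval_params=None):
    """Deterministic cache key from the Cypher text or structured params."""
    payload = cypher or json.dumps(retrieval_params, sort_keys=True)
    return "rsn:cache:" + hashlib.sha256(payload.encode()).hexdigest()[:16]

def keys_to_invalidate(dependency_index, updated_nodes):
    """dependency_index: key -> node ids the cached answer touched.
    Return the keys whose node sets intersect the updated subgraph."""
    updated = set(updated_nodes)
    return [k for k, nodes in dependency_index.items() if nodes & updated]

index = {
    cache_key(cypher="MATCH (p:PaymentService) RETURN p"): {"PaymentService", "LedgerDB"},
    cache_key(cypher="MATCH (a:AuthService) RETURN a"): {"AuthService"},
}
stale = keys_to_invalidate(index, ["LedgerDB"])  # a NATS update touched LedgerDB
```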
API¶
The Reasoning Service exposes:
- `POST /search` — NL query; returns ranked answer with confidence and evidence nodes
- `POST /cypher` — Direct Cypher query; returns typed subgraph result
- `GET /ownership/{service_id}` — Returns ownership chain for a service
- `GET /dependencies/{service_id}` — Returns dependency tree for a service
- `GET /memory/{service_id}` — Returns institutional memory for a service
- `POST /intent-check` — PR number input; returns intent mismatch score
- `GET /temporal` — Point-in-time graph state query
- `WS /stream` — WebSocket connection for streaming reasoning responses
All endpoints return JSON. The WebSocket endpoint streams tokens for long-running RAPTOR and Global GraphRAG responses (RSN-12).
Use Cases with Latency Targets¶
| ID | Query | Strategy | Target Latency |
|---|---|---|---|
| RSN-UC-01 | "What does PaymentService depend on?" | Local GraphRAG | < 1s |
| RSN-UC-02 | "Who owns the checkout service?" | OWNS edge traversal + CODEOWNERS | < 500ms |
| RSN-UC-03 | PR intent mismatch: code vs project item | Hybrid embedding similarity | < 1s |
| RSN-UC-04 | "What are our top 3 architectural risks?" | Global GraphRAG (RAPTOR + community) | < 8s |
| RSN-UC-05 | "Why was this architectural decision made?" | Memory retrieval: DecisionNode + FailurePattern | < 5s |
| RSN-UC-06 | "What changed before last Friday's incident?" | Temporal graph snapshot diff + HyDE | < 5s |
| RSN-UC-07 | "Find all services calling auth directly" | Cypher translation + graph traversal | < 1s |
| RSN-UC-08 | "Which services does this epic affect?" | Project item → IntentAssertion → service graph | < 5s |
Functional Requirements¶
| ID | Requirement | Priority |
|---|---|---|
| RSN-01 | Natural language → Cypher query translation via Dense cypher-lora; latency < 1 second | Must Have |
| RSN-02 | Execute Cypher against Neo4j (read-only); return typed subgraph result | Must Have |
| RSN-03 | HyDE query expansion: generate hypothetical answer, embed it, use for vector retrieval when raw query similarity below threshold | Must Have |
| RSN-04 | Local GraphRAG: entity traversal for dependency and ownership questions | Must Have |
| RSN-05 | RAPTOR tree retrieval: hierarchical summaries from node level through domain level | Must Have |
| RSN-06 | Hybrid RRF fusion: combine vector, graph, keyword candidates; rerank via bge-reranker | Must Have |
| RSN-07 | Temporal graph query: "what was the state of the graph on date X?" via timestamped PostgreSQL snapshots | Must Have |
| RSN-08 | Institutional memory retrieval: "why was this decision made?" returns DecisionNode + FailurePattern + ExceptionNode with provenance | Must Have |
| RSN-09 | Intent mismatch detection: embed PR code changes and linked project item; compute cosine similarity; flag if below 0.6 | Must Have |
| RSN-10 | Service ownership query: "who owns this?" via OWNS edges + CODEOWNERS file parse | Must Have |
| RSN-11 | Confidence scoring on every LLM response: show evidence nodes and source density that support the answer | Must Have |
| RSN-12 | REST API + WebSocket streaming for all search and reasoning endpoints | Must Have |
| RSN-13 | Cache frequent subgraph queries in Redis; invalidate on graph update events affecting the cached subgraph | Must Have |
| RSN-14 | Global GraphRAG: map-reduce over Leiden community summaries for corpus-wide strategic questions | Must Have |
| RSN-15 | Community summary rebuild: Leiden re-run scoped to affected communities only on each graph update | Must Have |
Infrastructure Dependencies¶
| Component | Role in Reasoning & Search Service |
|---|---|
| Neo4j 5.x | Graph traversal for Local GraphRAG, entity-seeded BFS, Cypher execution |
| PostgreSQL 16 + pgvector | HNSW vector index for embedding retrieval; timestamped snapshots for temporal queries; community summary store |
| Redis 7 | Query result cache with subgraph-aware invalidation |
| NATS JetStream | Receives graph update events to trigger cache invalidation and community summary rebuilds |
| DGX Spark port 8000 | Llama 4 Scout (MoE): RAPTOR synthesis, Global GraphRAG map-reduce |
| DGX Spark port 8001 | Dense 70B + cypher-lora: NL→Cypher; explain-lora: answer generation; HyDE document generation |
| DGX Spark port 8003 | bge-m3: query embedding, HyDE document embedding, intent mismatch embeddings |
| DGX Spark port 8004 | bge-reranker-v2-m3: RRF fusion reranking, memory retrieval reranking |