Reasoning & Search Service¶
The Reasoning & Search Service is the primary interface for natural language interaction with the Substrate knowledge graph. It answers any question about the architecture — structural, semantic, historical, or institutional — by retrieving evidence from the Unified Multimodal Knowledge Base (UMKB) through a layered pipeline called Hardened GraphRAG.
Responsibility¶
Translate natural language queries into precise graph traversals and vector searches, synthesise multi-source evidence into grounded answers with confidence scores, and expose the full reasoning capability of the UMKB through a REST and WebSocket API. The Reasoning Service is the query engine that makes the graph legible to humans and to other services.
Why "Hardened" GraphRAG¶
Baseline Microsoft GraphRAG as published has four well-documented production failure modes. Substrate's architecture addresses each one explicitly.
GraphRAG Failure Modes and Mitigations¶
| Failure Mode | Evidence | Substrate Mitigation |
|---|---|---|
| Hallucinated entities baked permanently into graph | AGRAG paper: LLM entity extraction fails to conform to expected formats; fabricated entities have no correction mechanism | Confidence scoring on all extracted entities; verification queue routes low-confidence items to human review before graph write |
| No temporal reasoning | GraphRAG treats a 2019 ADR as equally current as a 2026 one; no delta-analysis across time | Timestamped graph snapshots in PostgreSQL; temporal Cypher queries with AT(timestamp) semantics; staleness weights applied to retrieval scoring |
| 73–84% of errors are reasoning failures not retrieval failures | KET-RAG study: gold answer present in retrieved context but final answer still wrong in majority of cases | Cypher chain-of-thought structured prompting; context compression via BFS graph-walk; MoE Scout for multi-hop reasoning over community summaries |
| Incremental updates require full re-run | GitHub Issue #741 (1,200+ upvotes) | PR-scoped incremental delta: only changed subgraph re-embedded and re-summarised; Leiden re-run on affected community only |
The consequence of these mitigations is that Substrate's graph-backed answers are grounded in verified, timestamped evidence and supported by confidence scores that the caller can inspect. Answers are not generated from a frozen snapshot of the graph at index time.
Layered Retrieval Pipeline¶
Every query passes through a routing layer that selects one or more retrieval strategies based on query type, then combines their outputs via Hybrid RRF Fusion before injecting the final context into the generation model.
Strategy Table¶
| Strategy | What It Does | Best Suited For | Model |
|---|---|---|---|
| HyDE (Hypothetical Document Embeddings) | Generates a hypothetical answer document and embeds it instead of the raw query; bridges the gap between terse queries and verbose graph content | Short or vague NL queries where raw query similarity is below retrieval threshold | Dense 70B |
| RAPTOR Tree Retrieval | Recursive abstract summaries at multiple levels: nodes → module clusters → domain summaries → system overview; +20pp accuracy over flat community summaries | Multi-hop questions: "what are our top architectural risks?" | MoE Scout (Llama 4) |
| Local GraphRAG (entity traversal) | Seeds on query-matched entities; traverses CALLS/DEPENDS_ON edges 1–3 hops; prunes by edge weight | Dependency questions: "what does PaymentService depend on?" | Dense cypher-lora |
| Global GraphRAG (community map-reduce) | Map-reduce over pre-computed Leiden community summaries; reduces to answer with confidence scoring | Strategic questions: "how is the data platform structured?" | MoE Scout (Llama 4) |
| Hybrid RRF Fusion | RRF score = Σ(1/(k + rank_i)) with k=60 across vector, graph, and keyword candidates; +8% factual correctness over any single strategy | All queries — final ranking step before LLM context injection | bge-reranker-v2-m3 |
HyDE (RSN-03)¶
When the raw query's cosine similarity against the pgvector HNSW index falls below a configured threshold (default 0.55), HyDE activates. The Dense 70B model generates a hypothetical answer document — a paragraph that describes what the answer to the query might look like. This hypothetical document is then embedded via bge-m3 and used as the vector search seed instead of the raw query. The resulting retrieved candidates are typically richer and more relevant because they share the vocabulary and structure of the actual graph content.
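The gating logic above can be sketched in a few lines. This is a minimal illustration, not the service's actual code: `generate_hypothetical` and `embed` are injected stand-ins for the Dense 70B and bge-m3 calls, and the toy lambdas exist only to show the control flow.

```python
HYDE_SIMILARITY_THRESHOLD = 0.55  # default activation threshold from the text

def retrieval_seed(query, best_raw_similarity, generate_hypothetical, embed):
    """Pick the vector-search seed: the raw query, or a HyDE document.

    Only the gating decision is real here; generate_hypothetical and
    embed stand in for the model calls described above.
    """
    if best_raw_similarity >= HYDE_SIMILARITY_THRESHOLD:
        return embed(query)
    # Raw query is too terse for the index: embed a hypothetical answer instead.
    return embed(generate_hypothetical(query))

# Toy stand-ins to show the control flow.
embed = lambda text: f"vec({text})"
generate = lambda q: f"a plausible answer paragraph for: {q}"

seed = retrieval_seed("who owns checkout?", 0.31, generate, embed)
```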
RAPTOR Tree Retrieval (RSN-05)¶
RAPTOR builds a hierarchical summary tree over the graph content offline (rebuilt nightly and on incremental delta events). The tree has four levels:
- Node level: Individual function, service, and infrastructure node summaries
- Module cluster level: Abstract summaries of tightly connected subgraphs within a service
- Domain summary level: Cross-service summaries for each architectural domain (data, auth, payments, etc.)
- System overview level: A single top-level summary of the entire system's structure and key tensions
At query time, RAPTOR walks this tree top-down, retrieving the most relevant branch at each level and expanding only into branches with high relevance scores. This delivers +20 percentage points in accuracy over flat community summaries for multi-hop questions because it reasons at the correct level of abstraction rather than drowning the model in irrelevant leaf-level detail.
The MoE Scout (Llama 4 Scout on port 8000) is used for RAPTOR synthesis because its mixture-of-experts architecture handles the multi-hop reasoning required to synthesise across summary levels.
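The top-down walk can be sketched as a beam search over an in-memory tree. The `beam_width` and `threshold` parameters, the toy tree, and the keyword scorer are all hypothetical; in the real pipeline the relevance scores would come from embedding similarity and the synthesis from the MoE Scout.

```python
def raptor_walk(tree, score, beam_width=2, threshold=0.5):
    """Top-down walk of a RAPTOR summary tree (in-memory sketch).

    tree maps node id -> {"summary": str, "children": [ids]};
    score(summary) -> relevance in [0, 1]. At each level only the
    highest-scoring branches above the threshold are expanded, so the
    model reasons at the right abstraction level instead of reading
    every leaf.
    """
    context, frontier = [], ["root"]
    while frontier:
        scored = sorted(((score(tree[n]["summary"]), n) for n in frontier),
                        reverse=True)[:beam_width]
        kept = [n for s, n in scored if s >= threshold]
        context.extend(tree[n]["summary"] for n in kept)
        frontier = [child for n in kept for child in tree[n]["children"]]
    return context

# Hypothetical three-node tree: a system overview with two domain branches.
tree = {
    "root": {"summary": "system overview: payments and auth domains",
             "children": ["payments", "auth"]},
    "payments": {"summary": "payments domain summary", "children": []},
    "auth": {"summary": "auth domain summary", "children": []},
}
relevance = lambda s: 1.0 if "payments" in s else 0.1
context = raptor_walk(tree, relevance)
```

For a payments question, only the payments branch is expanded; the auth branch is pruned at the domain level.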
Local GraphRAG — Entity Traversal (RSN-04)¶
For dependency and ownership questions, the retrieval pipeline seeds on the entities matched by the NL→Cypher step and performs a BFS traversal of the Observed Graph outward along CALLS, DEPENDS_ON, IMPORTS, OWNS, and HOSTS edges. The traversal depth is 1–3 hops; edges are pruned by weight (low-weight edges representing transitive or inferred relationships are excluded unless the query explicitly asks about them).
The resulting subgraph is serialised into a structured context block and injected directly into the generation prompt, giving the model precise, graph-grounded evidence rather than retrieved prose.
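As an illustration of the traversal described above, here is a weight-pruned BFS over an in-memory adjacency list. The `min_weight` knob, the adjacency format, and the toy graph are assumptions for the sketch; the real traversal runs inside Neo4j.

```python
from collections import deque

TRAVERSAL_EDGES = {"CALLS", "DEPENDS_ON", "IMPORTS", "OWNS", "HOSTS"}

def traverse(adjacency, seeds, max_hops=3, min_weight=0.5):
    """Weight-pruned BFS over an in-memory view of the Observed Graph.

    adjacency: node -> [(edge_type, weight, neighbour)]. Returns the
    subgraph as (src, edge_type, dst) triples, ready to serialise into
    a structured context block.
    """
    triples, seen = set(), set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth >= max_hops:
            continue
        for edge, weight, nxt in adjacency.get(node, []):
            if edge not in TRAVERSAL_EDGES or weight < min_weight:
                continue  # drop inferred / transitive low-weight edges
            triples.add((node, edge, nxt))
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return triples

graph = {
    "PaymentService": [("DEPENDS_ON", 0.9, "LedgerDB"),
                       ("CALLS", 0.2, "LegacyBatch")],  # pruned by weight
    "LedgerDB": [("HOSTS", 0.8, "eu-west-1")],
}
subgraph = traverse(graph, ["PaymentService"])
```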
Global GraphRAG — Community Map-Reduce (RSN-14)¶
For corpus-wide strategic questions, the pipeline uses pre-computed Leiden community detection summaries stored in PostgreSQL. Each community summary is a paragraph describing the purpose, key services, and dominant relationships within a detected cluster of tightly connected graph nodes.
At query time, the model reads all community summaries (map step), scores their relevance to the query, then reduces the top-ranked summaries into a final answer (reduce step). The MoE Scout handles both the map scoring and the final reduce synthesis.
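The map and reduce steps reduce to a short sketch. `score` and `synthesize` stand in for the two MoE Scout calls; the summaries and scorer below are toy values.

```python
def global_graphrag(summaries, score, synthesize, top_k=3):
    """Map-reduce over Leiden community summaries (sketch).

    Map: score every community summary's relevance to the query.
    Reduce: synthesise the top-ranked summaries into one answer.
    """
    ranked = sorted(summaries, key=score, reverse=True)  # map + rank
    return synthesize(ranked[:top_k])                    # reduce

summaries = ["data platform community", "auth community", "billing community"]
score = lambda s: 1.0 if "data" in s else 0.0
answer = global_graphrag(summaries, score, lambda top: " | ".join(top), top_k=1)
```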
Community summaries are rebuilt incrementally: when an ingestion event changes a subgraph, only the Leiden communities that contain changed nodes are re-run and their summaries regenerated (RSN-15). This avoids a full corpus rebuild on every PR.
Hybrid RRF Fusion (RSN-06)¶
After all retrieval strategies have been invoked in parallel, their ranked candidate lists are merged using Reciprocal Rank Fusion:
RRF_score(d) = Σ_i 1 / (k + rank_i(d))
where k = 60 and rank_i(d) is the rank of document d in strategy i's result list. Candidates from vector search, graph traversal, and keyword (BM25) search are all eligible inputs.
The merged and ranked list is then passed to bge-reranker-v2-m3 (DGX Spark port 8004) for a final cross-encoder reranking pass before the top-k candidates are assembled into the LLM context. This two-stage approach — RRF merge followed by neural reranking — delivers +8% factual correctness over any single retrieval strategy.
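The RRF merge itself is a few lines of code. The candidate ids below are hypothetical; in the real pipeline the merged list would then go to bge-reranker-v2-m3 for the cross-encoder pass.

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Merge best-first candidate lists via Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)  # RRF_score(d) = Σ_i 1/(k + rank_i(d))
    return sorted(scores, key=scores.get, reverse=True)

vector_hits  = ["ADR-047", "PaymentService", "AuthService"]
graph_hits   = ["PaymentService", "LedgerDB"]
keyword_hits = ["PaymentService", "ADR-047"]
fused = rrf_fuse([vector_hits, graph_hits, keyword_hits])
```

`PaymentService` ranks first because it appears at rank 1 in two of the three lists, even though the vector list preferred `ADR-047`.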
Natural Language to Cypher (RSN-01, RSN-02)¶
The NL→Cypher pipeline translates user questions into valid Neo4j Cypher queries using the Dense cypher-lora adapter (Dense 70B + LoRA fine-tuned on Cypher generation from the Substrate schema). Target latency is under 1 second end-to-end including execution.
The translation is schema-aware: the cypher-lora adapter is trained on the Substrate node and edge taxonomy and produces queries that reference actual node labels and relationship types in the graph. Generated Cypher is executed against Neo4j in read-only mode; write operations are never exposed through the Reasoning Service API.
If the generated Cypher is syntactically invalid, the service falls back to the vector retrieval path and annotates the response with a low-confidence flag.
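The fallback path can be sketched as follows. The write-clause regex is a cheap defence-in-depth check of this sketch's own devising (real enforcement is Neo4j's read-only mode), and `run_cypher` / `vector_search` are injected stand-ins for the actual execution paths.

```python
import re

WRITE_CLAUSE = re.compile(r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP)\b",
                          re.IGNORECASE)

def is_read_only(cypher):
    """Cheap pre-flight check; real enforcement is Neo4j read-only mode."""
    return WRITE_CLAUSE.search(cypher) is None

def answer_with_fallback(cypher, run_cypher, vector_search):
    """Execute generated Cypher; on any failure, fall back to vector
    retrieval and mark the response low-confidence."""
    if cypher and is_read_only(cypher):
        try:
            return {"result": run_cypher(cypher), "low_confidence": False}
        except Exception:
            pass  # syntactically invalid or failed at execution time
    return {"result": vector_search(), "low_confidence": True}

ok = answer_with_fallback("MATCH (s:ServiceNode) RETURN s",
                          lambda c: "subgraph", lambda: "chunks")
```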
Temporal Graph Queries (RSN-07)¶
The Reasoning Service supports "point-in-time" queries: "What was the state of the graph on March 10th?" Substrate maintains timestamped graph snapshots in PostgreSQL. Each snapshot records the full set of node and edge states at a given point in time. Temporal Cypher queries use AT(timestamp) semantics to restrict traversal to the snapshot valid at the requested date.
This capability is critical for post-incident analysis: "What changed in the three days before last Friday's outage?" triggers a temporal diff between two snapshots, returning the delta as a structured list of added, removed, and changed nodes and edges.
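The temporal diff reduces to set operations over two snapshots. Each snapshot is simplified here to a dict of element id to state hash; the real snapshots are PostgreSQL rows of node and edge states, and the ids and hashes below are toy values.

```python
def snapshot_diff(before, after):
    """Structured delta between two timestamped graph snapshots."""
    before_ids, after_ids = set(before), set(after)
    return {
        "added":   sorted(after_ids - before_ids),
        "removed": sorted(before_ids - after_ids),
        "changed": sorted(e for e in before_ids & after_ids
                          if before[e] != after[e]),
    }

monday = {"PaymentService": "h1", "AuthService": "h2", "CronJob": "h9"}
friday = {"PaymentService": "h1", "AuthService": "h5", "RateLimiter": "h7"}
delta = snapshot_diff(monday, friday)
```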
Institutional Memory Retrieval (RSN-08)¶
The question "Why was this decision made?" is answered by the institutional memory retrieval pipeline. The pipeline queries:
- `DecisionNode` entries with `WHY` edges to the service in question
- `FailurePattern` nodes with `CAUSED_BY` or `AFFECTED` edges touching the same service
- `ExceptionNode` entries on policies that have been excepted for this service
- `MemoryNode` entries extracted from PR review comments, Slack threads, and post-mortems
All retrieved memory nodes are ranked by recency and relevance (using bge-reranker-v2-m3) before being assembled into the response. Each item carries a provenance link: the source document, author, and date.
Intent Mismatch Detection (RSN-09)¶
On each PR open event, the Reasoning Service computes the cosine similarity between:
- The bge-m3 embedding of the PR's code changes (summarised by the Ingestion Service delta)
- The bge-m3 embedding of the linked GitHub Projects v2 item description
If the similarity falls below 0.6, an intent mismatch is flagged. The flag is included in the Governance Service's PR comment as a soft advisory: "The code changes in this PR appear semantically distant from the linked project item. Confirm that this PR addresses the correct work item."
This catches common drift scenarios: a developer who branches off the wrong ticket, a PR that grew beyond its original scope, or a project item whose description was updated without updating the linked PR.
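The check itself is a single cosine similarity against the 0.6 threshold. The 3-d vectors below are toy stand-ins for bge-m3 embeddings, which are much higher-dimensional.

```python
import math

INTENT_MISMATCH_THRESHOLD = 0.6  # from RSN-09

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def intent_mismatch(code_vec, item_vec, threshold=INTENT_MISMATCH_THRESHOLD):
    """Flag a PR whose code-change embedding drifts from the linked item."""
    sim = cosine_similarity(code_vec, item_vec)
    return {"similarity": round(sim, 3), "mismatch": sim < threshold}

# Toy vectors: one aligned pair, one orthogonal (drifted) pair.
aligned = intent_mismatch([1.0, 0.2, 0.1], [0.9, 0.3, 0.2])
drifted = intent_mismatch([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```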
Service Ownership Queries (RSN-10)¶
"Who owns this service?" is answered by combining two sources:
- `OWNS` edges in the Observed Graph, connecting service nodes to `Developer` and `Team` nodes
- The parsed CODEOWNERS file, ingested by the GitHub connector as a set of path-to-owner mappings
The Reasoning Service merges these two sources and returns the owning team or developer(s) with their contact handles. If the two sources disagree, both are returned with a note indicating the discrepancy.
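The merge-and-flag behaviour can be sketched as a set union with a disagreement check; the team handles below are hypothetical.

```python
def resolve_ownership(owns_edge_owners, codeowners_owners):
    """Merge OWNS-edge owners with CODEOWNERS owners; flag disagreement."""
    graph_set, file_set = set(owns_edge_owners), set(codeowners_owners)
    return {
        "owners": sorted(graph_set | file_set),
        "discrepancy": graph_set != file_set,  # both sources returned, with a note
    }

agree = resolve_ownership(["@payments-team"], ["@payments-team"])
conflict = resolve_ownership(["@payments-team"], ["@platform-team"])
```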
Confidence Scoring (RSN-11)¶
Every response from the Reasoning Service carries a structured confidence block:
```json
{
  "confidence": 0.84,
  "evidence_nodes": ["ServiceNode:PaymentService", "DecisionNode:ADR-047"],
  "source_density": 3,
  "retrieval_strategies_used": ["local_graphrag", "hyDE", "rrf_fusion"],
  "low_confidence_warning": false
}
```
`source_density` is the count of independent source documents (graph nodes from different connectors) that support the answer. A response with `source_density >= 3` is considered well-supported. Responses with `source_density = 1` carry a visible low-confidence warning.
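One way to assemble this block, counting distinct connectors as a simplifying proxy for independent source documents (the evidence pairs and connector names are hypothetical):

```python
def confidence_block(evidence, strategies, score):
    """Assemble the structured confidence block.

    evidence: (node_id, connector) pairs supporting the answer;
    source_density counts distinct connectors.
    """
    density = len({connector for _, connector in evidence})
    return {
        "confidence": round(score, 2),
        "evidence_nodes": [node for node, _ in evidence],
        "source_density": density,
        "retrieval_strategies_used": strategies,
        "low_confidence_warning": density <= 1,  # single-source answers warn
    }

block = confidence_block(
    [("ServiceNode:PaymentService", "github"),
     ("DecisionNode:ADR-047", "confluence"),
     ("MemoryNode:PM-12", "slack")],
    ["local_graphrag", "rrf_fusion"], 0.84)
```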
Caching (RSN-13)¶
Frequent subgraph queries are cached in Redis with a TTL of 15 minutes. The cache key is a hash of the Cypher query or the structured retrieval parameters. On each graph update event published to NATS, the Reasoning Service evaluates which cached queries involve nodes in the updated subgraph and invalidates their cache entries. This ensures that cached answers do not serve stale data after an ingestion event modifies the graph.
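The keying and invalidation logic can be sketched without Redis. The key prefix and the side index mapping keys to touched node ids are assumptions of this sketch; in the real service that dependency index would live alongside the Redis entries.

```python
import hashlib
import json

CACHE_TTL_SECONDS = 900  # 15-minute TTL from the text

def cache_key(cypher=None, retrieval_params=None):
    """Deterministic cache key from the Cypher text or structured params."""
    payload = cypher or json.dumps(retrieval_params, sort_keys=True)
    return "rsn:cache:" + hashlib.sha256(payload.encode()).hexdigest()[:16]

def keys_to_invalidate(dependency_index, updated_nodes):
    """dependency_index: key -> node ids the cached answer touched.
    Return the keys whose node sets intersect the updated subgraph."""
    updated = set(updated_nodes)
    return [k for k, nodes in dependency_index.items() if nodes & updated]

index = {
    cache_key(cypher="MATCH (p:PaymentService) RETURN p"): {"PaymentService", "LedgerDB"},
    cache_key(cypher="MATCH (a:AuthService) RETURN a"): {"AuthService"},
}
stale = keys_to_invalidate(index, ["LedgerDB"])  # a NATS update touched LedgerDB
```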
API¶
The Reasoning Service exposes:
- `POST /search` — NL query; returns ranked answer with confidence and evidence nodes
- `POST /cypher` — Direct Cypher query; returns typed subgraph result
- `GET /ownership/{service_id}` — Returns ownership chain for a service
- `GET /dependencies/{service_id}` — Returns dependency tree for a service
- `GET /memory/{service_id}` — Returns institutional memory for a service
- `POST /intent-check` — PR number input; returns intent mismatch score
- `GET /temporal` — Point-in-time graph state query
- `WS /stream` — WebSocket connection for streaming reasoning responses
All endpoints return JSON. The WebSocket endpoint streams tokens for long-running RAPTOR and Global GraphRAG responses (RSN-12).
Use Cases with Latency Targets¶
| ID | Query | Strategy | Target Latency |
|---|---|---|---|
| RSN-UC-01 | "What does PaymentService depend on?" | Local GraphRAG | < 1s |
| RSN-UC-02 | "Who owns the checkout service?" | OWNS edge traversal + CODEOWNERS | < 500ms |
| RSN-UC-03 | PR intent mismatch: code vs project item | Hybrid embedding similarity | < 1s |
| RSN-UC-04 | "What are our top 3 architectural risks?" | Global GraphRAG (RAPTOR + community) | < 8s |
| RSN-UC-05 | "Why was this architectural decision made?" | Memory retrieval: DecisionNode + FailurePattern | < 5s |
| RSN-UC-06 | "What changed before last Friday's incident?" | Temporal graph snapshot diff + HyDE | < 5s |
| RSN-UC-07 | "Find all services calling auth directly" | Cypher translation + graph traversal | < 1s |
| RSN-UC-08 | "Which services does this epic affect?" | Project item → IntentAssertion → service graph | < 5s |
Functional Requirements¶
| ID | Requirement | Priority |
|---|---|---|
| RSN-01 | Natural language → Cypher query translation via Dense cypher-lora; latency < 1 second | Must Have |
| RSN-02 | Execute Cypher against Neo4j (read-only); return typed subgraph result | Must Have |
| RSN-03 | HyDE query expansion: generate hypothetical answer, embed it, use for vector retrieval when raw query similarity below threshold | Must Have |
| RSN-04 | Local GraphRAG: entity traversal for dependency and ownership questions | Must Have |
| RSN-05 | RAPTOR tree retrieval: hierarchical summaries from node level through domain level | Must Have |
| RSN-06 | Hybrid RRF fusion: combine vector, graph, keyword candidates; rerank via bge-reranker | Must Have |
| RSN-07 | Temporal graph query: "what was the state of the graph on date X?" via timestamped PostgreSQL snapshots | Must Have |
| RSN-08 | Institutional memory retrieval: "why was this decision made?" returns DecisionNode + FailurePattern + ExceptionNode with provenance | Must Have |
| RSN-09 | Intent mismatch detection: embed PR code changes and linked project item; compute cosine similarity; flag if below 0.6 | Must Have |
| RSN-10 | Service ownership query: "who owns this?" via OWNS edges + CODEOWNERS file parse | Must Have |
| RSN-11 | Confidence scoring on every LLM response: show evidence nodes and source density that support the answer | Must Have |
| RSN-12 | REST API + WebSocket streaming for all search and reasoning endpoints | Must Have |
| RSN-13 | Cache frequent subgraph queries in Redis; invalidate on graph update events affecting the cached subgraph | Must Have |
| RSN-14 | Global GraphRAG: map-reduce over Leiden community summaries for corpus-wide strategic questions | Must Have |
| RSN-15 | Community summary rebuild: Leiden re-run scoped to affected communities only on each graph update | Must Have |
Infrastructure Dependencies¶
| Component | Role in Reasoning & Search Service |
|---|---|
| Neo4j 5.x | Graph traversal for Local GraphRAG, entity-seeded BFS, Cypher execution |
| PostgreSQL 16 + pgvector | HNSW vector index for embedding retrieval; timestamped snapshots for temporal queries; community summary store |
| Redis 7 | Query result cache with subgraph-aware invalidation |
| NATS JetStream | Receives graph update events to trigger cache invalidation and community summary rebuilds |
| DGX Spark port 8000 | Llama 4 Scout (MoE): RAPTOR synthesis, Global GraphRAG map-reduce |
| DGX Spark port 8001 | Dense 70B + cypher-lora: NL→Cypher; explain-lora: answer generation; HyDE document generation |
| DGX Spark port 8003 | bge-m3: query embedding, HyDE document embedding, intent mismatch embeddings |
| DGX Spark port 8004 | bge-reranker-v2-m3: RRF fusion reranking, memory retrieval reranking |