Simulation Service¶
The Simulation Service provides a sandboxed what-if environment that lets developers, architects, and SREs explore the architectural consequences of proposed changes before writing a single line of code. It is purely advisory and never modifies the production knowledge base.
Responsibility¶
The service accepts a proposed mutation, expressed either as a structured JSON specification or as natural language, and clones the current Observed and Intended graphs into an ephemeral Neo4j sandbox. It then applies the mutation atomically, runs the full active OPA policy suite against the sandbox, computes blast radius and PageRank deltas, and returns a structured before/after comparison with a plain English summary and relevant institutional memory context.
Why Simulation Is Its Own Service¶
The Simulation Service is architecturally separate from the Governance Service because the two differ in what they operate on, when they run, and whom they serve, not merely in how the code is organised:
| Dimension | Governance Service | Simulation Service |
|---|---|---|
| Graph it operates on | Real, live production graph | Hypothetical, ephemeral sandbox graph |
| Timing | Blocking and synchronous — must complete before CI gate | Advisory and asynchronous — developer can proceed while simulation runs |
| Effect on knowledge base | Acts on the production UMKB | Never touches the production knowledge base |
| Trigger | Fires on actual changes that exist in a PR | Fires on proposed/hypothetical changes not yet written |
| Audience | CI pipeline and compliance record | Developer, architect, or PM exploring options |
The Governance Service answers: "Does what you wrote comply with our policies?" The Simulation Service answers: "If you were to make this change, what would break, what would improve, and what should you know before you start?"
Mutation Input Formats¶
The Simulation Service accepts two input formats for describing a proposed change:
Structured JSON Specification (SIM-01)¶
A mutation spec is a JSON document describing one or more graph operations:
{
  "mutations": [
    {
      "op": "add_node",
      "type": "ServiceNode",
      "properties": { "name": "RecommendationService", "language": "python" }
    },
    {
      "op": "add_edge",
      "type": "CALLS",
      "from": "ServiceNode:OrderService",
      "to": "ServiceNode:RecommendationService"
    },
    {
      "op": "remove_edge",
      "type": "DEPENDS_ON",
      "from": "ServiceNode:OrderService",
      "to": "LibraryNode:legacy-xml-parser"
    }
  ]
}
Supported operations: add_node, remove_node, add_edge, remove_edge, update_node_properties, split_node, merge_nodes.
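A spec like the one above can be checked before any sandbox work starts. The following is a minimal sketch, not the service's actual validator: the operation names come from the list above, while `validate_spec` and the required-field sets are assumptions inferred from the example spec only.

```python
from typing import Any

# Operation names are taken from the list above.
SUPPORTED_OPS = {
    "add_node", "remove_node", "add_edge", "remove_edge",
    "update_node_properties", "split_node", "merge_nodes",
}

# Assumption: required fields are inferred from the example spec only;
# operations not shown there are checked by name alone.
REQUIRED_FIELDS = {
    "add_node": {"type", "properties"},
    "add_edge": {"type", "from", "to"},
    "remove_edge": {"type", "from", "to"},
}


def validate_spec(spec: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the spec looks valid."""
    mutations = spec.get("mutations")
    if not isinstance(mutations, list) or not mutations:
        return ["spec must contain a non-empty 'mutations' list"]
    problems: list[str] = []
    for index, mutation in enumerate(mutations):
        if not isinstance(mutation, dict):
            problems.append(f"mutation {index}: expected an object")
            continue
        op = mutation.get("op")
        if op not in SUPPORTED_OPS:
            problems.append(f"mutation {index}: unsupported op {op!r}")
            continue
        missing = REQUIRED_FIELDS.get(op, set()) - set(mutation)
        if missing:
            problems.append(f"mutation {index}: {op} is missing {sorted(missing)}")
    return problems
```

Collecting every problem instead of stopping at the first one lets the caller fix a spec in a single round trip.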
Natural Language Description (SIM-02)¶
A developer can describe the change in plain English: "What happens if I split OrderService into separate OrderCreation and OrderFulfillment services?"
The MoE Scout (Llama 4 Scout, port 8000) translates the natural language description into a structured mutation spec. Before the simulation runs, the generated spec is returned to the requester for confirmation:
Interpreted mutation:
1. Add node: ServiceNode "OrderCreationService"
2. Add node: ServiceNode "OrderFulfillmentService"
3. Move edges: CALLS edges from OrderService to OrderCreationService (3 found)
4. Move edges: CALLS edges from OrderService to OrderFulfillmentService (2 found)
5. Remove node: ServiceNode "OrderService"
Confirm? [Yes / Edit spec / Cancel]
This human confirmation step prevents the simulation from running against a misinterpreted mutation. No sandbox is created until the spec is confirmed.
Simulation Execution Pipeline¶
Step 1: Sandbox Creation (SIM-03)¶
An ephemeral Neo4j database is created using:
CREATE DATABASE sim_<uuid> IF NOT EXISTS
The Observed Graph and Intended Graph are cloned into this named database using Neo4j's database copy mechanism. The clone is fully isolated — no read or write operations on the sandbox can affect the production neo4j database.
Step 2: Mutation Application (SIM-04)¶
The mutation spec is applied to the sandbox graph as a single atomic transaction. If any operation in the spec fails (e.g., a remove_node on a node that does not exist), the entire mutation is rolled back and the error is returned to the caller with a diagnostic message.
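The all-or-nothing behaviour can be illustrated with an in-memory analogue. This sketch is not the Neo4j transaction itself: the dictionary graph shape, `apply_atomically`, `MutationError`, and the `id` field on `remove_node` are all hypothetical, and only four of the supported operations are handled.

```python
import copy


class MutationError(Exception):
    """Raised when one operation in a spec cannot be applied."""


def apply_atomically(graph: dict, mutations: list[dict]) -> dict:
    """Apply every mutation to a working copy; the input graph is never modified.

    Assumed graph shape: {"nodes": {node_id: properties},
                          "edges": {(edge_type, from_id, to_id), ...}}
    """
    working = copy.deepcopy(graph)
    for index, m in enumerate(mutations):
        op = m["op"]
        if op == "add_node":
            node_id = f'{m["type"]}:{m["properties"]["name"]}'
            working["nodes"][node_id] = dict(m["properties"])
        elif op == "remove_node":
            # Assumption: remove_node identifies its target with an "id" field.
            if m["id"] not in working["nodes"]:
                raise MutationError(f"mutation {index}: node {m['id']} does not exist")
            del working["nodes"][m["id"]]
            working["edges"] = {e for e in working["edges"] if m["id"] not in (e[1], e[2])}
        elif op == "add_edge":
            for endpoint in (m["from"], m["to"]):
                if endpoint not in working["nodes"]:
                    raise MutationError(f"mutation {index}: node {endpoint} does not exist")
            working["edges"].add((m["type"], m["from"], m["to"]))
        elif op == "remove_edge":
            edge = (m["type"], m["from"], m["to"])
            if edge not in working["edges"]:
                raise MutationError(f"mutation {index}: edge {edge} does not exist")
            working["edges"].remove(edge)
        else:
            raise MutationError(f"mutation {index}: op {op!r} is not handled in this sketch")
    # Reached only if every operation succeeded, which mirrors a commit.
    return working
```

Because all changes land on a copy, raising midway leaves the caller's graph exactly as it was, which plays the role of the rollback described above.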
Step 3: Policy Evaluation (SIM-05)¶
The full active OPA policy suite is run against the sandbox graph using the same evaluation pathway as the Governance Service, but against the sandbox Neo4j database instead of production. This produces:
- A list of policies newly violated by the mutation (were passing before, now failing)
- A list of policies newly satisfied by the mutation (were failing before, now passing)
- A list of policies unchanged — still failing or still passing after the mutation
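The three lists above amount to a comparison of two pass/fail maps. A minimal sketch, assuming policy results are available as `policy_id -> passed` dictionaries (the function name and the `unchanged_passing` key are illustrative, not part of the documented output):

```python
def policy_delta(before: dict[str, bool], after: dict[str, bool]) -> dict[str, list[str]]:
    """Classify each policy by how its pass/fail state changed.

    `before` and `after` map policy id -> True if the policy passed.
    Only policies present in both evaluations are compared.
    """
    shared = sorted(before.keys() & after.keys())
    return {
        "policies_newly_violated": [p for p in shared if before[p] and not after[p]],
        "policies_newly_satisfied": [p for p in shared if not before[p] and after[p]],
        "unchanged_violations": [p for p in shared if not before[p] and not after[p]],
        # Assumption: this key is added here for completeness only.
        "unchanged_passing": [p for p in shared if before[p] and after[p]],
    }
```

For example, a policy that passed before the mutation and fails after it lands in `policies_newly_violated`, while one that fails in both runs lands in `unchanged_violations`.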
Step 4: Blast Radius Delta Computation (SIM-06)¶
The blast radius is computed twice — once against the pre-mutation sandbox state and once against the post-mutation state — using the same Neo4j reachability traversal and PageRank weighting as the Governance Service.
The delta is expressed as:
- Before: N nodes in blast radius, M critical nodes
- After: N' nodes in blast radius, M' critical nodes
- Delta: +/- X nodes affected, +/- Y critical nodes
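The before/after comparison can be sketched with a plain breadth-first traversal. This is an illustration under simplifying assumptions: the graph is an adjacency dictionary whose edges point from a node to the nodes affected when it changes, and criticality is a precomputed set rather than a PageRank weighting. The function names are hypothetical.

```python
from collections import deque


def blast_radius(adjacency: dict[str, list[str]], start: str, critical: set[str]) -> dict[str, int]:
    """Count nodes reachable from `start`, and how many of them are critical."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)  # the changed node itself is not part of its own radius
    return {"total_nodes": len(seen), "critical_nodes": len(seen & critical)}


def blast_radius_delta(before_adj, after_adj, start, critical):
    """Run the same traversal on both graph states and report the difference."""
    before = blast_radius(before_adj, start, critical)
    after = blast_radius(after_adj, start, critical)
    return {
        "before": before,
        "after": after,
        "delta_nodes": after["total_nodes"] - before["total_nodes"],
        "delta_critical": after["critical_nodes"] - before["critical_nodes"],
    }
```

A negative `delta_nodes` means the mutation shrinks the set of nodes a change can reach.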
Step 5: PageRank Impact (SIM-07)¶
The GDS PageRank algorithm is re-run on the sandbox post-mutation to detect any shifts in the criticality rankings of core services. A service that gains many new inbound CALLS edges may become a new architectural bottleneck. A split service may redistribute criticality across two nodes. These changes are included in the structured delta.
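To make the criticality shift concrete, the sketch below runs a textbook PageRank by power iteration on two edge lists and reports the per-node change. It is a stand-in for illustration only: it does not call Neo4j GDS, its scores are normalised to sum to 1 and will not match GDS output numerically, and `pagerank` and `pagerank_impact` are hypothetical names.

```python
def pagerank(edges: list[tuple[str, str]], damping: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """Textbook PageRank by power iteration over (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links: dict[str, list[str]] = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        new_rank = {n: (1 - damping + damping * dangling) / len(nodes) for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


def pagerank_impact(before_edges, after_edges):
    """Per-node score change for nodes present in both graph states."""
    before, after = pagerank(before_edges), pagerank(after_edges)
    return [
        {"node": n, "before": round(before[n], 4), "after": round(after[n], 4),
         "change": round(after[n] - before[n], 4)}
        for n in sorted(before.keys() & after.keys())
    ]
```

Adding an inbound edge to a node raises its score at the expense of the nodes that previously received that share, which is the bottleneck signal described above.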
Step 6: Institutional Memory Retrieval (SIM-09)¶
The Reasoning Service is queried for ADRs and post-mortems relevant to the mutation context: the services being modified, added, or removed. Relevant memory is returned in the simulation output as "context you should know before making this change."
Step 7: Result Return and Sandbox Destruction (SIM-07, SIM-10)¶
The structured result is returned to the caller. Immediately after the result is returned, the ephemeral sandbox database is dropped:
DROP DATABASE sim_<uuid>
In the normal flow, no sandbox persists beyond the response. As a safety net, sandbox lifetime is capped at 10 minutes: a scheduled cleanup job drops any sandbox that the normal flow failed to clean up.
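The create/drop pairing and the 10-minute bound could be expressed as follows. This is a sketch under stated assumptions: `run_cypher` is a hypothetical callable that executes one administrative Cypher statement, `ephemeral_sandbox` and `is_stale` are illustrative names, and the `sim_<uuid>` naming is copied from the statements above and should be checked against Neo4j's database naming rules before use.

```python
import uuid
from contextlib import contextmanager
from datetime import datetime, timedelta
from typing import Callable, Iterator

MAX_SANDBOX_AGE = timedelta(minutes=10)  # lifetime bound stated above


@contextmanager
def ephemeral_sandbox(run_cypher: Callable[[str], object]) -> Iterator[str]:
    """Create a sandbox database and guarantee it is dropped afterwards."""
    name = f"sim_{uuid.uuid4().hex}"
    run_cypher(f"CREATE DATABASE {name} IF NOT EXISTS")
    try:
        yield name
    finally:
        # Runs on success and on error, so the normal flow always cleans up.
        run_cypher(f"DROP DATABASE {name}")


def is_stale(created_at: datetime, now: datetime) -> bool:
    """True if a sandbox has outlived the bound and the cleanup job should drop it."""
    return now - created_at > MAX_SANDBOX_AGE
```

The `finally` clause covers failures inside the request, while `is_stale` covers the case where the process itself died before the drop could run.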
Simulation Output¶
Structured Delta (SIM-07)¶
{
  "simulation_id": "sim_abc123",
  "mutation_summary": "Split OrderService into OrderCreationService and OrderFulfillmentService",
  "policies_newly_violated": [
    {
      "policy_id": "substrate/service-ownership",
      "severity": "soft-mandatory",
      "message": "OrderCreationService has no OWNS edge; assign ownership before deploying"
    }
  ],
  "policies_newly_satisfied": [
    {
      "policy_id": "substrate/solid-principles",
      "message": "SRP: OrderCreationService efferent coupling = 3 (was 7 in OrderService)"
    }
  ],
  "unchanged_violations": [],
  "blast_radius_delta": {
    "before": { "total_nodes": 14, "critical_nodes": 2 },
    "after": { "total_nodes": 11, "critical_nodes": 1 },
    "delta_nodes": -3,
    "delta_critical": -1
  },
  "pagerank_impact": [
    { "node": "AuthService", "before": 0.31, "after": 0.28, "change": -0.03 }
  ],
  "memory_context": [
    {
      "type": "DecisionNode",
      "id": "ADR-031",
      "title": "Order domain split deferred in 2023",
      "summary": "Split was deferred due to shared database; verify DB ownership is separated before proceeding"
    }
  ]
}
Plain English Summary by Role (SIM-08)¶
The Dense explain-lora adapter generates a tailored plain English summary based on the requesting role:
| Output | Metric | Primary role and benefit |
|---|---|---|
| Blast Radius Delta | Number of downstream nodes affected | Architect: identifies high-impact changes early |
| Policy Delta | List of policies newly violated/satisfied | Developer: understands compliance before coding |
| Criticality Impact | Change in PageRank for core services | SRE: identifies new structural bottlenecks |
| Memory Context | Relevant past ADRs/post-mortems | PM: understands history of modified area |
Example developer summary: "Splitting OrderService into two services will resolve the SOLID SRP violation (coupling drops from 7 to 3) and reduce blast radius by 3 nodes. However, you will need to assign ownership for the two new services before deployment. Note: ADR-031 from 2023 flagged shared database as a blocker for this split — verify that is resolved."
Example architect summary: "This split reduces blast radius from 14 to 11 nodes and eliminates the highest coupling violation in the order domain. AuthService's PageRank drops slightly (0.31 → 0.28) which reduces the bottleneck risk. The primary risk is that two new services will need governance bootstrapping (ownership, documentation, test coverage)."
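One way to read the role table is as a mapping from role to the part of the structured delta that the summary should lead with. The sketch below encodes that reading; the `ROLE_FOCUS` mapping and `focus_for_role` are assumptions for illustration, and how the explain-lora adapter actually receives its input is not specified here.

```python
# Assumption: each role's summary leads with the output listed for it in the table,
# expressed as keys of the structured delta.
ROLE_FOCUS = {
    "architect": ["blast_radius_delta"],
    "developer": ["policies_newly_violated", "policies_newly_satisfied"],
    "sre": ["pagerank_impact"],
    "pm": ["memory_context"],
}


def focus_for_role(result: dict, role: str) -> dict:
    """Return the slice of a simulation result that a role's summary leads with."""
    keys = ROLE_FOCUS.get(role.lower())
    if keys is None:
        raise ValueError(f"unknown role: {role!r}")
    return {key: result[key] for key in keys if key in result}
```

Calling `focus_for_role(result, "SRE")` on the structured delta would return only its `pagerank_impact` entry.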
Result Persistence (SIM-11)¶
Simulation results are stored in PostgreSQL for 90 days and are queryable via the Simulation Service API. This enables:
- Sprint planning sessions where multiple scenarios are compared side by side
- Audit of what simulations were run before a major architectural change
- Training data for improving the NL→mutation spec translation
Multi-Step Simulation (SIM-12 — v1.1)¶
A multi-step simulation applies mutations sequentially: mutation A, then B, then C, evaluating policy state after each step. This enables migration roadmap planning: "If we first extract the database layer (step 1), then split the service (step 2), then introduce the gateway (step 3), at what point do all policies pass?"
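The roadmap question ("at what point do all policies pass?") reduces to evaluating policies after each step and finding the first step with no failures. A minimal sketch, assuming each step is a function from graph state to new graph state and each policy is a predicate over that state; all names and the policy ids in the usage note are hypothetical.

```python
from typing import Callable, Optional


def run_roadmap(graph, steps: list[tuple[str, Callable]], policies: dict[str, Callable]) -> list[dict]:
    """Apply steps in order and record which policies fail after each one."""
    timeline = []
    current = graph
    for label, step in steps:
        current = step(current)  # each step returns a new state
        failing = sorted(name for name, passes in policies.items() if not passes(current))
        timeline.append({"step": label, "failing": failing})
    return timeline


def first_compliant_step(timeline: list[dict]) -> Optional[str]:
    """Return the first step after which every policy passes, or None."""
    for entry in timeline:
        if not entry["failing"]:
            return entry["step"]
    return None
```

With three steps that add `db_separated`, `service_split`, and `gateway` facts to a set, and policies requiring the first and last of those, `first_compliant_step` reports the gateway step as the point where everything passes.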
Policy Impact Simulation (SIM-13 — v1.1)¶
A policy impact simulation runs a proposed new Rego policy against the current production graph — without modifying the graph — and returns the list of all services that would currently be in violation. This answers: "If we adopt this new policy today, what already breaks?" before the policy is activated in the Governance Service.
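The shape of this check can be shown with a Python predicate standing in for the candidate Rego policy. This is a sketch only: a real run would evaluate Rego through OPA, and `policy_impact`, the service property layout, and the ownership rule in the usage note are assumptions.

```python
from typing import Callable


def policy_impact(services: dict[str, dict], violates: Callable[[dict], bool]) -> list[str]:
    """List services that would violate a candidate policy; the input is only read."""
    return sorted(name for name, properties in services.items() if violates(properties))
```

For instance, with a candidate rule "every service must have an owner" written as `lambda props: not props.get("owner")`, the function returns the names of all services that currently lack one, without changing the graph data it was given.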
Use Cases¶
| ID | Scenario | Mutation Input | Key Output |
|---|---|---|---|
| SIM-UC-01 | "What happens if I split OrderService?" | NL: split node | Blast radius delta, newly violated/satisfied policies, ADR-031 context |
| SIM-UC-02 | "What breaks if I upgrade axios to 1.x?" | JSON: update dependency version | License policy check, dependency traversal impact |
| SIM-UC-03 | "Blast radius of removing the API gateway?" | JSON: remove node | Full reachability delta, mTLS policy implications |
| SIM-UC-04 | "If we add this policy, what currently breaks?" | Policy impact simulation | List of currently non-compliant services |
| SIM-UC-05 | Sprint planning — model proposed new services | JSON: add nodes + edges | Policy readiness before sprint starts |
Functional Requirements¶
| ID | Requirement | Priority |
|---|---|---|
| SIM-01 | Accept proposed mutation as structured JSON spec (add/remove node/edge, split node, merge nodes, update properties) | Must Have |
| SIM-02 | Accept natural language description and translate to mutation spec via MoE Scout; return spec for human confirmation before execution | Must Have |
| SIM-03 | Clone Observed + Intended graphs into an ephemeral Neo4j database created with CREATE DATABASE sim_<uuid> IF NOT EXISTS | Must Have |
| SIM-04 | Apply mutation spec to sandbox graph atomically | Must Have |
| SIM-05 | Run full active OPA policy suite against sandbox graph | Must Have |
| SIM-06 | Compute blast radius delta: before and after affected node count, with criticality weighting | Must Have |
| SIM-07 | Return structured delta: policies newly violated, newly satisfied, unchanged violations, blast radius delta, PageRank impact | Must Have |
| SIM-08 | Return plain English summary appropriate to requesting role (developer, architect, SRE, or PM) | Must Have |
| SIM-09 | Surface relevant institutional memory: ADRs and post-mortems relevant to the proposed change context | Must Have |
| SIM-10 | Destroy ephemeral sandbox database after result returned; no sandbox persists | Must Have |
| SIM-11 | Simulation results stored in PostgreSQL for 90 days; queryable via API | Must Have |
| SIM-12 | Multi-step simulation: apply mutation A, then B, then C, evaluating policy state after each step, for migration roadmap planning | Nice to Have (v1.1) |
| SIM-13 | Policy impact simulation: "if we add this policy, what currently breaks?" without changing the graph | Nice to Have (v1.1) |
Infrastructure Dependencies¶
| Component | Role in Simulation Service |
|---|---|
| Neo4j 5.x | Sandbox database creation, graph cloning, mutation application, policy input subgraph export |
| PostgreSQL 16 | Simulation result persistence (90-day retention) |
| OPA Server | Policy evaluation against sandbox graph |
| DGX Spark port 8000 | Llama 4 Scout (MoE): NL→mutation spec translation |
| DGX Spark port 8001 | Dense 70B + explain-lora: plain English summary generation by role |
| Reasoning Service | Institutional memory retrieval for ADR and post-mortem context |
| Governance Service | PageRank and blast radius computation logic (shared) |