Development Milestones: Roadmap Reference

Development Philosophy

Substrate's development is governed by two principles. Every engineering decision is evaluated against both of them before work begins.

Principle 1: The Graph Is the Product

The React UI, OPA policy engine, and vLLM inference layer are delivery mechanisms. The value of Substrate lives in the Unified Multimodal Knowledge Base. A polished interface on top of an inaccurate or incomplete graph produces worse outcomes than no interface at all — it gives false confidence to decisions based on bad data.

This principle has a practical consequence: no feature work on visualization, query UX, or policy authoring begins until the underlying graph data for that feature is accurate, complete, and trustworthy. The graph is built and validated first; the surface is built second.

Every engineering decision asks: "Does this make the graph more accurate, more queryable, or more trusted?" If the answer is no, it is deferred.

Principle 2: Determinism Over Probability for Enforcement

OPA/Rego evaluates governance decisions. LLMs explain, extract, and reason. LLMs never make pass/fail enforcement decisions.

This is not a performance optimization; it is an auditability and defensibility requirement. A developer whose PR is blocked must be able to trace the block to an exact OPA policy clause and an exact graph condition. A policy that blocks a deployment must be reproducible: given the same graph state and the same policy pack, the result must always be identical.

LLMs are probabilistic. They can return different explanations for the same input on different runs. Using an LLM to make an enforcement decision means the decision is not reproducible, not traceable, and not auditable. This is legally and operationally unacceptable for a governance tool.

The division is explicit:

- OPA/Rego: "Is this PR blocked?" (deterministic, reproducible, auditable)
- Dense explain-lora: "Why is this PR blocked, in plain English?" (probabilistic, informational, not enforcement)
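
The split can be illustrated with a minimal Python sketch. Everything here is a stand-in: `Decision`, `evaluate`, `explain`, and the rule format are hypothetical names for illustration, and in Substrate the deterministic side would be an OPA/Rego evaluation rather than Python. The point is only that the enforcement result is a pure function of graph state and policy pack, while the explanation is derived from the decision and can never alter it.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    blocked: bool
    violated_rules: tuple[str, ...]
    input_digest: str  # fingerprint of the exact graph state and policy pack evaluated


def evaluate(graph_state: dict, policy_pack: dict) -> Decision:
    """Deterministic stand-in for an OPA/Rego evaluation (hypothetical rule format)."""
    violated = tuple(sorted(
        name
        for name, forbidden_edge in policy_pack["rules"].items()
        if forbidden_edge in graph_state["edges"]
    ))
    # Canonical JSON so the same inputs always produce the same digest
    canonical = json.dumps({"graph": graph_state, "policy": policy_pack}, sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return Decision(blocked=bool(violated), violated_rules=violated, input_digest=digest)


def explain(decision: Decision) -> str:
    """Informational only. An LLM call would go here; its text never changes the decision."""
    if not decision.blocked:
        return "No policy violations found."
    return "Blocked by: " + ", ".join(decision.violated_rules)


graph = {"edges": ["web->db", "web->cache"]}
pack = {"rules": {"no_direct_db_access": "web->db"}}
first = evaluate(graph, pack)
second = evaluate(graph, pack)
```

Evaluating the same inputs twice yields equal `Decision` values, including the digest, which is the reproducibility property described above.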

16-Phase Implementation Plan

This is the consolidated implementation plan for the Substrate platform. It replaces earlier fragmented plans with one execution order.

Phase 0: Architecture Freeze and Guardrails

Goal: Lock the rules of the system before changing more code.

- Freeze canonical decisions: Keycloak as authority, backend owns authorization, frontend uses generated types.
- Classify routes: frontend-session, automation, webhook, internal.
- Add CI guardrails: OpenAPI drift check, generated type drift check.
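
A drift check of the kind listed above could be sketched as follows. This is an assumption about the approach, not the project's actual CI script: it compares the committed spec (or generated types file) against a freshly regenerated copy and fails when they differ. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def digest(text: str) -> str:
    # Normalize line endings so the check behaves the same on every platform
    normalized = text.replace("\r\n", "\n").strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_drift(committed: str, regenerated: str) -> bool:
    """True when the committed artifact no longer matches the regenerated one."""
    return digest(committed) != digest(regenerated)


def check_file(committed_path: Path, regenerated_path: Path) -> int:
    """Return a process exit code: 0 when in sync, 1 on drift."""
    drift = has_drift(
        committed_path.read_text(encoding="utf-8"),
        regenerated_path.read_text(encoding="utf-8"),
    )
    if drift:
        print(f"DRIFT: {committed_path} is stale; regenerate and commit it")
    return 1 if drift else 0
```

In CI, a step would regenerate the artifact into a temporary path, call `check_file`, and exit with the returned code.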

Phase 1: Backend Core Cleanup for SOLID and DRY

Goal: Finish the backend and API-first cleanup.

- Standardize repository helpers and shared response patterns.
- Ensure module boundaries are explicit (auth, iam, marketplace, etc.).
- Implement base repository and shared exceptions.
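
As a rough illustration of a base repository with shared exceptions, the sketch below uses an in-memory dictionary in place of a database session. All names (`DomainError`, `NotFoundError`, `BaseRepository`, `OrgRepository`) are hypothetical and chosen for this example only.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


class DomainError(Exception):
    """Root of the shared exception hierarchy (hypothetical name)."""


class NotFoundError(DomainError):
    def __init__(self, entity: str, key: str) -> None:
        super().__init__(f"{entity} '{key}' not found")
        self.entity = entity
        self.key = key


class BaseRepository(Generic[T]):
    """In-memory stand-in; a real implementation would wrap a database session."""

    entity_name = "entity"

    def __init__(self) -> None:
        self._rows: dict[str, T] = {}

    def add(self, key: str, row: T) -> T:
        self._rows[key] = row
        return row

    def get(self, key: str) -> T:
        try:
            return self._rows[key]
        except KeyError:
            # Every module raises the same error type for a missing row
            raise NotFoundError(self.entity_name, key) from None


@dataclass
class Org:
    slug: str
    name: str


class OrgRepository(BaseRepository[Org]):
    entity_name = "org"
```

Because every repository raises the same `NotFoundError`, a single API-layer handler can map it to one shared response shape instead of each module inventing its own.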

Phase 2: OpenAPI Split and Frontend Type Discipline

Goal: Separate browser and automation contracts cleanly.

- Generate openapi.frontend.yml and openapi.automation.yml.
- Ensure frontend TS types are generated from the frontend spec only.
- Reserve automation endpoints for service accounts and CI clients.

Phase 3: Keycloak Auth Foundation

Goal: Move to a Keycloak-centric identity model.

- Replace backend password grant with confidential service-account client.
- Treat substrate-realm.json as the source of truth for clients, roles, and groups.
- Remove app-local credential authority and deprecate X-API-Key fallback.

Phase 4: First Login and Onboarding

Goal: Make first login deterministic for all users.

- Add onboarding endpoints for bootstrap and invite acceptance.
- On first login: project Keycloak user to backend, create personal account org, assign roles.

Phase 5: User Directory, Org Creation, and Invites

Goal: Finish the collaborative identity model.

- Add searchable user directory projection from Keycloak.
- Implement org creation, invites, and team membership management.

Phase 6: Backend Authorization and Frontend Access Context

Goal: Move all authorization truth to backend-owned access context.

- Define a route-by-route authorization matrix.
- Replace frontend role derivation with a unified backend access-context response.
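
One way to picture a unified access-context response is sketched below. The role names, permission strings, and the `AccessContext` shape are assumptions made for illustration; the actual matrix would come from the route-by-route definition above.

```python
from dataclasses import asdict, dataclass

# Hypothetical role-to-permission mapping, owned by the backend
ROLE_PERMISSIONS = {
    "org_admin": {"org:manage", "policy:edit", "graph:read"},
    "member": {"graph:read"},
}


@dataclass(frozen=True)
class AccessContext:
    user_id: str
    org_id: str
    permissions: tuple[str, ...]


def build_access_context(user_id: str, org_id: str, roles: list[str]) -> AccessContext:
    """Resolve roles to permissions on the backend; unknown roles grant nothing."""
    granted: set[str] = set()
    for role in roles:
        granted |= ROLE_PERMISSIONS.get(role, set())
    # Sorted so the serialized response is stable across calls
    return AccessContext(user_id, org_id, tuple(sorted(granted)))


context = build_access_context("u-1", "org-1", ["member", "unknown_role"])
payload = asdict(context)  # what a frontend would receive and render from
```

The frontend only checks for a permission string in the payload; it never derives permissions from role names itself.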

Phase 7: Runtime Config Unification

Goal: Make UI-persisted settings drive backend runtime consumers.

- Classify config into deployment bootstrap vs. tenant runtime config.
- Build runtime settings provider for connectors, policies, and LLM profiles.
- Seed default local vLLM profiles from infra/vllm.

Phase 8: Marketplace Catalog and Trust

Goal: Make marketplace distribution trustworthy.

- Implement real manifest and signature verification for .substrate bundles.
- Support cloud-connected catalog sync and air-gapped direct upload.
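
The manifest half of bundle verification might look like the sketch below. The manifest layout (a `files` map from path to SHA-256 hex digest) is an assumption, not the actual .substrate format, and the signature check over the manifest itself is deliberately left out because it depends on the chosen key scheme and crypto library.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_bundle(manifest: dict, files: dict[str, bytes]) -> list[str]:
    """Return a list of problems; an empty list means contents match the manifest."""
    problems = []
    expected = manifest["files"]
    for path in sorted(expected.keys() | files.keys()):
        if path not in files:
            problems.append(f"missing: {path}")
        elif path not in expected:
            problems.append(f"unlisted: {path}")
        elif not hmac.compare_digest(sha256_hex(files[path]), expected[path]):
            # Constant-time comparison of the two hex digests
            problems.append(f"digest mismatch: {path}")
    return problems


bundle_files = {"policy.rego": b"package demo\n"}
bundle_manifest = {"files": {"policy.rego": sha256_hex(b"package demo\n")}}
```

Rejecting unlisted files as well as mismatched ones means a bundle cannot carry content that the manifest never declared.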

Phase 9: Module Lifecycle and Isolated Runtime Execution

Goal: Execute connectors and policy packs outside the API process.

- Separate control plane (catalog, installation) from runtime plane (isolated workers).
- Implement full lifecycle state machine: installed, active, disabled, etc.
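
A lifecycle state machine can be sketched with an explicit transition table. Only the three states named above are modeled, and the allowed transitions are an assumption for illustration; the full lifecycle has more states than this.

```python
from enum import Enum


class ModuleState(str, Enum):
    INSTALLED = "installed"
    ACTIVE = "active"
    DISABLED = "disabled"


# Assumed transition table covering only the states named in the plan
ALLOWED = {
    ModuleState.INSTALLED: {ModuleState.ACTIVE},
    ModuleState.ACTIVE: {ModuleState.DISABLED},
    ModuleState.DISABLED: {ModuleState.ACTIVE},
}


class InvalidTransition(Exception):
    pass


def transition(current: ModuleState, target: ModuleState) -> ModuleState:
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target
```

Keeping the table as data makes every legal move visible in one place, so the control plane can reject an illegal request before any worker is touched.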

Phase 10: Billing, Licensing, and Entitlements

Goal: Support tenant and module licensing.

- Implement platform tier and per-module entitlement enforcement.
- Support offline license import and verification.

Phase 11: Deterministic Ingestion and Graph Population

Goal: Implement the event-driven ingestion foundation.

- Add NATS-driven ingestion pipeline and Celery workers.
- Implement deterministic connector base flow (GitHub, Git, Terraform).
- Roll out GraphDelta write model and idempotent graph writes.
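
The idea of an idempotent delta write can be shown with an in-memory stand-in for the graph. The field names on `GraphDelta` here are hypothetical, not the real write model; the sketch only demonstrates that replaying the same delta leaves the graph unchanged.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GraphDelta:
    delta_id: str  # stable identifier, e.g. derived from the source event
    upsert_nodes: tuple[tuple[str, str], ...] = ()  # (node key, label)
    upsert_edges: tuple[tuple[str, str, str], ...] = ()  # (source, relation, target)


@dataclass
class InMemoryGraph:
    nodes: dict = field(default_factory=dict)
    edges: set = field(default_factory=set)
    applied: set = field(default_factory=set)

    def apply(self, delta: GraphDelta) -> bool:
        """Apply a delta once; return False if it was already applied."""
        if delta.delta_id in self.applied:
            return False
        for key, label in delta.upsert_nodes:
            self.nodes[key] = label  # upsert by key, never append
        self.edges.update(delta.upsert_edges)  # set semantics, no duplicate edges
        self.applied.add(delta.delta_id)
        return True


graph_store = InMemoryGraph()
delta = GraphDelta(
    delta_id="commit-abc",
    upsert_nodes=(("svc:web", "Service"), ("svc:db", "Service")),
    upsert_edges=(("svc:web", "DEPENDS_ON", "svc:db"),),
)
```

Against Neo4j, a comparable effect is commonly obtained by writing with Cypher `MERGE` on a stable key, so that a redelivered NATS message or a retried Celery task does not create duplicates.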

Phase 12: OPA Governance Foundation and Policy Packs

Goal: Turn policy runtime into a reliable governance subsystem.

- Add reloadable bundle handling and exception sync into OPA data.
- Connect policy evaluation runs to graph context and audit output.

Phase 13: Notifications and Live UI Events

Goal: Make notifications event-driven and user-scoped.

- Add notification projector subscribed to NATS.
- Implement SSE/WebSocket live stream for UI notifications and status events.
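
For the SSE side of the live stream, each event is sent as a small text frame terminated by a blank line. The helper below is a sketch of that framing only; the event name and payload fields are made up for the example, and the NATS subscription and per-user filtering are not shown.

```python
import json
from typing import Optional


def format_sse(event: str, data: dict, event_id: Optional[str] = None) -> str:
    """Serialize one Server-Sent Events frame."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # Compact JSON keeps the payload on a single data line
    lines.append("data: " + json.dumps(data, separators=(",", ":")))
    # A blank line marks the end of the frame
    return "\n".join(lines) + "\n\n"


frame = format_sse("notification", {"id": 1, "status": "unread"}, event_id="42")
```

Setting `id` lets a reconnecting browser report the last event it saw through the `Last-Event-ID` header, so the server can resume from that point.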

Phase 14: Seed Data and Demo Readiness

Goal: Make local/demo environments reflect the real platform model.

- Seed Keycloak, SQL, and Neo4j with matching demo data.
- Ensure built-in modules register automatically in local development.

Phase 15: Testing, Observability, and CI Hardening

Goal: Make the system safe to change.

- Implement unit, contract, integration, and E2E tests for all critical flows.
- Add structured logs, audit events, and health/readiness checks.

Phase 16: Extraction Readiness for Future Services

Goal: Keep the modular monolith ready to split into services.

- Formalize event contracts for all domain changes.
- Isolate service boundaries for identity, marketplace, graphrag, etc.