From concept to working POC
An earlier essay on this blog made a conceptual claim: that the next frontier of enterprise AI is not conversing with documents, but conversing with structured data in a governed, reliable and decision-ready way. It argued that the bottleneck blocking most organizations from being truly data-driven is not the absence of data but the cognitive rupture between the moment a question arises and the moment reliable evidence reaches the person deciding.
That essay defined the problem and the architecture. It did not build anything.
This post is the proof.
We built a minimal proof of concept in a real enterprise retail domain. The system receives questions in natural language, routes them through a governed analytical pipeline, and returns qualified responses backed by institutional knowledge — with explicit indication of the authority level behind each answer.
The goal was not to build a general-purpose Conversational BI system. It was narrower and more honest than that:
The post that follows is not a repeat of the earlier essay's argument, nor documentation of the codebase. It is an account of the architectural decisions made, the problems encountered, and the degree to which the implementation validates the claims of the essay that preceded it.
The system was validated on production retail data. The examples, traces and figures in this post use a synthetic electronics dataset — smaller in scale, designed for reproducibility and open exploration. The architectural claims hold in both cases. Some operational complexity that motivates specific design decisions — in particular the multi-pass SKU filtering — is more visible at production scale.
A system organized around a separation
The system rests on one deliberate structural decision: the conversational layer and the analytical authority layer are strictly separated.
The conversational shell manages the interaction with the user — it holds context, surfaces results, presents answers. In this POC, that role is implemented by Claude Desktop. But regardless of implementation, the shell does not own the analytical model. It does not contain the product ontology. It does not define the metrics. It does not decide what SQL should be trusted.
All of that authority lives in the backend.
The system has five structural layers. Each has a defined role and a concrete implementation in the POC:
The nested structure makes the separation concrete: each layer contains the one below it and constrains what that layer can see and do. The semantic knowledge base is what the analytical pipeline consults — not what it executes. Claude Desktop is the outermost layer, the only one that touches the user in both directions.
This is what a governed architecture looks like in practice. Not a system where the model is instructed to behave correctly, but one where the structure itself enforces the boundaries.
Claude Desktop is the conversational shell used in this POC, but the architecture does not depend on it. Any conversational client with MCP support and the ability to operate under a structured orchestration protocol could take its place. The governed backend and the orchestration protocol are the architecturally essential elements. The shell is replaceable.
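The separation can be illustrated with a minimal sketch. It is not the POC's code: the class names, the `ask` tool and the ontology fragment are hypothetical, and in the real system the boundary is a process boundary reached over MCP rather than a Python object boundary. The point it shows is only that the shell holds context while the backend decides which operations exist at all.

```python
from typing import Callable, Dict, List


class GovernedBackend:
    """Owns the analytical model; exposes only a narrow, named tool surface."""

    def __init__(self) -> None:
        # Hypothetical ontology fragment (category -> SKUs), private to the backend.
        self._ontology: Dict[str, List[str]] = {"laptops": ["SKU-001", "SKU-002"]}
        # The only operations the shell is allowed to invoke.
        self._tools: Dict[str, Callable[..., dict]] = {"ask": self._ask}

    def tool_names(self) -> List[str]:
        return sorted(self._tools)

    def call(self, tool: str, **kwargs) -> dict:
        if tool not in self._tools:
            raise PermissionError(f"tool not exposed to the shell: {tool}")
        return self._tools[tool](**kwargs)

    def _ask(self, question: str) -> dict:
        # Semantic mapping happens here, inside the backend.
        scope = [c for c in self._ontology if c in question.lower()]
        return {"question": question, "product_scope": scope}


class ConversationalShell:
    """Holds context and presents answers; never touches ontology or SQL."""

    def __init__(self, backend: GovernedBackend) -> None:
        self._backend = backend
        self.history: List[dict] = []

    def ask(self, question: str) -> dict:
        result = self._backend.call("ask", question=question)
        self.history.append(result)
        return result
```

In this sketch a shell that tries `backend.call("run_sql", sql="...")` gets a `PermissionError`, because no such tool was ever exposed. Swapping the shell for another client changes nothing in the backend.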
A pipeline where every step earns its place
A question enters the system as natural language and exits as a qualified analytical answer. Between those two points, the workflow is not a sequence of tasks — it is a system of control points. Every step can stop the pipeline, reject the question, or qualify the response. That capacity to halt, reject and qualify is what makes the workflow governed rather than merely structured.
The workflow has seven logical stages but exposes fewer tools to the conversational shell. That asymmetry is intentional: it is the Orchestration Protocol in action. Steps S3 and S4 are grouped inside a single backend tool precisely because they are the most sensitive: semantic mapping, live SKU resolution, SQL generation and response-authority logic all stay inside the governed backend, invisible to the shell.
This design changes the role of the language model. The LLM is not asked to answer the business question — it is asked to participate in a controlled analytical process where each step has a defined jurisdiction. That is what distinguishes this system from a chatbot with SQL access.
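The idea of a workflow made of control points, where every step can halt, reject or qualify, can be sketched as follows. This is an illustration under assumptions, not the POC's pipeline: the `Verdict` values, the `Context` fields and the two example steps are hypothetical names invented for the sketch, and the real steps involve model calls that are omitted here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Verdict(Enum):
    CONTINUE = "continue"  # step passed; move to the next control point
    QUALIFY = "qualify"    # step passed, but the answer must carry a caveat
    REJECT = "reject"      # step refuses the question; the pipeline halts


@dataclass
class Context:
    question: str
    caveats: List[str] = field(default_factory=list)
    trace: List[str] = field(default_factory=list)


@dataclass
class StepResult:
    verdict: Verdict
    note: str = ""


Step = Callable[[Context], StepResult]


def run_pipeline(ctx: Context, steps: List[Step]) -> str:
    """Run steps in order; any step may halt the run or qualify the answer."""
    for step in steps:
        result = step(ctx)
        ctx.trace.append(f"{step.__name__}: {result.verdict.value}")
        if result.verdict is Verdict.REJECT:
            return f"REJECTED at {step.__name__}: {result.note}"
        if result.verdict is Verdict.QUALIFY:
            ctx.caveats.append(result.note)
    return "ANSWERED" + (" (qualified)" if ctx.caveats else "")


# Hypothetical steps, for illustration only.
def validate_metric(ctx: Context) -> StepResult:
    known = ("sales", "stock")
    if not any(metric in ctx.question.lower() for metric in known):
        return StepResult(Verdict.REJECT, "no known metric in question")
    return StepResult(Verdict.CONTINUE)


def match_template(ctx: Context) -> StepResult:
    if "ratio" in ctx.question.lower():
        return StepResult(Verdict.QUALIFY, "no certified template; SQL was generated")
    return StepResult(Verdict.CONTINUE)
```

Calling `run_pipeline(Context("sales to stock ratio"), [validate_metric, match_template])` completes but records a caveat, while a question with no recognizable metric stops at the first step and never reaches SQL generation.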
For the full algorithmic detail of what happens inside each step — including decision logic, model calls, and failure modes — see Supplementary Material A: Pipeline Deep Dive →
Institutional knowledge made executable
A governed analytical system requires an explicit epistemic architecture — a body of knowledge organized into layers, each corresponding to a fundamental dimension of the analytical universe of the organization. This section shows how that architecture became concrete. Each conceptual layer has a direct counterpart in the repository: a file, a structure, a set of definitions that the system consults at runtime.
Without those assets, the system has no epistemic architecture. It has a language model with access to a SQL interface.
- Product ontology
  - Definition and classification of existing products
  - Node metadata for LLM navigation without hallucination
  - Active in S3: ontology mapping
- Metric definitions
  - Definition of computable metrics and their properties
  - Temporal nature (flow vs. stock) and multilingual aliases
  - Active in S2: metric validation
- TLK library
  - 68 verified question-SQL pairs
  - Institutional memory for certified query patterns
  - Active in S4: template matching and SQL instantiation
- Schema map
  - Map of available tables and columns
  - Explicit exclusion of non-existent tables
  - Active in S4: SQL generation at LOW confidence
- Term translations
  - Translation of terms to ontology nodes across Spanish, English and retail jargon
  - Deterministic fallback when LLM returns zero matches
  - Active in S3: ontology mapping
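Two of the assets listed above can be sketched as plain data structures: a metric registry that records temporal nature and multilingual aliases, and a term dictionary used as a deterministic fallback when the LLM returns zero ontology matches. The entries, field names and function names below are hypothetical; the actual files in the repository are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Metric:
    name: str
    temporal_nature: str      # "flow" (accumulates over time) or "stock" (point in time)
    aliases: Tuple[str, ...]  # multilingual aliases


# Hypothetical registry entries.
METRICS: Dict[str, Metric] = {
    "sales_amount": Metric("sales_amount", "flow", ("sales", "ventas", "revenue")),
    "inventory_units": Metric("inventory_units", "stock", ("stock", "inventario")),
}

# Hypothetical term dictionary: Spanish, English and jargon -> ontology node.
TERM_TO_NODE: Dict[str, str] = {
    "televisor": "televisions",
    "smart tv": "televisions",
    "celular": "smartphones",
    "cell phone": "smartphones",
}


def resolve_metric(term: str) -> Optional[Metric]:
    """Metric validation: return the metric a term refers to, or None."""
    needle = term.strip().lower()
    for metric in METRICS.values():
        if needle == metric.name or needle in metric.aliases:
            return metric
    return None


def map_to_ontology(question: str, llm_matches: List[str]) -> List[str]:
    """Prefer the LLM's matches; fall back to dictionary lookup if it found none."""
    if llm_matches:
        return llm_matches
    text = question.lower()
    found: List[str] = []
    for term, node in TERM_TO_NODE.items():
        if term in text and node not in found:
            found.append(node)
    return found
```

Because the fallback is a plain substring lookup, it gives the same result for the same question every time, which is what makes it auditable when the model comes back empty.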
The epistemic architecture is the mechanism that moves analytical knowledge out of people's heads and into the system, where it can be audited, versioned and evolved independently of the individuals who created it.
Maintaining that layer is a continuous analytical responsibility. The ontology grows as the catalogue grows. The TLK expands as new classes of questions are validated. The metric definitions evolve as the organization's measurement practices evolve. In that sense, the epistemic architecture redefines the role of the analyst — from operational intermediary to curator of institutional knowledge.
Confidence regimes: knowing what you know
The epistemic architecture defines what the system knows. The confidence regimes define how honestly the system communicates what it knows. A system that presents every response with the same degree of confidence is not being transparent — it is deferring a judgment to the user without giving them the information they need to make it.
The POC implements four response-authority states. Each is a direct consequence of how the SQL was produced:
item_names.item_names after the SKU filter. Query rejected before SQL generation.
Two distinctions matter here. LOW does not mean wrong. It means the answer has not been validated against a certified institutional pattern, which is a fundamentally different thing. A LOW response can be correct; it simply has not earned institutional authority. BLOCKED is not an error. It is the system asserting that executing the query without a governed product scope would be analytically unsafe.
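One way to make authority a consequence of how the SQL was produced, rather than a label the model chooses, is to derive it from provenance. The sketch below covers only the three states named in this post (ACCURATE, LOW, BLOCKED); the fourth state is not modeled, and the `QueryProvenance` fields and the decision rules are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Authority(Enum):
    ACCURATE = "ACCURATE"  # SQL came from a certified pattern
    LOW = "LOW"            # SQL was generated; not validated against a certified pattern
    BLOCKED = "BLOCKED"    # no governed product scope; the query is not executed


@dataclass
class QueryProvenance:
    sku_scope: List[str]               # SKUs left after the SKU filter
    certified_template: Optional[str]  # id of the matched certified pattern, if any


def authority_for(provenance: QueryProvenance) -> Authority:
    """Derive authority from how the query was produced (assumed rules)."""
    if not provenance.sku_scope:
        return Authority.BLOCKED
    if provenance.certified_template is not None:
        return Authority.ACCURATE
    return Authority.LOW


def present(value: str, authority: Authority) -> str:
    """Attach the authority level to what the user sees."""
    if authority is Authority.BLOCKED:
        return "[BLOCKED] No governed product scope; the query was not executed."
    if authority is Authority.LOW:
        return f"[LOW] {value} (not validated against a certified pattern)"
    return f"[ACCURATE] {value}"
```

The empty-scope check comes first on purpose: in this sketch a matched template cannot rescue a question whose product scope resolved to nothing.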
Two questions. Two different answers.
The two questions below were chosen deliberately. The first has a direct match in the TLK library — the system answers it with full institutional authority. The second requires a derived ratio that no certified template covers — the system answers it, but qualifies that answer honestly. Same architecture. Same pipeline. Different epistemic authority.
The contrast between the two traces is not incidental. It is what the architecture is designed to produce: different answers carry different authority, and the system makes that difference visible. The ACCURATE trace demonstrates what the system can do at its best — deterministic, verified, institutional. The LOW trace demonstrates what the system does when it reaches the boundary of its certified knowledge — it continues, but it is honest about what lies beyond that boundary.
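The two paths can be imitated in a few lines with Python's built-in `sqlite3` module. Everything specific here is an assumption made for the sketch: the `sales` table, the single TLK entry, the lookup key and the demo rows are invented, and the "generated" SQL is passed in by hand where the real system would obtain it from the model.

```python
import sqlite3
from typing import Dict, List, Optional, Tuple

# Hypothetical TLK entry: (metric, shape) -> certified, parameterized SQL.
TLK: Dict[Tuple[str, str], str] = {
    ("sales_amount", "by_category"): (
        "SELECT category, SUM(amount) FROM sales "
        "WHERE year = :year GROUP BY category ORDER BY category"
    ),
}


def demo_connection() -> sqlite3.Connection:
    """In-memory database with a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (category TEXT, year INTEGER, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("laptops", 2024, 100), ("laptops", 2024, 200),
         ("phones", 2024, 50), ("phones", 2023, 70)],
    )
    return conn


def answer(
    conn: sqlite3.Connection,
    metric: str,
    shape: str,
    params: Dict[str, object],
    generated_sql: Optional[str] = None,
) -> Tuple[List[tuple], str]:
    """Use the certified template when one matches; otherwise run generated SQL as LOW."""
    certified = TLK.get((metric, shape))
    sql = certified if certified is not None else generated_sql
    if sql is None:
        raise ValueError("no certified template and no generated SQL")
    authority = "ACCURATE" if certified is not None else "LOW"
    # Values are bound as named parameters, never spliced into the SQL text.
    rows = conn.execute(sql, params).fetchall()
    return rows, authority
```

A question that maps to `("sales_amount", "by_category")` is answered from the certified template and labeled ACCURATE. A derived ratio with no matching entry still runs, but only with SQL supplied from outside the library, and it comes back labeled LOW.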
What the POC actually proves
The earlier essay argued that the first wave of enterprise generative AI made unstructured knowledge conversational, and that the next challenge is making structured data conversational without sacrificing analytical truth. This proof of concept is the test of that claim in a real enterprise domain.
This POC shows that the problem cannot be solved by simply connecting a chatbot to a database. Natural language access to structured data becomes useful only when it is governed by explicit semantics, validated workflows, certified analytical patterns and visible response authority. The key to that governance is architectural: separating the conversational interface from the analytical authority, so that the language model orchestrates interaction while the institution retains ownership of truth.
The key architectural lesson is clear:
- The LLM should orchestrate the interaction.
- The institution should own the truth.
- The backend should operationalize that truth.
- The user should see the authority level of every answer.
This is the difference between conversational analytics and decision-grade Conversational BI.
The proof of concept described here is not the final form of the architecture. It is a controlled first materialization of its logic. It demonstrates that governed semantics, enforced workflows and natural-language interaction can coexist. More importantly, it shows that building governance into the system from the beginning is easier — and safer — than trying to retrofit it after the system has already learned to answer freely.
The frontier is not simply giving every manager the ability to "chat with data". The frontier is giving every manager access to governed evidence at the moment of decision.
That is the real promise of Conversational BI.
Decision-grade conversational analytics depends less on unrestricted generation than on governed semantics, enforced workflow and qualified response behaviour.
- The earlier essay: defines the conceptual thesis, namely why governed conversational access to structured data is the next frontier of enterprise AI.
- This post: translates the thesis into a working proof of concept and documents the architectural decisions behind it.
- The repository: exposes the technical artifact (the governed pipeline, semantic assets and workflow logic), open for inspection and reuse.