Analytics Engineering for AI Projects: Context as Infrastructure

David Effiong

Apr 21, 2026

•

min read

The Reader Has Changed

The biggest change in analytics isn’t the tools, it’s the reader.

Analytics engineering has always been about encoding knowledge. Every model you build; every metric you define; every column you name is a decision about how your business understands itself.

However, there was always an implicit assumption baked into that work: the reader is human who can read between the lines; ask a follow-up question; ping someone on Slack; and apply judgment when something looks off.

In the age of AI analytics, that assumption is broken. I have worked on building context for AI analytics agents, and I have found that you have to treat your context as infrastructure and your documentation as an engineering discipline.

The primary consumer of your data layer is increasingly an AI system, an analytics copilot, an agent answering business questions, a RAG pipeline grounding a language model in your company's numbers.

AI cannot infer intent. It will reason confidently from whatever context you give it, complete or not, accurate or stale, precise or vague.

The gap between "data that is correct" and "data that is correctly understood" has always existed.

For human readers, institutional knowledge and common sense bridged that gap sometimes; but for AI, that bridge doesn't exist. You have to build it.

What "AI-Ready" Actually Means for Analytics Engineers

For analytics engineers, AI-ready means something specific and foundational: can your data layer support reasoning and retrieval?

The distinction matters more than it might first appear.

Retrieval is getting the right number from the right place. A well-modeled warehouse, a clean semantic layer, or a reliable metrics store can solve these. Most modern data stacks are genuinely good at this now.‍
Reasoning is knowing what numbers mean, when to trust them, what they shouldn't be confused with, and when to flag uncertainty.

An AI system can deduce what the churn rate was last quarter, but to answer the why of that question, the system needs more than a metric: it needs a definition, whether that definition changed recently, whether there are known data quality issues in the given period, and even external factors (i.e., press releases or new Claude features). Without that context, it will still produce an answer, though maybe not an accurate one.

This is the problem analytics engineers need to solve:

‍context for AI reasoning. Two things sit at the center of that: the semantic layer and your model-level documentation. Both have existed for years and now have profound importance.

From BI Convenience to Reasoning Infrastructure

The semantic layer was originally a BI convenience. Define your metrics once, expose them consistently, stop having multiple definitions of revenue living in many different dashboards. That problem was real and the semantic layer solved it well.

A semantic layer built for BI and a semantic layer built for AI reasoning are not the same, even if they look identical in YAML. A BI-oriented semantic layer encodes what a metric is, how it's computed, what dimensions it can be sliced by, how entities join. It’s a formal specification with precision. It is intentionally narrow.

This is great when the goal is query resolution. However, this may not be the best bet when the goal is reasoning.

A BI semantic layer cannot tell AI that new_customer_revenue and expansion_revenue are deliberately separate metrics and that conflating them is a mistake that surfaces regularly in exec reviews.
It cannot tell it that the churn_rate definition changed in Q3 last year, and that comparisons crossing that boundary require an adjusted metric.
It cannot tell that this metric is unreliable on the first of the month due to how the billing system posts transactions.

And these are not necessarily edge cases. They are the reasoning context that separates a useful AI analytics layer from a confidently wrong one.

An AI-ready semantic layer goes even further.

It encodes metric relationships, how they relate, and where they are commonly misread. It carries temporal flags that mark when definitions changed and what that means for historical data.

Think of the semantic layer as a contract between your team and its AI agent. The contract tells an AI what something is, how it behaves, and what it guarantees. That contract needs to be far more complete than when the core consumer was human. To support AI analytics, your semantic layer must be rich in context and treated as infrastructure with clear instructions.

From Human Onboarding to Reasoning Substrate

Open almost any dbt project and look at the documentation. You'll find one of three things:

columns with no description at all
descriptions that restate the column name ("customer_id : the ID of the customer"),
Or, a handful of well-documented models that someone cared about once and nobody has touched since.

This reflects a rational response to incentives. Documentation was written for human onboarding, and humans are forgiving readers. They can fill gaps with inference, ask questions when confused, and learn fast.

Vague descriptions were a hygiene issue, not a crisis, and the cost of bad documentation was low.

That calculus has changed completely.

When an AI system reads your warehouse documentation, whether through a semantic layer integration or an analytics agent querying your metadata, it reads them as authoritative. And most times, instead of flagging missing descriptions, it fills them silently with its own inference. It does not ask you to clarify a vague description; it reasons confidently alongside everything else.

In the past, a missing description was a documentation debt in the backlog, but today it’s a reasoning gap for your agent.

A semantic layer can provide precise, narrow metric definitions, but your documentation can carry rich context: not only that, qualified_lead excludes trials under seven days, but this exclusion was a deliberate decision made after the sales team found that short trials never converted, and that before Q2 2025, the definition was different, and the historical data reflects that.

The shift this requires is not just in what you write but how you think about writing it.

Documentation written for a human reader can afford to be conversational, impressionistic, and incomplete. Documentation written for an AI consumer needs to be precise, structured, and treated as part of the model itself.

Caveats should be explicit, not implied.
Edge cases should be named, not assumed to be obvious.
Null behavior should be declared, not left for the reader to discover.

And critically, documentation needs to be versioned and maintained with the same discipline as model code. When a metric definition changes, the doc must change with it. When a known issue is resolved, the caveat must be removed.

‍Stale documentation for a human is mildly confusing. Stale documentation for an AI is confidently wrong reasoning, at scale, invisibly.

Your documentation was always a knowledge base. The question now is whether you are willing to engineer it like one.

Context as Infrastructure: Bridging The Gap

The analytics engineer has always been the bridge between raw data and business understanding. That role hasn't changed, but the audience has in many ways.

Institutional memory has to be encoded. The context that used to live in someone's head has to live in your models for AI to work well.

That is what context as infrastructure means: A deliberate, maintained, versioned layer of reasoning context that is treated with the same rigor as the data itself.

The teams that build this won't just have better AI analytics: they'll have better overall analytics. This leads to better-informed decision-making that will impact every aspect of their business, from product analysis to sales metrics.

The discipline of encoding reasoning context makes your data layer more honest and more trustworthy for human readers too. You can't write a precise caveat for an AI without first admitting the caveat exists.

The question is no longer "is our data correct?" It's "is our context complete?"

Share this post

Tag one