Data & AI Governance
Issue 01 · A Story in Seven Parts
The Brief · For BI Leads & Business Owners
§ 00 The Opening

Your data platform
has a trust problem.

Reports open. Dashboards refresh. The numbers show up on time. And yet — every meeting, someone asks the same question: “where does this number actually come from?”

This is not a tooling problem. Your platform is modern. Fabric, Power BI, the lakehouse pattern — all in place. What's missing is the layer underneath: ownership, contracts, lineage, a shared language. Without it, every new question becomes a manual investigation, every AI project stalls on data quality, and every decision carries a hidden tax of doubt.

The good news: the fix is well understood, pays back fast, and delivers first results within a quarter. The next seven sections show how.

§ 01 The Symptoms

Eight familiar symptoms of ungoverned data.

Individually, none of these feels like a crisis. Together, they are the reason AI initiatives stall and board reports get questioned line by line.

Symptom 01 · Ownership

Data lives under Finance. By accident.

Data ownership ended up under the team that reports numbers, not the teams that produce them. When Operations needs a change, the queue is long and the incentives are wrong.

“We don't know who to ask — so we ask no one.”
Symptom 02 · Structure

Fact and dimension tables, entangled.

Source systems (SAP S/4HANA, ERP, CRM) land raw in the gold layer. No modeling, no domain logic — just tables that happen to join if you know the right keys.

“It works. But only Antti knows why.”
Symptom 03 · Lineage

No one can trace a number to its origin.

An end-user questions a figure in a Power BI report. To answer, someone opens a laptop, three SQL editors, two pipelines, and a Teams chat from last March.

“Give me a day — I'll get back to you.”
Symptom 04 · Quality

“The number is wrong” — again.

Nothing is checked before it reaches the report. Quality is discovered by the business user who sees a spike, by the controller who closes the month, by the CFO in the board meeting. Trust erodes with every surprise.

“Let me check with the team and get back to you on Monday.”
Symptom 05 · Documentation

The knowledge lives in people, not systems.

There is no catalog, no glossary, no schema registry. The only documentation is oral — a conversation with the two people who built it. When they change roles, the knowledge leaves the building.

“We'll write it down after this sprint.”
Symptom 06 · Architecture

Medallion in name, muddle in practice.

Bronze, silver, gold: that is what the architecture slide says. In reality, raw tables are promoted straight to gold, responsibilities between engineering and BI are unclear, and there is no Centre of Excellence (CoE) to hold the line.

Symptom 07 · AI

AI ambition, blocked by the basics.

Every AI pilot hits the same wall: unclear ownership, unverified quality, no lineage, no consent model. The POC works. The production system waits.

Symptom 08 · Discovery

Reports that nobody can find.

Dozens of dashboards across workspaces. No one knows which is canonical, which is a personal experiment, or which was decommissioned two years ago and still refreshes.

§ 01a A concrete case

Raw ERP data, promoted straight to the gold layer.

Here's what the BI team inherits: tables named after source-system objects, dimensions mixed into fact rows, no conforming keys, no business glossary. Every report writer solves this privately, in their own DAX.

GOLD.sap_vbak_vbap_raw · Ungoverned

VBELN      | MATNR     | KUNNR  | NETWR
0000014221 | MT-1042   | C-88-X | 1 240,00
0000014222 | MT-1042   | C-88-X | 1 240,00
0000014223 | SVC-TR-01 | C-88-X | 0,00
0000014224 | (NULL)    | C-92-Y | 890,50

// Fact and dim columns co-mingled. No lineage. No owner.
// NETWR mixes gross/net. NULL MATNR is a service line.
// Every report re-invents the same five joins.

sales.fact_order_line · v2.1 · Contracted

order_id | product_sk | customer_sk | net_amount_eur
O-14221  | P-042      | C-1038      | 1 240,00
O-14222  | P-042      | C-1038      | 1 240,00
O-14223  | P-SVC-01   | C-1038      | 0,00

// Owner: Sales Domain · SLA: T+1h, 99.5%
// Schema v2.1 · Lineage tracked · Contract active
// One conformed fact. One glossary definition. No ambiguity.
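
The step from the raw table to the contracted one can be sketched in a few lines. This is an illustration under stated assumptions, not the actual pipeline: the lookup tables `PRODUCT_SK` and `CUSTOMER_SK` and the function names are hypothetical, with values inferred from the two example tables above. The sketch also assumes that a row whose keys cannot be resolved (here, the one without a MATNR) is set aside rather than loaded, and it does not attempt to resolve the gross/net ambiguity in NETWR.

```python
from decimal import Decimal
from typing import Optional

# Raw rows as shown in GOLD.sap_vbak_vbap_raw (values copied from the example).
RAW_ROWS = [
    {"VBELN": "0000014221", "MATNR": "MT-1042", "KUNNR": "C-88-X", "NETWR": "1 240,00"},
    {"VBELN": "0000014222", "MATNR": "MT-1042", "KUNNR": "C-88-X", "NETWR": "1 240,00"},
    {"VBELN": "0000014223", "MATNR": "SVC-TR-01", "KUNNR": "C-88-X", "NETWR": "0,00"},
    {"VBELN": "0000014224", "MATNR": None, "KUNNR": "C-92-Y", "NETWR": "890,50"},
]

# Hypothetical surrogate-key lookups, inferred from the two example tables.
PRODUCT_SK = {"MT-1042": "P-042", "SVC-TR-01": "P-SVC-01"}
CUSTOMER_SK = {"C-88-X": "C-1038"}


def parse_amount(text: str) -> Decimal:
    """Parse an amount such as '1 240,00' (space for thousands, comma for decimals)."""
    cleaned = text.replace(" ", "").replace("\u00a0", "").replace(",", ".")
    return Decimal(cleaned)


def conform(row: dict) -> Optional[dict]:
    """Map one raw row to the fact_order_line shape, or None if a key is unresolved."""
    product_sk = PRODUCT_SK.get(row["MATNR"])
    customer_sk = CUSTOMER_SK.get(row["KUNNR"])
    if product_sk is None or customer_sk is None:
        return None
    return {
        "order_id": "O-" + row["VBELN"].lstrip("0"),
        "product_sk": product_sk,
        "customer_sk": customer_sk,
        "net_amount_eur": parse_amount(row["NETWR"]),
    }


conformed = []
set_aside = []
for raw in RAW_ROWS:
    fact = conform(raw)
    if fact is None:
        set_aside.append(raw)  # needs an owner's decision, not a silent load
    else:
        conformed.append(fact)

print(len(conformed), "rows conformed,", len(set_aside), "set aside")
```

The point of the sketch is that the mapping lives in one shared place instead of being repeated in each report writer's DAX.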

§ 02 The cost of inaction

Ungoverned data is not free.
It's expensive — you just don't get an invoice.

The costs are hidden in hours, delayed decisions, shelved AI projects, and lost commercial opportunity. Put conservative numbers on each line and the annual total surprises everyone — including the CFO.

# | Hidden cost | What it looks like in practice | Annual impact*
01 | Manual lineage hunts | Every time someone asks “where does this number come from?”, a BI developer spends 2–6 hours tracing it. Multiply by ~3 questions per week. | €120–180k · ~1 FTE equivalent
02 | Duplicate reporting work | Without certified products, each team rebuilds the same metric a slightly different way. Three versions of revenue. Four versions of “active customer”. | €200–300k · Analyst time & rework
03 | AI projects stalled | Pilots complete; production waits 6–12 months on data-readiness issues that governance would catch up-front. Opportunity cost of every delayed use case. | €500k–1.5M · Per major use case
04 | Decisions on wrong data | Pricing, demand planning, margin reports: all built on data with undetected quality issues. One miscalled forecast can erase a year of platform savings. | €300k–2M · Variable · high-tail
05 | Key-person risk | When undocumented knowledge sits with two or three people, every parental leave, resignation, or sick day is a business-continuity event. | €80–150k · Onboarding & recovery
06 | Slow reaction to change | When pricing, a supplier, or a regulation changes, it takes weeks, not days, to understand the impact across reports. Competitors move first. | €250–500k · Margin leakage

Conservative annual drag: €1.4–4.6M

*Illustrative range for a mid-market organisation with a Fabric/Power BI stack and 100+ report consumers. Actual numbers are calibrated per customer during the Maturity Evaluation.
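
As a quick check on the total, the six ranges can be added up directly. The sketch below only sums the figures from the table; it assumes a single stalled AI use case, since that line is quoted per major use case.

```python
# (hidden cost, low, high) in thousands of euros, copied from the table above.
COSTS_K_EUR = [
    ("Manual lineage hunts", 120, 180),
    ("Duplicate reporting work", 200, 300),
    ("AI projects stalled", 500, 1500),  # assumes one major use case
    ("Decisions on wrong data", 300, 2000),
    ("Key-person risk", 80, 150),
    ("Slow reaction to change", 250, 500),
]

low_total = sum(low for _, low, _ in COSTS_K_EUR)
high_total = sum(high for _, _, high in COSTS_K_EUR)

print(f"€{low_total / 1000:.2f}M to €{high_total / 1000:.2f}M")
# prints "€1.45M to €4.63M"
```

The sums land at €1.45M and €4.63M, in line with the €1.4–4.6M range quoted as the conservative annual drag.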

§ 03 The turn

Governance is not a committee. It's not a document. It's an operating model where every piece of business-critical data has an owner, a contract, and a traceable path from source to decision.

The working definition we use

§ 04 The operating model

Five layers. Built in order. Each one unblocks the next.

You don't buy governance as a tool. You build it as a stack, from people to product. Skip a layer and the ones above it wobble.

Layer 01 Ownership & Data Domains

Every dataset gets a named business owner — in the domain that produces it, not the team that reports it. Sales data is owned by Sales. Supply data by Supply. Finance becomes a consumer, not the dumping ground.

Domain map · RACI · Data Office charter · Business Glossary

Layer 02 Data Contracts

A contract freezes the promise: schema, freshness, quality thresholds, owner, support. Producers commit; consumers trust. Breaking the contract is a real event — with an owner, an alert, and an SLA — not a surprise on Monday.

YAML spec · Schema & SLA · Automated checks · Contract registry

Layer 03 Catalog & Metadata Management

One place to discover what exists, who owns it, how fresh it is, how good it is, and how to get access. Certified data products surface to the top; experimental ones are clearly marked.

Microsoft Purview · Unity Catalog · Glossary · Certification badges

Layer 04 Lineage & Documentation

Every field traceable, end-to-end: from the source system, through the medallion layers, into the semantic model, onto the dashboard. When someone asks “where does this come from?” — the catalog answers in seconds.

Column-level lineage · Impact analysis · Auto-docs · Change log

Layer 05 Platform CoE & Operating Model

A small, central team of experts — not a gatekeeper, an enabler. Sets the standards, runs the catalog, trains domains, and keeps the platform evolving in one direction rather than five.

Central enablement · Federated domains · Standards & templates · AI Governance

§ 04a The keystone

The Data Contract is where ownership, quality, and trust become concrete.

One signed YAML per data product. It says who owns it, what's in it, how fresh it has to be, what quality it meets, and what happens when those promises break. This is the artifact that turns “governance” from a slide into a system.

Data Contract · Active

sales.fact_order_line

Owner (Producer): Sales Domain · Head of Revenue Operations
Primary consumers: Finance · Pricing · Commercial Analytics · Forecasting ML
Refresh SLA: Hourly · 99.5% availability · incident response within 1h
Quality thresholds: Completeness ≥ 99% · Accuracy ≥ 99.5% · Timeliness ≤ 60 min
Lineage: ERP → bronze.vbak/vbap → silver.orders → sales.fact_order_line
Classification: Internal · No PII · Retention 7y

# data-contract/v2.1
name: sales.fact_order_line
version: 2.1.0
owner:
  domain: sales
  contact: sales-data@company.fi
schema:
  # One entry per column; columns without "required" are optional.
  - {name: order_id, type: string, required: true}
  - {name: product_sk, type: string, required: true}
  - {name: customer_sk, type: string, required: true}
  - {name: net_amount_eur, type: decimal, required: true}
  - {name: booked_at, type: timestamp}
sla:
  freshness_min: 60    # minutes
  availability: 99.5   # percent
  completeness: 99.0   # percent
quality_checks:
  - net_amount_eur >= 0
  - order_id is unique
  - customer_sk resolves in dim_customer
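
To show what "automated checks" could mean for this contract, here is a minimal sketch that applies the three quality checks, the required columns, and the freshness limit to a handful of rows. It is one possible shape, not the product's implementation: the `CONTRACT` dict, the `check_contract` function, and the sample rows and timestamp are all hypothetical. The 99% completeness threshold is not modelled; the sketch simply flags every missing value.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal

# The relevant parts of the contract as a plain dict (an assumption:
# a real setup would load them from the YAML file).
CONTRACT = {
    "required": ["order_id", "product_sk", "customer_sk", "net_amount_eur"],
    "freshness_min": 60,
}


def check_contract(rows, known_customers, last_loaded_at, now):
    """Return a list of violations; an empty list means every check passed."""
    violations = []
    seen_order_ids = set()
    for i, row in enumerate(rows):
        for column in CONTRACT["required"]:
            if row.get(column) is None:
                violations.append(f"row {i}: {column} is missing")
        amount = row.get("net_amount_eur")
        if amount is not None and amount < 0:
            violations.append(f"row {i}: net_amount_eur is negative")
        order_id = row.get("order_id")
        if order_id is not None:
            if order_id in seen_order_ids:
                violations.append(f"row {i}: order_id {order_id} is not unique")
            seen_order_ids.add(order_id)
        customer = row.get("customer_sk")
        if customer is not None and customer not in known_customers:
            violations.append(f"row {i}: customer_sk {customer} not in dim_customer")
    if now - last_loaded_at > timedelta(minutes=CONTRACT["freshness_min"]):
        violations.append("table is older than the freshness SLA")
    return violations


# Arbitrary timestamp and rows, chosen only to trigger each check once.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    {"order_id": "O-14221", "product_sk": "P-042",
     "customer_sk": "C-1038", "net_amount_eur": Decimal("1240.00")},
    {"order_id": "O-14221", "product_sk": "P-042",
     "customer_sk": "C-9999", "net_amount_eur": Decimal("-5.00")},
]
violations = check_contract(rows, {"C-1038"}, now - timedelta(minutes=90), now)
for violation in violations:
    print(violation)
```

Each violation is a candidate for the alert and named owner that the contract promises, rather than a surprise found in a report.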

Figure: lineage today (on request) versus after (on demand)
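
Lineage on demand comes down to a graph that can be walked in both directions. The sketch below encodes the lineage line from the contract card as a small dependency map. It assumes that "bronze.vbak/vbap" stands for two bronze tables, and the function names are hypothetical. A catalog would hold this graph for you; the code only shows why the answer can arrive in seconds once it exists.

```python
# Each dataset maps to the datasets it reads from (its direct upstream sources).
UPSTREAM = {
    "sales.fact_order_line": ["silver.orders"],
    "silver.orders": ["bronze.vbak", "bronze.vbap"],
    "bronze.vbak": ["ERP"],
    "bronze.vbap": ["ERP"],
    "ERP": [],
}


def trace_to_source(dataset):
    """Return every upstream dataset, nearest first, without duplicates."""
    ordered = []
    queue = list(UPSTREAM.get(dataset, []))
    while queue:
        node = queue.pop(0)
        if node in ordered:
            continue
        ordered.append(node)
        queue.extend(UPSTREAM.get(node, []))
    return ordered


def impacted_by(dataset):
    """Return every dataset that depends on `dataset`, directly or indirectly."""
    return sorted(d for d in UPSTREAM if dataset in trace_to_source(d))


print(trace_to_source("sales.fact_order_line"))
# ['silver.orders', 'bronze.vbak', 'bronze.vbap', 'ERP']
print(impacted_by("bronze.vbak"))
# ['sales.fact_order_line', 'silver.orders']
```

The first call answers "where does this come from?"; the second is the impact analysis for a change to a bronze table.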

§ 05 Where are you today?

Five stages of data maturity. Most organisations sit at 2 or 3.

Pick where you are today and where you want to be in twelve months. The gap is the brief for the first phase of work.

You are here: Stage 2 · Reactive
Target in 12 months: Stage 4 · Managed

§ 05a The path

A pragmatic 12-month roadmap, value in every quarter.

Quarter | Focus | What you can show the board
Q1 | Maturity evaluation & domain pilot | Baseline against a 5-stage framework. Pick one domain (e.g. Sales). First owners, first glossary, first contract, all inside 8 weeks.
Q2 | Governance playbook & catalog live | Playbook adopted by the Data Office. Catalog populated with pilot-domain products. Lineage visible end-to-end for the pilot.
Q3 | Scale to 2–3 more domains | Domain template reused. Platform CoE operating. First AI use case running on contracted data, with governance built in, not bolted on.
Q4 | Operate, measure, optimise | KPIs in place: time-to-trace, time-to-trust, incidents prevented, hours saved. Governance reported to the executive every quarter.

§ 06 What it's worth

Governance pays for itself. Three ways.

The investment is modest — a handful of consulting days, a catalog licence, discipline. The returns show up on three separate lines of the P&L.

Saving · 01

Hours reclaimed from manual investigation.

€300k+

BI developers stop chasing lineage. Analysts stop rebuilding the same metric. A conservative estimate for a 100-person data community is 1–2 FTE returned to higher-value work.

Saving · Year 1
Saving · 02

Decisions made on data you can trust.

€500k–2M

One miscalled pricing change, one wrong demand forecast, one regulatory miss — any of them dwarfs the cost of governance. Prevention is the category.

Avoided cost · variable
Revenue · 03

AI use cases that actually reach production.

€1M+

The single biggest unlock. Governed data is a prerequisite — not a nice-to-have — for AI at scale. One productionised use case typically covers the cost of the whole programme, several times over.

Revenue · per use case

Size it for your organisation

BI & analytics people: 40
Hours lost per person per week (to lineage & rework): 4 h
Avg. loaded hourly cost: €85
AI use cases waiting on governance: 3

Estimated annual value: €1.9M

Conservative year-one impact of closing the governance gap. Combines recovered hours, prevented decision errors, and unblocked AI value.

Hours reclaimed: €708k
Decision quality: €400k
AI unblocked: €750k

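
The sizing above can be approximated with simple arithmetic. The exact formula behind the figures is not stated, so the sketch below is a guess at its shape: it assumes 52 working weeks, treats decision quality as a fixed €400k input, and infers €250k per AI use case from €750k across three use cases. All function and parameter names are hypothetical.

```python
WEEKS_PER_YEAR = 52  # assumption; the original does not state the basis


def annual_value(people, hours_lost_per_week, hourly_cost_eur, ai_use_cases,
                 decision_quality_eur=400_000, value_per_ai_use_case_eur=250_000):
    """Estimate the yearly value of closing the governance gap, in euros."""
    hours_reclaimed = people * hours_lost_per_week * hourly_cost_eur * WEEKS_PER_YEAR
    ai_unblocked = ai_use_cases * value_per_ai_use_case_eur
    return {
        "hours_reclaimed": hours_reclaimed,
        "decision_quality": decision_quality_eur,
        "ai_unblocked": ai_unblocked,
        "total": hours_reclaimed + decision_quality_eur + ai_unblocked,
    }


result = annual_value(people=40, hours_lost_per_week=4, hourly_cost_eur=85, ai_use_cases=3)
for line, value in result.items():
    print(f"{line}: €{value:,}")
```

With the inputs shown, this gives €707,200 for hours reclaimed and €1,857,200 in total, which rounds to the €1.9M quoted. The small gap to the €708k line suggests the original uses a slightly different number of weeks.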
§ 07 The next step

Ninety days.
One domain.
A story you can tell the board.

We propose a short, concrete first step: a maturity evaluation and a single-domain pilot. Inside a quarter you'll have a baseline, your first contracts, a working catalog entry, and — most importantly — a narrative that travels upward.