Most companies talk about “data governance” as if it were a separate program that lives in a committee. In reality, governance shows up first where data is touched, changed, and renamed.
That place is your ETL.
For this article, I am using “ETL” in the way many modern teams do: any pipeline that cleans, transforms, enriches, and reshapes data for reporting, analytics, operations, or downstream products. Not just moving data from A to B, but turning raw data into business meaning.
If you want a simple definition:
Data governance in ETL is the set of decisions and controls that ensure transformations are correct, traceable, safe, and aligned with how the business actually defines reality.
If that sounds abstract, good, because ETL failures rarely look like an error message. They look like confident dashboards that are quietly wrong.
Why governance matters more in ETL than anywhere else
Much of the business risk in data is created at the point of transformation.
Raw systems often have constraints, owners, and clear operational meaning. The moment a pipeline starts cleaning, mapping, joining, filtering, and redefining, you can accidentally create:
Metrics that do not match business definitions
Data that looks consistent but is semantically wrong
New fields added “wherever it is convenient”
Uncontrolled changes that break reporting over time
Data that becomes impossible to audit or explain
This is why you see the same pattern in many companies:
The business asks a simple question and gets a fast answer. The next week, the same question gets a different answer, and nobody can explain why.
That is not a tooling problem. That is a governance problem inside the ETL.
The most common ETL governance failures
1) Developers have too much power over business meaning
“We need a new variable, let’s add it wherever we want.”
The issue is not that developers are careless. The issue is that ETL becomes the place where business definitions get rewritten as implementation details.
When that happens, a field name becomes a promise, but nobody checks if the promise matches the business expectation.
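One way to make that promise checkable is to compare a pipeline's output columns against a registry of approved field definitions before anything ships. The sketch below is a minimal illustration, not a prescribed implementation: the registry `APPROVED_FIELDS`, the `FieldDefinition` type, the helper `unregistered_fields`, and every field name and owner in it are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDefinition:
    name: str
    owner: str       # business owner who signs off on the meaning (assumed role)
    definition: str  # plain-language meaning agreed with that owner


# Hypothetical registry; in practice this might live in a data catalog or a
# version-controlled file reviewed by both business and engineering.
APPROVED_FIELDS = {
    "customer_id": FieldDefinition(
        "customer_id", "CRM team", "Unique customer identifier from the CRM."
    ),
    "net_revenue": FieldDefinition(
        "net_revenue", "Finance", "Invoiced amount minus refunds and discounts."
    ),
}


def unregistered_fields(output_columns):
    """Return the output columns that have no approved definition, in order."""
    return [col for col in output_columns if col not in APPROVED_FIELDS]


# Example: a developer added "adj_revenue_v2" without an agreed definition.
missing = unregistered_fields(["customer_id", "net_revenue", "adj_revenue_v2"])
if missing:
    print(f"Fields without an approved definition: {missing}")
```

A check like this does not stop anyone from adding a field. It forces the conversation about what the field means to happen before the field reaches a dashboard.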
2) Definitions and documentation are missing or inconsistent
If a metric or entity can be interpreted in multiple ways, it will be.
When definitions are not documented and owned, you will see arguments like:
“Units sold means shipped.”
“No, it means invoiced.”
“Actually it means contract signed.”
All of these can be valid definitions. The problem is calling them the same thing.
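The three readings above can coexist if each gets its own name. The following sketch assumes a simplified, hypothetical `OrderLine` record with one date per lifecycle event; the metric names `units_contracted`, `units_shipped`, and `units_invoiced` are illustrative, not a standard.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class OrderLine:
    quantity: int
    contract_signed_on: Optional[date] = None
    shipped_on: Optional[date] = None
    invoiced_on: Optional[date] = None


def units_contracted(lines):
    """Units on order lines whose contract has been signed."""
    return sum(l.quantity for l in lines if l.contract_signed_on is not None)


def units_shipped(lines):
    """Units on order lines that have physically shipped."""
    return sum(l.quantity for l in lines if l.shipped_on is not None)


def units_invoiced(lines):
    """Units on order lines that have been invoiced."""
    return sum(l.quantity for l in lines if l.invoiced_on is not None)


# Illustrative data: three order lines at different lifecycle stages.
lines = [
    OrderLine(10, date(2024, 1, 5), date(2024, 1, 12), date(2024, 1, 20)),
    OrderLine(5, date(2024, 1, 8), date(2024, 1, 15)),  # shipped, not yet invoiced
    OrderLine(2, date(2024, 1, 9)),                     # signed only
]

print(units_contracted(lines), units_shipped(lines), units_invoiced(lines))
```

On the same three order lines, the three definitions give three different totals. None of them is wrong. Publishing any one of them as plain “units sold” is what creates the argument.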
3) Business and engineering do not have end-to-end alignment
This is the silent killer. People assume the other side means the same thing.
Engineering builds something called “Active customer.” Business celebrates it. Then someone notices it excludes a segment they expected to be included.
No one intended to mislead. The meaning was never agreed.
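A definition that is written down as data, rather than buried in a filter, makes this kind of exclusion visible before anyone celebrates. The sketch below is one possible shape under stated assumptions: the 90-day window, the segment names, the sign-off parties, and the types `ActiveCustomerDefinition` and `Customer` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ActiveCustomerDefinition:
    window_days: int              # how recent the last order must be
    included_segments: frozenset  # segments that count toward the metric
    agreed_by: tuple              # who signed off on this meaning


@dataclass
class Customer:
    customer_id: str
    segment: str
    last_order_on: date


# Hypothetical agreed definition: ordered within 90 days, retail or enterprise only.
DEFINITION = ActiveCustomerDefinition(
    window_days=90,
    included_segments=frozenset({"retail", "enterprise"}),
    agreed_by=("Sales Ops", "Data Engineering"),
)


def is_active(customer, as_of, definition=DEFINITION):
    """Apply the agreed definition: included segment and a recent enough order."""
    cutoff = as_of - timedelta(days=definition.window_days)
    in_segment = customer.segment in definition.included_segments
    return in_segment and customer.last_order_on >= cutoff


as_of = date(2024, 6, 30)
customers = [
    Customer("c1", "retail", date(2024, 6, 1)),
    Customer("c2", "partner", date(2024, 6, 15)),    # recent order, excluded segment
    Customer("c3", "enterprise", date(2024, 1, 10)), # included segment, stale order
]
active_ids = [c.customer_id for c in customers if is_active(c, as_of)]
print(active_ids)
```

Here the partner customer ordered recently but does not count as active, and the reason is readable in `included_segments` rather than hidden in a join. Whether partners should count is a business decision. The code only makes sure the decision was made on purpose.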


