TL;DR: Dimensional modeling organizes warehouse data into fact tables (measurable business events) and dimension tables (the who/what/when/where context), optimized for fast, intuitive analytics. The first and most important decision is always the grain — what one row of a fact table represents. Get the grain, conformed dimensions, and slowly changing dimensions (SCD) right, and your warehouse stays fast and trustworthy as it grows.
Dimensional modeling is the closest thing data engineering has to a timeless skill. Tools change — Hadoop to Spark, on-prem to cloud, hand-written SQL to dbt — but the discipline of structuring data into facts and dimensions has outlasted all of them, because it maps to how people actually ask business questions.
Why OLTP Schemas Fail at Analytics
Operational databases use normalized schemas (third normal form) to avoid redundancy and keep transactional writes consistent. That's correct for an app, and wrong for analytics. A question like "revenue by product category by month" can require joining many tables, and every join is a chance to get the grain wrong and silently double your numbers.
Dimensional modeling reorganizes data for reading: few joins, fast aggregation, intuitive navigation.
Facts and Dimensions
Every dimensional model is built from two table types.
Fact tables record measurable events — a sale, a click, a shipment. They contain numeric measures (amount, quantity) and foreign keys to dimensions. Facts are tall and narrow: many rows, few columns. Measures come in three flavors:
- Additive: can be summed across any dimension (revenue).
- Semi-additive: summable across some dimensions but not time (account balance).
- Non-additive: can't be summed at all (ratios, percentages) — store the components, compute the ratio at query time.
Dimension tables hold the descriptive context — customer, product, date, store. They are short and wide: fewer rows, many descriptive columns you filter and group by.
Grain — The Most Important Decision
Before writing any DDL, answer one question: what does one row of this fact table represent? That's the grain. "One row per order line." "One row per daily account balance." "One row per page view."
Declaring the grain explicitly prevents the single most common (and most expensive) bug in analytics: fan-out. Join a fact to a dimension at the wrong grain and rows multiply, inflating every sum downstream. This is the silent killer covered in depth in SQL joins and GROUP BY warehousing pitfalls — and dimensional modeling is the structural defense against it.
Star vs Snowflake
Once you have facts and dimensions, you choose how to arrange dimensions.
- A star schema keeps dimensions denormalized — one flat table per dimension. Simple joins, fast queries, the default for analytics.
- A snowflake schema normalizes dimensions into hierarchies — less redundancy, more joins, more complexity.
For most modern warehouses, star wins, because storage is cheap and query simplicity is valuable. The full comparison — with fact/dimension examples and the Kimball vs Inmon debate — is in star schema vs snowflake schema.
Slowly Changing Dimensions (SCD)
Dimensions change: a customer moves, a product is recategorized. How you handle that history is the SCD pattern.
- Type 1 — overwrite: keep only the current value. Simple, but you lose history.
- Type 2 — add a new row: version the dimension with
valid_from/valid_to(and anis_currentflag). This preserves full history and is the workhorse for "what did this look like at the time of the sale?" - Type 3 — add a column: keep a "previous" value alongside the current. Rare, for limited history.
SCD Type 2 is where Change Data Capture shines: the before/after image of each captured change maps directly onto a new versioned row.
Building It With dbt
In practice you build dimensional models with dbt: staging models clean raw data, then dimension and fact models materialize the star schema, with tests on grain and referential integrity. Window functions do a lot of the heavy lifting — deduplicating sources, building SCD Type 2 valid-from/valid-to ranges, and computing running measures. The combination of dimensional modeling + dbt + tests is the backbone of modern analytics engineering.
Frequently Asked Questions
Kimball vs Inmon — which approach should I use?
Kimball (bottom-up, dimensional marts with conformed dimensions) is the more pragmatic default for most teams and pairs naturally with dbt. Inmon (top-down, a normalized enterprise warehouse first) offers stronger central consistency but more upfront effort. Most modern teams lean Kimball.
Do I still need dimensional modeling with a lakehouse or "one big table"?
Yes. Even on a lakehouse, facts-and-dimensions thinking keeps models understandable and prevents fan-out. "One big table" (OBT) can be a convenient serving layer for a specific dashboard, but it's usually derived from a dimensional model, not a replacement for the discipline.
What's the difference between a fact and a dimension again?
A fact is a measurable event you aggregate (sales, clicks). A dimension is the context you slice and filter by (customer, product, date). If you'd SUM() it, it's probably a measure in a fact; if you'd GROUP BY it, it's probably a dimension attribute.
How do I implement SCD Type 2 in dbt?
Use dbt snapshots (which track changes over time automatically) or build the valid-from/valid-to ranges with window functions over a CDC or snapshot source. Test that no two current rows share a business key.