Dimensional Modeling: A Practical Guide for Data Engineers (2026)

    Dimensional modeling explained: fact and dimension tables, grain, conformed dimensions, and slowly changing dimensions (SCD). The Kimball method, applied with dbt.

    By Adriano Sanges--12 min read
    dimensional modeling
    data modeling
    kimball
    fact tables
    dimension tables
    slowly changing dimensions
    data engineering

    TL;DR: Dimensional modeling organizes warehouse data into fact tables (measurable business events) and dimension tables (the who/what/when/where context), optimized for fast, intuitive analytics. The first and most important decision is always the grain — what one row of a fact table represents. Get the grain, conformed dimensions, and slowly changing dimensions (SCD) right, and your warehouse stays fast and trustworthy as it grows.

    Dimensional modeling is the closest thing data engineering has to a timeless skill. Tools change — Hadoop to Spark, on-prem to cloud, hand-written SQL to dbt — but the discipline of structuring data into facts and dimensions has outlasted all of them, because it maps to how people actually ask business questions.

    Why OLTP Schemas Fail at Analytics

    Operational databases use normalized schemas (third normal form) to avoid redundancy and keep transactional writes consistent. That's correct for an app, and wrong for analytics. A question like "revenue by product category by month" can require joining many tables, and every join is a chance to get the grain wrong and silently double your numbers.

    Dimensional modeling reorganizes data for reading: few joins, fast aggregation, intuitive navigation.

    Facts and Dimensions

    Every dimensional model is built from two table types.

    Fact tables record measurable events — a sale, a click, a shipment. They contain numeric measures (amount, quantity) and foreign keys to dimensions. Facts are tall and narrow: many rows, few columns. Measures come in three flavors:

    • Additive: can be summed across any dimension (revenue).
    • Semi-additive: summable across some dimensions but not time (account balance).
    • Non-additive: can't be summed at all (ratios, percentages) — store the components, compute the ratio at query time.

    Dimension tables hold the descriptive context — customer, product, date, store. They are short and wide: fewer rows, many descriptive columns you filter and group by.

    Grain — The Most Important Decision

    Before writing any DDL, answer one question: what does one row of this fact table represent? That's the grain. "One row per order line." "One row per daily account balance." "One row per page view."

    Declaring the grain explicitly prevents the single most common (and most expensive) bug in analytics: fan-out. Join a fact to a dimension at the wrong grain and rows multiply, inflating every sum downstream. This is the silent killer covered in depth in SQL joins and GROUP BY warehousing pitfalls — and dimensional modeling is the structural defense against it.

    Star vs Snowflake

    Once you have facts and dimensions, you choose how to arrange dimensions.

    • A star schema keeps dimensions denormalized — one flat table per dimension. Simple joins, fast queries, the default for analytics.
    • A snowflake schema normalizes dimensions into hierarchies — less redundancy, more joins, more complexity.

    For most modern warehouses, star wins, because storage is cheap and query simplicity is valuable. The full comparison — with fact/dimension examples and the Kimball vs Inmon debate — is in star schema vs snowflake schema.

    Slowly Changing Dimensions (SCD)

    Dimensions change: a customer moves, a product is recategorized. How you handle that history is the SCD pattern.

    • Type 1 — overwrite: keep only the current value. Simple, but you lose history.
    • Type 2 — add a new row: version the dimension with valid_from/valid_to (and an is_current flag). This preserves full history and is the workhorse for "what did this look like at the time of the sale?"
    • Type 3 — add a column: keep a "previous" value alongside the current. Rare, for limited history.

    SCD Type 2 is where Change Data Capture shines: the before/after image of each captured change maps directly onto a new versioned row.

    Building It With dbt

    In practice you build dimensional models with dbt: staging models clean raw data, then dimension and fact models materialize the star schema, with tests on grain and referential integrity. Window functions do a lot of the heavy lifting — deduplicating sources, building SCD Type 2 valid-from/valid-to ranges, and computing running measures. The combination of dimensional modeling + dbt + tests is the backbone of modern analytics engineering.

    Frequently Asked Questions

    Kimball vs Inmon — which approach should I use?

    Kimball (bottom-up, dimensional marts with conformed dimensions) is the more pragmatic default for most teams and pairs naturally with dbt. Inmon (top-down, a normalized enterprise warehouse first) offers stronger central consistency but more upfront effort. Most modern teams lean Kimball.

    Do I still need dimensional modeling with a lakehouse or "one big table"?

    Yes. Even on a lakehouse, facts-and-dimensions thinking keeps models understandable and prevents fan-out. "One big table" (OBT) can be a convenient serving layer for a specific dashboard, but it's usually derived from a dimensional model, not a replacement for the discipline.

    What's the difference between a fact and a dimension again?

    A fact is a measurable event you aggregate (sales, clicks). A dimension is the context you slice and filter by (customer, product, date). If you'd SUM() it, it's probably a measure in a fact; if you'd GROUP BY it, it's probably a dimension attribute.

    How do I implement SCD Type 2 in dbt?

    Use dbt snapshots (which track changes over time automatically) or build the valid-from/valid-to ranges with window functions over a CDC or snapshot source. Test that no two current rows share a business key.

    About the Author

    Adriano Sanges is a data engineer and the creator of dataskew.io. He builds production data platforms with Airflow, dbt, Spark and cloud warehouses, and writes hands-on guides to help aspiring data engineers advance their careers.

    LinkedIn · Website

    Ready to Apply What You Learned?

    Take the next step in your data engineering journey with structured roadmaps and hands-on projects designed for real-world experience.