Data modeling
Joins

Joins

When your database tables relate to each other, you can represent those relationships directly in your data models. This allows you to build dashboards using data from multiple tables without writing manual JOIN statements.

Imagine the following database:

TableKey Relationships
customersA customer can have many orders.
ordersEach order is linked to one customer and one product.
productsEach product can appear in many orders.

With these relationships, you can answer questions like:

  • "How many orders did a specific customer place last month?"
  • "What’s the most popular product in a specific country this week?"
  • "Which customer has spent the most money?"

There are two ways to approach combining these tables:

1. One Model for Multiple Tables

You can use the sql parameter to combine tables. This approach is often very useful if multiple raw database tables need to be joined together to represent one clean user-facing concept.

cubes:
  - name: customers_and_orders_and_products
    sql: > 
      SELECT *
      FROM public.customers
      JOIN public.orders ON customers.id = orders.customer_id
      JOIN public.products ON products.id = orders.product_id
    ...

2. Multiple Models with Joins

The general rule of thumb is to try and have one data model for each user-facing concept.

E.g. Spotify would probably want a data model to represent each of the following concepts: playlists, listeners, artists, tracks, etc.

Since customers, orders, and products are user-facing concepts, the best approach is therefore to create separate models and then link them, like so:

cubes:
  - name: customers
    sql_table: public.customers
 
    joins:
      - name: orders # the name of the model you're joining to
        sql: "{CUBE}.id = {orders}.customer_id" # references the customer_id *column* in the orders model, but you can also reference dimensions (learn more below)
        relationship: one_to_many
 
    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true
    ...
cubes:
  - name: orders
    sql_table: public.orders
 
    joins:
      - name: products # the name of the model you're joining to
        sql: "{CUBE}.product_id = {products}.id" # references the id column in the products model, but you can also reference dimensions (learn more below)
        relationship: many_to_one
      - name: customers # the name of the model you're joining to
        sql: "{CUBE}.customer_id = {customers}.id" # references the id column in the customers model, but you can also reference dimensions (learn more below)
        relationship: many_to_one
 
    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true
    ...
cubes:
  - name: products
    sql_table: public.products
 
    joins:
      - name: orders
        sql: "{CUBE}.id = {orders}.product_id" # references the product_id column in the orders model, but you can also reference dimensions (learn more below)
        relationship: one_to_many
 
    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true
    ...
  • name: In a joins block, name refers to another model (cube), not the underlying database table.

  • sql: Specifies the SQL condition that defines how two models are related. You can reference either columns or dimensions:

    • {products}.some_column points to the some_column field in the database table associated with the products model.
    • {products.some_dimension} refers to the some_dimension dimension that’s defined inside the products model.
    • my_products.some_column accesses the some_column in the actual my_products table in your database (i.e. the raw table name).
  • relationship: Set relationship to one_to_one (opens in a new tab), one_to_many (opens in a new tab), or many_to_one (opens in a new tab) depending on your schema.

  • primary_key: Each model should have a dimension marked primary_key: true. This ensures Cube can generate valid JOIN queries. By default, primary keys are hidden in the UI - use public: true if you want to expose them as dimensions. If no single column uniquely identifies rows, build a composite key with a custom SQL expression:

    dimensions:
      - name: id
        sql: CONCAT_WS('|', tableA.id, tableB.id)
        type: string
        primary_key: true

Joining data from multiple data sources

You can also join data stored in different data sources. Visit Cube's documentation (opens in a new tab) to learn more.