Defining Data Models
You can define data models using either yaml or javascript. Generally, yaml is recommended due to its simplicity and readability. On this page, we focus exclusively on yaml (see Cube’s documentation (opens in a new tab) for more information on using JavaScript-based models)
In Embeddable, each data model is defined in its own .cube.yml
file, and begins with the cubes:
keyword.
# src/models/customers.cube.yml
cubes:
- name: customers
title: My customers
data_source: default
sql_table: public.customers
dimensions:
- name: id
sql: id
type: number
primary_key: true
- name: email
title: 'Email address'
sql: email
type: string
- name: country
title: 'Country of origin'
sql: country
type: string
- name: signed_up_at
sql: signed_up_at
type: time
- name: full_name
title: "Full name"
sql: CONCAT(first_name, ' ', last_name)
type: string
measures:
- name: count
type: count
title: "# of Customers"
joins:
- name: orders # the name of the data model to join to (not the table)
sql: '{CUBE}.id = {orders}.customer_id'
relationship: one_to_many
You can see more data model examples inside your repo (opens in a new tab).
YAML is sensitive to indentation. Use an online YAML validator like YAML Lint (opens in a new tab) to check your formatting.
Core Parameters
-
name
: A unique identifier for the model (no spaces or special characters).Name
is used internally by Embeddable and Cube to reference the model. Updating it once it's in use can potentially break your dashboards. -
title
: An optional, human-readable name that appears in the UI. You can change it anytime without impacting the underlying logic. -
data_source
: Specifies which data source the model belongs to. This should match the connection name you defined when connecting your database. -
sql_table
: Points to the actual table or view in your database (e.g.,public.customers
). Use this for straightforward mappings. -
sql
: An alternative tosql_table
if you need to define a custom query rather than referencing a single table/view. For example:sql: > SELECT id, name, created_at FROM public.customers WHERE status = 'active'
A common misconception is that you have to write GROUP BY statements in your
sql
. Instead, usesql
to define a base table and then separately define dimensions and measures. The semantic layer automatically generates the necessary SQL, including GROUP BY clauses, behind the scenes.
Other Parameters
You can also define other parameters, including joins between models and pre-aggregations to optimize performance.
-
joins
: Defines how to link datasets and handle join relationships. -
description
: A short note or summary of what the model represents or how it should be used. Useful for keeping track of intentions or business context. -
preAggregations
: Optional configurations for creating and managing materialized views or rollups. This helps improve performance by caching common aggregations. For instance:preAggregations: mainRollup: type: rollup measureReferences: - count dimensionReferences: - country timeDimensionReference: signed_up_date granularity: day
These parameters allow you to tailor each data model to match your analytics needs, whether you’re simply mapping a table or performing advanced transformations and aggregations.