Examples: from SQL to Models
In this section, we walk through examples that illustrate how to make the shift from writing SQL to modeling.
Here's a quick mental model to keep in mind:
- SQL asks: “How do I compute this number right now?”
- Modeling asks: “What entities and metrics should always be available so my team, users, or customers can compute this (and related) numbers later, consistently?”
You can jumpstart modeling by clicking the Ask AI button (in the bottom right) and pasting in your SQL queries.
Simple KPI examples
Let's start with a small SaaS schema and simple KPIs to help you understand the base components.
Example schema
Let's say we have a database with 4 tables:
- customers:
id
,name
,email
,country
,created_at
- orders:
id
,customer_id
,status
,total_amount
,created_at
- subscriptions:
id
,customer_id
,status
,plan_interval
('month'|'year'
),monthly_price
,started_at
,canceled_at
- events (app usage):
id
,customer_id
,event_type
,occurred_at
And, we're interested in KPIs like:
- New signups
- Active customers / usage over time
- Completed orders & revenue (e.g., total revenue, revenue by country)
- “Simple MRR” (sum of
monthly_price
for active subscriptions)
KPI: Completed revenue by customer
Here's the SQL you might write:
SELECT
c.name,
SUM(o.total_amount) AS total_revenue
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'completed'
GROUP BY c.name
ORDER BY total_revenue DESC;
And, here's that SQL translated into reusable data models:
customers.cube.yml
cubes:
- name: customers
sql_table: public.customers
dimensions:
- name: id
sql: id
type: number
primary_key: true
- name: customer_name
sql: name
type: string
- name: country
sql: country
type: string
- name: created_at
sql: created_at
type: time
measures:
- name: customers_count
type: count
title: "# of Customers"
orders.cube.yml
cubes:
- name: orders
sql_table: public.orders
dimensions:
- name: id
sql: id
type: number
primary_key: true
- name: status
sql: status
type: string
- name: created_at
sql: created_at
type: time
- name: total_amount
sql: total_amount
type: number
measures:
- name: orders_count
type: count
title: "# of Orders"
- name: total_revenue
sql: total_amount
type: sum
title: "Total Revenue"
filters:
- sql: "{CUBE}.status = 'completed'"
joins:
- name: customers
sql: "{CUBE}.customer_id = {customers}.id"
relationship: many_to_one
- These models enable us to recreate the “revenue by customer” query in Embeddable’s builder simply by selecting the
total_revenue
measure and thecustomer_name
dimension. - But the real power, along with centralizing our data logic, is that they unlock many other questions without writing new SQL:
- Revenue by country.
- Order volume trends over time
- Top customers by spend in the last 30 days
- Average order value per region.
Things to notice
- No aggregations in the cube SQL - the model definitions don’t SUM or GROUP BY. Aggregations come from measures (e.g. total_revenue with type: sum) and grouping comes from dimensions (e.g. customer_name) you select in the builder.
- Filters aren’t baked into the base dataset - instead of
WHERE o.status = 'completed'
in the cube SQL, the example uses a filtered measure. - Joins are modeled once, then reused - the orders model declares a single join to customers. After that, any chart can use customer fields (customer_name, country) without rewriting the join.
- Sorting is a visualization concern - the original SQL orders by
total_revenue DESC
. In Embeddable, you sort when needed in the builder, rather than hard-coding it in the model. - The examples above use
sql_table
- these are simple fact table examples. You usesql
if your model relies on CTEs/window functions or you want a denormalized base.
KPI: Revenue by country (last 30 days)
Here's the SQL you might write:
SELECT
c.country,
SUM(o.total_amount) AS revenue
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'completed'
AND o.created_at >= NOW() - INTERVAL '30 days'
GROUP BY 1;
And, you can use exactly the same models as with the first example (above), selecting the relevant properties in Embeddable's builder:
- Measure:
orders.total_revenue
- Dimension:
customers.country
- Time window: filter
orders.created_at
to last 30 days in the builder
KPI: Simple MRR (active subscriptions)
Here's the SQL you might write:
SELECT
SUM(s.monthly_price) AS simple_mrr
FROM subscriptions s
WHERE s.status = 'active';
And, here's that SQL translated into reusable data models:
subscriptions.cube.yml
cubes:
- name: subscriptions
sql_table: public.subscriptions
joins:
- name: customers
sql: "{CUBE}.customer_id = {customers}.id"
relationship: many_to_one
dimensions:
- name: id
sql: id
type: number
primary_key: true
- name: status
sql: status
type: string
- name: plan_interval
sql: plan_interval
type: string
- name: started_at
sql: started_at
type: time
- name: canceled_at
sql: canceled_at
type: time
measures:
- name: simple_mrr
sql: CASE WHEN status = 'active' THEN monthly_price ELSE 0 END
type: sum
title: "Simple MRR"
Essentially, with the model, “Simple MRR” becomes a reusable measure you can break down by country, plan interval, and cohorts without re-authoring SQL.
Summary
- In SQL queries, you hard-code the joins, filters, and aggregation for that one question.
- Output is a single result set (great for ad-hoc or one chart).
- In models, you define reusable semantics: dimensions (fields), measures (aggregations), and joins.
- Output is a flexible dataset you can slice by time, country, product, etc, without new SQL.
Modeling wins because it gives you:
- Reusability: Centralize logic into measures and reuse across dashboards.
- Consistency: Everyone answers KPIs from the same definitions.
- Composability: Build derived cubes for complex KPIs; keep base cubes broad.
- Governance & speed: Easier reviews, CI/CD, and safer iteration.