Prerequisites for pre-aggregations
Before enabling pre-aggregations, it’s recommended to review the performance of these queries. In many cases, straightforward database optimisations are sufficient, and pre-aggregations may not be required.
This guide highlights the areas that most commonly affect query performance and are worth reviewing before introducing pre-aggregations.
Indexes
Indexes are the single most effective way to improve query performance. They allow the database to locate rows efficiently without scanning the entire table.
What is an index?
Without an index, the database scans every row to find matches:
events table
-------------------------
id | account_id | status
1 | 42 | active
2 | 17 | active
3 | 42 | inactive
4 | 99 | activeWHERE account_id = 42Things to notice:
-
Every row is checked.
-
With an index on
account_id, the database uses a lookup structure to jump directly to matching values instead of scanning rows one by one. -
Indexes are sorted, which allows the database to navigate efficiently.
Conceptual example
Assume an index on account_id with these values:
12, 17, 42, 42, 58, 73, 99 [42]
/ \
[17] [73]
/ \ / \
[12] [42] [58] [99]WHERE account_id = 58-
The database follows the sorted structure and reaches
58in a small number of comparisons. -
The database does not scan values sequentially.
Why this matters
- Full table scans check every row
- Index lookups follow a small number of comparisons
- Lookup cost remains predictable as data grows
Which columns should be indexed?
Index columns commonly used in:
Filters
WHERE account_id = 42 AND status = 'active'Index:
account_idstatus(if selective)
Joins
Always join on indexed columns.
JOIN users ON events.user_id = users.idIndex:
events.user_idusers.id
Grouping
GROUP BY account_id, event_dateIndex:
account_idevent_date
Composite indexes
If filters commonly appear together:
WHERE account_id = 42 AND created_at >= ?Create a composite index:
(account_id, created_at)Column order should match common filter patterns.
Index pitfalls
Too many indexes
- Inserts and updates become slower
- Storage usage increases
Each index must be maintained as data changes.
Indexing low-value columns
Indexes are effective only when they significantly reduce the number of rows scanned.
Avoid indexing columns with very few distinct values, such as:
- Boolean flags
- Status columns where most rows share the same value
If a filter matches most rows, an index adds overhead without improving performance.
Breaking index usage
Functions and casts prevent index usage as it changes the indexed value, forcing the database to evaluate every row instead of using the index.
WHERE CAST(user_id AS INTEGER) = 42WHERE LOWER(email) = 'test@example.com'Primary key vs explicit indexes
- Primary key index
- Created automatically
- Optimised for point lookups
- Rarely helps aggregation queries
- Explicit indexes
- Created using
CREATE INDEX - Used for filters, joins, and groupings
- Created using
Pre-aggregations typically benefit from explicit indexes, not from the primary key.
Partitioning Large Tables
Partitioning becomes useful as tables grow and queries filter by time.
What is partitioning?
Partitioning splits a large table into smaller physical segments, most commonly by time.
Example: partitioning a 1M-row table
Assume an events table with 1,000,000 rows spanning one year.
Without partitioning:
events (1,000,000 rows)
└── All data stored togetherA query like:
WHERE created_at >= '2024-12-01'Must scan a large portion of the table.
With monthly time-based partitioning:
events
├── events_2024_01 (~83k rows)
├── events_2024_02 (~83k rows)
├── ...
├── events_2024_12 (~83k rows)The same query scans only:
events_2024_12-
All other partitions are skipped.
-
This behaviour is known as partition pruning.
Why this matters
- Far fewer rows are scanned
- Less data is read from disk
- Performance remains predictable as data grows
Partitioning is especially effective when queries consistently filter by time.
Best practices
- Partition by time
- Use daily or monthly partitions
- Automate partition creation and cleanup
Review Query Execution Plans
Indexes and partitioning only help if the database actually uses them.
Query execution plans show how a query is executed and are the most reliable way to understand where time and resources are spent.
Most databases support:
EXPLAIN <query>Use execution plans to:
- Confirm indexes are being used
- Verify partition pruning
- Identify full table scans
- Validate performance improvements
Red flags
Seq Scan on events
Filter: (created_at >= '2024-01-01' AND status = 'active')A sequential scan means:
- The entire table is read
- Filters are applied after scanning all rows
- Query cost grows linearly as the table grows
On large tables, this usually indicates:
- Missing or unusable indexes
- Functions preventing index usage
- Partitioning not being applied
Sequential scans are expected on small tables. They are problematic on large, frequently queried tables.
Good signs
Index Scan using idx_events_created_at on events
Index Cond: (created_at >= '2024-01-01')This indicates:
- Indexes are being used
- Filters are applied early
- Fewer rows are scanned
- Performance scales predictably
Execution plans should be reviewed before and after changes to confirm impact.
Next Steps
Once you have these optimisations in place to improve query performance, the next step is to use pre-aggregations to further speed up your queries.