.. _enrichment:
Enrichment
==========
In Cantrip, **enrichment dimensions** are virtual or external dimensions that can be joined with metrics based on implicit conditions like time and/or space. They allow you to do operations like computing a metric on a monthly time grain; rolling up a metric per ZIP code when all you have is latitude/longitude; or even computing metrics only when it was raining. It's a very powerful feature, and it opens up the exploratory analyses you can do on your data.
----
.. container:: db-selector
Database selector loading...
Virtual
~~~~~~~
Let's use a fact table similar to the one in the previous example, with added time and space columns:
.. code-block:: sql
:class: db-example db-sqlite
PRAGMA foreign_keys = ON;
SELECT InitSpatialMetaData(1);
CREATE TABLE fact_orders (
id INTEGER PRIMARY KEY,
amount INTEGER,
event_time DATETIME,
latitude REAL,
longitude REAL,
city TEXT
);
SELECT AddGeometryColumn('fact_orders', 'geom', 4326, 'POINT', 'XY');
SELECT CreateSpatialIndex('fact_orders', 'geom');
UPDATE fact_orders SET geom = MakePoint(longitude, latitude, 4326);
.. code-block:: sql
:class: db-example db-duckdb
INSTALL spatial;
LOAD spatial;
CREATE TABLE fact_orders (
id INTEGER PRIMARY KEY,
amount INTEGER,
event_time TIMESTAMP,
latitude REAL,
longitude REAL,
city TEXT,
geom GEOMETRY
);
UPDATE fact_orders
SET geom = ST_Point(longitude, latitude);
.. code-block:: sql
:class: db-example db-postgres
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE TABLE fact_orders (
id INTEGER PRIMARY KEY,
amount INTEGER,
event_time TIMESTAMP,
latitude REAL,
longitude REAL,
city TEXT,
geom geometry(Point, 4326)
);
UPDATE fact_orders
SET geom = ST_SetSRID(ST_Point(longitude, latitude), 4326);
Together with this metric:
.. code-block:: sql
:class: db-example db-sqlite
CREATE VIEW total_orders (total_orders) AS
SELECT SUM(amount) AS "total_orders [type=INTEGER]"
FROM fact_orders;
.. code-block:: sql
:class: db-example db-duckdb db-postgres
CREATE VIEW total_orders AS
SELECT SUM(amount) AS total_orders
FROM fact_orders;
COMMENT ON COLUMN total_orders.total_orders IS 'type=INTEGER';
Let's see which dimensions are available to ``total_orders``:
.. code-block:: bash
% curl 'http://localhost:1974/db/dimensions?q=metrics:!(total_orders)' | jq
.. code-block:: json
{
"database": "db",
"dimensions": [
{
"name": "Time.day",
"type": "DATE",
"grain": "day",
"incremental": false,
"description": null
},
{
"name": "Time.hour",
"type": "DATETIME",
"grain": "hour",
"incremental": false,
"description": null
},
{
"name": "Time.minute",
"type": "DATETIME",
"grain": "minute",
"incremental": false,
"description": null
},
{
"name": "Time.month",
"type": "DATE",
"grain": "month",
"incremental": false,
"description": null
},
{
"name": "Time.second",
"type": "DATETIME",
"grain": "second",
"incremental": false,
"description": null
},
{
"name": "Time.year",
"type": "DATE",
"grain": "year",
"incremental": false,
"description": null
},
{
"name": "fact_orders.amount",
"type": "INTEGER",
"grain": null,
"incremental": false,
"description": null
},
{
"name": "fact_orders.city",
"type": "STRING",
"grain": null,
"incremental": false,
"description": null
},
{
"name": "fact_orders.event_time",
"type": "DATETIME",
"grain": null,
"incremental": false,
"description": null
},
{
"name": "fact_orders.geom",
"type": "POINT",
"grain": null,
"incremental": false,
"description": null
},
{
"name": "fact_orders.latitude",
"type": "NUMBER",
"grain": null,
"incremental": false,
"description": null
},
{
"name": "fact_orders.longitude",
"type": "NUMBER",
"grain": null,
"incremental": false,
"description": null
}
]
}
As we can see, in addition to columns from the fact table, we also have multiple ``Time.*`` dimensions. These are **virtual dimensions**, since they're not stored in the database. Instead, they are computed on the fly when the SQL is generated by transforming the time column in the fact table:
.. code-block:: bash
% curl 'http://localhost:1974/db/sql?q=metrics:!(total_orders),dimensions:!(Time.day)' \
> | jq -r '.query.sql'
.. code-block:: sql
:class: db-example db-sqlite
SELECT
"__annotated"."Time.day" AS "Time.day [DATE]",
"__annotated"."total_orders" AS "total_orders [INTEGER]"
FROM (
SELECT
"__final"."Time.day" AS "Time.day",
"__final"."total_orders" AS "total_orders"
FROM (
SELECT
SUM(amount) AS "total_orders",
STRFTIME('%Y-%m-%d', "fact_orders"."event_time") AS "Time.day"
FROM fact_orders
GROUP BY
STRFTIME('%Y-%m-%d', "fact_orders"."event_time")
) AS __final
) AS __annotated
.. code-block:: sql
:class: db-example db-duckdb db-postgres
SELECT
"__final"."Time.day" AS "Time.day",
"__final"."total_orders" AS "total_orders"
FROM (
SELECT
SUM("fact_orders"."amount") AS "total_orders",
CAST(DATE_TRUNC('DAY', "fact_orders"."event_time") AS DATE) AS "Time.day"
FROM "fact_orders" AS "fact_orders"
GROUP BY
CAST(DATE_TRUNC('DAY', "fact_orders"."event_time") AS DATE)
) AS __final
Note that, if the fact table had multiple time columns (for example, ``event_time`` and ``log_time``), the metric definition would need an annotation for the preferred join (``join=event_time`` or ``join=log_time``). Also of note is the ``time_grain=`` annotation, which defines the lowest grain of ``Time.*`` available. For example, if we annotate the ``event_time`` with an hourly grain (``PT1H`` in ISO):
.. code-block:: sql
:class: db-example db-sqlite
CREATE VIEW fact_orders_annotated (
id,
amount,
event_time,
latitude,
longitude,
geom,
city
) AS
SELECT
id,
amount,
event_time AS "event_time [time_grain=hour]",
latitude,
longitude,
geom AS "geom [type=POINT]",
city
FROM fact_orders;
CREATE VIEW total_orders AS
SELECT SUM(amount) AS "total_orders"
FROM fact_orders_annotated;
.. code-block:: sql
:class: db-example db-duckdb db-postgres
CREATE VIEW fact_orders_annotated AS
SELECT
id,
amount,
event_time,
latitude,
longitude,
geom,
city
FROM fact_orders;
COMMENT ON COLUMN fact_orders_annotated.event_time IS 'time_grain=PT1H';
COMMENT ON COLUMN fact_orders_annotated.geom IS 'type=POINT';
CREATE VIEW total_orders AS
SELECT SUM(amount) AS total_orders
FROM fact_orders_annotated;
Then it can't be queried with the ``Time.second`` or ``Time.minute`` dimensions:
.. code-block:: bash
% curl 'http://localhost:1974/db/dimensions?q=metrics:!(total_orders)' \
> | jq -r '.dimensions[] | select(.name | startswith("Time.")) | .name'
Time.day
Time.hour
Time.month
Time.year
Calendar
~~~~~~~~
Another way of enriching metrics is via a calendar relation. Cantrip uses the naming convention of ``cantrip__dim_*`` for enrichment relations. A special one is ``cantrip__dim_calendar``:
.. code-block:: sql
:class: db-example db-sqlite db-duckdb db-postgres
CREATE TABLE cantrip__dim_calendar (
date DATE PRIMARY KEY,
us_holiday TEXT,
is_weekend BOOLEAN
);
INSERT INTO cantrip__dim_calendar (date, us_holiday, is_weekend) VALUES
('2026-01-01', 'New Year', FALSE),
('2026-01-02', NULL, FALSE),
('2026-01-03', NULL, TRUE);
Cantrip will create a dimension for each column other than ``date``; in this case, we would have the dimensions ``Calendar.us_holiday`` and ``Calendar.is_weekend`` available to any metrics that are connected to a time column.
.. code-block:: bash
% curl 'http://localhost:1974/db/dimensions?q=metrics:!(total_orders)' \
> | jq -r '.dimensions[] | select(.name | startswith("Calendar."))'
.. code-block:: json
{
"name": "Calendar.is_weekend",
"type": "BOOLEAN",
"grain": null,
"incremental": false,
"description": null
}
{
"name": "Calendar.us_holiday",
"type": "STRING",
"grain": null,
"incremental": false,
"description": null
}
And here's the generated SQL for ``total_orders`` by ``Calendar.us_holiday``:
.. code-block:: bash
% curl 'http://localhost:1974/db/sql?q=metrics:!(total_orders),dimensions:!(Calendar.us_holiday)' \
> | jq -r '.query.sql'
.. code-block:: sql
:class: db-example db-sqlite
SELECT
"__annotated"."Calendar.us_holiday" AS "Calendar.us_holiday [TEXT]",
"__annotated"."total_orders" AS "total_orders [INTEGER]"
FROM (
SELECT
"__final"."Calendar.us_holiday" AS "Calendar.us_holiday",
"__final"."total_orders" AS "total_orders"
FROM (
SELECT
SUM("fact_orders_annotated"."amount") AS "total_orders",
"cantrip__dim_calendar"."us_holiday" AS "Calendar.us_holiday"
FROM "fact_orders_annotated" AS "fact_orders_annotated"
JOIN "cantrip__dim_calendar"
ON STRFTIME('%Y-%m-%d', "fact_orders_annotated"."event_time") = "cantrip__dim_calendar"."date"
GROUP BY
"cantrip__dim_calendar"."us_holiday"
) AS __final
) AS __annotated
.. code-block:: sql
:class: db-example db-duckdb db-postgres
SELECT
"__final"."Calendar.us_holiday" AS "Calendar.us_holiday",
"__final"."total_orders" AS "total_orders"
FROM (
SELECT
SUM("fact_orders_annotated"."amount") AS "total_orders",
"cantrip__dim_calendar"."us_holiday" AS "Calendar.us_holiday"
FROM "fact_orders_annotated" AS "fact_orders_annotated"
JOIN "cantrip__dim_calendar"
ON CAST(DATE_TRUNC('DAY', "fact_orders_annotated"."event_time") AS DATE) = "cantrip__dim_calendar"."date"
GROUP BY
"cantrip__dim_calendar"."us_holiday"
) AS __final
Geometry
~~~~~~~~
Similar to the calendar relation, Cantrip also supports a special ``cantrip__dim_geometries`` relation with geometries. Here's a simple example:
.. code-block:: sql
:class: db-example db-sqlite
CREATE TABLE cantrip__dim_geometries (
id INTEGER PRIMARY KEY,
grain TEXT NOT NULL,
value TEXT NOT NULL
);
SELECT AddGeometryColumn('cantrip__dim_geometries', 'geometry', 4326, 'MULTIPOLYGON', 'XY');
INSERT INTO cantrip__dim_geometries (id, grain, value, geometry) VALUES
(1, 'country', 'US', GeomFromText('MULTIPOLYGON(((...)))', 4326)),
(2, 'country', 'UK', GeomFromText('MULTIPOLYGON(((...)))', 4326)),
(3, 'city', 'New York', GeomFromText('MULTIPOLYGON(((...)))', 4326)),
(4, 'city', 'Los Angeles', GeomFromText('MULTIPOLYGON(((...)))', 4326)),
(5, 'city', 'London', GeomFromText('MULTIPOLYGON(((...)))', 4326));
.. code-block:: sql
:class: db-example db-duckdb db-postgres
CREATE TABLE cantrip__dim_geometries (
id INTEGER PRIMARY KEY,
grain TEXT NOT NULL,
value TEXT NOT NULL,
geometry GEOMETRY
);
INSERT INTO cantrip__dim_geometries (id, grain, value, geometry) VALUES
(1, 'country', 'US', ST_GeomFromText('MULTIPOLYGON(((...)))')),
(2, 'country', 'UK', ST_GeomFromText('MULTIPOLYGON(((...)))')),
(3, 'city', 'New York', ST_GeomFromText('MULTIPOLYGON(((...)))')),
(4, 'city', 'Los Angeles', ST_GeomFromText('MULTIPOLYGON(((...)))')),
(5, 'city', 'London', ST_GeomFromText('MULTIPOLYGON(((...)))'));
Differently from the ``cantrip__dim_calendar`` relation, the geometries relation always has the same schema: the ``grain`` column indicates the spatial grain, and the ``value`` is a geometry type. Each distinct value of ``grain`` becomes a dimension -- ``Geometry.country`` and ``Geometry.city``, for the example above. The generated SQL uses spatial functions to rollup the ``POINT``-level metric into the desired spatial grain. For example, here's the generated SQL for ``total_orders`` by ``Geometry.country``:
.. code-block:: bash
% curl 'http://localhost:1974/db/sql?q=metrics:!(total_orders),dimensions:!(Geometry.country)' \
> | jq -r '.query.sql'
.. code-block:: sql
:class: db-example db-sqlite
SELECT
"__annotated"."Geometry.country" AS "Geometry.country [GEOMETRY]",
"__annotated"."total_orders" AS "total_orders [INTEGER]"
FROM (
SELECT
"__final"."Geometry.country" AS "Geometry.country",
"__final"."total_orders" AS "total_orders"
FROM (
SELECT
SUM("fact_orders_annotated"."amount") AS "total_orders",
JSON_OBJECT(
'type',
'Feature',
'properties',
JSON_OBJECT('name', "cantrip__dim_geometries__country"."value"),
'geometry',
JSON(ASGEOJSON("cantrip__dim_geometries__country"."geometry"))
) AS "Geometry.country"
FROM "fact_orders_annotated" AS "fact_orders_annotated"
JOIN "cantrip__dim_geometries" AS "cantrip__dim_geometries__country"
ON "cantrip__dim_geometries__country"."grain" = 'country'
AND ST_WITHIN("fact_orders_annotated"."geom", "cantrip__dim_geometries__country"."geometry")
GROUP BY
JSON_OBJECT(
'type',
'Feature',
'properties',
JSON_OBJECT('name', "cantrip__dim_geometries__country"."value"),
'geometry',
JSON(ASGEOJSON("cantrip__dim_geometries__country"."geometry"))
)
) AS __final
) AS __annotated
.. code-block:: sql
:class: db-example db-duckdb db-postgres
SELECT
"__final"."Geometry.country" AS "Geometry.country",
"__final"."total_orders" AS "total_orders"
FROM (
SELECT
SUM("fact_orders_annotated"."amount") AS "total_orders",
JSON_OBJECT(
'type',
'Feature',
'properties',
JSON_OBJECT('name', "cantrip__dim_geometries__country"."value"),
'geometry',
CAST(ST_AsGeoJSON("cantrip__dim_geometries__country"."geometry") AS JSON)
) AS "Geometry.country"
FROM "fact_orders_annotated" AS "fact_orders_annotated"
JOIN "cantrip__dim_geometries" AS "cantrip__dim_geometries__country"
ON "cantrip__dim_geometries__country"."grain" = 'country'
AND ST_WITHIN("fact_orders_annotated"."geom", "cantrip__dim_geometries__country"."geometry")
GROUP BY
JSON_OBJECT(
'type',
'Feature',
'properties',
JSON_OBJECT('name', "cantrip__dim_geometries__country"."value"),
'geometry',
CAST(ST_AsGeoJSON("cantrip__dim_geometries__country"."geometry") AS JSON)
)
) AS __final
Note that the ``Geometry.country`` dimension is returned as `GeoJSON `_.
Bring your own
~~~~~~~~~~~~~~
While ``cantrip__dim_calendar`` and ``cantrip__dim_geometries`` are special relations, you can bring any relation (table or view) as dimensions by following the same **convention**. The relation name should start with ``cantrip__dim_``, and its columns determine the possible joins:
- Having a ``date`` column allows for time joins.
- Having **both** ``grain`` and ``value`` allows for spatial joins.
Relations can be time-only, space-only, or have both. For example, if you want to slice and dice metrics by weather data, you can create a relation like this:
.. code-block:: sql
:class: db-example db-sqlite db-duckdb db-postgres
CREATE TABLE cantrip__dim_weather (
date DATE,
grain TEXT,
value TEXT,
temp_high REAL,
is_sunny BOOLEAN,
PRIMARY KEY (date, grain, value)
);
INSERT INTO cantrip__dim_weather VALUES
('2026-01-01', 'city', 'New York', 79.0, 'TRUE'),
('2026-01-01', 'city', 'Los Angeles', 69.0, 'FALSE'),
('2026-01-01', 'city', 'London', 61.0, 'FALSE'),
('2026-01-05', 'city', 'New York', 74.0, 'FALSE'),
('2026-01-05', 'city', 'Los Angeles', 67.0, 'FALSE'),
('2026-01-05', 'city', 'London', 63.0, 'TRUE'),
('2026-02-01', 'city', 'New York', 77.0, 'TRUE'),
('2026-02-01', 'city', 'Los Angeles', 72.0, 'FALSE'),
('2026-02-01', 'city', 'London', 65.0, 'FALSE');
The table above adds 2 new dimensions that can be used with ``total_orders``:
.. code-block:: bash
% curl 'http://localhost:1974/db/dimensions?q=metrics:!(total_orders)' \
> | jq -r '.dimensions[] | select(.name | startswith("Weather."))'
.. code-block:: json
{
"name": "Weather.is_sunny.city",
"type": "BOOLEAN",
"grain": null,
"incremental": false,
"description": null
}
{
"name": "Weather.temp_high.city",
"type": "NUMBER",
"grain": null,
"incremental": false,
"description": null
}
Spatial joins go through the ``cantrip__dim_geometries``, mapping from ``POINT`` to ``GEOMETRY`` to the ``grain`` column:
.. code-block:: bash
% curl 'http://localhost:1974/db/sql?q=metrics:!(total_orders),dimensions:!(Weather.is_sunny.city)' \
> | jq -r '.query.sql'
.. code-block:: sql
:class: db-example db-sqlite
SELECT
"__annotated"."Weather.is_sunny.city" AS "Weather.is_sunny.city [BOOLEAN]",
"__annotated"."total_orders" AS "total_orders [INTEGER]"
FROM (
SELECT
"__final"."Weather.is_sunny.city" AS "Weather.is_sunny.city",
"__final"."total_orders" AS "total_orders"
FROM (
SELECT
SUM("fact_orders_annotated"."amount") AS "total_orders",
cantrip__dim_weather__city.is_sunny AS "Weather.is_sunny.city"
FROM "fact_orders_annotated" AS "fact_orders_annotated"
JOIN "cantrip__dim_geometries" AS "cantrip__dim_geometries__city"
ON "cantrip__dim_geometries__city"."grain" = 'city'
AND ST_WITHIN("fact_orders_annotated"."geom", "cantrip__dim_geometries__city"."geometry")
JOIN "cantrip__dim_weather" AS "cantrip__dim_weather__city"
ON STRFTIME('%Y-%m-%d', "fact_orders_annotated"."event_time") = "cantrip__dim_weather__city"."date"
AND "cantrip__dim_weather__city"."grain" = 'city'
AND "cantrip__dim_weather__city"."value" = "cantrip__dim_geometries__city"."value"
GROUP BY
cantrip__dim_weather__city.is_sunny
) AS __final
) AS __annotated
.. code-block:: sql
:class: db-example db-duckdb db-postgres
SELECT
"__final"."Weather.is_sunny.city" AS "Weather.is_sunny.city",
"__final"."total_orders" AS "total_orders"
FROM (
SELECT
SUM("fact_orders_annotated"."amount") AS "total_orders",
"cantrip__dim_weather__city"."is_sunny" AS "Weather.is_sunny.city"
FROM "fact_orders_annotated" AS "fact_orders_annotated"
JOIN "cantrip__dim_geometries" AS "cantrip__dim_geometries__city"
ON "cantrip__dim_geometries__city"."grain" = 'city'
AND ST_WITHIN("fact_orders_annotated"."geom", "cantrip__dim_geometries__city"."geometry")
JOIN "cantrip__dim_weather" AS "cantrip__dim_weather__city"
ON CAST(DATE_TRUNC('DAY', "fact_orders_annotated"."event_time") AS DATE) = "cantrip__dim_weather__city"."date"
AND "cantrip__dim_weather__city"."grain" = 'city'
AND "cantrip__dim_weather__city"."value" = "cantrip__dim_geometries__city"."value"
GROUP BY
"cantrip__dim_weather__city"."is_sunny"
) AS __final
.. container:: only-light
.. graphviz:: _graphs/enrichment/weather_city_query.light.dot
.. container:: only-dark
.. graphviz:: _graphs/enrichment/weather_city_query.dark.dot
Note that, if the fact table already has a column that matches the grain, we can instead annotate it with ``geo_grain=``, and perform a direct join instead of having to do a spatial query. In this example, since ``fact_orders`` has a column ``city`` in addition to ``latitude``, ``longitude``, and ``geom``, we can annotate it like this:
.. code-block:: sql
:class: db-example db-sqlite
CREATE VIEW fact_orders_annotated (
id,
amount,
event_time,
latitude,
longitude,
geom,
city
) AS
SELECT
id,
amount,
event_time AS "event_time [time_grain=hour]",
latitude,
longitude,
geom AS "geom [type=POINT]",
city AS "city [geo_grain=city]"
FROM fact_orders;
.. code-block:: sql
:class: db-example db-duckdb db-postgres
CREATE VIEW fact_orders_annotated AS
SELECT
id,
amount,
event_time,
latitude,
longitude,
geom,
city
FROM fact_orders;
COMMENT ON COLUMN fact_orders_annotated.event_time IS 'time_grain=PT1H';
COMMENT ON COLUMN fact_orders_annotated.geom IS 'type=POINT';
COMMENT ON COLUMN fact_orders_annotated.city IS 'geo_grain=city';
Then, because ``cantrip__dim_weather`` has rows with ``grain='city'``, the query will be optimized to:
.. code-block:: bash
% curl 'http://localhost:1974/db/sql?q=metrics:!(total_orders),dimensions:!(Weather.is_sunny.city)' \
> | jq -r '.query.sql'
.. code-block:: sql
:class: db-example db-sqlite
SELECT
"__annotated"."Weather.is_sunny.city" AS "Weather.is_sunny.city [BOOLEAN]",
"__annotated"."total_orders" AS "total_orders [INTEGER]"
FROM (
SELECT
"__final"."Weather.is_sunny.city" AS "Weather.is_sunny.city",
"__final"."total_orders" AS "total_orders"
FROM (
SELECT
SUM("fact_orders_annotated"."amount") AS "total_orders",
cantrip__dim_weather__city.is_sunny AS "Weather.is_sunny.city"
FROM "fact_orders_annotated" AS "fact_orders_annotated"
JOIN "cantrip__dim_weather" AS "cantrip__dim_weather__city"
ON STRFTIME('%Y-%m-%d', "fact_orders_annotated"."event_time") = "cantrip__dim_weather__city"."date"
AND "cantrip__dim_weather__city"."grain" = 'city'
AND "cantrip__dim_weather__city"."value" = "fact_orders_annotated"."city"
GROUP BY
cantrip__dim_weather__city.is_sunny
) AS __final
) AS __annotated
.. code-block:: sql
:class: db-example db-duckdb db-postgres
SELECT
"__final"."Weather.is_sunny.city" AS "Weather.is_sunny.city",
"__final"."total_orders" AS "total_orders"
FROM (
SELECT
SUM("fact_orders_annotated"."amount") AS "total_orders",
"cantrip__dim_weather__city"."is_sunny" AS "Weather.is_sunny.city"
FROM "fact_orders_annotated" AS "fact_orders_annotated"
JOIN "cantrip__dim_weather" AS "cantrip__dim_weather__city"
ON CAST(DATE_TRUNC('DAY', "fact_orders_annotated"."event_time") AS DATE) = "cantrip__dim_weather__city"."date"
AND "cantrip__dim_weather__city"."grain" = 'city'
AND "cantrip__dim_weather__city"."value" = "fact_orders_annotated"."city"
GROUP BY
"cantrip__dim_weather__city"."is_sunny"
) AS __final
Note that the query no longer needs the join with ``cantrip__dim_geometries``, and neither the use of spatial functions, improving its performance.
.. container:: only-light
.. graphviz:: _graphs/enrichment/weather_city_geo_grain_query.light.dot
.. container:: only-dark
.. graphviz:: _graphs/enrichment/weather_city_geo_grain_query.dark.dot
Next let's take a look at :ref:`derived metrics `.