.. _enrichment: Enrichment ========== In Cantrip, **enrichment dimensions** are virtual or external dimensions that can be joined with metrics based on implicit conditions like time and/or space. They allow you to do operations like computing a metric on a monthly time grain; rolling up a metric per ZIP code when all you have is latitude/longitude; or even computing metrics only when it was raining. It's a very powerful feature, and it opens up the exploratory analyses you can do on your data. ---- .. container:: db-selector Database selector loading... Virtual ~~~~~~~ Let's use a fact table similar to the one in the previous example, with added time and space columns: .. code-block:: sql :class: db-example db-sqlite PRAGMA foreign_keys = ON; SELECT InitSpatialMetaData(1); CREATE TABLE fact_orders ( id INTEGER PRIMARY KEY, amount INTEGER, event_time DATETIME, latitude REAL, longitude REAL, city TEXT ); SELECT AddGeometryColumn('fact_orders', 'geom', 4326, 'POINT', 'XY'); SELECT CreateSpatialIndex('fact_orders', 'geom'); UPDATE fact_orders SET geom = MakePoint(longitude, latitude, 4326); .. code-block:: sql :class: db-example db-duckdb INSTALL spatial; LOAD spatial; CREATE TABLE fact_orders ( id INTEGER PRIMARY KEY, amount INTEGER, event_time TIMESTAMP, latitude REAL, longitude REAL, city TEXT, geom GEOMETRY ); UPDATE fact_orders SET geom = ST_Point(longitude, latitude); .. code-block:: sql :class: db-example db-postgres CREATE EXTENSION IF NOT EXISTS postgis; CREATE TABLE fact_orders ( id INTEGER PRIMARY KEY, amount INTEGER, event_time TIMESTAMP, latitude REAL, longitude REAL, city TEXT, geom geometry(Point, 4326) ); UPDATE fact_orders SET geom = ST_SetSRID(ST_Point(longitude, latitude), 4326); Together with this metric: .. code-block:: sql :class: db-example db-sqlite CREATE VIEW total_orders (total_orders) AS SELECT SUM(amount) AS "total_orders [type=INTEGER]" FROM fact_orders; .. code-block:: sql :class: db-example db-duckdb db-postgres CREATE VIEW total_orders AS SELECT SUM(amount) AS total_orders FROM fact_orders; COMMENT ON COLUMN total_orders.total_orders IS 'type=INTEGER'; Let's see which dimensions are available to ``total_orders``: .. code-block:: bash % curl 'http://localhost:1974/db/dimensions?q=metrics:!(total_orders)' | jq .. code-block:: json { "database": "db", "dimensions": [ { "name": "Time.day", "type": "DATE", "grain": "day", "incremental": false, "description": null }, { "name": "Time.hour", "type": "DATETIME", "grain": "hour", "incremental": false, "description": null }, { "name": "Time.minute", "type": "DATETIME", "grain": "minute", "incremental": false, "description": null }, { "name": "Time.month", "type": "DATE", "grain": "month", "incremental": false, "description": null }, { "name": "Time.second", "type": "DATETIME", "grain": "second", "incremental": false, "description": null }, { "name": "Time.year", "type": "DATE", "grain": "year", "incremental": false, "description": null }, { "name": "fact_orders.amount", "type": "INTEGER", "grain": null, "incremental": false, "description": null }, { "name": "fact_orders.city", "type": "STRING", "grain": null, "incremental": false, "description": null }, { "name": "fact_orders.event_time", "type": "DATETIME", "grain": null, "incremental": false, "description": null }, { "name": "fact_orders.geom", "type": "POINT", "grain": null, "incremental": false, "description": null }, { "name": "fact_orders.latitude", "type": "NUMBER", "grain": null, "incremental": false, "description": null }, { "name": "fact_orders.longitude", "type": "NUMBER", "grain": null, "incremental": false, "description": null } ] } As we can see, in addition to columns from the fact table, we also have multiple ``Time.*`` dimensions. These are **virtual dimensions**, since they're not stored in the database. Instead, they are computed on the fly when the SQL is generated by transforming the time column in the fact table: .. code-block:: bash % curl 'http://localhost:1974/db/sql?q=metrics:!(total_orders),dimensions:!(Time.day)' \ > | jq -r '.query.sql' .. code-block:: sql :class: db-example db-sqlite SELECT "__annotated"."Time.day" AS "Time.day [DATE]", "__annotated"."total_orders" AS "total_orders [INTEGER]" FROM ( SELECT "__final"."Time.day" AS "Time.day", "__final"."total_orders" AS "total_orders" FROM ( SELECT SUM(amount) AS "total_orders", STRFTIME('%Y-%m-%d', "fact_orders"."event_time") AS "Time.day" FROM fact_orders GROUP BY STRFTIME('%Y-%m-%d', "fact_orders"."event_time") ) AS __final ) AS __annotated .. code-block:: sql :class: db-example db-duckdb db-postgres SELECT "__final"."Time.day" AS "Time.day", "__final"."total_orders" AS "total_orders" FROM ( SELECT SUM("fact_orders"."amount") AS "total_orders", CAST(DATE_TRUNC('DAY', "fact_orders"."event_time") AS DATE) AS "Time.day" FROM "fact_orders" AS "fact_orders" GROUP BY CAST(DATE_TRUNC('DAY', "fact_orders"."event_time") AS DATE) ) AS __final Note that, if the fact table had multiple time columns (for example, ``event_time`` and ``log_time``), the metric definition would need an annotation for the preferred join (``join=event_time`` or ``join=log_time``). Also of note is the ``time_grain=`` annotation, which defines the lowest grain of ``Time.*`` available. For example, if we annotate the ``event_time`` with an hourly grain (``PT1H`` in ISO): .. code-block:: sql :class: db-example db-sqlite CREATE VIEW fact_orders_annotated ( id, amount, event_time, latitude, longitude, geom, city ) AS SELECT id, amount, event_time AS "event_time [time_grain=hour]", latitude, longitude, geom AS "geom [type=POINT]", city FROM fact_orders; CREATE VIEW total_orders AS SELECT SUM(amount) AS "total_orders" FROM fact_orders_annotated; .. code-block:: sql :class: db-example db-duckdb db-postgres CREATE VIEW fact_orders_annotated AS SELECT id, amount, event_time, latitude, longitude, geom, city FROM fact_orders; COMMENT ON COLUMN fact_orders_annotated.event_time IS 'time_grain=PT1H'; COMMENT ON COLUMN fact_orders_annotated.geom IS 'type=POINT'; CREATE VIEW total_orders AS SELECT SUM(amount) AS total_orders FROM fact_orders_annotated; Then it can't be queried with the ``Time.second`` or ``Time.minute`` dimensions: .. code-block:: bash % curl 'http://localhost:1974/db/dimensions?q=metrics:!(total_orders)' \ > | jq -r '.dimensions[] | select(.name | startswith("Time.")) | .name' Time.day Time.hour Time.month Time.year Calendar ~~~~~~~~ Another way of enriching metrics is via a calendar relation. Cantrip uses the naming convention of ``cantrip__dim_*`` for enrichment relations. A special one is ``cantrip__dim_calendar``: .. code-block:: sql :class: db-example db-sqlite db-duckdb db-postgres CREATE TABLE cantrip__dim_calendar ( date DATE PRIMARY KEY, us_holiday TEXT, is_weekend BOOLEAN ); INSERT INTO cantrip__dim_calendar (date, us_holiday, is_weekend) VALUES ('2026-01-01', 'New Year', FALSE), ('2026-01-02', NULL, FALSE), ('2026-01-03', NULL, TRUE); Cantrip will create a dimension for each column other than ``date``; in this case, we would have the dimensions ``Calendar.us_holiday`` and ``Calendar.is_weekend`` available to any metrics that are connected to a time column. .. code-block:: bash % curl 'http://localhost:1974/db/dimensions?q=metrics:!(total_orders)' \ > | jq -r '.dimensions[] | select(.name | startswith("Calendar."))' .. code-block:: json { "name": "Calendar.is_weekend", "type": "BOOLEAN", "grain": null, "incremental": false, "description": null } { "name": "Calendar.us_holiday", "type": "STRING", "grain": null, "incremental": false, "description": null } And here's the generated SQL for ``total_orders`` by ``Calendar.us_holiday``: .. code-block:: bash % curl 'http://localhost:1974/db/sql?q=metrics:!(total_orders),dimensions:!(Calendar.us_holiday)' \ > | jq -r '.query.sql' .. code-block:: sql :class: db-example db-sqlite SELECT "__annotated"."Calendar.us_holiday" AS "Calendar.us_holiday [TEXT]", "__annotated"."total_orders" AS "total_orders [INTEGER]" FROM ( SELECT "__final"."Calendar.us_holiday" AS "Calendar.us_holiday", "__final"."total_orders" AS "total_orders" FROM ( SELECT SUM("fact_orders_annotated"."amount") AS "total_orders", "cantrip__dim_calendar"."us_holiday" AS "Calendar.us_holiday" FROM "fact_orders_annotated" AS "fact_orders_annotated" JOIN "cantrip__dim_calendar" ON STRFTIME('%Y-%m-%d', "fact_orders_annotated"."event_time") = "cantrip__dim_calendar"."date" GROUP BY "cantrip__dim_calendar"."us_holiday" ) AS __final ) AS __annotated .. code-block:: sql :class: db-example db-duckdb db-postgres SELECT "__final"."Calendar.us_holiday" AS "Calendar.us_holiday", "__final"."total_orders" AS "total_orders" FROM ( SELECT SUM("fact_orders_annotated"."amount") AS "total_orders", "cantrip__dim_calendar"."us_holiday" AS "Calendar.us_holiday" FROM "fact_orders_annotated" AS "fact_orders_annotated" JOIN "cantrip__dim_calendar" ON CAST(DATE_TRUNC('DAY', "fact_orders_annotated"."event_time") AS DATE) = "cantrip__dim_calendar"."date" GROUP BY "cantrip__dim_calendar"."us_holiday" ) AS __final Geometry ~~~~~~~~ Similar to the calendar relation, Cantrip also supports a special ``cantrip__dim_geometries`` relation with geometries. Here's a simple example: .. code-block:: sql :class: db-example db-sqlite CREATE TABLE cantrip__dim_geometries ( id INTEGER PRIMARY KEY, grain TEXT NOT NULL, value TEXT NOT NULL ); SELECT AddGeometryColumn('cantrip__dim_geometries', 'geometry', 4326, 'MULTIPOLYGON', 'XY'); INSERT INTO cantrip__dim_geometries (id, grain, value, geometry) VALUES (1, 'country', 'US', GeomFromText('MULTIPOLYGON(((...)))', 4326)), (2, 'country', 'UK', GeomFromText('MULTIPOLYGON(((...)))', 4326)), (3, 'city', 'New York', GeomFromText('MULTIPOLYGON(((...)))', 4326)), (4, 'city', 'Los Angeles', GeomFromText('MULTIPOLYGON(((...)))', 4326)), (5, 'city', 'London', GeomFromText('MULTIPOLYGON(((...)))', 4326)); .. code-block:: sql :class: db-example db-duckdb db-postgres CREATE TABLE cantrip__dim_geometries ( id INTEGER PRIMARY KEY, grain TEXT NOT NULL, value TEXT NOT NULL, geometry GEOMETRY ); INSERT INTO cantrip__dim_geometries (id, grain, value, geometry) VALUES (1, 'country', 'US', ST_GeomFromText('MULTIPOLYGON(((...)))')), (2, 'country', 'UK', ST_GeomFromText('MULTIPOLYGON(((...)))')), (3, 'city', 'New York', ST_GeomFromText('MULTIPOLYGON(((...)))')), (4, 'city', 'Los Angeles', ST_GeomFromText('MULTIPOLYGON(((...)))')), (5, 'city', 'London', ST_GeomFromText('MULTIPOLYGON(((...)))')); Differently from the ``cantrip__dim_calendar`` relation, the geometries relation always has the same schema: the ``grain`` column indicates the spatial grain, and the ``value`` is a geometry type. Each distinct value of ``grain`` becomes a dimension -- ``Geometry.country`` and ``Geometry.city``, for the example above. The generated SQL uses spatial functions to rollup the ``POINT``-level metric into the desired spatial grain. For example, here's the generated SQL for ``total_orders`` by ``Geometry.country``: .. code-block:: bash % curl 'http://localhost:1974/db/sql?q=metrics:!(total_orders),dimensions:!(Geometry.country)' \ > | jq -r '.query.sql' .. code-block:: sql :class: db-example db-sqlite SELECT "__annotated"."Geometry.country" AS "Geometry.country [GEOMETRY]", "__annotated"."total_orders" AS "total_orders [INTEGER]" FROM ( SELECT "__final"."Geometry.country" AS "Geometry.country", "__final"."total_orders" AS "total_orders" FROM ( SELECT SUM("fact_orders_annotated"."amount") AS "total_orders", JSON_OBJECT( 'type', 'Feature', 'properties', JSON_OBJECT('name', "cantrip__dim_geometries__country"."value"), 'geometry', JSON(ASGEOJSON("cantrip__dim_geometries__country"."geometry")) ) AS "Geometry.country" FROM "fact_orders_annotated" AS "fact_orders_annotated" JOIN "cantrip__dim_geometries" AS "cantrip__dim_geometries__country" ON "cantrip__dim_geometries__country"."grain" = 'country' AND ST_WITHIN("fact_orders_annotated"."geom", "cantrip__dim_geometries__country"."geometry") GROUP BY JSON_OBJECT( 'type', 'Feature', 'properties', JSON_OBJECT('name', "cantrip__dim_geometries__country"."value"), 'geometry', JSON(ASGEOJSON("cantrip__dim_geometries__country"."geometry")) ) ) AS __final ) AS __annotated .. code-block:: sql :class: db-example db-duckdb db-postgres SELECT "__final"."Geometry.country" AS "Geometry.country", "__final"."total_orders" AS "total_orders" FROM ( SELECT SUM("fact_orders_annotated"."amount") AS "total_orders", JSON_OBJECT( 'type', 'Feature', 'properties', JSON_OBJECT('name', "cantrip__dim_geometries__country"."value"), 'geometry', CAST(ST_AsGeoJSON("cantrip__dim_geometries__country"."geometry") AS JSON) ) AS "Geometry.country" FROM "fact_orders_annotated" AS "fact_orders_annotated" JOIN "cantrip__dim_geometries" AS "cantrip__dim_geometries__country" ON "cantrip__dim_geometries__country"."grain" = 'country' AND ST_WITHIN("fact_orders_annotated"."geom", "cantrip__dim_geometries__country"."geometry") GROUP BY JSON_OBJECT( 'type', 'Feature', 'properties', JSON_OBJECT('name', "cantrip__dim_geometries__country"."value"), 'geometry', CAST(ST_AsGeoJSON("cantrip__dim_geometries__country"."geometry") AS JSON) ) ) AS __final Note that the ``Geometry.country`` dimension is returned as `GeoJSON `_. Bring your own ~~~~~~~~~~~~~~ While ``cantrip__dim_calendar`` and ``cantrip__dim_geometries`` are special relations, you can bring any relation (table or view) as dimensions by following the same **convention**. The relation name should start with ``cantrip__dim_``, and its columns determine the possible joins: - Having a ``date`` column allows for time joins. - Having **both** ``grain`` and ``value`` allows for spatial joins. Relations can be time-only, space-only, or have both. For example, if you want to slice and dice metrics by weather data, you can create a relation like this: .. code-block:: sql :class: db-example db-sqlite db-duckdb db-postgres CREATE TABLE cantrip__dim_weather ( date DATE, grain TEXT, value TEXT, temp_high REAL, is_sunny BOOLEAN, PRIMARY KEY (date, grain, value) ); INSERT INTO cantrip__dim_weather VALUES ('2026-01-01', 'city', 'New York', 79.0, 'TRUE'), ('2026-01-01', 'city', 'Los Angeles', 69.0, 'FALSE'), ('2026-01-01', 'city', 'London', 61.0, 'FALSE'), ('2026-01-05', 'city', 'New York', 74.0, 'FALSE'), ('2026-01-05', 'city', 'Los Angeles', 67.0, 'FALSE'), ('2026-01-05', 'city', 'London', 63.0, 'TRUE'), ('2026-02-01', 'city', 'New York', 77.0, 'TRUE'), ('2026-02-01', 'city', 'Los Angeles', 72.0, 'FALSE'), ('2026-02-01', 'city', 'London', 65.0, 'FALSE'); The table above adds 2 new dimensions that can be used with ``total_orders``: .. code-block:: bash % curl 'http://localhost:1974/db/dimensions?q=metrics:!(total_orders)' \ > | jq -r '.dimensions[] | select(.name | startswith("Weather."))' .. code-block:: json { "name": "Weather.is_sunny.city", "type": "BOOLEAN", "grain": null, "incremental": false, "description": null } { "name": "Weather.temp_high.city", "type": "NUMBER", "grain": null, "incremental": false, "description": null } Spatial joins go through the ``cantrip__dim_geometries``, mapping from ``POINT`` to ``GEOMETRY`` to the ``grain`` column: .. code-block:: bash % curl 'http://localhost:1974/db/sql?q=metrics:!(total_orders),dimensions:!(Weather.is_sunny.city)' \ > | jq -r '.query.sql' .. code-block:: sql :class: db-example db-sqlite SELECT "__annotated"."Weather.is_sunny.city" AS "Weather.is_sunny.city [BOOLEAN]", "__annotated"."total_orders" AS "total_orders [INTEGER]" FROM ( SELECT "__final"."Weather.is_sunny.city" AS "Weather.is_sunny.city", "__final"."total_orders" AS "total_orders" FROM ( SELECT SUM("fact_orders_annotated"."amount") AS "total_orders", cantrip__dim_weather__city.is_sunny AS "Weather.is_sunny.city" FROM "fact_orders_annotated" AS "fact_orders_annotated" JOIN "cantrip__dim_geometries" AS "cantrip__dim_geometries__city" ON "cantrip__dim_geometries__city"."grain" = 'city' AND ST_WITHIN("fact_orders_annotated"."geom", "cantrip__dim_geometries__city"."geometry") JOIN "cantrip__dim_weather" AS "cantrip__dim_weather__city" ON STRFTIME('%Y-%m-%d', "fact_orders_annotated"."event_time") = "cantrip__dim_weather__city"."date" AND "cantrip__dim_weather__city"."grain" = 'city' AND "cantrip__dim_weather__city"."value" = "cantrip__dim_geometries__city"."value" GROUP BY cantrip__dim_weather__city.is_sunny ) AS __final ) AS __annotated .. code-block:: sql :class: db-example db-duckdb db-postgres SELECT "__final"."Weather.is_sunny.city" AS "Weather.is_sunny.city", "__final"."total_orders" AS "total_orders" FROM ( SELECT SUM("fact_orders_annotated"."amount") AS "total_orders", "cantrip__dim_weather__city"."is_sunny" AS "Weather.is_sunny.city" FROM "fact_orders_annotated" AS "fact_orders_annotated" JOIN "cantrip__dim_geometries" AS "cantrip__dim_geometries__city" ON "cantrip__dim_geometries__city"."grain" = 'city' AND ST_WITHIN("fact_orders_annotated"."geom", "cantrip__dim_geometries__city"."geometry") JOIN "cantrip__dim_weather" AS "cantrip__dim_weather__city" ON CAST(DATE_TRUNC('DAY', "fact_orders_annotated"."event_time") AS DATE) = "cantrip__dim_weather__city"."date" AND "cantrip__dim_weather__city"."grain" = 'city' AND "cantrip__dim_weather__city"."value" = "cantrip__dim_geometries__city"."value" GROUP BY "cantrip__dim_weather__city"."is_sunny" ) AS __final .. container:: only-light .. graphviz:: _graphs/enrichment/weather_city_query.light.dot .. container:: only-dark .. graphviz:: _graphs/enrichment/weather_city_query.dark.dot Note that, if the fact table already has a column that matches the grain, we can instead annotate it with ``geo_grain=``, and perform a direct join instead of having to do a spatial query. In this example, since ``fact_orders`` has a column ``city`` in addition to ``latitude``, ``longitude``, and ``geom``, we can annotate it like this: .. code-block:: sql :class: db-example db-sqlite CREATE VIEW fact_orders_annotated ( id, amount, event_time, latitude, longitude, geom, city ) AS SELECT id, amount, event_time AS "event_time [time_grain=hour]", latitude, longitude, geom AS "geom [type=POINT]", city AS "city [geo_grain=city]" FROM fact_orders; .. code-block:: sql :class: db-example db-duckdb db-postgres CREATE VIEW fact_orders_annotated AS SELECT id, amount, event_time, latitude, longitude, geom, city FROM fact_orders; COMMENT ON COLUMN fact_orders_annotated.event_time IS 'time_grain=PT1H'; COMMENT ON COLUMN fact_orders_annotated.geom IS 'type=POINT'; COMMENT ON COLUMN fact_orders_annotated.city IS 'geo_grain=city'; Then, because ``cantrip__dim_weather`` has rows with ``grain='city'``, the query will be optimized to: .. code-block:: bash % curl 'http://localhost:1974/db/sql?q=metrics:!(total_orders),dimensions:!(Weather.is_sunny.city)' \ > | jq -r '.query.sql' .. code-block:: sql :class: db-example db-sqlite SELECT "__annotated"."Weather.is_sunny.city" AS "Weather.is_sunny.city [BOOLEAN]", "__annotated"."total_orders" AS "total_orders [INTEGER]" FROM ( SELECT "__final"."Weather.is_sunny.city" AS "Weather.is_sunny.city", "__final"."total_orders" AS "total_orders" FROM ( SELECT SUM("fact_orders_annotated"."amount") AS "total_orders", cantrip__dim_weather__city.is_sunny AS "Weather.is_sunny.city" FROM "fact_orders_annotated" AS "fact_orders_annotated" JOIN "cantrip__dim_weather" AS "cantrip__dim_weather__city" ON STRFTIME('%Y-%m-%d', "fact_orders_annotated"."event_time") = "cantrip__dim_weather__city"."date" AND "cantrip__dim_weather__city"."grain" = 'city' AND "cantrip__dim_weather__city"."value" = "fact_orders_annotated"."city" GROUP BY cantrip__dim_weather__city.is_sunny ) AS __final ) AS __annotated .. code-block:: sql :class: db-example db-duckdb db-postgres SELECT "__final"."Weather.is_sunny.city" AS "Weather.is_sunny.city", "__final"."total_orders" AS "total_orders" FROM ( SELECT SUM("fact_orders_annotated"."amount") AS "total_orders", "cantrip__dim_weather__city"."is_sunny" AS "Weather.is_sunny.city" FROM "fact_orders_annotated" AS "fact_orders_annotated" JOIN "cantrip__dim_weather" AS "cantrip__dim_weather__city" ON CAST(DATE_TRUNC('DAY', "fact_orders_annotated"."event_time") AS DATE) = "cantrip__dim_weather__city"."date" AND "cantrip__dim_weather__city"."grain" = 'city' AND "cantrip__dim_weather__city"."value" = "fact_orders_annotated"."city" GROUP BY "cantrip__dim_weather__city"."is_sunny" ) AS __final Note that the query no longer needs the join with ``cantrip__dim_geometries``, and neither the use of spatial functions, improving its performance. .. container:: only-light .. graphviz:: _graphs/enrichment/weather_city_geo_grain_query.light.dot .. container:: only-dark .. graphviz:: _graphs/enrichment/weather_city_geo_grain_query.dark.dot Next let's take a look at :ref:`derived metrics `.