.. _segments: .. role:: rison(code) :language: o-rison Segments ======== When defining metrics it's important to keep them simple and ideal -- as close to their essence as possible. Filters and dimensions should always be applied at query time, instead of being part of the metric definition. This prevents metric proliferation and staleness. **Segments** are saved filters that can be used at query time to improve this workflow. ---- .. container:: db-selector Database selector loading... Imagine we have this table: .. code-block:: sql :class: db-example db-sqlite CREATE TABLE fact_orders ( id INTEGER PRIMARY KEY, app_id INTEGER, amount INTEGER, payment_type STRING ) .. code-block:: sql :class: db-example db-duckdb CREATE TABLE fact_orders ( id INTEGER, app_id INTEGER, amount INTEGER, payment_type TEXT ) .. code-block:: sql :class: db-example db-postgres CREATE TABLE fact_orders ( id SERIAL PRIMARY KEY, app_id INTEGER, amount INTEGER, payment_type TEXT ) And we create a couple metrics: .. code-block:: sql :class: db-example db-sqlite CREATE VIEW total_orders AS SELECT SUM(amount) AS "total_orders [type=INTEGER]" FROM fact_orders; CREATE VIEW total_cc_orders AS SELECT SUM(amount) AS "total_cc_orders [type=INTEGER]" FROM fact_orders WHERE payment_type = 'card'; .. code-block:: sql :class: db-example db-duckdb db-postgres CREATE VIEW total_orders AS SELECT SUM(amount) AS total_orders FROM fact_orders; CREATE VIEW total_cc_orders AS SELECT SUM(amount) AS total_cc_orders FROM fact_orders WHERE payment_type = 'card'; Now imagine a data engineer tells us that ``app_id=1234`` is actually a test application, and it shouldn't be included in the metrics. We now have to update all the metrics, and risk leaving some of them producing incorrect results. A better solution is to add what we call a **transform**, which is **a view that doesn't change cardinality**, and point our metrics to it: .. code-block:: sql :class: db-example db-sqlite CREATE VIEW fact_orders_annotated AS SELECT * FROM fact_orders WHERE app_id <> 1234; CREATE VIEW total_orders AS SELECT SUM(amount) AS "total_orders [type=INTEGER]" FROM fact_orders_annotated; CREATE VIEW total_cc_orders AS SELECT SUM(amount) AS "total_cc_orders [type=INTEGER]" FROM fact_orders_annotated WHERE payment_type = 'card'; .. code-block:: sql :class: db-example db-duckdb db-postgres CREATE VIEW fact_orders_annotated AS SELECT * FROM fact_orders WHERE app_id <> 1234; CREATE VIEW total_orders AS SELECT SUM(amount) AS total_orders FROM fact_orders_annotated; CREATE VIEW total_cc_orders AS SELECT SUM(amount) AS total_cc_orders FROM fact_orders_annotated WHERE payment_type = 'card'; This still has the issue that we need to be sure to update all related metrics to read from ``fact_orders_annotated`` instead of ``fact_orders``. An even better approach is to define only the pure version of the metric: .. code-block:: sql :class: db-example db-sqlite CREATE VIEW fact_orders_annotated AS SELECT * FROM fact_orders WHERE app_id <> 1234; CREATE VIEW total_orders AS SELECT SUM(amount) AS "total_orders [type=INTEGER]" FROM fact_orders_annotated; .. code-block:: sql :class: db-example db-duckdb db-postgres CREATE VIEW fact_orders_annotated AS SELECT * FROM fact_orders WHERE app_id <> 1234; CREATE VIEW total_orders AS SELECT SUM(amount) AS total_orders FROM fact_orders_annotated; Instead of having to define a ``total_cc_orders`` metric, we simply pass the filters we want at query time. For credit card only orders we can pass :rison:`filters:!('fact_orders_annotated.payment_type=card')`: .. code-block:: bash % curl "http://localhost:1974/db/run?q=metrics:!(total_orders),filters:!('fact_orders_annotated.payment_type=card')" | jq .. code-block:: json { "database": "db", "metrics": [ { "name": "total_orders", "type": "INTEGER", "description": null, "joins": null } ], "dimensions": [], "rows": [ { "total_orders": 10 } ] } One problem with this approach is that it only works for simple filters -- we lost the expressiveness of SQL! To solve this, Cantrip implements a flow for reusing saved filters, which are called **segments**. Segments are simply transforms that filter fact tables, and they can be applied at query time to any metric that reads from the same fact tables. Here's an example: .. code-block:: sql :class: db-example db-sqlite CREATE VIEW credit_card_only AS -- cantrip: description=Orders paid with credit card SELECT * FROM fact_orders WHERE payment_type = 'card'; .. code-block:: sql :class: db-example db-duckdb db-postgres CREATE VIEW credit_card_only AS SELECT * FROM fact_orders WHERE payment_type = 'card'; COMMENT ON VIEW credit_card_only IS 'description=Orders paid with credit card'; .. container:: only-light .. graphviz:: _graphs/segments/database.light.dot .. container:: only-dark .. graphviz:: _graphs/segments/database.dark.dot We can use the ``/segments`` endpoint to find which segments are compatible with given metrics: .. code-block:: bash % curl 'http://localhost:1974/db/segments?q=metrics:!(total_orders)' | jq .. code-block:: json { "database": "db", "segments": [ { "name": "credit_card_only", "description": "Orders paid with credit card" }, { "name": "fact_orders_annotated", "description": null } ] } Note that ``fact_orders_annotated`` is returned as a valid segment, since technically it's a transform reading from a fact table connected to the metric. In the future we might change the logic so that trivial segments like this are ignored. We can then use the segment when generating SQL: .. code-block:: bash % curl 'http://localhost:1974/db/sql?q=metrics:!(total_orders),segments:!(credit_card_only)' \ > | jq -r '.query.sql' .. code-block:: sql :class: db-example db-sqlite SELECT "__annotated"."total_orders" AS "total_orders [INTEGER]" FROM ( SELECT "__final"."total_orders" AS "total_orders" FROM ( SELECT SUM("fact_orders_annotated"."amount") AS "total_orders" FROM "fact_orders_annotated" AS "fact_orders_annotated" WHERE fact_orders_annotated.payment_type = 'card' ) AS __final ) AS __annotated .. code-block:: sql :class: db-example db-duckdb db-postgres SELECT "__final"."total_orders" AS "total_orders" FROM ( SELECT SUM("fact_orders_annotated"."amount") AS "total_orders" FROM "fact_orders_annotated" AS "fact_orders_annotated" WHERE "fact_orders_annotated"."payment_type" = 'card' ) AS __final .. container:: only-light .. graphviz:: _graphs/segments/query.light.dot .. container:: only-dark .. graphviz:: _graphs/segments/query.dark.dot It's important to keep segments simple! Segments are applied to the query by copying their predicates, so they should read from a single fact table, without any column transformations other than filtering. In the future we might support more complex use cases but, for now, keep segments as simple views that filter fact tables. Next let's look at :ref:`populations `, since they are very similar to segments.