.. meta:: :description: 💫 Cantrip semantic layer :keywords: semantic layer, metric, dimension 💫 Cantrip ========== Cantrip is a lightweight, hand-crafted, semantic layer as a library, written in modern Python. Semantics are stored as database views, so you can **use native SQL to define metrics, dimensions, relationships** and more, without the need for any external services. Cantrip generates the SQL queries needed to compute arbitrarily complex requests for multiple metrics and dimensions, and can even run the query for you. ---- .. container:: db-selector Database selector loading... Imagine we have these 2 tables in our database: .. code-block:: sql :class: db-example db-sqlite PRAGMA foreign_keys = ON; CREATE TABLE dim_users ( id INTEGER PRIMARY KEY, name TEXT ); CREATE TABLE fact_orders ( id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER, status STRING, FOREIGN KEY (user_id) REFERENCES dim_users(id) ); .. code-block:: sql :class: db-example db-duckdb CREATE TABLE dim_users ( id INTEGER PRIMARY KEY, name VARCHAR NOT NULL ); CREATE TABLE fact_orders ( id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER, status VARCHAR, FOREIGN KEY (user_id) REFERENCES dim_users(id) ); .. code-block:: sql :class: db-example db-postgres CREATE TABLE dim_users ( id SERIAL PRIMARY KEY, name TEXT NOT NULL ); CREATE TABLE fact_orders ( id SERIAL PRIMARY KEY, user_id INTEGER REFERENCES dim_users(id), amount INTEGER, status TEXT ); Let's create a simple metric. We do that by defining a view. The metric should be an aggregation without any grain: .. code-block:: sql :class: db-example db-sqlite db-duckdb db-postgres CREATE VIEW total_orders AS SELECT SUM(amount) AS total_orders FROM fact_orders WHERE status <> 'canceled'; We can now query the metric: .. code-block:: sql :class: db-example db-sqlite db-duckdb db-postgres SELECT * FROM total_orders; 450.0 Not super useful. But let's connect Cantrip to the database. In the following examples we'll be using a simple service that comes with Cantrip, but the service is not a requirement -- Cantrip can be used as a pure Python library. .. code-block:: bash % uvicorn cantrip.service.app:app --port 1974 --reload When Cantrip connects to a database it builds a **semantic graph** of entities -- metrics, dimensions, and more -- and how they are connected. Let's see which metrics are available: .. code-block:: bash % curl 'http://localhost:1974/db/metrics' | jq .. code-block:: json { "database": "db", "metrics": [ { "name": "total_orders", "type": "NUMBER", "description": null, "joins": null } ] } Let's see which dimensions are compatible with the metric ``total_orders``. .. code-block:: bash % curl 'http://localhost:1974/db/dimensions?q=metrics:!(total_orders)' | jq .. code-block:: json { "database": "db", "dimensions": [ { "name": "dim_users.name", "type": "STRING", "grain": null, "incremental": false, "description": null }, { "name": "fact_orders.amount", "type": "NUMBER", "grain": null, "incremental": false, "description": null } { "name": "fact_orders.status", "type": "STRING", "grain": null, "incremental": false, "description": null } ] } What happened here? First, about the request we did. We're using a data serialization format called `Rison `_ (more specifically, O-Rison) to encode this object: .. code-block:: json {"metrics": ["total_orders"]} In a more compact representation that is URL safe: .. code-block:: o-rison metrics:!(total_orders) This returns the list of dimensions compatible with the all the metrics in the request. For our metric ``total_orders``, Cantrip was able to infer 3 compatible dimensions by analyzing primary and foreign keys defined in the database. .. container:: only-light .. graphviz:: _graphs/index/metric_dimension_path.light.dot .. container:: only-dark .. graphviz:: _graphs/index/metric_dimension_path.dark.dot We can now use Cantrip to generate the SQL for the ``total_orders`` metric alongside the ``dim_users.name`` dimensions, for example: .. code-block:: bash % curl 'http://localhost:1974/db/sql?q=metrics:!(total_orders),dimensions:!(dim_users.name)' \ > | jq -r '.query.sql' .. code-block:: sql :class: db-example db-sqlite SELECT "__annotated"."dim_users.name" AS "dim_users.name [TEXT]", "__annotated"."total_orders" AS "total_orders [INTEGER]" FROM ( SELECT "__final"."dim_users.name" AS "dim_users.name", "__final"."total_orders" AS "total_orders" FROM ( SELECT SUM("fact_orders"."amount") AS "total_orders", "dim_users"."name" AS "dim_users.name" FROM "fact_orders" AS "fact_orders" JOIN "dim_users" ON "fact_orders"."user_id" = "dim_users"."id" WHERE "fact_orders"."status" <> 'canceled' GROUP BY "dim_users"."name" ) AS __final ) AS __annotated .. code-block:: sql :class: db-example db-duckdb SELECT "__final"."dim_users.name" AS "dim_users.name", "__final"."total_orders" AS "total_orders" FROM ( SELECT SUM("fact_orders"."amount") AS "total_orders", "dim_users"."name" AS "dim_users.name" FROM "fact_orders" AS "fact_orders" JOIN "dim_users" ON "fact_orders"."user_id" = "dim_users"."id" WHERE "fact_orders"."status" <> 'canceled' GROUP BY "dim_users"."name" ) AS __final .. code-block:: sql :class: db-example db-postgres SELECT "__final"."dim_users.name" AS "dim_users.name", "__final"."total_orders" AS "total_orders" FROM ( SELECT SUM("fact_orders"."amount") AS "total_orders", "dim_users"."name" AS "dim_users.name" FROM "fact_orders" AS "fact_orders" JOIN "dim_users" ON "fact_orders"."user_id" = "dim_users"."id" WHERE "fact_orders"."status" <> 'canceled' GROUP BY "dim_users"."name" ) AS __final .. container:: only-light .. graphviz:: _graphs/index/query.light.dot .. container:: only-dark .. graphviz:: _graphs/index/query.dark.dot We can also get the results directly. Let's look at ``total_orders`` by ``fact_orders.status``: .. code-block:: bash % curl 'http://localhost:1974/db/run?q=metrics:!(total_orders),dimensions:!(fact_orders.status)' | jq .. code-block:: json { "database": "db", "metrics": [ { "name": "total_orders", "type": "NUMBER", "description": null, "joins": null } ], "dimensions": [ { "name": "fact_orders.status", "type": "STRING", "grain": null, "incremental": false, "description": null } ], "rows": [ { "fact_orders.status": "complete", "total_orders": 350.0 }, { "fact_orders.status": "pending", "total_orders": 100.0 } ] } Even though this is a simple example, we can already see why Cantrip is special: - **Metrics are defined using native SQL**. They can be as complex as you want, as long as they produce a single aggregated column and read from a single relation -- which can itself be a view with complex joins. This gives you the full power of SQL when defining metrics. - **Semantics are stored in the database itself**. Metrics are defined as views, and relationships between metric and dimensions are either inferred from foreign keys or discovered from annotations (as we'll see in the next section). - **There is no vendor lock-in**. You can use Cantrip with any database supported by `SQLAlchemy `_ and `sqlglot `_, and moving from one database to another is just a matter of redefining the views. In the :ref:`next section ` we'll look at a slightly more complex example. .. toctree:: :maxdepth: 2 :caption: Documentation Home annotations enrichment derived_metrics segments populations hierarchies installation usage