💫 Cantrip¶
Cantrip is a lightweight, hand-crafted, semantic layer as a library, written in modern Python. Semantics are stored as database views, so you can use native SQL to define metrics, dimensions, relationships and more, without the need for any external services. Cantrip generates the SQL queries needed to compute arbitrarily complex requests for multiple metrics and dimensions, and can even run the query for you.
Database selector loading…
Imagine we have these 2 tables in our database:
PRAGMA foreign_keys = ON;
CREATE TABLE dim_users (
id INTEGER PRIMARY KEY,
name TEXT
);
CREATE TABLE fact_orders (
id INTEGER PRIMARY KEY,
user_id INTEGER,
amount INTEGER,
status STRING,
FOREIGN KEY (user_id) REFERENCES dim_users(id)
);
CREATE TABLE dim_users (
id INTEGER PRIMARY KEY,
name VARCHAR NOT NULL
);
CREATE TABLE fact_orders (
id INTEGER PRIMARY KEY,
user_id INTEGER,
amount INTEGER,
status VARCHAR,
FOREIGN KEY (user_id) REFERENCES dim_users(id)
);
CREATE TABLE dim_users (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL
);
CREATE TABLE fact_orders (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES dim_users(id),
amount INTEGER,
status TEXT
);
Let’s create a simple metric. We do that by defining a view. The metric should be an aggregation without any grain:
CREATE VIEW total_orders AS
SELECT SUM(amount) AS total_orders
FROM fact_orders
WHERE status <> 'canceled';
We can now query the metric:
SELECT * FROM total_orders;
450.0
Not super useful. But let’s connect Cantrip to the database. In the following examples we’ll be using a simple service that comes with Cantrip, but the service is not a requirement – Cantrip can be used as a pure Python library.
% uvicorn cantrip.service.app:app --port 1974 --reload
When Cantrip connects to a database it builds a semantic graph of entities – metrics, dimensions, and more – and how they are connected. Let’s see which metrics are available:
% curl 'http://localhost:1974/db/metrics' | jq
{
"database": "db",
"metrics": [
{
"name": "total_orders",
"type": "NUMBER",
"description": null,
"joins": null
}
]
}
Let’s see which dimensions are compatible with the metric total_orders.
% curl 'http://localhost:1974/db/dimensions?q=metrics:!(total_orders)' | jq
{
"database": "db",
"dimensions": [
{
"name": "dim_users.name",
"type": "STRING",
"grain": null,
"incremental": false,
"description": null
},
{
"name": "fact_orders.amount",
"type": "NUMBER",
"grain": null,
"incremental": false,
"description": null
}
{
"name": "fact_orders.status",
"type": "STRING",
"grain": null,
"incremental": false,
"description": null
}
]
}
What happened here? First, about the request we did. We’re using a data serialization format called Rison (more specifically, O-Rison) to encode this object:
{"metrics": ["total_orders"]}
In a more compact representation that is URL safe:
metrics:!(total_orders)
This returns the list of dimensions compatible with the all the metrics in the request. For our metric total_orders, Cantrip was able to infer 3 compatible dimensions by analyzing primary and foreign keys defined in the database.
We can now use Cantrip to generate the SQL for the total_orders metric alongside the dim_users.name dimensions, for example:
% curl 'http://localhost:1974/db/sql?q=metrics:!(total_orders),dimensions:!(dim_users.name)' \
> | jq -r '.query.sql'
SELECT
"__annotated"."dim_users.name" AS "dim_users.name [TEXT]",
"__annotated"."total_orders" AS "total_orders [INTEGER]"
FROM (
SELECT
"__final"."dim_users.name" AS "dim_users.name",
"__final"."total_orders" AS "total_orders"
FROM (
SELECT
SUM("fact_orders"."amount") AS "total_orders",
"dim_users"."name" AS "dim_users.name"
FROM "fact_orders" AS "fact_orders"
JOIN "dim_users"
ON "fact_orders"."user_id" = "dim_users"."id"
WHERE
"fact_orders"."status" <> 'canceled'
GROUP BY
"dim_users"."name"
) AS __final
) AS __annotated
SELECT
"__final"."dim_users.name" AS "dim_users.name",
"__final"."total_orders" AS "total_orders"
FROM (
SELECT
SUM("fact_orders"."amount") AS "total_orders",
"dim_users"."name" AS "dim_users.name"
FROM "fact_orders" AS "fact_orders"
JOIN "dim_users"
ON "fact_orders"."user_id" = "dim_users"."id"
WHERE
"fact_orders"."status" <> 'canceled'
GROUP BY
"dim_users"."name"
) AS __final
SELECT
"__final"."dim_users.name" AS "dim_users.name",
"__final"."total_orders" AS "total_orders"
FROM (
SELECT
SUM("fact_orders"."amount") AS "total_orders",
"dim_users"."name" AS "dim_users.name"
FROM "fact_orders" AS "fact_orders"
JOIN "dim_users"
ON "fact_orders"."user_id" = "dim_users"."id"
WHERE
"fact_orders"."status" <> 'canceled'
GROUP BY
"dim_users"."name"
) AS __final
We can also get the results directly. Let’s look at total_orders by fact_orders.status:
% curl 'http://localhost:1974/db/run?q=metrics:!(total_orders),dimensions:!(fact_orders.status)' | jq
{
"database": "db",
"metrics": [
{
"name": "total_orders",
"type": "NUMBER",
"description": null,
"joins": null
}
],
"dimensions": [
{
"name": "fact_orders.status",
"type": "STRING",
"grain": null,
"incremental": false,
"description": null
}
],
"rows": [
{
"fact_orders.status": "complete",
"total_orders": 350.0
},
{
"fact_orders.status": "pending",
"total_orders": 100.0
}
]
}
Even though this is a simple example, we can already see why Cantrip is special:
Metrics are defined using native SQL. They can be as complex as you want, as long as they produce a single aggregated column and read from a single relation – which can itself be a view with complex joins. This gives you the full power of SQL when defining metrics.
Semantics are stored in the database itself. Metrics are defined as views, and relationships between metric and dimensions are either inferred from foreign keys or discovered from annotations (as we’ll see in the next section).
There is no vendor lock-in. You can use Cantrip with any database supported by SQLAlchemy and sqlglot, and moving from one database to another is just a matter of redefining the views.
In the next section we’ll look at a slightly more complex example.
Documentation