Annotations¶
Cantrip follows a principle called “mostly inferred, progressively enrichable”. It uses the information_schema to read table schemas, views, and relationships, in order to build a semantic map of the database. But that information is not enough and, sometimes, is ambiguous. To resolve ambiguity and provide more context we can use annotations; how this is done depends on the database being used.
Database selector loading…
Let’s look at a slightly more complex example to see what this means. Imagine our fact table looks like this instead:
CREATE TABLE fact_orders (
id INTEGER PRIMARY KEY,
buyer_id INTEGER,
seller_id INTEGER,
amount INTEGER,
FOREIGN KEY (buyer_id) REFERENCES dim_users(id),
FOREIGN KEY (seller_id) REFERENCES dim_users(id)
);
CREATE TABLE fact_orders (
id INTEGER,
buyer_id INTEGER,
seller_id INTEGER,
amount INTEGER,
FOREIGN KEY (buyer_id) REFERENCES dim_users(id),
FOREIGN KEY (seller_id) REFERENCES dim_users(id)
);
CREATE TABLE fact_orders (
id SERIAL PRIMARY KEY,
buyer_id INTEGER REFERENCES dim_users(id),
seller_id INTEGER REFERENCES dim_users(id),
amount INTEGER
);
Because we now have 2 columns that can be joined with dim_users, requesting {total_orders} by {dim_users.name} is ambiguous.
We can resolve that ambiguity by annotating our metric definition:
CREATE VIEW total_orders (total_orders) AS
SELECT SUM(amount) AS "total_orders [join=seller_id]"
FROM fact_orders
WHERE status <> 'canceled';
CREATE VIEW total_orders AS
SELECT SUM(amount) AS total_orders
FROM fact_orders
WHERE status <> 'canceled';
COMMENT ON COLUMN total_orders.total_orders IS 'join=seller_id';
CREATE VIEW total_orders AS
SELECT SUM(amount) AS total_orders
FROM fact_orders
WHERE status <> 'canceled';
COMMENT ON COLUMN total_orders.total_orders IS 'join=seller_id';
This is a core concept in Cantrip: metadata is always stored in the views themselves. There’s no separate store for the definitions, no sidecar tables, no YAML – just views in your database. They can even live in a separate namespace (schema), to prevent polluting your warehouse.
Annotations can be used for many different things. One common example is to indicate foreign key relationships when they’re missing from the database, using fk=. Here’s how we would connect fact_orders to dim_users if the table didn’t have foreign keys defined in the database:
-- cantrip: description=fact_orders annotated with FKs
CREATE VIEW fact_orders_annotated (id, buyer_id, seller_id, amount, status) AS
SELECT
id,
buyer_id AS "buyer_id [fk=dim_users.id]",
seller_id AS "seller_id [fk=dim_users.id]",
amount AS "amount [description=How many were bought]",
status AS "status [description='pending, complete, or canceled']"
FROM fact_orders;
CREATE VIEW fact_orders_annotated AS
SELECT
id,
buyer_id,
seller_id,
amount,
status
FROM fact_orders;
COMMENT ON VIEW fact_orders_annotated IS 'description=fact_orders annotated with FKs';
COMMENT ON COLUMN fact_orders_annotated.buyer_id IS 'fk=dim_users.id';
COMMENT ON COLUMN fact_orders_annotated.seller_id IS 'fk=dim_users.id';
COMMENT ON COLUMN fact_orders_annotated.amount IS 'description=How many were bought';
COMMENT ON COLUMN fact_orders_annotated.status IS 'description="pending, complete, or canceled"';
CREATE VIEW fact_orders_annotated AS
SELECT
id,
buyer_id,
seller_id,
amount,
status
FROM fact_orders;
COMMENT ON VIEW fact_orders_annotated IS 'description=fact_orders annotated with FKs';
COMMENT ON COLUMN fact_orders_annotated.buyer_id IS 'fk=dim_users.id';
COMMENT ON COLUMN fact_orders_annotated.seller_id IS 'fk=dim_users.id';
COMMENT ON COLUMN fact_orders_annotated.amount IS 'description=How many were bought';
COMMENT ON COLUMN fact_orders_annotated.status IS 'description="pending, complete, or canceled"';
And as you can see in the example above, another common use case is to add descriptions to metrics, dimensions, and views. The full list of supported annotations is:
description=long descriptionProvides a description for a metric, dimension, or view. Description must be in quotes (single or double) if it contains a comma.
fk=$relation.$columnPoints to a column in a table or view. Should be used only to indicate 1:1 (one-to-one) or n:1 (many-to-one) relationships, otherwise fanout might introduce incorrect results.
type=$typeSpecifies the resulting type of an expression. Useful for SQLite, which has a more general dynamic type system.
join=$columnIndicates the preferred column for a join when multiple paths are available. Can be specified multiple times, for example,
join=event_time, join=seller_id.partition=latestandpartition=eventIndicates how to join a fact table to a slowly changing dimension table.
incremental=true|falseDeclares a column as append-only. This is not used by Cantrip, but can be leveraged by downstream services like caching.
time_grain=$grainIndicates the time grain of a column, for example,
time_grain=P1D(ISO) ortime_grain=second(English). Used for time joins and virtual dimensions.geo_grain=$grainIndicates the spatial grain of a column. Used for spatial joins.
In the next example we’ll look at another unique feature of Cantrip: enrichment dimensions.