Annotations¶

Cantrip follows a principle called “mostly inferred, progressively enrichable”. It uses the information_schema to read table schemas, views, and relationships, in order to build a semantic map of the database. But that information is not enough and, sometimes, is ambiguous. To resolve ambiguity and provide more context we can use annotations; how this is done depends on the database being used.


Database selector loading…

Let’s look at a slightly more complex example to see what this means. Imagine our fact table looks like this instead:

CREATE TABLE fact_orders (
  id INTEGER PRIMARY KEY,
  buyer_id INTEGER,
  seller_id INTEGER,
  amount INTEGER,
  FOREIGN KEY (buyer_id) REFERENCES dim_users(id),
  FOREIGN KEY (seller_id) REFERENCES dim_users(id)
);
CREATE TABLE fact_orders (
  id INTEGER,
  buyer_id INTEGER,
  seller_id INTEGER,
  amount INTEGER,
  FOREIGN KEY (buyer_id) REFERENCES dim_users(id),
  FOREIGN KEY (seller_id) REFERENCES dim_users(id)
);
CREATE TABLE fact_orders (
  id SERIAL PRIMARY KEY,
  buyer_id INTEGER REFERENCES dim_users(id),
  seller_id INTEGER REFERENCES dim_users(id),
  amount INTEGER
);

Because we now have 2 columns that can be joined with dim_users, requesting {total_orders} by {dim_users.name} is ambiguous.

digraph metric_dimension_path { graph [fontsize=10, fontname="SFMono-Regular, SF Mono, Menlo, Monaco, monospace", nodesep="0.35", ranksep="0.55", bgcolor="transparent"]; node [fontname="SFMono-Regular, SF Mono, Menlo, Monaco, monospace", fontsize=10, style="filled", penwidth=1.2]; edge [fontname="SFMono-Regular, SF Mono, Menlo, Monaco, monospace", fontsize=9, color="#4b5563", fontcolor="#4b5563", arrowsize=0.8]; metric_total_orders [ label="METRIC total_orders", shape=box, style="rounded,filled", peripheries=2, fillcolor="#dbeafe", color="#1d4ed8", fontcolor="#0b1220", ]; dimension_user_name [ label="DIMENSION dim_users.name", shape=box, style="rounded,filled", peripheries=2, fillcolor="#ccfbf1", color="#0f766e", fontcolor="#0b1220", ]; dimension_fact_orders_amount [ label="DIMENSION fact_orders.amount", shape=box, style="rounded,filled", fillcolor="#ccfbf1", color="#0f766e", fontcolor="#0b1220", ]; dimension_fact_orders_status [ label="DIMENSION fact_orders.status", shape=box, style="rounded,filled", fillcolor="#ccfbf1", color="#0f766e", fontcolor="#0b1220", ]; view_total_orders [ label="VIEW total_orders", shape=box, style="filled,dashed", fillcolor="#fef3c7", color="#b45309", fontcolor="#0b1220", ]; table_fact_orders [ label="TABLE fact_orders", shape=box, style="filled", fillcolor="#fef3c7", color="#b45309", fontcolor="#0b1220", ]; table_dim_users [ label="TABLE dim_users", shape=box, style="filled", fillcolor="#fef3c7", color="#b45309", fontcolor="#0b1220", ]; join_users_orders [ label="JOIN fact_orders.? = dim_users.id", shape=box, style="filled,dotted", fillcolor="#fef3c7", color="#b45309", fontcolor="#0b1220", ]; request [ label="QUERY {total_orders} by {dim_users.name}", shape=note, fillcolor="#f3f4f6", color="#4b5563", ]; metric_total_orders -> view_total_orders; view_total_orders -> table_fact_orders; table_fact_orders -> table_dim_users [label="buyer_id", style=dashed]; table_fact_orders -> table_dim_users [label="seller_id", style=dashed]; dimension_user_name -> table_dim_users; dimension_fact_orders_amount -> table_fact_orders; dimension_fact_orders_status -> table_fact_orders; table_fact_orders -> join_users_orders; table_dim_users -> join_users_orders; metric_total_orders -> request; dimension_user_name -> request; join_users_orders -> request [label="?", style=dotted]; { rank=same; table_fact_orders; table_dim_users; view_total_orders; } }

digraph metric_dimension_path { graph [fontsize=10, fontname="SFMono-Regular, SF Mono, Menlo, Monaco, monospace", nodesep="0.35", ranksep="0.55", bgcolor="transparent"]; node [fontname="SFMono-Regular, SF Mono, Menlo, Monaco, monospace", fontsize=10, style="filled", penwidth=1.2]; edge [fontname="SFMono-Regular, SF Mono, Menlo, Monaco, monospace", fontsize=9, color="#94a3b8", fontcolor="#94a3b8", arrowsize=0.8]; metric_total_orders [ label="METRIC total_orders", shape=box, style="rounded,filled", peripheries=2, fillcolor="#1e3a8a", color="#93c5fd", fontcolor="#e5e7eb" ]; dimension_user_name [ label="DIMENSION dim_users.name", shape=box, style="rounded,filled", peripheries=2, fillcolor="#134e4a", color="#5eead4", fontcolor="#e5e7eb" ]; dimension_fact_orders_amount [ label="DIMENSION fact_orders.amount", shape=box, style="rounded,filled", fillcolor="#134e4a", color="#5eead4", fontcolor="#e5e7eb" ]; dimension_fact_orders_status [ label="DIMENSION fact_orders.status", shape=box, style="rounded,filled", fillcolor="#134e4a", color="#5eead4", fontcolor="#e5e7eb" ]; view_total_orders [ label="VIEW total_orders", shape=box, style="filled,dashed", fillcolor="#78350f", color="#fcd34d", fontcolor="#e5e7eb" ]; table_fact_orders [ label="TABLE fact_orders", shape=box, style="filled", fillcolor="#78350f", color="#fcd34d", fontcolor="#e5e7eb" ]; table_dim_users [ label="TABLE dim_users", shape=box, style="filled", fillcolor="#78350f", color="#fcd34d", fontcolor="#e5e7eb" ]; join_users_orders [ label="JOIN fact_orders.? = dim_users.id", shape=box, style="filled,dotted", fillcolor="#78350f", color="#fcd34d", fontcolor="#e5e7eb" ]; request [ label="QUERY {total_orders} by {dim_users.name}", shape=note, fillcolor="#334155", color="#cbd5e1", fontcolor="#e5e7eb" ]; metric_total_orders -> view_total_orders; view_total_orders -> table_fact_orders; table_fact_orders -> table_dim_users [label="buyer_id", style=dashed]; table_fact_orders -> table_dim_users [label="seller_id", style=dashed]; dimension_user_name -> table_dim_users; dimension_fact_orders_amount -> table_fact_orders; dimension_fact_orders_status -> table_fact_orders; table_fact_orders -> join_users_orders; table_dim_users -> join_users_orders; metric_total_orders -> request; dimension_user_name -> request; join_users_orders -> request [label="?", style=dotted]; { rank=same; table_fact_orders; table_dim_users; view_total_orders; } }

We can resolve that ambiguity by annotating our metric definition:

CREATE VIEW total_orders (total_orders) AS
SELECT SUM(amount) AS "total_orders [join=seller_id]"
FROM fact_orders
WHERE status <> 'canceled';
CREATE VIEW total_orders AS
SELECT SUM(amount) AS total_orders
FROM fact_orders
WHERE status <> 'canceled';

COMMENT ON COLUMN total_orders.total_orders IS 'join=seller_id';
CREATE VIEW total_orders AS
SELECT SUM(amount) AS total_orders
FROM fact_orders
WHERE status <> 'canceled';

COMMENT ON COLUMN total_orders.total_orders IS 'join=seller_id';

digraph metric_dimension_path { graph [fontsize=10, fontname="SFMono-Regular, SF Mono, Menlo, Monaco, monospace", nodesep="0.35", ranksep="0.55", bgcolor="transparent"]; node [fontname="SFMono-Regular, SF Mono, Menlo, Monaco, monospace", fontsize=10, style="filled", penwidth=1.2]; edge [fontname="SFMono-Regular, SF Mono, Menlo, Monaco, monospace", fontsize=9, color="#4b5563", fontcolor="#4b5563", arrowsize=0.8]; metric_total_orders [ label="METRIC total_orders", shape=box, style="rounded,filled", peripheries=2, fillcolor="#dbeafe", color="#1d4ed8", fontcolor="#0b1220", ]; dimension_user_name [ label="DIMENSION dim_users.name", shape=box, style="rounded,filled", peripheries=2, fillcolor="#ccfbf1", color="#0f766e", fontcolor="#0b1220", ]; dimension_fact_orders_amount [ label="DIMENSION fact_orders.amount", shape=box, style="rounded,filled", fillcolor="#ccfbf1", color="#0f766e", fontcolor="#0b1220", ]; dimension_fact_orders_status [ label="DIMENSION fact_orders.status", shape=box, style="rounded,filled", fillcolor="#ccfbf1", color="#0f766e", fontcolor="#0b1220", ]; view_total_orders [ label="VIEW total_orders", shape=box, style="filled,dashed", fillcolor="#fef3c7", color="#b45309", fontcolor="#0b1220", ]; table_fact_orders [ label="TABLE fact_orders", shape=box, style="filled", fillcolor="#fef3c7", color="#b45309", fontcolor="#0b1220", ]; table_dim_users [ label="TABLE dim_users", shape=box, style="filled", fillcolor="#fef3c7", color="#b45309", fontcolor="#0b1220", ]; join_users_orders [ label="JOIN fact_orders.seller_id = dim_users.id", shape=box, style="filled,dotted", fillcolor="#fef3c7", color="#b45309", fontcolor="#0b1220", ]; request [ label="QUERY {total_orders} by {dim_users.name}", shape=note, fillcolor="#f3f4f6", color="#4b5563", ]; metric_total_orders -> view_total_orders; view_total_orders -> table_fact_orders [penwidth=2]; table_fact_orders -> table_dim_users [label="buyer_id", style=dashed]; table_fact_orders -> table_dim_users [label="seller_id", penwidth=2]; dimension_user_name -> table_dim_users; dimension_fact_orders_amount -> table_fact_orders; dimension_fact_orders_status -> table_fact_orders; table_fact_orders -> join_users_orders [penwidth=2]; table_dim_users -> join_users_orders [penwidth=2]; metric_total_orders -> request; dimension_user_name -> request; join_users_orders -> request [penwidth=2]; { rank=same; table_fact_orders; table_dim_users; view_total_orders; } }

digraph metric_dimension_path { graph [fontsize=10, fontname="SFMono-Regular, SF Mono, Menlo, Monaco, monospace", nodesep="0.35", ranksep="0.55", bgcolor="transparent"]; node [fontname="SFMono-Regular, SF Mono, Menlo, Monaco, monospace", fontsize=10, style="filled", penwidth=1.2]; edge [fontname="SFMono-Regular, SF Mono, Menlo, Monaco, monospace", fontsize=9, color="#94a3b8", fontcolor="#94a3b8", arrowsize=0.8]; metric_total_orders [ label="METRIC total_orders", shape=box, style="rounded,filled", peripheries=2, fillcolor="#1e3a8a", color="#93c5fd", fontcolor="#e5e7eb" ]; dimension_user_name [ label="DIMENSION dim_users.name", shape=box, style="rounded,filled", peripheries=2, fillcolor="#134e4a", color="#5eead4", fontcolor="#e5e7eb" ]; dimension_fact_orders_amount [ label="DIMENSION fact_orders.amount", shape=box, style="rounded,filled", fillcolor="#134e4a", color="#5eead4", fontcolor="#e5e7eb" ]; dimension_fact_orders_status [ label="DIMENSION fact_orders.status", shape=box, style="rounded,filled", fillcolor="#134e4a", color="#5eead4", fontcolor="#e5e7eb" ]; view_total_orders [ label="VIEW total_orders", shape=box, style="filled,dashed", fillcolor="#78350f", color="#fcd34d", fontcolor="#e5e7eb" ]; table_fact_orders [ label="TABLE fact_orders", shape=box, style="filled", fillcolor="#78350f", color="#fcd34d", fontcolor="#e5e7eb" ]; table_dim_users [ label="TABLE dim_users", shape=box, style="filled", fillcolor="#78350f", color="#fcd34d", fontcolor="#e5e7eb" ]; join_users_orders [ label="JOIN fact_orders.seller_id = dim_users.id", shape=box, style="filled,dotted", fillcolor="#78350f", color="#fcd34d", fontcolor="#e5e7eb" ]; request [ label="QUERY {total_orders} by {dim_users.name}", shape=note, fillcolor="#334155", color="#cbd5e1", fontcolor="#e5e7eb" ]; metric_total_orders -> view_total_orders; view_total_orders -> table_fact_orders [penwidth=2]; table_fact_orders -> table_dim_users [label="buyer_id", style=dashed]; table_fact_orders -> table_dim_users [label="seller_id", penwidth=2]; dimension_user_name -> table_dim_users; dimension_fact_orders_amount -> table_fact_orders; dimension_fact_orders_status -> table_fact_orders; table_fact_orders -> join_users_orders [penwidth=2]; table_dim_users -> join_users_orders [penwidth=2]; metric_total_orders -> request; dimension_user_name -> request; join_users_orders -> request [penwidth=2]; { rank=same; table_fact_orders; table_dim_users; view_total_orders; } }

This is a core concept in Cantrip: metadata is always stored in the views themselves. There’s no separate store for the definitions, no sidecar tables, no YAML – just views in your database. They can even live in a separate namespace (schema), to prevent polluting your warehouse.

Annotations can be used for many different things. One common example is to indicate foreign key relationships when they’re missing from the database, using fk=. Here’s how we would connect fact_orders to dim_users if the table didn’t have foreign keys defined in the database:

-- cantrip: description=fact_orders annotated with FKs
CREATE VIEW fact_orders_annotated (id, buyer_id, seller_id, amount, status) AS
SELECT
  id,
  buyer_id AS "buyer_id [fk=dim_users.id]",
  seller_id AS "seller_id [fk=dim_users.id]",
  amount AS "amount [description=How many were bought]",
  status AS "status [description='pending, complete, or canceled']"
FROM fact_orders;
CREATE VIEW fact_orders_annotated AS
SELECT
  id,
  buyer_id,
  seller_id,
  amount,
  status
FROM fact_orders;

COMMENT ON VIEW fact_orders_annotated IS 'description=fact_orders annotated with FKs';
COMMENT ON COLUMN fact_orders_annotated.buyer_id IS 'fk=dim_users.id';
COMMENT ON COLUMN fact_orders_annotated.seller_id IS 'fk=dim_users.id';
COMMENT ON COLUMN fact_orders_annotated.amount IS 'description=How many were bought';
COMMENT ON COLUMN fact_orders_annotated.status IS 'description="pending, complete, or canceled"';
CREATE VIEW fact_orders_annotated AS
SELECT
  id,
  buyer_id,
  seller_id,
  amount,
  status
FROM fact_orders;

COMMENT ON VIEW fact_orders_annotated IS 'description=fact_orders annotated with FKs';
COMMENT ON COLUMN fact_orders_annotated.buyer_id IS 'fk=dim_users.id';
COMMENT ON COLUMN fact_orders_annotated.seller_id IS 'fk=dim_users.id';
COMMENT ON COLUMN fact_orders_annotated.amount IS 'description=How many were bought';
COMMENT ON COLUMN fact_orders_annotated.status IS 'description="pending, complete, or canceled"';

And as you can see in the example above, another common use case is to add descriptions to metrics, dimensions, and views. The full list of supported annotations is:

description=long description

Provides a description for a metric, dimension, or view. Description must be in quotes (single or double) if it contains a comma.

fk=$relation.$column

Points to a column in a table or view. Should be used only to indicate 1:1 (one-to-one) or n:1 (many-to-one) relationships, otherwise fanout might introduce incorrect results.

type=$type

Specifies the resulting type of an expression. Useful for SQLite, which has a more general dynamic type system.

join=$column

Indicates the preferred column for a join when multiple paths are available. Can be specified multiple times, for example, join=event_time, join=seller_id.

partition=latest and partition=event

Indicates how to join a fact table to a slowly changing dimension table.

incremental=true|false

Declares a column as append-only. This is not used by Cantrip, but can be leveraged by downstream services like caching.

time_grain=$grain

Indicates the time grain of a column, for example, time_grain=P1D (ISO) or time_grain=second (English). Used for time joins and virtual dimensions.

geo_grain=$grain

Indicates the spatial grain of a column. Used for spatial joins.

In the next example we’ll look at another unique feature of Cantrip: enrichment dimensions.