.. _annotations: Annotations =========== Cantrip follows a principle called "mostly inferred, progressively enrichable". It uses the ``information_schema`` to read table schemas, views, and relationships, in order to build a semantic map of the database. But that information is not enough and, sometimes, is ambiguous. To resolve ambiguity and provide more context we can use **annotations**; how this is done depends on the database being used. ---- .. container:: db-selector Database selector loading... Let's look at a slightly more complex example to see what this means. Imagine our fact table looks like this instead: .. code-block:: sql :class: db-example db-sqlite CREATE TABLE fact_orders ( id INTEGER PRIMARY KEY, buyer_id INTEGER, seller_id INTEGER, amount INTEGER, FOREIGN KEY (buyer_id) REFERENCES dim_users(id), FOREIGN KEY (seller_id) REFERENCES dim_users(id) ); .. code-block:: sql :class: db-example db-duckdb CREATE TABLE fact_orders ( id INTEGER, buyer_id INTEGER, seller_id INTEGER, amount INTEGER, FOREIGN KEY (buyer_id) REFERENCES dim_users(id), FOREIGN KEY (seller_id) REFERENCES dim_users(id) ); .. code-block:: sql :class: db-example db-postgres CREATE TABLE fact_orders ( id SERIAL PRIMARY KEY, buyer_id INTEGER REFERENCES dim_users(id), seller_id INTEGER REFERENCES dim_users(id), amount INTEGER ); Because we now have 2 columns that can be joined with ``dim_users``, requesting ``{total_orders}`` by ``{dim_users.name}`` is ambiguous. .. container:: only-light .. graphviz:: _graphs/annotations/ambiguous.light.dot .. container:: only-dark .. graphviz:: _graphs/annotations/ambiguous.dark.dot We can resolve that ambiguity by annotating our metric definition: .. code-block:: sql :class: db-example db-sqlite CREATE VIEW total_orders (total_orders) AS SELECT SUM(amount) AS "total_orders [join=seller_id]" FROM fact_orders WHERE status <> 'canceled'; .. code-block:: sql :class: db-example db-duckdb CREATE VIEW total_orders AS SELECT SUM(amount) AS total_orders FROM fact_orders WHERE status <> 'canceled'; COMMENT ON COLUMN total_orders.total_orders IS 'join=seller_id'; .. code-block:: sql :class: db-example db-postgres CREATE VIEW total_orders AS SELECT SUM(amount) AS total_orders FROM fact_orders WHERE status <> 'canceled'; COMMENT ON COLUMN total_orders.total_orders IS 'join=seller_id'; .. container:: only-light .. graphviz:: _graphs/annotations/unambiguous.light.dot .. container:: only-dark .. graphviz:: _graphs/annotations/unambiguous.dark.dot This is a **core concept** in Cantrip: metadata is always stored in the views themselves. There's no separate store for the definitions, no sidecar tables, no YAML -- just views in your database. They can even live in a separate namespace (schema), to prevent polluting your warehouse. Annotations can be used for many different things. One common example is to indicate foreign key relationships when they're missing from the database, using ``fk=``. Here's how we would connect ``fact_orders`` to ``dim_users`` if the table didn't have foreign keys defined in the database: .. code-block:: sql :class: db-example db-sqlite -- cantrip: description=fact_orders annotated with FKs CREATE VIEW fact_orders_annotated (id, buyer_id, seller_id, amount, status) AS SELECT id, buyer_id AS "buyer_id [fk=dim_users.id]", seller_id AS "seller_id [fk=dim_users.id]", amount AS "amount [description=How many were bought]", status AS "status [description='pending, complete, or canceled']" FROM fact_orders; .. code-block:: sql :class: db-example db-duckdb CREATE VIEW fact_orders_annotated AS SELECT id, buyer_id, seller_id, amount, status FROM fact_orders; COMMENT ON VIEW fact_orders_annotated IS 'description=fact_orders annotated with FKs'; COMMENT ON COLUMN fact_orders_annotated.buyer_id IS 'fk=dim_users.id'; COMMENT ON COLUMN fact_orders_annotated.seller_id IS 'fk=dim_users.id'; COMMENT ON COLUMN fact_orders_annotated.amount IS 'description=How many were bought'; COMMENT ON COLUMN fact_orders_annotated.status IS 'description="pending, complete, or canceled"'; .. code-block:: sql :class: db-example db-postgres CREATE VIEW fact_orders_annotated AS SELECT id, buyer_id, seller_id, amount, status FROM fact_orders; COMMENT ON VIEW fact_orders_annotated IS 'description=fact_orders annotated with FKs'; COMMENT ON COLUMN fact_orders_annotated.buyer_id IS 'fk=dim_users.id'; COMMENT ON COLUMN fact_orders_annotated.seller_id IS 'fk=dim_users.id'; COMMENT ON COLUMN fact_orders_annotated.amount IS 'description=How many were bought'; COMMENT ON COLUMN fact_orders_annotated.status IS 'description="pending, complete, or canceled"'; And as you can see in the example above, another common use case is to add descriptions to metrics, dimensions, and views. The full list of supported annotations is: ``description=long description`` Provides a description for a metric, dimension, or view. Description must be in quotes (single or double) if it contains a comma. ``fk=$relation.$column`` Points to a column in a table or view. Should be used only to indicate `1:1` (one-to-one) or `n:1` (many-to-one) relationships, otherwise **fanout** might introduce incorrect results. ``type=$type`` Specifies the resulting type of an expression. Useful for SQLite, which has a more general dynamic type system. ``join=$column`` Indicates the preferred column for a join when multiple paths are available. Can be specified multiple times, for example, ``join=event_time, join=seller_id``. ``partition=latest`` and ``partition=event`` Indicates how to join a fact table to a `slowly changing dimension table `_. ``incremental=true|false`` Declares a column as append-only. This is not used by Cantrip, but can be leveraged by downstream services like caching. ``time_grain=$grain`` Indicates the time grain of a column, for example, ``time_grain=P1D`` (ISO) or ``time_grain=second`` (English). Used for time joins and :ref:`virtual dimensions `. ``geo_grain=$grain`` Indicates the spatial grain of a column. Used for :ref:`spatial joins `. In the next example we'll look at another unique feature of Cantrip: :ref:`enrichment dimensions `.