Modeling your Data

Best Table Formats for AI

We've found that Rasgo AI understands some table structures better than others. Examples:

  • Aggregated Summary Tables

    • AI does best with wide, pre-aggregated tables with all possible dimension slicers and metrics included, at a standard date granularity (i.e. Daily, Weekly, Monthly, etc...)

  • Dimension Tables

    • For dimension tables a standard star schema approach works best

    • It's important that the join columns are named the same in both the fact tables and dimension tables (i.e. PRODUCT_ID in the fact_sales table and PRODUCT_ID in the dim_product table). This removes guesswork when writing the joins.

  • Fact Tables

    • For fact tables that should be aggregated when queried, ensure that aggregations to accurately calculate the metrics over time are as straightforward as possible (i.e. if you want to count unique transactions, include transaction ID).

  • Header & Details Tables

    • AI can navigate between a header table and line item detail table (i.e. invoices) well, as long as the granularities of both tables are accurate in the notes

Table Formats to Avoid

For certain table formats, AI needs extra guidance to query them well. This is fine if you are an expert on how the table is structured, but be careful sharing these tables with others that are not familiar with them.

  1. Entity - Attribute - Value (EAV) tables

    • It's best to Unpivot these tables into wide columnar tables

  2. Mapping tables

    • It's best to pre-join mapping tables into pre-aggregated summary tables to reduce the number of joins required for each analysis query

  3. Hierarchy tables

    • It's best to flatten hierarchy tables into a single dimensional lookup table, indexed at the lowest granularity

  4. Partitioned tables

    • For scenarios where you have N number of tables that are identical but partitioned by one ID (such as a Customer ID or a date), it's best to create a VIEW that UNIONS or JOINS all of these tables into a single object, and adds in the partition by column so it's easily queryable

    • This reduces duplicated documentation and enables additional notes on a single dataset to help AI understand the contents of all of the partitioned tables

Important Notes for Certain Table Types

For certain types of tables, there are important notes that will help Rasgo AI understand your data better.

  • Aggregated Summary tables

    • AI does best with these tables when there is a note describing the grain of the table that makes each row unique (for example, a 'grain' could be 'Daily per Product ID per Customer ID')

  • As-Of tables

    • An "As-Of" metric table is a type of database table designed to store historical data points or snapshots of metrics as they were at specific points in time. This type of table is useful for tracking how metrics change over time and for performing historical analysis.

    • For this table type, it's important to identify the table as an as-of with a note, instructing the AI to always filter by date when querying the table unless it's specifically analyzing a change over a discrete time period

  • Entity - Attribute - Value (EAV) tables

    • In this model, instead of having a separate column for each attribute, the attributes are stored as rows in a single table.

    • For this table type, it's important to describe how the tabel is modeled with a note

    • it's also important to have all distinct values of the 'Entity' column described in the column description so AI knows how to filter on this column's values

  • Lookup tables

    • Lookup tables contain a set of predefined, often static, values that other tables can reference

    • For this table type, it's important to identify the table appropriately and describe how it should be joined with to accurately lookup reference values

Additional Considerations

  • If you have a fiscal calendar table and want AI to use it instead of the regular calendar, this should be mentioned in your Agent Instructions (including the fully qualified table name)

  • Adding notes describing the refresh schedule and ETAs of the data can be helpful to the AI when it's answering questions about the data

  • Pre-calculating important KPIs in summary tables can reduce compute time and increase accuracy

  • For date columns that will be queried frequently and only have a specific time period available, ask AI to add a note describing the date range of that column

Last updated