Modeling your Data
Best Table Formats for AI
We've found that Rasgo AI understands some table structures better than others. Examples:
Aggregated Summary Tables
AI does best with wide, pre-aggregated tables with all possible dimension slicers and metrics included, at a standard date granularity (i.e. Daily, Weekly, Monthly, etc...)
Dimension Tables
For dimension tables a standard star schema approach works best
It's important that the join columns are named the same in both the fact tables and dimension tables (i.e. PRODUCT_ID in the fact_sales table and PRODUCT_ID in the dim_product table). This removes guesswork when writing the joins.
Fact Tables
For fact tables that should be aggregated when queried, ensure that aggregations to accurately calculate the metrics over time are as straightforward as possible (i.e. if you want to count unique transactions, include transaction ID).
Header & Details Tables
AI can navigate between a header table and line item detail table (i.e. invoices) well, as long as the granularities of both tables are accurate in the notes
Table Formats to Avoid
For certain table formats, AI needs extra guidance to query them well. This is fine if you are an expert on how the table is structured, but be careful sharing these tables with others that are not familiar with them.
Entity - Attribute - Value (EAV) tables
It's best to Unpivot these tables into wide columnar tables
Mapping tables
It's best to pre-join mapping tables into pre-aggregated summary tables to reduce the number of joins required for each analysis query
Hierarchy tables
It's best to flatten hierarchy tables into a single dimensional lookup table, indexed at the lowest granularity
Partitioned tables
For scenarios where you have N number of tables that are identical but partitioned by one ID (such as a Customer ID or a date), it's best to create a VIEW that UNIONS or JOINS all of these tables into a single object, and adds in the partition by column so it's easily queryable
This reduces duplicated documentation and enables additional notes on a single dataset to help AI understand the contents of all of the partitioned tables
Important Notes for Certain Table Types
For certain types of tables, there are important notes that will help Rasgo AI understand your data better.
Aggregated Summary tables
AI does best with these tables when there is a note describing the grain of the table that makes each row unique (for example, a 'grain' could be 'Daily per Product ID per Customer ID')
As-Of tables
An "As-Of" metric table is a type of database table designed to store historical data points or snapshots of metrics as they were at specific points in time. This type of table is useful for tracking how metrics change over time and for performing historical analysis.
For this table type, it's important to identify the table as an as-of with a note, instructing the AI to always filter by date when querying the table unless it's specifically analyzing a change over a discrete time period
Entity - Attribute - Value (EAV) tables
In this model, instead of having a separate column for each attribute, the attributes are stored as rows in a single table.
For this table type, it's important to describe how the tabel is modeled with a note
it's also important to have all distinct values of the 'Entity' column described in the column description so AI knows how to filter on this column's values
Lookup tables
Lookup tables contain a set of predefined, often static, values that other tables can reference
For this table type, it's important to identify the table appropriately and describe how it should be joined with to accurately lookup reference values
Additional Considerations
If you have a fiscal calendar table and want AI to use it instead of the regular calendar, this should be mentioned in your Agent Instructions (including the fully qualified table name)
Adding notes describing the refresh schedule and ETAs of the data can be helpful to the AI when it's answering questions about the data
Pre-calculating important KPIs in summary tables can reduce compute time and increase accuracy
For date columns that will be queried frequently and only have a specific time period available, ask AI to add a note describing the date range of that column
Last updated