Features yml file
Features are references to columns in a SQL table. Rasgo stores metadata about features to help users discover and consume them.
Feature metadata can be defined using a yml file or a dict.
Structure of a yml file
Features that reside in the same table belong to a DataSource. A yml file describes a DataSource (table) and the Features and Dimensions (columns) in it. Each yml file should describe a single DataSource.
Attributes
Attribute Name | Description | Value constraints |
---|---|---|
name | Name of the DataSource. | (Optional) will default to the sourceTable name if not supplied |
sourceTable | The Snowflake table these features are stored in | Mandatory |
sourceType | The type of data used to import this DataSource | Mandatory: restricted value in list [table, dataframe, csv] |
sourceCode | The SQL or Python code used to create this feature (if there is value in storing it) | (Optional) free-form text field |
tags | Free-form text tags to apply to all features | (Optional) List of strings |
attributes | Free-form key:value dicts to apply to all features | (Optional) List of dicts |
dimensions: | -- | -- |
columnName | SQL column name of the dimension | Mandatory: Standard SQL column rules: no spaces or special characters. Best practice to CAPITALIZE all letters |
dataType | SQL datatype of the column | Mandatory: Standard SQL datatypes allowed: string, int, float, date, bool |
granularity | String describing the grain of this column. This will determine what other features can be joined with these features. | Mandatory: Allowed datetime values: hour, day, week, month, quarter, year |
features: | -- | -- |
columnName | SQL column name of the feature | Mandatory: Standard SQL column rules: no spaces or special characters. Best practice to CAPITALIZE all letters |
dataType | SQL data type of the feature | Mandatory: Standard SQL datatypes allowed: string, int, float, date, bool |
displayName | The name that will display in the Rasgo UI | (Optional) Any string value. Spaces and special characters allowed. Best practice to avoid double quotes (") and semicolons (;) |
description | A short description of the feature that will display in the Rasgo UI | (Optional) Any string value. Spaces and special characters allowed. Best practice to avoid double quotes (") and semicolons (;) |
status | Status of the feature: Sandboxed or Productionized | (Optional) restricted value in list: [Productionized, Sandboxed] |
tags | Free-form text tags to apply to this feature only | (Optional) List of strings |
attributes | Free-form key:value dicts to apply to this feature only | (Optional) List of dicts |
Sample file:
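A minimal sketch of such a file, assembled from the attribute table above. The table name, column names, tags, and descriptions are hypothetical, and the nesting of dimensions and features as lists of mappings is an assumption about the layout rather than a confirmed schema:

```yaml
# Hypothetical DataSource: one yml file describes one table.
name: Daily Store Sales            # optional, defaults to the sourceTable name
sourceTable: DAILY_STORE_SALES     # mandatory
sourceType: table                  # mandatory: table, dataframe, or csv
tags:                              # optional, applied to all features
  - sales
attributes:                        # optional, applied to all features
  - owner: analytics

dimensions:
  - columnName: SALE_DATE          # no spaces or special characters
    dataType: date                 # string, int, float, date, or bool
    granularity: day

features:
  - columnName: TOTAL_REVENUE
    dataType: float
    displayName: Total revenue     # optional
    description: Sum of sales revenue per day   # optional
    status: Sandboxed              # optional: Productionized or Sandboxed
    tags:                          # optional, this feature only
      - revenue
```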
"dimensions" are index fields that will be used to join features to other features
"granularity" can be any string that helps uniquely describe a dimension. Granularity is used to determine when dimensions across FeatureSets are of the same "grain" and can be joined to each other.
It is often helpful to think of granularity as a way to tag your features with taxonomy metadata. Consider:
Granularity for datetime fields may be logged as: year, quarter, month, day, second (the grain of a date or datetime column)
Granularity for geolocation data may be logged as: Country, State, CBG, FIPS, zipcode, latlong
Granularity for healthcare data may be logged as: patient, payer, provider, encounter
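As a sketch of how these granularity tags might look in practice, the fragment below declares one datetime dimension and one geolocation dimension. The column names are hypothetical, and the list-of-mappings layout under dimensions is an assumption:

```yaml
# Hypothetical dimensions block for a table keyed by date and zip code.
dimensions:
  - columnName: OBSERVATION_DATE
    dataType: date
    granularity: day        # datetime grain
  - columnName: ZIP
    dataType: string
    granularity: zipcode    # geolocation grain
```

Per the notes above, features from another table whose dimensions are tagged with the same granularity strings would be treated as being of the same grain and therefore joinable.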
The "sourceTable" param accepts either a bare table name or a fully qualified table name (DB.SCHEMA.TABLE). If database and schema are not supplied, Rasgo will fall back to your account's default database and schema. For most accounts these are: Database = RASGO and Schema = PUBLIC
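The two accepted forms of sourceTable can be sketched as follows. The table name is hypothetical, and the fully qualified form assumes the RASGO database and PUBLIC schema mentioned above:

```yaml
# Bare table name: database and schema come from the account defaults.
sourceTable: DAILY_STORE_SALES

# Fully qualified alternative (use one form or the other, not both):
# sourceTable: RASGO.PUBLIC.DAILY_STORE_SALES
```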