Features yml file
Features are references to columns in a SQL table. Rasgo stores metadata about features to help users discover and consume them.
Feature metadata can be defined using a yml file or a dict.
Structure of a yml file
Features that reside in the same table belong to a DataSource. A yml file describes a DataSource (table) and the Features and Dimensions (columns) in it. Each yml file should describe a single DataSource.
Attributes
name
Name of the DataSource.
(Optional) Defaults to the sourceTable name if not supplied.
sourceTable
The Snowflake table these features are stored in
Mandatory
sourceType
The type of data used to import this DataSource
Mandatory: restricted value in list [table, dataframe, csv]
sourceCode
The SQL or Python code used to create these features, if it is worth storing
(Optional) free-form text field
tags
Free-form text tags to apply to all features
(Optional) List of strings
attributes
Free-form k:v dicts to apply to all features
(Optional) List of dicts
dimensions:
columnName
SQL column name of the dimension
Mandatory: Standard SQL column rules: no spaces or special characters.
Best practice to CAPITALIZE all letters
dataType
SQL datatype of the column
Mandatory: Standard SQL datatypes allowed:
string, int, float, date, bool
granularity
String describing the grain of this column. This will determine what other features can be joined with these features.
Mandatory: Any string is allowed. Typical datetime values:
hour, day, week, month, quarter, year
features:
columnName
SQL column name of the feature
Mandatory: Standard SQL column rules: no spaces or special characters.
Best practice to CAPITALIZE all letters
dataType
SQL data type of the feature
Mandatory: Standard SQL datatypes allowed:
string, int, float, date, bool
displayName
The name that will display in the Rasgo UI
(Optional) Any string value. Spaces and special characters allowed.
Best practice to avoid double quotes (") and semicolons (;)
description
A short description of the feature that will display in the Rasgo UI
(Optional) Any string value. Spaces and special characters allowed.
Best practice to avoid double quotes (") and semicolons (;)
status
Status of the feature: Sandboxed or Productionized
(Optional) restricted value in list: [Productionized, Sandboxed]
tags
Free-form text tags to apply to this feature only
(Optional) List of strings
attributes
Free-form k:v dicts to apply to this feature only
(Optional) List of dicts
Sample file:
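A minimal sketch of what such a file could look like, built only from the attributes listed above. The table name, column names, tags, and attribute values are hypothetical. The exact nesting (for example, attributes written as a list of single-key mappings) is an assumption based on the descriptions on this page, not a confirmed Rasgo schema.

```yaml
# Hypothetical DataSource definition; all names and values are illustrative.
name: Daily Weather                      # optional, defaults to sourceTable
sourceTable: RASGO.PUBLIC.DAILY_WEATHER  # mandatory
sourceType: table                        # mandatory: table, dataframe or csv
tags:                                    # optional, applied to all features
  - weather
attributes:                              # optional, applied to all features
  - owner: data-team
dimensions:
  - columnName: OBSERVATION_DATE
    dataType: date
    granularity: day
features:
  - columnName: AVG_TEMP_F
    dataType: float
    displayName: Average Temperature (F)
    description: Mean daily temperature in degrees Fahrenheit
    status: Sandboxed
    tags:                                # optional, this feature only
      - temperature
    attributes:                          # optional, this feature only
      - unit: fahrenheit
```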
"dimensions" are index fields that will be used to join features to other features
"granularity" can be any string that helps uniquely describe a dimension. Granularity is used to determine when dimensions across FeatureSets are of the same "grain" and can be joined to each other.
It is often helpful to think of granularity as a way to tag your features with taxonomy metadata. Consider:
Granularity for datetime fields may be logged as: year, quarter, month, day, second, to define the grain of a date or datetime column.
Granularity for geolocation data may be logged as: Country, State, CBG, FIPS, zipcode, latlong
Granularity for healthcare data may be logged as: patient, payer, provider, encounter
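To make the taxonomy idea concrete, here is a sketch of a dimensions block that tags three columns with a datetime, a geolocation, and a healthcare grain. The column names are hypothetical, and the granularity strings are taken from the examples above.

```yaml
# Hypothetical dimensions block; column names are illustrative.
dimensions:
  - columnName: VISIT_MONTH
    dataType: date
    granularity: month      # datetime grain
  - columnName: ZIP_CODE
    dataType: string
    granularity: zipcode    # geolocation grain
  - columnName: PATIENT_ID
    dataType: string
    granularity: patient    # healthcare grain
```

Under this scheme, a dimension in another DataSource that is also tagged with granularity month would be treated as sharing the grain of VISIT_MONTH and therefore as joinable to it.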
The "sourceTable" param accepts either a bare table name or a fully qualified table name (DB.SCHEMA.TABLE). If database and schema are not supplied, Rasgo falls back to your account's default database and schema. For most accounts these are Database = RASGO and Schema = PUBLIC.
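The two accepted forms of sourceTable can be sketched as follows. The table name DAILY_WEATHER is hypothetical, and each fragment shows only the sourceTable key, not a complete file.

```yaml
# Bare table name: database and schema fall back to the account defaults.
sourceTable: DAILY_WEATHER
---
# Fully qualified name (DB.SCHEMA.TABLE), here using the common defaults.
sourceTable: RASGO.PUBLIC.DAILY_WEATHER
```

On an account whose defaults are RASGO and PUBLIC, both fragments would point at the same table.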