# Features yml file

A feature is a reference to a column in a SQL table. Rasgo stores metadata about features to help users discover and consume them.

Feature metadata can be defined using a yml file or a dict.

#### Structure of a yml file

Features that reside in the same table belong to a DataSource. A yml file describes a DataSource (table) and the Features and Dimensions (columns) in it. Each yml file should describe a single DataSource.

#### Attributes

| Attribute Name | Description                                                                                                            | Value constraints                                                                                                                            |
| -------------- | ---------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
| name           | Name of the DataSource.                                                                                                | (Optional) will default to the sourceTable name if not supplied                                                                              |
| sourceTable    | The Snowflake table these features are stored in                                                                       | Mandatory                                                                                                                                    |
| sourceType     | The type of data used to import this DataSource                                                                        | Mandatory: restricted value in list \[table, dataframe, csv]                                                                                 |
| sourceCode     | The sql or python code used to create this feature (assuming there is value in storing this)                           | (Optional) free-form text field                                                                                                              |
| tags           | Free-form text tags to apply to all features                                                                           | (Optional) List of strings                                                                                                                   |
| attributes     | Free-form k:v dicts to apply to all features                                                                           | (Optional) List of dicts                                                                                                                     |
| dimensions:    | --                                                                                                                     | --                                                                                                                                           |
| columnName     | SQL column name of the dimension                                                                                       | <p>Mandatory: Standard SQL column rules: no spaces or special characters.</p><p>Best practice to CAPITALIZE all letters</p>                  |
| dataType       | SQL datatype of the column                                                                                             | <p>Mandatory: Standard SQL datatypes allowed:</p><p>string, int, float, date, bool</p>                                                       |
| granularity    | String describing the grain of this column. This will determine what other features can be joined with these features. | <p>Mandatory: any string. Common datetime values:</p><p>hour, day, week, month, quarter, year</p>                                            |
| features:      | --                                                                                                                     | --                                                                                                                                           |
| columnName     | SQL column name of the feature                                                                                         | <p>Mandatory: Standard SQL column rules: no spaces or special characters.</p><p>Best practice to CAPITALIZE all letters</p>                  |
| dataType       | SQL data type of the feature                                                                                           | <p>Mandatory: Standard SQL datatypes allowed:</p><p>string, int, float, date, bool</p>                                                       |
| displayName    | The name that will display in the Rasgo UI                                                                             | <p>(Optional) Any string value. Spaces and special characters allowed.</p><p>Best practice to avoid double quotes (") and semicolons (;)</p> |
| description    | A short description of the feature that will display in the Rasgo UI                                                   | <p>(Optional) Any string value. Spaces and special characters allowed.</p><p>Best practice to avoid double quotes (") and semicolons (;)</p> |
| status         | Status of the feature: Sandboxed or Productionized                                                                     | (Optional) restricted value in list: \[Productionized, Sandboxed]                                                                            |
| tags           | Free-form text tags to apply to this feature only                                                                      | (Optional) List of strings                                                                                                                   |
| attributes     | Free-form k:v dicts to apply to this feature only                                                                      | (Optional) List of dicts                                                                                                                     |

#### Sample file:

```
name: "Customer Transactions"
sourceType: table
sourceTable: CUSTOMER_TRANSACTIONS
tags:
- apply_to_all_features
features:
- columnName: TRANS_AMT
  displayName: "Transaction Amount"
  dataType: float
  description: "Total of transaction in USD"
  status: Productionized
  tags:
  - USD
- columnName: ITEM_CT
  displayName: "Item Count"
  dataType: int
  description: "Number of items in cart"
  status: Productionized
- columnName: STORE_NAME
  displayName: "Store Name"
  dataType: string
  description: "Name of store"
  status: Productionized
- columnName: COUPONS_USED
  displayName: "Coupons Used"
  dataType: bool
  description: "Were any coupons used"
  status: Productionized
dimensions:
- columnName: TRANS_DATE
  dataType: date
  granularity: day
- columnName: CUSTOMER_ID
  dataType: int
  granularity: customer
```
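
Because feature metadata can also be supplied as a dict, part of the sample can be mirrored in Python and checked against the attribute table before it is handed to Rasgo. The sketch below is illustrative only: `validate_datasource` is a hypothetical helper, not a Rasgo API, and it encodes just the constraints listed in the attributes table.

```python
# Allowed values taken from the attributes table in this page.
ALLOWED_SOURCE_TYPES = {"table", "dataframe", "csv"}
ALLOWED_DATA_TYPES = {"string", "int", "float", "date", "bool"}
ALLOWED_STATUSES = {"Productionized", "Sandboxed"}

# Dict form of part of the sample file (two features, both dimensions).
datasource = {
    "name": "Customer Transactions",
    "sourceType": "table",
    "sourceTable": "CUSTOMER_TRANSACTIONS",
    "tags": ["apply_to_all_features"],
    "features": [
        {
            "columnName": "TRANS_AMT",
            "displayName": "Transaction Amount",
            "dataType": "float",
            "description": "Total of transaction in USD",
            "status": "Productionized",
            "tags": ["USD"],
        },
        {
            "columnName": "COUPONS_USED",
            "displayName": "Coupons Used",
            "dataType": "bool",
            "description": "Were any coupons used",
            "status": "Productionized",
        },
    ],
    "dimensions": [
        {"columnName": "TRANS_DATE", "dataType": "date", "granularity": "day"},
        {"columnName": "CUSTOMER_ID", "dataType": "int", "granularity": "customer"},
    ],
}


def validate_datasource(ds):
    """Hypothetical check: return a list of problems (empty if none found)."""
    problems = []
    if not ds.get("sourceTable"):
        problems.append("sourceTable is mandatory")
    if ds.get("sourceType") not in ALLOWED_SOURCE_TYPES:
        problems.append("sourceType must be one of table, dataframe, csv")
    for section in ("dimensions", "features"):
        for col in ds.get(section, []):
            label = f"{section}[{col.get('columnName', '?')}]"
            if not col.get("columnName"):
                problems.append(f"{label}: columnName is mandatory")
            if col.get("dataType") not in ALLOWED_DATA_TYPES:
                problems.append(f"{label}: unsupported dataType {col.get('dataType')!r}")
            # granularity is mandatory for dimensions only
            if section == "dimensions" and not col.get("granularity"):
                problems.append(f"{label}: granularity is mandatory")
            # status is optional for features, but restricted when present
            status = col.get("status")
            if section == "features" and status is not None and status not in ALLOWED_STATUSES:
                problems.append(f"{label}: unsupported status {status!r}")
    return problems


print(validate_datasource(datasource))
```

A check like this catches values outside the documented lists, such as a `dataType` of `integer` instead of `int`, before the metadata is submitted.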

{% hint style="info" %}
"dimensions" are index fields. They are the columns used to join the features in this DataSource to features in other DataSources.
{% endhint %}

{% hint style="info" %}
"granularity" can be any string that helps uniquely describe a dimension. Granularity is used to determine when dimensions across FeatureSets are of the same "grain" and can be joined to each other.

It is often helpful to think of granularity as a way to tag your features with taxonomy metadata. Consider:

Granularity for datetime fields may be logged as: year, quarter, month, day, second. These values define the grain of a date or datetime column.

Granularity for geolocation data may be logged as: Country, State, CBG, FIPS, zipcode, latlong

Granularity for healthcare data may be logged as: patient, payer, provider, encounter
{% endhint %}
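
To illustrate how granularity signals which dimensions line up, the sketch below compares the dimensions of two DataSources and reports the grains they share. It assumes matching is plain, case-sensitive string equality, which this page does not confirm; `shared_granularities` and the weather DataSource are hypothetical.

```python
def shared_granularities(dims_a, dims_b):
    """Return the granularity strings that appear in both dimension lists."""
    grains_a = {d["granularity"] for d in dims_a}
    grains_b = {d["granularity"] for d in dims_b}
    return grains_a & grains_b


# Dimensions from the sample file on this page.
transactions_dims = [
    {"columnName": "TRANS_DATE", "dataType": "date", "granularity": "day"},
    {"columnName": "CUSTOMER_ID", "dataType": "int", "granularity": "customer"},
]

# Hypothetical second DataSource, invented for this example.
weather_dims = [
    {"columnName": "OBS_DATE", "dataType": "date", "granularity": "day"},
    {"columnName": "ZIP", "dataType": "string", "granularity": "zipcode"},
]

print(shared_granularities(transactions_dims, weather_dims))  # {'day'}
```

Here the two sources share only the `day` grain, so `TRANS_DATE` and `OBS_DATE` are the candidate join columns.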

{% hint style="info" %}
The "sourceTable" param accepts either a bare table name or a fully qualified table name (DB.SCHEMA.TABLE). If database and schema are not supplied, Rasgo falls back to your account's defaults. For most accounts these are: Database = RASGO and Schema = PUBLIC.
{% endhint %}
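
The fallback described above can be sketched as a small helper that expands a bare table name into its fully qualified form. This is an illustration, not Rasgo's implementation: `qualify_table` is a hypothetical name, the defaults are the typical values quoted above, and rejecting two-part names is an assumption, since only the bare and three-part forms are documented here.

```python
def qualify_table(source_table, default_db="RASGO", default_schema="PUBLIC"):
    """Expand a bare table name to DB.SCHEMA.TABLE using the given defaults."""
    parts = source_table.split(".")
    if len(parts) == 1:
        # Bare table name: prepend the default database and schema.
        return f"{default_db}.{default_schema}.{parts[0]}"
    if len(parts) == 3:
        # Already fully qualified: return unchanged.
        return source_table
    # Assumption: any other shape (e.g. SCHEMA.TABLE) is treated as invalid.
    raise ValueError(f"expected TABLE or DB.SCHEMA.TABLE, got {source_table!r}")


print(qualify_table("CUSTOMER_TRANSACTIONS"))  # RASGO.PUBLIC.CUSTOMER_TRANSACTIONS
print(qualify_table("SALES.RAW.CUSTOMER_TRANSACTIONS"))  # SALES.RAW.CUSTOMER_TRANSACTIONS
```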
