publish.features_from_source_code()

Parameters

data_source_id: int: ID of a Rasgo DataSource

source_code_type: str: Valid values are "sql" or "python"

sql_definition: str: (Optional) Valid SQL select statement that will be used to make a view in the DataWarehouse. Required when source_code_type = "sql"

python_function: FunctionType: (Optional) Valid Python function that accepts a df as a param and returns a df of features. Required when source_code_type = "python"

features: List[str]: (Optional) A list of column names in the DataSource table that should be registered as Features. If no value is passed, all columns in the source that are not listed in the `dimensions` parameter will be registered as Features.

dimensions: List[str]: A list of column names in the DataSource table that should be registered as Dimensions

granularity: List[str]: A list of strings that describes the grain of the dimensions

feature_set_name: str: (Optional) Name for this set of Features

sandbox: bool: (Optional) True = mark these features as Sandbox (not Production-ready) | False = mark these features as Production-ready. Default is True.

if_exists: str: (Optional) What to do if a FeatureSet already exists against this table: "fail" returns an error message | "return" returns the existing FeatureSet without operating on it | "edit" edits the existing FeatureSet | "new" creates a new FeatureSet
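The way `features` and `dimensions` select columns can be illustrated locally. The sketch below is not pyrasgo code: `resolve_columns` is a hypothetical helper that only mirrors the rule described for the `features` parameter, and it assumes column names are matched exactly (case-sensitive).

```python
from typing import List, Optional, Tuple


def resolve_columns(
    source_columns: List[str],
    dimensions: List[str],
    features: Optional[List[str]] = None,
) -> Tuple[List[str], List[str], List[str]]:
    """Hypothetical helper: return (dimensions, features, ignored) column lists."""
    dims = list(dimensions)
    if features is None:
        # No features passed: every non-dimension column becomes a feature.
        feats = [col for col in source_columns if col not in dims]
    else:
        feats = list(features)
    # Columns named in neither list are not registered at all.
    ignored = [col for col in source_columns if col not in dims and col not in feats]
    return dims, feats, ignored


columns = ["DATE", "WEEKFROMTODAY", "TEMPINCELCIUS"]

# features omitted: both non-dimension columns are treated as features
print(resolve_columns(columns, dimensions=["DATE"]))

# features passed explicitly: WEEKFROMTODAY is left out and therefore ignored
print(resolve_columns(columns, dimensions=["DATE"], features=["TEMPINCELCIUS"]))
```

Running a check like this before publishing can help confirm which columns will be left out.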

Return Object

Rasgo FeatureSet

Sample Usage

Create features from an existing source

# SQL select statement that will back the view in the DataWarehouse
sql = "SELECT DATE, DATEADD(WEEK, 1, DATE) AS WEEKFROMTODAY, (TEMP - 32)/1.8 AS TEMPINCELCIUS FROM MYTABLE"

dimensions = ['DATE']
features = ['WEEKFROMTODAY', 'TEMPINCELCIUS']

featureset = rasgo.publish.features_from_source_code(
               data_source_id=100,
               source_code_type='sql',
               sql_definition=sql,
               dimensions=dimensions,
               features=features,
               granularity=['day'],
               feature_set_name='My Sandbox Features',
               sandbox=True
               )
print('FeatureSet:', featureset)

When both lists are passed, columns in your DataSource table that are not referenced in either the "dimensions" or "features" list will be ignored.
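The same features could also be written as a Python function for source_code_type = "python". The following is a minimal sketch, not an official example: it assumes pandas is installed, that the source df has a datetime DATE column and a TEMP column in Fahrenheit (as in the SQL above), and the function name `build_features` is hypothetical. The publish call is left commented out because it needs an authenticated rasgo client.

```python
import pandas as pd


def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Accept a df and return a df holding the dimension and feature columns."""
    out = pd.DataFrame()
    out["DATE"] = df["DATE"]
    # Equivalent of DATEADD(WEEK, 1, DATE) in the SQL version
    out["WEEKFROMTODAY"] = df["DATE"] + pd.Timedelta(weeks=1)
    # Equivalent of (TEMP - 32)/1.8: Fahrenheit to Celsius
    out["TEMPINCELCIUS"] = (df["TEMP"] - 32) / 1.8
    return out


# Local check on made-up sample data before publishing
sample = pd.DataFrame(
    {"DATE": pd.to_datetime(["2021-01-01", "2021-01-02"]), "TEMP": [32.0, 50.0]}
)
print(build_features(sample))

# Assumed call shape, based only on the parameters documented above:
# featureset = rasgo.publish.features_from_source_code(
#     data_source_id=100,
#     source_code_type='python',
#     python_function=build_features,
#     dimensions=['DATE'],
#     features=['WEEKFROMTODAY', 'TEMPINCELCIUS'],
#     granularity=['day'],
# )
```

Testing the function on a small df first makes it easier to confirm that the returned columns match the names passed in `dimensions` and `features`.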

Best Practices / Tips

TIP: Feature name error message

If you receive the error "APIError: Failed to create Feature {___}. This name is already in use in your organization. Feature names must be unique.", a feature with this name already exists.

To resolve it, choose the option that fits your case:

  • If the existing feature is named correctly and the feature you are trying to upload needs a new name: use the publish.features() method to upload the feature with a different name.

  • If the existing feature is simply an earlier run of this feature: instruct the function to overwrite it by passing in the param if_exists='edit'.

  • If the existing feature is named incorrectly: navigate to the WebApp to change that feature's name, then re-upload.
