Version 0.3
Release date: September 2021
What's Changing
In September, we will release version 0.3 of pyrasgo. This version eliminates the concept of FeatureSets from pyrasgo: all references to FeatureSet will be removed or replaced with DataSource.
Can you be more specific?
The following objects will be completely removed from pyrasgo:
The FeatureSet primitive class
All "feature_set" methods
All "feature_set" parameters in functions
See below for a full list of changes and the migration path for each.
Why are you making this change?
Rasgo v0 started with two primitives: Features and Collections. Features were representations of columns in a table, and Collections were joined Features. We quickly realized we needed a conceptual wrapper to link related Features. This was convenient for a few reasons, chief among them that Features are most often built in batches as columns of the same table and share table-level metadata. FeatureSets (Features built in the same table, using the same code, at the same grain) were born.
As Rasgo evolved to support more use cases, the DataSource primitive was introduced to store information about data transformation and feature lineage. DataSource has become such a core part of the Rasgo data model that it now provides all of the value FeatureSets originally provided. The two primitives are currently redundant, and keeping both would block some exciting product features we have planned for Q4. For this reason, we will retire the FeatureSet primitive in pyrasgo with a 21-beer salute.
Features and Dimensions will now be built directly on DataSources. The DataSource primitive will expand to include 3 new attributes: columns, features, and dimensions. We find this direct access pattern much easier to use and hope our customers will agree. All existing Features and Dimensions have already been migrated to this relationship, and all new ones will be created with it in place.
Excited to see for yourself? Run: rasgo.get.data_sources()
What can I do to prepare?
The first thing to note is that Rasgo will continue to support version 0.2.5 for a few months, so there is no immediate need to upgrade if you have production code that depends on FeatureSets. We do, however, recommend upgrading as soon as possible to benefit from the efficiencies this release provides.
See below for a full list of changes and a migration path for each. If for any reason a path will not work in your codebase, please contact Rasgo as soon as possible for support.
Migration Path
FeatureSet Primitive:
Current | Migrate To | Notes |
---|---|---|
FeatureSet() | DataSource() | The FeatureSet class will no longer be available. All publish functions that used to return a FeatureSet object now return a DataSource object. See the Attribute Mapping table below for how attributes change. |
FeatureSet Functions:
Current | Migrate To |
---|---|
get.feature_set() | get.data_source() |
get.feature_sets() | get.data_sources(with_features_only=True) |
get.features_by_featureset(feature_set_id) | data_source = get.data_source(data_source_id); data_source.features |
get.columns_by_featureset(feature_set_id) | data_source = get.data_source(data_source_id); data_source.dimensions |
get.feature_set_yml(feature_set_id) | get.features_yml(data_source_id) |
prepare_feature_set_dict(feature_set_id) | data_source = get.data_source(data_source_id); data_source.to_dict() |
prepare_feature_set_yml(feature_set_id, file_name, directory) | data_source = get.data_source(data_source_id); data_source.to_yml(file_name, directory) |
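If you are auditing a codebase before upgrading, a small script can flag the deprecated calls listed in the table above. This is a minimal sketch, not a pyrasgo feature: `DEPRECATED_CALLS` and `scan_source` are hypothetical names. The regex only matches calls written out as `get.feature_set(...)` and so on, so calls made through an alias (for example `g = rasgo.get; g.feature_set(...)`) will be missed.

```python
import re
from typing import List, Tuple

# Hypothetical migration helper (not part of pyrasgo). Each entry pairs a
# deprecated v0.2.x call from the table with its suggested v0.3 replacement.
DEPRECATED_CALLS = {
    "get.feature_set": "get.data_source(data_source_id)",
    "get.feature_sets": "get.data_sources(with_features_only=True)",
    "get.features_by_featureset": "get.data_source(data_source_id).features",
    "get.columns_by_featureset": "get.data_source(data_source_id).dimensions",
    "get.feature_set_yml": "get.features_yml(data_source_id)",
    "prepare_feature_set_dict": "get.data_source(data_source_id).to_dict()",
    "prepare_feature_set_yml": "get.data_source(data_source_id).to_yml(file_name, directory)",
}

# Match a name only where it is called, so "get.feature_set(" does not also
# fire on "get.feature_sets(" or "get.feature_set_yml(".
_PATTERNS = {
    name: re.compile(r"(?<!\w)" + re.escape(name) + r"\s*\(")
    for name in DEPRECATED_CALLS
}


def scan_source(text: str) -> List[Tuple[int, str, str]]:
    """Return (line number, deprecated call, suggested replacement) tuples."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in _PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name + "()", DEPRECATED_CALLS[name]))
    return findings
```

To audit a project, pass the text of each `.py` file (for example, files found with `pathlib.Path.rglob("*.py")`) to `scan_source` and review each finding by hand. The script reports suggestions only and does not rewrite anything.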
Functions with FeatureSet parameters:
Current | Migrate To |
---|---|
publish.features_from_source(feature_set_name) | None (parameter deprecated) |
publish.features_from_source_code(feature_set_name, feature_set_table_name) | derivative_source_name (replaces feature_set_name), sql_view_name (replaces feature_set_table_name) |
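If your code builds keyword arguments for the publish functions programmatically, the parameter changes above can be applied mechanically. This is a minimal sketch, not part of pyrasgo. `PARAM_MIGRATIONS` and `migrate_kwargs` are hypothetical names, and the sketch assumes that `derivative_source_name` and `sql_view_name` replace `feature_set_name` and `feature_set_table_name` respectively, in the order listed.

```python
from typing import Any, Dict, Optional

# Hypothetical helper (not part of pyrasgo). Maps each publish function to
# its renamed FeatureSet parameters; None marks a parameter dropped in v0.3.
PARAM_MIGRATIONS: Dict[str, Dict[str, Optional[str]]] = {
    "features_from_source": {"feature_set_name": None},
    "features_from_source_code": {
        "feature_set_name": "derivative_source_name",  # assumed pairing
        "feature_set_table_name": "sql_view_name",     # assumed pairing
    },
}


def migrate_kwargs(function_name: str, kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of kwargs with v0.2.x FeatureSet parameters migrated."""
    renames = PARAM_MIGRATIONS.get(function_name, {})
    migrated: Dict[str, Any] = {}
    for key, value in kwargs.items():
        if key not in renames:
            migrated[key] = value            # unrelated parameter: keep as-is
        elif renames[key] is not None:
            migrated[renames[key]] = value   # renamed parameter
        # otherwise the parameter was removed in v0.3, so it is dropped
    return migrated
```

Keyword arguments that are not in the mapping pass through unchanged, so the helper can be applied to every publish call without special-casing.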
Functions returning FeatureSet responses:
Current | Now Returns |
---|---|
publish.features_from_source() | DataSource |
publish.features_from_source_code() | DataSource |
publish.features() | DataSource |
Attribute Mapping
FeatureSet Attribute | DataSource Attribute |
---|---|
id | id (not a direct equivalent) |
name | name |
sourceTable | dataTable.tableName, dataTable.tableDatabase, dataTable.tableSchema, dataTable.fqtn |
sourceCode | sourceCode |
features | features (all attributes same) |
dimensions | dimensions (all attributes same) |
granularities | granularities (all attributes same) |
dataSource | N/A |
N/A | columns |
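To see how the attribute mapping plays out, the sketch below converts a FeatureSet that has been captured as a plain dict into DataSource attribute names. It is illustrative only and does not create anything in Rasgo. The function name is hypothetical, and it assumes two things not stated in this guide: that the dict uses the camelCase keys shown in the table, and that sourceTable is a fully qualified DATABASE.SCHEMA.TABLE name.

```python
from typing import Any, Dict


def feature_set_to_data_source_fields(fs: Dict[str, Any]) -> Dict[str, Any]:
    """Map a FeatureSet-shaped dict onto DataSource attribute names.

    Hypothetical helper. Assumes fs["sourceTable"] is "DATABASE.SCHEMA.TABLE";
    unpacking raises ValueError if it does not have exactly three parts.
    """
    database, schema, table = fs["sourceTable"].split(".")
    return {
        # "id" is deliberately not copied: a FeatureSet id is not a DataSource id.
        "name": fs["name"],
        "dataTable": {
            "tableDatabase": database,
            "tableSchema": schema,
            "tableName": table,
            "fqtn": fs["sourceTable"],
        },
        "sourceCode": fs.get("sourceCode"),
        # features, dimensions and granularities keep the same attributes.
        "features": fs.get("features", []),
        "dimensions": fs.get("dimensions", []),
        "granularities": fs.get("granularities", []),
        # "dataSource" has no DataSource equivalent and is dropped; "columns"
        # is new on DataSource and has no FeatureSet counterpart to copy.
    }
```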