version 0.3

Release date: September 2021

What's Changing

In September, we will release version 0.3 of pyrasgo. This version will eliminate the concept of FeatureSets from pyrasgo. All references to FeatureSet will be removed or replaced with DataSource.

Can you be more specific?

The following objects will be completely removed from pyrasgo:

  • The FeatureSet primitive class

  • All "feature_set" methods

  • All "feature_set" parameters in functions

See below for a full list of changes and the migration path for each.

Why are you making this change?

Rasgo v0 started with two primitives: Features and Collections. Features were representations of columns in a table and Collections were joined Features. We quickly realized a need to link Features in a conceptual wrapper. This was convenient for a few reasons, chief among them that features are most often built in batches as columns in the same table and share table-level metadata. FeatureSets (Features that are built in the same table, using the same code, with the same grain) were born.

As Rasgo evolved to support more use cases, the DataSource primitive was born to store information about data transformation and feature lineage. The DataSource primitive has grown into such a core part of the Rasgo data model that it now provides all the value FeatureSets originally did. The two primitives are currently redundant, and keeping both would block some exciting product features we have planned for Q4. For this reason, we will retire the FeatureSet primitive in pyrasgo with a 21 beer salute.

Features and Dimensions will now be built directly on DataSources. The DataSource primitive will expand to include 3 new attributes: columns, features, and dimensions. We find this direct access pattern much easier to use and hope our customers will agree. All existing features and dimensions have already been migrated to support this relationship and all new ones will build the relationship more naturally.

Excited to see for yourself? Run: rasgo.get.data_sources()

What can I do to prepare?

The first thing to note is that Rasgo will support version 0.2.5 for a few months, so there is no immediate need to upgrade if you have production code dependent on FeatureSets. We recommend you upgrade as soon as possible to benefit from the efficiencies this release will provide.

See below for a full list of changes and a migration path for each. If for any reason this path will not work in your codebase, please contact Rasgo as soon as possible for support.

Migration Path

FeatureSet Primitive:

Current | Migrate To | Notes
FeatureSet() | DataSource() | The FeatureSet class will no longer be available. All publish functions that used to return data in a FeatureSet class will now return data in a DataSource class. See the Attribute Mapping section below to understand the changes.

FeatureSet Functions:

Current | Migrate To
get.feature_set() | get.data_source()
get.feature_sets() | get.data_sources(with_features_only=True)
get.features_by_featureset(feature_set_id) | data_source = get.data_source(id); data_source.features
get.columns_by_featureset(feature_set_id) | data_source = get.data_source(id); data_source.dimensions
get.feature_set_yml(feature_set_id) | get.features_yml(data_source_id)
prepare_feature_set_dict(feature_set_id) | data_source = get.data_source(id); data_source.to_dict()
prepare_feature_set_yml(feature_set_id, file_name, directory) | data_source = get.data_source(id); data_source.to_yml(file_name, directory)
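To find out how much of your codebase is affected, it can help to scan for the old calls before upgrading. The following is a minimal sketch, not part of pyrasgo: the helper name find_deprecated_calls and the FEATURE_SET_MIGRATIONS dictionary are our own illustrative names, and the replacement strings are simply copied from the table above. It uses only the Python standard library and does a plain text match, so it may miss calls made through aliases.

```python
import re

# Old call name -> suggested replacement, copied from the migration table.
# These names are illustrative; this helper is not part of pyrasgo.
FEATURE_SET_MIGRATIONS = {
    "get.feature_set": "get.data_source()",
    "get.feature_sets": "get.data_sources(with_features_only=True)",
    "get.features_by_featureset": "get.data_source(id).features",
    "get.columns_by_featureset": "get.data_source(id).dimensions",
    "get.feature_set_yml": "get.features_yml(data_source_id)",
    "prepare_feature_set_dict": "get.data_source(id).to_dict()",
    "prepare_feature_set_yml": "get.data_source(id).to_yml(file_name, directory)",
}

# Match any old name that is immediately followed by an opening parenthesis.
# Longer names are listed first so the most specific name is tried first.
_NAMES = sorted(FEATURE_SET_MIGRATIONS, key=len, reverse=True)
_PATTERN = re.compile(r"\b(" + "|".join(re.escape(n) for n in _NAMES) + r")\s*\(")


def find_deprecated_calls(source):
    """Return (line_number, old_call, suggestion) for each old call found."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            old_call = match.group(1)
            findings.append((line_number, old_call, FEATURE_SET_MIGRATIONS[old_call]))
    return findings


if __name__ == "__main__":
    sample = "fs = rasgo.get.feature_set(12)\nall_fs = rasgo.get.feature_sets()\n"
    for line_number, old_call, suggestion in find_deprecated_calls(sample):
        print(f"line {line_number}: {old_call}() -> {suggestion}")
```

To check a real file, read it with open(path).read() and pass the text to find_deprecated_calls.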

Functions calling FeatureSet Parameters:

Current | Migrate To
publish.features_from_source(feature_set_name) | None (parameter deprecated)
publish.features_from_source_code(feature_set_name, feature_set_table_name) | derivative_source_name, sql_view_name
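If you build keyword arguments for these publish functions in one place, a small shim can translate them during the transition. This is a sketch under assumptions, not pyrasgo code: migrate_kwargs and PARAMETER_RENAMES are hypothetical names, and we assume the two replacement parameters pair up with the old ones in the order listed above (feature_set_name to derivative_source_name, feature_set_table_name to sql_view_name).

```python
# Parameter changes per publish function, copied from the table above.
# A value of None means the parameter is dropped with no replacement.
# Assumption: replacements pair with old parameters in the order listed.
PARAMETER_RENAMES = {
    "features_from_source": {"feature_set_name": None},
    "features_from_source_code": {
        "feature_set_name": "derivative_source_name",
        "feature_set_table_name": "sql_view_name",
    },
}


def migrate_kwargs(function_name, kwargs):
    """Return a copy of kwargs with FeatureSet parameters renamed or removed."""
    renames = PARAMETER_RENAMES.get(function_name, {})
    migrated = {}
    for key, value in kwargs.items():
        if key not in renames:
            migrated[key] = value           # unaffected parameter, keep as-is
        elif renames[key] is not None:
            migrated[renames[key]] = value  # renamed parameter
        # else: deprecated parameter, silently dropped
    return migrated


if __name__ == "__main__":
    old = {"feature_set_name": "daily_sales", "feature_set_table_name": "DAILY_SALES_V"}
    print(migrate_kwargs("features_from_source_code", old))
```

Functions not listed in PARAMETER_RENAMES pass through unchanged, so the shim can wrap every publish call without special cases.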

Functions returning FeatureSet responses:

Current | Now Returns
publish.features_from_source() | DataSource
publish.features_from_source_code() | DataSource
publish.features() | DataSource

Attribute Mapping

Feature Set Attribute | Data Source Attribute
id | id (not a direct equivalent)
name | name
sourceTable | dataTable.tableName, dataTable.tableDatabase, dataTable.tableSchema, dataTable.fqtn
sourceCode | sourceCode
features | features (all attributes same)
dimensions | dimensions (all attributes same)
granularities | granularities (all attributes same)
dataSource | N/A
N/A | columns
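If downstream code still expects the FeatureSet attribute names, the mapping above can be applied to a plain dictionary. The sketch below is illustrative only: feature_set_view is a hypothetical helper, and it assumes the data source is available as a nested dictionary keyed by the attribute names in the table (we have not confirmed that data_source.to_dict() produces exactly this shape). The id is left out on purpose because it is not a direct equivalent.

```python
def feature_set_view(data_source):
    """Build a FeatureSet-shaped dict from a DataSource-shaped dict.

    Assumes `data_source` uses the attribute names from the mapping table.
    """
    data_table = data_source.get("dataTable") or {}
    return {
        # id is intentionally omitted: it is not a direct equivalent
        "name": data_source.get("name"),
        "sourceTable": data_table.get("tableName"),
        "sourceCode": data_source.get("sourceCode"),
        "features": data_source.get("features", []),
        "dimensions": data_source.get("dimensions", []),
        "granularities": data_source.get("granularities", []),
    }


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    example = {
        "id": 7,
        "name": "daily_sales",
        "dataTable": {"tableName": "DAILY_SALES", "fqtn": "DB.PUBLIC.DAILY_SALES"},
        "sourceCode": "select 1",
        "features": [{"name": "revenue"}],
    }
    print(feature_set_view(example))
```

New code should read the DataSource attributes directly; a view like this is only a stopgap while older code is being migrated.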
