publish.source_data()

Create a Rasgo DataSource from a csv, pandas DataFrame, or Snowflake table

Parameters

source_type:str: Valid values are "csv", "dataframe", or "table"

file_path:str: (Optional) Full path to the csv file to upload. Only pass when source_type="csv"

df:pandas DataFrame: (Optional) DataFrame to upload. Only pass when source_type="dataframe"

table:str: (Optional) Name of existing Snowflake table. Only pass when source_type="table"

data_source_name:str: (Optional) Name for this DataSource

data_source_domain:str: (Optional) Domain for this DataSource

data_source_table_name:str: (Optional) Name to give to the uploaded table in Snowflake. Only pass when source_type is "csv" or "dataframe"

parent_data_source_id:str: (Optional) Parent DataSource for this DataSource

if_exists:str: (Optional) Instructions on how to proceed when a DataSource already exists with this table. Valid values are "fail", "replace", or "append".
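
The parameter rules above (one input per source_type, a fixed set of if_exists values) can be checked before calling the API. The following is a minimal sketch, not part of pyrasgo: build_source_data_kwargs is a hypothetical helper, and its default of if_exists="fail" is this sketch's own choice.

```python
from typing import Any, Dict, Optional

VALID_SOURCE_TYPES = ("csv", "dataframe", "table")
VALID_IF_EXISTS = ("fail", "replace", "append")
# Which input parameter belongs to which source_type, per the list above.
INPUT_PARAM = {"csv": "file_path", "dataframe": "df", "table": "table"}


def build_source_data_kwargs(
    source_type: str,
    file_path: Optional[str] = None,
    df: Any = None,
    table: Optional[str] = None,
    data_source_table_name: Optional[str] = None,
    if_exists: str = "fail",
) -> Dict[str, Any]:
    """Hypothetical helper: validate the documented rules, return keyword args."""
    if source_type not in VALID_SOURCE_TYPES:
        raise ValueError(f"source_type must be one of {VALID_SOURCE_TYPES}")
    if if_exists not in VALID_IF_EXISTS:
        raise ValueError(f"if_exists must be one of {VALID_IF_EXISTS}")

    inputs = {"file_path": file_path, "df": df, "table": table}
    wanted = INPUT_PARAM[source_type]
    if inputs[wanted] is None:
        raise ValueError(f"source_type={source_type!r} requires {wanted}")
    # Reject inputs that belong to a different source_type.
    extras = [name for name, value in inputs.items()
              if name != wanted and value is not None]
    if extras:
        raise ValueError(f"do not pass {extras} when source_type={source_type!r}")
    if source_type == "table" and data_source_table_name is not None:
        raise ValueError("data_source_table_name applies only to csv or dataframe")

    kwargs: Dict[str, Any] = {
        "source_type": source_type,
        wanted: inputs[wanted],
        "if_exists": if_exists,
    }
    if data_source_table_name is not None:
        kwargs["data_source_table_name"] = data_source_table_name
    return kwargs
```

The returned dictionary could then be unpacked into the call, for example rasgo.publish.source_data(**kwargs), assuming rasgo is an authenticated pyrasgo connection.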

NOTE: if_exists parameter

"replace" and "append" options are provided to support incremental loading of data through pyrasgo. If you are not operating on an existing DataSource, it is highly recommended that you pass this parameter as "fail". This will warn you if a DataSource exists with this table name.

Return Object

Rasgo DataSource

Sample Usage

Upload a csv:

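# rasgo is assumed to be an authenticated pyrasgo connection
source_type = "csv"
file_path = "Users/me/Downloads/myfile.csv"
data_source_name = "My CSV Test"
data_source_table_name = "CSV_MYFILE_TEST_ONE"
# Keyword arguments keep each value bound to the intended parameter
datasource = rasgo.publish.source_data(source_type=source_type,
                                       file_path=file_path,
                                       data_source_name=data_source_name,
                                       data_source_table_name=data_source_table_name,
                                       if_exists="replace"
                                       )
print('DataSource:', datasource)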

Upload a DataFrame:

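source_type = "dataframe"
df = myPandasDf  # an existing pandas DataFrame
data_source_name = "My DF Test"
data_source_table_name = "DF_MYPANDAS_TEST_ONE"
# Keyword arguments keep each value bound to the intended parameter
datasource = rasgo.publish.source_data(source_type=source_type,
                                       df=df,
                                       data_source_name=data_source_name,
                                       data_source_table_name=data_source_table_name,
                                       if_exists="append"
                                       )
print('DataSource:', datasource)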

Register a table:

source_type = "table"
table = "SFDATABASE.SFSCHEMA.EXISTING_SF_TBL"
data_source_name = "My Table Test"
# Keyword arguments keep each value bound to the intended parameter
datasource = rasgo.publish.source_data(source_type=source_type,
                                       table=table,
                                       data_source_name=data_source_name,
                                       if_exists="fail"
                                       )
print('DataSource:', datasource)

Best Practices / Tips

TIP: Operating on existing sources

When re-loading data from a csv or DataFrame into an existing source, the data_source_table_name parameter must be included so that this function associates the data with the correct existing table. Omitting this parameter will create a new DataSource.
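
The tip above can be enforced with a thin wrapper that refuses to run without a table name. This is a minimal sketch under stated assumptions: reload_into_existing_source is a hypothetical name, not a pyrasgo function, and rasgo is assumed to be an authenticated pyrasgo connection exposing publish.source_data() with the keyword parameters documented on this page.

```python
from typing import Any


def reload_into_existing_source(
    rasgo: Any,
    df: Any,
    table_name: str,
    if_exists: str = "append",
) -> Any:
    """Hypothetical wrapper: re-load a DataFrame into an existing DataSource."""
    # Without a table name, the upload would create a new DataSource instead.
    if not table_name:
        raise ValueError("data_source_table_name is required to target an existing table")
    # "fail" makes no sense here: the target table is expected to exist already.
    if if_exists not in ("replace", "append"):
        raise ValueError('if_exists must be "replace" or "append" for a re-load')
    return rasgo.publish.source_data(
        source_type="dataframe",
        df=df,
        data_source_table_name=table_name,
        if_exists=if_exists,
    )
```

For example, reload_into_existing_source(rasgo, new_rows_df, "DF_MYPANDAS_TEST_ONE") would append new_rows_df (a hypothetical DataFrame of new rows) to the table created in the DataFrame sample.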
