publish.source_data()

Create a Rasgo DataSource from a CSV file, pandas DataFrame, or existing Snowflake table.

Parameters

source_type: str: Valid values are "csv", "dataframe", or "table"

file_path: str: (Optional) Full path to the CSV file to upload. Pass only when source_type="csv"

df: pandas DataFrame: (Optional) DataFrame to upload. Pass only when source_type="dataframe"

table: str: (Optional) Name of an existing Snowflake table. Pass only when source_type="table"

data_source_name: str: (Optional) Name for this DataSource

data_source_domain: str: (Optional) Domain for this DataSource

data_source_table_name: str: (Optional) Name to give the uploaded table in Snowflake. Pass only when source_type is "csv" or "dataframe"

parent_data_source_id: str: (Optional) Parent DataSource for this DataSource

if_exists: str: (Optional) How to proceed when a DataSource already exists with this table. Valid values are "fail", "replace", and "append"

NOTE: if_exists parameter

The "replace" and "append" options are provided to support incremental loading of data through PyRasgo. If you are not operating on an existing DataSource, it is highly recommended that you pass "fail": Rasgo will then warn you if a DataSource already exists with this table name.
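The parameter rules above can be sketched as a small stand-alone checker (a hypothetical helper, not part of PyRasgo; only the argument names and valid values come from this page):

```python
VALID_IF_EXISTS = {"fail", "replace", "append"}

def check_source_args(source_type, file_path=None, df=None, table=None,
                      if_exists="fail"):
    """Mirror the documented argument rules for publish.source_data():
    each source_type requires its matching argument, and only that one."""
    matching = {"csv": file_path, "dataframe": df, "table": table}
    if source_type not in matching:
        raise ValueError('source_type must be "csv", "dataframe", or "table"')
    if matching[source_type] is None:
        raise ValueError(f"source_type={source_type!r} requires its matching argument")
    if any(value is not None
           for key, value in matching.items() if key != source_type):
        raise ValueError("pass only the argument that matches source_type")
    if if_exists not in VALID_IF_EXISTS:
        raise ValueError('if_exists must be "fail", "replace", or "append"')
    return True
```

Running a call through a checker like this before uploading makes the "only pass when" notes on each parameter explicit.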

Return Object

Rasgo DataSource

Sample Usage

Upload a csv:

source_type = "csv"
file_path = "Users/me/Downloads/myfile.csv"
data_source_name = "My CSV Test"
data_source_table_name = "CSV_MYFILE_TEST_ONE"
datasource = rasgo.publish.source_data(source_type=source_type,
                                       file_path=file_path,
                                       data_source_name=data_source_name,
                                       data_source_table_name=data_source_table_name,
                                       if_exists="replace")
print('DataSource:', datasource)

Upload a DataFrame:

source_type = "dataframe"
df = myPandasDf
data_source_name = "My DF Test"
data_source_table_name = "DF_MYPANDAS_TEST_ONE"
datasource = rasgo.publish.source_data(source_type=source_type,
                                       df=df,
                                       data_source_name=data_source_name,
                                       data_source_table_name=data_source_table_name,
                                       if_exists="append")
print('DataSource:', datasource)

Register a table:

source_type = "table"
table = "SFDATABASE.SFSCHEMA.EXISTING_SF_TBL"
data_source_name = "My Table Test"
datasource = rasgo.publish.source_data(source_type=source_type,
                                       table=table,
                                       data_source_name=data_source_name,
                                       if_exists="fail")
print('DataSource:', datasource)

Best Practices / Tips

TIP: Operating on existing sources

When re-loading data from a CSV or DataFrame into an existing source, the data_source_table_name parameter must be included so this function can associate the data with the correct existing table. Omitting this parameter will create a new DataSource.
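As a sketch of this tip, a tiny helper (hypothetical, not part of PyRasgo) can assemble the keyword arguments for a DataFrame re-load and refuse to omit data_source_table_name:

```python
def reload_kwargs(df, data_source_table_name, if_exists="append"):
    """Build keyword arguments for publish.source_data() when re-loading
    a DataFrame into an existing DataSource."""
    if not data_source_table_name:
        # Without this name, publish.source_data() would create a new DataSource
        raise ValueError("data_source_table_name is required when re-loading")
    return {
        "source_type": "dataframe",
        "df": df,
        "data_source_table_name": data_source_table_name,
        "if_exists": if_exists,
    }
```

Called as rasgo.publish.source_data(**reload_kwargs(myPandasDf, "DF_MYPANDAS_TEST_ONE")), this appends the DataFrame's rows to the existing table rather than creating a new DataSource.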


Last updated 3 years ago
