Clean

Cast data types, rename or drop columns, impute missing values, and filter values in a dataset.

Parameters

Name | Type | Description | Is Optional
columns | clean_dict | Dictionary whose keys are the names of the columns to clean. Each value is a dictionary of settings, all of which are optional. |

Settings accepted for each column:

  • type: the data type to cast the column's values to
  • name: the new name for the column
  • impute: an imputation strategy ('mean', 'median', 'mode') or a literal value used to replace null values
  • filter: a filter statement applied to the output table
  • drop: drops the column from the output if true
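Because every setting is optional and misspelled keys are easy to miss, it can help to check a columns dictionary before passing it to the transform. The sketch below is a hypothetical helper, not part of PyRasgo: the names validate_clean_dict and ALLOWED_KEYS are assumptions made for illustration, and the only rules it checks are the five setting names listed above.

```python
# Hypothetical helper, not part of PyRasgo: checks a `columns` dict before use.
ALLOWED_KEYS = {"type", "name", "impute", "filter", "drop"}


def validate_clean_dict(columns):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for column, settings in columns.items():
        if not isinstance(settings, dict):
            problems.append(f"{column}: settings must be a dict")
            continue
        # Flag keys that are not among the five documented settings.
        unknown = sorted(set(settings) - ALLOWED_KEYS)
        if unknown:
            problems.append(f"{column}: unknown keys {unknown}")
        # 'drop' is described as a true/false flag.
        if "drop" in settings and not isinstance(settings["drop"], bool):
            problems.append(f"{column}: 'drop' must be True or False")
    return problems
```

Calling validate_clean_dict({'GLD_ADJUSTED_CLOSE': {'dtype': 'FLOAT'}}) would report the misspelled key rather than letting it pass unnoticed.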

Example

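import pyrasgo

# Connect to Rasgo; replace the placeholder with your own API key
rasgo = pyrasgo.connect('<your-api-key>')

# Fetch the dataset to clean; replace `id` with your dataset's ID
ds = rasgo.get.dataset(id)

ds2 = ds.clean(
    columns={
        # Cast to FLOAT, rename to GLD, impute nulls with the mean, filter on > 100
        'GLD_ADJUSTED_CLOSE': {
            'type': 'FLOAT',
            'name': 'GLD',
            'impute': 'mean',
            'filter': "> 100",
        },
        # Cast to FLOAT, rename to GLTR, impute nulls using 'min', filter on > 10
        'GLTR_ADJUSTED_CLOSE': {
            'type': 'FLOAT',
            'name': 'GLTR',
            'impute': 'min',
            'filter': "> 10",
        },
        # Only cast DATE to a string; no rename, imputation, or filter
        'DATE': {
            'type': 'string'
        }
    }
)

# Preview the cleaned dataset
ds2.preview()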
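To make the effect of each setting concrete, here is a minimal sketch of how a columns dictionary like the one above could be turned into a single SQL statement. It is not the clean.sql template that Rasgo ships. The function render_clean_sql, the IMPUTE_AGGREGATES mapping, the use of COALESCE with a scalar subquery, and the choice to apply each filter to the original column name are all assumptions made for illustration, and the table name is hypothetical.

```python
# Illustrative only: assumed mapping from impute strategies to SQL aggregates.
# 'min' is included because the example above uses it.
IMPUTE_AGGREGATES = {"mean": "AVG", "median": "MEDIAN", "mode": "MODE", "min": "MIN"}


def render_clean_sql(table, columns):
    """Build an illustrative SELECT statement from a clean-style dict."""
    select_parts, where_parts = [], []
    for column, settings in columns.items():
        if settings.get("drop"):
            continue  # dropped columns are left out of the output
        expr = column
        if "type" in settings:
            expr = f"CAST({expr} AS {settings['type']})"
        impute = settings.get("impute")
        if impute is not None:
            if isinstance(impute, str) and impute.lower() in IMPUTE_AGGREGATES:
                # Strategy name: fill nulls with an aggregate over the column.
                agg = IMPUTE_AGGREGATES[impute.lower()]
                fill = f"(SELECT {agg}({column}) FROM {table})"
            elif isinstance(impute, str):
                # Any other string is treated as a quoted literal value.
                fill = "'" + impute.replace("'", "''") + "'"
            else:
                fill = str(impute)
            expr = f"COALESCE({expr}, {fill})"
        if "filter" in settings:
            # Assumption: the filter is applied to the original column.
            where_parts.append(f"{column} {settings['filter']}")
        select_parts.append(f"{expr} AS {settings.get('name', column)}")
    if not select_parts:
        raise ValueError("every column was dropped; nothing to select")
    sql = f"SELECT {', '.join(select_parts)} FROM {table}"
    if where_parts:
        sql += " WHERE " + " AND ".join(where_parts)
    return sql
```

Under these assumptions, a column with 'type', 'name', 'impute', and 'filter' set contributes one aliased expression to the SELECT list and one condition to the WHERE clause, while a column with 'drop' set to true contributes nothing.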

Source Code

RasgoTransforms/clean.sql at main · rasgointelligence/RasgoTransforms (GitHub)

Last updated 2 years ago