Remove Outliers

This function identifies which records in a table are outliers, based on a chosen statistical method (z-score, IQR, or manual threshold) and one or more target columns. When drop is false, it produces a new column named 'OUTLIER_<target_column>', which is TRUE for records that are outliers and FALSE for records that aren't. When drop is true, the outlier rows are removed instead.

Parameters

| Name | Type | Description | Is Optional |
| --- | --- | --- | --- |
| target_columns | column_list | The target columns containing the values used to calculate outliers. The z-score or IQR (depending on the selected method) is calculated independently for each column. If the value in any of the selected columns is an outlier, the row is flagged as an outlier (or dropped if drop = True). | |
| drop | boolean | Whether to drop the rows that are determined to be outliers. If false, a new column is created that flags outliers with a boolean. | |
| method | value | The method used to identify outliers. Supported methods are "iqr", which calculates the interquartile range (IQR) between the first and third quartiles and flags values that are more than 1.5 * IQR from the median; "z-score", which calculates the z-score and flags values with a z-score greater than the provided threshold (max_zscore); and "threshold", which requires manually set minimum and maximum thresholds that are used to flag outliers. Default Value = "z-score" | |
| min_threshold | value | The minimum threshold for values that won't be flagged as outliers. Required if method is "threshold". | True |
| max_threshold | value | The maximum threshold for values that won't be flagged as outliers. Required if method is "threshold". | True |
| max_zscore | value | The maximum z-score for values that will not be flagged as outliers. Default Value = 2 | True |
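
To make the three methods concrete, here is a minimal Python sketch of the rules as the parameter descriptions word them, applied to a single column. It is not the transform's SQL. The function names are hypothetical, and several details are assumptions because the documentation does not state them: the sample standard deviation, the absolute value of the z-score, the "inclusive" quartile interpolation, and inclusive threshold bounds. The IQR rule follows the wording above (distance from the median); the conventional Tukey rule instead measures 1.5 * IQR below the first quartile and above the third.

```python
import statistics


def zscore_flags(values, max_zscore=2):
    """Flag values whose absolute z-score is greater than max_zscore."""
    mean = statistics.fmean(values)
    # Assumption: sample standard deviation; the docs do not say which one is used.
    std = statistics.stdev(values)
    if std == 0:
        # All values are identical, so nothing can be an outlier.
        return [False for _ in values]
    return [abs(v - mean) / std > max_zscore for v in values]


def iqr_flags(values):
    """Flag values more than 1.5 * IQR away from the median (the docs' wording)."""
    # Assumption: "inclusive" interpolation; the transform's quartile method is not documented.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [abs(v - median) > 1.5 * iqr for v in values]


def threshold_flags(values, min_threshold, max_threshold):
    """Flag values outside [min_threshold, max_threshold] (bounds assumed inclusive)."""
    return [not (min_threshold <= v <= max_threshold) for v in values]


orders = [100, 110, 105, 95, 102, 98, 1000]
print(zscore_flags(orders))               # only the last value is flagged
print(iqr_flags(orders))                  # only the last value is flagged
print(threshold_flags(orders, 100, 500))  # 95, 98 and 1000 are flagged
```

On this sample the z-score and IQR rules both single out 1000, while the manual threshold also flags the two values just under 100, which shows how much the choice of method can change the result.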

Example

# `rasgo` is an authenticated PyRasgo connection; `id` is the ID of the dataset to load
ds = rasgo.get.dataset(id)

# Drop outliers using a manual threshold
ds2 = ds.remove_outliers(
    target_columns=["ORDERS", "CANCELLATIONS"],
    method="threshold",
    min_threshold=100,
    max_threshold=500,
    drop=True
)

# Drop rows with a Z-score > 2 (more than 2 standard deviations from the mean)
ds2 = ds.remove_outliers(
    target_columns=["ORDERS", "CANCELLATIONS"],
    method="z-score",
    drop=True,
    max_zscore=2
)

# Flag outliers using the iqr method
ds2 = ds.remove_outliers(
    target_columns=["ORDERS"],
    method="iqr",
    drop=False
)
ds2.preview()
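
The row-level behavior described in the parameters (a row counts as an outlier if any target column is an outlier, and drop decides between removing and flagging) can be sketched in plain Python as follows. This is an illustration only, using the "threshold" method on a list of dictionaries. The function name and the IS_OUTLIER column name are hypothetical; the transform itself names its flag column 'OUTLIER_<target_column>'.

```python
def apply_threshold_outliers(rows, target_columns, min_threshold, max_threshold, drop=False):
    """Flag or drop rows where any target column falls outside the thresholds."""
    result = []
    for row in rows:
        # A row is an outlier if ANY of the target columns is out of range.
        is_outlier = any(
            not (min_threshold <= row[col] <= max_threshold)
            for col in target_columns
        )
        if drop:
            # drop=True: keep only the rows that are not outliers.
            if not is_outlier:
                result.append(dict(row))
        else:
            # drop=False: keep every row and add a boolean flag column.
            # IS_OUTLIER is a hypothetical name used for this sketch.
            result.append({**row, "IS_OUTLIER": is_outlier})
    return result


rows = [
    {"ORDERS": 120, "CANCELLATIONS": 150},
    {"ORDERS": 300, "CANCELLATIONS": 20},
    {"ORDERS": 900, "CANCELLATIONS": 200},
]
columns = ["ORDERS", "CANCELLATIONS"]

kept = apply_threshold_outliers(rows, columns, 100, 500, drop=True)
flagged = apply_threshold_outliers(rows, columns, 100, 500, drop=False)
print(kept)     # only the first row remains
print(flagged)  # all three rows, with IS_OUTLIER set on the second and third
```

The second row is removed because of CANCELLATIONS alone and the third because of ORDERS alone, so adding columns to target_columns can only increase the number of rows treated as outliers.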

Source Code

Last updated 2 years ago

RasgoTransforms/remove_outliers.sql at main · rasgointelligence/RasgoTransforms (GitHub)