
# Remove Outliers

This function determines which records in a table are outliers, based on a chosen statistical method (z-score, IQR, or manual threshold) and one or more target columns. It produces a new column named 'OUTLIER\_\<target\_column>', which is TRUE for records that are outliers and FALSE for records that are not. Alternatively, the outlier rows can be dropped (see the `drop` parameter).

## Parameters

| Name            | Type         | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                             | Is Optional |
| --------------- | ------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------- |
| target\_columns | column\_list | The target columns containing the values used to calculate outliers. The z-score or IQR (depending on the selected method) is calculated independently for each column. If any value in the selected columns is an outlier, the row is flagged as an outlier (or dropped if drop = True). |             |
| drop            | boolean      | Whether to drop the rows that are determined to be outliers. If False, a new boolean column is created that flags the outliers instead. |             |
| method          | value        | The method used to calculate outliers. Supported methods are "iqr", which calculates the interquartile range (IQR) between the first and third quartiles and flags values that are more than 1.5 \* IQR from the median; "z-score", which calculates the z-score and flags values whose z-score exceeds the provided threshold (max\_zscore); and "threshold", which requires manually set minimum and maximum thresholds that are used to flag outliers. Default Value = "z-score" |             |
| min\_threshold  | value        | The minimum threshold for values that won't be flagged as outliers. Required if method is "threshold".                                                                                                                                                                                                                                                                                                                                                                  | True        |
| max\_threshold  | value        | The maximum threshold for values that won't be flagged as outliers. Required if method is "threshold".                                                                                                                                                                                                                                                                                                                                                                  | True        |
| max\_zscore     | value        | The maximum z-score for values that will not be flagged as outliers. Used when method is "z-score". Default Value = 2 | True        |
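
The three methods can be sketched in plain Python to make the flagging rules concrete. This is an illustrative sketch under stated assumptions, not the transform's SQL implementation: the function names are hypothetical, the z-score uses the population standard deviation, and the quartiles come from `statistics.quantiles` with the inclusive method. The IQR rule follows the description in the table above (distance from the median).

```python
import statistics


def zscore_outliers(values, max_zscore=2):
    """Flag values whose absolute z-score exceeds max_zscore."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    if std == 0:
        # All values are identical, so nothing can be an outlier.
        return [False] * len(values)
    return [abs(v - mean) / std > max_zscore for v in values]


def iqr_outliers(values):
    """Flag values more than 1.5 * IQR away from the median."""
    # assumption: inclusive quartile interpolation; other definitions exist
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [abs(v - median) > 1.5 * iqr for v in values]


def threshold_outliers(values, min_threshold, max_threshold):
    """Flag values that fall outside [min_threshold, max_threshold]."""
    return [v < min_threshold or v > max_threshold for v in values]


# Hypothetical sample data: one extreme value (500) and one slightly low value (97).
orders = [100, 102, 98, 101, 99, 103, 97, 100, 500]

print(zscore_outliers(orders, max_zscore=2))
print(iqr_outliers(orders))
print(threshold_outliers(orders, min_threshold=98, max_threshold=200))
```

With this sample, the z-score and IQR sketches flag only the value 500, while the manual threshold also flags 97 because it falls below `min_threshold`.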

## Example

```python
ds = rasgo.get.dataset(id)  # id: the ID of an existing dataset

# Drop outliers using a manual threshold
ds2 = ds.remove_outliers(
    target_columns=["ORDERS", "CANCELLATIONS"],
    method="threshold",
    min_threshold=100,
    max_threshold=500,
    drop=True
)

# Drop values with a z-score > 2 (more than 2 standard deviations from the mean)
ds2 = ds.remove_outliers(
    target_columns=["ORDERS", "CANCELLATIONS"],
    method="z-score",
    drop=True,
    max_zscore=2
)

# Flag outliers using the iqr method
ds2 = ds.remove_outliers(
    target_columns=["ORDERS"],
    method="iqr",
    drop=False
)
ds2.preview()
```
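
The difference between `drop=True` and `drop=False` can be illustrated on plain Python rows. This is a minimal sketch that assumes a single target column and the "threshold" method; the helper `apply_outlier_flags` and the sample rows are hypothetical, and the flag column name follows the `OUTLIER_<target_column>` pattern described at the top of this page.

```python
def apply_outlier_flags(rows, target_column, min_threshold, max_threshold, drop=False):
    """Either flag or drop rows whose target value is outside the thresholds."""
    flag_name = f"OUTLIER_{target_column}"
    result = []
    for row in rows:
        value = row[target_column]
        is_outlier = value < min_threshold or value > max_threshold
        if drop:
            # drop=True: keep only the rows that are not outliers.
            if not is_outlier:
                result.append(dict(row))
        else:
            # drop=False: keep every row and add a boolean flag column.
            result.append({**row, flag_name: is_outlier})
    return result


# Hypothetical sample rows.
rows = [{"ORDERS": 120}, {"ORDERS": 50}, {"ORDERS": 480}, {"ORDERS": 900}]

flagged = apply_outlier_flags(rows, "ORDERS", 100, 500, drop=False)
kept = apply_outlier_flags(rows, "ORDERS", 100, 500, drop=True)

print(flagged)
print(kept)
```

Flagging preserves the row count and lets downstream steps decide what to do with outliers, whereas dropping removes the rows outright.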

## Source Code

{% embed url="<https://github.com/rasgointelligence/RasgoTransforms/blob/main/rasgotransforms/rasgotransforms/transforms/remove_outliers/remove_outliers.sql>" %}

