Links

Summarize Islands

Given a dataset with a date column, summarizes the data in terms of islands, which are periods of time where data exists. This is often useful in determining if your data has gaps where data does not exist, or exists under certain conditions.
You must set a buffer such as 7 DAYS which will determine the grain of time for which one island stops and another begins.
The result is a summarized table.

Parameters

Name
Type
Description
Is Optional
group_cols
column_list
The column(s) used to partition you data into groups. Islands will be searched within each group
True
conditions
math_list
A list of conditions for which to apply to the data before searching for islands. For example, ["COL1 > 0","COL1 IS NOT NULL"]
True
date_col
column
The column used to create search for islands. This must be a date or datetime column.
​
buffer_date_part
date_part
A valid SQL datepart to slice the date_col. For interval types, see this Snowflake doc.​
​
buffer_size
int
An integer of how many date_parts will be considered to be a part of the same island. Larger numbers will cause more overlaps and therefore less islands, and smaller numbers will cause less overlaps and therefore more islands
​

Example

ds = rasgo.get.dataset(4721)
​
test = ds.apply(date_col='YEAR',
group_cols=['BABYNAME','STATE','GENDER'],
buffer_date_part='MONTH',
buffer_size=24,
conditions=['BABYCOUNT>50']
)
​
test.preview()
​

Source Code