Entropy

Entropy is a way to calculate the amount of "disorder" in a non-numeric column. Lower entropy indicates less disorder, while higher entropy indicates more.

The calculation for Shannon's entropy is: H = -Sum[ P(xi) * log2( P(xi)) ]

Parameters

Example

ds = rasgo.get.dataset(id)

ds2 = ds.entropy(group_by=['FIPS'], columns=['NAME', 'ADDRESS'])
ds2.preview()

Source Code

Last updated