Iterative Binning Filter to Remove Sparse Bins

Usage

bin_filter(x, col, thresh = 10, breaks = 100)

Arguments

x

A data frame.

col

A string specifying the column to bin and filter by.

thresh

Minimum number of rows required per bin (default: 10).

breaks

Number of equal-width intervals, passed as `breaks` to `cut()` (default: 100).

Value

A filtered data frame with the same columns as `x`, excluding the temporary bin column.

Details

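Removes rows from a data frame until every remaining bin of a numeric column contains at least `thresh` observations. The column named by `col` is binned with `cut()` into `breaks` intervals, and rows that fall in bins with fewer than `thresh` rows are dropped. Dropping rows can change the range of the column, and with it the bin edges, so binning and filtering repeat until all bins meet the threshold.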
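
The loop described above can be sketched in base R roughly as follows. This is a hypothetical re-implementation for illustration only, not the package's source: the name `bin_filter_sketch` is made up, and treating empty bins as acceptable (there are no rows to remove from them) is an assumption.

```r
# Hypothetical sketch of the iterative bin filter; not the package source.
bin_filter_sketch <- function(x, col, thresh = 10, breaks = 100) {
  repeat {
    # Nothing left to bin: stop before cut() is called on an empty vector.
    if (nrow(x) == 0) break

    # Re-bin the surviving rows. With a single number as `breaks`, cut()
    # derives equal-width intervals from the current range of the column.
    bin <- cut(x[[col]], breaks = breaks)
    counts <- table(bin)

    # Bins that hold at least one row but fewer than `thresh` rows.
    # Assumption: empty bins are ignored.
    sparse <- names(counts)[counts > 0 & counts < thresh]
    if (length(sparse) == 0) break

    # Drop the rows in sparse bins, then loop to re-bin what remains.
    x <- x[!(bin %in% sparse), , drop = FALSE]
  }
  x
}

# Deterministic demo: 500 evenly spaced values in [0, 1] plus one outlier.
df_demo <- data.frame(val = c(seq(0, 1, length.out = 500), 5))
out <- bin_filter_sketch(df_demo, col = "val", thresh = 10, breaks = 10)
nrow(out)  # 500: only the isolated value 5 is dropped
```

In the demo, the first pass bins over the range 0 to 5, so the outlier sits alone in the last bin and is removed. The second pass re-bins over 0 to 1, where every bin holds about 50 rows, and the loop stops. The temporary `bin` vector never becomes a column of `x`, so the result keeps the original columns.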

Examples

# The original run failed with 'could not find function "n"';
# attaching dplyr makes n() visible when bin_filter() calls summarise().
library(dplyr)

set.seed(1)  # reproducible random input
df <- data.frame(val = runif(1000))
df_filtered <- bin_filter(df, col = "val", thresh = 20)
hist(df_filtered$val)