bin_filter
Iterative Binning Filter to Remove Sparse Bins
Arguments
- x
A data frame.
- col
A string specifying the column to bin and filter by.
- thresh
Minimum number of rows required per bin (default: 10).
- breaks
Number of breaks to use in `cut()` (default: 100).
Details
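Removes rows from a data frame until each bin of a numeric column contains at least `thresh` observations. The column named by `col` is binned with `cut()` using `breaks` breaks, and rows that fall in bins with fewer than `thresh` observations are dropped. Binning and filtering are then repeated on the remaining rows until all bins meet the threshold.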
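The loop described above can be sketched in base R as follows. This is a hypothetical re-implementation for illustration only, not the package's source: the name `bin_filter_sketch`, the use of `table()` for counting, the choice to treat only occupied bins as sparse, and the early exit when fewer than two distinct values remain are all assumptions.

```r
# Hypothetical sketch of an iterative binning filter (not the package's code).
bin_filter_sketch <- function(x, col, thresh = 10, breaks = 100) {
  stopifnot(is.data.frame(x), is.numeric(x[[col]]))
  repeat {
    # Assumption: stop when too few distinct values remain to form bins.
    if (nrow(x) == 0 || length(unique(x[[col]])) < 2) break
    # Re-bin the surviving rows over their current range.
    bin <- cut(x[[col]], breaks = breaks)
    counts <- table(bin)
    # Assumption: only occupied bins can be sparse; empty bins hold no rows.
    sparse <- names(counts)[counts > 0 & counts < thresh]
    if (length(sparse) == 0) break
    # Drop every row that falls in a sparse bin, then loop again.
    x <- x[!(bin %in% sparse), , drop = FALSE]
  }
  x
}

# Usage: 500 evenly spaced values plus one far outlier.
df <- data.frame(val = c(seq(0, 1, length.out = 500), 50))
kept <- bin_filter_sketch(df, "val", thresh = 20, breaks = 10)
nrow(kept)  # 500: only the outlier row is dropped
```

Because the bins are recomputed from the range of the surviving rows on each pass, removing an outlier narrows the bins, so a row that passed one pass can still be dropped in a later one. With many breaks relative to the number of rows, this can remove most or all of the data.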
Examples
# bin_filter() calls n() unqualified, so the example fails with
# 'could not find function "n"' unless dplyr is attached.
library(dplyr)
df <- data.frame(val = runif(1000))
df_filtered <- bin_filter(df, col = "val", thresh = 20)
hist(df_filtered$val)