Splitter
SplitBuilder | Builds a stratified split for a DataFrame.
SplitBuilderParams | Split experiment parameters class.
Split Builder
- class abacus.splitter.SplitBuilder(split_data: DataFrame, params: SplitBuilderParams)
Builds a stratified split for a DataFrame.
Methods
collect() – calculates the splits for the initial DataFrame
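As a rough illustration of what a stratified split does in general, the sketch below bins a metric column into equal-frequency bins, forms strata from a strata column plus the bin index, and assigns rows to groups round-robin within each stratum. It is not the abacus implementation, which this reference does not show and which, judging by the parameters below, also involves HDBSCAN clustering and an A/A test. The names `quantile_bin` and `stratified_split`, the sample columns `region` and `revenue`, and the assumption of equally sized groups are all hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List


def quantile_bin(values: List[float], n_bins: int) -> List[int]:
    """Give each value a bin index in 0..n_bins-1 based on its rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def stratified_split(
    rows: List[dict],
    group_names: List[str],
    strata_col: str,
    metric_col: str,
    id_col: str = "customer_id",
    n_bins: int = 3,
    seed: int = 0,
) -> Dict[int, str]:
    """Return {row id: group name}, balancing groups inside every stratum.

    Hypothetical helper: groups are treated as equally sized, unlike
    map_group_names_to_sizes in the documented parameters.
    """
    rng = random.Random(seed)
    bins = quantile_bin([row[metric_col] for row in rows], n_bins)

    # A stratum is the pair (value of the strata column, metric bin).
    strata = defaultdict(list)
    for row, bin_index in zip(rows, bins):
        strata[(row[strata_col], bin_index)].append(row[id_col])

    assignment: Dict[int, str] = {}
    for key in sorted(strata):
        ids = strata[key]
        rng.shuffle(ids)  # randomise order inside the stratum
        for position, row_id in enumerate(ids):
            assignment[row_id] = group_names[position % len(group_names)]
    return assignment


# Toy data: 24 customers, two regions, revenue equal to the customer id.
rows = [
    {"customer_id": i, "region": "north" if i % 2 else "south", "revenue": float(i)}
    for i in range(24)
]
groups = stratified_split(rows, ["control", "target"], "region", "revenue")
```

Because assignment happens inside each stratum, every combination of region and revenue bin contributes the same number of rows to each group whenever the stratum size is a multiple of the number of groups.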
Params
- class abacus.splitter.SplitBuilderParams(map_group_names_to_sizes: Dict[str, int | None], main_strata_col: str, split_metric_col: str, metric_type: str = 'continuous', id_col: str = 'customer_id', cols: List[str] = Field(default_factory=list), cat_cols: List[str] = Field(default_factory=list), n_bins: int = 3, min_cluster_size: int = 100, strata_outliers_frac: float = 0.01, alpha: float = 0.05)
Split experiment parameters class.
- Parameters:
map_group_names_to_sizes (Dict[str, int | None]) – dictionary mapping group names to group sizes. The key “control” is required
main_strata_col (str) – name of the column used first for splitting
split_metric_col (str) – name of the column whose values are binned for splitting
metric_type (str) – type of the metric; defaults to 'continuous'
id_col (str) – name of the column with ids; defaults to 'customer_id'
cols (List[str]) – columns used for stratification
cat_cols (List[str]) – categorical columns used for stratification. These columns are encoded as category features
n_bins (int) – number of bins created from split_metric_col
min_cluster_size (int) – minimum number of samples in an HDBSCAN cluster
strata_outliers_frac (float) – fraction of outliers in strata
alpha (float) – significance level of the A/A test for the split
Methods
alpha_validator
metric_type_validator
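This reference lists the validators without describing their checks. The sketch below shows what such checks might look like on a plain dataclass stand-in. `SplitParamsSketch` is a hypothetical class, not the library's pydantic model. The required “control” key comes from the parameter description above, while the accepted metric types and the open interval (0, 1) for alpha are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumption: the accepted metric types are not listed in this reference;
# this set is a placeholder for illustration only.
ALLOWED_METRIC_TYPES = {"continuous", "binary"}


@dataclass
class SplitParamsSketch:
    """Hypothetical stand-in for a subset of the SplitBuilderParams fields."""

    map_group_names_to_sizes: Dict[str, Optional[int]]
    main_strata_col: str
    split_metric_col: str
    metric_type: str = "continuous"
    alpha: float = 0.05

    def __post_init__(self) -> None:
        # Documented requirement: a group named "control" must be present.
        if "control" not in self.map_group_names_to_sizes:
            raise ValueError('map_group_names_to_sizes must contain the key "control"')
        # Assumed check: metric_type must be one of the placeholder values.
        if self.metric_type not in ALLOWED_METRIC_TYPES:
            raise ValueError(f"unsupported metric_type: {self.metric_type!r}")
        # Assumed check: a significance level lies strictly between 0 and 1.
        if not 0.0 < self.alpha < 1.0:
            raise ValueError(f"alpha must be in (0, 1), got {self.alpha}")


params = SplitParamsSketch(
    map_group_names_to_sizes={"control": None, "target": None},
    main_strata_col="region",
    split_metric_col="revenue",
)
```

Validating at construction time means a misconfigured experiment fails before any data is split.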