Splitter

  • SplitBuilder – Builds a stratified split for a DataFrame.

  • SplitBuilderParams – Split experiment parameters class.

Split Builder

class abacus.splitter.SplitBuilder(split_data: DataFrame, params: SplitBuilderParams)

Builds a stratified split for a DataFrame.

Methods

collect()

Calculates the splits for the initial DataFrame.
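How collect() computes the splits is not documented here. As a rough illustration of stratified splitting in general, the sketch below shuffles the units inside each stratum and cuts them into groups in proportion to the requested group sizes. The function `stratified_split` and its arguments are hypothetical, not part of abacus, and treating group sizes as relative weights is an assumption.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def stratified_split(
    units: List[Tuple[str, Hashable]],
    group_sizes: Dict[str, int],
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Assign (unit_id, stratum) pairs to groups, stratum by stratum.

    Hypothetical helper, not part of abacus. group_sizes are treated as
    relative weights, e.g. {"control": 1, "test": 1} means a 50/50 split.
    """
    rng = random.Random(seed)  # seeded so the split is reproducible

    # Collect unit ids per stratum.
    by_stratum: Dict[Hashable, List[str]] = defaultdict(list)
    for unit_id, stratum in units:
        by_stratum[stratum].append(unit_id)

    total = sum(group_sizes.values())
    result: Dict[str, List[str]] = {name: [] for name in group_sizes}
    for members in by_stratum.values():
        rng.shuffle(members)
        # Cut the shuffled stratum at cumulative-weight boundaries, so every
        # group gets its share of this stratum and every unit is used once.
        cumulative, start = 0, 0
        for name, size in group_sizes.items():
            cumulative += size
            end = round(len(members) * cumulative / total)
            result[name].extend(members[start:end])
            start = end
    return result


units = [(f"u{i}", "A" if i < 60 else "B") for i in range(100)]
split = stratified_split(units, {"control": 1, "test": 1})
print(len(split["control"]), len(split["test"]))  # 50 50
```

Cutting at cumulative boundaries (rather than rounding each group's count separately) guarantees that the counts within a stratum always sum to the stratum size, so no unit is dropped or assigned twice.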

Params

class abacus.splitter.SplitBuilderParams(map_group_names_to_sizes: Dict[str, int | None], main_strata_col: str, split_metric_col: str, metric_type: str = 'continuous', id_col: str = 'customer_id', cols: List[str] = Field(default_factory=list), cat_cols: List[str] = Field(default_factory=list), n_bins: int = 3, min_cluster_size: int = 100, strata_outliers_frac: float = 0.01, alpha: float = 0.05)

Split experiment parameters class.

Parameters:
  • map_group_names_to_sizes (Dict) – dictionary mapping group names to group sizes. A key named “control” is required.

  • main_strata_col (str) – the name of the column to be used first for splitting

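  • split_metric_col (str) – the name of the column whose values are binned for splitting

  • metric_type (str) – the metric type; defaults to 'continuous' and is checked by metric_type_validator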

  • id_col (str) – the name of the column with id

  • cols (List[str]) – columns used for stratification

  • cat_cols (List[str]) – categorical columns used for stratification. These columns are encoded as categorical features.

  • n_bins (int) – number of bins to create from the values of split_metric_col

  • min_cluster_size (int) – minimum number of samples in an HDBSCAN cluster

  • strata_outliers_frac (float) – fraction of outliers in strata

  • alpha (float) – significance level of the A/A test used to check the split
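To illustrate how main_strata_col, split_metric_col and n_bins can combine into strata, the sketch below cuts the metric into n_bins equal-count (quantile) bins and labels each row with the pair (main stratum value, bin index). Quantile binning is an assumption, and `quantile_bins` and `build_strata` are hypothetical helpers; the HDBSCAN clustering (min_cluster_size) and outlier handling (strata_outliers_frac) that abacus also uses are not reproduced here.

```python
from bisect import bisect_right
from statistics import quantiles
from typing import Any, Dict, Hashable, List, Tuple


def quantile_bins(values: List[float], n_bins: int = 3) -> List[int]:
    """Map each value to a quantile bin index in 0..n_bins-1 (hypothetical helper)."""
    if n_bins < 2:
        return [0] * len(values)
    # n_bins - 1 cut points that split the values into equal-count groups.
    cuts = quantiles(values, n=n_bins)
    # The number of cut points <= value is the bin index.
    return [bisect_right(cuts, v) for v in values]


def build_strata(
    rows: List[Dict[str, Any]],
    main_strata_col: str,
    split_metric_col: str,
    n_bins: int = 3,
) -> List[Tuple[Hashable, int]]:
    """Stratum of each row = (main stratum value, bin of the split metric)."""
    bins = quantile_bins([row[split_metric_col] for row in rows], n_bins)
    return [(row[main_strata_col], b) for row, b in zip(rows, bins)]


rows = [{"region": "north" if i % 2 else "south", "revenue": float(i)} for i in range(1, 10)]
print(build_strata(rows, "region", "revenue")[:4])
# [('north', 0), ('south', 0), ('north', 0), ('south', 1)]
```

Binning the split metric first means each group later receives a similar mix of low, medium and high values within every main stratum, which is what keeps the groups comparable before the experiment starts.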

Methods

alpha_validator

metric_type_validator
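Judging by their names, alpha_validator and metric_type_validator check the corresponding fields, and alpha is the significance level of the A/A test run on the split. Neither the exact validation rules nor the test abacus uses are documented here. The sketch below assumes alpha must lie strictly between 0 and 1 and swaps in a two-sample z-test on the split metric; `check_alpha` and `aa_test_passes` are hypothetical names.

```python
from statistics import NormalDist, fmean, variance
from typing import Sequence


def check_alpha(alpha: float) -> float:
    """Hypothetical validator: a significance level must lie in (0, 1)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError(f"alpha must be in (0, 1), got {alpha}")
    return alpha


def aa_test_passes(
    control: Sequence[float], treatment: Sequence[float], alpha: float = 0.05
) -> bool:
    """Two-sample z-test on means (an assumed stand-in for the A/A test).

    Returns True if equality of means is NOT rejected at level alpha,
    i.e. the split looks balanced on this metric.
    """
    alpha = check_alpha(alpha)
    se = (variance(control) / len(control) + variance(treatment) / len(treatment)) ** 0.5
    if se == 0.0:
        # Both groups are constant: balanced only if the constants match.
        return fmean(control) == fmean(treatment)
    z = (fmean(control) - fmean(treatment)) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return p_value >= alpha


control = [float(x) for x in range(100)]
print(aa_test_passes(control, [x + 0.1 for x in control]))  # True
print(aa_test_passes(control, [x + 50.0 for x in control]))  # False
```

A z-test is used only because it needs nothing beyond the standard library; with small groups a t-test would be the more usual choice.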