Auto A/B
ABTest
- class abacus.auto_ab.ABTest(dataset: DataFrame | None, params: ABTestParams)
Performs the calculations of an A/B test:
Results evaluation for different metric types (continuous, binary, ratio).
Bucketing (reduces the number of data points and brings the distribution of the metric of interest closer to normal).
Example:
    import pandas as pd

    from abacus.auto_ab.abtest import ABTest
    from abacus.auto_ab.params import ABTestParams, DataParams, HypothesisParams

    data_params = DataParams(...)
    hypothesis_params = HypothesisParams(...)
    ab_params = ABTestParams(data_params, hypothesis_params)

    df = pd.read_csv('data.csv')
    ab_test = ABTest(df, ab_params)
    ab_test.test_welch()
    # {'stat': 5.172, 'p-value': 0.312, 'result': 0}
- Attributes:
- dataset
Methods
bucketing() Performs bucketing in order to accelerate results computation.
cupac() Performs CUPAC for variance reduction.
cuped() Performs CUPED for variance reduction.
linearization() Creates linearized continuous metric based on ratio-metric.
plot([kind, save_path]) Plot experiment.
resplit_df() Resplit dataframe.
test_boot_confint() Performs bootstrap confidence interval and zero statistical significance.
test_boot_fp() Performs bootstrap hypothesis testing by calculation of false positives.
test_boot_ratio() Performs bootstrap for ratio-metric.
test_boot_welch() Performs Welch's t-test for independent samples with unequal number of observations and variance.
test_buckets() Performs buckets hypothesis testing.
test_chisquare() Performs Chi-Square test.
test_delta_ratio() Delta method with bias correction for ratios.
test_mannwhitney() Performs Mann-Whitney U test.
test_taylor_ratio() Calculate expectation and variance of ratio for each group and then use t-test for hypothesis testing.
test_welch() Performs Welch's t-test.
test_z_proportions() Performs z-test for proportions.
filter_outliers
metric_transform
report
- abacus.auto_ab.ABTest.__bucketize(self, x: List[float] | ndarray[Any, dtype[ScalarType]] | Series) -> ndarray
Split array x into N non-overlapping buckets.
There are two purposes for these actions:
Decrease number of data points of experiment.
Get normal distribution of a metric of interest.
Procedure:
Shuffle elements of an array.
Split points into N non-overlapping buckets.
On every bucket calculate metric of interest.
- Parameters:
x (np.ndarray) – Array to split.
- Returns:
Array split into buckets.
- Return type:
np.ndarray
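The procedure above can be sketched in a few lines of NumPy. This is only an illustration of the three steps, not the library's implementation: the function name `bucketize`, its `n_buckets`, `metric` and `seed` arguments, and the demo data are assumptions made for this example.

```python
import numpy as np


def bucketize(x, n_buckets=100, metric=np.mean, seed=0):
    """Shuffle x, split it into n_buckets buckets, return the metric per bucket."""
    rng = np.random.default_rng(seed)
    # Step 1: shuffle the elements of the array.
    shuffled = rng.permutation(np.asarray(x, dtype=float))
    # Step 2: split the points into non-overlapping buckets.
    buckets = np.array_split(shuffled, n_buckets)
    # Step 3: compute the metric of interest on every bucket.
    return np.array([metric(bucket) for bucket in buckets])


# Hypothetical data: 10,000 skewed observations collapse into 100 bucket means.
x = np.random.default_rng(1).exponential(scale=2.0, size=10_000)
bucket_means = bucketize(x, n_buckets=100)
```

With equally sized buckets the average of the bucket means equals the mean of the raw data, while the bucket means themselves are far fewer and much closer to normally distributed than the original observations.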
- abacus.auto_ab.ABTest.__check_required_columns(self, df: DataFrame, method: str) -> None
Check presence of columns in dataframe.
- Parameters:
df (pandas.DataFrame) – DataFrame to check.
method (str) – Stage of A/B process which you’d like to test.
- Raises:
ValueError – If is_valid_col is False. The experiment cannot be run if required columns are absent.
- abacus.auto_ab.ABTest.__get_group(self, group_label: str, df: DataFrame | None = None) -> ndarray
Gets target metric column based on desired group label.
- Parameters:
group_label (str) – Group label, e.g. ‘A’, ‘B’.
df (DataFrameType, optional) – DataFrame to query from.
- Returns:
Target column for a desired group.
- Return type:
numpy.ndarray
- abacus.auto_ab.ABTest.__delta_params(self, x: DataFrame) -> Tuple[float, float]
Calculates expectation and variance of a ratio metric using the delta approximation.
Source: https://arxiv.org/pdf/1803.06336.pdf.
- Parameters:
x (pandas.DataFrame) – Pandas DataFrame of particular group (A, B, etc).
- Returns:
Mean and variance of ratio metric.
- Return type:
Tuple[float, float]
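As a rough illustration of the delta approximation for one group, the sketch below computes the ratio of means and its first-order variance from per-user numerators and denominators. It is an assumption about the general technique, not the library's code: the function name `delta_ratio_params`, the array inputs, and the demo data are hypothetical, and no bias correction is applied.

```python
import numpy as np


def delta_ratio_params(num, den):
    """Return (ratio of means, first-order delta-method variance of that ratio)."""
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    n = len(num)
    mean_den = den.mean()
    ratio = num.mean() / mean_den
    var_num = num.var(ddof=1)
    var_den = den.var(ddof=1)
    cov = np.cov(num, den, ddof=1)[0, 1]
    # Var(R) ~ (var_num - 2*R*cov + R^2 * var_den) / (n * mean_den^2)
    var_ratio = (var_num - 2 * ratio * cov + ratio**2 * var_den) / (n * mean_den**2)
    return ratio, var_ratio


# Hypothetical per-user data: clicks (numerator) and sessions (denominator).
rng = np.random.default_rng(0)
sessions = rng.integers(1, 10, size=5_000).astype(float)
clicks = rng.binomial(sessions.astype(int), 0.3).astype(float)
ratio, var_ratio = delta_ratio_params(clicks, sessions)
```

The resulting mean and variance per group can then be fed into a t-test computed from aggregates.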
- abacus.auto_ab.ABTest.__manual_ttest(self, ctrl_mean: float, ctrl_var: float, ctrl_size: int, treat_mean: float, treat_var: float, treat_size: int) -> Dict[str, int | float]
Performs Welch’s t-test based on aggregated metrics instead of raw datasets.
Computing the t-statistic from aggregates requires the expectation, variance, and sample size of each group.
- Parameters:
ctrl_mean (float) – Mean of control group.
ctrl_var (float) – Variance of control group.
ctrl_size (int) – Size of control group.
treat_mean (float) – Mean of treatment group.
treat_var (float) – Variance of treatment group.
treat_size (int) – Size of treatment group.
- Returns:
Dictionary with the following properties: test statistic, p-value, test result. Test result: 1 - significant difference, 0 - insignificant difference.
- Return type:
stat_test_typing
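A minimal sketch of Welch's t-test computed from aggregates is shown below. The function name `welch_from_stats`, the `alpha` argument, the two-sided alternative, and the sign convention (treatment minus control) are assumptions for illustration; the dictionary keys follow the example output shown for `test_welch()`.

```python
import math

from scipy import stats


def welch_from_stats(ctrl_mean, ctrl_var, ctrl_size,
                     treat_mean, treat_var, treat_size, alpha=0.05):
    """Two-sided Welch's t-test from group means, variances and sizes."""
    se2_ctrl = ctrl_var / ctrl_size
    se2_treat = treat_var / treat_size
    t_stat = (treat_mean - ctrl_mean) / math.sqrt(se2_ctrl + se2_treat)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (se2_ctrl + se2_treat) ** 2 / (
        se2_ctrl**2 / (ctrl_size - 1) + se2_treat**2 / (treat_size - 1)
    )
    p_value = 2 * stats.t.sf(abs(t_stat), dof)
    return {"stat": t_stat, "p-value": p_value, "result": int(p_value < alpha)}


# Hypothetical aggregates for control and treatment.
result = welch_from_stats(10.0, 4.0, 500, 10.4, 5.0, 480)
```

The same numbers can be cross-checked with `scipy.stats.ttest_ind_from_stats(..., equal_var=False)`, which takes standard deviations rather than variances.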
- abacus.auto_ab.ABTest.__taylor_params(self, x: DataFrame) -> Tuple[float, float]
Calculates expectation and variance of a ratio metric using the Taylor expansion approximation.
Source: https://www.stat.cmu.edu/~hseltman/files/ratio.pdf.
- Parameters:
x (pandas.DataFrame) – Pandas DataFrame of particular group (A, B, etc).
- Returns:
Mean and variance of ratio metric.
- Return type:
Tuple[float, float]
- abacus.auto_ab.ABTest.bucketing(self) -> ABTest
Performs bucketing in order to accelerate results computation.
- Returns:
New instance of ABTest class with modified control and treatment.
- Return type:
ABTest
- abacus.auto_ab.ABTest.cupac(self) -> ABTest
Performs CUPAC for variance reduction.
- Returns:
New instance of ABTest class with modified control and treatment.
- Return type:
ABTest
- abacus.auto_ab.ABTest.cuped(self) -> ABTest
Performs CUPED for variance reduction.
- Returns:
New instance of ABTest class with modified control and treatment.
- Return type:
ABTest
- abacus.auto_ab.ABTest.linearization(self) -> ABTest
Creates linearized continuous metric based on ratio-metric.
Important: the data is assumed to be already grouped by user, so that each user's numerator is the sum of that user's numerators over all time periods, and each user's denominator is the sum of that user's denominators over all time periods.
- abacus.auto_ab.ABTest.plot(self, kind: str = 'experiment', save_path: str | None = None) -> None
Plot experiment.
- Parameters:
kind (str) – Kind of plot: ‘experiment’, ‘bootstrap’.
save_path (str, optional) – Path where to save image.
- Raises:
ValueError – If kind is not in [‘experiment’, ‘bootstrap’].
- abacus.auto_ab.ABTest.resplit_df(self) -> ABTest
Resplit dataframe.
- Returns:
Instance of ABTest class with modified control and treatment.
- Return type:
ABTest
- abacus.auto_ab.ABTest.test_boot_confint(self) -> Dict[str, int | float]
Builds a bootstrap confidence interval and checks whether it contains zero to decide statistical significance.
- Returns:
Dictionary with the following properties: test statistic, p-value, test result. Test result: 1 - significant difference, 0 - insignificant difference.
- Return type:
stat_test_typing
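The idea of a bootstrap confidence interval checked against zero can be sketched as follows. This is a generic percentile bootstrap written under assumptions, not the library's implementation: the function name `bootstrap_confint`, its arguments (named after `n_boot_samples`, `alpha` and `metric` from `HypothesisParams`), the returned keys, and the demo data are all illustrative.

```python
import numpy as np


def bootstrap_confint(control, treatment, metric=np.mean,
                      n_boot_samples=200, alpha=0.05, seed=0):
    """Percentile bootstrap CI for metric(treatment) - metric(control)."""
    rng = np.random.default_rng(seed)
    control = np.asarray(control, dtype=float)
    treatment = np.asarray(treatment, dtype=float)
    diffs = np.empty(n_boot_samples)
    for i in range(n_boot_samples):
        # Resample each group with replacement, keeping its original size.
        c = rng.choice(control, size=control.size, replace=True)
        t = rng.choice(treatment, size=treatment.size, replace=True)
        diffs[i] = metric(t) - metric(c)
    lower, upper = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    # The difference is called significant when zero lies outside the interval.
    significant = int(not (lower <= 0 <= upper))
    return {"confint": (float(lower), float(upper)), "result": significant}


# Hypothetical data: a clear shift versus no shift at all.
rng = np.random.default_rng(42)
control = rng.normal(loc=0.0, scale=1.0, size=2_000)
treatment = rng.normal(loc=1.0, scale=1.0, size=2_000)
shifted = bootstrap_confint(control, treatment)
same = bootstrap_confint(control, control)
```

Because `metric` is an arbitrary callable, the same sketch works for medians or other statistics for which no closed-form test is available.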
- abacus.auto_ab.ABTest.test_boot_fp(self) -> Dict[str, int | float]
Performs bootstrap hypothesis testing by calculation of false positives.
- Returns:
Dictionary with the following properties: test statistic, p-value, test result. Test result: 1 - significant difference, 0 - insignificant difference.
- Return type:
stat_test_typing
- abacus.auto_ab.ABTest.test_boot_ratio(self) -> Dict[str, int | float]
Performs bootstrap for ratio-metric.
- Returns:
Dictionary with the following properties: test statistic, p-value, test result. Test result: 1 - significant difference, 0 - insignificant difference.
- Return type:
stat_test_typing
- abacus.auto_ab.ABTest.test_boot_welch(self) -> Dict[str, int | float]
Performs Welch’s t-test for independent samples with unequal number of observations and variance.
Welch’s t-test is a more general approach than Student’s t-test, with fewer restrictions on sample sizes.
Statistic of the test:
\[t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}},\]
where \(\bar{X}_i\), \(s_i^2\) and \(N_i\) are the sample mean, sample variance and size of group \(i\).
- Returns:
Dictionary with the following properties: test statistic, p-value, test result. Test result: 1 - significant difference, 0 - insignificant difference.
- Return type:
stat_test_typing
- abacus.auto_ab.ABTest.test_buckets(self) -> Dict[str, int | float]
Performs buckets hypothesis testing.
- Returns:
Dictionary with the following properties: test statistic, p-value, test result. Test result: 1 - significant difference, 0 - insignificant difference.
- Return type:
stat_test_typing
- abacus.auto_ab.ABTest.test_chisquare(self) -> Dict[str, int | float]
Performs Chi-Square test.
- Returns:
Dictionary with the following properties: test statistic, p-value, test result. Test result: 1 - significant difference, 0 - insignificant difference.
- Return type:
stat_test_typing
- abacus.auto_ab.ABTest.test_delta_ratio(self) -> Dict[str, int | float]
Delta method with bias correction for ratios.
Source: https://arxiv.org/pdf/1803.06336.pdf.
- Returns:
Dictionary with the following properties: test statistic, p-value, test result. Test result: 1 - significant difference, 0 - insignificant difference.
- Return type:
stat_test_typing
- abacus.auto_ab.ABTest.test_mannwhitney(self) -> Dict[str, int | float]
Performs Mann-Whitney U test.
The test works on continuous metrics and their ranks.
Assumptions of Mann-Whitney test:
Independence of observations.
Same shape of metric distributions.
Statistic of the test:
\[U = \sum_{i=1}^{n} \sum_{j=1}^{m} S(X_i, Y_j).\]
- Returns:
Dictionary with the following properties: test statistic, p-value, test result. Test result: 1 - significant difference, 0 - insignificant difference.
- Return type:
stat_test_typing
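The double sum in the statistic can be evaluated directly. The sketch below assumes the usual definition of the scoring function, which the page does not state: S(X, Y) is 1 if X > Y, 1/2 if X = Y, and 0 otherwise. The function name `mann_whitney_u` and the sample values are hypothetical.

```python
import numpy as np


def mann_whitney_u(x, y):
    """U statistic of x versus y computed as the double sum over all pairs."""
    x = np.asarray(x, dtype=float)[:, None]  # column vector, shape (n, 1)
    y = np.asarray(y, dtype=float)[None, :]  # row vector, shape (1, m)
    # Assumed scoring: 1 for a win, 0.5 for a tie, 0 for a loss.
    scores = (x > y) + 0.5 * (x == y)
    return float(scores.sum())


# Hypothetical samples.
x = [1, 3, 5]
y = [2, 4]
u_xy = mann_whitney_u(x, y)
u_yx = mann_whitney_u(y, x)
```

The two statistics always add up to `n * m`, the number of pairs, which is a quick sanity check for any implementation. The pairwise matrix costs O(n * m) memory, so rank-based formulas are preferable for large samples.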
- abacus.auto_ab.ABTest.test_taylor_ratio(self) -> Dict[str, int | float]
Calculate expectation and variance of ratio for each group and then use t-test for hypothesis testing.
Source: http://www.stat.cmu.edu/~hseltman/files/ratio.pdf.
- Returns:
Dictionary with the following properties: test statistic, p-value, test result. Test result: 1 - significant difference, 0 - insignificant difference.
- Return type:
stat_test_typing
- abacus.auto_ab.ABTest.test_welch(self) -> Dict[str, int | float]
Performs Welch’s t-test.
- Returns:
Dictionary with the following properties: test statistic, p-value, test result. Test result: 1 - significant difference, 0 - insignificant difference.
- Return type:
stat_test_typing
- abacus.auto_ab.ABTest.test_z_proportions(self) -> Dict[str, int | float]
Performs z-test for proportions.
The two-proportions z-test is used to compare two observed proportions.
Statistic of the test:
\[Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})(\frac{1}{n_1} + \frac{1}{n_2})}}.\]
- Returns:
Dictionary with the following properties: test statistic, p-value, test result. Test result: 1 - significant difference, 0 - insignificant difference.
- Return type:
stat_test_typing
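The Z statistic above translates directly into code. In this sketch the unindexed proportion is taken to be the pooled proportion over both groups, which is the standard reading of the formula; the function name `z_test_proportions`, the count-based inputs, the `alpha` argument and the two-sided alternative are assumptions for illustration.

```python
import math

from scipy import stats


def z_test_proportions(successes_1, n_1, successes_2, n_2, alpha=0.05):
    """Two-sided z-test for the difference of two proportions."""
    p_1 = successes_1 / n_1
    p_2 = successes_2 / n_2
    # Pooled proportion under the null hypothesis of equal proportions.
    p_pool = (successes_1 + successes_2) / (n_1 + n_2)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_1 + 1 / n_2))
    z_stat = (p_1 - p_2) / std_err
    p_value = 2 * stats.norm.sf(abs(z_stat))
    return {"stat": z_stat, "p-value": p_value, "result": int(p_value < alpha)}


# Hypothetical counts: 60/100 conversions versus 40/100.
res = z_test_proportions(60, 100, 40, 100)
```

For these counts the pooled proportion is 0.5, so the statistic reduces to 0.2 / sqrt(0.005), i.e. 2 * sqrt(2).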
VarianceReduction
- class abacus.auto_ab.VarianceReduction
Implementation of sensitivity increasing approaches.
As it is easier to apply variance reduction techniques directly to an experiment, all approaches should be called on an ABTest class instance.
Example:
    import pandas as pd

    from abacus.auto_ab.abtest import ABTest
    from abacus.auto_ab.params import ABTestParams, DataParams, HypothesisParams

    data_params = DataParams(...)
    hypothesis_params = HypothesisParams(...)
    ab_params = ABTestParams(data_params, hypothesis_params)

    df = pd.read_csv('data.csv')
    ab_test = ABTest(df, ab_params)
    ab_test = ab_test.cuped()
Methods
cupac(x, target_prev_col, target_now_col, ...) Perform CUPED on target variable with covariate calculated as a prediction from a linear regression model.
cuped(df, target_col, groups_col, covariate_col) Perform CUPED on target variable with predefined covariate.
- abacus.auto_ab.VarianceReduction._target_encoding(x: DataFrame, encoding_columns: List[str] | ndarray[Any, dtype[ScalarType]] | Series, target_column: str) -> DataFrame
Encodes target column.
- abacus.auto_ab.VarianceReduction._predict_target(x: DataFrame, target_prev_col: str, factors_prev_cols: List[str] | ndarray[Any, dtype[ScalarType]] | Series, factors_now_cols: List[str] | ndarray[Any, dtype[ScalarType]] | Series) -> List[float] | ndarray[Any, dtype[ScalarType]] | Series
Covariate prediction with linear regression model.
- Parameters:
x (pandas.DataFrame) – Pandas DataFrame.
target_prev_col (str) – Target on previous period column name.
factors_prev_cols (List[str]) – Factor columns for modelling.
factors_now_cols (List[str]) – Factor columns for prediction on current period.
- Returns:
Pandas Series with predicted values
- Return type:
pandas.Series
- abacus.auto_ab.VarianceReduction.cuped(df: DataFrame, target_col: str, groups_col: str, covariate_col: str) -> DataFrame
Perform CUPED on target variable with predefined covariate.
Covariate has to be chosen with regard to the following restrictions:
Covariate is independent of an experiment.
Covariate is highly correlated with target variable.
Covariate is continuous variable.
Original paper: https://exp-platform.com/Documents/2013-02-CUPED-ImprovingSensitivityOfControlledExperiments.pdf.
- Parameters:
df (pandas.DataFrame) – Pandas DataFrame for analysis.
target_col (str) – Target column name.
groups_col (str) – Groups A and B column name.
covariate_col (str) – Covariate column name. If None, then the most correlated column is considered as the covariate.
- Returns:
Pandas DataFrame with an additional CUPED-adjusted target column.
- Return type:
pandas.DataFrame
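A minimal sketch of the CUPED adjustment from the cited paper is given below: the target is shifted by `theta * (covariate - mean(covariate))`, with `theta = cov(target, covariate) / var(covariate)`. The function name `cuped_adjust`, the `_cuped` column suffix, the choice to estimate `theta` on the pooled data of both groups, and the demo data are assumptions; the sketch also omits the `groups_col` argument of the documented method.

```python
import numpy as np
import pandas as pd


def cuped_adjust(df, target_col, covariate_col):
    """Return a copy of df with a CUPED-adjusted target column appended."""
    y = df[target_col].to_numpy(dtype=float)
    x = df[covariate_col].to_numpy(dtype=float)
    # theta minimizes the variance of the adjusted metric.
    theta = np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)
    out = df.copy()
    # Centering the covariate keeps the overall mean of the target unchanged.
    out[f"{target_col}_cuped"] = y - theta * (x - x.mean())
    return out


# Hypothetical experiment: the pre-period metric strongly predicts the target.
rng = np.random.default_rng(0)
pre = rng.normal(100, 20, size=1_000)
df = pd.DataFrame({
    "groups": rng.choice(["A", "B"], size=1_000),
    "pre_metric": pre,
    "metric": pre + rng.normal(0, 5, size=1_000),
})
adjusted = cuped_adjust(df, "metric", "pre_metric")
```

The adjusted column has the same overall mean as the original target but a smaller variance, which is what makes the subsequent test more sensitive.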
- abacus.auto_ab.VarianceReduction.cupac(x: DataFrame, target_prev_col: str, target_now_col: str, factors_prev_cols: List[str] | ndarray[Any, dtype[ScalarType]] | Series, factors_now_cols: List[str] | ndarray[Any, dtype[ScalarType]] | Series, groups_col: str) -> DataFrame
Perform CUPED on target variable with covariate calculated as a prediction from a linear regression model.
Original article: https://doordash.engineering/2020/06/08/improving-experimental-power-through-control-using-predictions-as-covariate-cupac/.
- Parameters:
x (pandas.DataFrame) – Pandas DataFrame for analysis.
target_prev_col (str) – Target on previous period column name.
target_now_col (str) – Target on current period column name.
factors_prev_cols (List[str]) – Factor columns for modelling.
factors_now_cols (List[str]) – Factor columns for prediction on current period.
groups_col (str) – Groups column name.
- Returns:
Pandas DataFrame with additional columns: target_pred and target_now_cuped
- Return type:
pandas.DataFrame
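The CUPAC flow described above can be sketched as: fit a linear model on the previous period, predict the target from current-period factors, then apply the CUPED adjustment with that prediction as the covariate. This is an illustration under assumptions rather than the library's code: the function name `cupac_adjust`, the use of `numpy.linalg.lstsq` for the regression, the omission of `groups_col`, and the demo data are hypothetical. Only the output column names `target_pred` and `target_now_cuped` come from the description above.

```python
import numpy as np
import pandas as pd


def cupac_adjust(x, target_prev_col, target_now_col,
                 factors_prev_cols, factors_now_cols):
    """Predict a covariate with OLS, then CUPED-adjust the current target."""
    ones = np.ones(len(x))
    # Fit ordinary least squares (with intercept) on the previous period.
    a_prev = np.column_stack([ones, x[factors_prev_cols].to_numpy(dtype=float)])
    y_prev = x[target_prev_col].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(a_prev, y_prev, rcond=None)
    # Predict the covariate from the current-period factors.
    a_now = np.column_stack([ones, x[factors_now_cols].to_numpy(dtype=float)])
    pred = a_now @ coef
    # CUPED step with the prediction as covariate.
    y_now = x[target_now_col].to_numpy(dtype=float)
    theta = np.cov(y_now, pred, ddof=1)[0, 1] / np.var(pred, ddof=1)
    out = x.copy()
    out["target_pred"] = pred
    out["target_now_cuped"] = y_now - theta * (pred - pred.mean())
    return out


# Hypothetical data: one factor observed in both periods.
rng = np.random.default_rng(0)
n = 1_000
f_prev = rng.normal(size=n)
f_now = f_prev + rng.normal(scale=0.1, size=n)
df = pd.DataFrame({
    "f_prev": f_prev,
    "f_now": f_now,
    "y_prev": 2.0 * f_prev + rng.normal(scale=0.5, size=n),
    "y_now": 2.0 * f_now + rng.normal(scale=0.5, size=n),
})
adjusted = cupac_adjust(df, "y_prev", "y_now", ["f_prev"], ["f_now"])
```

Any regression model could replace the least-squares fit, as long as its inputs are not affected by the experiment.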
Graphics
- class abacus.auto_ab.Graphics
Illustration of an experiment.
As it is easier to apply plotting directly to an experiment, all methods should be called on an ABTest class instance.
The experiment’s plot is based on metric type.
Example:
    import pandas as pd

    from abacus.auto_ab.abtest import ABTest
    from abacus.auto_ab.params import ABTestParams, DataParams, HypothesisParams

    data_params = DataParams(...)
    hypothesis_params = HypothesisParams(...)
    ab_params = ABTestParams(data_params, hypothesis_params)

    df = pd.read_csv('data.csv')
    ab_test = ABTest(df, ab_params)
    ab_test.plot()
Methods
plot_binary_experiment(params, save_path) Plot experiment with binary outcome.
plot_bootstrap_confint(params, save_path) Plot bootstrapped metric of an experiment with its confidence interval and zero value.
plot_continuous_experiment(params, save_path) Plot distributions of continuous metric and actual experiment metric.
- abacus.auto_ab.Graphics.plot_continuous_experiment(params: ABTestParams, save_path: str | None) -> None
Plot distributions of continuous metric and actual experiment metric.
- Parameters:
params (ABTestParams) – Parameters of the experiment.
save_path (str, optional) – Path where to save image.
- abacus.auto_ab.Graphics.plot_binary_experiment(params: ABTestParams, save_path: str | None) -> None
Plot experiment with binary outcome.
- Parameters:
params (ABTestParams) – Parameters of the experiment.
save_path (str, optional) – Path where to save image.
- abacus.auto_ab.Graphics.plot_bootstrap_confint(params: ABTestParams, save_path: str | None) -> None
Plot bootstrapped metric of an experiment with its confidence interval and zero value.
- Parameters:
x (np.ndarray) – Bootstrap metric.
params (ABTestParams) – Parameters of the experiment.
Params
- class abacus.auto_ab.DataParams(id_col: str = 'id', group_col: str = 'groups', control_name: str = 'A', treatment_name: str = 'B', is_grouped: bool | None = True, strata_col: str | None = '', target: str | None = '', numerator: str | None = '', denominator: str | None = '', covariate: str | None = '', target_prev: str | None = '', predictors_now: List[str] | ndarray[Any, dtype[ScalarType]] | Series | None = Field(default_factory=list), predictors_prev: List[str] | ndarray[Any, dtype[ScalarType]] | Series | None = Field(default_factory=list), control: List[float] | ndarray[Any, dtype[ScalarType]] | Series | None = Field(default_factory=list), treatment: List[float] | ndarray[Any, dtype[ScalarType]] | Series | None = Field(default_factory=list), transforms: List[str] | ndarray[Any, dtype[ScalarType]] | Series | None = Field(default_factory=list))
Data description as column names of dataset generated during experiment.
- Parameters:
id_col (str) – ID of observations.
group_col (str) – Group of experiment.
control_name (str) – Name of control group in group_col.
treatment_name (str) – Name of treatment group in group_col.
is_grouped (bool, Optional) – Flag that shows whether observations are grouped.
strata_col (str, Optional) – Name of stratification column. Stratification column must be categorical.
target (str, Optional) – Target column name of continuous or binary metric.
numerator (str, Optional) – Numerator for ratio metric.
denominator (str, Optional) – Denominator for ratio metric.
covariate (str, Optional) – Covariate column for CUPED.
target_prev (str, Optional) – Target column name for previous period of continuous metric.
predictors_now (List[str], Optional) – List of columns to predict covariate.
predictors_prev (List[str], Optional) – List of columns to create linear model for covariate prediction.
control (ArrayNumType, Optional) – Control group data used for quick access and excluding querying dataset.
treatment (ArrayNumType, Optional) – Treatment group data used for quick access and excluding querying dataset.
transforms (ArrayStrType, Optional) – List of transformations applied to experiment.
- class abacus.auto_ab.HypothesisParams(alpha: float | None = 0.05, beta: float | None = 0.2, alternative: str | None = 'two-sided', metric_type: str | None = 'continuous', metric_name: str | None = 'mean', metric: Callable[[List[float] | ndarray[Any, dtype[ScalarType]] | Series], float] | None = <function mean>, metric_transform: Callable[[ndarray], ndarray] | None = <function mean>, metric_transform_info: Dict[str, Dict[str, Any]] | None = Field(default_factory=dict), filter_method: str | None = 'top_5', n_boot_samples: int | None = 200, n_buckets: int | None = 100, strata: str | None = '', strata_weights: Dict[str, float] | None = Field(default_factory=dict))
Description of hypothesis parameters.
- Parameters:
alpha (float) – type I error.
beta (float) – type II error.
alternative (str) – directionality of hypothesis: less, greater, two-sided.
metric_type (str) – metric type: continuous, binary, ratio.
metric_name (str) – metric name: mean, median. If custom metric, then use here appropriate name.
metric (Callable[[Iterable[float]], np.ndarray], Optional) – if metric_name is custom, then you must define metric function.
metric_transform (Callable[[np.ndarray], np.ndarray], Optional) – applied transformations to experiment.
metric_transform_info (Dict[str, Dict[str, Any]], Optional) – information of applied transformations.
filter_method (str, Optional) – method for filtering outliers: top_5, isolation_forest.
n_boot_samples (int, Optional) – number of bootstrap iterations.
n_buckets (int, Optional) – number of buckets.
strata (str, Optional) – stratification column.
strata_weights (Dict[str, float], Optional) – historical strata weights.
Methods
metric([axis, dtype, out, keepdims, where]) Compute the arithmetic mean along the specified axis.
metric_transform([axis, dtype, out, ...]) Compute the arithmetic mean along the specified axis.
alpha_validator
alternative_validator
beta_validator
metric_type_validator
- class abacus.auto_ab.ABTestParams(data_params: DataParams = DataParams(id_col='id', group_col='groups', control_name='A', treatment_name='B', is_grouped=True, strata_col='', target='', numerator='', denominator='', covariate='', target_prev='', predictors_now=[], predictors_prev=[], control=[], treatment=[], transforms=[]), hypothesis_params: HypothesisParams = HypothesisParams(alpha=0.05, beta=0.2, alternative='two-sided', metric_type='continuous', metric_name='mean', metric=<function mean>, metric_transform=<function mean>, metric_transform_info={}, filter_method='top_5', n_boot_samples=200, n_buckets=100, strata='', strata_weights={}))