Auto A/B

ABTest

class abacus.auto_ab.ABTest(dataset: DataFrame | None, params: ABTestParams)[source]

Performs different calculations of A/B-test.

  • Results evaluation for different metric types (continuous, binary, ratio).

  • Bucketing (reduces the number of data points and brings the metric of interest closer to a normal distribution).

Example:

import pandas as pd

from abacus.auto_ab.abtest import ABTest
from abacus.auto_ab.params import ABTestParams, DataParams, HypothesisParams

data_params = DataParams(...)
hypothesis_params = HypothesisParams(...)
ab_params = ABTestParams(data_params, hypothesis_params)

df = pd.read_csv('data.csv')
ab_test = ABTest(df, ab_params)
ab_test.test_welch()
# {'stat': 5.172, 'p-value': 0.312, 'result': 0}
Attributes:
dataset

Methods

bucketing()

Performs bucketing in order to accelerate results computation.

cupac()

Performs CUPAC for variance reduction.

cuped()

Performs CUPED for variance reduction.

linearization()

Creates linearized continuous metric based on ratio-metric.

plot([kind, save_path])

Plot experiment.

resplit_df()

Resplit dataframe.

test_boot_confint()

Computes a bootstrap confidence interval and checks statistical significance against zero.

test_boot_fp()

Performs bootstrap hypothesis testing by calculation of false positives.

test_boot_ratio()

Performs bootstrap for ratio-metric.

test_boot_welch()

Performs Welch's t-test for independent samples with unequal number of observations and variance.

test_buckets()

Performs buckets hypothesis testing.

test_chisquare()

Performs Chi-Square test.

test_delta_ratio()

Delta method with bias correction for ratios.

test_mannwhitney()

Performs Mann-Whitney U test.

test_taylor_ratio()

Calculate expectation and variance of ratio for each group and then use t-test for hypothesis testing.

test_welch()

Performs Welch's t-test.

test_z_proportions()

Performs z-test for proportions.

filter_outliers

metric_transform

report

abacus.auto_ab.ABTest.__bucketize(self, x: List[float] | ndarray[Any, dtype[ScalarType]] | Series) ndarray

Split array x into N non-overlapping buckets.

There are two purposes for these actions:

  1. Decrease number of data points of experiment.

  2. Get normal distribution of a metric of interest.

Procedure:

  1. Shuffle elements of an array.

  2. Split points into N non-overlapping buckets.

  3. On every bucket calculate metric of interest.

Parameters:

x (np.ndarray) – Array to split.

Returns:

Array split into buckets.

Return type:

np.ndarray
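The three-step procedure above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the function name `bucketize`, its `metric` and `seed` arguments, and the use of striding to form buckets are all hypothetical choices made for this example.

```python
import random
import statistics


def bucketize(values, n_buckets, metric=statistics.fmean, seed=None):
    """Shuffle `values`, split them into `n_buckets` groups, return one metric per bucket."""
    if not 1 <= n_buckets <= len(values):
        raise ValueError("n_buckets must be between 1 and len(values)")
    shuffled = list(values)
    # Step 1: shuffle the elements.
    random.Random(seed).shuffle(shuffled)
    # Step 2: striding puts every point into exactly one bucket (non-overlapping).
    buckets = [shuffled[i::n_buckets] for i in range(n_buckets)]
    # Step 3: compute the metric of interest on every bucket.
    return [metric(bucket) for bucket in buckets]


# 1000 points are reduced to 10 bucket means.
bucket_means = bucketize(range(1000), n_buckets=10, seed=0)
```

With equally sized buckets, the mean of the bucket means equals the mean of the original data, so the point estimate is preserved while the number of points drops.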

abacus.auto_ab.ABTest.__check_required_columns(self, df: DataFrame, method: str) None

Check presence of columns in dataframe.

Parameters:
  • df (pandas.DataFrame) – DataFrame to check.

  • method (str) – Stage of A/B process which you’d like to test.

Raises:
  • ValueError – If is_valid_col is False. The experiment cannot be run if required columns are absent.

abacus.auto_ab.ABTest.__get_group(self, group_label: str, df: DataFrame | None = None) ndarray

Gets target metric column based on desired group label.

Parameters:
  • group_label (str) – Group label, e.g. ‘A’, ‘B’.

  • df (DataFrameType, optional) – DataFrame to query from.

Returns:

Target column for a desired group.

Return type:

numpy.ndarray

abacus.auto_ab.ABTest.__delta_params(self, x: DataFrame) Tuple[float, float]

Calculates expectation and variance for ratio metric using delta approximation.

Source: https://arxiv.org/pdf/1803.06336.pdf.

Parameters:

x (pandas.DataFrame) – Pandas DataFrame of particular group (A, B, etc).

Returns:

Mean and variance of ratio metric.

Return type:

Tuple[float, float]
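As a rough illustration of the delta approximation for a ratio of means, the sketch below computes the ratio and its first-order variance from per-user numerators and denominators. It is an assumption-laden example: the function name `delta_ratio_params` is hypothetical, it works on plain lists rather than a DataFrame, and it omits any bias correction the library may apply.

```python
import statistics


def delta_ratio_params(numerators, denominators):
    """Return (ratio of means, first-order delta-method variance of that ratio)."""
    n = len(numerators)
    mean_x = statistics.fmean(numerators)
    mean_y = statistics.fmean(denominators)
    var_x = statistics.variance(numerators)      # sample variance of the numerator
    var_y = statistics.variance(denominators)    # sample variance of the denominator
    cov_xy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(numerators, denominators)
    ) / (n - 1)
    ratio = mean_x / mean_y
    # First-order Taylor (delta) expansion of mean_x / mean_y around the true means.
    variance = (
        var_x / mean_y**2
        - 2 * mean_x * cov_xy / mean_y**3
        + mean_x**2 * var_y / mean_y**4
    ) / n
    return ratio, variance


# Hypothetical per-user data: clicks (numerator) and views (denominator).
ratio, variance = delta_ratio_params([3, 1, 4, 2, 5], [10, 8, 12, 9, 11])
```

The covariance term matters: when numerator and denominator are strongly correlated, ignoring it overstates the variance of the ratio.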

abacus.auto_ab.ABTest.__manual_ttest(self, ctrl_mean: float, ctrl_var: float, ctrl_size: int, treat_mean: float, treat_var: float, treat_size: int) Dict[str, int | float]

Performs Welch’s t-test based on aggregation of metrics instead of datasets.

To compute the t-statistic empirically we need the expectation, variance and sample size of each group.

Parameters:
  • ctrl_mean (float) – Mean of control group.

  • ctrl_var (float) – Variance of control group.

  • ctrl_size (int) – Size of control group.

  • treat_mean (float) – Mean of treatment group.

  • treat_var (float) – Variance of treatment group.

  • treat_size (int) – Size of treatment group.

Returns:

Dictionary with the following properties: test statistic, p-value, test result. Test result: 1 - significant difference, 0 - insignificant difference.

Return type:

stat_test_typing
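A minimal sketch of Welch's t-test computed from aggregates only is shown below. The function name `welch_from_aggregates` and the sign convention (treatment minus control) are assumptions for illustration; the sketch returns the statistic and the Welch-Satterthwaite degrees of freedom and leaves the p-value to a t-distribution routine.

```python
import math


def welch_from_aggregates(ctrl_mean, ctrl_var, ctrl_size,
                          treat_mean, treat_var, treat_size):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    ctrl_term = ctrl_var / ctrl_size       # squared standard error of the control mean
    treat_term = treat_var / treat_size    # squared standard error of the treatment mean
    stat = (treat_mean - ctrl_mean) / math.sqrt(ctrl_term + treat_term)
    dof = (ctrl_term + treat_term) ** 2 / (
        ctrl_term**2 / (ctrl_size - 1) + treat_term**2 / (treat_size - 1)
    )
    return stat, dof


stat, dof = welch_from_aggregates(10.0, 4.0, 100, 11.0, 9.0, 100)
```

A two-sided p-value could then be obtained from the survival function of Student's t-distribution with `dof` degrees of freedom, for example with `scipy.stats.t.sf(abs(stat), dof) * 2` if SciPy is available.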

abacus.auto_ab.ABTest.__taylor_params(self, x: DataFrame) Tuple[float, float]

Calculates expectation and variance for ratio metric using Taylor expansion approximation.

Source: https://www.stat.cmu.edu/~hseltman/files/ratio.pdf.

Parameters:

x (pandas.DataFrame) – Pandas DataFrame of particular group (A, B, etc).

Returns:

Mean and variance of ratio metric.

Return type:

Tuple[float, float]

abacus.auto_ab.ABTest.bucketing(self) ABTest

Performs bucketing in order to accelerate results computation.

Returns:

New instance of ABTest class with modified control and treatment.

Return type:

ABTest

abacus.auto_ab.ABTest.cupac(self) ABTest

Performs CUPAC for variance reduction.

Returns:

New instance of ABTest class with modified control and treatment.

Return type:

ABTest

abacus.auto_ab.ABTest.cuped(self) ABTest

Performs CUPED for variance reduction.

Returns:

New instance of ABTest class with modified control and treatment.

Return type:

ABTest

abacus.auto_ab.ABTest.linearization(self) ABTest

Creates linearized continuous metric based on ratio-metric. Important: the data is assumed to be already grouped by user, so that each user's numerator is the sum of that user's numerators over all time periods, and each user's denominator is the sum of that user's denominators over all time periods.

Source: https://research.yandex.com/publications/148.
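The linearization idea can be sketched as follows, assuming the common formulation in which each user's linearized value is numerator minus kappa times denominator, with kappa taken as the ratio metric of the control group. The names `linearize` and `kappa` and the sample data are hypothetical; the library's own computation may differ in detail.

```python
def linearize(numerators, denominators, kappa):
    """Per-user linearized metric: numerator - kappa * denominator."""
    return [num - kappa * den for num, den in zip(numerators, denominators)]


# Hypothetical per-user totals (already summed over time periods).
ctrl_num, ctrl_den = [2.0, 4.0, 3.0], [1.0, 2.0, 3.0]
treat_num, treat_den = [3.0, 5.0, 4.0], [1.0, 2.0, 3.0]

# kappa is the ratio metric measured on the control group.
kappa = sum(ctrl_num) / sum(ctrl_den)

ctrl_lin = linearize(ctrl_num, ctrl_den, kappa)
treat_lin = linearize(treat_num, treat_den, kappa)
```

By construction the linearized control values sum to zero, and the resulting per-user values can be compared between groups with tests designed for continuous metrics.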

abacus.auto_ab.ABTest.plot(self, kind: str = 'experiment', save_path: str | None = None) None

Plot experiment.

Parameters:
  • kind (str) – Kind of plot: ‘experiment’, ‘bootstrap’.

  • save_path (str, optional) – Path where to save image.

Raises:

ValueError – If kind is not in [‘experiment’, ‘bootstrap’].

abacus.auto_ab.ABTest.resplit_df(self) ABTest

Resplit dataframe.

Returns:

Instance of ABTest class with modified control and treatment.

Return type:

ABTest

abacus.auto_ab.ABTest.test_boot_confint(self) Dict[str, int | float]

Computes a bootstrap confidence interval and checks statistical significance against zero.

Returns:

Dictionary with the following properties: test statistic, p-value, test result. Test result: 1 - significant difference, 0 - insignificant difference.

Return type:

stat_test_typing
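One way to read this method is as a percentile bootstrap: resample both groups with replacement, record the difference in the metric, and call the result significant when the confidence interval excludes zero. The sketch below follows that reading; it is an assumption about the approach rather than the library's code, and the function name `bootstrap_confint` and its arguments are hypothetical (the defaults mirror `n_boot_samples=200` and `alpha=0.05` from HypothesisParams).

```python
import random
import statistics


def bootstrap_confint(control, treatment, n_boot_samples=200, alpha=0.05,
                      metric=statistics.fmean, seed=None):
    """Percentile bootstrap CI for metric(treatment) - metric(control)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot_samples):
        # Resample each group with replacement, keeping its original size.
        ctrl_sample = rng.choices(control, k=len(control))
        treat_sample = rng.choices(treatment, k=len(treatment))
        diffs.append(metric(treat_sample) - metric(ctrl_sample))
    diffs.sort()
    lower_idx = max(0, round(n_boot_samples * alpha / 2))
    upper_idx = min(n_boot_samples - 1, round(n_boot_samples * (1 - alpha / 2)) - 1)
    lower, upper = diffs[lower_idx], diffs[upper_idx]
    # Significant (1) when zero lies outside the interval, otherwise 0.
    significant = int(lower > 0 or upper < 0)
    return lower, upper, significant


lower, upper, significant = bootstrap_confint(
    list(range(50)), list(range(100, 150)), seed=0
)
```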

abacus.auto_ab.ABTest.test_boot_fp(self) Dict[str, int | float]

Performs bootstrap hypothesis testing by calculation of false positives.

Returns:

Dictionary with the following properties: test statistic, p-value, test result. Test result: 1 - significant difference, 0 - insignificant difference.

Return type:

stat_test_typing

abacus.auto_ab.ABTest.test_boot_ratio(self) Dict[str, int | float]

Performs bootstrap for ratio-metric.

Returns:

Dictionary with the following properties: test statistic, p-value, test result. Test result: 1 - significant difference, 0 - insignificant difference.

Return type:

stat_test_typing

abacus.auto_ab.ABTest.test_boot_welch(self) Dict[str, int | float]

Performs Welch’s t-test for independent samples with unequal number of observations and variance.

Welch’s t-test is a more general approach than Student’s t-test: it does not require equal sample sizes or equal variances.

Statistic of the test:

\[t = \frac{\hat{X}_1 - \hat{X}_2}{\sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}},\]
where \(s_i^2\) is the sample variance and \(N_i\) the sample size of group \(i\).

Returns:

Dictionary with the following properties: test statistic, p-value, test result. Test result: 1 - significant difference, 0 - insignificant difference.

Return type:

stat_test_typing

abacus.auto_ab.ABTest.test_buckets(self) Dict[str, int | float]

Performs buckets hypothesis testing.

Returns:

Dictionary with the following properties: test statistic, p-value, test result. Test result: 1 - significant difference, 0 - insignificant difference.

Return type:

stat_test_typing

abacus.auto_ab.ABTest.test_chisquare(self) Dict[str, int | float]

Performs Chi-Square test.

Returns:

Dictionary with the following properties: test statistic, p-value, test result. Test result: 1 - significant difference, 0 - insignificant difference.

Return type:

stat_test_typing

abacus.auto_ab.ABTest.test_delta_ratio(self) Dict[str, int | float]

Delta method with bias correction for ratios.

Source: https://arxiv.org/pdf/1803.06336.pdf.

Returns:

Dictionary with the following properties: test statistic, p-value, test result. Test result: 1 - significant difference, 0 - insignificant difference.

Return type:

stat_test_typing

abacus.auto_ab.ABTest.test_mannwhitney(self) Dict[str, int | float]

Performs Mann-Whitney U test.

The test works on continuous metrics and their ranks.

Assumptions of Mann-Whitney test:

  1. Independence of observations.

  2. Same shape of metric distributions.

Statistic of the test:

\[U = \sum_{i=1}^{n} \sum_{j=1}^{m} S(X_i, Y_j).\]
Returns:

Dictionary with the following properties: test statistic, p-value, test result. Test result: 1 - significant difference, 0 - insignificant difference.

Return type:

stat_test_typing
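The U statistic above can be computed directly from its double sum. The text does not define S, so this sketch assumes the usual convention: S is 1 when X_i > Y_j, 0.5 for a tie, and 0 otherwise. The function name `mann_whitney_u` is hypothetical, and the quadratic loop is meant for illustration rather than large samples.

```python
def mann_whitney_u(x, y):
    """U statistic: sum of S(x_i, y_j) over all pairs (assumed convention for S)."""
    def score(a, b):
        if a > b:
            return 1.0
        if a == b:
            return 0.5   # ties contribute one half
        return 0.0
    return sum(score(a, b) for a in x for b in y)


u_xy = mann_whitney_u([1, 2, 3], [2, 4])
u_yx = mann_whitney_u([2, 4], [1, 2, 3])
```

Under this convention the two statistics always add up to the number of pairs, here 3 * 2 = 6, which is a handy sanity check.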

abacus.auto_ab.ABTest.test_taylor_ratio(self) Dict[str, int | float]

Calculate expectation and variance of ratio for each group and then use t-test for hypothesis testing.

Source: http://www.stat.cmu.edu/~hseltman/files/ratio.pdf.

Returns:

Dictionary with the following properties: test statistic, p-value, test result. Test result: 1 - significant difference, 0 - insignificant difference.

Return type:

stat_test_typing

abacus.auto_ab.ABTest.test_welch(self) Dict[str, int | float]

Performs Welch’s t-test.

Returns:

Dictionary with the following properties: test statistic, p-value, test result. Test result: 1 - significant difference, 0 - insignificant difference.

Return type:

stat_test_typing

abacus.auto_ab.ABTest.test_z_proportions(self) Dict[str, int | float]

Performs z-test for proportions.

The two-proportions z-test is used to compare two observed proportions.

Statistic of the test:

\[Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})(\frac{1}{n_1} + \frac{1}{n_2})}}.\]
Returns:

Dictionary with the following properties: test statistic, p-value, test result. Test result: 1 - significant difference, 0 - insignificant difference.

Return type:

stat_test_typing
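The Z statistic above can be evaluated with the standard library alone. In this sketch the unsubscripted p-hat is taken to be the pooled proportion over both groups, which is the usual reading of the formula but is an assumption here; the function name `z_test_proportions` and the sample counts are hypothetical, and the p-value is two-sided.

```python
import math
from statistics import NormalDist


def z_test_proportions(successes_1, n_1, successes_2, n_2):
    """Return (Z statistic, two-sided p-value) for two observed proportions."""
    p_1 = successes_1 / n_1
    p_2 = successes_2 / n_2
    # Pooled proportion across both groups (assumed meaning of p-hat).
    pooled = (successes_1 + successes_2) / (n_1 + n_2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# 50 conversions out of 500 versus 65 out of 500.
z, p_value = z_test_proportions(50, 500, 65, 500)
```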


VarianceReduction

class abacus.auto_ab.VarianceReduction[source]

Implementation of sensitivity increasing approaches.

As it is easier to apply variance reduction techniques directly to an experiment, all approaches should be called on an ABTest class instance.

Example:

import pandas as pd

from abacus.auto_ab.abtest import ABTest
from abacus.auto_ab.params import ABTestParams, DataParams, HypothesisParams

data_params = DataParams(...)
hypothesis_params = HypothesisParams(...)
ab_params = ABTestParams(data_params, hypothesis_params)

df = pd.read_csv('data.csv')
ab_test = ABTest(df, ab_params)
ab_test = ab_test.cuped()

Methods

cupac(x, target_prev_col, target_now_col, ...)

Perform CUPED on target variable with covariate calculated as a prediction from a linear regression model.

cuped(df, target_col, groups_col, covariate_col)

Perform CUPED on target variable with predefined covariate.

abacus.auto_ab.VarianceReduction._target_encoding(x: DataFrame, encoding_columns: List[str] | ndarray[Any, dtype[ScalarType]] | Series, target_column: str) DataFrame

Encodes target column.

abacus.auto_ab.VarianceReduction._predict_target(x: DataFrame, target_prev_col: str, factors_prev_cols: List[str] | ndarray[Any, dtype[ScalarType]] | Series, factors_now_cols: List[str] | ndarray[Any, dtype[ScalarType]] | Series) List[float] | ndarray[Any, dtype[ScalarType]] | Series

Covariate prediction with linear regression model.

Parameters:
  • x (pandas.DataFrame) – Pandas DataFrame.

  • target_prev_col (str) – Target on previous period column name.

  • factors_prev_cols (List[str]) – Factor columns for modelling.

  • factors_now_cols (List[str]) – Factor columns for prediction on current period.

Returns:

Pandas Series with predicted values

Return type:

pandas.Series

abacus.auto_ab.VarianceReduction.cuped(df: DataFrame, target_col: str, groups_col: str, covariate_col: str) DataFrame

Perform CUPED on target variable with predefined covariate.

Covariate has to be chosen with regard to the following restrictions:

  1. Covariate is independent of an experiment.

  2. Covariate is highly correlated with target variable.

  3. Covariate is continuous variable.

Original paper: https://exp-platform.com/Documents/2013-02-CUPED-ImprovingSensitivityOfControlledExperiments.pdf.

Parameters:
  • df (pandas.DataFrame) – Pandas DataFrame for analysis.

  • target_col (str) – Target column name.

  • groups_col (str) – Groups A and B column name.

  • covariate_col (str) – Covariate column name. If None, then the most correlated column is considered as the covariate.

Returns:

Pandas DataFrame with additional target CUPEDed column

Return type:

pandas.DataFrame
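The core CUPED adjustment can be sketched on plain lists: subtract from the target the part explained by the covariate, using theta = cov(covariate, target) / var(covariate). This is a simplified illustration, not the library's implementation: the function name `cuped_adjust` is hypothetical, and theta is estimated on all observations passed in, without the group handling the real method performs through `groups_col`.

```python
import statistics


def cuped_adjust(target, covariate):
    """Return the CUPED-adjusted target: y - theta * (x - mean(x))."""
    mean_cov = statistics.fmean(covariate)
    mean_target = statistics.fmean(target)
    cov_xy = sum((c - mean_cov) * (t - mean_target) for c, t in zip(covariate, target))
    var_x = sum((c - mean_cov) ** 2 for c in covariate)
    theta = cov_xy / var_x   # the common 1/(n-1) factors cancel in the ratio
    return [t - theta * (c - mean_cov) for t, c in zip(target, covariate)]


target = [2.5, 3.5, 6.5, 7.5]
covariate = [1.0, 2.0, 3.0, 4.0]   # e.g. the same metric in a pre-experiment period
adjusted = cuped_adjust(target, covariate)
```

Centering the covariate keeps the mean of the target unchanged, while the variance shrinks in proportion to the squared correlation between target and covariate.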

abacus.auto_ab.VarianceReduction.cupac(x: DataFrame, target_prev_col: str, target_now_col: str, factors_prev_cols: List[str] | ndarray[Any, dtype[ScalarType]] | Series, factors_now_cols: List[str] | ndarray[Any, dtype[ScalarType]] | Series, groups_col: str) DataFrame

Perform CUPED on target variable with covariate calculated as a prediction from a linear regression model.

Original paper: https://doordash.engineering/2020/06/08/improving-experimental-power-through-control-using-predictions-as-covariate-cupac/.

Parameters:
  • x (pandas.DataFrame) – Pandas DataFrame for analysis.

  • target_prev_col (str) – Target on previous period column name.

  • target_now_col (str) – Target on current period column name.

  • factors_prev_cols (List[str]) – Factor columns for modelling.

  • factors_now_cols (List[str]) – Factor columns for prediction on current period.

  • groups_col (str) – Groups column name.

Returns:

Pandas DataFrame with additional columns: target_pred and target_now_cuped

Return type:

pandas.DataFrame
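A heavily simplified sketch of the CUPAC flow is given below: fit a linear model on the previous period, predict the target for the current period, then use that prediction as the CUPED covariate. The sketch assumes a single factor and plain lists, whereas the method above accepts several factor columns in a DataFrame; the names `fit_line` and `cupac_adjust` are hypothetical.

```python
import statistics


def fit_line(x, y):
    """Ordinary least squares for one feature: return (slope, intercept)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    return slope, mean_y - slope * mean_x


def cupac_adjust(factor_prev, target_prev, factor_now, target_now):
    """Return (target_pred, target_now_cuped) for a single-factor model."""
    # 1. Fit the model on the previous period.
    slope, intercept = fit_line(factor_prev, target_prev)
    # 2. Predict the target for the current period; this is the covariate.
    target_pred = [intercept + slope * f for f in factor_now]
    # 3. CUPED step: the OLS slope of target_now on target_pred is theta = cov / var.
    theta, _ = fit_line(target_pred, target_now)
    mean_pred = statistics.fmean(target_pred)
    target_now_cuped = [
        t - theta * (p - mean_pred) for t, p in zip(target_now, target_pred)
    ]
    return target_pred, target_now_cuped


target_now = [5.5, 6.5, 9.5, 10.5]
target_pred, target_now_cuped = cupac_adjust(
    [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0], [2.0, 3.0, 4.0, 5.0], target_now
)
```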


Graphics

class abacus.auto_ab.Graphics[source]

Illustration of an experiment.

  • As it is easier to apply plotting directly to an experiment, all methods should be called on an ABTest class instance.

  • Experiment’s plot is based on metric type.

Example:

import pandas as pd

from abacus.auto_ab.abtest import ABTest
from abacus.auto_ab.params import ABTestParams, DataParams, HypothesisParams

data_params = DataParams(...)
hypothesis_params = HypothesisParams(...)
ab_params = ABTestParams(data_params, hypothesis_params)

df = pd.read_csv('data.csv')
ab_test = ABTest(df, ab_params)
ab_test.plot()

Methods

plot_binary_experiment(params, save_path)

Plot experiment with binary outcome.

plot_bootstrap_confint(params, save_path)

Plot bootstrapped metric of an experiment with its confidence interval and zero value.

plot_continuous_experiment(params, save_path)

Plot distributions of continuous metric and actual experiment metric.

abacus.auto_ab.Graphics.plot_continuous_experiment(params: ABTestParams, save_path: str | None) None

Plot distributions of continuous metric and actual experiment metric.

Parameters:
  • params (ABTestParams) – Parameters of the experiment.

  • save_path (str, optional) – Path where to save image.

abacus.auto_ab.Graphics.plot_binary_experiment(params: ABTestParams, save_path: str | None) None

Plot experiment with binary outcome.

Parameters:
  • params (ABTestParams) – Parameters of the experiment.

  • save_path (str, optional) – Path where to save image.

abacus.auto_ab.Graphics.plot_bootstrap_confint(params: ABTestParams, save_path: str | None) None

Plot bootstrapped metric of an experiment with its confidence interval and zero value.

Parameters:
  • x (np.ndarray) – Bootstrap metric.

  • params (ABTestParams) – Parameters of the experiment.


Params

class abacus.auto_ab.DataParams(id_col: str = 'id', group_col: str = 'groups', control_name: str = 'A', treatment_name: str = 'B', is_grouped: bool | None = True, strata_col: str | None = '', target: str | None = '', numerator: str | None = '', denominator: str | None = '', covariate: str | None = '', target_prev: str | None = '', predictors_now: List[str] | ndarray[Any, dtype[ScalarType]] | Series | None = [], predictors_prev: List[str] | ndarray[Any, dtype[ScalarType]] | Series | None = [], control: List[float] | ndarray[Any, dtype[ScalarType]] | Series | None = [], treatment: List[float] | ndarray[Any, dtype[ScalarType]] | Series | None = [], transforms: List[str] | ndarray[Any, dtype[ScalarType]] | Series | None = [])[source]

Data description as column names of dataset generated during experiment.

Parameters:
  • id_col (str) – ID of observations.

  • group_col (str) – Group of experiment.

  • control_name (str) – Name of control group in group_col.

  • treatment_name (str) – Name of treatment group in group_col.

  • is_grouped (bool, Optional) – Flag that shows whether observations are grouped.

  • strata_col (str, Optional) – Name of stratification column. Stratification column must be categorical.

  • target (str, Optional) – Target column name of continuous or binary metric.

  • numerator (str, Optional) – Numerator for ratio metric.

  • denominator (str, Optional) – Denominator for ratio metric.

  • covariate (str, Optional) – Covariate column for CUPED.

  • target_prev (str, Optional) – Target column name for previous period of continuous metric.

  • predictors_now (List[str], Optional) – List of columns to predict covariate.

  • predictors_prev (List[str], Optional) – List of columns to create linear model for covariate prediction.

  • control (ArrayNumType, Optional) – Control group data used for quick access and excluding querying dataset.

  • treatment (ArrayNumType, Optional) – Treatment group data used for quick access and excluding querying dataset.

  • transforms (ArrayStrType, Optional) – List of transformations applied to experiment.

class abacus.auto_ab.HypothesisParams(alpha: float | None = 0.05, beta: float | None = 0.2, alternative: str | None = 'two-sided', metric_type: str | None = 'continuous', metric_name: str | None = 'mean', metric: Callable[[List[float] | ndarray[Any, dtype[ScalarType]] | Series], float] | None = <function mean>, metric_transform: Callable[[ndarray], ndarray] | None = <function mean>, metric_transform_info: Dict[str, Dict[str, Any]] | None = {}, filter_method: str | None = 'top_5', n_boot_samples: int | None = 200, n_buckets: int | None = 100, strata: str | None = '', strata_weights: Dict[str, float] | None = {})[source]

Description of hypothesis parameters.

Parameters:
  • alpha (float) – type I error.

  • beta (float) – type II error.

  • alternative (str) – directionality of hypothesis: less, greater, two-sided.

  • metric_type (str) – metric type: continuous, binary, ratio.

  • metric_name (str) – metric name: mean, median. If custom metric, then use here appropriate name.

  • metric (Callable[[Iterable[float]], np.ndarray], Optional) – if metric_name is custom, then you must define metric function.

  • metric_transform (Callable[[np.ndarray], np.ndarray], Optional) – applied transformations to experiment.

  • metric_transform_info (Dict[str, Dict[str, Any]], Optional) – information of applied transformations.

  • filter_method (str, Optional) – method for filtering outliers: top_5, isolation_forest.

  • n_boot_samples (int, Optional) – number of bootstrap iterations.

  • n_buckets (int, Optional) – number of buckets.

  • strata (str, Optional) – stratification column.

  • strata_weights (Dict[str, float], Optional) – historical strata weights.

Methods

metric([axis, dtype, out, keepdims, where])

Compute the arithmetic mean along the specified axis.

metric_transform([axis, dtype, out, ...])

Compute the arithmetic mean along the specified axis.

alpha_validator

alternative_validator

beta_validator

metric_type_validator

class abacus.auto_ab.ABTestParams(data_params: DataParams = DataParams(id_col='id', group_col='groups', control_name='A', treatment_name='B', is_grouped=True, strata_col='', target='', numerator='', denominator='', covariate='', target_prev='', predictors_now=[], predictors_prev=[], control=[], treatment=[], transforms=[]), hypothesis_params: HypothesisParams = HypothesisParams(alpha=0.05, beta=0.2, alternative='two-sided', metric_type='continuous', metric_name='mean', metric=<function mean>, metric_transform=<function mean>, metric_transform_info={}, filter_method='top_5', n_boot_samples=200, n_buckets=100, strata='', strata_weights={}))[source]