Experiment Initialization

Before actual analysis, you have to define your experiment. Here is how you can do it:

from abacus.auto_ab.abtest import ABTest
from abacus.auto_ab.params import ABTestParams, DataParams, HypothesisParams

df = pd.read_csv('./data/ab_data.csv')

data_params = DataParams(
    id_col='user_id',
    group_col='groups',
    control_name='control',
    treatment_name='treatment',
    target='check_rub_campaign',
)

hypothesis_params = HypothesisParams(
    alpha=0.01,
    beta=0.2,
    alternative='greater',
    metric_type='continuous',
    metric_name='95th quantile',
    metric=lambda x: np.quantile(x, 0.95)
)

ab_params = ABTestParams(data_params, hypothesis_params)
ab_test = ABTest(df, ab_params)

As you can see, you just need to describe data and your hypothesis.

For data, you have to define columns and their purposes. Required attributes are:

  • id_col is observation id. It can be user_id or any other id for your rows. Note that if your observations are somehow dependent (e.g. several checks per user), they must have the same id_col.

  • group_col contains group names. If your data have two groups, then there mush be only two unique values in this column.

  • control_name and treatment_name are group names e.g. ‘control’, ‘treatment’, ‘A’, ‘B’, ‘control group’, ‘send sms’, ‘do not send sms’, etc.

  • target is obviously target column containing metric of interest.

Hypothesis is described with:

  • alpha — type I error.

  • beta — type II error.

  • alternative — alternative of hypothesis (two-sided, less, or greater.

  • metric_type — metric type. There are three of them: continuous, binary, and ratio.

  • metric_name — metric name, either default (‘mean’ or ‘median’) or customer (e.g. ‘95th percentile’).

  • metric — function for metric calculation if metric_name is not default.