A/B-Testing of Predictive Models – Part 1

Why Should You Test At All?

When companies work with the CrossEngage Customer Prediction Platform (CPP), they naturally ask themselves what effect the predictive models created in the platform have on their marketing KPIs and on individual campaigns. This question can be answered precisely with the help of various test scenarios. Only proper test designs make it possible to check the extent to which an optimization measure achieves the desired positive effect.

One way to evaluate optimization efforts (and probably the most commonly used) is A/B testing. In this blog post, we explain why it makes sense to test and what it takes to set up a clean A/B test for your marketing campaigns.

Testing helps you make the right decisions and gain certainty in decision-making. For example, whenever a new marketing measure is introduced or an existing measure is changed (different customer selection types, a different advertising medium, etc.), this adjustment should be validated by proper testing. This is the only way to make statistically sound, measurable statements about the changes you have made.

How Does an A/B Test Work?

An A/B test is a test method that makes it possible to determine the effectiveness of a change. The basic procedure is always the same for the different types of A/B tests:

The total customer potential (for example, all customers eligible for a marketing campaign) is randomly divided into two groups. These groups differ in exactly one key respect, for example a process to be tested (test group) vs. the previous process (control group). In the case of the CPP, one would compare the customer selection of a campaign made via the previous procedure with the selection made using predictive models from the CPP.

Gold Standard

There are two basic principles in implementing A/B tests, control and randomization (the “gold standard”):

Control means that a test group is compared with a control group. Only in this way is it possible to determine whether the results of a new and a previous measure differ.
Random allocation (randomization) ensures that the groups are statistically comparable and that a fair comparison is possible. If the grouping were not randomized (for example, by gender, even vs. odd customer numbers, north vs. south), the results would not be representative because the test groups would inevitably differ, at least in this characteristic. Any effects found could then not be attributed exclusively to the difference in measures.
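To make the randomization principle concrete, here is a minimal Python sketch of a random split. The function name `random_split`, the `test_share` parameter, and the fixed seed are our own illustrative assumptions; this is not how the CPP performs the split internally.

```python
import random

def random_split(customer_ids, test_share=0.5, seed=42):
    """Randomly assign customers to a test group and a control group.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)          # fixed seed makes the split reproducible
    shuffled = list(customer_ids)
    rng.shuffle(shuffled)              # random order removes any systematic grouping
    cut = round(len(shuffled) * test_share)
    return shuffled[:cut], shuffled[cut:]

# Example: 1,000 eligible customers, split 70/30
test_group, control_group = random_split(range(1, 1001), test_share=0.7)
print(len(test_group), len(control_group))  # 700 300
```

Because the order is shuffled before cutting, neither group is defined by a customer attribute such as region or customer number.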

Other Important Notes

A 50/50 split of the total potential is not necessarily required; other split keys such as 70/30 or 80/20 can be chosen as well. The only condition is that the smaller group is sufficiently large to produce statistically significant results (this is explained in more detail below).
The comparison of the two groups can be done using different KPIs such as revenue per contact, conversion rate, shopping cart sizes, or uplift.
Results from a single test reflect only the short-term effect of a change. For decisions that represent a significant change to the marketing program, we recommend a longer test period to assess the medium-term impact.
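As an illustration of what “statistically significant” can mean for a KPI such as the conversion rate, the following sketch runs a two-sided two-proportion z-test using only the Python standard library. The test choice, the function name, and the campaign figures are assumptions made for this example; other KPIs (for example revenue per contact) call for different tests.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical 70/30 split: 7,000 test contacts (420 conversions)
# vs. 3,000 control contacts (150 conversions)
z, p_value = two_proportion_z_test(420, 7000, 150, 3000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

The standard error grows as either group shrinks, which is why the smaller group in an uneven split limits how small a difference the test can detect.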

The Decision For a Concrete Test Design

To describe the different test designs in this blog post, we will assume the following scenario, which we often deal with at CrossEngage:

CrossEngage users have so far selected their customers for their marketing campaigns according to RFM criteria (Recency, Frequency, and Monetary Value), which they have developed themselves or in cooperation with a service provider. Alternatively, CrossEngage users already use predictive models, either developed by themselves or created manually with the help of a service provider. Now we want to compare the difference between customer selections using predictive models created with the CPP and the previous procedure (benchmark).

The first step is to select the appropriate test design. This depends on both the use case and the KPIs that you would like to test. The following variations basically correspond to the design presented in Figure 1. Depending on the use case, however, a different number of levels of control may be required.

Simple A/B Test:
Comparison of Two Customer Selections For the Same Customer Group

If customer selection processes are based on the same customer group (for example, active or reactivation customers), the following test design can be used and adapted to possible individual needs:

The customer base is randomly divided into two customer potentials according to the distribution key. For one part, the standard selection procedure is used. For the other part, customer selection is made using predictive models from the CrossEngage CPP. The top customers of both selections are then included in the total circulation according to the distribution key (as previously defined, for example 50/50 or 70/30).
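The procedure can be sketched in a few lines of Python. Everything named here (`simple_ab_selection`, the two scoring functions, the circulation size) is a hypothetical stand-in: in practice the standard score would come from your existing procedure (for example RFM) and the model score from the CPP.

```python
import random

def simple_ab_selection(customers, standard_score, model_score,
                        circulation, model_share=0.5, seed=7):
    """Split the base randomly, then take each procedure's top customers."""
    rng = random.Random(seed)
    shuffled = list(customers)
    rng.shuffle(shuffled)

    # Step 1: random split of the customer base by the distribution key
    cut = round(len(shuffled) * model_share)
    model_pool, standard_pool = shuffled[:cut], shuffled[cut:]

    # Step 2: places in the total circulation follow the same key
    n_model = round(circulation * model_share)
    n_standard = circulation - n_model

    # Step 3: each procedure ranks only its own pool
    model_top = sorted(model_pool, key=model_score, reverse=True)[:n_model]
    standard_top = sorted(standard_pool, key=standard_score,
                          reverse=True)[:n_standard]
    return model_top, standard_top

# Toy scores standing in for the model output and the existing procedure
model_top, standard_top = simple_ab_selection(
    customers=range(1000),
    standard_score=lambda c: c % 100,
    model_score=lambda c: (c * 37) % 101,
    circulation=200,
    model_share=0.7,
)
print(len(model_top), len(standard_top))  # 140 60
```

Since each procedure only sees its own randomly drawn pool, no customer can appear in both selections.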

Advantages of simple A/B test:

Very simple test procedure to evaluate an optimization measure.

Disadvantages of simple A/B test:

Not suitable for complicated use cases.

Overlap Test:
Comparison of Two Selections For the Same Customer Group With an Overlap

If customers selected by the standard procedure must be reliably included in the total circulation, an overlap test comes into play. The main difference from the simple A/B test is that the customer base is not divided into two separate random parts. Instead, the previous selection procedure (“standard procedure”) and the selection with CrossEngage are both performed on the total customer base.

The resulting overlap, i.e. the customers selected by both procedures, is guaranteed to be included in the total circulation. The remaining free places in the total circulation are then filled according to a distribution key with the top customers of each selection method. For the success evaluation, the disjoint sets (contacts that are not in the overlap) are compared.
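Here is a rough Python sketch of the overlap logic. It assumes that each procedure nominates as many top customers as the circulation has places; that detail, the function name, and the toy scores are our assumptions for illustration.

```python
def overlap_test_selection(customers, standard_score, model_score,
                           circulation, model_share=0.5):
    """Select a circulation consisting of the overlap plus filled-up places."""
    customers = list(customers)
    # Both procedures rank the TOTAL customer base
    ranked_standard = sorted(customers, key=standard_score, reverse=True)
    ranked_model = sorted(customers, key=model_score, reverse=True)

    # Customers chosen by both procedures always enter the circulation
    overlap = set(ranked_standard[:circulation]) & set(ranked_model[:circulation])

    # Remaining places are shared according to the distribution key
    free = circulation - len(overlap)
    n_model = round(free * model_share)
    n_standard = free - n_model

    model_only = [c for c in ranked_model[:circulation]
                  if c not in overlap][:n_model]
    standard_only = [c for c in ranked_standard[:circulation]
                     if c not in overlap][:n_standard]
    return overlap, model_only, standard_only

# Toy example: 100 customers, 20 places, scores chosen so the picks differ
overlap, model_only, standard_only = overlap_test_selection(
    customers=range(100),
    standard_score=lambda c: c,
    model_score=lambda c: c if c < 90 else c - 50,
    circulation=20,
)
print(len(overlap), model_only, standard_only)
# 10 [79, 78, 77, 76, 75] [99, 98, 97, 96, 95]
```

Only `model_only` and `standard_only` are compared in the evaluation; the size of `overlap` already indicates how similar the two selections are.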

Advantages of the overlap test:

As soon as the test is set up, it is apparent how much the two customer selections differ.
Lower risk, because contacts rated as “good customers” by the established procedure are included in the total circulation in any case.

Disadvantages of the overlap test:

The overlap must be excluded again for the evaluation. Depending on the size of the overlap, this reduces the number of contacts being compared and therefore affects the statistical significance of the comparison.

There’s More to Come!

In this blog post, you’ve learned what A/B testing is good for and the concept behind it. You’ve also learned about two simple test designs.

Want to learn more? In the second part of this blog article, you’ll find out what other test designs look like. You’ll also get concrete implementation tips for execution and evaluation along the way, summarized in a handy checklist. Happy reading!

Read the Second Part
