Bias and Sequencing killed A/B testing. Long live intelligent testing!

Gwendal Mahe
3 min read · Jul 13, 2020

Chances are, your A/B testing campaign will fail. Fail to drive conversions, fail to deliver significant and actionable results.

This doesn’t mean experimentation shouldn’t happen; quite the contrary. Let’s double down on testing, but let’s do it smartly: with actual goals in mind that go beyond validating our own assumptions, and with the understanding that it’s beyond human reach to analyze and optimize the data exhaustively.

I believe the main reason A/B testing hasn’t lived up to its promise is that we’ve oversimplified it for years. We run and compare 2 (or even X) copies and see which *performs* best. Then we decide what to do with it…

In my opinion, we can summarize the challenges in 2 categories: the sequencing of activities and the bias we inject at every step.

The sequential issue

Fundamentally, A/B testing is set up with 3 main phases:

Test -> Analyze -> Optimize

3 phases, each requiring more resources than anticipated. And many campaigns are needed to assess the performance of content, which makes testing at scale prohibitive.

Between these phases, human decision making… and that’s where bias gets injected.

When do we stop testing and start analyzing? The common answer is: “When statistical significance is reached”, which too often translates to “when I reach XXX visitors”. There are lots of assumptions packed in there. Did we look at external factors? Did our population change over the last X months? And should we assume the breakdown of sensitivities and intent of our visitors remained constant (let alone uniform) during this phase?
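To make that concrete, here is a minimal sketch (with made-up numbers) of the two-proportion z-test that sits behind most “statistical significance” claims in A/B tools. It answers exactly one narrow question: given the traffic each variant received, how surprising is the observed difference in conversion rate?

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """How surprising is B's conversion rate vs. A's, given their traffic?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)                 # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))   # standard error
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))             # two-sided test
    return p_a, p_b, p_value

# Made-up numbers: 10,000 visitors per variant.
p_a, p_b, p = two_proportion_z_test(conv_a=480, n_a=10_000, conv_b=540, n_b=10_000)
print(f"A: {p_a:.2%}  B: {p_b:.2%}  p-value: {p:.3f}")
```

Nothing in that calculation accounts for external factors, for a population that shifts mid-test, or for how often you peeked at the numbers before declaring a winner.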

What learning do we apply here? Did we actually learn something new, or did we just validate our own strongly biased assumptions?

The multi-touch challenge

A site, and content in general, is made of many elements working together, each appealing to various sensitivities, objectives, and intent of visitors. Designating one winner after testing 2 versions of 1 element is not going to work.

Maybe a specific H1 headline performs better in one test, but what if the other headline had been paired with a different image and a few CTA variations… do we still know it’s a “bad performer”?

Permutation of content is key
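To illustrate why (with a made-up page layout): even a handful of interchangeable elements multiplies into far more variants than sequential pairwise tests can cover, and testing one element at a time hides the interactions between them.

```python
from itertools import product

# Made-up example: one landing page, three interchangeable elements.
elements = {
    "headline":   ["Save time", "Save money", "Work smarter"],
    "hero_image": ["team_photo", "product_shot"],
    "cta":        ["Start free trial", "Book a demo", "Get started", "Learn more"],
}

variants = list(product(*elements.values()))
print(f"{len(variants)} distinct page permutations")   # 3 * 2 * 4 = 24
```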

The segmentation over-simplification

The control group vs. the rest of the world… We have a clear winner. Awesome!

Now, behind that higher conversion rate, is the lift uniform? Unlikely. Is one subgroup of visitors responding so well to your copy that it offsets another set of users responding very poorly to your messaging (with all the consequences that come with that)?

Again, pretty hard to make a decision here without deeper analysis.
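Here is a made-up example of how an aggregate “winner” can hide exactly that: variant B wins overall and for returning visitors, while quietly losing the first-time segment.

```python
# Made-up per-segment results: (visitors_A, conv_A, visitors_B, conv_B)
segments = {
    "returning":  (4_000, 240, 6_000, 390),
    "first_time": (6_000, 180, 4_000, 100),
}

rows = segments.values()
overall_a = sum(c for _, c, _, _ in rows) / sum(n for n, _, _, _ in rows)
overall_b = sum(c for _, _, _, c in rows) / sum(n for _, _, n, _ in rows)
print(f"overall     A: {overall_a:.2%}   B: {overall_b:.2%}")   # B "wins"

for name, (n_a, c_a, n_b, c_b) in segments.items():
    print(f"{name:<11} A: {c_a / n_a:.2%}   B: {c_b / n_b:.2%}")
```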

The decision to display a variant to a specific user should be made dynamically, based on historical data and propensity to convert.

Impressions of variants should be dynamic and change daily, just as your visitors do.
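The post doesn’t prescribe an algorithm, but one simple way to get this behaviour is a multi-armed bandit. Here is a minimal Thompson-sampling sketch (variant names are hypothetical): each variant’s conversion rate is held as a Beta posterior, and impressions drift toward whatever has historically converted best, re-evaluated on every visit.

```python
import random

class ThompsonSampler:
    """Minimal multi-armed bandit: allocation shifts as evidence accumulates."""

    def __init__(self, variants):
        self.stats = {v: {"impressions": 0, "conversions": 0} for v in variants}

    def choose(self):
        # Sample a plausible conversion rate per variant from its Beta posterior,
        # then serve the variant whose sampled rate is highest right now.
        draws = {
            v: random.betavariate(s["conversions"] + 1,
                                  s["impressions"] - s["conversions"] + 1)
            for v, s in self.stats.items()
        }
        return max(draws, key=draws.get)

    def record(self, variant, converted):
        self.stats[variant]["impressions"] += 1
        self.stats[variant]["conversions"] += int(converted)

# Per-visitor decision instead of a frozen 50/50 split (hypothetical variant names).
sampler = ThompsonSampler(["headline_a", "headline_b"])
shown = sampler.choose()
sampler.record(shown, converted=True)   # fed back from the real outcome
```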

The solution exists and is available

Stop thinking in terms of testing THEN optimizing.
We need to move to adaptive optimization: machine learning and smart algorithms, combined, can solve this by optimizing while testing, against smart, dynamic, intent-based segmentation.
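The post doesn’t name a specific model, but a crude sketch of “optimize while testing against intent-based segmentation” is to run one bandit per segment, reusing the ThompsonSampler sketched above; the segment labels here are hypothetical.

```python
from collections import defaultdict

class SegmentedOptimizer:
    """One ThompsonSampler per intent segment: different audiences can converge
    on different variants, and allocations keep adapting as data arrives."""

    def __init__(self, variants):
        self.per_segment = defaultdict(lambda: ThompsonSampler(variants))

    def choose(self, segment):
        return self.per_segment[segment].choose()

    def record(self, segment, variant, converted):
        self.per_segment[segment].record(variant, converted)

optimizer = SegmentedOptimizer(["headline_a", "headline_b"])
variant = optimizer.choose(segment="high_intent_returning")   # hypothetical segment
optimizer.record("high_intent_returning", variant, converted=False)
```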

What you get: reduced experiment costs, accelerated results, and actionable, granular learnings that are not distorted by bias.

Gwendal

Learn more here


Building Cauzal.ai. We predict the content that will convert your audiences