Bias and Sequencing killed A/B testing. Long live intelligent testing!
Chances are, your A/B testing campaign will fail: fail to drive conversions, fail to deliver significant, actionable results.
This doesn’t mean experimentation shouldn’t happen. Quite the contrary: let’s double down on testing, but let’s do it smartly, with goals that go beyond validating our own assumptions, and with the understanding that analyzing and optimizing the data exhaustively is beyond human reach.
I believe the main reason A/B testing hasn’t lived up to its promise is that we’ve oversimplified it for years. We run and compare 2 (or even X) variants, see which one *performs* best, and then decide what to do with it…
In my opinion, the challenges fall into two categories: the sequencing of activities and the bias we inject at every step.
The sequential issue
Fundamentally, A/B testing is set up with 3 main phases:
Test -> Analyze -> Optimize
Three phases, each requiring more resources than anticipated. And many campaigns are needed to assess the performance of content. That alone makes testing at scale prohibitive.
Between these phases: human decision-making… = bias injected.
When do we stop testing and start analyzing? The common answer is: “When statistical significance is reached”, which too often translates to “when I reach XXX visitors”. Lots of assumptions hide in there. Did we account for external factors? Did our population change over the last X months? And should we assume the breakdown of sensitivities and intents of our visitors remained constant (maybe even uniform) during this phase?
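For context, here is roughly what that fixed-sample stopping rule boils down to in practice: a two-proportion z-test run once a preset visitor count is reached. A minimal sketch, with illustrative numbers and the conventional 0.05 threshold, both assumptions rather than recommendations:

```python
from math import sqrt, erf

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """The classic fixed-sample test behind 'statistical significance is reached'."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Illustrative numbers only: evaluate once each arm hits its preset visitor count.
z, p = two_proportion_z_test(conv_a=230, n_a=4000, conv_b=275, n_b=4000)
print(f"z = {z:.2f}, p = {p:.3f}")  # declared "significant" if p < 0.05
```

Nothing in that calculation checks whether the visitor mix stayed stable while the data was collected; that assumption travels silently into the conclusion.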
What learning do we apply here? Did we actually learn something new, or did we just validate our own strongly biased assumptions?
The multi-touch challenge
A site, and content in general, is made of many elements working together, each appealing to different sensitivities, objectives, and intents of visitors. Designating one winner after testing two versions of one element is not going to work.
Maybe a specific H1 headline performs better in one test, but what if the other headline had been coupled with a different image and a few CTA variations… would we still know it’s a “bad performer”?
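To put a number on the combinations at play, here is a quick sketch; the element variants below are invented for illustration:

```python
from itertools import product

# Hypothetical variants for three page elements (illustrative only).
headlines = ["H1-A", "H1-B", "H1-C"]
images = ["hero-1", "hero-2"]
ctas = ["Buy now", "Start free trial", "Learn more", "Get a demo"]

full_experiences = list(product(headlines, images, ctas))
print(len(full_experiences))  # 3 * 2 * 4 = 24 distinct page experiences

# Testing one element at a time only ever evaluates 3 + 2 + 4 = 9 variants,
# and it never observes interactions, e.g. a headline that only wins when
# paired with a specific image.
```

With realistic numbers of elements and variants, the space of full experiences grows far beyond what sequential A/B campaigns can cover.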
The segmentation over-simplification
The control group vs the rest of the world…We have a clear winner. Awesome!
Now, about that chart showing a higher conversion rate: is the lift uniform across visitors? Unlikely. Is one subgroup of visitors responding so well to your copy that it offsets another group responding very poorly to your messaging (with all the consequences that entails)?
Again, pretty hard to make a decision here without deeper analysis.
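Here is a minimal sketch of that trap, with made-up numbers and an even 50/50 split within each segment: the aggregate “winner” hides a segment the variant actively hurts.

```python
# segment: (visitors per arm, control conversions, variant conversions)
segments = {
    "returning visitors": (4000, 320, 380),   # 8.0% -> 9.5%, variant wins
    "first-time visitors": (1000, 90, 65),    # 9.0% -> 6.5%, variant loses
}

n_total = sum(n for n, _, _ in segments.values())
control_total = sum(c for _, c, _ in segments.values())
variant_total = sum(v for _, _, v in segments.values())

print(f"Aggregate: control {control_total / n_total:.1%} "
      f"vs variant {variant_total / n_total:.1%}")  # 8.2% vs 8.9%
for name, (n, c, v) in segments.items():
    print(f"{name}: control {c / n:.1%} vs variant {v / n:.1%}")
```

Ship the variant to everyone on the strength of the topline chart, and (in this made-up example) first-time visitors quietly get a worse experience.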
The decision to display a variant to a specific user should be made dynamically, based on historical data and propensity to convert.
The solution exists and is available
Stop thinking in terms of testing THEN optimizing.
We need to move to adaptive optimization: machine learning and smart algorithms can solve this, optimizing while testing, against smart, dynamic, intent-based segmentation.
What you get: reduced experiment costs, accelerated results, and actionable, granular learnings that are not distorted by human bias.
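This post doesn’t prescribe a specific algorithm, so here is a minimal sketch of one common way to optimize while testing: per-segment Thompson sampling with Beta-Bernoulli arms. The segments, variants, and conversion rates below are invented purely to simulate the loop; a real system would derive segments from intent signals and historical data.

```python
import random

VARIANTS = ["control", "variant_a", "variant_b"]
SEGMENTS = ["high_intent", "browsing"]

# Hidden conversion rates, used only to simulate visitor behaviour.
TRUE_RATES = {
    ("high_intent", "control"): 0.10, ("high_intent", "variant_a"): 0.14,
    ("high_intent", "variant_b"): 0.09,
    ("browsing", "control"): 0.03, ("browsing", "variant_a"): 0.02,
    ("browsing", "variant_b"): 0.05,
}

# Beta(1, 1) prior per (segment, variant): [successes + 1, failures + 1].
posterior = {key: [1, 1] for key in TRUE_RATES}

def choose_variant(segment):
    """Sample each arm's posterior and serve the variant that looks best right now."""
    samples = {v: random.betavariate(*posterior[(segment, v)]) for v in VARIANTS}
    return max(samples, key=samples.get)

def record_outcome(segment, variant, converted):
    """Update beliefs immediately: testing and optimizing are a single loop."""
    posterior[(segment, variant)][0 if converted else 1] += 1

random.seed(42)
for _ in range(20_000):
    segment = random.choice(SEGMENTS)
    variant = choose_variant(segment)
    converted = random.random() < TRUE_RATES[(segment, variant)]
    record_outcome(segment, variant, converted)

for segment in SEGMENTS:
    traffic = {v: sum(posterior[(segment, v)]) - 2 for v in VARIANTS}
    print(segment, traffic)  # traffic drifts toward each segment's best variant
```

Because beliefs update after every visitor, there is no separate “analyze” phase and no human-chosen stopping point: traffic drifts toward each segment’s best variant, while weaker variants keep getting just enough exposure to stay measured.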
Gwendal
Learn more here