The pitfalls of A/B testing
By Alaister Low on July 5th, 2012
A/B testing is a great way to optimize landing pages and increase conversions for your website. Testing should be done continuously; however, there are many pitfalls and limitations that marketers need to be aware of.
The biggest pitfall for marketers when running A/B tests is statistical significance. Take Tom for example, who played roulette and bet $1 on black 100 times. From those 100 bets he won 46 times and lost 54 times. Tom then decided to wear gloves and bet $1 on black another 100 times. This time Tom won 52 times and lost 48 times.
This experiment to see if wearing gloves increases your chances of winning is ridiculous, yet this is how many marketers currently run their A/B tests. In this example, when Tom wore gloves black came up 6 times more out of 100 bets (52 wins versus 46, a 13% relative increase), so some marketers would conclude "always wear gloves". Failing to distinguish variations in conversions caused by chance from variations caused by changes to the page can be very dangerous and costly for businesses.
When starting any new A/B test we should always begin from a null hypothesis. In an A/B testing environment the null hypothesis would be that "the differences between the two landing pages have no impact on conversions"; in other words, any variation in conversions is down to pure chance. The purpose of running the A/B test is to disprove this null hypothesis. In the example above there were 200 bets, equivalent to 200 website visits, and a 6 percentage point difference in "conversions". For some sites this could represent a huge increase in revenue, yet the result is completely useless and no decision should be made based on it. The problem is that the results never reached statistical significance: there is no statistical basis for concluding that betting on black with or without a glove is more effective. The solution is to run the test longer with a larger sample size.
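To see why Tom's result fails to disprove the null hypothesis, we can run a standard two-proportion z-test on his numbers. A minimal sketch in Python, using only the standard library (1.96 is the usual threshold for 95% confidence):

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Tom's roulette "experiment": 46/100 wins without gloves, 52/100 with gloves
z, p = two_proportion_ztest(46, 100, 52, 100)
print(f"z = {z:.2f}, p = {p:.2f}")
```

The p-value comes out far above 0.05, so the glove "lift" is entirely consistent with chance; a page change tested on the same numbers would be just as meaningless.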
In order for an A/B test to provide actionable data, the experiment must be run long enough that the null hypothesis is statistically disproven. This could be a very short or a very long time; the smaller the difference in conversions, the longer it could potentially take. The problem is that during the entire A/B test, marketers are sending 50% of traffic to a landing page that potentially converts worse than the other page. If this were to happen continuously, test after test, a lot of potential revenue could be lost over time, and any gains made from discovering better-converting landing pages could be canceled out.
It can be very tempting to end an A/B test prematurely when one variation starts showing better results and more conversions. Marketers and decision makers must refrain from ending the test early and continue the split test until the results are statistically significant.
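The cost of stopping early can be quantified with a simulation. The sketch below (illustrative parameters, standard library only) runs A/A tests, where both "variations" are identical, and peeks for significance after every batch of visitors. Repeated peeking pushes the false positive rate well above the nominal 5%:

```python
import math
import random

def z_significant(conv_a, n_a, conv_b, n_b, threshold=1.96):
    """True if a two-proportion z-test crosses the 95% significance threshold."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return se > 0 and abs((conv_b / n_b) - (conv_a / n_a)) / se > threshold

def aa_test_with_peeking(rate=0.30, batch=100, peeks=10, rng=random):
    """A/A test: both 'pages' convert at the same true rate.
    Returns True if any intermediate peek falsely declared significance."""
    conv_a = conv_b = n = 0
    for _ in range(peeks):
        conv_a += sum(rng.random() < rate for _ in range(batch))
        conv_b += sum(rng.random() < rate for _ in range(batch))
        n += batch
        if z_significant(conv_a, n, conv_b, n):
            return True  # a peeking marketer would stop here and declare a "winner"
    return False

rng = random.Random(42)
trials = 200
false_positives = sum(aa_test_with_peeking(rng=rng) for _ in range(trials))
print(f"False positive rate with peeking: {false_positives / trials:.0%}")
```

Even though neither page is better, checking after every batch "finds" a winner in a substantial fraction of runs, which is exactly why tests must run to their planned sample size.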
Test big or small?
Now that we understand statistical significance and proper test duration, we can look at what to test. As mentioned above, the smaller the increase in conversions, the longer the test takes. This usually aligns with how big or small the differences between variations are.
For example, it would take much longer to achieve statistical relevance when testing a small change such as changing the color of a conversion button from red to green as opposed to a big change such as a complete redesign.
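The standard fixed-horizon sample size formula makes this concrete. A sketch in Python (standard library only; 95% confidence and 80% power are conventional choices, and the baseline conversion rates are purely illustrative):

```python
import math

def visitors_per_variation(p_base, p_new, z_alpha=1.96, z_beta=0.8416):
    """Approximate visitors needed per variation to detect a lift from
    p_base to p_new at 95% confidence (z_alpha) with 80% power (z_beta)."""
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_new - p_base) ** 2)

# Small tweak (e.g. button color): 30% -> 33% conversion rate
small = visitors_per_variation(0.30, 0.33)
# Big change (e.g. full redesign): 30% -> 39% conversion rate
big = visitors_per_variation(0.30, 0.39)
print(f"Small change: {small} visitors per variation")
print(f"Big change:   {big} visitors per variation")
```

With these illustrative numbers the small change needs roughly eight times as many visitors as the big one, which is the opportunity cost the next paragraph describes.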
One of the most common A/B testing rules and best practices is to test one thing at a time, such as the color of a button or a different headline. This advice is somewhat counter-intuitive: running A/B tests on such small changes takes far too long, and the opportunity cost of that time could be better spent running a bigger test that achieves higher conversions sooner.
It’s better to start off with big changes and tests based on different conversion hypotheses. Once your big tests clearly identify a winner, you can then make smaller tweaks and changes.
Confusion between A/B testing percentage increases
A very simple and common mistake we have seen many marketers make is confusing the two ways a percentage increase from an A/B test can be described. Both figures below describe the same result — a conversion rate rising from 30% to 33% — yet they paint very different pictures.
- 3 percentage point increase in conversions – the absolute difference between the two variations’ conversion rates, i.e. 33% minus 30%
- 10% increase in conversion rate – the relative increase from a 30% to a 33% conversion rate, i.e. 3 ÷ 30
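The ambiguity disappears if you always report both numbers explicitly. A minimal sketch of the two calculations:

```python
rate_a = 0.30  # control conversion rate
rate_b = 0.33  # variation conversion rate

absolute_lift = rate_b - rate_a             # measured in percentage points
relative_lift = (rate_b - rate_a) / rate_a  # measured relative to the baseline

print(f"Absolute lift: {absolute_lift * 100:.0f} percentage points")
print(f"Relative lift: {relative_lift:.0%}")
```

Reporting "3 percentage points (10% relative)" leaves no room for stakeholders to read the wrong figure.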
When there is confusion between percentage increases of A/B tests, wrong decisions can be made. This is especially true if there are many different stakeholders and people involved with the testing.
An ineffective A/B test is a complete waste of time and can drive decisions down the wrong path. It’s very important to take note of these pitfalls to ensure you run your tests correctly. A/B testing is a statistically driven exercise, so it’s important to analyse the results from a statistical perspective.
We feel as though we have come up with a solution for many of the pitfalls of traditional A/B testing in our Growth Giant A/B testing tool. We employ the Multi-Armed Bandit algorithm, which, in simple terms, constantly pushes traffic to pages that are converting better without waiting for statistical significance. This eliminates wastage in our A/B tests, making ongoing testing more efficient. Traffic is automatically directed to better-performing pages to ensure you receive the maximum number of conversions during your entire test period.
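Growth Giant's exact algorithm isn't described here, but epsilon-greedy is one of the simplest multi-armed bandit policies and illustrates the idea: most traffic goes to the page with the best observed conversion rate, while a small fraction keeps exploring the alternatives. A hypothetical sketch (the conversion rates, epsilon, and visitor count are made up for illustration):

```python
import random

def epsilon_greedy_test(true_rates, visitors=5000, epsilon=0.1, seed=42):
    """Simulate an epsilon-greedy bandit over landing pages.
    true_rates are the conversion rates, unknown to the algorithm."""
    rng = random.Random(seed)
    shown = [0] * len(true_rates)      # impressions per page
    converted = [0] * len(true_rates)  # conversions per page
    for _ in range(visitors):
        if rng.random() < epsilon:     # explore: show a random page
            page = rng.randrange(len(true_rates))
        else:                          # exploit: show the best observed page
            page = max(range(len(true_rates)),
                       key=lambda i: converted[i] / shown[i] if shown[i] else 0.0)
        shown[page] += 1
        if rng.random() < true_rates[page]:
            converted[page] += 1
    return shown, converted

# Hypothetical pages: page B converts three times as well as page A
shown, converted = epsilon_greedy_test([0.10, 0.30])
print(f"Impressions: {shown}, conversions: {converted}")
```

In simulations like this the better page ends up receiving the large majority of impressions, which is the "wastage" reduction described above; production tools often use more sophisticated policies such as Thompson sampling.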
If you’re interested in a new and improved way to A/B test then sign up for the beta release of Growth Giant and get early access as soon as we launch!