
Everything You Need to Know about A/B Tests

Updated: May 4, 2020

I always remember what my professor at NYU Stern used to say: think of a way to make your data meaningful. Understanding the business scenario and utilizing data is crucial for delivering an effective strategy and optimizing ROI. If you are working in the digital ecosystem, an A/B test is a good way to find out which design attracts more visitors to convert and helps you earn more money $$$!


🤪 Spoiler alert: If you are new to A/B testing, please make sure you have an extra two hours and a huge cup of coffee on the table!


A/B testing is not as simple as you might think; there are many complicated methods hidden behind the experiment. There's no shortcut to becoming an expert, so I've carefully selected the most useful online sources to help you learn about A/B testing little by little. In this blog, I'll cover testing with one control group and one variation. Testing multiple variations is more complicated and carries higher risks, so I'll create a separate blog to discuss it later.


If you Google “why A/B testing”, about 601,000,000 results show up within one second. Marketers deeply understand how difficult it is to catch the eye of picky customers, so they try to find a better design to boost the conversion rate. That is simply why we A/B test webpages.


I visualized the flow of an A/B test for your reference; please let me know your opinions in the comment section. Designing a thorough testing plan is important. Let's do it together!

 

Before the Experiment


Step 1: Understand User Behaviors


You shouldn't start a new experiment until you fully understand the user behaviors on your webpages. My suggestion is to create a heatmap using online software (such as Crazy Egg or Hotjar) to show how your visitors interact with the pages; then you can concentrate on the most intensive sections to determine the most important elements to test.


Here's an example of a heatmap for a Wikipedia page:


Step 2: Select Pages/Elements to Test and Create Variables


a. The elements you test can include:

  • Offers – coupon code, link to free webinars, handbook, free trial, etc.

  • Copy – different versions of the content

  • Image – color, background, text, etc.

  • Call-to-action – the most common element for marketers to test, because call-to-action buttons drive leads or sales. For example, you may test different texts, sizes, or colors of a button to see whether one design performs better than the other.

b. Test different landing pages:

An A/B test is not limited to individual elements; it can also compare two completely different pages, which is called a Split URL Test.


Step 3: Set a Goal


There must be a goal you want to achieve with the test; it could be a conversion uplift or increased leads. The goal helps you state your hypothesis and determines the Minimum Detectable Effect (MDE) and your sample size. We will go over these two terms in Step 4.


Step 4: Some Statistics Work Before Testing

a. Determine how significant your result should be.

The confidence level you choose (90%, 95%, or 99%) determines how confident you can be that an observed difference is statistically significant rather than due to chance.


b. State your hypothesis

We need to state a null hypothesis to be rejected or retained later when analyzing the result. For example, your null hypothesis could be: “changing the color of the 'learn more' button from red to green makes no difference in lead generation.” The alternative hypothesis challenges the null hypothesis, for example: “the green button generates more leads than the red one.” The alternative is the one you hope the result will support.
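In the usual notation (my shorthand here), with p_red and p_green as the two conversion rates, that is H0: p_green = p_red (no difference) versus H1: p_green ≠ p_red for a two-tailed test, or H1: p_green > p_red for a one-tailed test; more on one- versus two-tailed tests at the end of this blog.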


🤪Yuchen’s tip: Please refer to this blog for more details about basic statistics.


c. Determine your sample size

Sample size matters in A/B tests because you need to know how much traffic to acquire so you don’t end up with an underpowered result. If you’re interested in exploring the math and science behind the sample size calculation, you may need some time to read this article.


A good calculator can help you save time; however, it’s important to understand the meaning of each parameter first. The sample size calculator from Optimizely looks like this:

🤪Yuchen’s explanation:

Baseline Conversion Rate – the conversion rate of your control group. If you’re testing a button, it’s the rate at which people click on your current version.


Minimum Detectable Effect (MDE) – the smallest improvement you want to be able to detect in your experiment. You can estimate an MDE to use in the calculator; the formula looks like this:


MDE = (desired conversion rate lift / baseline conversion rate) × 100%
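For example (with made-up numbers), if your baseline conversion rate is 5% and you want to detect a lift to at least 6%, the MDE is (6% − 5%) / 5% × 100% = 20%, a relative lift.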


The MDE determines your sample size and depends on the risks and costs. I found a good article that explains how to calculate your traffic acquisition costs and potential revenue; you may need some math again here. You'll then have an estimate of how much you'll pay for the test and how likely it is that the money will come back from conversions during the experiment. Understanding the costs and benefits helps testers decide whether or not to engage in A/B testing.


Statistical Significance – If you’re still struggling with basic statistics, I recommend you read this article.
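To make these parameters concrete, here is a minimal Python sketch of the textbook two-proportion sample size formula. It's an approximation (Optimizely's calculator uses its own methodology, so its numbers may differ), but the inputs are the same ones described above:

from statistics import NormalDist  # standard library, Python 3.8+

def sample_size_per_variation(baseline, mde, alpha=0.05, power=0.8):
    """Visitors needed in EACH group for a two-sided two-proportion z-test.

    baseline: control conversion rate, e.g. 0.05 for 5%
    mde:      relative minimum detectable effect, e.g. 0.20 for 20%
    """
    p1 = baseline
    p2 = baseline * (1 + mde)  # the lifted rate you want to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2) + 1

# 5% baseline, 20% relative MDE, 95% confidence, 80% power
print(sample_size_per_variation(0.05, 0.20))  # ≈ 8,155 visitors per group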


Step 5: Determine the Duration of Your Test


Theoretically, the duration of an A/B test is based on the sample size. In this article, the author gives us a formula:


Expected experiment duration = sample size / number of visitors to the tested page
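For example (made-up numbers), if you need roughly 8,000 visitors per variation (16,000 in total) and the tested page receives 2,000 visitors per day, the expected duration is 16,000 / 2,000 = 8 days.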


However, all our assumptions are based on randomly selected visitors, and real-world practice is more complicated. So, the author recommends that “you run your test for a minimum of one to two weeks”.


I agree with him. Beyond the math, one full week captures user behavior on both workdays and weekends, and a two-week experiment gives an even more reliable result. When you're planning tests, please make sure you don't schedule them over special occasions, such as holidays.

 

During the Experiment


Step 6: Set Up Your Experiment in Your Tool


There are tons of great software tools for A/B testing; thanks to this blog from Convertize, we have an overview of the features and a comparison of the top 24 tools in 2020.


There're two things to remember when you create a new experiment:

a. Split your sample groups equally and randomly – set the traffic weight of your visitors for the two groups equally (see the sketch after this list).

b. Make sure you're only running one test at a time – if you're running a split test on an email campaign that redirects to a landing page which is also running an A/B test, how would you know which change contributed to the lead generation?
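Most testing tools handle the random split for you, but it helps to see how a deterministic 50/50 assignment can work under the hood, so that the same visitor always sees the same version. A minimal Python sketch; the function name and IDs are just illustrative:

import hashlib

def assign_variation(visitor_id: str, experiment: str) -> str:
    """Deterministically bucket a visitor so the same ID always sees the same version."""
    digest = hashlib.md5(f"{experiment}:{visitor_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100      # 0..99, roughly uniform across visitors
    return "A" if bucket < 50 else "B"  # equal 50/50 weights

print(assign_variation("visitor-123", "green-button-test"))  # stable across calls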

 

After the Experiment


Step 7: Post-Experiment Analysis


Some great tools help us avoid the complicated math: HubSpot and Kissmetrics provide a very useful tool to help you analyze the final result. Simply type in your basic information, and you'll get a guide, a planner, and a free significance calculator!


The calculator looks like this:


🤪Yuchen’s explanation:


  • Step 1 shows the raw data from the tool you used for the A/B test.

  • Step 2 calculates the conversion rate of each variation.

  • Step 3 gives you the confidence intervals at different confidence levels. For example, at the 90% confidence level, between 15.18% and 16.90% of visitors converted in version A, and between 15.16% and 16.87% converted in version B. The confidence interval reflects the uncertainty of the test, and marketers can also visualize the result to see the risk.

If you’re interested in exploring confidence intervals further, click here to find more information.

  • Step 4 computes the z-score and p-value. Here the p-value equals 0.51, which is above 0.05 (the significance level), so we cannot reject the null hypothesis that there’s no difference between the two variations.
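If you'd like to reproduce Steps 2 to 4 yourself, the math is short. Below is a minimal Python sketch with made-up counts (not the exact numbers from the screenshot); it uses the normal approximation for the confidence intervals and a pooled two-proportion z-test, which is the textbook approach, so a given calculator's exact figures may differ slightly:

from statistics import NormalDist

def analyze(conv_a, n_a, conv_b, n_b, confidence=0.90):
    """Steps 2-4 of the calculator: rates, confidence intervals, z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    z_ci = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

    # Steps 2-3: conversion rates with normal-approximation confidence intervals
    for name, p, n in (("A", p_a, n_a), ("B", p_b, n_b)):
        margin = z_ci * (p * (1 - p) / n) ** 0.5
        print(f"Version {name}: {p:.2%} ({p - margin:.2%} to {p + margin:.2%})")

    # Step 4: pooled two-proportion z-test, two-sided p-value
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    print(f"z = {z:.2f}, p-value = {p_value:.2f}")

# made-up counts near 16%; prints a small |z| and a large p-value (not significant)
analyze(conv_a=790, n_a=4900, conv_b=785, n_b=4900)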


In terms of significance level, z-score, and p-value, I recommend you read my previous blog, where you can find an example to refresh your memory. The blog covers an example of hypothesis testing and the important parameters you can see in the calculator.


Here we have another great example from HubSpot’s blog that demonstrates how to understand the statistical significance of an A/B test.


Type I, II errors:

A Type I error (false positive) occurs when the null hypothesis is actually true but you reject it anyway, declaring a winner when there’s no real difference between the two groups. A Type II error (false negative) occurs when there really is a winner but you fail to reject the null hypothesis and conclude there’s no difference between the two groups.
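One way to build intuition for the Type I error rate is to simulate many A/A tests, where both groups share the same true conversion rate, and count how often the test declares a winner anyway. A minimal Python sketch with made-up traffic numbers; at a 0.05 significance level, roughly 5% of these null experiments come up falsely significant:

import random
from statistics import NormalDist

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

random.seed(42)
trials, n, p = 1000, 2000, 0.10  # A/A test: identical true rates in both groups
false_positives = sum(
    p_value(sum(random.random() < p for _ in range(n)), n,
            sum(random.random() < p for _ in range(n)), n) < 0.05
    for _ in range(trials)
)
print(f"False positive rate: {false_positives / trials:.1%}")  # ≈ 5%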


Some other stuff you may be interested in:

This article explains how to avoid these two types of errors. In summary, you need to make sure that enough data is collected and that the samples are randomly selected.


I also found a blog that discusses the differences between one-tailed and two-tailed testing. It is not a purely statistical demonstration but covers some great use cases from a business perspective. From it, you’ll get a better understanding of:


• Pros and cons of one-tailed and two-tailed tests

• How to choose between the two types of testing

• Differences between the two types of testing in business scenarios

• How to increase the power of A/B tests

 

Summary

Planning an A/B test is not easy, especially across different business situations. Marketers need to select the pages and elements, form a hypothesis, estimate costs, collect data, and analyze the results. Calm down and think about the goal of the test, then track the whole process of your experiment thoroughly. Maybe you need a statistician to chime in here!


🤪Yuchen's bonus time - Download my A/B Test Tracking Sheet to better organize your experiments!


Happy analyzing!
