Yuchen Liu
- Feb 4, 2020

Life is Harder than Math! Kick off Your First Hypothesis Testing

Updated: May 4, 2020

Welcome to the first Monday of February 2020! I hope you enjoyed your Super Bowl night with the Chiefs and 49ers.

February is busy, and it brings both good news and bad news. Anyway, a new semester is around the corner, and horrible math classes have been scheduled on your curriculum! No worries: in this blog, you will start off your first statistical analysis step by step and see how we use the R language to simplify the process!

Scenario:

Researchers suspect that a new drug is more effective at curing the virus ‘X’. A study from January showed that 131 of 400 randomly selected patients were cured with the old drug. A separate study from February showed that 229 of 600 randomly selected patients were cured with the new drug.

What we are going to do is use hypothesis testing to see whether the data give us enough evidence that the new drug is more effective.

Let’s get started!

Step 1 - Setting Null and Alternative Hypothesis

Researchers want to know if the new drug works for more patients, so we can look at the proportion of cured patients in Feb and compare it to the proportion in Jan. Our Null hypothesis will be “there’s no difference between the two drugs”.

Therefore, our alternative hypothesis is that the new drug is more effective than the old one (because researchers suspect the new drug can cure more patients).
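Written in symbols, the two hypotheses look like this. The names p_Feb and p_Jan are my own notation for the true (population) cure proportions of the new and old drug; they are not from the original formulas. The test is one-sided because we only care whether the new drug is better.

```latex
H_0:\; p_{\text{Feb}} = p_{\text{Jan}}
\qquad \text{vs.} \qquad
H_a:\; p_{\text{Feb}} > p_{\text{Jan}}
```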

Step 2 - Checking the Conditions for Inference

“Random” – 400 and 600 people were RANDOMLY selected

“Normal” – each sample has at least 10 successes and at least 10 failures: 131 and 229 patients were cured, and 269 and 371 were not

“Independence” – each sample is less than 10% of the population because there are more than 6000 people infected by the virus ‘X’.
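If you'd rather let R do the checking, here is a minimal sketch of the "Normal" and "Independence" conditions. The variable names are made up for illustration, and the population figure of 6000 is only the lower bound mentioned above, not a real count.

```r
# Counts taken from the scenario above
cured <- c(feb = 229, jan = 131)   # successes in each sample
n     <- c(feb = 600, jan = 400)   # sample sizes
not_cured <- n - cured             # failures in each sample

# "Normal" condition: at least 10 successes AND at least 10 failures per sample
normal_ok <- all(cured >= 10) && all(not_cured >= 10)

# "Independence" (10% rule): each sample is at most 10% of the population.
# Assumption: we only know the population is MORE than 6000, so passing
# the check at this lower bound is enough.
population_lower_bound <- 6000
independent_ok <- all(n <= 0.10 * population_lower_bound)

print(not_cured)
print(c(normal = normal_ok, independence = independent_ok))
```

The "Random" condition can't be checked from the numbers; it depends on how the patients were selected.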

Step 3 - Set Your Significance Level

Usually we set our significance level α to 0.05. If the probability of seeing a difference between Jan and Feb at least this large, assuming the Null hypothesis is true, is less than 0.05, then we reject the Null hypothesis in favor of the alternative one.

🤪Yuchen’s tip: Set the significance level ahead of time, and avoid raising it later just to reject the Null hypothesis. The result is objective; you cannot make it up!

Step 4 - Computing Z-Score

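Your favorite part – Formula!

z = (p̂Feb − p̂Jan) / σ(p̂Feb − p̂Jan), where σ(p̂Feb − p̂Jan) = √( p̂c (1 − p̂c) (1/nFeb + 1/nJan) )

Let me explain it a little bit:

p̂Feb – sample proportion of February (229/600 ≈ 0.3817)

p̂Jan – sample proportion of January (131/400 = 0.3275)

σ(p̂Feb − p̂Jan) – standard deviation of the sampling distribution of the difference between the two sample proportions, with sample sizes nFeb = 600 and nJan = 400.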

We’re going to use the combined proportion (p̂c = (229 + 131) / (600 + 400) = 0.36) under the radical sign in the denominator because we assume our Null hypothesis is true, which means there’s no difference between the two samples.

Therefore:

z = (0.3817 − 0.3275) / √( 0.36 × 0.64 × (1/600 + 1/400) ) ≈ 0.054 / 0.031 ≈ 1.74

The z-score of 1.74 means that the difference between the proportions of Feb and Jan (p̂Feb - p̂Jan = 0.054) is 1.74 standard deviations above the mean of the sampling distribution, which is zero under the Null hypothesis (with the standard deviation estimated from the combined proportion).
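The same arithmetic can be sketched in a few lines of R. The variable names here are mine, chosen for illustration. Without rounding the intermediate values, z comes out at about 1.748; the 1.74 used in this post appears to come from dividing the rounded values 0.054 and 0.031.

```r
# Counts from the scenario
x_feb <- 229; n_feb <- 600   # cured / sample size, new drug
x_jan <- 131; n_jan <- 400   # cured / sample size, old drug

# Sample proportions and the combined (pooled) proportion
p_feb <- x_feb / n_feb
p_jan <- x_jan / n_jan
p_c   <- (x_feb + x_jan) / (n_feb + n_jan)

# Standard deviation of the difference under the Null hypothesis
se <- sqrt(p_c * (1 - p_c) * (1 / n_feb + 1 / n_jan))

# z-score: how many standard deviations the observed difference is above zero
z <- (p_feb - p_jan) / se

print(round(c(p_feb = p_feb, p_jan = p_jan, p_c = p_c, se = se, z = z), 4))
```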

Step 5 - Calculating P-Value

The image above is the normal distribution, with the cumulative area up to a z-score shown in grey. We’re looking at 1.74 standard deviations above the mean, and the area in red to the right of that point is the p-value. Let's get out a z table:

p-value = p (z≥1.74) = 1 – 0.9591 = 0.0409 < 0.05

The p-value is the probability, assuming the Null hypothesis is true, that the difference between the proportions of Feb and Jan lands at least 1.74 standard deviations above the mean.
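If you don't have a z table at hand, the upper-tail area can be read off with R's pnorm function. This is just a sketch with illustrative variable names; it uses both the rounded z of 1.74 from the table lookup and the unrounded z, so you can see how little the rounding matters.

```r
# Upper-tail area for the rounded z-score used with the z table
z_rounded <- 1.74
p_table <- pnorm(z_rounded, lower.tail = FALSE)   # P(Z >= 1.74)

# Same calculation with the unrounded z-score from the formula
z_exact <- (229 / 600 - 131 / 400) /
  sqrt(0.36 * 0.64 * (1 / 600 + 1 / 400))
p_exact <- pnorm(z_exact, lower.tail = FALSE)

print(round(c(p_table = p_table, p_exact = p_exact), 4))
```

Both values sit just below 0.05, so the rounding does not change the decision here.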

Step 6 – Draw a Conclusion

Great! Our p-value is slightly less than the significance level α = 0.05, so we can reject the Null hypothesis and say we have evidence to suggest that the new drug is more effective at the 5% significance level!

All right, what we just did is exactly what I learnt in math class. Want to find a new way to make your life easy? Here comes a modern solution – R!

R script:

Check here for the documentation of the prop.test function in R.

prop.test(c(229, 131), c(600, 400), alternative = "greater", conf.level = 0.95, correct = TRUE)

TADA! With one more step, hitting the ‘Enter’ key, we have the p-value!

You’ll see the p-value is slightly different from what we calculated on paper. I haven’t figured out why yet, but I’ll get the answer by diving into how R does the calculation at its back end. You’re also welcome to leave a comment if you know the answer!
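One likely explanation for the gap, offered as a sketch rather than a confirmed diagnosis: the argument correct = TRUE tells prop.test to apply Yates' continuity correction, which the paper calculation does not use. Running the test both ways shows the effect. The variable names below are mine.

```r
cured <- c(229, 131)   # new drug (Feb), old drug (Jan)
n     <- c(600, 400)

# With Yates' continuity correction (as in the call above)
with_correction <- prop.test(cured, n, alternative = "greater", correct = TRUE)

# Without the correction, which is what the hand calculation does
without_correction <- prop.test(cured, n, alternative = "greater", correct = FALSE)

# prop.test reports a chi-squared statistic; with two groups its square root
# is the magnitude of the z-score from the formula
z_from_r <- sqrt(unname(without_correction$statistic))

print(c(p_with    = with_correction$p.value,
        p_without = without_correction$p.value,
        z         = z_from_r))
```

With the correction turned off, the p-value should line up with the paper result, apart from the small difference caused by rounding z to 1.74 before the table lookup.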

🤪Yuchen’s tip: If you are new to R, please check my blog post here (coming soon…) to learn more!

Is Hypothesis Testing Useful to Our Business?

The answer is YES! Hypothesis testing can be used to analyze the result of an A/B test and decide which variant of your web page is the winner. If you’re running an online business and want to see whether a change really brings more conversions, please use this technique to explore your data!

P.S. The example used in this blog is NOT real-world data from any scientific research. I set up this scenario to express my best wishes to the people in China who are fighting the new coronavirus. People don’t give up even when there is only a tiny chance to survive!

Happy analyzing with Yuchen!
