**CXL Institute CRO Minidegree Review Part 8**

*This is part 8/12 in my series reviewing the CXL Institute CRO Minidegree. I will be posting a new part every week!*

CXL Institute offers some of the best online courses and industry-recognized certifications for anyone who wants to learn new technical marketing skills and tools. They are especially useful for growth professionals, product managers, UX/UI experts, and any other marketer looking to become more customer-centric.

I was given an amazing opportunity to access and review one of their online course tracks, the Conversion Rate Optimization (CRO) Minidegree. For the next few weeks, I’ll be discussing the content of the course as well as what I think of it as I go through it. Here is part eight!

# Statistics Fundamentals

Understanding at least basic statistics is key to finding mistakes and spotting false positives. The more you know about statistics, the better you will be at avoiding misleading results that can hurt your growth. This week’s part focuses on the statistics basics every optimizer should know. The first short course, on statistics fundamentals, is taught by Ben Labay, a UX research scientist at CXL.

Here are the basics:

- **Sampling:** The population is the entire pool of people being measured. A sample is a representative subset of that pool. We use a sample to estimate the mean and standard deviation of the parameter we’re interested in, and a large enough sample can yield insights about the entire population.
- **Mean and Variance:** The mean is the average value of your data. The variance describes how spread out the data is: it is the average squared distance from the mean. The standard deviation is the square root of the variance, which expresses the spread in the same units as the data.
- **Confidence Intervals:** A confidence interval is a range of plausible values for a parameter. At a 95% confidence level, intervals built this way would contain the true value in about 95% of repeated samples. To calculate one, you need the mean, the sample size, the variability, and the confidence level. In A/B testing, confidence intervals show how reliable an estimate is, so it is important to understand the risk of sampling error and how much of it is acceptable during tests.
- **Statistical Significance and P-value:** Statistical significance indicates whether an observed effect is unlikely to be due to chance alone. The p-value is the probability of seeing a difference at least as large as the one observed if there were actually no difference between the variants (the null hypothesis). So p < 0.05 means that, if there were no real effect, a result this extreme would occur less than 5% of the time; it does not mean there is less than a 5% chance that the result is a false positive. The p-value should not be confused with the confidence (significance) level, which should be set before testing.
- **Statistical Power:** The statistical power of a test is the probability that it finds an effect when there is a real effect to be found (in other words, the probability that it correctly rejects a false null hypothesis).
- **Sample Size:** The required sample size depends on the size of the effect you want to detect, the confidence level, the statistical power, and the variability. You will also need to estimate the baseline conversion rate of the control.
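To make the mean, variance, standard deviation, and confidence interval definitions concrete, here is a minimal Python sketch. The function names and example numbers are hypothetical. The interval uses the simple normal-approximation (Wald) formula, which is only one of several ways to build a confidence interval for a conversion rate and not necessarily the method taught in the course.

```python
import math
from statistics import NormalDist


def describe(values):
    """Return (mean, variance, standard deviation) using the sample (n - 1) formulas."""
    n = len(values)
    mean = sum(values) / n
    # Variance: average squared distance from the mean (divided by n - 1 for a sample).
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    # Standard deviation: square root of the variance, in the same units as the data.
    return mean, variance, math.sqrt(variance)


def conversion_ci(conversions, visitors, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a conversion rate."""
    p = conversions / visitors
    # Two-sided critical value, about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Margin of error: critical value times the standard error of a proportion.
    margin = z * math.sqrt(p * (1 - p) / visitors)
    return p - margin, p + margin
```

For example, `conversion_ci(200, 5000)` gives a 4% observed rate with a 95% interval of roughly 3.5% to 4.5%. Collecting more visitors shrinks the margin, which is the sampling-error trade-off described above.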
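The p-value definition can be sketched with a pooled two-proportion z-test, a common frequentist test for comparing two conversion rates. This illustration assumes large samples, so the normal approximation holds. It is not the specific procedure used in the course or by any particular testing tool, and the function name and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates (pooled z-test)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under the null hypothesis both variants share one conversion rate, so pool the data.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Probability of a difference at least this extreme, in either direction,
    # if the null hypothesis (no real difference) is true.
    return 2 * (1 - NormalDist().cdf(abs(z)))
```

Take hypothetical results of 200/5,000 conversions for the control and 250/5,000 for the variant. The p-value comes out at roughly 0.016, so a difference this large would be unusual if the two variants truly converted at the same rate. It is not the probability that the variant is a false winner.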

Labay also covers common statistics “traps” and how to avoid them:

- Sampling errors can occur if a sample is drawn from a non-representative part of the population whose values differ greatly from the true mean. A big enough sample, or a longer test, lets the results move closer to the real mean (regression toward the mean). This is why it is important to calculate the test duration in advance.
- Testing too many variants at the same time can undermine the validity of a test and produce false positives, because every extra comparison is another chance for a random fluctuation to look significant. Many testing tools correct for this, but those that don’t require a separate procedure (such as ANOVA or a uni-factorial analysis). Labay suggests running at most 3 variants alongside the control to minimize this risk.
- Understand the difference between the Bayesian and Frequentist approaches to statistics. The Frequentist approach is the most common in A/B testing. It treats the true effect as a fixed, unknown value and asks how likely the observed data would be under a given hypothesis. In the Bayesian approach, the hypotheses themselves are assigned probabilities, which start from prior assumptions and are updated as data comes in. Some argue this is closer to how people naturally see the world, even though most testing does not use this philosophy.
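The multiple-variants trap can be quantified. If each variant is compared with the control at a 5% significance level, the chance of at least one false positive grows with the number of comparisons. The sketch below assumes independent comparisons, which is a simplification because comparisons that share a control are correlated. It shows the Bonferroni correction as one common fix; this is a different technique from the ANOVA approach mentioned in the course, and the function names are hypothetical.

```python
def family_wise_error_rate(variants, alpha=0.05):
    """Chance of at least one false positive across all variant-vs-control comparisons,
    assuming the comparisons are independent (a simplification)."""
    return 1 - (1 - alpha) ** variants


def bonferroni_alpha(variants, alpha=0.05):
    """Per-comparison threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / variants


# Show how the false-positive risk grows as variants are added.
for k in range(1, 6):
    print(f"{k} variant(s): FWER = {family_wise_error_rate(k):.3f}, "
          f"Bonferroni threshold = {bonferroni_alpha(k):.4f}")
```

With three variants, the chance of at least one false positive is already about 14%, and the Bonferroni threshold drops to roughly 0.0167 per comparison. Each extra variant therefore also demands more traffic.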
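To illustrate the Bayesian approach, here is a minimal Beta-Binomial sketch that estimates the probability that variant B has a higher conversion rate than A. It assumes uniform Beta(1, 1) priors and uses Monte Carlo sampling from Python’s standard `random` module. The prior, function name, and numbers are illustrative assumptions, not the method of any specific testing tool.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=100_000, seed=42):
    """Monte Carlo estimate of P(rate_B > rate_A) under independent Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for each rate: Beta(1 + conversions, 1 + non-conversions).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws
```

With hypothetical counts of 200/5,000 for A and 250/5,000 for B, this returns roughly 0.99 (the exact value varies slightly with the random draws). That is a direct statement about how likely B is to be better, something a p-value does not provide.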

# Statistics for A/B Testing

Georgi Georgiev, an applied statistician at WebFOCUS and the author of “*Statistical Methods in Online A/B Testing*”, leads the next course, an in-depth look at the statistical concepts that matter for A/B testing. Statistics is often misused in A/B testing because results are taken at face value, without proper risk management in place. It is essential to understand the uncertainty that comes with every test, since it strongly affects your decisions, and wrong decisions can be very costly.

Data is a proxy for reality. Before deciding whether to make a change to a website, we need to understand whether the change will improve or hurt our KPIs. We also need to be aware of the danger of misinterpreting our data. For example, let’s say you make a change to your website and your bounce rate almost halves. You might assume this amazing effect is due to your change, but what if the tracking broke in the process? Correlation does not imply causation: what we observe might not reflect the truth.

This part was (naturally) more complex than most of the material so far, and it was surprising to see how many problems can arise in A/B testing. A basic understanding of statistics may be enough to get by, but mastering this area will let you run far better experiments. Even though statistics is one of the more difficult aspects of CRO, understanding it better will make it far easier to find a path toward sustainable growth.
