Originally posted on Manifold
A/B testing with limited data at a startup
Practically every article you read about startup marketing stresses the importance of A/B testing. From the header on your landing page to the colour of your signup button, even the most minor thing should be tested. But almost all of these articles assume one thing: that you're in growth mode.
But that's an issue for a lot of companies.
Startups that haven't hit growth mode just don't have enough data to run A/B tests at scale. Testing with low numbers is a completely different ball game from testing while you're scaling. I'm not saying it's impossible, but I'll break down some common misconceptions and mistakes that are easy to make when testing with small amounts of data.
Let's quickly define A/B testing
A marketing experiment where two variations of a landing page, ad, email or other piece of online content are pitted against each other to determine which produces the highest conversion rate.
https://unbounce.com/conversion-glossary/definition/ab-testing/
That's the definition from a marketer's point of view: pit two pieces of marketing material against each other and see which converts better. But this definition doesn't do justice to the statistical methods behind A/B testing.
When you run an A/B test, you are most likely comparing two binomial distributions using some kind of statistical test (there are a few possibilities).
Metrics like click-through rate and conversion rate are binomial: each visitor either clicks (or converts) or doesn't.
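If you're curious what that comparison looks like under the hood, here's a minimal sketch of a two-proportion z-test using statsmodels. The conversion counts are made up purely for illustration:

```python
# Minimal sketch of comparing two conversion rates (binomial outcomes)
# with a two-proportion z-test. All numbers here are made up.
from statsmodels.stats.proportion import proportions_ztest

conversions = [132, 168]   # conversions for variant A and variant B
visitors = [2700, 2800]    # visitors who saw each variant

z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
print(f"A: {conversions[0] / visitors[0]:.2%}  B: {conversions[1] / visitors[1]:.2%}")
print(f"z = {z_stat:.2f}, p = {p_value:.3f}")
# A small p-value (conventionally < 0.05) suggests the difference probably
# isn't noise -- but only if you fixed your sample size up front.
```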
If you are using A/B testing software like Optimizely, a lot of these statistics are hidden from you, which can sometimes be to your detriment.
A/B testing pitfalls
Testing software makes running experiments approachable for all kinds of people. You really don't have to know much about statistics to pull off an A/B test in Google Optimize or Optimizely. But not knowing much about stats can lead you down the path to some easy-to-avoid mistakes.
Let's look at a few.
Ending your test too early
If you're like me, you LOVE watching the numbers go up when you're running experiments (or ads, or anything). And one of the best numbers to watch is one that most A/B testing software now shows: the "chance to beat" metric. This is a number the software calculates on the fly from whatever data it has available so far.
This number can be misleading. If your "chance to beat" hits 100%, you might think, "alright, let's end this and start the next test", but that can get you into major trouble. This is called peeking. Peeking is when you look at the results (and act on them) before you've gathered a big enough sample, which means you haven't actually reached statistical significance. Ending tests early like this leads to false positives: "winners" that won't hold up in the long run.
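To see why peeking bites, here's a rough simulation (not from the original post): both variants share the exact same 5% conversion rate, so every "winner" we declare is a false positive, yet stopping the moment the p-value dips below 0.05 produces far more winners than the 5% you'd expect.

```python
# Rough simulation of "peeking": A and B have the SAME conversion rate,
# so any declared winner is a false positive. Checking after every batch
# and stopping early inflates the false positive rate well above 5%.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(42)
true_rate = 0.05      # identical conversion rate for both variants (an A/A test)
batch = 500           # visitors added to each variant between peeks
n_peeks = 20          # how many times we check before giving up
n_experiments = 2000

false_positives = 0
for _ in range(n_experiments):
    conv_a = conv_b = n = 0
    for _ in range(n_peeks):
        conv_a += rng.binomial(batch, true_rate)
        conv_b += rng.binomial(batch, true_rate)
        n += batch
        table = [[conv_a, n - conv_a], [conv_b, n - conv_b]]
        _, p_value, _, _ = chi2_contingency(table)
        if p_value < 0.05:   # "chance to beat looks great -- ship it!"
            false_positives += 1
            break

print(f"False positive rate with peeking: {false_positives / n_experiments:.1%}")
# Typically comes out well above the 5% a single fixed-sample test would give.
```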
Running A/B tests takes a lot of traffic, especially as you get further into your funnel. Users drop off and fewer and fewer people see your test.
Let's look at an example. Say you have a conversion rate on a landing page of 5% and you want to get it up to 5.5%. That's a 10% relative increase. Seems pretty reasonable to get there. But to be confident in that change to the landing page, you'll need each variation to get a sample size of roughly 30,000 visitors.
I don't know about you, but getting ~60k visitors to a landing page can be pretty tough when you are early in your startup's lifecycle.
Check out Evan Miller's awesome sample size calculator to better understand the audience size you need: https://www.evanmiller.org/ab-testing/sample-size.html
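If you'd rather compute it yourself, here's a sketch of the standard two-proportion sample size formula (the same idea behind calculators like Evan Miller's; exact outputs vary slightly with the significance level and power you pick):

```python
# Sketch of the usual sample size formula for comparing two proportions.
# Assumes a two-sided test at 5% significance with 80% power by default;
# change those settings and the answer changes.
from scipy.stats import norm

def sample_size_per_variation(p1, p2, alpha=0.05, power=0.80):
    """Visitors needed in EACH variation to detect a move from p1 to p2."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_power * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return round(numerator / (p1 - p2) ** 2)

print(sample_size_per_variation(0.05, 0.055))  # ~31,000 visitors per variation
```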
Not understanding your audience
Not all visitors to your website are the same. As marketers and founders, we know motivation and intent are key to marketing effectively.
A/B testing is the same. If you're running a test that needs 10,000 sessions per variation and you suddenly get 20k hits to your landing page because a blog post blew up, is that traffic really a good read on what your conversion rate is going to be? Probably not. You need to make sure you understand who is visiting your site before you can make a call.
Sometimes it's worth running a test to a higher significance level if you're not confident your sample is diverse enough.
Testing all the small things
I hope you don't have Blink-182 stuck in your head now.
I've talked a lot about the sample size needed to run tests. In some cases it's huge. And one of the biggest contributors to the sample size you need is the percentage lift you want to detect.
Let's go back to the example above of our 5% -> 5.5% conversion rate. Testing that to significance, we would need around 30k visitors per variation. But say we wanted to see that conversion rate lift to 6% instead, by changing the whole landing page and not just the CTA copy. That drops the visitors needed per variation down to ~12k. That's less than half the visitors and a much more achievable number.
Making large changes to whatever you are testing is important when you can only muster a small sample. If you change something small, you can't reasonably hypothesize a large lift in your test variable. But if you make large changes to your content and target a bigger lift, the sample size you need drops dramatically.
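You can see the effect by plugging both lifts into the same sketch formula from the calculator example above (again, the exact figures depend on the significance and power settings you choose, so they won't match any one tool's rounding exactly):

```python
# Same sketch formula as before: only the hypothesized lift changes.
from scipy.stats import norm

def sample_size_per_variation(p1, p2, alpha=0.05, power=0.80):
    z_alpha, z_power = norm.ppf(1 - alpha / 2), norm.ppf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_power * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return round(numerator / (p1 - p2) ** 2)

print(sample_size_per_variation(0.05, 0.055))  # small lift: ~31k per variation
print(sample_size_per_variation(0.05, 0.06))   # bigger lift: a fraction of that
```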
Don't just go and boost your goal lift to a crazy amount here. You still need to think of testing in a scientific way and choose all of these metrics based on a good hypothesis. Otherwise you are never going to hit significance and you'll never learn anything.
Testing is hard
And it's even harder at a startup. Tools like Optimizely and Google Optimize are making it easier and easier to run experiments, but they don't show you all of the math being done in the background. Not knowing the statistical methods running behind the scenes can lead you to false positives.
Make sure you're setting your sample size before your test and sticking to it. Don't call a test done just because it looks like it's going to win; remember, data trumps gut if you have it. Watch out for traffic spikes from a single source: they can make your audience less diverse and badly skew your results. And lastly, don't test the small things when your traffic is low. If you want to see significant results at low sample sizes, make big changes. Swap the landing page for a completely different one, or completely change up the ad you're running, not just a few words.
Feel free to hit me up at colin@manifold.co if you have any questions; I love to chat about this stuff.
Top comments (1)
I think the best way for startups is to catch people in the hall and make videos of how they are using their website or app; there will be a lot of interesting discoveries! A/B testing to me is like Thor's hammer that should only be used by big companies.