Understanding the time it takes to test with Ezoic

Piper Lofrano

 

“Patience you must have my young padawan”

– Yoda



Testing requires a certain level of patience. We don’t know what layouts will work for your site and which ones won’t. If we did, we probably wouldn’t be a testing platform! But that’s also the beauty of it - no one really knows what is going to work best until it’s been tested. Oftentimes, the end result is counterintuitive; something that you may think is a ‘user friendly’ layout ends up having the highest bounce rate! Thanks to the data, Ezoic’s system is able to find the layouts that work best without letting personal opinion get in the way!


Data is what fuels the tests and drives the results. Therefore, the system must receive a sufficient amount of data in order to function properly. But how much data does the system need? And, more importantly, how long does it take to get to that point?


In this article we will discuss the mathematics behind the testing process and hopefully give some insight into the length of time it takes to test your site’s layout.



What exactly is Ezoic testing?


Ezoic’s platform automatically builds and tests hundreds of new layouts of your exact same content on desktop, tablet, and mobile. The various templates differ by menu location (top vs. left navigation), number of columns, background colors, fonts, etc. An easy way to think about it is to equate it to testing 100+ different WordPress themes. But it doesn’t stop there!


Within each layout, the system will try various ad sizes, placements, and combinations of ads. All ads dilute each other, so finding the perfect combination of ad placements and density is tricky business. According to AdSense, the 336x280 large rectangle is one of the most successful ad sizes, but where should it go? In the sidebar? Under the title? In the middle of the content? And how will it perform if there’s a skyscraper in the sidebar? Or another rectangle ad on the page?


The number of possibilities is practically endless, which is why the system tests them for you and reports back on the performance in your user dashboard.



Digging into the numbers


Now that you’ve got an understanding of what is being tested, let’s get into the numbers a bit. Before we do, there are a couple of terms that should be well understood by anyone interested in testing.


Margin of error: The margin of error represents the amount of random sampling error in the test’s results. The higher the margin of error, the higher the volatility in the results.


Confidence level: The confidence level “refers to the percentage of all possible samples that can be expected to include the true population parameter.”


Law of small numbers, otherwise known as “hasty generalization”


Definition:  A hasty generalization is a fallacy in which a conclusion is not logically justified by sufficient or unbiased evidence.


Example: There are 10 marbles in a bag. Five of them are black and five of them are white. On the first try, you reach in and pull out a black marble. You put the marble back in the bag and pull out a second marble. This one is black as well. If you were to stop here, you would claim that all the marbles in the bag are black - or at least the majority of them. However, if you were to run the test many more times, you would eventually come to the realization that you have a 50% chance of pulling a black marble and a 50% chance of pulling a white one. The sample size of two marbles is too small and doesn’t paint an accurate picture of the overall population.
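The marble example can be tried out with a few lines of Python. This is only an illustrative sketch: the function name, the seed, and the sample sizes are made up for the demonstration and have nothing to do with Ezoic’s system. It draws from the same bag (with replacement) and shows how the share of black marbles only settles near 50% once the number of draws gets large.

```python
import random

def draw_black_fraction(num_draws: int, rng: random.Random) -> float:
    """Draw with replacement from a bag of 5 black and 5 white marbles
    and return the fraction of draws that came up black."""
    bag = ["black"] * 5 + ["white"] * 5
    blacks = sum(1 for _ in range(num_draws) if rng.choice(bag) == "black")
    return blacks / num_draws

rng = random.Random(42)  # fixed seed (arbitrary) so repeated runs match
for n in (2, 20, 200, 20_000):
    print(f"{n:>6} draws: {draw_black_fraction(n, rng):.3f} black")

# Exact chance that the first two draws are both black: 0.5 * 0.5
p_two_black = (5 / 10) ** 2
print(p_two_black)  # 0.25
```

In other words, one time in four a perfectly balanced bag will still hand you two black marbles in a row, which is why two draws prove very little.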


Now back to Ezoic…


After years of testing thousands of sites, we’ve found that it takes about 7k-8k visits to a single layout to reach full confidence in the results of that layout.


So, what does that mean?


This means that if a layout receives 7k visitors, we can say that we are about 90% confident that it will continue to perform at its current level. So if it has a bounce rate of 12% and an EPMV of $5.00, we can assume that it is likely to continue to perform at that rate regardless of whether it receives 20k, 50k, or 100k more visitors.


On the other hand, let’s say Layout A has only received ~4,000 visitors and the confidence level is only 50%. This means that the results for Layout A have a margin of error of 50%. If we were to assume that the current stats are an accurate representation of the performance, it is the same as assuming all the marbles in the bag are black. This conclusion is based on limited evidence.


The margin of error for Layout A shows that the results can go up or down. If Layout A has a current EPMV of $5.00 with a 50% margin of error, there’s still uncertainty in the results. The EPMV of $5.00 could go up to $7.50 (which would be great), or could go down to $2.50 as the system collects more data and becomes confident in the results. (These values on either end of the confidence interval are known as bounds. The upper bound is $7.50 and the lower bound is $2.50.) Once Layout A receives 7-8k visitors, the system will be fully confident in the results. At this point we can assume that Layout A will continue to perform at the same level regardless of how many visitors it receives.
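The bounds in this example are simple arithmetic, sketched below in Python. The function name is hypothetical, and the sketch assumes the margin of error is applied symmetrically as a percentage of the current EPMV, which is how the $2.50 and $7.50 figures above work out.

```python
def epmv_bounds(current_epmv: float, margin_of_error: float) -> tuple[float, float]:
    """Return (lower, upper) bounds for a relative margin of error,
    e.g. 0.50 for a 50% margin of error."""
    lower = current_epmv * (1 - margin_of_error)
    upper = current_epmv * (1 + margin_of_error)
    return lower, upper

lower, upper = epmv_bounds(5.00, 0.50)
print(f"${lower:.2f} to ${upper:.2f}")  # $2.50 to $7.50
```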


So when does the system define a winning layout?


The system will define a winning layout when it is certain that the lower bound of that layout beats the original layout’s upper bound, or the current winning layout’s upper bound. Let’s take a look at an example:


Layout              A       B
Current EPMV        $12     $8
Confidence Level    50%     90%
Upper Bound EPMV    $18     $8.80
Lower Bound EPMV    $6      $7.20


Looking at the table, most people would initially choose Layout A to receive the majority of the visits because the current EPMV is higher. However, this is where the confidence level plays a big role. If we were to send 100% of the traffic to Layout A, there is a chance that the EPMV could drop to $6. This would be poor performance compared to Layout B, which is currently performing at an $8 EPMV with only a small margin of error.


The system will classify Layout B as the winning layout until the lower bound of Layout A is higher than the upper bound of Layout B. And Layout A will get ‘phased out’ when its upper bound is less than the lower bound of Layout B. See the examples below.
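The rule described here can be written down as a short Python sketch. It is an illustration of the comparison only, not Ezoic’s actual code: the class, the function, and the result strings are invented for the example, and the margins of error (50% for Layout A, 10% for Layout B) are inferred from the upper and lower bounds in the table.

```python
from dataclasses import dataclass

@dataclass
class Layout:
    name: str
    epmv: float
    margin_of_error: float  # relative, e.g. 0.10 for 10% (inferred from the table)

    @property
    def lower(self) -> float:
        return self.epmv * (1 - self.margin_of_error)

    @property
    def upper(self) -> float:
        return self.epmv * (1 + self.margin_of_error)

def compare(challenger: Layout, winner: Layout) -> str:
    """Apply the bound comparison described in the text."""
    if challenger.lower > winner.upper:
        return "challenger wins"
    if challenger.upper < winner.lower:
        return "challenger phased out"
    return "keep testing"

layout_a = Layout("A", 12.00, 0.50)  # bounds: $6 to $18
layout_b = Layout("B", 8.00, 0.10)   # bounds: $7.20 to $8.80
print(compare(layout_a, layout_b))   # keep testing
```

Because Layout A’s range ($6 to $18) still overlaps Layout B’s range ($7.20 to $8.80), neither condition is met yet, so B stays the winner while A keeps collecting data.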


This is why the confidence bar in the experiments table is so important. Here’s an example of what I explained above.


[Screenshot: experiments table showing the confidence bar for each layout (confidence level.png)]


Many publishers will email us asking why the system isn’t sending traffic to a layout that appears to be doing well, in this case Clownfish. If you look closely, you’ll see that Clownfish has only received 24 visitors (remember it takes about 7k-8k visits to reach full confidence) and therefore the results are not yet meaningful. There is such a high margin of error that there’s no telling what the final results will be once it reaches full confidence.


This is why it’s important to only consider the layouts that have a green bar under their confidence level, meaning the results are statistically meaningful. In the case of testing your site’s layout, we typically recommend giving the system time to test 8-10 layouts to full confidence to get an idea of whether or not testing is right for you.


So how does the system decide what layouts to test?


The system doesn’t randomly pick layouts. It will test layouts from families of layouts that have similar performance metrics to see how your users respond. If they respond well to one family of layouts, then the system will continue to test other layouts from that family. There is a method to the madness, and the amount of data that goes into making the decisions is impressive.


Luckily for you, Ezoic’s technology has the computing capability to make all these decisions on its own. This gives you more time to focus on your users and on giving them the content they want.


So this brings us to the million-dollar question: How long will it take to optimize?


First things first, Ezoic is a continuous testing platform. That means that the testing never really ends. Even when the system has identified a ‘winning’ layout it will continue to test new layouts, elements within layouts, etc. with a small portion of your traffic. This accounts for changes in technologies and user behavior, ensuring that your users are getting the best layouts at all times.


However, results really start to kick in once the system has had time to test about 8-10 layouts to full confidence. The time it takes to reach that point is dependent on your traffic. Let’s walk through an example.


ExampleSite.com receives 100k visitors/month. 50% are on desktop, 40% are on mobile, and 10% are on tablet.


Let’s start with desktop. We now have 50k visitors/month. If this publisher keeps the default settings, 10% of his traffic will be directed to his original layout (5k). That leaves the system with 45k visitors/month to test with.


This means that in one month, at 7k-8k visits per layout, the system will test roughly 6-7 layouts to full confidence.


Now let’s go to month two. The system will define a winning layout, giving it 75% of the traffic. With 10% still going to the original, that leaves us with ~7500 visitors/month to test with. Therefore, it will take 4 weeks to test another layout to full confidence.


This brings the time it takes to test 8-10 layouts to full confidence to 8-10 weeks.


This doesn’t mean that it will take that long to see results. In fact, a lot of publishers start seeing positive results within the first couple of weeks of testing. Testing 8-10 layouts to full confidence just gives the system the opportunity to test a wide variety of elements and establish whether or not testing is something that will work for you. In most cases, by the time the system has tested 8-10 layouts to full confidence, publishers are making about 2x their original revenue!


Now let’s look at another example. SiteExample.com receives 250k visitors per month with the same breakdown per device. Therefore desktop receives 125k visitors/month. 12.5k go to the original layout, which leaves us with 112.5k visitors/month to test with. This means the system will test about 8 layouts to full confidence in roughly two weeks!
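To check the two-week figure, here is a rough Python estimate. It rests on stated assumptions rather than on how Ezoic actually schedules tests: a 30-day month, all test traffic spread evenly over the layouts being tested, and no winning layout taking a share yet. The function name is made up for the example.

```python
def days_to_test(num_layouts: int,
                 visits_per_layout: int,
                 monthly_test_visitors: float,
                 days_per_month: int = 30) -> float:
    """Days needed for num_layouts to each collect visits_per_layout visits."""
    daily_test_visitors = monthly_test_visitors / days_per_month
    return num_layouts * visits_per_layout / daily_test_visitors

# 250k visitors/month, 50% on desktop, 10% held back for the original layout
monthly_test_visitors = 250_000 * 0.50 * 0.90

fast = days_to_test(8, 7_000, monthly_test_visitors)
slow = days_to_test(8, 8_000, monthly_test_visitors)
print(f"{fast:.1f} to {slow:.1f} days")  # 14.9 to 17.1 days
```

So at 7k visits per layout the eight layouts finish in just over two weeks, and at 8k it takes a couple of days longer.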


Conclusion


So the more traffic a site has, the faster the optimization process. A smaller site will take a bit longer to optimize, but the results are equally significant. This is where patience comes into play! The period of volatility, whether it’s a couple of days (for a big site) or a couple of weeks (for a smaller site), can be scary. This is a period of time where the system is trying new things and learning what your users respond best to. If you’re struggling during this period, don’t give up hope. We’ve tested thousands of websites and have seen the positive results - not only for user experience metrics and ad income, but for overall site health, rankings and traffic. Believe in the idea of testing and practice patience and you’ll be happy you did!



