Using Standard Deviations To Determine Web Analytics Benchmarks
I was asked to explain what I meant in a recent post about using standard deviations with web analytics data to formulate benchmarks.
First I’ll explain what I mean by Standard deviations. A standard deviation is the most common measure of a statistical dispersion, in other words measuring how widely spread the values in a data set are. If many data points are close to the average (or mean), then the standard deviation is small; if many data points are far from the mean, then the standard deviation is large. If all the data values are equal, then the standard deviation is zero.
The reason I use Standard deviations is to quickly spot deviations in data from normal levels. Sometimes trends are hard to spot, but with standard deviations they are easy to pick up. Take the last example I showed;

As you can see I’ve circled the slight peak, however that is not an obvious peak to an analyst looking at data every day. Over the course of this dataset, in this graph, it looks like a fairly smooth variation – a little more at the beginning of the month than the end but nothing drastic. The reason I spotted this was because I ran the visit data through Excel and did a benchmark/standard deviation check.
The practical value in web analytics of using Standard Deviations is basically to determine when the data tells you that something is moving away from the “mean average”. It highlights things in your data that should be looked at. In the case above I drilled down over the time period to take a closer look and found that one particular site had driven more traffic from an awareness campaign that lasted approximately two weeks and then dropped off to normal levels.
The way to calcultate the standard deviation is pretty easy if you can grasp the basic maths. The longer the period of data you look at theoretically the more accurate your standard deviation should be. I am going to show the example above and how it was done which basically had a months data.
You can benchmark visitors, visits and page views (or interactions) quite easily. You can also then work out ratio benchmarks such as conversion rates by working out the number of successful “actions” with a standard deviation in the same manner (so once you have got a standard deviation for visits, then work out the standard deviation for actions and you have 2 average numbers which can then be used as your conversion ratio benchmark).
Step 1) Firstly get all the data into excel. It makes it easier. I took 30 days visit data into excel listed as days 1-30 and the visit numbers in a simple list. Like so;
NB; It’s important you get the right number for every day in your list, so you have all the fluctuation in your data sample listed. If you’re not interested in the maths behind the method go straight to step 4.
Step 2) Find the mean of the numbers listed. In my excel sheet this was done by using the following function =AVERAGE(B2:B31). My mean average was 3204.2 visits per day. I do this simply to verify what the numbers are telling me look about right. The mean should be about in the middle.
Step 3) Find the standard deviation of the numbers. In excel this is all worked out for you by simply using a function =STDEVA(B2:B31). Essentially what this function does is work out the data fluctuation from the mean. By doing this I had a standard deviation of 336.55. This means that on average my daily visits are 3204, my top limit is 3204 + 336.5 and my bottom limit is 3204.2-336.5. So if my numbers fall between roughly 2868 and 3540 based on the historical data for 30 days it’s average behavior. If the figure appears over or below that period it’s showing abnormal behaviour and should be examined further.
Step 4) Use Excel to build a graph. Once you have the data it’s very fast and very simple to display your deviation. Simply do a normal line graph of your data, days along the bottom visits as your data points. Once you have your chart click the “layout tab” in excel 2007 and then the error bars. Then simply select “Error bars with standard deviation”.
What you should then see is a graph which looks a bit like this;

As you can see the graph shows you the standard deviation shown by the black lines surrounding the the data at roughly 3540 and 2868 across the 30 day sample. Because of this we can clearly see that there is a “deviation from the norm” around the 8th till about the 15th of the month which was the period you then want to examine more closely.
So in summary standard deviations are very fast to set-up in Excel and very easily allow you to examine which trends you should be concentrating on. Here I’ve explained roughly the principles wihtout going too deep into the maths behind it. Suffice to say Excel works it all out for you. All you need to do is import your data, run a line graph and add error bars to 1 standard deviation to see roughly what your benchmarks should be.





interesting
thanks