Things That Throw Web Stats - Part 1: The Internet
Web analytics is growing more sophisticated. We’re developing methods for understanding customers, predicting trends, and assessing ROI. Every month analytics gurus amaze you with the latest revelations to sharpen your focus and tune your spend. What no one is telling you is that all these systems and their figures are built on inaccurate counts. The god of web analytics has feet of clay - 100% accuracy is impossible.
Web analysis is based on counting a very limited number of things. People visit web sites and read pages. Therefore we can count people, visits, and page views. That’s all. Financial details are linked to these things, not inherent within them. If I buy PPC from Google, Google is charging me for visits it sends. In other words, it’s just counting visits. If all we can measure is people, visits, and page views, it’s important to understand how accurate we can be about them. The bad news is, we can’t assess any of these with perfect accuracy.
This article is the first of a two-part series exploring the errors in all web analytics. In this issue I’ll discuss the unavoidable inaccuracies which are caused by the nature of internet technology itself. In the following article I’ll discuss problems which result from user behavior and the current state of web analytics software.
We Can’t Count Visitors
It’s not possible to count people on the web. They don’t exist. People don’t visit web sites. Their computers do. The exact process is that a browser requests that a copy of a page be sent to it from a server. The browser reads that page and uses it to display something on screen. People aren’t even reading your site’s pages. They’re reading what their browser did to copies of those pages. Ask any designer how consistent that process is, then duck.
What few standards there are for web metrics have been laid down by JICWEBS. This is an international body composed of the audit standards bureaus for most countries, including the USA and all European countries. The JICWEBS standard for identifying a unique visitor is that it is the combination of the User Agent and IP address. The User Agent identifies the browser and operating system. For example, mine is “Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322).” MSIE 6.0 tells me it’s Internet Explorer 6.0. Windows NT 5.1 tells me I’m running Windows XP. Many people are running IE 6 on XP, so that information alone is not enough to uniquely identify me. I also have an IP address, my internet address. By combining the full User Agent with the IP address, in theory, I am uniquely identified.
This is far from accurate. Every single person inside the Ford corporation has the same IP address. They all go onto the web from the same gateway in Chicago (even the 88,000 in Europe). Corporations hide internal IP addresses for valid security reasons. Most people in Ford have the same browser and operating system (what Ford call the Global Client). Thus, according to the official standards, more than 320,000 people are the same unique visitor. This will hold true for any corporation with shared internet access and a common standard for their workstations.
On the reverse side, most home users or small businesses will be given a different IP address every time they connect to the web. This means they’ll look like a different person every visit. This is OK for unique visitors, but means you’re under-counting repeat visitors.
What all of this means is that you’re probably only getting about 90% accuracy with identification of unique visitors.
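To make both failure modes concrete, here is a minimal Python sketch of counting unique visitors by the User Agent plus IP address combination described above. The Request record, the count_unique_visitors function, and the IP addresses (taken from ranges reserved for documentation) are all hypothetical, invented for illustration; real analytics packages will differ in detail.

```python
from collections import namedtuple

# Hypothetical request record. "person" is the ground truth that the
# analytics software never gets to see.
Request = namedtuple("Request", ["person", "ip", "user_agent"])

def count_unique_visitors(requests):
    """Count distinct (User Agent, IP address) pairs."""
    return len({(r.user_agent, r.ip) for r in requests})

UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"

# Three employees behind one corporate gateway, all on the same
# standard workstation build: same IP address, same User Agent.
corporate = [Request(f"employee-{i}", "192.0.2.10", UA) for i in range(3)]

# One home user who is given a new IP address on each of three connections.
home = [Request("home-user", f"198.51.100.{i}", UA) for i in range(1, 4)]

corporate_count = count_unique_visitors(corporate)  # three people counted as one
home_count = count_unique_visitors(home)            # one person counted as three
print(corporate_count, home_count)
```

Three real people collapse into one "unique visitor" in the corporate case, while one real person becomes three in the home case. Neither error is visible from the server side.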
Monthly stats can be even more misleading on occasion. JICWEBS sets the standard for calculating unique visitors per month when producing audits. The official method for counting unique visitors per month is to calculate how many you got in a single day, then multiply that by 31. Thus if 100,000 unique visitors came to my news site in a day, I have 3.1 million unique visitors per month. So be aware that “unique visitors per month” may not be counting how many different people actually visited the site that month. Or it may. It depends.
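The gap between the two meanings of "unique visitors per month" can be sketched in a few lines of Python. The traffic here is invented purely for illustration: a hypothetical site with 100 loyal readers who return every day plus 10 new people each day. The function names are mine, not part of any standard.

```python
def monthly_uniques_by_extrapolation(daily_uniques, days=31):
    """The multiply-one-day-by-31 method described above."""
    return daily_uniques * days

def monthly_uniques_by_counting(daily_visitor_sets):
    """Count how many different visitors appeared across all the days."""
    seen = set()
    for day in daily_visitor_sets:
        seen |= day
    return len(seen)

# Invented traffic: the same 100 loyal readers every day, plus 10 newcomers.
loyal = {f"loyal-{i}" for i in range(100)}
days = [loyal | {f"day{d}-new-{i}" for i in range(10)} for d in range(31)]

extrapolated = monthly_uniques_by_extrapolation(len(days[0]))
counted = monthly_uniques_by_counting(days)
print(extrapolated, counted)
```

The more loyal the audience, the further the multiplied figure drifts from the number of different people who actually came.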
We Can’t Count Duration
Think about what happens when someone is reading your site. They ask for a web page. Then later they ask for another one. The time taken between the two requests is deemed to be spent reading the first page. Add all these durations up and you’ve got the total time of the visit.
This creates a problem for 1-page visits. Since there is no second page, we can’t calculate a page duration. Officially a 1-page visit is not a visit; it has to be two pages to count as a visit. Some packages won’t count the zero duration 1-page visits when they determine average visit duration, but you’d be surprised how many do. If you are using one which does, you think people spend half as long on your site as they really do.
Now think about what happens when someone reads the last page. There is no duration we can calculate for this page. What this means is that all web analytics packages are under-reporting the time people spend on your site. They have to because they can’t tell how long someone spends on the last page – it never gets calculated.
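The duration arithmetic above can be sketched as follows. This is an illustration of the general approach, not the method of any particular package; the function names and the sample timestamps are assumptions made up for the example.

```python
from datetime import datetime, timedelta

def page_durations(timestamps):
    """Time attributed to each page: the gap until the next request.
    The last page has no following request, so it is given zero."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return gaps + [timedelta(0)]

def visit_duration(timestamps):
    """Total visit time is the sum of the page durations."""
    return sum(page_durations(timestamps), timedelta(0))

def average_duration(visits, include_single_page=True):
    """Average visit duration, optionally ignoring 1-page visits."""
    counted = [v for v in visits if include_single_page or len(v) > 1]
    if not counted:
        return timedelta(0)
    total = sum((visit_duration(v) for v in counted), timedelta(0))
    return total / len(counted)

start = datetime(2024, 1, 1, 12, 0)  # arbitrary example date
three_pages = [start, start + timedelta(minutes=2), start + timedelta(minutes=5)]
one_page = [start]

# However long the reader stayed on the final page, it is never measured.
measured = visit_duration(three_pages)
with_single = average_duration([three_pages, one_page])
without_single = average_duration([three_pages, one_page], include_single_page=False)
print(measured, with_single, without_single)
```

In this toy data set, including the zero-duration 1-page visit halves the reported average, and in both cases the time spent on the last page is simply missing.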
We Can’t Count Visits
The JICWEBS definition for a visit is that it is a series of page requests with a gap of no more than 30 minutes between each one. If someone asks for a page 31 minutes after the preceding page, it must be counted as a new visit. You’d be surprised how often this happens with complex products like mortgages, insurance, and other financial products. Generally, the more detailed the page, the more commitment required to buy, the more chance you’ll get the occasional page view which exceeds 30 minutes.
On the other hand, what if someone views your site, goes off and compares it with a competitor, then returns after 20 minutes? That still counts as part of the same visit. Technically it constitutes a single visit of two sessions, but no one follows the differentiation of sessions and visits as the standard allows.
What both of these cases illustrate is that our counting of visits is based on an arbitrary selection of 30 minutes as the magic number. For most purposes this is fine, so long as you accept it is our best attempt at a workable number, not an accurate measurement of reality.
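The 30-minute rule is easy to express in code, which also makes its arbitrariness easy to see. The sketch below is a minimal illustration assuming a list of request timestamps for one visitor; split_into_visits and its max_gap parameter are hypothetical names of my own.

```python
from datetime import datetime, timedelta

def split_into_visits(timestamps, max_gap=timedelta(minutes=30)):
    """Group page requests into visits. A request arriving more than
    max_gap after the previous one starts a new visit."""
    visits = []
    for ts in sorted(timestamps):
        if visits and ts - visits[-1][-1] <= max_gap:
            visits[-1].append(ts)
        else:
            visits.append([ts])
    return visits

start = datetime(2024, 1, 1, 12, 0)  # arbitrary example date

# A reader who studies a detailed mortgage page for 31 minutes
# before clicking through is counted as two visits.
slow_reader = split_into_visits([start, start + timedelta(minutes=31)])

# A reader who leaves to check a competitor and returns after
# 20 minutes is counted as one visit.
comparison_shopper = split_into_visits([start, start + timedelta(minutes=20)])

print(len(slow_reader), len(comparison_shopper))
```

Changing max_gap changes the visit count for exactly the same traffic, which is why the number is a convention and not a measurement.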
Web analysis is statistics, not accounting. Absolute precision is impossible. The problems listed above are an inevitable consequence of the nature of internet technology, not because we don’t care or because analytics software is shoddy.
This inaccuracy is OK so long as you don’t get too excited about the fine detail. Statistics is fuzzy around the edges, so you shouldn’t make decisions based on small differences. Understand that your visitor stats are accurate plus or minus 5% or even 10%. Recognize that people are spending a little longer on your site than you can ever know, or maybe a little less. It depends on what you’re looking at. Add a margin of error to financial and ROI calculations. In statistical analysis there is the concept of “degrees of certainty,” what we ordinary folks call “margin of error.” It is possible to calculate this with slightly more precision than guesswork. If you want to get into extreme details with your analysis, you need to start incorporating concepts like this into your numbers.
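As a rough illustration of why small differences should not drive decisions, the Python sketch below attaches a flat plus-or-minus 10% band to reported visitor counts and checks whether two months' bands overlap. This is deliberately crude: it is not a proper statistical confidence interval, and the visitor figures and function names are invented for the example.

```python
def with_margin(value, margin=0.10):
    """Return the (low, high) range implied by a flat +/- margin."""
    return value * (1 - margin), value * (1 + margin)

def ranges_overlap(a, b):
    """True if two (low, high) ranges share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

last_month = with_margin(10_000)  # invented reported figure
small_rise = with_margin(10_400)  # a 4% rise on paper
large_rise = with_margin(13_000)  # a 30% rise on paper

# A 4% rise sits well inside the noise; a 30% rise does not.
small_is_noise = ranges_overlap(last_month, small_rise)
large_is_noise = ranges_overlap(last_month, large_rise)
print(small_is_noise, large_is_noise)
```

With a 10% band, the 4% rise cannot be told apart from no change at all, while the 30% rise stands clear of the noise.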
If you design your processes accordingly, the exact numbers shouldn’t matter too much. You are where you are today. You want to improve on this. The key to success is to concentrate on trends over time, not individual numbers.
Talk to me if you want to discuss this, or any other issue.