maxm consulting

PROVIDING SOLUTIONS TO BUSINESSES
FOR OVER 10 YEARS


10,000 Hits a Day and No Sale

About Access Statistics

You've heard the numbers: "Over 1,000,000 hits the first week alone!" or "10,000 hits a day!" Sure, they sound terrific, but how many real people are visiting that web site? How many sales are being made? Those numbers are something quite different. The following article discusses what access statistics are, and how they are interpreted.

Access statistics are a way to measure the traffic at a web site. When a web user requests a page to be loaded into their browser, the server writes a record of that request to a file, usually called access_log.

Why do we care?

If we overestimate the traffic to our web sites, we build up an unrealistic expectation. If I had a retail site that was reporting 10,000 hits a day but I was only selling 20 products a day, I might be pretty disappointed in my results. If these hits included graphics (say I had 9 images on each page, and the site was 10 pages deep), I would need to refine the statistics: each page load would count as 10 hits (the page plus its 9 images), so a single visitor who went to every page on the site would generate 100 hits (10 hits/page * 10 pages). Divide 10,000 by 100, and you get 100 visitors a day. If I sell 20 products per 100 visitors, I'm selling one product to every 5th person. Not bad, and much better than 1 product per 500 hits (10,000/20). If the statistics also included page reloads, the real number of visitors would be lower still and the sales ratio even better; the new objective would then be to increase the number of visitors to the site.
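
The arithmetic above can be checked with a few lines of Python. This is only a sketch of the worked example: the variable names are illustrative, and it assumes, as the example does, that every visitor views all 10 pages exactly once.

```python
# Figures taken from the retail-site example above.
hits_per_day = 10_000
sales_per_day = 20
images_per_page = 9
pages_on_site = 10

# One page load is logged as the page itself plus one hit per image.
hits_per_page = 1 + images_per_page

# Assumption from the example: each visitor views every page once.
hits_per_visitor = hits_per_page * pages_on_site

visitors_per_day = hits_per_day // hits_per_visitor
visitors_per_sale = visitors_per_day // sales_per_day
hits_per_sale = hits_per_day // sales_per_day

print("hits per visitor: ", hits_per_visitor)
print("visitors per day: ", visitors_per_day)
print("visitors per sale:", visitors_per_sale)
print("hits per sale:    ", hits_per_sale)
```

Measured in hits, the site makes one sale per 500 hits; measured in visitors, it makes one sale per 5 visitors.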

Let's look at an access log.

The access log file looks something like this:

cfa8.bc.edu - - [31/Mar/1996:23:49:37 -0500] "GET /images/capelin2.gif HTTP/1.0" 200 1302
cfa8.bc.edu - - [31/Mar/1996:23:49:37 -0500] "GET /images/smcape.gif HTTP/1.0" 200 888
cfa8.bc.edu - - [31/Mar/1996:23:49:38 -0500] "GET /images/citynet.gif HTTP/1.0" 200 776
cfa8.bc.edu - - [31/Mar/1996:23:50:00 -0500] "GET /images/capelin2.gif HTTP/1.0" 200 1302
cfa8.bc.edu - - [31/Mar/1996:23:50:00 -0500] "GET /images/smcape.gif HTTP/1.0" 200 888
cfa8.bc.edu - - [31/Mar/1996:23:50:08 -0500] "GET /search-bin/aglimpse/03?query=jobs HTTP/1.0" 200 189
vtr163.ramp.together.net - - [31/Mar/1996:23:50:18 -0500] "GET /infoctrs/ccicindx.html HTTP/1.0" 304 -
vtr163.ramp.together.net - - [31/Mar/1996:23:51:31 -0500] "GET /infoctrs/ccicindx.html HTTP/1.0" 304 -
edgar.cs.washington.edu - - [31/Mar/1996:23:52:08 -0500] "GET /reo/jregan/ HTTP/1.0" 200 4640
vtr163.ramp.together.net - - [31/Mar/1996:23:52:47 -0500] "GET /infoctrs/NIIC.html HTTP/1.0" 200 12640
vtr163.ramp.together.net - - [31/Mar/1996:23:52:55 -0500] "GET /images/nantlin2.gif HTTP/1.0" 200 585
pppb6.shasta.com - - [31/Mar/1996:23:52:55 -0500] "GET /infoctrs/beaches.html HTTP/1.0" 200 8359
vtr163.ramp.together.net - - [31/Mar/1996:23:52:56 -0500] "GET /images/smnant.gif HTTP/1.0" 200 152
pppb6.shasta.com - - [31/Mar/1996:23:52:57 -0500] "GET /infoctrs/beach.gif HTTP/1.0" 200 12925
inet-gw-0.ey.ca - - [31/Mar/1996:23:53:04 -0500] "POST /cgi-bin/bbres1 HTTP/1.0" 200 182
pppb6.shasta.com - - [31/Mar/1996:23:53:08 -0500] "GET /infoctrs/cool.gif HTTP/1.0" 200 1746
ix-prv1-03.ix.netcom.com - - [31/Mar/1996:23:58:23 -0500] "GET /infoctrs/ccic/lodging.html HTTP/1.0" 200 14783
ix-prv1-03.ix.netcom.com - - [31/Mar/1996:23:58:30 -0500] "GET /images/smcape.gif HTTP/1.0" 200 888
gopher.cis.yale.edu - - [01/Apr/1996:00:04:17 -0500] "GET /infoctrs/CCIC.html HTTP/1.0" 200 9132
gopher.cis.yale.edu - - [01/Apr/1996:00:04:19 -0500] "GET /infoctrs/ccic/intro.html HTTP/1.0" 200 8525
gopher.cis.yale.edu - - [01/Apr/1996:00:04:22 -0500] "GET /infoctrs/ccicindx.html HTTP/1.0" 200 7833
gopher.cis.yale.edu - - [01/Apr/1996:00:04:22 -0500] "GET /images/capelin2.gif HTTP/1.0" 200 1302
gopher.cis.yale.edu - - [01/Apr/1996:00:04:24 -0500] "GET /infoctrs/search.gif HTTP/1.0" 200 430
gopher.cis.yale.edu - - [01/Apr/1996:00:04:24 -0500] "GET /images/smcape.gif HTTP/1.0" 200 888
gopher.cis.yale.edu - - [01/Apr/1996:00:04:43 -0500] "GET /infoctrs/ccic/wx.html HTTP/1.0" 200 1793
gopher.cis.yale.edu - - [01/Apr/1996:00:05:21 -0500] "GET /infoctrs/ccicindx.html HTTP/1.0" 304 -
gopher.cis.yale.edu - - [01/Apr/1996:00:05:28 -0500] "GET /infoctrs/ccic/search.html HTTP/1.0" 200 1574
gopher.cis.yale.edu - - [01/Apr/1996:00:05:38 -0500] "GET /infoctrs/ccic/general.html HTTP/1.0" 200 1879

Each line is formatted as follows: the host name or IP address of the requester, two fields that are blank ('-') here, the date and time, then the request in quotes (action, file, and protocol), the status, and the number of bytes sent. The section above is a 16 minute section of a log file. Now, if we were to count 'hits' to a specific page, we might filter the file so that only the lines for that page remained. Let's filter it for /infoctrs/ccicindx.html, the (former) 'index' page for the Cape Cod Information Center.

vtr163.ramp.together.net - - [31/Mar/1996:23:50:18 -0500] "GET /infoctrs/ccicindx.html HTTP/1.0" 304 -
vtr163.ramp.together.net - - [31/Mar/1996:23:51:31 -0500] "GET /infoctrs/ccicindx.html HTTP/1.0" 304 -
gopher.cis.yale.edu - - [01/Apr/1996:00:04:22 -0500] "GET /infoctrs/ccicindx.html HTTP/1.0" 200 7833
gopher.cis.yale.edu - - [01/Apr/1996:00:05:21 -0500] "GET /infoctrs/ccicindx.html HTTP/1.0" 304 -

At first count, it appears that the page was loaded 4 times. Many access statistics gatherers count each page load as a hit, but if we look closely we will discover two important things. First, there are only 2 requesting hosts: vtr163.ramp.together.net and gopher.cis.yale.edu; second, there are 2 different status codes: 304 and 200. If we look at the NCSA server documentation, we find that a status code of 200 means a 'fresh load' of the page, and a status code of 304 ('not modified') means that the page was already in the user's cache and the user is just revisiting it, so the server did not send it again. So, judging from the host names, we know that 2 individuals have visited the page, and that the page was only freshly loaded one time.
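
As a rough illustration of this kind of filtering, the sketch below parses the four log lines for /infoctrs/ccicindx.html and counts requests, distinct hosts, and status codes. The regular expression and the function names (`parse`, `summarize`) are assumptions made for this example; they are not the tool maxm consulting uses.

```python
import re
from collections import Counter

# Pattern for the log layout shown above: host, two unused fields,
# [timestamp], "ACTION file PROTOCOL", status, and bytes (or "-").
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<action>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)

SAMPLE = """\
vtr163.ramp.together.net - - [31/Mar/1996:23:50:18 -0500] "GET /infoctrs/ccicindx.html HTTP/1.0" 304 -
vtr163.ramp.together.net - - [31/Mar/1996:23:51:31 -0500] "GET /infoctrs/ccicindx.html HTTP/1.0" 304 -
gopher.cis.yale.edu - - [01/Apr/1996:00:04:22 -0500] "GET /infoctrs/ccicindx.html HTTP/1.0" 200 7833
gopher.cis.yale.edu - - [01/Apr/1996:00:05:21 -0500] "GET /infoctrs/ccicindx.html HTTP/1.0" 304 -
"""


def parse(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def summarize(lines, path):
    """Count requests, distinct hosts, and status codes for one file."""
    entries = [e for e in map(parse, lines) if e and e["path"] == path]
    return {
        "requests": len(entries),
        "hosts": sorted({e["host"] for e in entries}),
        "by_status": dict(Counter(e["status"] for e in entries)),
    }


summary = summarize(SAMPLE.splitlines(), "/infoctrs/ccicindx.html")
print(summary["requests"], "requests from", len(summary["hosts"]), "hosts")
print("status codes:", summary["by_status"])
```

On these four lines the summary shows 4 requests from 2 hosts, with a single 200 and three 304s, which matches the reading given above.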

The statistics that maxm consulting provides first weed out all of the hits that are not 'fresh loads', and then sum the number of individual IP addresses. Because IP addresses are often shared by several users, this sum is probably less than the number of individuals who visited your pages.

In addition to counting the individual file hits, many access statistics gatherers will also count the images loaded on a page. In the case of the Information Center there are 3 image files on the index page:

	/images/capelin2.gif 
	/infoctrs/search.gif
	/images/smcape.gif 

If we were to count the images as well as the page, we would count a total of 4 each time the page was loaded. In the case above, that might mean that instead of an original 4 page loads, we would now be reporting 16 hits. With only 2 actual visitors, we have effectively overestimated the number of visitors to the page by a factor of 8. If a page had a lot of little colored balls or other decorations, the overestimation could be even higher. Because images don't usually make up an entire page, maxm consulting weeds out all images from the logs.
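
One simple way to weed out images is to test the file extension of each requested path. The sketch below is an assumption about how such a filter could look, not a description of maxm consulting's actual method; the extension list and the `is_image` helper are hypothetical.

```python
import os

# Assumed list of image extensions; extend it to suit your own site.
IMAGE_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png"}


def is_image(path):
    """True if the requested file looks like an image, judged by its extension."""
    without_query = path.split("?", 1)[0]
    extension = os.path.splitext(without_query)[1].lower()
    return extension in IMAGE_EXTENSIONS


# The index page and its three images, as requested in one page load.
requested = [
    "/infoctrs/ccicindx.html",
    "/images/capelin2.gif",
    "/infoctrs/search.gif",
    "/images/smcape.gif",
]

pages = [path for path in requested if not is_image(path)]
print(len(requested), "raw hits,", len(pages), "page load")
```

Four log entries collapse to the single page load that actually matters for counting visits.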

Just as it is important to determine the number of IP addresses visiting your site, it is also important to determine which pages are visited frequently. In most cases, the home page for your site will be visited most frequently, and visits to that page are a rough estimate of the number of individuals visiting your site. If, in evaluating the statistics, you discover that a page is not being visited, you can look at the navigation mechanisms to see if there is a better way to reference that page. On the other hand, looking at the most active pages is important as well; many customers have discovered that they have a different business activity on the web than they had anticipated. In one case, an insurance salesman went on the web to sell life insurance to individuals. However, activity on his site led him to sell annuities instead. Without his access statistics he never would have anticipated that his web business focus should change.

It is also important to remember that the total of the page loads is not equal to the number of people visiting your site, unless your site is only one page. For example, if you have a 4 page site and the accesses are as follows:

 	/myhome/index.html	505
 	/myhome/page1.html	407
 	/myhome/page2.html	290
 	/myhome/page3.html	350

the total number of accesses is 1552. That does not mean that 1552 people have visited your site; probably around 505 people have visited your site (based on the number of visits to the index page). The more pages you have, the larger your overestimation could be. For example, if in a given month 33,000 pages were loaded you would be mistaken to say you had over 1,000 visitors a day to your site. If the average visitor looked at 10 pages, then in actuality you only had about 110 visitors a day to your site. A far different number.
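
The same comparison can be written out as a short calculation. This sketch uses the four-page figures above and assumes a 30-day month for the second example; the variable names are illustrative.

```python
# Page loads for the four-page example site.
page_loads = {
    "/myhome/index.html": 505,
    "/myhome/page1.html": 407,
    "/myhome/page2.html": 290,
    "/myhome/page3.html": 350,
}

total_loads = sum(page_loads.values())
# Rough visitor estimate: loads of the index page only.
estimated_visitors = page_loads["/myhome/index.html"]
pages_per_visitor = total_loads / estimated_visitors

print("total page loads:  ", total_loads)
print("estimated visitors:", estimated_visitors)
print(f"pages per visitor:  {pages_per_visitor:.2f}")

# Monthly example: 33,000 page loads, 10 pages per visitor,
# and an assumed 30-day month.
monthly_loads = 33_000
days_in_month = 30
pages_per_visit = 10
visitors_per_day = monthly_loads // days_in_month // pages_per_visit
print("visitors per day:  ", visitors_per_day)
```

Summing page loads triples the apparent audience of the four-page site, and inflates the monthly example by a factor of ten.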

Example access statistics (site fpl):

Here are the access statistics for your WWW site:

Looking for: fpl
From: 08/Apr/1996
To: 14/Apr/1996
Total accesses: 130
Size: 639823 bytes
Total Number of Sites Accessing: 44

Statistics by File:

 	/fpl/	17
 	/fpl/children.html	3
 	/fpl/clams.html	14
 	/fpl/compbuy.html	13
 	/fpl/faq.html	15
 	/fpl/ffplform.html	1
 	/fpl/friends.html	4
 	/fpl/happenings.html	11
 	/fpl/index.html	2
 	/fpl/kidbookl.html	2
 	/fpl/policy.html	4
 	/fpl/services.html	8
 	/fpl/special.html	4
 	/fpl/welcome.html	32


In the week above, 44 individual sites accessed the pages, the main page was accessed 19 times (/fpl/ and /fpl/index.html combined), and the welcome page was accessed 32 times. A total of 130 accesses were made to the site for the week. Traffic to the main, welcome, FAQ, clams, compbuy, and happenings pages is high, while the rest of the site shows lackluster performance. Each visiting site loaded approximately 3 pages (130 accesses / 44 sites).
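
The figures in this summary can be reproduced from the per-file counts. The sketch below is one way to do that; treating /fpl/ and /fpl/index.html as the same main page follows the text above, and the variable names are illustrative.

```python
# Per-file access counts from the fpl report above.
accesses = {
    "/fpl/": 17,
    "/fpl/children.html": 3,
    "/fpl/clams.html": 14,
    "/fpl/compbuy.html": 13,
    "/fpl/faq.html": 15,
    "/fpl/ffplform.html": 1,
    "/fpl/friends.html": 4,
    "/fpl/happenings.html": 11,
    "/fpl/index.html": 2,
    "/fpl/kidbookl.html": 2,
    "/fpl/policy.html": 4,
    "/fpl/services.html": 8,
    "/fpl/special.html": 4,
    "/fpl/welcome.html": 32,
}
sites_accessing = 44  # "Total Number of Sites Accessing" in the report

total_accesses = sum(accesses.values())
# /fpl/ and /fpl/index.html are two names for the same main page.
main_page = accesses["/fpl/"] + accesses["/fpl/index.html"]
pages_per_site = total_accesses / sites_accessing

# Most-visited files first.
ranked = sorted(accesses.items(), key=lambda item: item[1], reverse=True)

print("total accesses:", total_accesses)
print("main page:     ", main_page)
print(f"pages per site: {pages_per_site:.2f}")
for path, count in ranked[:5]:
    print(f"{count:3d}  {path}")
```

Ranking the files this way makes it easy to see which pages carry the traffic and which may need better navigation links.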
This page created and maintained by maxm consulting
Please send comments to: info @ maxm.net
copyright maxm consulting 1995-2003