Lies, damned lies, and statistics
Everywhere you go on the web you see visitor statistics touted by site owners. Have you ever stopped to consider how misleading those statistics can be? If you go somewhere and see a counter of some sort stating that they've had 1,500,000 visitors, the average non-techie person is going to think: wow, this is a popular site, it's had over a million people visit it. Unfortunately there is a lot wrong with that, not least the rapidly increasing number of bots that scour websites on a constant basis, and the very misleading eyeball figures such a counter presents to potential advertisers.
Currently on this site there are approximately 10 bot hits to every real hit. I have most of those hits filtered out of the results, but that takes a lot of work. I have plenty of /24 ranges (the old Class C size, 256 IP addresses in a contiguous block) and lots of referrers, user agents, etc. all being ignored by my main stats program, Firestats. I doubt very much that most hosts even make the attempt, or know that it's needed.
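To show what ignoring a /24 amounts to, here is a minimal Python sketch. It is not how Firestats does it internally; the names IGNORED_NETWORKS, is_ignored and filter_hits are made up for illustration, and the addresses are reserved documentation ranges, not real bot networks.

```python
import ipaddress

# Hypothetical ignore list. These are reserved documentation ranges,
# standing in for whatever bot networks you have collected yourself.
IGNORED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_ignored(ip):
    """Return True if the address falls inside any ignored block."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in IGNORED_NETWORKS)


def filter_hits(hit_ips):
    """Keep only the hits whose source address is not in an ignored block."""
    return [ip for ip in hit_ips if not is_ignored(ip)]


if __name__ == "__main__":
    hits = ["192.0.2.77", "203.0.113.5", "198.51.100.1"]
    print(filter_hits(hits))
```

Each /24 entry covers all 256 addresses in the block, so one line in the list deals with a whole bot farm instead of chasing its addresses one at a time.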
One problem is that many bots don't exactly advertise themselves as such. They don't put any identifying information in the user agent string, so one has to analyze their behavior to catch them (such as going to numerous pages on the site and spending exactly the same amount of time on each page before proceeding to the next one, dropping off for a bit, then coming back and doing it again on another 10 posts or so). Ones that do that find their address ranges in my iptables deny list pretty quickly, along with a sharp rebuke sent to their hosting provider. That's very bad manners.
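The "exactly the same amount of time on each page" tell can be checked mechanically. What follows is only a rough Python sketch of the idea, not the method I actually use: the function name looks_automated and both thresholds are assumptions, and it expects the request times (in seconds) for a single burst of page views from one address.

```python
from statistics import pstdev


def looks_automated(timestamps, min_requests=5, max_jitter_seconds=0.5):
    """Flag a burst of requests whose gaps are nearly identical.

    timestamps: request times in seconds for one visitor, one burst.
    The thresholds are illustrative guesses, not tuned values.
    """
    if len(timestamps) < min_requests:
        return False  # too few requests to judge either way
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    # A human's reading time varies a lot from page to page;
    # a scripted crawler's often does not.
    return pstdev(gaps) <= max_jitter_seconds


if __name__ == "__main__":
    print(looks_automated([0, 10, 20, 30, 40, 50]))   # evenly spaced requests
    print(looks_automated([0, 4, 31, 45, 120, 122]))  # irregular, human-like
```

A bot that comes back later for another 10 posts would need each visit checked separately, since the long pause between visits would otherwise swamp the even spacing within them.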
Bots suck up bandwidth and, more importantly, they eat up RAM/CPU resources. Were I actually paying for this server, bots would be costing me serious money as I would find myself having to add more and more resources to the server so that legitimate users could visit here and not be faced with a long load time.
Yesterday I added a different stats program, Visitor Maps, to the blog. It does a different job than Firestats, and between them my stats will be much more reliable, if not as impressive as some others are. Here is an example of what it shows me:

It's good at detecting most bots, but it still missed a number of MSN bots. I got the IP addresses for those bots, so I can expand the range of IPs to overlook (MSN, Yahoo and Google are all welcome to index my site as often as they want; others, not so often). Nearly all of the hits in that 35 minute span of time were bots. Pretty eye opening, isn't it?







