It is therefore very important that my clients' clicks aren't being "used up" by malicious users clicking ads over and over, or by programs such as spiders, crawlers and robots that visit the site and click on every link in sight!
These methods do work. Here's a quote from Gavin Williams of Hexillion.com, a leading component supplier:
| "Your click-through counting has been much more accurate than other ASP sites on which we've advertised. They've overcounted by as much as 100% to 300% due to spider click-throughs. I'm impressed that your spider-filtering has kept your counts within just a few clicks of ours." |
...and that was before I added more code to cope with some other common "non-human" clickthroughs. You'll see all of these methods in the next few pages...
The first idea I implemented was to count only one clickthrough per IP address per day. I didn't want to store the IP addresses in a database or file and then search them; that's a little over the top. Instead, I decided to use an Application variable that gets emptied each day.
The BrandNewDay ( ) function
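First I initialized some Application variables. This code was added to the Application_OnStart function in global.asa, which IIS calls automatically when the application starts, that is, on the first request for one of its pages after the server (or the application) has started: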
```javascript
function Application_OnStart ( )
{
    Application.Lock ( );
    // remember today's date (day of the month)
    var d = new Date;
    Application ( 'Today' ) = d.getDate ( );
    // initialize new stuff in utils/Init.asp
    Application ( 'BrandNewDay' ) = 1;
    // a list of IP addresses that have clicked an ad
    Application ( 'ClickFromIP' ) = '';
    Application.Unlock ( );
}
```
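...then I set BrandNewDay again in Session_OnStart whenever the date rolls over. Session_OnStart runs each time a new visitor session begins, so the check happens when the first session of each new day starts (incidentally, I can also reset it manually by calling the BrandNewDay.asp page, which is sometimes handy; I'll leave you to look at that page):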
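```javascript
function Session_OnStart ( )
{
    Application.Lock ( );
    // is it a new day?
    var d = new Date;
    if ( Application ( 'Today' ) != d.getDate ( ) )
    {
        // remember the new date; otherwise every later session today
        // would flag a new day again and empty the IP list
        Application ( 'Today' ) = d.getDate ( );
        Application ( 'BrandNewDay' ) = 1;
    }
    Application.Unlock ( );
}
```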
Now that I know when the day has changed, I modified utils/Init.asp so that a new BrandNewDay ( ) function does its once-a-day work whenever this Application variable is set. Part of that work is clearing the Application ( 'ClickFromIP' ) variable that I'll use in a minute:
```javascript
// ============================================
// anything that needs doing once per day!
// ============================================
function BrandNewDay ( )
{
    if ( Application ( 'BrandNewDay' ) == 1 )
    {
        Application.Lock ( );
        // clear the list of IP addresses that have already clicked today
        Application ( 'ClickFromIP' ) = '';
        Application ( 'BrandNewDay' ) = 0;
        Application.Unlock ( );
    }
}
```
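To see how the three pieces cooperate, here is a minimal plain-JavaScript simulation of the daily reset. It is not ASP code: `app` is a hypothetical stand-in for the Application object, and the current time is passed in so a rollover can be exercised without waiting for midnight. Note that the sketch records the new date when it raises the flag; if `Today` were left unchanged, every later session on the new day would raise the flag again and empty the list.

```javascript
// Plain-JavaScript simulation of the daily reset (not ASP code).
// `app` is a hypothetical stand-in for the ASP Application object, and the
// current time is passed in so a rollover can be tested without waiting.

// mirrors Application_OnStart: remember the day and raise the flag
function createApp(now) {
  return { Today: now.getDate(), BrandNewDay: 1, ClickFromIP: '' };
}

// mirrors Session_OnStart: raise the flag once when the day changes
function onSessionStart(app, now) {
  if (app.Today !== now.getDate()) {
    app.Today = now.getDate(); // record the new day so the flag isn't raised again
    app.BrandNewDay = 1;
  }
}

// mirrors BrandNewDay() in utils/Init.asp: once-a-day work, then lower the flag
function brandNewDay(app) {
  if (app.BrandNewDay === 1) {
    app.ClickFromIP = '';
    app.BrandNewDay = 0;
  }
}

// usage: one click late on the 5th, then sessions on the 6th
const app = createApp(new Date(2001, 0, 5, 23, 59));
brandNewDay(app);
app.ClickFromIP += '>10.0.0.1';
onSessionStart(app, new Date(2001, 0, 6, 0, 1));
brandNewDay(app);                 // new day: list emptied
app.ClickFromIP += '>10.0.0.2';
onSessionStart(app, new Date(2001, 0, 6, 9, 30));
brandNewDay(app);                 // same day: list kept
console.log(app.ClickFromIP);     // ">10.0.0.2"
```

Because the flag is only raised by a date change, the list is emptied exactly once per day, no matter how many sessions start afterwards.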
Having created an Application variable called 'ClickFromIP' that is cleared each day, I now just have to use it to store the IP addresses of people who click my ads:
```javascript
// ignore any IP addresses that have been used today
var sIP = '>' + Request.ServerVariables ( 'REMOTE_ADDR' );
var bIgnoreClick = false;

// lock first, so two simultaneous clicks from one IP can't both be counted
Application.Lock ( );
var sClickIPs = '' + Application ( 'ClickFromIP' );
// test if this IP has clicked before; appending '>' to both strings
// stops 10.0.0.1 from matching inside 10.0.0.12
if ( -1 != ( sClickIPs + '>' ).indexOf ( sIP + '>' ) )
{
    // they've clicked before, so ignore them
    bIgnoreClick = true;
}
else
{
    // this IP hasn't clicked before, so add it to the list
    Application ( 'ClickFromIP' ) = sClickIPs + sIP;
}
Application.Unlock ( );
```
First I get the IP address from the ServerVariables collection. Then I get the current ClickFromIP variable, and test if the current IP appears in the string using the String.indexOf method.
If it does, I ignore the clickthrough (I still let the clickthrough happen; I just don't charge the client for it).
If it hasn't clicked before, I append the IP address to the string. That's why I prefixed it with a > character (on the first line of the code): each IP address in the string is separated from the next by this character.
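Because each address is stored as '>' followed by the address, a plain indexOf test for '>' + address can also hit an address that merely begins with it: 10.0.0.1 is "found" inside >10.0.0.12. The sketch below, with hypothetical helper names, shows that false positive and one way to avoid it, by appending a separator to both the list and the search string so a match must end at a separator.

```javascript
// Hypothetical helpers for a '>'-separated IP list like the one above.
function addIp(list, ip) {
  return list + '>' + ip; // each entry is prefixed with the separator
}

// plain substring test: also matches an address that merely starts with `ip`
function naiveContains(list, ip) {
  return list.indexOf('>' + ip) !== -1;
}

// exact test: a match must also be followed by a separator (or the end of
// the list, which the appended '>' turns into a separator)
function exactContains(list, ip) {
  return (list + '>').indexOf('>' + ip + '>') !== -1;
}

let clicked = '';
clicked = addIp(clicked, '10.0.0.12');
clicked = addIp(clicked, '192.168.1.7');
console.log(naiveContains(clicked, '10.0.0.1'));  // true (false positive)
console.log(exactContains(clicked, '10.0.0.1'));  // false
console.log(exactContains(clicked, '10.0.0.12')); // true
```

This keeps the storage format unchanged; only the search string changes.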
The next idea was to ignore clickthroughs from all user agents (browsers) except IE, Netscape and Opera. Again, I still let them click through; I just don't make the advertiser pay for them.
The code below shows how this was done:
```javascript
// ignore any user agent that isn't Mozilla (IE and Netscape) or Opera
var sAgent = '' + Request.ServerVariables ( 'HTTP_USER_AGENT' );
// make lowercase
sAgent = sAgent.toLowerCase ( );
// should we count this agent? only if it mentions mozilla or opera
if ( -1 == sAgent.indexOf ( 'mozilla' ) && -1 == sAgent.indexOf ( 'opera' ) )
{
    // it's an unknown user-agent, so ignore them
    bIgnoreClick = true;
}
```
It was suggested that I also ignore known IP addresses used by spiders. I found an excellent source at Search Engine World that documents these.
After some investigation, however, I noticed that all the major spiders except one, Lycos, used distinctive user agents that mention neither Mozilla nor Opera, so they would all be caught by the user-agent check above!
Lycos was the exception: their spider sometimes masqueraded as IE 5.0. Luckily, it also has "Lycos_Spider" in the agent string, so I modified the test above to cope with that too:
```javascript
// should we count this agent?
var bKnownBrowser = ( -1 != sAgent.indexOf ( 'mozilla' ) ||
                      -1 != sAgent.indexOf ( 'opera' ) );
// the lycos spider acts like mozilla
if ( -1 != sAgent.indexOf ( 'spider' ) )
    bKnownBrowser = false;
if ( !bKnownBrowser )
{
    // it's an unknown user-agent, so ignore them
    bIgnoreClick = true;
}
```
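The user-agent rule is easy to test in isolation. Here is a minimal stand-alone version as a pure function; the function name and the sample agent strings are made up for illustration, and only the substring tests ('mozilla', 'opera', 'spider') come from the code above.

```javascript
// Hypothetical stand-alone version of the user-agent rule described above.
// Count agents that mention Mozilla or Opera, unless they also mention
// "spider" (the Lycos spider can look like IE but includes "Lycos_Spider").
function isCountableAgent(userAgent) {
  const agent = String(userAgent || '').toLowerCase();
  const knownBrowser = agent.indexOf('mozilla') !== -1 || agent.indexOf('opera') !== -1;
  const spider = agent.indexOf('spider') !== -1;
  return knownBrowser && !spider;
}

// made-up sample agent strings, for illustration only
const samples = [
  'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)',
  'Opera/6.0 (Windows 2000; U) [en]',
  'ExampleBot/1.0',
  'Mozilla/4.0 (compatible; MSIE 5.0; Lycos_Spider)',
];
for (const s of samples) {
  console.log(isCountableAgent(s), s);
}
```

Lowercasing once up front means "Lycos_Spider", "SPIDER" and "spider" are all treated the same.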
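Ignoring HTTP HEAD requests
Another problem that frequently skews the statistics is HTTP HEAD requests, which ask the server only for a page's headers, not the page itself. Link checkers and other programs send them, whereas a person following a link in a browser sends a GET request. So I wanted to ignore HEAD requests too, and count only the GET requests that browsers make to ask the server for data.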
The code below shows how this was done:
```javascript
// ignore anything that isn't a GET, such as HTTP HEAD requests: they're not from a browser!
if ( 'GET' != '' + Request.ServerVariables ( 'REQUEST_METHOD' ) )
{
    // not a human, so ignore them
    bIgnoreClick = true;
}
```
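Putting the three filters together, here is a minimal sketch of the whole charging decision as one function. Everything in it is an assumption for illustration: `req` is a plain object standing in for the Request.ServerVariables values, and `chargedToday` is a JavaScript Set, swapped in for the '>'-separated string kept in an Application variable.

```javascript
// Hypothetical combination of the three filters into one decision.
// `req` is a plain object { method, userAgent, ip } standing in for the
// ASP Request.ServerVariables values; `chargedToday` is a Set of addresses
// already charged today (a swapped-in structure: the article keeps a
// '>'-separated string in an Application variable instead).
function shouldCharge(req, chargedToday) {
  // only GET requests come from someone following the link in a browser
  if (req.method !== 'GET') return false;

  // only Mozilla-compatible browsers and Opera, and never a "spider"
  const agent = String(req.userAgent || '').toLowerCase();
  const knownBrowser = agent.indexOf('mozilla') !== -1 || agent.indexOf('opera') !== -1;
  if (!knownBrowser || agent.indexOf('spider') !== -1) return false;

  // one charged click per IP address per day
  if (chargedToday.has(req.ip)) return false;
  chargedToday.add(req.ip);
  return true;
}

const charged = new Set();
const ie = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)';
console.log(shouldCharge({ method: 'HEAD', userAgent: ie, ip: '10.0.0.5' }, charged)); // false
console.log(shouldCharge({ method: 'GET', userAgent: ie, ip: '10.0.0.5' }, charged));  // true
console.log(shouldCharge({ method: 'GET', userAgent: ie, ip: '10.0.0.5' }, charged));  // false
```

One design difference to be aware of: as the snippets are presented, the IP check records the address before the user-agent and method checks run, whereas this sketch records an address only when a click is actually charged, so a HEAD request or spider visit doesn't use up that address's one counted click for the day.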
