    Ignoring clicks from IP addresses

    Author: 2007-06-24 19:59:22 From:

    My advertising system sells advertising on a CPC (cost per click) basis, not CPM (cost per thousand impressions).

    It is therefore very important that my clients' clicks aren't "used up" by malicious users clicking ads over and over, or by programs such as spiders, crawlers and robots that visit the site and click on every link in sight!

    These methods do work - here's a quote from Gavin Williams at Hexillion.com - a leading component supplier:

    "Your click-through counting has been much more accurate than other ASP sites on which we've advertised. They've overcounted by as much as 100% to 300% due to spider click-throughs. I'm impressed that your spider-filtering has kept your counts within just a few clicks of ours."

    ...and that was before I added more code to cope with some other common "non-human" clickthroughs. You'll see all of these methods in the next few pages...

    The first idea I implemented was to count only one clickthrough per IP address per day. I didn't want to use a database or file to store the IP addresses and then search for them - that's a little over the top. So I had the idea of using an Application variable, which would get emptied each day.

    The BrandNewDay ( ) function

    First I initialized some variables. This code was added to global.asa's Application_OnStart function - IIS calls this function automatically when the application starts up...

    function Application_OnStart ( )
    {
       Application.Lock ( );

       // remember today's date (day of the month)
       var d = new Date;
       Application ( 'Today' ) = d.getDate ( );

       // initialize new stuff in utils/Init.asp
       Application ( 'BrandNewDay' ) = 1;

       // a list of IP addresses that have clicked an ad
       Application ( 'ClickFromIP' ) = '';

       Application.Unlock ( );
    }

    ...then I set the BrandNewDay flag again when the date rolls over. (Incidentally, I can also reset it manually by calling the BrandNewDay.asp page, which is sometimes handy. I'll leave you to look at that page.)

    function Session_OnStart ( )
    {
       Application.Lock ( );

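       // is it a new day?
       var d = new Date;

       if ( Application ( 'Today' ) != d.getDate ( ) )
       {
          // remember the new date, so the flag is only set once per day
          Application ( 'Today' ) = d.getDate ( );
          Application ( 'BrandNewDay' ) = 1;
       }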

       Application.Unlock ( );
    }

    Now that I know when the day has changed, I modified utils/Init.asp so that a new BrandNewDay ( ) function does its work whenever this Application variable is set. It also clears the Application ( 'ClickFromIP' ) variable that I'll use in a minute:

    // ============================================
    // anything that needs doing once per day!
    // ============================================
    function BrandNewDay ( )
    {
       if ( Application ( 'BrandNewDay' ) == 1 )
       {
          Application.Lock ( );

          // clear the list of IP addresses that are ignored
          Application ( 'ClickFromIP' ) = '';

          Application ( 'BrandNewDay' ) = 0;

          Application.Unlock ( );
       }
    }
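
    The rollover logic can be tried outside IIS. The sketch below is only an illustration under a few assumptions: app is a plain object standing in for the Application object (so there is no Lock/Unlock), and the helper names noteSessionStart and runDailyTasks are made up for this example. Note that the sketch remembers the new date as soon as it sees the rollover, so the flag is raised only once per day.

```javascript
// Plain object standing in for the ASP Application object
// (an assumption for illustration; the real one needs Lock/Unlock).
var app = { Today: 0, BrandNewDay: 0, ClickFromIP: '' };

// Hypothetical stand-in for the Session_OnStart check: raise the flag
// when the day of the month differs from the one remembered.
function noteSessionStart(app, now) {
   if (app.Today != now.getDate()) {
      app.Today = now.getDate();   // remember the new date
      app.BrandNewDay = 1;         // daily work is now pending
   }
}

// Hypothetical stand-in for BrandNewDay(): do the once-a-day work.
// Returns true if the work ran, false if nothing was pending.
function runDailyTasks(app) {
   if (app.BrandNewDay == 1) {
      app.ClickFromIP = '';        // forget yesterday's IP addresses
      app.BrandNewDay = 0;
      return true;
   }
   return false;
}
```

    Calling noteSessionStart ( app, new Date ) on each new session and runDailyTasks ( app ) at the top of each page mirrors the split between global.asa and utils/Init.asp.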


    Having created a new Application variable called 'ClickFromIP' that is initialized each day, now I just have to make some use of this to store the IP addresses of people who click my ads:

    // ignore any IP addresses that have been used today
    var sIP = '>' + Request.ServerVariables ( 'REMOTE_ADDR' );
    var sClickIPs = Application ( 'ClickFromIP' );
    var bIgnoreClick = false;

    // test if IP has clicked before (the trailing '>' on both sides
    // stops e.g. 10.0.0.2 from matching 10.0.0.25)
    if ( -1 != ( sClickIPs + '>' ).indexOf ( sIP + '>' ) )
    {
       // they've clicked before, so ignore them
       bIgnoreClick = true;
    }
    else
    {
       // this IP hasn't clicked before, so add to list
       Application.Lock ( );
       Application ( 'ClickFromIP' ) = Application ( 'ClickFromIP' ) + sIP;
       Application.Unlock ( );
    }

    First I get the IP address from the ServerVariables collection. Then I get the current ClickFromIP variable, and test if the current IP appears in the string using the String.indexOf method.

    If it does, then I ignore the clickthrough (I still allow the clickthrough, I just don't charge the client for it).

    If it hasn't been used before, I concatenate the IP to the string. This is why I prefixed it with a > character (on the first line), so that each IP address will be separated by this character in the string.
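
    One thing to watch with a plain substring test is that a short address such as 10.0.0.2 is also the start of 10.0.0.25. The sketch below shows one way to guard against that while keeping the same >-separated list. The helper names hasClicked and rememberClick are made up for this example, and the list is passed in as an ordinary string rather than read from the Application object.

```javascript
// Returns true if ip is already in a '>'-separated list such as
// '>10.0.0.1>10.0.0.25'. A trailing '>' is added to both the list and
// the search text so that only whole addresses can match.
function hasClicked(sClickIPs, ip) {
   return -1 != (sClickIPs + '>').indexOf('>' + ip + '>');
}

// Returns the list with ip appended, or unchanged if already present.
function rememberClick(sClickIPs, ip) {
   return hasClicked(sClickIPs, ip) ? sClickIPs : sClickIPs + '>' + ip;
}

// Usage: build up a list and query it
var sList = '';
sList = rememberClick(sList, '10.0.0.25');
var bShort = hasClicked(sList, '10.0.0.2');    // a different address
var bExact = hasClicked(sList, '10.0.0.25');   // the address that was stored
```

    Here bShort is false and bExact is true, whereas a bare sList.indexOf ( '>10.0.0.2' ) would have matched the longer address.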

    The next idea was to ignore clickthroughs from all user agents (browsers) except IE, Netscape and Opera. Again, I let them click through, I just don't make the advertiser pay for them.

    The code below shows how this was done:

    // ignore any user agent that isn't Mozilla (IE and Netscape) or Opera
    var sAgent = '' + Request.ServerVariables ( 'HTTP_USER_AGENT' );

    // make lowercase
    sAgent = sAgent.toLowerCase ( );

    // should we ignore this agent?
    if ( -1 == sAgent.indexOf ( 'mozilla' ) && -1 == sAgent.indexOf ( 'opera' ) )
    {
       // it's an unknown user-agent
       bIgnoreClick = true;
    }

    It was suggested that I also ignore known IP addresses used by spiders. I found an excellent source at Search Engine World that documents these.

    After investigation, however, I noticed that all major spiders except one, Lycos, used distinctive user agents. They would all get caught by the user agent check above!

    Lycos was an exception - their spider sometimes masqueraded as IE 5.0. Luckily, they also have "Lycos_Spider" in the agent string, so I modified the test above to cope with that too:

    // should we count this agent?
    var bKnownBrowser = ( -1 != sAgent.indexOf ( 'mozilla' ) || -1 != sAgent.indexOf ( 'opera' ) );

    if ( -1 != sAgent.indexOf ( 'spider' ) )      // lycos spider acts like mozilla
       bKnownBrowser = false;

    if ( !bKnownBrowser )
    {
       // it's an unknown user-agent, so ignore them
       bIgnoreClick = true;
    }
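
    The same test can be wrapped in a function and tried against a few agent strings. This is a sketch only: the function name isKnownBrowser is made up for this example, and the sample strings are illustrative (the Lycos one in particular is invented to show the masquerading case, not copied from a real log).

```javascript
// Returns true if the user agent looks like IE/Netscape ('mozilla')
// or Opera, and does not also identify itself as a spider.
function isKnownBrowser(userAgent) {
   var sAgent = ('' + userAgent).toLowerCase();

   var bKnownBrowser = (-1 != sAgent.indexOf('mozilla') ||
                        -1 != sAgent.indexOf('opera'));

   // a spider acting like mozilla still counts as unknown
   if (-1 != sAgent.indexOf('spider'))
      bKnownBrowser = false;

   return bKnownBrowser;
}

// Illustrative agent strings (assumptions, not taken from the article)
var bIE     = isKnownBrowser('Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)');
var bOpera  = isKnownBrowser('Opera/9.21 (Windows NT 5.1; U; en)');
var bBot    = isKnownBrowser('Googlebot/2.1 (+http://www.googlebot.com/bot.html)');
var bMasked = isKnownBrowser('Mozilla/4.0 (compatible; MSIE 5.0; Lycos_Spider)');
```

    The first two come back true, the last two false, so only the browser clicks would be charged.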

    Ignoring HTTP commands

    Another problem that frequently skews the statistics is HTTP HEAD requests. Obviously these aren't made by humans, so I wanted to ignore them too, and only count the GET requests that all browsers make to ask the server for data.

    The code below shows how this was done:

    // only count GET requests - HEAD (or anything else) isn't a person clicking!
    if ( 'GET' != '' + Request.ServerVariables ( 'REQUEST_METHOD' ) )
    {
       // not a human, so ignore them
       bIgnoreClick = true;
    }
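
    Putting the three checks together, the decision can be written as one function. This is a minimal sketch under stated assumptions, not the code used on the site: the name checkClick and its return shape are invented for this example, and the request values are passed in as plain strings instead of being read from Request.ServerVariables and the Application object.

```javascript
// Hypothetical helper combining the IP, user-agent and HTTP-method
// checks. Returns { bIgnoreClick, sClickIPs }, where sClickIPs is the
// (possibly extended) '>'-separated list of IPs seen today.
function checkClick(sMethod, sUserAgent, sIP, sClickIPs) {
   var bIgnoreClick = false;

   // 1. one counted click per IP per day; as in the article's code,
   //    a new IP is recorded even if a later check rejects the click
   if (-1 != (sClickIPs + '>').indexOf('>' + sIP + '>'))
      bIgnoreClick = true;
   else
      sClickIPs = sClickIPs + '>' + sIP;

   // 2. only IE/Netscape ('mozilla') and Opera, and never a spider
   var sAgent = ('' + sUserAgent).toLowerCase();
   var bKnownBrowser = (-1 != sAgent.indexOf('mozilla') ||
                        -1 != sAgent.indexOf('opera'));
   if (-1 != sAgent.indexOf('spider'))
      bKnownBrowser = false;
   if (!bKnownBrowser)
      bIgnoreClick = true;

   // 3. only GET requests
   if ('GET' != '' + sMethod)
      bIgnoreClick = true;

   return { bIgnoreClick: bIgnoreClick, sClickIPs: sClickIPs };
}

// Usage with an illustrative IE 5.0 agent string
var sIE = 'Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)';
var first  = checkClick('GET', sIE, '10.0.0.25', '');
var second = checkClick('GET', sIE, '10.0.0.25', first.sClickIPs);
```

    The first click is counted and the repeat from the same IP is not. In every case the visitor is still redirected to the advertiser; bIgnoreClick only decides whether the click is charged.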
