    Using PHP classes to navigate distributed whois databases

    Author: 2007-08-13 10:07:48 From:

    When we think of conducting whois lookups we're probably used to that meaning one thing: querying the Internic database, usually for a domain record in .com/.net/.org or .edu.

    We also knew that there were other whois servers out there, one for .mil and .int for example, not to mention the ones for the myriad ccTLDs (country code TLDs) in existence, and we could look up domains in those TLDs by asking the appropriate whois server about them.

    Things changed radically this year with the introduction of competition at the registrar level in the .com/.net/.org namespace and, with it, the advent of the SRS (Shared Registry System). ICANN set up shop to succeed the IANA and began accrediting other registrars, while Network Solutions split into separate business units to handle their registrar operation (which now competes with all the others, albeit with a nice head start) and the registry, which is still, for all intents and purposes, a monopoly.

    After a period of holding out and refusing to sign the ICANN registrar accreditation agreement, Network Solutions finally did so and, as per the terms of the deal, began redirecting port 43 whois service from rs.internic.net over to whois.nsiregistry.net on December 1st, 1999.

    What all this means is that on December 1st, running "whois devshed.com" stopped looking like this:

    InfoWest Global Internet Services, Inc. (DEVSHED-DOM)
       1845 W. Sunset Blvd.
       St. George, UT 84770
       US

       Domain Name: DEVSHED.COM

       Administrative Contact:
          Cosby, David R (DRC4) dcosby@INFOWEST.COM
          801.674.0165
       Technical Contact, Zone Contact:
          Gifford, Aaron D (AG44) agifford@INFOWEST.COM
          435-674-0165 (801) 634-9567 (FAX) 435-674-9654
       Billing Contact:
          InfoWest Domain Services (IDS2-ORG) dns@INFOWEST.COM
          435-674-0165
          Fax- 435-674-9654

       Record last updated on 06-Aug-1997.
       Record created on 06-Aug-1997.
       Database last updated on 4-Dec-1999 12:14:06 EST.

       Domain servers in listed order:

       NS1.INFOWEST.COM 204.17.177.10
       NS2.INFOWEST.COM 204.17.177.20

    And started looking like this:

    Domain Name: DEVSHED.COM
    Registrar: NETWORK SOLUTIONS, INC.
    Whois Server: whois.networksolutions.com
    Referral URL: www.networksolutions.com
    Name Server: NS1.INFOWEST.COM
    Name Server: NS2.INFOWEST.COM

    Basically, the default whois server ceased to be whois.internic.net (a.k.a. rs.internic.net), which only carried records for domains registered via Network Solutions or Worldnic, and became the registry whois server, which carries entries for all .com/.net/.org domains.

    Leaving aside .com/.net/.org for the moment, most TLDs have a whois server somewhere listening on port 43: you connect to it, give it a domain name, and get a record back for that domain name if it exists. The format of the returned record varies from server to server.
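
    To make the port 43 exchange concrete, here is a minimal sketch of a single query in current PHP. The function names whois_request_line() and whois_query() are our own for illustration and are not part of Whois2.php; the sketch assumes the server accepts one line of query text terminated by CRLF and closes the connection when it has finished replying.

```php
<?php
// Build the request line for a port-43 whois query: the query text followed by CRLF.
function whois_request_line(string $query): string
{
    return trim($query) . "\r\n";
}

// Connect to a whois server, send one query, and return the reply as an array of lines.
// Throws RuntimeException if the connection cannot be opened.
function whois_query(string $server, string $query, int $timeout = 10): array
{
    $errno = 0;
    $errstr = '';
    $sock = @fsockopen($server, 43, $errno, $errstr, $timeout);
    if ($sock === false) {
        throw new RuntimeException("cannot connect to $server: $errstr ($errno)");
    }
    stream_set_timeout($sock, $timeout);
    fwrite($sock, whois_request_line($query));

    // The server signals the end of the record by closing the connection.
    $lines = [];
    while (($line = fgets($sock)) !== false) {
        $lines[] = rtrim($line, "\r\n");
    }
    fclose($sock);
    return $lines;
}
```

    Calling whois_query("whois.nsiregistry.net", "devshed.com") would then return the raw record one line per array element, which is the same shape the class later exposes as "rawdata".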

    The .com/.net/.org namespace is a little more convoluted. We have the central registry database at whois.nsiregistry.net, which contains a minimal record for every domain in this namespace, and then each registrar operates its own whois server with more complete records for only the domains it administers. Further, each registrar has its own whois output format for its records.
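
    The minimal registry record is what tells us where to go next. As a rough sketch (the function name parse_registry_record() is hypothetical, and this is not the parser the class ships with), the "Key: Value" lines of the registry reply shown earlier can be split into a hash, from which the registrar's whois server is read:

```php
<?php
// Split "Key: Value" lines from a registry reply into an associative array.
// Keys that repeat (such as "Name Server") are collected into a list.
function parse_registry_record(array $lines): array
{
    $record = [];
    foreach ($lines as $line) {
        if (strpos($line, ':') === false) {
            continue; // skip blank lines and free text
        }
        [$key, $value] = array_map('trim', explode(':', $line, 2));
        if ($key === '' || $value === '') {
            continue;
        }
        if (!isset($record[$key])) {
            $record[$key] = $value;
        } elseif (is_array($record[$key])) {
            $record[$key][] = $value;
        } else {
            $record[$key] = [$record[$key], $value];
        }
    }
    return $record;
}

// The registry reply for devshed.com, as quoted in the article.
$reply = [
    'Domain Name: DEVSHED.COM',
    'Registrar: NETWORK SOLUTIONS, INC.',
    'Whois Server: whois.networksolutions.com',
    'Referral URL: www.networksolutions.com',
    'Name Server: NS1.INFOWEST.COM',
    'Name Server: NS2.INFOWEST.COM',
];
$record = parse_registry_record($reply);
// $record['Whois Server'] names the registrar server to ask for the full record.
```

    A second query against $record['Whois Server'] is then needed to get the complete record, in whatever format that registrar happens to use.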

    The result of all this is One Big Mess. What you have to do depends on whether you want to look up a ccTLD or a gTLD (generic TLD) and, if the latter, who the registrar is and where their whois server is. Trying to create a script to integrate this kind of functionality into your website can be, in a word, nightmarish.

    Enter Whois2.php, a collection of PHP classes designed to take the guesswork out of all this and give the webmaster tools to look up and process domain name data without having to worry about where to get it.

    Assumptions:

    The class will be used primarily for querying domain names. While other query types exist on some servers, notably contact and host records on the Network Solutions server, most servers simply support domain name lookups. If we want to do another query type, we'll see that the class still provides us with methods of doing so.

    Web whois databases are not supported. If a given TLD's registry doesn't operate a whois server on port 43, it doesn't belong in this class. .TO and .FM, for example, only offer whois lookups through a web interface, and as such there is no method to look up .to or .fm domains here.

    Objectives:

    portable

    We want to make the class as portable as possible across as much of the namespace as possible. This means we don't want to have to do anything different in our code to look up a domain, whether it's .com or .cz. Of course, due to the differences in output we may have to handle those results differently, but getting the data back from a query should be transparent regardless of what the query is.

    modular

    We want to be able to break tasks down into smaller, manageable sub-tasks. And if part of our landscape changes (e.g. the address of a whois server changes, or the output of one is different), we want first that the change doesn't affect the rest of our operations, and second that it's as easy as possible to revise our class to cope with it.

    object oriented

    We are lazy, and we don't want to code anything twice. So in breaking down our code into separate modules for different tasks, those modules should inherit from a common parent anything they need to know that other modules also need to know. The parent or base class should be able to hand off data to other handlers in a seamless, generic way.

    This package is a lot different from its precursor whois.php3, which was a single class that had to be tweaked and hacked every time something changed. Whois2.php is designed so that if one registrar somewhere changes their output format (something NetSol does often enough), the entire class won't break.

    When all is said and done, we can use this class to do whois lookups for 121 TLDs, including the gTLDs .com/.net/.org/.edu/.int/.mil, ccTLDs like .ca, .il, .at etc., Nominet's 2LD domains uk.com, gb.com and gb.net, and the ORSC (Open Root Server Consortium) alternate staging roots (.web, .info, .shop, etc).

    At the very least we can get back simple output from the whois server itself in raw, unparsed form. In better cases we have extended methods to further process the raw result into nice key/value pairs. And in the case of domains registered in .com/.net/.org, we can separate parsed results into those from the registry server response and those from the corresponding registrar's server output, provided we have the methods available to parse it.

    If the methods do not exist to further process data, the class returns what it has. If you want to create your own extended class to provide methods to parse the raw output of a given TLD, it's very simple to add it to the mix.

    The base class is class "Whois", defined in main.whois. We've intentionally upper-cased the "W", which means that you can begin working with this in your existing scripts and the new class will not collide with anything that may be using the old version 1 whois.php3.

    The $Query variable is a hash that functions more or less as the local environment for the object. It keeps track of things like what our current whois server is ($Query["server"]), what the current TLD is ($Query["tld"], not as obvious as you might think at first glance, considering the Nominet uk.com and gb.net, which we support), what the actual query is ($Query["string"]) and, last but definitely not least, an array of any errors encountered in $Query["errstr"].

    We instantiate the class in the normal manner, passing our query to the constructor:


    $whois = new Whois("devshed.com");

    In PHP we can create constructors for our classes by putting a function in the class with the same name as the class. We rely heavily on those here, and in our first constructor we basically get everything ready for a normal domain lookup (we don't have to invoke the class this way, but for our purposes we'll just look at domain lookups in normal fashion for now).


    class Whois {
      ..
      ..
      ..
      function Whois ($query="") {
        // Load the $DATA table of TLDs, servers and handlers
        require("servers.whois");
        $this->VERSION=sprintf("Whois2.php v%s:%s",
                               $this->CODE_VERSION, $this->DATA_VERSION);
        if(isSet($query)) {
          $this->Query["string"]=strtolower($query);
          $tld=$this->GetTld($this->Query["string"]);
          if($tld) {
            // First value is the whois server for this TLD
            $this->Query["server"]=$this->DATA[$tld][0];
            // Optional second value is the extended handler
            if(isSet($this->DATA[$tld][1])) {
              $handler=$this->DATA[$tld][1];
              $this->Query["file"]=sprintf("%s.whois",$handler);
              $this->Query["handler"]=$handler;
            }
            $this->Query["tld"]=$tld;
          } else {
            // Unsupported TLD: record the error and drop the server
            $this->Query["status"]=-1;
            $this->Query["errstr"][]=$this->Query["string"].
                                     " domain is not supported";
            unset($this->Query["server"]);
          }
        } else {
          $this->Query["server"]=$this->NSI_REGISTRY;
        }
      }

    So we set the TLD, set the current whois server, and if we have one, we define an extended handler. The code in main.whois then takes care of the basic tasks that we need to do regardless of what we're doing them to: connect to the server with Connect(), send the query string and read back the output with Lookup(), and then, if we have an extended handler, pass the result to Process().
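
    One caveat for anyone running this on a current interpreter: the same-name constructor convention was deprecated in PHP 7 and removed in PHP 8, where __construct() is used instead. The following is a trimmed-down sketch of the same setup logic in that style. The class name WhoisSketch and its one-entry table are stand-ins of our own, not the Whois2.php source, and the TLD is taken naively as the last label rather than via GetTld().

```php
<?php
// Hypothetical, trimmed-down stand-in for the base class; not the Whois2.php source.
class WhoisSketch
{
    public array $Query = [];

    // Stand-in for the $DATA table loaded from servers.whois: TLD => [server, handler].
    protected array $DATA = [
        'com' => ['whois.nsiregistry.net', 'gtld'],
    ];

    // __construct() replaces the same-name constructor in current PHP.
    public function __construct(string $query = '')
    {
        if ($query === '') {
            // No query given: fall back to the registry server.
            $this->Query['server'] = $this->DATA['com'][0];
            return;
        }
        $this->Query['string'] = strtolower($query);

        // Naive TLD extraction: everything after the last dot.
        $pos = strrpos($this->Query['string'], '.');
        $tld = $pos === false ? '' : substr($this->Query['string'], $pos + 1);

        if (isset($this->DATA[$tld])) {
            $this->Query['tld'] = $tld;
            $this->Query['server'] = $this->DATA[$tld][0];
            if (isset($this->DATA[$tld][1])) {
                $this->Query['handler'] = $this->DATA[$tld][1];
            }
        } else {
            $this->Query['status'] = -1;
            $this->Query['errstr'][] = $this->Query['string'] . ' domain is not supported';
        }
    }
}
```

    After new WhoisSketch("devshed.com"), the Query hash holds the server, TLD and handler name, which is all the later Connect(), Lookup() and Process() steps need to know.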

    servers.whois

    One of the first orders of business is to load the "servers.whois" file. This contains the $DATA array, which is indexed by supported TLD. Each value is itself an array with the corresponding whois server as its first value and an optional "extended handler" as its second. We want to keep all this in a separate file because this data will take on an update history separate from the code itself, as new whois servers are reported and (hopefully) energetic PHP coders provide additional extended classes for various TLDs and registrars.
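
    A small sketch may help picture the table and why finding the TLD is not just a matter of taking the last label. The entries for uk.com and cz below use placeholder server names, and find_tld() is a hypothetical helper of our own; the real GetTld() method may work differently.

```php
<?php
// Hypothetical excerpt in the shape the article describes: TLD => [server, optional handler].
// The uk.com and cz server names are placeholders, not real entries.
$DATA = [
    'com'    => ['whois.nsiregistry.net', 'gtld'],
    'uk.com' => ['whois.example-2ld.test'],
    'cz'     => ['whois.example-cz.test'],
];

// Return the longest key of $data that matches the end of $domain, or null if none does.
function find_tld(string $domain, array $data): ?string
{
    $labels = explode('.', strtolower($domain));
    // Drop the leftmost label each time, so "foo.uk.com" tries "uk.com" before "com".
    for ($i = 1; $i < count($labels); $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (isset($data[$candidate])) {
            return $candidate;
        }
    }
    return null;
}
```

    With this table, find_tld("foo.uk.com", $DATA) picks "uk.com" rather than "com", so the query goes to the 2LD's own server instead of the .com registry.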

    extended handlers

    The extended handlers demonstrate PHP extended classes and inheritance. Because output varies wildly from server to server, we provide additional processing of raw output via modular and extensible "handlers", which can themselves utilize even further handlers.
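
    The pattern can be sketched in a few lines of current PHP. BaseLookup and KeyValueHandler are hypothetical names chosen for illustration, and the dispatch here is by class name rather than by loading a ".whois" file as the real package does.

```php
<?php
// Hypothetical base class: every handler inherits the shared plumbing from it.
class BaseLookup
{
    // Hand raw lines to the named handler class if one exists; otherwise return them untouched.
    public function process(array $rawdata, ?string $handler): array
    {
        $result = ['rawdata' => $rawdata];
        if ($handler !== null && is_subclass_of($handler, self::class)) {
            $result['parsed'] = (new $handler())->parse($rawdata);
        }
        return $result;
    }

    // Default parser does nothing; handlers override it.
    public function parse(array $rawdata): array
    {
        return [];
    }
}

// A handler only has to know its own output format.
class KeyValueHandler extends BaseLookup
{
    public function parse(array $rawdata): array
    {
        $out = [];
        foreach ($rawdata as $line) {
            if (strpos($line, ':') !== false) {
                [$k, $v] = array_map('trim', explode(':', $line, 2));
                $out[strtolower($k)] = $v;
            }
        }
        return $out;
    }
}
```

    The point of the design is that a caller never needs to know which handler ran: with no handler it still gets "rawdata" back, and adding a new output format means adding one subclass without touching the base class.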

    Take "devshed.com" as our example again. We know from the information in the $DATA array that it will be queried at whois.nsiregistry.net, the central registry server. We also see that there is an extended handler, "gtld", specified for this TLD. So the Process() function is going to load the code in gtld.whois and then create an instance of the gtld class, which extends the Whois class.


    if(!defined("__GTLD_HANDLER__")) define("__GTLD_HANDLER__",1);

    class gtld extends Whois {
      var $HANDLER_VERSION = "1.0";
      ..
      ..
      function gTLD ($data,$query) {
        $this->Query=$query;
        $this->SUBVERSION = sprintf("%s-%s", $query["handler"],
                                    $this->HANDLER_VERSION);
        // Parse the registry reply into key/value pairs
        $this->result["regyinfo"]=$this->ParseRegInfo($data["rawdata"]);
        // Pick the registrar's whois server, working around a known bad referral
        if($this->HACKS["nsi_referral_loop"] &&
           $this->result["regyinfo"]["whois"]==
           $this->HACKS["wrong_netsol_whois"] ) {
          $this->Query["server"]=$this->HACKS["real_netsol_whois"];
        } else {
          $this->Query["server"]=$this->result["regyinfo"]["whois"];
        }
        // Follow-up query against the registrar's server
        $this->result["rawdata"]=$this->Lookup($this->Query["string"]);
        // If we have a handler for this registrar, parse its output too
        $this->Query["handler"] =
          $this->REGISTRARS[$this->result["regyinfo"]["registrar"]];
        if(!empty($this->Query["handler"])) {
          $this->Query["file"]=sprintf("%s.whois", $this->Query["handler"]);
          $this->result["regrinfo"]=$this->Process($this->result["rawdata"]);
        }
      }

    Once again there is a class constructor in class gtld, and we parse the registry output into key/value pairs. It also turns out that the registrar for devshed.com happens to be Network Solutions, and as it so happens, we have yet another extended class, "netsol". So we query that server, get the results, and hand them off to class "netsol" to parse into key/value pairs as well.

    The gtld object uses its inherited Process() and Lookup() methods to do the follow-up query to the appropriate whois server. If there were no registrar handler it would stop there, returning the raw output. But in this case the gtld class does have a "netsol" handler to further process the data, so it hands off to it for parsing.

    We can then add further handlers for ICANN accredited registrars by adding them to the $REGISTRARS array of the gtld class, much in the fashion we can add additional ccTLD handlers to the $DATA array.
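
    As a sketch of how that lookup might go, assuming the array maps the registrar name exactly as the registry prints it to a handler name: only the "netsol" pairing is taken from this article, and registrar_handler_file() is a hypothetical helper, so the real table may differ.

```php
<?php
// Hypothetical registrar table: registrar name as printed by the registry => handler name.
// Only the "netsol" entry is taken from the article; a real table may differ.
$REGISTRARS = [
    'NETWORK SOLUTIONS, INC.' => 'netsol',
];

// Return the handler file name for a registrar, or null when none is registered.
function registrar_handler_file(string $registrar, array $registrars): ?string
{
    $name = strtoupper(trim($registrar));
    return isset($registrars[$name]) ? sprintf('%s.whois', $registrars[$name]) : null;
}
```

    Supporting another registrar would then be a matter of adding one more line to the table and supplying the matching handler file.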

    When all is said and done, if you simply want to look up pretty well any domain name in your PHP scripts, all you need to do is this:


    $whois = new Whois("devshed.com");
    $result = $whois->Lookup();

    Depending on what we have under the hood as far as extended handlers go, $result would contain:

    In the case of no extended handlers:


    $result["rawdata"]

    will contain an array of the raw output from the whois server.

    In the case of a .com/.net/.org domain with *NO* registrar handler:


    $result["regyinfo"]

    will contain parsed key/value pairs of the registry output, including an array of "nameserver" entries, and


    $result["rawdata"]

    will contain the unparsed raw output from the whois server named in $result["regyinfo"]["whois"].

    ...and in the case of a .com/.net/.org/.edu domain that has a registrar handler, you'll have the two as above plus $result["regrinfo"], which will be an array of key/value pairs from the registrar whois server.
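
    Since the three cases differ only in which keys are present, calling code can branch on them. A minimal sketch follows; result_level() and its return strings are our own invention, not something the class provides.

```php
<?php
// Report how far a lookup result was processed, based on which keys are present.
function result_level(array $result): string
{
    if (isset($result['regrinfo'])) {
        return 'registrar-parsed';   // registry and registrar output both parsed
    }
    if (isset($result['regyinfo'])) {
        return 'registry-parsed';    // registry parsed, registrar output still raw
    }
    return isset($result['rawdata']) ? 'raw-only' : 'empty';
}
```

    A script could use this to decide whether to print tidy fields or fall back to echoing the lines in $result["rawdata"].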

    To see exactly what we have in our $result, or any part of it, we can use the handy-dandy ShowObject() function in utils.whois, which is bundled for debugging and diagnostic purposes (along with a couple of other goodies).
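
    A helper of this kind boils down to a recursive walk over nested arrays. The sketch below is not the bundled function: show_object() is a hypothetical stand-in that returns a string of indented "key->value" lines instead of printing them.

```php
<?php
// Render a nested array as indented "key->value" lines, one level of indent per depth.
function show_object(array $data, int $depth = 0): string
{
    $out = '';
    $pad = str_repeat('  ', $depth);
    foreach ($data as $key => $value) {
        if (is_array($value)) {
            // Announce the nested array, then recurse one level deeper.
            $out .= $pad . $key . "->Array\n" . show_object($value, $depth + 1);
        } else {
            $out .= $pad . $key . '->' . $value . "\n";
        }
    }
    return $out;
}
```

    Echoing show_object($result) inside a preformatted block would give a dump broadly similar in shape to the ones shown in this article.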

    So, to look up devshed.com and then examine the actual object we get back, we can do this:


    $whois = new Whois("devshed.com");
    $result = $whois->Lookup();
    include("utils.whois");
    $display = new utils;
    $display->ShowObject($result);

    And the final result is:

    A $result["regyinfo"] array with all the registry info. Note that the nameserver list is itself another array in $result["regyinfo"]["nameserver"]:


    regyinfo->Array
      domain->DEVSHED.COM
      registrar->NETWORK SOLUTIONS, INC.
      whois->whois.networksolutions.com
      referrer->www.networksolutions.com
      nameserver->Array
        0->NS1.INFOWEST.COM
        1->NS2.INFOWEST.COM

    A $result["rawdata"] array that has the raw data from the registrar's whois server:


    rawdata->Array
      0->The Data in Network Solutions' WHOIS database is provided by Network
      1->Solutions for information purposes, and to assist persons in obtaining
      2->information about or related to a domain name registration record.
      3->Network Solutions does not guarantee its accuracy. By submitting a
      4->WHOIS query, you agree that you will use this Data only for lawful
      5->purposes and that, under no circumstances will you use this Data to:
      6->(1) allow, enable, or otherwise support the transmission of mass
      7->unsolicited, commercial advertising or solicitations via e-mail
      8->(spam); or (2) enable high volume, automated, electronic processes
      9->that apply to Network Solutions (or its systems). Network Solutions
      10->reserves the right to modify these terms at any time. By submitting
      11->this query, you agree to abide by this policy.
      12->
      13->Registrant:
      14->InfoWest Global Internet Services, Inc. (DEVSHED-DOM)
      15-> 1845 W. Sunset Blvd.
      16-> St. George, UT 84770
      17-> US
      18->
      19-> Domain Name: DEVSHED.COM
      20->
      21-> Administrative Contact:
      22-> Cosby, David R (DRC4) dcosby@INFOWEST.COM
      23-> 801.674.0165
      24-> Technical Contact, Zone Contact:
      25-> Gifford, Aaron D (AG44) agifford@INFOWEST.COM
      26-> 435-674-0165 (801) 634-9567 (FAX) 435-674-9654
      27-> Billing Contact:
      28-> InfoWest Domain Services (IDS2-ORG) dns@INFOWEST.COM
      29-> 435-674-0165
      30->Fax- 435-674-9654
      31->
      32-> Record last updated on 06-Aug-1997.
      33-> Record created on 06-Aug-1997.
      34-> Database last updated on 6-Dec-1999 16:08:49 EST.
      35->
      36-> Domain servers in listed order:
      37->
      38-> NS1.INFOWEST.COM 204.17.177.10
      39-> NS2.INFOWEST.COM 204.17.177.20
      40->

    And finally a $result["regrinfo"] array. Because we have a "netsol" handler to parse output from the Network Solutions registrar server, we have the following data:


    regrinfo->Array
      organization->InfoWest Global Internet Services, Inc.
      org_handle->DEVSHED-DOM
      org_address->1845 W. Sunset Blvd. St. George, UT 84770 US
      domain->DEVSHED.COM
      admin->Array
        name-> Cosby, David R
        handle->DRC4
        email->dcosby@INFOWEST.COM
      tech->Array
        name-> Gifford, Aaron D
        handle->AG44
        email->agifford@INFOWEST.COM
      billing->Array
        name-> InfoWest Domain Services
        handle->IDS2-ORG
        email->dns@INFOWEST.COM
      updated->06-Aug-1997
      created->06-Aug-1997
      db_updated->6-Dec-1999 16:08:49 EST
      ns->Array
        NS1.INFOWEST.COM->204.17.177.10
        NS2.INFOWEST.COM->204.17.177.20

    At the time of writing, the only extended handler for registrars is the netsol class. We need further classes for register.com, melbourneIT, COREnic and all the other registrars. For domains at those registrars, our output stops with the $result["rawdata"] array, as above.
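
    For anyone tempted to write such a handler, most of the work is pattern matching on raw lines. As a small illustration, and not the netsol class's actual code, here is a sketch that splits a contact line of the shape seen in the raw data above into name, handle and email. The function name parse_contact_line() and the regular expression are our own assumptions about that format.

```php
<?php
// Split a contact line of the form "Name (HANDLE) email" into its three parts.
// Returns null when the line does not match that shape.
function parse_contact_line(string $line): ?array
{
    if (!preg_match('/^\s*(.+?)\s+\(([A-Z0-9-]+)\)\s+(\S+@\S+)\s*$/i', $line, $m)) {
        return null;
    }
    return ['name' => $m[1], 'handle' => $m[2], 'email' => $m[3]];
}
```

    A real handler would also need to track which section ("Administrative Contact:", "Billing Contact:" and so on) each line belongs to, and that is where each registrar's format tends to differ.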

    Another Example:

    Looking up something other than a domain name:


    $STRING = "MJ177";
    require("whois2.php3");
    $whois = new Whois();
    $whois->Query["server"]="whois.networksolutions.com";
    $result = $whois->Lookup($STRING);

    $STRING in this case is a Network Solutions contact handle (mine). It could just as easily be a host handle or any other string. The registrar whois servers at NetSol allow wildcard matching on arbitrary strings.

    Hopefully this article has served a two-fold purpose: first, to explain the new layout of the whois servers in light of newly accredited registrars and the SRS; second, to demonstrate the rudiments of using PHP classes, which lend themselves quite nicely to the diverse and ever-changing situation we have applied them to here.

    The latest version of Whois2.php can be downloaded from http://www.easydns.com/~markjr/whois2/

    DISCLAIMER: The content provided in this article is not warranted or guaranteed by Developer Shed, Inc. The content provided is intended for entertainment and/or educational purposes in order to introduce to the reader key ideas, concepts, and/or product reviews. As such it is incumbent upon the reader to employ real-world tactics for security and implementation of best practices. We are not liable for any negative consequences that may result from implementing any information covered in our articles or tutorials. If this is a hardware review, it is not recommended to open and/or modify your hardware.
