    Introduction to the TreeWalker object of DOM

    Author: 2007-08-08 08:44:31 From:

    The TreeWalker object is a powerful DOM2 object that lets you easily filter through the nodes in a document and create custom collections out of them. Ok, this is sounding geeky already, but for geeky jobs that require parsing the document tree, it doesn't hurt at all to get familiar with this object. While scripting you may have come across the need to retrieve all elements in a webpage with a specific CSS class name, or, for an XML document, elements that carry a particular attribute value. The TreeWalker object makes light work of such tasks. In this tutorial, I'll provide an introductory look at the TreeWalker object of DOM2, which is supported in Firefox and Opera 8+, though not in IE6 or IE7 (as of beta 3).

    Before I continue, note that there is a cousin to the TreeWalker object called NodeIterator, which I'll cover in a future tutorial.

    document.createTreeWalker() method

    The TreeWalker object can come off as mysterious and complicated to some, but it really is just realized through a single method: document.createTreeWalker(). This method and the 4 parameters it accepts simplify tasks that may otherwise take many times as much conventional code, such as filtering all nodes in the document that are of a certain element type and carry a particular attribute. But before we get to all that, here's the basic syntax of document.createTreeWalker():

    document.createTreeWalker(root, nodesToShow, filter, entityExpandBol)

    Time to break down the 4 parameters:

    1. root: The root node at which to begin traversing the document tree.

    2. nodesToShow: The type of nodes that should be visited by TreeWalker.

    3. filter (or null): Reference to a custom function (NodeFilter object) that further filters the nodes returned. Enter null for none.

    4. entityExpandBol: Boolean parameter specifying whether entity references should be expanded.

    For 2), the valid constant values are:

    NodeFilter constants
    NodeFilter.SHOW_ALL
    NodeFilter.SHOW_ELEMENT
    NodeFilter.SHOW_ATTRIBUTE
    NodeFilter.SHOW_TEXT
    NodeFilter.SHOW_CDATA_SECTION
    NodeFilter.SHOW_ENTITY_REFERENCE
    NodeFilter.SHOW_ENTITY
    NodeFilter.SHOW_PROCESSING_INSTRUCTION
    NodeFilter.SHOW_COMMENT
    NodeFilter.SHOW_DOCUMENT
    NodeFilter.SHOW_DOCUMENT_TYPE
    NodeFilter.SHOW_DOCUMENT_FRAGMENT
    NodeFilter.SHOW_NOTATION

    While there are 13 different NodeFilter constants to let you limit the type of nodes returned by TreeWalker, you will probably be working with just a few of them most of the time. NodeFilter.SHOW_ELEMENT, for example, returns all element nodes.

    Ok, so you're dying to see a demonstration of document.createTreeWalker(), a very rudimentary one to start:

    <div id="contentarea">
    <p>Some <span>text</span></p>
    <b>Bold text</b>
    </div>
    
    <script type="text/javascript">
    
    var rootnode=document.getElementById("contentarea")
    var walker=document.createTreeWalker(rootnode, NodeFilter.SHOW_ELEMENT, null, false)
    
    </script>

    In this example, I specify the container with ID "contentarea" as the root node from which TreeWalker begins traversing. The second parameter specifies that TreeWalker should only crawl element nodes (versus text nodes, comment nodes etc) within the container. The third parameter, set to null, means no additional filtering should be done (not yet!). The 4th parameter concerns whether entity references should be expanded, and is set to false. With all the parameters in place, "walker" now references all elements (P, SPAN, and B) within the DIV, along with the DIV itself.
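
    Since browsers without DOM2 Traversal support (such as the IE versions mentioned earlier) have no document.createTreeWalker(), a script may want to test for the method before calling it. The following is only a minimal sketch: makeElementWalker() is a hypothetical helper name, not part of the DOM, and it passes the numeric value 1 in place of NodeFilter.SHOW_ELEMENT so that it does not depend on NodeFilter being defined.

```javascript
// Hypothetical helper (not part of the DOM): returns a TreeWalker over the
// element nodes under "root", or null when createTreeWalker() is missing.
function makeElementWalker(doc, root) {
  if (!doc || typeof doc.createTreeWalker !== "function") {
    return null; // DOM2 Traversal is not available in this environment
  }
  var SHOW_ELEMENT = 0x1; // same numeric value as NodeFilter.SHOW_ELEMENT
  return doc.createTreeWalker(root, SHOW_ELEMENT, null, false);
}

// In a browser this might be used as:
// var walker = makeElementWalker(document, document.getElementById("contentarea"));
// if (walker) { /* safe to call walker.nextNode() etc. */ }
```

    Returning null rather than throwing lets the calling code fall back to another technique, such as getElementsByTagName(), in browsers that lack TreeWalker.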

    TreeWalker traversal methods

    Having created a filtered list of nodes using document.createTreeWalker(), you can then process these filtered nodes using TreeWalker's traversal methods:

    TreeWalker traversal methods
    Method: Description
    firstChild(): Travels to and returns the first child of the current node.
    lastChild(): Travels to and returns the last child of the current node.
    nextNode(): Travels to and returns the next node within the filtered collection of nodes.
    nextSibling(): Travels to and returns the next sibling of the current node.
    parentNode(): Travels to and returns the current node's parent node.
    previousNode(): Travels to and returns the previous node of the current node.
    previousSibling(): Travels to and returns the previous sibling of the current node.

    TreeWalker traversal properties
    Property: Description
    currentNode: Returns the current position/node of TreeWalker. Read/write, allowing you to explicitly set the current position of TreeWalker to a particular node within the nodes returned.

    Don't confuse the above methods with the standard DOM element properties/methods of similar names; these work exclusively within the TreeWalker object to let you navigate the filtered nodes.

    Using the same example as above, let's see how to use the traversal methods to walk through the returned nodes:

    <div id="contentarea">
    <p>Some <span>text</span></p>
    <b>Bold text</b>
    </div>
    
    <script type="text/javascript">
    
    var rootnode=document.getElementById("contentarea")
    var walker=document.createTreeWalker(rootnode, NodeFilter.SHOW_ELEMENT, null, false)
    
    //Alert the starting node Tree Walker currently points to (root node)
    alert(walker.currentNode.tagName) //alerts DIV (with id=contentarea)
    
    //Step through and alert all descendant element nodes
    while (walker.nextNode())
        alert(walker.currentNode.tagName) //alerts P, SPAN, and B.
    
    //Go back to the first child node of the collection and alert it
    walker.currentNode=rootnode //reset TreeWalker pointer to point to root node
    alert(walker.firstChild().tagName) //alerts P
    
    </script>

    As you use the traversal methods to step through the nodes, TreeWalker not only returns the node in question, but travels to it. This is why after stepping through the nodes using:

    while (walker.nextNode())
    //code here

    I reset TreeWalker's position back to its root node before trying to retrieve the firstChild of the filtered collection:

    walker.currentNode=rootnode //reset TreeWalker pointer to point to root node

    This is necessary, since after the while loop TreeWalker's pointer is directed at the very last node of the collection (the B element), which has no firstChild. And even if it did, that would be the B element's first child, not the first child of the entire filtered collection!
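
    The walk-then-reset pattern described above can be wrapped in a small function. This is just a sketch, and collectNodes() is a hypothetical helper name; it relies only on the nextNode() method and currentNode property covered earlier. Note that the node the walker starts on is not included, since nextNode() moves off it first.

```javascript
// Hypothetical helper: gathers every node the walker reaches via nextNode()
// into an array, then puts the walker back where it started.
function collectNodes(walker) {
  var start = walker.currentNode; // remember the starting position
  var nodes = [];
  while (walker.nextNode()) {
    nodes.push(walker.currentNode);
  }
  walker.currentNode = start; // rewind so the walker can be reused
  return nodes;
}

// In a browser, with the "contentarea" example above:
// var elements = collectNodes(walker) // array holding the P, SPAN and B elements
```

    Restoring currentNode inside the helper means later calls such as walker.firstChild() behave as if the loop had never run.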

    Ok, another example of traversal in TreeWalker to solidify our understanding of it:

    <p id="essay">George<span> loves </span> <b>JavaScript!</b></p>
    
    <script type="text/javascript">
    
    var rootnode=document.getElementById("essay")
    var walker=document.createTreeWalker(rootnode, NodeFilter.SHOW_TEXT, null, false)
    
    walker.firstChild() //Walk to first child node (the text "George")
    var paratext=walker.currentNode.nodeValue
    while (walker.nextSibling()){ //Step through each sibling of "George"
    paratext+=walker.currentNode.nodeValue
    }
    
    alert(paratext) //alerts "George loves JavaScript!"
    
    </script>

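    In this example, I traverse all text nodes of the root container to get its entire textual content. Note that nextSibling() works here even though some of the text nodes sit inside the SPAN and B elements: since those elements are not shown by NodeFilter.SHOW_TEXT, TreeWalker passes over them and treats their text nodes as siblings of "George" in the filtered view.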

    You're free to use standard DOM element properties/methods on top of the TreeWalker traversal methods, though the returned information reflects that node's relationship to the entire document, not just the filtered results. An example should drive this point home:

    <ul id="mylist">
    <li>List 1</li>
    <li>List 2</li>
    <li>List 3</li>
    </ul>
    
    <script type="text/javascript">
    
    var rootnode=document.getElementById("mylist")
    var walker=document.createTreeWalker(rootnode, NodeFilter.SHOW_ELEMENT, null, false)
    
    alert(walker.currentNode.childNodes.length) //alerts 7 (includes text nodes)
    alert(walker.currentNode.getElementsByTagName("*").length) //alerts 3
    
    </script>

    In this example I'm using TreeWalker to pick out all elements within the UL element. The line of interest here is:

    alert(walker.currentNode.childNodes.length) //alerts 7 (includes text nodes)

    You may have expected 3 to be alerted; after all, there are only 3 elements within the UL list. However, "childNodes" is a DOM property, not TreeWalker's, and returns information about a node oblivious to any filtering that may have taken place by TreeWalker! That's why 7 is returned, the total number of nodes, including text nodes, that the UL contains. The same concept applies to DOM methods that you may invoke on top of a node returned by TreeWalker.
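
    To get a count that does respect the filtering, one option is to stay with TreeWalker's own firstChild() and nextSibling() methods instead of childNodes. A minimal sketch, where countFilteredChildren() is a hypothetical helper name:

```javascript
// Hypothetical helper: counts the children of the walker's current node as
// seen through the walker's filter, using only TreeWalker traversal methods.
function countFilteredChildren(walker) {
  var start = walker.currentNode; // remember where we began
  var count = 0;
  if (walker.firstChild()) {
    count = 1;
    while (walker.nextSibling()) {
      count++;
    }
  }
  walker.currentNode = start; // move back to the starting node
  return count;
}

// In a browser, with the "mylist" example above:
// alert(countFilteredChildren(walker)) //alerts 3, one for each LI
```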

    Having learned to navigate the nodes filtered by document.createTreeWalker(), it's time to see how to refine the filtering process itself. Recall that the 3rd parameter of document.createTreeWalker() accepts an optional reference to a filtering function. Let's look at that next.

    Filtering in document.createTreeWalker()

    The essence of the TreeWalker object is to easily filter nodes within a document. Earlier we looked at the various NodeFilter constants (e.g. NodeFilter.SHOW_ELEMENT) that provide basic top-level filtering. But that's hardly enough in real world cases. That's where the 3rd parameter of document.createTreeWalker() comes in, which lets you pass in a reference to your custom filtering function that picks up where the 2nd parameter left off:

    document.createTreeWalker(root, nodesToShow, filter, entityExpandBol)

    "filter" is a reference to a filtering function:

    var myfilter=function(node){
        if (node.tagName=="DIV" || node.tagName=="IMG") //accept only DIV and IMG elements
            return NodeFilter.FILTER_ACCEPT
        else
            return NodeFilter.FILTER_SKIP
    }

    var walker=document.createTreeWalker(document.body, NodeFilter.SHOW_ELEMENT, myfilter, false)

    while (walker.nextNode())
        walker.currentNode.style.display="none" //hide all DIV and IMG elements on the page

    In the above, I define a custom function "myfilter()" to pick out all DIVs and IMGs in the document. Such a function accepts one parameter, the node currently being examined as TreeWalker traverses the document. Within this function, 3 constants are supported as return values to let you either accept, reject, or skip the node:

    NodeFilter filter function constants
    NodeFilter.FILTER_ACCEPT
    NodeFilter.FILTER_REJECT
    NodeFilter.FILTER_SKIP

    FILTER_ACCEPT is self explanatory, and when returned informs TreeWalker to accept this node. However, FILTER_REJECT and FILTER_SKIP differ in a subtle way that is important to understand. With FILTER_REJECT, TreeWalker will reject the node in question plus any descendants of the node, while with FILTER_SKIP, TreeWalker will skip the node in question but not its descendants. In other words, if you wish to filter out nodes independent of their relationship with a parent node, use NodeFilter.FILTER_SKIP instead of NodeFilter.FILTER_REJECT. Consider the same filter function above, but slightly modified to use "REJECT" instead of "SKIP" to oust unwanted nodes:

    var myfilter=function(node){
        if (node.tagName=="DIV" || node.tagName=="IMG") //accept only DIV and IMG elements
            return NodeFilter.FILTER_ACCEPT
        else
            return NodeFilter.FILTER_REJECT //reject everything else, descendants included
    }

    In this case, not all DIV and IMG elements in the document may be extracted! This is because an image may be contained inside a rejected element such as <P>, causing TreeWalker to pass over it automatically, along with every other descendant of the unwanted P element.
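
    The difference can be illustrated without a browser using a simplified model of the traversal. The sketch below is not how any browser implements TreeWalker; walk(), makeFilter() and the plain-object tree are hypothetical stand-ins, and the numbers 1, 2 and 3 are the numeric values of NodeFilter.FILTER_ACCEPT, FILTER_REJECT and FILTER_SKIP.

```javascript
// Numeric values of NodeFilter.FILTER_ACCEPT / FILTER_REJECT / FILTER_SKIP.
var ACCEPT = 1, REJECT = 2, SKIP = 3;

// Simplified model of repeated nextNode() calls: visits the descendants of
// "node" in document order and collects the tag names the filter accepts.
function walk(node, filter, out) {
  out = out || [];
  var kids = node.children || [];
  for (var i = 0; i < kids.length; i++) {
    var verdict = filter(kids[i]);
    if (verdict === ACCEPT) out.push(kids[i].tagName);
    if (verdict !== REJECT) walk(kids[i], filter, out); // REJECT prunes the subtree
  }
  return out;
}

// Plain-object stand-in for: <body><p><img></p><div><img></div></body>
var tree = { tagName: "BODY", children: [
  { tagName: "P", children: [{ tagName: "IMG" }] },
  { tagName: "DIV", children: [{ tagName: "IMG" }] }
]};

// Builds the DIV/IMG filter, returning "otherwise" for all other elements.
function makeFilter(otherwise) {
  return function (node) {
    return (node.tagName == "DIV" || node.tagName == "IMG") ? ACCEPT : otherwise;
  };
}

var withSkip = walk(tree, makeFilter(SKIP));     // ["IMG", "DIV", "IMG"]
var withReject = walk(tree, makeFilter(REJECT)); // ["DIV", "IMG"]
```

    With FILTER_REJECT the image inside the P is lost, because the P's entire subtree is never examined.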

    - Example: Manipulate elements by class attribute

    In this demonstration, I'll use the TreeWalker object to easily pick out all elements on the page with class="blue", and change their color to red.

    var getelementbyclass=function(node){
        if (node.className=="blue") //accept only elements with this class attribute
            return NodeFilter.FILTER_ACCEPT
        else
            return NodeFilter.FILTER_SKIP
    }
    
    var rootnode=document.body
    var walker=document.createTreeWalker(rootnode, NodeFilter.SHOW_ELEMENT, getelementbyclass, false)
    
    while (walker.nextNode())
    walker.currentNode.style.color="red"
    
    walker.currentNode=document.body //reset Tree Walker position to root node

    Nothing new here, though note the last line. After I'm done traversing my TreeWalker instance, I reset its currentNode property back to the root node, so subsequent calls to it will start at the beginning of the collection of filtered nodes again.
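
    One limitation of comparing className directly to "blue" is that an element carrying several classes, such as class="blue big", is not matched. A possible variation is sketched below; makeClassFilter() is a hypothetical helper name, the sketch assumes HTML elements whose className is a plain string, and the numeric fallbacks 1 and 3 only exist so the code can also run where NodeFilter is undefined.

```javascript
// Fall back to the standard numeric values when NodeFilter is not defined
// (for example when running outside a browser).
var ACCEPT = (typeof NodeFilter !== "undefined") ? NodeFilter.FILTER_ACCEPT : 1;
var SKIP = (typeof NodeFilter !== "undefined") ? NodeFilter.FILTER_SKIP : 3;

// Hypothetical factory: builds a filter function that accepts elements whose
// class attribute contains "name" as one of its whitespace-separated tokens.
function makeClassFilter(name) {
  return function (node) {
    // Normalize tabs/newlines to single spaces and pad both ends, so that
    // a whole-token match can be done with a simple indexOf().
    var classes = " " + String(node.className || "").replace(/\s+/g, " ") + " ";
    return classes.indexOf(" " + name + " ") != -1 ? ACCEPT : SKIP;
  };
}

// In a browser:
// var walker = document.createTreeWalker(document.body, NodeFilter.SHOW_ELEMENT, makeClassFilter("blue"), false)
```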

    Mixing NodeFilter constants

    Earlier you saw the 13 NodeFilter constants that let you show only nodes of a certain type, such as NodeFilter.SHOW_ELEMENT, NodeFilter.SHOW_TEXT etc. These constants are bit flags, and can be combined and mixed to create more inclusive or restrictive top level filters. For example:

    • OR operator: NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_TEXT

    • Addition: NodeFilter.SHOW_TEXT + NodeFilter.SHOW_COMMENT (show both text and comment nodes; for two different constants, + gives the same result as |)

    • NOT operator: ~NodeFilter.SHOW_COMMENT (get everything that's not a comment)

    //show only element and text nodes
    document.createTreeWalker(root, NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_TEXT, null, entityExpandBol)
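
    To see why these combinations work, it helps to look at the numbers behind the constants. The sketch below hard-codes the DOM2 values of a few of them so it can run anywhere; isShown() is a hypothetical helper that mimics how a mask is checked against a node's nodeType, and is not part of the DOM.

```javascript
// Numeric values of a few NodeFilter.SHOW_* constants, as defined by DOM2.
var SHOW_ALL = 0xFFFFFFFF, SHOW_ELEMENT = 0x1, SHOW_TEXT = 0x4, SHOW_COMMENT = 0x80;

// Hypothetical helper: true if a nodesToShow mask lets the given nodeType
// through. The bit for node type n is 1 << (n - 1).
function isShown(mask, nodeType) {
  return (mask & (1 << (nodeType - 1))) !== 0;
}

var elementsAndText = SHOW_ELEMENT | SHOW_TEXT; // 5
var sameThing = SHOW_ELEMENT + SHOW_TEXT;       // also 5, since the bits are distinct
var noComments = SHOW_ALL & ~SHOW_COMMENT;      // every node type except comments

// nodeType 1 = element, 3 = text, 8 = comment
var showsText = isShown(elementsAndText, 3);     // true
var showsComment = isShown(elementsAndText, 8);  // false
var stillNoComment = isShown(noComments, 8);     // false
```

    Addition only matches | when each constant appears once; adding the same constant twice would set the wrong bit, which is why | is the safer habit.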

    And that's it for the TreeWalker object of DOM2! Remember, this object is currently only supported in Firefox and Opera 8+, and not IE (as of IE7 beta 3).
