  • Beef up the Find command in Firefox

    Author: 2009-04-20 08:35:20 From:

    The Find command in Firefox locates the user-specified text in the body of a Web page. The command is an easy-to-use tool that works well enough for most users most of the time. Sometimes, however, a more powerful Find-like tool would make locating text easier. This article shows how to build a tool that isolates relevant text in Web pages faster by detecting the presence and absence of nearby words.

    Native text-search capabilities in Firefox provide useful highlighting of contiguous search terms and phrases. Additional Firefox extensions are available to incorporate regular-expression searches and other text-highlighting capabilities. This article presents tools and code needed to add your own text-searching interface to Firefox. With a Greasemonkey user script and some custom algorithms, you'll be able to add grep -v functionality to text searches — that is, highlighting a first search term where a second one is not located nearby.

    Requirements

    Hardware

    Text searches on typical Web pages with older (pre-2002) hardware are nearly instantaneous. However, the code presented here is not designed for speed and may require faster hardware to perform at a user-friendly speed on large Web pages.

    Software

    The code was developed for use with Firefox V2.0 and Greasemonkey V0.7. Newer versions of both will require testing and possibly modifications to ensure their functionality. As a Greasemonkey script, the code presented here should work on any operating system that supports Firefox and Greasemonkey. We tested on Microsoft® Windows® and Linux® Ubuntu V7.10 releases.





    Greasemonkey and Firefox extensions

    User modification to Web pages is the role Greasemonkey fulfills, and the code presented here uses the Greasemonkey framework to search for and highlight the relevant text. See Resources for the Greasemonkey Firefox extension.

    Examples of what this Greasemonkey script is designed to do

    Those familiar with the UNIX grep command and its common -v option know how indispensable grep is for extracting relevant lines of text from a file. Text files conforming to the UNIX tradition of simplicity generally store their text in a line-by-line format that makes it easy to find words close together. The -v option inverts the match: it prints only the lines where the specified text is not found.
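
    As a rough illustration of those semantics, the sketch below applies a grep-style filter and then a grep -v-style filter to an array of lines. The function names and sample lines are hypothetical and are not part of the user script built later in this article.

```javascript
// Hypothetical helpers, not part of greppishFind.user.js.
// grepLines mimics `grep word`; grepInvert mimics `grep -v word`.
function grepLines(lines, word) {
  return lines.filter(function (line) { return line.indexOf(word) !== -1; });
}

function grepInvert(lines, word) {
  return lines.filter(function (line) { return line.indexOf(word) === -1; });
}

// Rough equivalent of: grep DOM file.txt | grep -v hierarchy
var sampleLines = [
  'DOM inspector overview',
  'walking the DOM hierarchy',
  'unrelated line'
];
var matches = grepInvert(grepLines(sampleLines, 'DOM'), 'hierarchy');
console.log(matches);   // [ 'DOM inspector overview' ]
```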

    Unlike text files, Web pages generally divide text with tags and other markers rendered into lines by the browser. A wide variety of browser window sizes makes it difficult to isolate nearby text based on expected line positions. Tables, links, and other text markup also make it difficult to isolate text that is in the same "line."

    Algorithms in this article are designed to address some of these difficulties by providing a simple grep-like functionality piped to a function that works like grep's -v option. This allows the user to find a certain word of text, then only highlight entries where a different word is not nearby. Figure 1 shows what this can look like.


    Figure 1. Example of DOM and DOM hierarchy searches

    In the top portion of the image, the search text of "DOM" is highlighted by the script. In the bottom portion, notice how only the first three "DOM" entries are highlighted because the second search text of "hierarchy" is found in close proximity to the third "DOM."

    Consider Figure 2.


    Figure 2. Example of 2008 and 2008 PM searches

    The first portion of the image shows all the 2008 entries, while the second portion only shows the before-noon entries due to the -v keyword of PM. Read on for full details and further examples of how to implement this functionality.





    greppishFind.user.js Greasemonkey user script

    An introduction to the unique aspects of the Greasemonkey programming environment is beyond the scope of this article. Familiarity with Greasemonkey, including how to install, modify, and debug scripts, is assumed. Consult the Resources for more information about Greasemonkey and how to get started programming your own user scripts.

    Generally speaking, the greppishFind.user.js user script is started on a page load, provides a text area after a specific key combination is entered, and performs highlighting searches based on user-entered text. Listing 1 shows the beginning of the greppishFind.user.js user script.


    Listing 1. greppishFind.user.js program heading
    // ==UserScript==
    // @name          greppishFind
    // @namespace     IBM developerWorks
    // @description   grep and grep -v function-ish for one or two word searches
    // ==/UserScript==
    
    var boxAdded = false;       // user interface for search active
    var dist = 10;              // proximity distance between words
    
    var highStart = '<high>';   // begin and end highlight tags
    var highEnd   = '</high>';
    
    var lastSearch = null;      // previous highlight text
    
    window.addEventListener('load', addHighlightStyle, true);
    window.addEventListener('keyup', globalKeyPress, true);
    

    After defining the required metadata that describes the user script and its function, global variables, and highlighting tags, the load and keyup event listeners are added to process user-generated events. Listing 2 details the addHighlightStyle function called by the load event listener.


    Listing 2. addHighlightStyle function
    function addHighlightStyle(css)
    {
      var head = document.getElementsByTagName('head')[0];
      if( !head ) { return; }
    
      var style = document.createElement('style');
      var cssStr = "high {color: black; background-color: yellow; }";
      style.type = 'text/css';
      style.innerHTML = cssStr;
      head.appendChild(style);
    }//addHighlightStyle
    

    The function creates a new style node in the current DOM hierarchy with the appropriate highlighting information. In this case, it's a simple black-on-yellow text attribute. Listing 3 shows the code of the other event listener, globalKeyPress, as well as the boxKeyPress function.


    Listing 3. globalKeyPress, boxKeyPress functions
    function globalKeyPress(e)
    {
      // add the user interface text area and button, set focus and event listener
      if( boxAdded == false && e.altKey && e.keyCode == 61 )
      {
        boxAdded = true;
        var boxHtml = "<textarea wrap='virtual' id='sBoxArea' " +
                  "style='width:300px;height:20px'></textarea>" +
                  "<input name='btnHighlight' id='tboxButton' " +
                  "value='Highlight' type='submit'>";
        var tArea = document.createElement("div");
        tArea.innerHTML = boxHtml;
        document.body.insertBefore(tArea, document.body.firstChild);
    
        tArea = document.getElementById("sBoxArea");
        tArea.focus();
        tArea.addEventListener('keyup', boxKeyPress, true );
    
        var btn = document.getElementById("tboxButton");
        btn.addEventListener('mouseup', processSearch, true );
    
      }//if alt = pressed
    
    }//globalKeyPress
    
    function boxKeyPress(e)
    {
      if( e.keyCode != 13 ){ return; }
    
      var textarea = document.getElementById("sBoxArea");
      textarea.value = textarea.value.substring(0,textarea.value.length-1);
      processSearch();
    
    }//boxKeyPress
    

    Catching each keystroke and listening for a specific combination is the purpose of globalKeyPress. When the Alt+= keys are pressed (that is, hold Alt and press the = key), the user interface for the search box is added to the current DOM. This interface consists of a text area for entering the keywords and a Submit button. After the new items are added, the text area needs to be selected by the getElementById function to set the focus correctly. Event listeners are then added to process the keystrokes in the text area, as well as executing the search when the Submit button is clicked.
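
    The key test itself can be pulled out into a small predicate, which makes it easier to check without a browser. This is only a sketch: the function name is hypothetical, and accepting keyCode 187 in addition to the 61 used in Listing 3 is an assumption intended to cover browsers that report a different code for the = key.

```javascript
// Hypothetical helper: decides whether a keyup event is the Alt+= combination.
// keyCode 61 is the value used in Listing 3; 187 is an assumption covering
// browsers that report a different code for the = key.
function isActivationCombo(e) {
  var isEqualsKey = (e.keyCode === 61 || e.keyCode === 187);
  return Boolean(e.altKey) && isEqualsKey;
}

// Plain objects stand in for real keyboard events here.
console.log(isActivationCombo({ altKey: true,  keyCode: 61 }));   // true
console.log(isActivationCombo({ altKey: false, keyCode: 61 }));   // false
console.log(isActivationCombo({ altKey: true,  keyCode: 13 }));   // false
```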

    The second function in Listing 3 processes each keystroke in the text area. If the Enter key is pressed, the newline is removed from the text area's value and the processSearch function is executed. Listing 4 details the processSearch function.


    Listing 4. processSearch function
    function processSearch()
    {
      // remove any existing highlights
      if( lastSearch != null )
      {
        var splitResult = lastSearch.split( ' ' );
        removeIndicators( splitResult[0] );
      }//if last search exists
    
      var textarea = document.getElementById("sBoxArea");
    
      if( textarea.value.length > 0 )
      {
        var splitResult = textarea.value.split( ' ' );
        if( splitResult.length == 1 )
        { 
          oneWordSearch( splitResult[0] );
    
        }else if( splitResult.length == 2 )
        { 
          twoWordSearch( splitResult[0], splitResult[1] );
    
        }else
        { 
          textarea.value = "Only two words supported";
    
        }//if number of words
      }//if longer than required
    
      lastSearch = textarea.value;
    
    }//processSearch
    

    Each search is stored in the lastSearch variable so that its highlighting can be removed the next time processSearch is called. After the removal, the search query is highlighted using oneWordSearch if there is only one query word, or twoWordSearch if the grep -v functionality is desired. Listing 5 shows the details of the removeIndicators function.
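
    The word-count dispatch in processSearch can also be expressed as a pure function on the query string. The sketch below is hypothetical and differs from Listing 4 in one respect, which is an assumption on our part: it splits on runs of whitespace rather than a single space, so a doubled space or a stray newline does not count as an extra word.

```javascript
// Hypothetical helper: turns raw text-area content into a search request.
// Splitting on runs of whitespace (an assumption; Listing 4 splits on a
// single space) tolerates double spaces and a trailing newline from Enter.
function parseQuery(rawText) {
  var trimmed = rawText.replace(/^\s+|\s+$/g, '');
  if (trimmed.length === 0) { return { mode: 'empty', words: [] }; }

  var words = trimmed.split(/\s+/);
  if (words.length === 1) { return { mode: 'oneWord', words: words }; }
  if (words.length === 2) { return { mode: 'twoWord', words: words }; }
  return { mode: 'unsupported', words: words };
}

console.log(parseQuery('DOM\n').mode);            // oneWord
console.log(parseQuery('DOM  hierarchy').mode);   // twoWord
console.log(parseQuery('one two three').mode);    // unsupported
```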


    Listing 5. removeIndicators function
    function removeIndicators( textIn )
    {
      // use XPath to quickly extract all of the rendered text
      var textNodes = document.evaluate( '//text()', document, null,
                                         XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
                                         null );
    
      for (var i = 0; i < textNodes.snapshotLength; i++)
      {
        var textNode = textNodes.snapshotItem(i);
    
        if( textNode.data.indexOf( textIn ) != -1 )
        {
          // find the appropriate parent node with the innerHTML to be removed
          var getNode = getHtml( textNode );
          if( getNode != null )
          {
            var temp = getNode.parentNode.innerHTML;
            var reg = new RegExp( highStart, "g");
            temp = temp.replace( reg, "" );
    
            reg = new RegExp( highEnd, "g");
            temp = temp.replace( reg, "" );
            getNode.parentNode.innerHTML = temp;
    
          }//if correct parent found
    
        }//if word found
      }//for each text node
    
    }//removeIndicators
    

    Instead of traversing the DOM tree manually, removeIndicators uses XPath to extract the text nodes in the document quickly. If any of the text nodes contains the lastSearch text (the most recent highlighted word), getHtml finds the appropriate parent node, and the highlighting tags are removed from its markup. Note that reading innerHTML and assigning the result back to innerHTML in a single step will cause various issues, so the markup is first copied to a temporary variable. Listing 6 is the getHtml function that shows in detail how to find the appropriate parent node.
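
    The tag removal at the heart of removeIndicators is plain string work, so it can be tried outside the browser. The following sketch uses a hypothetical helper name and a single regular expression in place of the two replacements in Listing 5. It assumes the markup string contains the tags exactly as written in highStart and highEnd.

```javascript
// Hypothetical helper: removes every <high> and </high> tag from a markup
// string, leaving the enclosed text in place.
function stripHighlight(html) {
  return html.replace(/<\/?high>/g, '');
}

var marked = 'The <high>DOM</high> tree and the <high>DOM</high> inspector';
console.log(stripHighlight(marked));
// The DOM tree and the DOM inspector
```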


    Listing 6. getHtml function
    function getHtml( tempNode )
    {
      // walk up the tree to find the appropriate node
      var stop = 0;
    
      while( stop == 0 )
      {
        if( tempNode.parentNode != null &&
            tempNode.parentNode.innerHTML != null )
        {
          // make sure it contains the tags to be removed
          if( tempNode.parentNode.innerHTML.indexOf( highStart ) != -1 )
          {
    
            // make sure it's not the title or greppishFind UI node
            if( tempNode.parentNode.innerHTML.indexOf( "<title>" ) == -1 &&
                tempNode.parentNode.innerHTML.indexOf("btnHighlight") == -1)
            {
              return( tempNode );
    
            }else{ return(null); }
    
          // the highlight tags were not found, so go up the tree
          }else{ tempNode = tempNode.parentNode; }
    
        // stop the processing when the top of the tree is reached
        }else{ stop = 1; }
    
      }//while
      return( null );
    }//getHtml
    

    While walking up the DOM tree in search of the innerHTML with the highlighting tags inserted, it is important to disregard two specific nodes. The nodes containing title and btnHighlight should not be updated, as changes in these nodes cause the document to display incorrectly. When the correct node is found, however many levels up the DOM tree it sits, the node is returned and the highlighting removed. Listing 7 is the first of the functions that adds highlighting to the document.
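
    To see the upward walk in isolation, the sketch below repeats the logic of getHtml on plain objects instead of real DOM nodes. The function name and the object shape (only parentNode and innerHTML) are assumptions made for illustration.

```javascript
// Sketch of the upward walk in getHtml, using plain objects in place of DOM
// nodes. Each object only needs parentNode and innerHTML (assumed shape).
function findHighlightChild(node) {
  while (node.parentNode && node.parentNode.innerHTML != null) {
    var html = node.parentNode.innerHTML;
    if (html.indexOf('<high>') !== -1) {
      // Skip parents that hold the page title or the search UI.
      var isProtected = html.indexOf('<title>') !== -1 ||
                        html.indexOf('btnHighlight') !== -1;
      return isProtected ? null : node;
    }
    node = node.parentNode;   // tags not found yet, go up one level
  }
  return null;                // reached the top without finding the tags
}

var mockParagraph = { parentNode: null, innerHTML: 'The <high>DOM</high> tree' };
var mockTextNode  = { parentNode: mockParagraph };
console.log(findHighlightChild(mockTextNode) === mockTextNode);   // true
```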


    Listing 7. oneWordSearch function
    function oneWordSearch( textIn )
    {
      // use XPath to quickly extract all of the rendered text
      var textNodes = document.evaluate( '//text()', document, null,
                                         XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
                                         null );
    
      for (var i = 0; i < textNodes.snapshotLength; i++)
      {
        var textNode = textNodes.snapshotItem(i);
    
        if( textNode.data.indexOf( textIn ) != -1 )
        {
          highlightAll( textNode, textIn );
    
        }//if word found
      }//for each text node
    
    }//oneWordSearch
    

    Again using XPath, oneWordSearch processes each text node to find the query. When found, the highlightAll function is called, as shown in Listing 8.


    Listing 8. highlightAll function
    function highlightAll( nodeOne, textIn )
    {
      if( nodeOne.parentNode != null )
      {
        var full = nodeOne.parentNode.innerHTML;
        var reg = new RegExp( textIn, "g");
        full = full.replace(  reg,  highStart + textIn + highEnd );
        nodeOne.parentNode.innerHTML = full;
      }//if the parent node exists
    }//highlightAll
    
    function highlightOne( nodeOne, wordOne, wordTwo )
    {
      var oneIdx = nodeOne.data.indexOf( wordOne );
      var tempStr = nodeOne.data.substring( oneIdx + wordOne.length );
      var twoIdx = tempStr.indexOf( wordTwo );
    
      // only create the highlight if it's not too close
      if( twoIdx > dist )
      {
        var reg = new RegExp( wordOne );
        var start = nodeOne.parentNode.innerHTML.replace(  
          reg,  highStart + wordOne + highEnd 
        );
        nodeOne.parentNode.innerHTML = start;
      }//if the distance threshold exceeded
    }//highlightOne
    

    Similar to the removeIndicators function, highlightAll uses a regular expression to replace the text to be highlighted with markup, including the highlighting tags and the original text.
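
    One caveat with building a RegExp directly from user input is that characters such as . or ( are treated as pattern syntax rather than literal text. The sketch below shows one possible way to guard against that. Both helper names are hypothetical, and the escaping step is an addition that does not appear in Listing 8.

```javascript
// Hypothetical helper: escapes characters that have special meaning in a
// regular expression so the search term is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Wraps every literal occurrence of term in the highlight tags. A function
// is used as the replacement so that a $ in the term is not interpreted.
function highlightText(html, term) {
  var reg = new RegExp(escapeRegExp(term), 'g');
  return html.replace(reg, function (match) {
    return '<high>' + match + '</high>';
  });
}

console.log(highlightText('Version 1.0 and 1x0', '1.0'));
// Version <high>1.0</high> and 1x0
```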

    Function highlightOne, used later in the twoWordSearch function, checks that the first word is sufficiently far away from the second word, then performs the same replacement. Word distance checks need to take place in the rendered text as returned from the XPath statement; otherwise, various markup, such as <b>, will affect the distance calculations. Listing 9 shows the twoWordSearch function in detail.
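
    The distance test in highlightOne can be isolated as a pure function on the rendered text. This is a sketch with a hypothetical function name. Like the listing, it only looks for the second word after the first occurrence of the first word, so a second word that appears earlier in the text, or not at all, produces false.

```javascript
// Sketch of the proximity test in highlightOne, as a pure function on text.
// Returns true when wordTwo starts more than dist characters after the end
// of the first wordOne. As in the listing, a missing wordTwo (index -1)
// yields false.
function isFarEnough(text, wordOne, wordTwo, dist) {
  var oneIdx = text.indexOf(wordOne);
  if (oneIdx === -1) { return false; }
  var rest = text.substring(oneIdx + wordOne.length);
  return rest.indexOf(wordTwo) > dist;
}

console.log(isFarEnough('DOM hierarchy basics', 'DOM', 'hierarchy', 10));
// false: only one character separates the words
console.log(isFarEnough('DOM inspector shows the full hierarchy',
                        'DOM', 'hierarchy', 10));
// true
```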


    Listing 9. twoWordSearch function
    function twoWordSearch( wordOne, wordTwo )
    {
      // use XPath to extract all of the rendered text in document order,
      // which the proximity logic below relies on
      var textNodes = document.evaluate( '//text()', document, null,
                                         XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
                                         null );
      var nodeOne;
      var foundSingleNode = 0;
    
      for (var i = 0; i < textNodes.snapshotLength; i++)
      {
        var textNode = textNodes.snapshotItem(i);
    
        // if both words in the same node, highlight if not too close
        if( textNode.data.indexOf( wordOne ) != -1 &&
            textNode.data.indexOf( wordTwo ) != -1 )
        { 
          highlightOne( textNode, wordOne, wordTwo );
          foundSingleNode = 0;
          nodeOne = null;
        }else
        { 
          if( textNode.data.indexOf( wordOne ) != -1 )
          { 
            // if the first word is already found, highlight the entry
            if( foundSingleNode == 1  &&
                nodeOne.parentNode != null &&
                nodeOne.parentNode.innerHTML.indexOf( wordTwo ) == -1 )
            { 
              highlightAll( nodeOne, wordOne );
            }//if second word is not in the same parent node
    
            // record current node found 
            nodeOne = textNode;
            foundSingleNode = 1;
    
          }//if text match
    
          if( textNode.data.indexOf( wordTwo ) != -1 ){ foundSingleNode = 0; }
    
        }//if both words in single node
    
      }//for each text node
    
      // no second word nearby, highlight all entries
      if( foundSingleNode == 1 ){ highlightAll( nodeOne, wordOne ); }
    
    }//twoWordSearch
    

    Walking through each text node as retrieved from the XPath call is done the same way as in the oneWordSearch function. If both words are found within the current text node, the highlightOne function is called to highlight the instances of wordOne where it is sufficiently distant from wordTwo.

    If both words are not in the same node, the foundSingleNode variable is set on the first match of wordOne. On subsequent matches, the highlightAll function is called when wordOne is detected again before any node containing wordTwo. This ensures that instances of the first word with no second word nearby are still highlighted. After the loop exits, a final check runs highlightAll if the last wordOne match was isolated and still needs to be highlighted.
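
    The bookkeeping described above can be followed more easily on an array of strings standing in for the text nodes. The sketch below is a simplified, hypothetical model: it returns the indexes of the nodes that would be highlighted, and it omits the parent innerHTML check that Listing 9 performs before calling highlightAll.

```javascript
// Simplified model of twoWordSearch on an array of strings (one per text
// node, in document order). Returns the indexes that would be highlighted.
// Assumption: the parent-markup check from the listing is left out.
function selectNodesToHighlight(texts, wordOne, wordTwo, dist) {
  var selected = [];
  var pending = -1;   // index of a wordOne-only node awaiting a decision

  for (var i = 0; i < texts.length; i++) {
    var hasOne = texts[i].indexOf(wordOne) !== -1;
    var hasTwo = texts[i].indexOf(wordTwo) !== -1;

    if (hasOne && hasTwo) {
      // both words in one node: apply the character-distance test
      var start = texts[i].indexOf(wordOne) + wordOne.length;
      if (texts[i].substring(start).indexOf(wordTwo) > dist) {
        selected.push(i);
      }
      pending = -1;
    } else if (hasOne) {
      // wordOne seen again before any wordTwo: the earlier node is isolated
      if (pending !== -1) { selected.push(pending); }
      pending = i;
    } else if (hasTwo) {
      // wordTwo follows the pending node, so that node is not highlighted
      pending = -1;
    }
  }

  // final check after the loop for a trailing isolated match
  if (pending !== -1) { selected.push(pending); }
  return selected;
}

var nodeTexts = [
  'DOM basics', 'intro', 'DOM tools',
  'hierarchy view', 'DOM and the hierarchy', 'DOM reference'
];
console.log(selectNodesToHighlight(nodeTexts, 'DOM', 'hierarchy', 10));
// [ 0, 5 ]
```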

    Save the file created with the above code as greppishFind.user.js and read on for installation and usage details.





    Installing the greppishFind.user.js script

    Open your Firefox browser with the Greasemonkey V0.7 extension installed and enter the URL to the directory where greppishFind.user.js is located. Click on the greppishFind.user.js file and you should see the standard Greasemonkey install pop-up. Select Install, then reload the page to activate the script.

    Usage examples

    Once the greppishFind.user.js script is installed into Greasemonkey, you can mimic the examples shown in Figure 1 by entering dom inspector as a search query at www.google.com. When the results page appears, press Alt+= to activate the user interface. Type the query DOM (case-sensitive) and press Enter to see all entries of DOM highlighted. Change the query to DOM hierarchy, and you'll see how only the first three entries of DOM are highlighted, as shown in Figure 1.

    Choose a directory listing such as file:///home/ or file:///c:/ to show entries like those listed in Figure 2. You may want to experiment with changes to the distance parameter or highlighting style to achieve results tailored to your searches.

    Conclusion, further additions

    With the code above and your completed greppishFind.user.js program, you now have a baseline for implementing your own text-search capabilities in Firefox. Although this program focuses on specific cases of certain words appearing in close proximity to others, it provides a framework for further text-searching options.

    Consider adding color changes for highlighted words based on how close the secondary terms are. Expand the number of grep -v words to eliminate entries gradually. Use the code here and your own ideas to create new Greasemonkey user scripts that further enhance users' abilities to find text.
