    XML::Simple Module

    Author: 2007-08-10 11:45:13 From:

    This chapter describes:

    • Introduction to XML::Simple module.
    • Example Perl programs to use XML::Simple options.
    • Example Perl program to modify the parsed XML hash.

    If you need to know more about XML, please read my other book: "Herong's Notes on XML Technologies".

    XML::Simple Methods

    The XML::Simple module provides an easy API to read and write XML files. It offers two main methods: XMLin() and XMLout().

    XMLin(str) - Method to parse the XML input into a hash, and return a reference to the hash. The XML input can be specified in 3 ways:

    • If the method is called with no parameter, the XML input is read from script_name.xml, where script_name is the name of the calling Perl script file without its extension.
    • If the method is called with a string parameter containing <tag>, the string itself is the XML input.
    • If the method is called with a string parameter without any <tag>, the string is the name of the file that contains the XML input.

    XMLout(ref) - Method to write the hash pointed to by the specified reference into an XML string, and return the XML string.
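
    The two string forms of XMLin() described above can be sketched as follows. This is only a minimal illustration: the XML text and the temporary file are made up for the example, and the file is created with File::Temp so that the script does not depend on any existing file.

```perl
use strict;
use warnings;
use XML::Simple;
use File::Temp qw(tempfile);

# Made-up XML text used for both calls.
my $xml_text = '<user status="active"><first_name>Mike</first_name></user>';

# The argument contains a <tag>, so it is parsed as XML text.
my $from_string = XMLin($xml_text);

# The argument contains no <tag>, so it is treated as a file name.
my ($fh, $path) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$fh} $xml_text;
close $fh;
my $from_file = XMLin($path);

print "$from_string->{first_name} / $from_file->{first_name}\n";
```

    Both calls should produce the same hash, because only the way the input is located differs.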

    Here is a simple program to show you how to use XML::Simple:

    #- XmlSimpleHello.pl
    #- Copyright (c) 1999 by Dr. Herong Yang
    #
       use XML::Simple;
       my $xs = new XML::Simple();
       my $ref = $xs->XMLin("<p>Hello world!</p>");
       my $xml = $xs->XMLout($ref);
       print $xml;
       exit;
    

    Output:

    <opt>Hello world!</opt>
    

    Note that the <p> tag was changed to <opt> during the read and write operations. This is because the keeproot option is off by default: XMLin() discards the root element name, and XMLout() then writes the data under the default root name, "opt". I will explain some of the important options later in this chapter.
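
    As a quick check of the keeproot behavior, here is a small variation of the program above with "keeproot => 1" turned on. It is a sketch only, and I use the XML::Simple->new() call style here instead of "new XML::Simple()"; both create the same object.

```perl
use strict;
use warnings;
use XML::Simple;

# With keeproot => 1, the root element name is kept as the top-level hash key.
my $xs  = XML::Simple->new(keeproot => 1);
my $ref = $xs->XMLin('<p>Hello world!</p>');   # { p => 'Hello world!' }

# XMLout() then reuses that key as the root tag instead of "opt".
my $xml = $xs->XMLout($ref);
print $xml;
```

    With this option, the <p> tag should survive the read and write operations.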

    XML::Simple Options

    XML::Simple options can be specified in the new() method call:

       $xs = new XML::Simple(option1 => value, option2 => value, ...);
    

    Commonly used options are:

    1. keeproot => 1: Applies to both XMLin() and XMLout() to keep the root tag.

    2. searchpath => list: Applies to XMLin() to specify the directories to search for input XML files.

    3. forcearray => 1: Applies to XMLin() to force the content of every nested element to be represented as an array, even when the element occurs only once.

    4. suppressempty => 1 or '': Applies to XMLin(). With 1, empty elements are skipped; with '', they are represented as empty strings. By default, empty elements are represented as references to empty hashes, which makes them hard to access in the parsed hash.

    5. keyattr => list: Applies to XMLin() and XMLout() to name the attributes or sub-elements whose values are used as keys to promote the parent element from an array to a hash. Remember that there is a default list: "name", "key", and "id".

    "forcearray" Example - XmlSimpleArray.pl

    The following program shows you how to use the keeproot, searchpath, and forcearray options:

    #- XmlSimpleArray.pl
    #- Copyright (c) 1999 by Dr. Herong Yang
    #
       use XML::Simple;
       use Data::Dumper;
       my $xs = new XML::Simple(keeproot => 1,searchpath => ".");
       my $ref = $xs->XMLin("user.xml");
       my $xml = $xs->XMLout($ref);
       print "\nHash dump without 'forcearray => 1':\n";
       print Dumper($ref);
       print "\nXML output without 'forcearray => 1':\n";
       print $xml;
       # Reuse the variables declared above instead of redeclaring them with "my"
       $xs = new XML::Simple(keeproot => 1,searchpath => ".",
          forcearray => 1,);
       $ref = $xs->XMLin("user.xml");
       $xml = $xs->XMLout($ref);
       print "\nHash dump with 'forcearray => 1':\n";
       print Dumper($ref);
       print "\nXML output with 'forcearray => 1':\n";
       print $xml;
       exit;
    

    The input file, user.xml, has the following XML:

    <?xml version="1.0"?>
    <user status="active">
     <!-- This is not a real user. -->
     <first_name>Mike</first_name>
     <last_name>Lee</last_name>
    </user>
    

    Here is the output of the program:

    Hash dump without 'forcearray => 1':
    $VAR1 = {
              'user' => {
                          'first_name' => 'Mike',
                          'status' => 'active',
                          'last_name' => 'Lee'
                        }
            };
    
    XML output without 'forcearray => 1':
    <user first_name="Mike" status="active" last_name="Lee" />
    
    Hash dump with 'forcearray => 1':
    $VAR1 = {
              'user' => [
                          {
                            'first_name' => [
                                              'Mike'
                                            ],
                            'status' => 'active',
                            'last_name' => [
                                             'Lee'
                                           ]
                          }
                        ]
            };
    
    XML output with 'forcearray => 1':
    <user status="active">
      <first_name>Mike</first_name>
      <last_name>Lee</last_name>
    </user>
    

    A few interesting things to note here:

    • Remember to specify the directory with "searchpath" where the XML input file is, even if it is the current directory.
    • The ?xml statement in the XML input was ignored.
    • "keeproot" kept the root tag "user" correctly for us.
    • Without "forcearray => 1", child elements with only text contents were parsed as hash entries of keys and values, where the values are the text contents. But the attribute of the parent element was also parsed as a hash entry. So those child elements and the attribute cannot be distinguished once parsed into the hash structure. This is why the XML output without "forcearray => 1" wrote first_name and last_name as attributes.
    • If you want to maintain the difference between attributes and child elements with only text contents, you should use the "forcearray => 1" option.
    • The order of hash entries did not match the order of elements in the XML input.
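
    To make the difference concrete, the sketch below builds the two structures by hand, copied from the two hash dumps above, so it runs without user.xml. The variable names are mine and are only for illustration.

```perl
use strict;
use warnings;

# Structures copied from the two hash dumps above.
my $plain  = { user => { first_name => 'Mike', status => 'active', last_name => 'Lee' } };
my $forced = { user => [ { first_name => ['Mike'], status => 'active', last_name => ['Lee'] } ] };

# Without forcearray: one hash level per element, values are plain strings.
my $name_plain = $plain->{user}{first_name};

# With forcearray: every element adds an array level, so an index is needed.
my $name_forced = $forced->{user}[0]{first_name}[0];

# With forcearray, attributes stay plain strings while child elements are
# array references, so the two can be told apart with ref().
my $user       = $forced->{user}[0];
my @attributes = sort grep { !ref($user->{$_}) } keys %$user;
my @elements   = sort grep { ref($user->{$_}) eq 'ARRAY' } keys %$user;

print "attributes: @attributes\n";
print "elements:   @elements\n";
```

    The same test on the structure parsed without "forcearray => 1" would report all three entries as plain strings.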

    "suppressempty" Example - XmlSimpleEmpty.pl

    The following program shows you how to use the suppressempty option:

    #- XmlSimpleEmpty.pl
    #- Copyright (c) 1999 by Dr. Herong Yang
    #
       use XML::Simple;
       use Data::Dumper;
       my $xs = new XML::Simple(keeproot => 1,searchpath => ".",
          forcearray => 1,);
       my $ref = $xs->XMLin("system.xml");
       my $xml = $xs->XMLout($ref);
       print "\nHash dump without suppressempty => '':\n";
       print Dumper($ref);
       print "\nXML output without suppressempty => '':\n";
       print $xml;
       # Reuse the variables declared above instead of redeclaring them with "my"
       $xs = new XML::Simple(keeproot => 1,searchpath => ".",
          forcearray => 1, suppressempty => '');
       $ref = $xs->XMLin("system.xml");
       $xml = $xs->XMLout($ref);
       print "\nHash dump with suppressempty => '':\n";
       print Dumper($ref);
       print "\nXML output with suppressempty => '':\n";
       print $xml;
       exit;
    

    The input file, system.xml, has the following XML:

    <?xml version="1.0"?>
    <system>
     This is a testing system.
     <user status="active">
      <first_name>Mike</first_name>
      <last_name>Lee</last_name>
      <email>mike@lee.com</email>
     </user>
     <user>
      Missing first name and email.
      <first_name></first_name>
      <last_name>Wong</last_name>
      <email></email>
     </user>
     Needs to add more entries later.
    </system>
    
    Here is the output of the program: 
    Hash dump without suppressempty => '':
    $VAR1 = {
              'system' => [
                            {
                              'content' => [
                                             '
     This is a testing system.
     ',
                                             '
     Needs to add more entries later.
    '
                                           ],
                              'user' => [
                                          {
                                            'first_name' => [
                                                              'Mike'
                                                            ],
                                            'status' => 'active',
                                            'last_name' => [
                                                             'Lee'
                                                           ],
                                            'email' => [
                                                         'mike@lee.com'
                                                       ]
                                          },
                                          {
                                            'first_name' => [
                                                              {}
                                                            ],
                                            'last_name' => [
                                                             'Wong'
                                                           ],
                                            'content' => '
      Missing first name and email.
      ',
                                            'email' => [
                                                         {}
                                                       ]
                                          }
                                        ]
                            }
                          ]
            };
    
    XML output without suppressempty => '':
    <system>
      <content>
     This is a testing system.
     </content>
      <content>
     Needs to add more entries later.
    </content>
      <user status="active">
        <first_name>Mike</first_name>
        <last_name>Lee</last_name>
        <email>mike@lee.com</email>
      </user>
      <user>
      Missing first name and email.
      <first_name></first_name>
        <last_name>Wong</last_name>
        <email></email>
      </user>
    </system>
    
    Hash dump with suppressempty => '':
    $VAR1 = {
              'system' => [
                            {
                              'content' => [
                                             '
     This is a testing system.
     ',
                                             '
     Needs to add more entries later.
    '
                                           ],
                              'user' => [
                                          {
                                            'first_name' => [
                                                              'Mike'
                                                            ],
                                            'status' => 'active',
                                            'last_name' => [
                                                             'Lee'
                                                           ],
                                            'email' => [
                                                         'mike@lee.com'
                                                       ]
                                          },
                                          {
                                            'first_name' => [
                                                              ''
                                                            ],
                                            'last_name' => [
                                                             'Wong'
                                                           ],
                                            'content' => '
      Missing first name and email.
      ',
                                            'email' => [
                                                         ''
                                                       ]
                                          }
                                        ]
                            }
                          ]
            };
    
    XML output with suppressempty => '':
    <system>
      <content>
     This is a testing system.
     </content>
      <content>
     Needs to add more entries later.
    </content>
      <user status="active">
        <first_name>Mike</first_name>
        <last_name>Lee</last_name>
        <email>mike@lee.com</email>
      </user>
      <user>
      Missing first name and email.
      <first_name></first_name>
        <last_name>Wong</last_name>
        <email></email>
      </user>
    </system>
    
    A few interesting things to note here:
    • Text in mixed content was parsed into a hash entry with the key hard coded as "content".
    • Texts separated by child elements were parsed into a single hash entry with the value as a reference to an array of multiple entries.
    • Child elements with the same tag name were parsed into a single hash entry with the value as a reference to an array of multiple entries.
    • Child elements with the same tag name were grouped together, even if they were separated by other child elements in the XML input. This will change the order of child elements in the XML output.
    • Without "suppressempty => ''", empty elements were indeed parsed as empty hashes.
    • With "suppressempty => ''", empty elements were indeed parsed as empty strings.
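
    The last two points matter when you test for empty elements in your own code. The sketch below uses the second user entry copied from the two hash dumps above. The helper element_text() is a hypothetical function of mine, not part of XML::Simple.

```perl
use strict;
use warnings;

# Second <user> entry, copied from the two hash dumps above.
my $by_default = { first_name => [ {} ], last_name => ['Wong'], email => [ {} ] };
my $as_strings = { first_name => [''],   last_name => ['Wong'], email => [''] };

# Hypothetical helper: text of the first <$tag> child, mapping an empty hash to ''.
sub element_text {
    my ($user, $tag) = @_;
    my $value = $user->{$tag}[0];
    return '' if ref($value) eq 'HASH' && !%$value;
    return $value;
}

# A plain string test is misleading on the default structure, because the
# empty hash reference turns into a string like "HASH(0x...)".
my $naive_is_empty  = ($by_default->{email}[0] eq '') ? 1 : 0;
my $helper_is_empty = (element_text($by_default, 'email') eq '') ? 1 : 0;
my $string_is_empty = ($as_strings->{email}[0] eq '') ? 1 : 0;

print "naive=$naive_is_empty helper=$helper_is_empty string=$string_is_empty\n";
```

    With "suppressempty => ''" the plain string test is enough, and no helper is needed.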

    "keyattr" Example - XmlSimpleKey.pl

    keyattr => list: Applies to XMLin() and XMLout() to name the attributes or sub-elements whose values are used as keys to promote the parent element from an array to a hash. Remember that there is a default list: "name", "key", and "id".

    The following program shows you how to use the keyattr option:

    #- XmlSimpleKey.pl
    #- Copyright (c) 1999 by Dr. Herong Yang
    #
       use XML::Simple;
       use Data::Dumper;
       my $xs = new XML::Simple(keeproot => 1,searchpath => ".",
          forcearray => 1); # default is: keyattr => ['name', 'key', 'id']
       my $ref = $xs->XMLin("bank.xml");
       my $xml = $xs->XMLout($ref);
       print "\nHash dump with 'keyattr => [name, key, id]':\n";
       print Dumper($ref);
       print "\nXML output with 'keyattr => [name, key, id]':\n";
       print $xml;
       # Reuse the variables declared above; quote the key names instead of using barewords
       $xs = new XML::Simple(keeproot => 1,searchpath => ".",
          forcearray => 1, keyattr => ['key', 'tag']);
       $ref = $xs->XMLin("bank.xml");
       $xml = $xs->XMLout($ref);
       print "\nHash dump with 'keyattr => [key, tag]':\n";
       print Dumper($ref);
       print "\nXML output with 'keyattr => [key, tag]':\n";
       print $xml;
       exit;
    

    The input file, bank.xml, has the following XML:

    <?xml version="1.0"?>
    <bank>
     <account id="123-4567">
      <type>Checking</type>
      <balance>149.99</balance>
     </account>
     <client>
      <name>Mike Lee</name>
      <email>mike@lee.com</email>
     </client>
     <account>
      <id>333-4444</id>
      <type>Saving</type>
      <balance>941.99</balance>
     </account>
    </bank>
    

    Here is the output of the program:

    Hash dump with 'keyattr => [name, key, id]':
    $VAR1 = {
       'bank' => [
          {
             'account' => {
                'ARRAY(0x26426ec)' => {
                   'type' => [
                      'Saving'
                      ],
                   'balance' => [
                      '941.99'
                      ]
                   },
                '123-4567' => {
                   'type' => [
                      'Checking'
                      ],
                   'balance' => [
                      '149.99'
                      ]
                   }
                },
             'client' => {
                'ARRAY(0x2642680)' => {
                   'email' => [
                      'mike@lee.com'
                      ]
                   }
                }
             }
          ]
       };
    
    XML output with 'keyattr => [name, key, id]':
    <bank>
      <account name="ARRAY(0x26426ec)">
        <type>Saving</type>
        <balance>941.99</balance>
      </account>
      <account name="123-4567">
        <type>Checking</type>
        <balance>149.99</balance>
      </account>
      <client name="ARRAY(0x2642680)">
        <email>mike@lee.com</email>
      </client>
    </bank>
    
    Hash dump with 'keyattr => [key, tag]':
    $VAR1 = {
       'bank' => [
          {
             'account' => [
                 {
                   'id' => '123-4567',
                   'type' => [
                      'Checking'
                      ],
                   'balance' => [
                      '149.99'
                      ]
                   },
                {
                   'id' => [
                      '333-4444'
                      ],
                   'type' => [
                      'Saving'
                      ],
                   'balance' => [
                      '941.99'
                      ]
                   }
                ],
             'client' => [
                {
                   'email' => [
                      'mike@lee.com'
                      ],
                   'name' => [
                      'Mike Lee'
                      ]
                   }
                ]
             }
          ]
       };
    
    XML output with 'keyattr => [key, tag]':
    <bank>
      <account id="123-4567">
        <type>Checking</type>
        <balance>149.99</balance>
      </account>
      <account>
        <id>333-4444</id>
        <type>Saving</type>
        <balance>941.99</balance>
      </account>
      <client>
        <email>mike@lee.com</email>
        <name>Mike Lee</name>
      </client>
    </bank>
    

    Notes about "keyattr":

    • Be careful, there is a default setting for "keyattr": [name, key, id].
    • If an attribute is found in the key list, its name will be removed, and its value will be converted into a key in the parent hash.
    • Note that we have 'ARRAY(0x26426ec)' in the output. The reason is that bank.xml has sub-elements named "id" (in the second account) and "name" (in the client), and both names are listed in the default "keyattr". When a sub-element is found in the key list, it is treated like a key attribute. But this causes a problem with the value. With "forcearray => 1", the content of the sub-element is converted into an array first, and a reference to this array is then used as the key in the parent hash. Perl converts the reference into a string when it is used as a hash key. This is why you see 'ARRAY(0x26426ec)'.
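
    The 'ARRAY(0x...)' key is ordinary Perl behavior and can be reproduced without XML::Simple. The sketch below is my own illustration: it uses an array reference as a hash key by hand, which is what effectively happens to the content of the <name> sub-element.

```perl
use strict;
use warnings;

# Content of <name>, as parsed with forcearray => 1.
my $content = ['Mike Lee'];

# Using the array reference as a hash key turns it into a string.
my %client;
$client{$content} = { email => ['mike@lee.com'] };

my ($key) = keys %client;
print "$key\n";   # something like ARRAY(0x...), the address varies per run

# The key is now a plain string, so the original text cannot be recovered from it.
my $key_is_reference = ref($key) ? 1 : 0;
print "key is a reference: $key_is_reference\n";
```

    This is why the client name "Mike Lee" is missing from the first hash dump and from the first XML output.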

    Hash Modification Example - XmlSimpleHash.pl

    The following example shows you how to modify the hash produced by the parsing operation. The important thing to remember when accessing the contents of the hash is that, with "forcearray => 1", everything is parsed as an array or a hash. Hashes hold the tag names and attributes, and arrays hold their content.

    #- XmlSimpleHash.pl
    #- Copyright (c) 1999 by Dr. Herong Yang
    #
       use XML::Simple;
       use Data::Dumper;
       my $xs = new XML::Simple(keeproot => 1,searchpath => ".",
          forcearray => 1, suppressempty => '');
       my $ref = $xs->XMLin("system.xml");
       my $xml = $xs->XMLout($ref);
       print "\nHash dump:\n";
       print Dumper($ref);
       print "\nXML output:\n";
       print $xml;
       $ref->{system}->[0]->{user}->[1]->{first_name}->[0] = "Bill";
       $ref->{system}->[0]->{user}->[1]->{email}->[0] = "bill\@wong.com";
       $xml = $xs->XMLout($ref);
       print "\nUpdated XML output:\n";
       print $xml;
       exit;
    

    Output:

    Hash dump:
    $VAR1 = {
              'system' => [
                            {
                              'content' => [
                                             '
     This is a testing system.
     ',
                                             '
     Needs to add more entries later.
    '
                                           ],
                              'user' => [
                                          {
                                            'first_name' => [
                                                              'Mike'
                                                            ],
                                            'status' => 'active',
                                            'last_name' => [
                                                             'Lee'
                                                           ],
                                            'email' => [
                                                         'mike@lee.com'
                                                       ]
                                          },
                                          {
                                            'first_name' => [
                                                              ''
                                                            ],
                                            'last_name' => [
                                                             'Wong'
                                                           ],
                                            'content' => '
      Missing first name and email.
      ',
                                            'email' => [
                                                         ''
                                                       ]
                                          }
                                        ]
                            }
                          ]
            };
    
    XML output:
    <system>
      <content>
     This is a testing system.
     </content>
      <content>
     Needs to add more entries later.
    </content>
      <user status="active">
        <first_name>Mike</first_name>
        <last_name>Lee</last_name>
        <email>mike@lee.com</email>
      </user>
      <user>
      Missing first name and email.
      <first_name></first_name>
        <last_name>Wong</last_name>
        <email></email>
      </user>
    </system>
    
    Updated XML output:
    <system>
      <content>
     This is a testing system.
     </content>
      <content>
     Needs to add more entries later.
    </content>
      <user status="active">
        <first_name>Mike</first_name>
        <last_name>Lee</last_name>
        <email>mike@lee.com</email>
      </user>
      <user>
      Missing first name and email.
      <first_name>Bill</first_name>
        <last_name>Wong</last_name>
        <email>bill@wong.com</email>
      </user>
    </system>
    

    Conclusions:

    • XML::Simple is really simple to use.
    • Element tags are parsed into hash keys, with hash values pointing to arrays containing the elements' contents.
    • Attribute names are parsed into hash keys, with hash values being the attribute value strings.
    • Hashes and arrays are used as references in the resulting hash.
    • The order of child elements might be changed in the resulting hash.
    • Child elements of the same tag are grouped into an array.
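
    The access pattern summarized above can be sketched as a loop over the parsed structure. The hash below is a shortened copy of the system.xml dump (parsed with "forcearray => 1" and "suppressempty => ''"), typed in by hand so that the sketch runs on its own.

```perl
use strict;
use warnings;

# Shortened copy of the system.xml hash dump.
my $ref = { system => [ { user => [
    { status => 'active', first_name => ['Mike'], last_name => ['Lee'], email => ['mike@lee.com'] },
    { first_name => [''], last_name => ['Wong'], email => [''] },
] } ] };

# Hash key for the tag, then array index for the occurrence, at every level.
my @incomplete;
for my $user (@{ $ref->{system}[0]{user} }) {
    my @missing = grep { $user->{$_}[0] eq '' } qw(first_name email);
    push @incomplete, "$user->{last_name}[0]: @missing" if @missing;
}

print "$_\n" for @incomplete;
```

    A loop like this finds the entries to fix without hard coding the index [1], as XmlSimpleHash.pl does.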
