Timisoara.pm

Timişoara Perl Mongers

XML::Simple is good enough for simple tasks (e.g. config files), but if you want to get down to business you might prefer XML::LibXML.

The distribution is both an XML parser (and validator) and an XML generator. However, the roles are not clearly separated and the methods are not orthogonal. The docs are OK, but unwieldy – there are many classes, and the methods are listed in the synopsis instead of the TOC, so you have to grep (or search in the browser) to get to a method’s description. The method you’re looking for might be hidden in one of the many classes, in between other methods that have little to do with your purpose. In this series I’ll try to point out and illustrate the methods for the most common tasks.

What’s what

In the XML::LibXML namespace we have:

  • ::Document – this holds everything. It lets you set the XML version, encoding and compression, create nodes, set the root node and, last but not least, validate and serialize your document into a string
  • ::Node – the base class for XML nodes. You never create a node directly (you mostly create elements), but it has many methods which you can use
  • ::Element – this class represents an XML element. It inherits its methods from ::Node and is your workhorse for XML building
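To make the inheritance between these classes concrete, here is a minimal sketch (the variable names are mine, purely for illustration). It only checks what class an element created by the document belongs to and that a ::Node method works on it.

```perl
use strict;
use warnings;
use XML::LibXML;

# The document is the factory for everything else.
my $doc     = XML::LibXML::Document->new( '1.0', 'UTF-8' );
my $element = $doc->createElement('x');

# The object we get back is an ::Element ...
print ref($element), "\n";

# ... and an ::Element is also a ::Node, so ::Node methods work on it.
print $element->isa('XML::LibXML::Node') ? "is a node\n" : "not a node\n";
print $element->nodeName, "\n";    # nodeName is documented under ::Node
```

So when a method seems to be missing from the ::Element docs, the ::Node docs are the next place to look.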

First Example

use strict; use warnings;
use XML::LibXML;
my $version = '1.0'; # is there any other?
my $encoding = 'UTF-8'; # cause it's the standard, that's why!
my $doc = XML::LibXML::Document->new( $version, $encoding ); # my XML DOM

# Element creation
my $x = $doc->createElement( 'x' ); # <x/> element
my $y = $doc->createElement( 'y' ); # <y/> element

# Attributes
$x->setAttribute( 'lang', 'en' ); # <x lang="en" />
$y->setAttribute( 'baz', 1 ); # <y baz="1" />

# XML structure and text nodes
$x->appendChild( $y ); # <x><y/></x>
$y->appendTextNode( 'foo' ); # <y>foo</y>
$x->appendTextChild( 'name', 'bar' ); # <x>...<name>bar</name></x>

$doc->setDocumentElement( $x ); # make <x/> the root element

print $doc->toString( 1 ); # serialize it

This is the result:

<?xml version="1.0" encoding="UTF-8"?>
<x lang="en">
  <y baz="1">foo</y>
  <name>bar</name>
</x>

What happened?

The first step is to create a DOM document object (line 5). I like to think of it as a context object for all things XML. Then I create two element nodes (lines 8, 9). You can create them using the document object or directly from their class (XML::LibXML::Element->new).
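Here is a small sketch of the two ways of creating an element side by side. The variable names are made up for the example; the point is only that an element built straight from its class can be attached to one built through the document.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML::Document->new( '1.0', 'UTF-8' );

# Way 1: ask the document object for a new element.
my $via_doc = $doc->createElement('x');

# Way 2: call the element class constructor directly.
my $via_class = XML::LibXML::Element->new('y');

# Both are ordinary elements and can be combined.
$via_doc->appendChild($via_class);
print $via_doc->toString, "\n";    # <x><y/></x>
```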

After that I add an attribute to each node (lines 12, 13). setAttribute is a short-cut method: it creates an attribute node with the given name and content and appends it to the element. You can also do this step by step, using the XML::LibXML::Attr class.
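For the curious, the step-by-step route might look something like the sketch below. I let the document create the attribute node (createAttribute hands back an XML::LibXML::Attr object) and then attach it with setAttributeNode; treat it as one possible way of spelling it out, not the only one.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML::Document->new( '1.0', 'UTF-8' );
my $x   = $doc->createElement('x');

# Step 1: create the attribute node on its own.
my $attr = $doc->createAttribute( 'lang', 'en' );

# Step 2: attach it to the element.
$x->setAttributeNode($attr);

print ref($attr), "\n";       # XML::LibXML::Attr
print $x->toString, "\n";     # <x lang="en"/>
```

The end result is the same as the one-liner $x->setAttribute( 'lang', 'en' ).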

Then I arrange the nodes so that <y/> is the child node of <x/> (line 16). A text node is created and appended to <y/> with another short-cut method (line 17). I use yet another short-cut to create two nodes at once, a child element node (<name/>) and a text node as its child, and append them to <x/> (line 18).

The last step in building my XML is to declare <x/> as the document (or root) element, i.e. the top-most element in the document (line 20).

Then I serialize and print the whole thing out (line 22). The parameter passed to toString determines how compact the serialization is:
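To show what the two short-cuts save you, here is the same structure built the long way: a rough sketch that spells out appendTextNode and appendTextChild with createTextNode, createElement and appendChild.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML::Document->new( '1.0', 'UTF-8' );
my $x   = $doc->createElement('x');
my $y   = $doc->createElement('y');
$x->appendChild($y);

# Long form of $y->appendTextNode('foo'):
my $text = $doc->createTextNode('foo');
$y->appendChild($text);

# Long form of $x->appendTextChild( 'name', 'bar' ):
my $name = $doc->createElement('name');
$name->appendChild( $doc->createTextNode('bar') );
$x->appendChild($name);

print $x->toString, "\n";    # <x><y>foo</y><name>bar</name></x>
```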

  • 0 – no line breaks or indentation
  • 1 – line breaks and indentation for nested element nodes
  • 2 – line breaks for element and text nodes and indentation for nested element nodes (it’s like 1, but text nodes are printed on separate lines from element nodes)
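If you want to see the difference for yourself, a throwaway script along these lines prints the same tiny document with each of the three values. The document content is just an example of mine.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML::Document->new( '1.0', 'UTF-8' );
my $x   = $doc->createElement('x');
$x->appendTextChild( 'y', 'foo' );    # <x><y>foo</y></x>
$doc->setDocumentElement($x);

# Serialize the same document once per format value.
for my $format ( 0, 1, 2 ) {
    print "--- toString( $format ) ---\n";
    print $doc->toString($format);
}
```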

One last note: the order of the steps does not matter. I could’ve added the attributes last and set the document element first thing, and the end result would be the same. Well, at least as long as you don’t look up nodes before they are added. For example, you can’t refer to <x/> with the $doc->documentElement method before you add it as the document element.
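A quick sketch of both halves of that note: asking for the document element too early gets you nothing, while adding an attribute after the element has become the root works just fine. ($before is a helper variable I made up for the demonstration.)

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML::Document->new( '1.0', 'UTF-8' );
my $x   = $doc->createElement('x');

# Too early: no document element has been set yet.
my $before = $doc->documentElement;
print defined $before ? "root is set\n" : "no root yet\n";

# Set the root first, decorate it afterwards.
$doc->setDocumentElement($x);
$x->setAttribute( 'lang', 'en' );

print $doc->documentElement->toString, "\n";    # <x lang="en"/>
```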

Most of you are familiar with CPAN. It’s everything you wanted. It’s got recursive parser generators, ORMs, XML libraries, advanced OO systems, network servers and games.

You are probably also familiar with its command line client, which is also called cpan. It’s not nice. There’s a more advanced client, cpanp (cpan plus), but it’s not better either. Both are very nosy and pushy: they ask you all kinds of intimate details, like where you live, which mirror you like to gaze into, whether you’d like french fries with your coffee and whether you’d like extra flags for your compiler (no, I don’t and I guess I never will, thank you, and would you please just install this module!). Then they make small talk with you:

Oooh, I’m fetching an index, look at me, now I’m compiling your module, and holy shit, here comes a dependency, look I’m compiling it too, ain’t I just great?

But fear not, now there’s another alternative, cpanm (a.k.a. cpan minus). Minus stands for less talk, more work. More work means: zero configuration, automatic dependency resolution, local::lib support, and if you want it bad enough, even zero install.

Besides installing modules, it can upgrade itself, install just the dependencies of a distribution without building the distribution itself, or just download and unpack the source.

All software sucks though, so while it works for 99% of the people, it still might fail on complex modules or apps. For me, it failed a Padre install, namely it choked on the Wx library from what was probably a wx toolkit version mismatch.

Still, in the spirit of Worse is Better, cpanm is my new CPAN client of choice.

Try it, and if you like it, go and vote for your favorite features.

Sometimes I can’t be bothered to install cpanm. Luckily there is such a thing as zero-install cpanm:

wget http://cpanmin.us/ -O - | perl - My::Module

This uses wget to fetch the latest cpanm source (from the easy to remember http://cpanmin.us URL redirect, or from the equivalent http://xrl.us/cpanm) and spits it out to STDOUT, which in turn is piped to perl for execution (additional options and command line arguments must be placed after perl’s - argument).

If you don’t like wget’s progress bar and messages, just tell it to be --quiet:

wget http://cpanmin.us/ --quiet -O - | perl - My::Module

Be warned however that this is fetching the whole source (which currently weighs in at approx. 55 KB). Thus it’s kinda wasteful of network bandwidth if you use it constantly, so consider installing it (either for local::lib or for your whole system).

Don’t you hate it when you’re just typing away at your command line, and you think “Who the hell uses these classes called Lights::On and Lights::Out?”, and you just reflexively type:

project$ grep -rE "use Lights::O" lib

and then it hits you like a brick wall of text:

lib/Bed/Time.pm:use Lights::Out;
lib/Bed/.svn/text-base/Time.pm.svn-base:use Lights::Out;
lib/Brownout.pm:use Lights::Out;
lib/Sinners/Sloth.pm:use Lights::On;
lib/Sinners/Lust.pm:use Lights::On;
lib/Sinners/Wrath.pm:use Lights::On;
lib/Sinners/Gluttony.pm:use Lights::On;
lib/Sinners/Greed.pm:use Lights::On;
lib/Sinners/.svn/text-base/Lust.pm.svn-base:use Lights::On;
lib/Sinners/.svn/text-base/Wrath.pm.svn-base:use Lights::On;
lib/Sinners/.svn/text-base/Pride.pm.svn-base:use Lights::On;
lib/Sinners/.svn/text-base/Envy.pm.svn-base:use Lights::On;
lib/Sinners/.svn/text-base/Sloth.pm.svn-base:use Lights::On;
lib/Sinners/.svn/text-base/Greed.pm.svn-base:use Lights::On;
lib/Sinners/.svn/text-base/Gluttony.pm.svn-base:use Lights::On;
lib/Sinners/Pride.pm:use Lights::On;
lib/Sinners/Envy.pm:use Lights::On;
lib/Blackout.pm:use Lights::Out;
lib/.svn/text-base/Brownout.pm.svn-base:use Lights::Out;
lib/.svn/text-base/Blackout.pm.svn-base:use Lights::Out;

Ugly as sin, ain’t it? Stupid SOB, that grep, can’t he see these are subversion files you have absolutely no interest in? Well, as a matter of fact, he’s so stupid, he can’t.

That’s why you, smart Perl Monger that you are, fall back to ack!

Because ack is better than grep! Yep, it’s so smart it almost hurts! Just try it:

project$ ack "use Lights::O"

No -r (we know you want -r every time), and it skips those pesky svn base files as well as backup files, binary files, even core dumps. For your convenience it prints line numbers and highlights the search terms for you (except when you pipe or redirect the output). Because it’s written in Perl, it comes with all the goodies of Perl regexes (no more -E or puzzling over extended vs. basic regexes and which metacharacters to escape).

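There’s one common catch though. Unlike grep, ack (as of the 1.x releases) will by default only search known file types, so if you’ve got some weird files (let’s say .wango files) that ack doesn’t know about, they will be silently skipped. (From ack 2.0 on, all text files are searched by default and the -a option is gone.)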

But don’t despair:

  • you can make it display a list of file types it knows about:
    ack --help=types
  • you can force it to search unknown file types too (subversion funky files will still be skipped):
    ack -a "use Lights::O"
  • even better, you can make “search unknown files too” your default:
    echo "--all" >> ~/.ackrc

There is a standalone version, which requires only Perl. Yes, it runs on Windows too (it’s pure Perl): just install Strawberry Perl, run cpan App::Ack and you’re good to go (provided you’ve got all your paths set up).

And it comes with a cat: ack --thpppt

How about we meet somewhere in town once a month (e.g. on the first Friday of each month)?

Any suggestions for the date and the venue?

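use strict;
use warnings;

my $file = '/files/example.xml';
my $xml;
# Use a bare block so the change to the input record separator stays local.
{
  local $/ = undef;    # slurp mode: read the whole file in one go
  open( my $fh, '<', $file ) or die "Could not open $file: $!";
  binmode $fh;         # read raw bytes, no newline or encoding translation
  $xml = <$fh>;
  close $fh;
}
print $xml;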


Its name is Padre. A new Perl IDE.

Padre is a Perl IDE (Integrated Development Environment), or in other words a text editor that is simple to use for new Perl programmers but also supports large multi-lingual and multi-technology projects.

You can fetch it from here: http://padre.perlide.org/download.html

Perl NEVER DIES!

It’s just pining for the fjords.