Xerces-C++ and libxml++ are not too bad, but I have never met a parser that I love. The main reason I chose Xerces was the painlessness of compiling and linking against the library; I really do not want to go through the bother of setting up libxml++ in MSVC, especially after taking a look at the pkg-config output on my workstation:
    FreeBSD$ pkg-config libxml++-2.6 --cflags --libs                    18:07
    -I/usr/local/include/libxml++-2.6 -I/usr/local/include/libxml++-2.6/include -I/usr/local/include/libxml2 -I/usr/local/include -I/usr/local/include/glibmm-2.4 -I/usr/local/lib/glibmm-2.4/include -I/usr/local/include/sigc++-2.0 -I/usr/local/lib/sigc++-2.0/include -I/usr/local/include/glib-2.0 -I/usr/local/lib/glib-2.0/include -L/usr/local/lib -lxml++-2.6 -lxml2 -lglibmm-2.4 -lgobject-2.0 -lsigc-2.0 -lglib-2.0
SAX, DOM, or whatever else: the parser style doesn't really matter to me that much, as long as it gets the job *done*. Although obviously, I am more familiar with DOMs (thank you JavaScript). I tend to use XML for storing structured data without having to resort to a binary file/database, or a jumble of files within a zip archive. So operations tend to be very straightforward, using a couple of glue functions.
Personally, my idea of fun XML parsing is to take data like this as input:
    <rootnode>
        <child1 attr="val">string of text</child1>
        <child1>
            <child2>another string of text</child2>
        </child1>
    </rootnode>
and to in turn receive a nested data structure like this as output:
    # example in Perl
    $structure = {
        node       => 'rootnode',
        attributes => undef,
        data       => [
            {
                node       => 'child1',
                attributes => { attr => 'val' },
                data       => 'string of text'
            },
            {
                node       => 'child1',
                attributes => undef,
                data       => [
                    {
                        node       => 'child2',
                        attributes => undef,
                        data       => 'another string of text'
                    }
                ]
            }
        ]
    };
Probably because that is how my brain sees the preceding XML xD.
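For the curious, here is a minimal sketch of what a glue function producing that structure could look like in plain Perl. To be clear, `parse_xml` is a hypothetical helper I am making up for illustration, not something from Xerces or libxml++, and it only copes with the tiny subset of XML shown above: elements, quoted attributes and text. No comments, CDATA, entities, self-closing tags or mixed content.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a tiny XML subset into nested
# { node, attributes, data } hashes like the structure above.
sub parse_xml {
    my ($xml) = @_;
    my @stack;    # elements that are still open, innermost last
    my $root;

    # Tokenise into tags (<...>) and the text runs between them.
    while ($xml =~ /\G(?:<([^>]+)>|([^<]+))/gc) {
        my ($tag, $text) = ($1, $2);

        if (defined $text) {
            next if $text !~ /\S/;    # whitespace between tags
            die "text outside of any element\n" unless @stack;
            $text =~ s/^\s+|\s+$//g;
            $stack[-1]{data} = $text;
        }
        elsif ($tag =~ m{^/\s*([\w:.-]+)\s*$}) {
            # Closing tag: it must match the innermost open element.
            my $name = $1;
            my $open = pop @stack or die "unexpected </$name>\n";
            die "mismatched </$name>\n" unless $open->{node} eq $name;
        }
        else {
            # Opening tag: element name followed by optional attributes.
            my ($name, $rest) = $tag =~ /^([\w:.-]+)(.*)$/s
                or die "bad tag <$tag>\n";
            my %attr;
            $attr{$1} = $3 while $rest =~ /([\w:.-]+)\s*=\s*(["'])(.*?)\2/g;

            my $elem = {
                node       => $name,
                attributes => (%attr ? \%attr : undef),
                data       => undef,
            };
            if (@stack) {
                my $parent = $stack[-1];
                $parent->{data} = [] unless ref $parent->{data} eq 'ARRAY';
                push @{ $parent->{data} }, $elem;
            }
            else {
                $root = $elem;
            }
            push @stack, $elem;
        }
    }

    die "could not tokenise the whole input\n"
        if (pos($xml) // 0) < length $xml;
    die "unclosed <$stack[-1]{node}>\n" if @stack;
    return $root;
}

my $parsed = parse_xml(
    '<rootnode> <child1 attr="val">string of text</child1> '
  . '<child1> <child2>another string of text</child2> </child1> </rootnode>'
);
print $parsed->{data}[0]{attributes}{attr}, "\n";    # val
print $parsed->{data}[1]{data}[0]{data}, "\n";       # another string of text
```

A stack of open elements is about the simplest thing that works here: each opening tag attaches itself to whatever is on top of the stack, and each closing tag pops it. For anything beyond toy input, a real parser is the better choice.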
Not to mention it makes writing something like a pretty printer easy as pi:
    # for some reason, writing this subroutine was very relaxing...
    sub pp_xml {
        my $xhr = shift;                       # hash ref for the current node
        my $depth = shift;                     # nesting level, 0 for the root
        my $indent = sub { "\t" x shift };     # one tab per level

        my $node = $xhr->{node} or warn "XML node has no data!\n";
        if ($xhr->{attributes}) {
            while (my ($attr, $val) = each %{$xhr->{attributes}}) {
                $node .= " " . $attr . "='" . $val . "'";
            }
        }
        print $indent->($depth), '<', $node, '>', "\n";

        $xhr = $xhr->{data};
        if (ref $xhr eq 'ARRAY') {
            # child nodes: recurse one level deeper
            pp_xml($_, $depth+1) foreach @$xhr;
        } else {
            # plain text content
            print $indent->($depth+1), $xhr, "\n";
        }
        # $node still carries the attributes here, so the closing tag repeats them
        print $indent->($depth), '</', $node, '>', "\n";
    }

    pp_xml($structure, 0);

Making it accept a callback indent function as a third argument is left as an exercise for others who are equally in need of R&R 8=).
    Terry@dixie$ perl -Mstrict /tmp/xml.pl -Mwarnings                   21:57
    <rootnode>
        <child1 attr='val'>
            string of text
        </child1 attr='val'>
        <child1>
            <child2>
                another string of text
            </child2>
        </child1>
    </rootnode>
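For anyone who would rather peek than do the exercise, here is one possible sketch of the pretty printer with an indent callback as the third argument. It is only one way to do it, and I took two liberties: the element name is kept separate from the attribute string, so the closing tag comes out as plain `</child1>`, and the attributes are sorted by name so the output is stable between runs.

```perl
use strict;
use warnings;

# Sketch: pp_xml with an optional indent callback as the third argument.
# The callback gets the depth and returns the prefix for that line.
sub pp_xml {
    my ($xhr, $depth, $indent) = @_;
    $depth  //= 0;
    $indent //= sub { "\t" x shift };    # default: one tab per level

    my $name = $xhr->{node} or warn "XML node has no data!\n";

    # Build the opening tag separately so $name stays attribute-free.
    my $open = $name;
    if ($xhr->{attributes}) {
        for my $attr (sort keys %{ $xhr->{attributes} }) {
            $open .= " " . $attr . "='" . $xhr->{attributes}{$attr} . "'";
        }
    }
    print $indent->($depth), '<', $open, '>', "\n";

    my $data = $xhr->{data};
    if (ref $data eq 'ARRAY') {
        # pass the callback down so every level uses the same style
        pp_xml($_, $depth + 1, $indent) foreach @$data;
    }
    else {
        print $indent->($depth + 1), $data, "\n";
    }
    print $indent->($depth), '</', $name, '>', "\n";
}

# Two spaces per level instead of a tab.
my $doc = {
    node       => 'child1',
    attributes => { attr => 'val' },
    data       => 'string of text',
};
pp_xml($doc, 0, sub { '  ' x shift });
# <child1 attr='val'>
#   string of text
# </child1>
```

Calling it with only two arguments falls back to the tab-per-level behaviour of the original subroutine.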