Monday, April 30, 2007

A quick Python recipe for a validating XML parser

ElementTree has been part of the Python standard library since version 2.5, but good as it is, it has no support for XML Schema validation and only limited support for XPath. For those you need lxml, which builds on the foundation of the ElementTree API.
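To make "limited support for XPath" concrete, here is a small sketch using only the standard library. It assumes a more recent Python (2.7 or 3.x), where ElementTree's findall() accepts a small XPath subset such as descendant searches and simple attribute-equality predicates; the element and attribute names are made up for illustration. General XPath expressions (functions like count(), numeric comparisons) and schema validation still need lxml.

```python
import xml.etree.ElementTree as ET

# A small inline document; the element and attribute names are
# hypothetical, chosen only for illustration.
DOC = """<order>
  <item><amount value="7"/></item>
  <item><amount value="3"/></item>
  <amount value="7"/>
</order>"""

root = ET.fromstring(DOC)

# findall() understands a limited XPath subset:
# descendant search (.//tag) ...
all_amounts = root.findall(".//amount")

# ... and simple attribute-equality predicates (string comparison)
sevens = root.findall(".//amount[@value='7']")

print(len(all_amounts), len(sevens))  # 3 2
```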

Here are a few lines of Python that use lxml to validate an XML document against a schema:

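from lxml import etree

# Parse the schema document
xsd = etree.parse('schema.xsd')

# Build an XMLSchema object from the parsed document
xsv = etree.XMLSchema(xsd)

# Parse the instance document and validate it against the schema;
# validate() returns True or False
doc = etree.parse('doc.xml')
if xsv.validate(doc):
    print('doc.xml is valid')
else:
    # error_log holds libxml2's messages from the last validation
    print(xsv.error_log)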

And that's it!
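If you want to see why a document failed, or don't have the schema and document as files, you can build both from strings and inspect the schema's error log. A minimal sketch, assuming a made-up schema with a single `amount` element carrying a required integer `value` attribute:

```python
from lxml import etree

# Hypothetical schema for illustration: a root <amount> element with a
# required integer "value" attribute and no content.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="amount">
    <xs:complexType>
      <xs:attribute name="value" type="xs:integer" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(XSD))

good = etree.fromstring(b'<amount value="7"/>')
bad = etree.fromstring(b'<amount value="seven"/>')

# validate() returns a bool and records any problems in schema.error_log
print(schema.validate(good))  # True
print(schema.validate(bad))   # False
for error in schema.error_log:
    print(error.line, error.message)

# assertValid() raises DocumentInvalid instead of returning False
try:
    schema.assertValid(bad)
except etree.DocumentInvalid as exc:
    print('invalid:', exc)
```

Another option is to hand the schema to a parser with `etree.XMLParser(schema=schema)`; documents are then validated as they are parsed, and an invalid one raises `XMLSyntaxError`.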

If you also want to run XPath queries, here are a few examples:

# continuing from above

# Find all elements named amount
nodes = doc.xpath('//amount')

# Find all amount elements whose value attribute equals 7
nodes = doc.xpath('//amount[@value=7]')

# Need a namespace? Supply a prefix-to-URI dictionary
nodes = doc.xpath('//cdf:amount', namespaces={'cdf': 'http://uri.namespace.org/1.0'})
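One XPath detail worth spelling out: `[@value=7]` tests an attribute, while `[value=7]` tests the text of a child element named `value`. The sketch below runs both against a small inline document, plus a namespaced query. The document is made up for illustration, and the namespace URI is the same placeholder used above.

```python
from lxml import etree

# A made-up document: one <amount> uses a value attribute, another a value
# child element, and a third lives in the (placeholder) cdf namespace.
DOC = b"""<root xmlns:cdf="http://uri.namespace.org/1.0">
  <amount value="7"/>
  <amount><value>7</value></amount>
  <cdf:amount value="7"/>
</root>"""

doc = etree.ElementTree(etree.fromstring(DOC))

# [@value=7] tests an attribute; [value=7] tests a child element's text.
# Unprefixed names only match elements in no namespace.
by_attr = doc.xpath('//amount[@value=7]')
by_child = doc.xpath('//amount[value=7]')

# Prefixed names need a prefix-to-URI mapping; the prefix in the query only
# has to match this mapping, not the prefix used in the document
NS = {'cdf': 'http://uri.namespace.org/1.0'}
namespaced = doc.xpath('//cdf:amount', namespaces=NS)

print(len(by_attr), len(by_child), len(namespaced))  # 1 1 1
```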


Later: lxml uses libxml2 under the hood to do its magic, and apparently there are some bugs: validating XCCDF documents generates errors. That forced me to actually build a schema validator in C++, which was kind of useful, seeing as that's what I was supposed to be doing in the first place.

Sunday, April 29, 2007

What programming language are you?

OK, a little geeky, but I saw this on Steve Holden's blog and thought it was amusing. Take a short quiz and find out which programming language you are most like. I resemble Perl (Steve is Smalltalk). Let me know if you are Python.

You are Perl. People have a hard time understanding you, but you are always able to help them with almost all of their problems.
Which Programming Language are You?

First, pass the post

I thought I'd start a blog about my use of Python and related things, so here I am. Getting started is proving difficult, but I figure that if I start typing and then edit, something might come out.

Occasionally (free time has been scarcer since the arrival of baby Meghan), I attend the Dallas-Fort Worth Pythoneers' Saturday sprints at Nerdbooks in Richardson.

Yesterday's meeting covered a range of topics: a programming challenge; extending Python via ctypes, Pyrex, SWIG and Boost; population simulations in Python; creating S5 presentations using reStructuredText; and the usual variety of odd conversation topics.