Archive for category XML
Back in August 2009, I promised to tell you more about my experience using Django for a web application in front of a web services interface to the backend data store. Now that the code for the Trident Project has been released, I can be more specific and point you to the code if you’d like to explore it further (yes, yes, I’m behind on documentation).
Initially I tried to use Django models and managers because I think the APIs are elegant, and of course there’s the DRY principle. I knew I wanted an object API — no way was the web app going to deal with raw XML. Django 1.1’s “unmanaged models” opened the door, but the deeper I went down the rabbit hole, the more I came to feel that I would have to bend the API way out of shape, if it was even possible. Ultimately, Django’s API is too tightly coupled to SQL backends (I’m not up on Google AppEngine and django-nonrel).
So, ultimately I broke it down this way. There are three layers in the client code:
- A “middleware” layer that handles the basic HTTP request/response cycle with the RESTful web services. At this layer I have used httplib and pycurl.
- An object layer (which I call “entities” because they model the backend objects, which are referred to as entities). This layer handles calls to the middleware and marshalling the response data, and applying some lazy techniques. This layer is not coupled with Django and can be used on its own — very conveniently, for example from the Python interactive interpreter — or underneath another web framework.
- The Django web application layer which deals with the backend system exclusively through the object layer.
This is a work in progress, and needs a lot of refinement, but I’m pretty happy with how it functions by keeping the those three distinct concerns cleanly separated.
I’d love to hear how others may be using Django in similar ways.
I’m posting this recipe here because I had a hard time finding the information (eventually found in an lxml-dev thread).
So, you have an XML document which you’ve parsed with lxml.etree. The doc has an xml-stylesheet processing instruction, and you want to apply the stylesheet to the doc.
>>> from lxml import etree >>> doc = etree.parse('C:/Temp/report.xml') >>> docroot = doc.getroot() >>> pi = docroot.getprevious() >>> if isinstance(pi, etree._XSLTProcessingInstruction): xslt = pi.parseXSL() >>> transform = etree.XSLT(xslt) >>> result_tree = transform(doc)
These commands from the Python interactive interpreter just demonstrate the necessary steps. Obviously, there are other considerations to factor into actual code, such as security and exception handling. Also, we are assuming here that the XSLT PI, if present, occurs immediately before the root node of the document.
Working with XML isn’t my favorite task. Sure, I get why it’s useful for data transfer, but I’d rather deal with an interface that hides the gory details … meaning the raw XML. Of course, sometimes that isn’t possible. The Python standard library modules as of version 2.4 only offers the DOM and SAX APIs, which, while useful, just don’t offer enough power and flexibility to do ad hoc XML processing without significant pain, at least for the casual user. Python 2.5 adds the ElementTree API (xml.etree.ElementTree), but we’re still missing full XPath and XSLT support.
In steps lxml, a library that really makes everything else obsolete. Prior to lxml there was 4Suite, but its latest release was posted in December 2006 and the project appears to be dead. Fortunately, lxml runs on Python 2.3 or later (I’m stuck on 2.4 for the time being). The only hiccup, depending on your environment, can be getting the C dependencies installed – libxml and libxslt. But if you’re going to do some serious work with XML, it’s worth the effort to get lxml installed. You won’t want to go back.
>>> xmlsrc = '<root><element>text</element></root>' >>> from xml.dom.minidom import parseString >>> parseString(xmlsrc).getElementsByTagName('element').firstChild.nodeValue u'text'
>>> xml = '<root><element>text</element></root>' >>> from lxml import etree >>> etree.fromstring(xml).findtext('//element') 'text'
Ahh … better.