Posts Tagged pycurl

pycurl CurlMulti example

I needed a process to perform multiple web services calls and return the combined results. Efficiency was fairly important, so I needed an asynchronous solution. I had used pycurl previously, but not in this fashion, so CurlMulti was new to me. Now, I wouldn’t use pycurl where urllib/urllib2 or httplib will do, but this is just such a case. The reason I’m posting my code (modified to remove some inessential peculiarities) is that I had trouble finding a good example. The pycurl docs only give a trivial example of CurlMulti usage involving one handle (!) and no provision for marshaling response data. I briefly considered using urllib2 and threading, but I’d rather leave thread management to the experts.

import pycurl
from cStringIO import StringIO

urls = [...] # list of urls
# reqs: List of individual requests.
# Each list element will be a 3-tuple of url (string), response string buffer
# (cStringIO.StringIO), and request handle (pycurl.Curl object).
reqs = [] 

# Build multi-request object.
m = pycurl.CurlMulti()
for url in urls: 
    response = StringIO()
    handle = pycurl.Curl()
    handle.setopt(pycurl.URL, url)
    handle.setopt(pycurl.WRITEFUNCTION, response.write)
    req = (url, response, handle)
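    # Keep a reference to the handle (here, inside the req tuple) for as
    # long as the multi object is using it. Older pycurl releases do not
    # hold a reference themselves, so an otherwise unreferenced handle
    # could be garbage-collected mid-transfer.
    m.add_handle(handle)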
    reqs.append(req)

# Perform multi-request.
# This code copied from pycurl docs, modified to explicitly
# set num_handles before the outer while loop.
SELECT_TIMEOUT = 1.0
num_handles = len(reqs)
while num_handles:
    ret = m.select(SELECT_TIMEOUT)
    if ret == -1:
        continue
    while 1:
        ret, num_handles = m.perform()
        if ret != pycurl.E_CALL_MULTI_PERFORM: 
            break

for req in reqs:
    # req[1].getvalue() contains response content
    ...
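
The loop above ends when no transfers remain, but it does not say which of them failed. Here is a minimal sketch of one way to gather the outcomes. It assumes pycurl's CurlMulti.info_read(), which returns the number of messages still queued, a list of handles that finished successfully, and a list of (handle, errno, errmsg) tuples for those that failed. The function name collect_results and the layout of its return value are hypothetical, not part of pycurl.

```python
def collect_results(multi, reqs):
    """Pair each request's URL and body with its transfer error, if any.

    multi: the pycurl.CurlMulti object, after the perform loop has finished.
    reqs:  the list of (url, response buffer, handle) tuples built earlier.
    Returns a list of (url, body, error) tuples, where error is None on
    success or an (errno, errmsg) pair reported by libcurl.
    """
    errors = {}
    while True:
        # info_read() drains libcurl's message queue one batch at a time.
        num_queued, ok_handles, failed = multi.info_read()
        for handle, errno, errmsg in failed:
            errors[id(handle)] = (errno, errmsg)
        if num_queued == 0:
            break
    results = []
    for url, response, handle in reqs:
        results.append((url, response.getvalue(), errors.get(id(handle))))
    return results
```

After the perform loop this would be called as collect_results(m, reqs). Note that a transfer with no libcurl error can still carry an HTTP error status; that is a separate check, for example with handle.getinfo(pycurl.HTTP_CODE).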

Django and Web Services, part 2

Back in August 2009, I promised to tell you more about my experience using Django for a web application in front of a web services interface to the backend data store.  Now that the code for the Trident Project has been released, I can be more specific and point you to the code if you’d like to explore it further (yes, yes, I’m behind on documentation).

Initially I tried to use Django models and managers because I think the APIs are elegant, and of course there’s the DRY principle.  I knew I wanted an object API — no way was the web app going to deal with raw XML.  Django 1.1’s “unmanaged models” opened the door, but the deeper I went down the rabbit hole, the more I came to feel that I would have to bend the API way out of shape, if it was even possible.  Ultimately, Django’s API is too tightly coupled to SQL backends  (I’m not up on Google AppEngine and django-nonrel).

In the end I broke it down this way.  There are three layers in the client code:

  1. A “middleware” layer that handles the basic HTTP request/response cycle with the RESTful web services.  At this layer I have used httplib and pycurl.
  2. An object layer (which I call “entities” because they model the backend objects, which are referred to as entities).  This layer handles calls to the middleware, marshals the response data, and applies some lazy-loading techniques.  It is not coupled with Django and can be used on its own (very conveniently, for example, from the Python interactive interpreter) or underneath another web framework.
  3. The Django web application layer which deals with the backend system exclusively through the object layer.
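
The three layers above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Trident Project's code: the class and function names (Middleware, Entity, entity_summary), the XML shape, and the injectable transport callable are all hypothetical, and the "view" is a plain function standing in for a Django view.

```python
import xml.etree.ElementTree as ET


class Middleware(object):
    """Layer 1: owns the HTTP request/response cycle (hypothetical)."""

    def __init__(self, transport):
        # transport: any callable (method, path) -> response body string.
        # In real code this would wrap httplib or pycurl.
        self.transport = transport

    def get(self, path):
        return self.transport("GET", path)


class Entity(object):
    """Layer 2: wraps one backend entity and fetches its data lazily."""

    def __init__(self, middleware, path):
        self.middleware = middleware
        self.path = path
        self._fields = None  # nothing fetched yet

    def __getattr__(self, name):
        # Called only for attributes that normal lookup did not find.
        if name.startswith("_"):
            raise AttributeError(name)
        if self._fields is None:
            # First access: call the middleware and unpack the XML once.
            root = ET.fromstring(self.middleware.get(self.path))
            self._fields = dict((child.tag, child.text) for child in root)
        try:
            return self._fields[name]
        except KeyError:
            raise AttributeError(name)


def entity_summary(entity):
    """Layer 3 stand-in: the view sees entity attributes, never raw XML."""
    return "%s (%s)" % (entity.title, entity.path)
```

Because the object layer depends only on a callable transport, it can be exercised from the interactive interpreter with a stub in place of real HTTP, which is the kind of decoupling described in item 2.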

This is a work in progress, and needs a lot of refinement, but I’m pretty happy with how it functions, keeping those three distinct concerns cleanly separated.

I’d love to hear how others may be using Django in similar ways.


Python and multipart/form-data: Where’s the love?

So, I’ve been slaving away at a web application that connects to its data store via RESTful web services.   Because I have to use HTTP PUT and DELETE methods in addition to the usual GET and POST methods, I chose to use Python’s standard library module httplib (urllib and urllib2 only support GET and POST).  Everything’s going along great until I get around to a couple of methods I’d been putting off that involve file uploading.

Now, I’m reasonably knowledgeable about HTTP, though not an expert by any means, and I’m generally happy to stay out of the gory details as much as possible by using standard library modules.  So I was a bit surprised, and dismayed, to discover that if I wanted to submit a multipart form (truly multipart, having both string params and a file) I had to construct the request body by hand.

Of course, constructing a multipart form isn’t terribly difficult, but that’s not the point.  Constructing HTTP request headers isn’t difficult, but how many people do it by hand?  A file upload form isn’t exactly an edge case, so it’s curious to me that it would not be handled by httplib at this late date (there is a feature request for Python 2.7).
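
For a sense of what “by hand” involves, here is a minimal sketch of a multipart/form-data encoder, written for Python 3. The function name encode_multipart and its argument layout are hypothetical. It skips details a robust encoder would need, such as escaping quotes in field names and checking that the boundary does not occur in the file content.

```python
import uuid


def encode_multipart(fields, files, boundary=None):
    """Return (content_type, body) for a multipart/form-data request.

    fields: dict of name -> text value
    files:  dict of name -> (filename, content bytes, MIME type)
    """
    boundary = boundary or uuid.uuid4().hex
    delimiter = ("--" + boundary).encode("ascii")
    parts = []
    for name, value in fields.items():
        parts += [
            delimiter,
            ('Content-Disposition: form-data; name="%s"' % name).encode("utf-8"),
            b"",  # blank line separates part headers from part content
            value.encode("utf-8"),
        ]
    for name, (filename, content, mime_type) in files.items():
        parts += [
            delimiter,
            ('Content-Disposition: form-data; name="%s"; filename="%s"'
             % (name, filename)).encode("utf-8"),
            ("Content-Type: %s" % mime_type).encode("ascii"),
            b"",
            content,
        ]
    # The closing delimiter carries a trailing "--".
    parts += [delimiter + b"--", b""]
    body = b"\r\n".join(parts)
    return "multipart/form-data; boundary=" + boundary, body
```

The returned content type goes into the Content-Type request header and the body is sent as-is, for example as the body argument of an httplib connection's request() method.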

In any case, I’ve decided to use PycURL for the multipart forms.  I would be perfectly happy with that if only I didn’t have to translate C API docs into Python (the PycURL docs give you just enough hints that you can get by with some trial and error).
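
For comparison, here is a minimal sketch of the same kind of mixed form (string params plus a file) handed to PycURL. It assumes PycURL's HTTPPOST option and FORM_FILE constant, which map onto libcurl's form API. The function name and its arguments are hypothetical. The function takes an existing pycurl.Curl handle and reads the constants from it, so the snippet itself imports nothing.

```python
def set_multipart_form(handle, url, params, file_field, file_path):
    """Configure a pycurl.Curl handle to POST params plus one file upload.

    params is a dict of string form fields. The file at file_path is read
    by libcurl when the transfer runs, not here.
    """
    form = [(name, value) for name, value in sorted(params.items())]
    # A (FORM_FILE, path) tuple asks libcurl to attach the file's contents.
    form.append((file_field, (handle.FORM_FILE, file_path)))
    handle.setopt(handle.URL, url)
    # With HTTPPOST set, libcurl builds the multipart/form-data body and
    # the matching Content-Type header, boundary included.
    handle.setopt(handle.HTTPPOST, form)
    return form
```

Typical use would be to create a handle with pycurl.Curl(), pass it to set_multipart_form(), then call the handle's perform() and close() methods.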
