Posts Tagged urllib2
I needed a process to perform multiple web services calls and return the combined results. Efficiency was fairly important, so I needed an asynchronous solution. I had used
pycurl previously, but not in this fashion, so
CurlMulti was new to me. Now, I wouldn’t use pycurl where urllib2 or httplib will do, but this is just such a case. The reason I’m posting my code (modified to remove some inessential peculiarities) is that I had trouble finding a good example. The
pycurl docs only give a trivial example of
CurlMulti usage involving one handle (!) and no provision for marshaling response data. I briefly considered using
threading, but I’d rather leave thread management to the experts.
import pycurl
from cStringIO import StringIO

urls = [...]  # list of urls

# reqs: List of individual requests.
# Each list element will be a 3-tuple of url (string), response string buffer
# (cStringIO.StringIO), and request handle (pycurl.Curl object).
reqs = []

# Build multi-request object.
m = pycurl.CurlMulti()
for url in urls:
    response = StringIO()
    handle = pycurl.Curl()
    handle.setopt(pycurl.URL, url)
    handle.setopt(pycurl.WRITEFUNCTION, response.write)
    req = (url, response, handle)
    # Note that the handle must stay referenced (here, via the req tuple)
    # for as long as the multi object is using it.
    m.add_handle(handle)
    reqs.append(req)

# Perform multi-request.
# This code copied from pycurl docs, modified to explicitly
# set num_handles before the outer while loop.
SELECT_TIMEOUT = 1.0
num_handles = len(reqs)
while num_handles:
    ret = m.select(SELECT_TIMEOUT)
    if ret == -1:
        continue
    while 1:
        ret, num_handles = m.perform()
        if ret != pycurl.E_CALL_MULTI_PERFORM:
            break

for url, response, handle in reqs:
    # response.getvalue() contains the response content
    ...
So, I’ve been slaving away at a web application that connects to its data store via RESTful web services. Because I have to use HTTP PUT and DELETE methods in addition to the usual GET and POST methods, I chose to use Python’s standard library module httplib (urllib and urllib2 only support GET and POST). Everything’s going along great until I get around to a couple of methods I’d been putting off that involve file uploading.
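For illustration, here’s roughly what a PUT looks like through httplib (shown here with Python 3’s http.client, which is the renamed httplib; the /widgets/42 resource and the throwaway local server are made up for demonstration):

```python
import threading
import http.client  # named 'httplib' in Python 2
from http.server import BaseHTTPRequestHandler, HTTPServer

class PutHandler(BaseHTTPRequestHandler):
    # Minimal stand-in for a RESTful data store: accept a PUT
    # and acknowledge it with 204 No Content.
    def do_PUT(self):
        length = int(self.headers.get('Content-Length', 0))
        self.rfile.read(length)  # consume the request body
        self.send_response(204)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet

# Serve exactly one request on an ephemeral local port.
server = HTTPServer(('127.0.0.1', 0), PutHandler)
threading.Thread(target=server.handle_request, daemon=True).start()

# httplib/http.client lets you name the HTTP method directly,
# which is what urllib/urllib2 don't offer for PUT and DELETE.
conn = http.client.HTTPConnection('127.0.0.1', server.server_port)
conn.request('PUT', '/widgets/42', body='name=sprocket',
             headers={'Content-Type': 'application/x-www-form-urlencoded'})
status = conn.getresponse().status
conn.close()
server.server_close()
```

A DELETE works the same way: pass 'DELETE' as the method name to conn.request().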
Now, I’m reasonably knowledgeable about HTTP, though not an expert by any means, and I’m generally happy to stay out of the gory details as much as possible by using standard library modules. So I was a bit surprised (and dismayed) to discover that if I wanted to submit a multipart form (truly multipart, having both string params and a file), I had to construct the request body by hand.
Of course, constructing a multipart form isn’t terribly difficult, but that’s not the point. Constructing HTTP request headers isn’t difficult, but how many people do it by hand? A file upload form isn’t exactly an edge case, so it’s curious to me that it would not be handled by httplib at this late date (there is a feature request for Python 2.7).
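For the record, the by-hand construction goes something like this (a minimal sketch of building a multipart/form-data body; the helper name, the boundary string, and the fixed application/octet-stream content type are my own choices, and binary payloads and quoting of exotic field names are glossed over):

```python
def encode_multipart(fields, files, boundary='----form-data-boundary'):
    """Build a multipart/form-data request body by hand.

    fields: dict mapping field name -> string value
    files: dict mapping field name -> (filename, content) tuple
    Returns (content_type, body), ready to pass to an httplib request.
    """
    lines = []
    # Ordinary string params: one part per field.
    for name, value in fields.items():
        lines.append('--' + boundary)
        lines.append('Content-Disposition: form-data; name="%s"' % name)
        lines.append('')
        lines.append(value)
    # File params: add a filename and a part content type.
    for name, (filename, content) in files.items():
        lines.append('--' + boundary)
        lines.append('Content-Disposition: form-data; name="%s"; filename="%s"'
                     % (name, filename))
        lines.append('Content-Type: application/octet-stream')
        lines.append('')
        lines.append(content)
    # Closing boundary marker.
    lines.append('--' + boundary + '--')
    lines.append('')
    body = '\r\n'.join(lines)
    content_type = 'multipart/form-data; boundary=%s' % boundary
    return content_type, body
```

You would then send the result with something like conn.request('POST', path, body, {'Content-Type': content_type}).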
In any case, I’ve decided to use PycURL for the multipart forms. I would be perfectly happy with that if only I didn’t have to translate C API docs into Python (the PycURL docs give you just enough hints that you can get by with some trial and error).