pycurl CurlMulti example

I needed a process to perform multiple web service calls and return the combined results. Efficiency was fairly important, so I needed an asynchronous solution. I had used pycurl before, but not in this fashion, so CurlMulti was new to me. Now, I wouldn’t use pycurl where urllib/urllib2 or httplib will do, but this is a case where they won’t. The reason I’m posting my code (modified to remove some inessential peculiarities) is that I had trouble finding a good example. The pycurl docs only give a trivial example of CurlMulti usage involving one handle (!) and no provision for collecting the response data. I briefly considered using urllib2 and threading, but I’d rather leave thread management to the experts.

import pycurl
from cStringIO import StringIO

urls = [...] # list of urls
# reqs: List of individual requests.
# Each list element will be a 3-tuple of url (string), response string buffer
# (cStringIO.StringIO), and request handle (pycurl.Curl object).
reqs = [] 

# Build multi-request object.
m = pycurl.CurlMulti()
for url in urls: 
    response = StringIO()
    handle = pycurl.Curl()
    handle.setopt(pycurl.URL, url)
    handle.setopt(pycurl.WRITEFUNCTION, response.write)
    req = (url, response, handle)
    # Note that the handle must be added to the multi object
    # by reference to the req tuple (threading?).
    m.add_handle(req[2])
    reqs.append(req)

# Perform multi-request.
# This code copied from pycurl docs, modified to explicitly
# set num_handles before the outer while loop.
SELECT_TIMEOUT = 1.0
num_handles = len(reqs)
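while num_handles:
    # Wait for socket activity, for at most SELECT_TIMEOUT seconds.
    ret = m.select(SELECT_TIMEOUT)
    if ret == -1:
        continue
    # Drive the transfers until libcurl has no more immediate work;
    # num_handles is updated to the number still in progress.
    while 1:
        ret, num_handles = m.perform()
        if ret != pycurl.E_CALL_MULTI_PERFORM:
            break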

for req in reqs:
    # req[1].getvalue() contains response content
    ...
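
The one-buffer-per-handle pairing is what keeps the responses separate even though the transfers overlap. Here is a minimal sketch of just that pattern, without pycurl or the network. The URLs and the `arrivals` list are made up for illustration, and the `callbacks` dict stands in for what each handle's WRITEFUNCTION would be set to. It uses io.BytesIO rather than cStringIO, on the assumption that you are on Python 3, where pycurl hands the write callback bytes.

```python
from io import BytesIO

# Hypothetical URLs, standing in for the real `urls` list.
urls = ["http://example.com/a", "http://example.com/b"]

# Hypothetical arrival order of body chunks during overlapping transfers.
arrivals = [
    ("http://example.com/a", b"first "),
    ("http://example.com/b", b"second "),
    ("http://example.com/a", b"response"),
    ("http://example.com/b", b"response"),
]

reqs = []
callbacks = {}
for url in urls:
    response = BytesIO()
    reqs.append((url, response))
    # The bound method is the callback; each one writes to its own buffer.
    callbacks[url] = response.write

# Simulate chunks being delivered in interleaved order.
for url, chunk in arrivals:
    callbacks[url](chunk)

# Afterwards each buffer holds exactly one complete response body.
results = dict((url, response.getvalue()) for url, response in reqs)
for url in urls:
    print(url, results[url])
```

Because each callback is bound to its own buffer, the order in which chunks arrive does not matter when reading the results back.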

  1. #1 by squaremarket on December 30, 2011 - 11:54 pm

    Thank you! Like you said, the PyCurl documentation was abysmal so this was a lifesaver. Just one small typo – I believe the line “req = (sys_no, response, handle)” should be “req = (url, response, handle)”, or any other identifier you want to use for each url – sys_no isn’t defined anywhere in your code.

  2. #2 by David Chandek-Stark on December 31, 2011 - 11:05 am

    @squaremarket – Thanks for the correction! I had copied the code from a concrete usage, but wanted to make it more generic for this post. Glad you found it helpful!

  3. #3 by silent254 on January 29, 2013 - 9:04 pm

    Thanks for the example code. I was trying to do this in Asynchronous IO, and was running into some issues.

    If you are used to Python, then threading is a no brainer … I just followed the docs, and within an hour or so, I had a running multi-threaded app. I would highly encourage going to the docs and following examples from there. It’s just one more tool to have lying around.

    SS

  4. #4 by Jim on February 23, 2013 - 2:48 pm

    Your code says:
    # Note that the handle must be added to the multi object
    # by reference to the req tuple (threading?).
    I think that the reason you found this to be necessary is because m.add_handle(handle) doesn’t increase the refcount of handle, so handle would get cleaned up immediately after it goes out of scope in the for loop, causing problems. The comment should really be:
    # Note that we have to explicitly keep a reference to handle,
    # because adding it to the multi object doesn’t increase its reference count.
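
Jim's explanation can be illustrated without pycurl. The sketch below uses two hypothetical stand-in classes, `Handle` and `NonOwningMulti`, which are not part of pycurl. The container only holds weak references, to mimic a multi object that does not keep its handles alive. Whether a given pycurl release actually behaves this way is worth checking against its documentation.

```python
import gc
import weakref


class Handle:
    """Hypothetical stand-in for pycurl.Curl; not part of pycurl."""

    def __init__(self, url):
        self.url = url


class NonOwningMulti:
    """Hypothetical container that does not keep its handles alive."""

    def __init__(self):
        self._handles = []

    def add_handle(self, handle):
        # Only a weak reference is stored, so this does not extend
        # the handle's lifetime.
        self._handles.append(weakref.ref(handle))

    def live_urls(self):
        alive = (ref() for ref in self._handles)
        return [h.url for h in alive if h is not None]


def add_without_keeping(multi, urls):
    for url in urls:
        # No other reference to the handle survives this statement.
        multi.add_handle(Handle(url))


def add_and_keep(multi, urls):
    reqs = []
    for url in urls:
        handle = Handle(url)
        multi.add_handle(handle)
        # The tuple in reqs is what keeps the handle alive.
        reqs.append((url, handle))
    return reqs


urls = ["http://example.com/a", "http://example.com/b"]  # made-up URLs

dropped = NonOwningMulti()
add_without_keeping(dropped, urls)

kept = NonOwningMulti()
reqs = add_and_keep(kept, urls)

gc.collect()  # for interpreters that do not free objects immediately
print(dropped.live_urls())
print(kept.live_urls())
```

In the second case the `reqs` list plays the same role as in the post: holding the tuples is what keeps every handle reachable until the transfers finish.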