Archive for category Python

Export Plone 3 folder to zip file


This tool adds functionality to a Plone 3 site that lets a user download the contents of a folder and all of its subfolders as a .zip file.

Environment in which this tool was developed

Plone 3.3.1
Zope 2.10.9-final
Python 2.4.6

Important Notes

Excluded Content

The following types of content are excluded from the .zip file:

  • Links
  • Events
  • Empty Folders

Permissions and workflow state

The .zip file created by this tool includes only files and folders.  Permissions and workflow states on folders or content items are not retained in any way.  Permissions, however, are not bypassed — i.e., only content objects on which the current user has the View permission are included.

HTML Documents: Pages and News Items

“Pages” (a.k.a. “Documents”) and “News Items” are processed in the following way:

  • If the .html file extension is missing, it is added
  • A complete, yet simple, HTML document is created for the content — i.e., the Plone “wrapper” is removed.
  • The body of the document consists of the “cooked” document content, with the addition of an H1 element at the top containing the document title.
    If the document has a description, it is inserted in a paragraph element below the title.
  • The document creator and last modified date are added below the document content.

The Pieces

External module

The easiest way to implement this is to create an “old-style” Zope product, which is simply a Python package, and put it in the products directory.  In my case, that directory is /opt/plone/zeoserver/products.  You may want to put an empty file called refresh.txt in the root of the package.  This can help during development by avoiding restarts of the Zope clients to pick up changes, although you still have to re-save the External Method that references the module, and I found that restarting the clients was often required anyway.  In the Python package, create a subdirectory (not a subpackage) called Extensions, and create your code module there.

For purposes of this article, I’ll call the product package ExportTool and the module export.py.

Note: To use the code as-is, you’ll need to create /opt/plone/temp and make it writeable by the Unix user under which the Plone clients run.  Alternatively, you could assign the TEMPDIR global variable in the module to tempfile.gettempdir().


import cgi
import os
import shutil
import subprocess
import tempfile

TEMPDIR = '/opt/plone/temp'

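# Simple HTML wrapper for Pages and News Items.  Only the DOCTYPE, html, meta
# and creator/modified lines are original; the rest is reconstructed from the
# description above (title in an H1, then the body, then creator and date).
DOC_TEMPLATE = """\
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
    <title>%(title)s</title>
  </head>
  <body>
    <h1>%(title)s</h1>
    %(body)s
    <p>
      Created by: %(creator)s<br/>
      Last modified: %(modified)s
    </p>
  </body>
</html>
"""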

def export_folder(context, response):

    transform_tool = context.portal_transforms
    # temp dir for this export job
    tempdir = tempfile.mkdtemp(dir=TEMPDIR)

    def _export(folder, tempdir):
        # directory into which this folder's contents will be exported as files
        folder_path = folder.getPhysicalPath()[1:]
        export_dir = os.path.join(tempdir, os.path.join(*folder_path))
        for obj in folder.getFolderContents(full_objects=True):
            if obj.portal_type == 'Folder':
                # recursive call
                _export(obj, tempdir)
            elif obj.portal_type in ['Image', 'File', 'Document', 'News Item']:
                filename = obj.getId()
                if obj.portal_type in ['Image', 'File']:
                    # ASSUMPTION: the original expression is missing here;
                    # str(obj.data) is one common way to read the raw payload
                    # of an ATFile/ATImage.
                    content = str(obj.data)
                else:
                    if not filename.endswith('.html'):
                        filename += '.html'
                    body = obj.CookedBody()
                    description = obj.Description()
                    if description:
                        body = '<p>%s</p>\n%s' % (cgi.escape(description, quote=True), body)
                    content = DOC_TEMPLATE % {
                        'body': body,
                        'title': cgi.escape(obj.Title(), quote=True),
                        'modified': context.toLocalizedTime(obj.ModificationDate(), long_format=1),
                        'creator': obj.Creator(),
                    }
                # ASSUMPTION: the directory is created lazily, on the first
                # file written, so that empty folders are never exported.
                if not os.path.isdir(export_dir):
                    os.makedirs(export_dir)
                outfile = open(os.path.join(export_dir, filename), 'wb')
                outfile.write(content)
                outfile.close()

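    # export the content
    _export(context, tempdir)
    # create a zip file; 'intranet' is the id of the Plone site, i.e. the
    # top-level directory that _export creates under tempdir
    zipprefix = '-'.join(context.getPhysicalPath()[1:])
    # ASSUMPTION: the call and cwd=tempdir are reconstructed so that the
    # archive is written where it is read back below (zip appends '.zip')
    subprocess.call(['/usr/bin/zip', '-r', zipprefix, 'intranet'], cwd=tempdir)
    zipname = zipprefix + '.zip'
    zipped = os.path.join(tempdir, zipname)
    # set response headers
    response.setHeader('Content-Type', 'application/zip')
    response.setHeader('Content-Disposition', 'attachment; filename=%s' % zipname)
    # write zip file to response
    z = open(zipped, 'rb')
    while 1:
        # ASSUMPTION: the original chunk size is missing; 8192 is arbitrary
        chunk = z.read(8192)
        if chunk:
            response.write(chunk)
        else:
            break
    z.close()
    # delete the temp dir
    shutil.rmtree(tempdir)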
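
As a hedged alternative to shelling out to /usr/bin/zip, the archive step could be done with the standard zipfile module.  This is only a sketch, not part of the tool as written: the function name zip_tree is made up for illustration, and os.path.relpath needs Python 2.6 or later, so it would not run unchanged on the Python 2.4 environment listed above.  Because it walks files only, directories that contain no files never appear in the archive.

```python
import os
import zipfile


def zip_tree(root_dir, zip_path):
    """Zip every file below root_dir; directories without files are skipped.

    zip_path should live outside root_dir, otherwise the archive could
    end up containing itself.
    """
    archive = zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED)
    try:
        for dirpath, dirnames, filenames in os.walk(root_dir):
            for name in filenames:
                full_path = os.path.join(dirpath, name)
                # Store paths relative to root_dir so the archive does not
                # carry the temp-directory prefix.
                arcname = os.path.relpath(full_path, root_dir)
                archive.write(full_path, arcname)
    finally:
        archive.close()
    return zip_path
```

With something like this in place, TEMPDIR could also simply be tempfile.gettempdir(), as mentioned in the note before the listing.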

Zope External Method

id: export_folder_to_zip
Module Name: ExportTool.export
Function Name: export_folder

Zope Script (Python)

id: exportFolderToZip


if context.portal_type == 'Folder' and len(context.getPhysicalPath()) > 3:
    container.export_folder_to_zip(context, container.REQUEST.RESPONSE)

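Note: The guard limiting path length is to prevent downloading top-level folders.  getPhysicalPath() returns a tuple in which element 1 is the Zope root (an empty string), element 2 is the Plone site, and element 3 is the first folder, so only folders at least one level below the top pass the test.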
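
To make the guard concrete, here is a small stand-alone sketch that applies the same test to plain tuples shaped like the return value of getPhysicalPath().  The function name may_export and the folder ids 'projects' and 'reports' are made up for illustration; 'intranet' is the site id used in the module above.

```python
def may_export(portal_type, physical_path):
    """Same test as the Script (Python), applied to plain values."""
    return portal_type == 'Folder' and len(physical_path) > 3


# The first element of a physical path is '' (the Zope root).
top_level = ('', 'intranet', 'projects')
nested = ('', 'intranet', 'projects', 'reports')

print(may_export('Folder', top_level))   # top-level folder: rejected
print(may_export('Folder', nested))      # nested folder: allowed
print(may_export('Document', nested))    # not a folder: rejected
```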

Add Action to Folder portal type

Title: Zip
Id: zip
URL (Expression): string:${folder_url}/exportFolderToZip
Condition (Expression): (blank)
Permission: Modify portal content
Category: Object
Visible? (check)


WSGI daemon mode solves python-ldap connection issues

I was porting a Django/python-ldap application to another server and getting sporadic ldap.SERVER_DOWN errors.  Some basic troubleshooting showed that the problem occurred specifically when requests were routed through mod_wsgi.  If I just ran the Python code, no problem; the Django app via the dev server, fine; Apache straight to Django (proxying to the dev server), OK.  Now, while I had been planning to investigate mod_wsgi’s “daemon mode” for some time, I was still running all my apps in “embedded mode”.  But with this python-ldap problem, I had to dig deeper into the docs.  The mod_wsgi ApplicationIssues page discusses a number of problems related to C extension modules, and while it does not specifically mention python-ldap, it does make this generalization:

Because of the possibility that extension module writers have not written their code to take into consideration it being used from multiple sub interpreters, the safest approach is to force all WSGI applications to run within the same application group, with that preferably being the first interpreter instance created by Python.

To force a specific WSGI application to be run within the very first Python sub interpreter created when Python is initialised, the WSGIApplicationGroup directive should be used and the group set to ‘%{GLOBAL}’.

WSGIApplicationGroup %{GLOBAL}

If it is not feasible to force all WSGI applications to run in the same interpreter, then daemon mode of mod_wsgi should be used to assign different WSGI applications to their own daemon processes. Each would then be made to run in the first Python sub interpreter instance within their respective processes.

I did try WSGIApplicationGroup %{GLOBAL} first, but (assuming I implemented it correctly), the problem remained. So I tried WSGI daemon mode and the process has proved stable.


A detached user object for Django

Django’s authentication framework (django.contrib.auth) is both pluggable and stackable, which makes integrating custom authentication requirements into Django pretty smooth and easy in most cases.  But I had an edge case: the user ids of the backend system could not be guaranteed to conform to Django’s restriction on user names (even with the recent expansion to accommodate email addresses).  Thus the usual pattern of creating a Django user corresponding to each backend account required a workaround.  At first I tried a hack in which user ids from the backend were base64-encoded and munged to conform to Django’s user name limits.  Communication with the backend then required de-munging and decoding the Django user name, etc.  While this seemed to work, it was ugly as hell, and in any case I was using neither the Django admin site nor Django permissions, so I didn’t need a real Django User model instance.  On the other hand, I did want to keep the goodness of my pluggable authentication backend and the login view from django.contrib.auth.views, both of which expect User-like objects.

So, I decided to try something kinda crazy: subclassing django.contrib.auth.models.AnonymousUser  and overriding the “anonymity”.  AnonymousUser really takes advantage of Python duck-typing by mimicking the django.contrib.auth.models.User without being a Django model itself, and so isn’t tied to Django’s database.  Here’s what I came up with:

from django.contrib.auth.models import AnonymousUser

class DetachedUser(AnonymousUser):
    """
    Implements a user-like object for user authentication
    when linkage of a backend user identity to a Django user
    is undesirable or unnecessary.
    """

    # mark as not hashable -- see also __eq__() and __ne__(), below
    __hash__ = None 

    # is_active might matter in some contexts, so override
    is_active = True 

    # AnonymousUser sets username to '' and id to None, so we need to at least
    # override those values.
    def __init__(self, username):
        self.username = self.id = username

    def __unicode__(self):
        return self.username

    # is_anonymous() and is_authenticated() are key to distinguishing truly 
    # anonymous/unauthenticated users from known/authenticated ones.
    def is_anonymous(self):
        return False

    def is_authenticated(self):
        return True

    # __eq__ and __ne__ are related to hashing, so be consistent with __hash__, above.
    def __eq__(self, other):
        return NotImplemented

    def __ne__(self, other):
        return NotImplemented

    # Some django.contrib.auth code may call this method,
    # e.g, to update the last login time
    def save(self):
        pass

Now I can code the get_user and authenticate methods of my custom authentication backend to return DetachedUser objects instead of Django users. So far, so good.
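
For illustration, here is a rough sketch of what such a backend might look like.  It is not my production code: FAKE_ACCOUNTS stands in for the real backend system, the class name DetachedUserBackend is made up, and a tiny stand-in DetachedUser is defined so the sketch runs without Django installed.  It also assumes the user's id is the backend user name, so that is what Django would hand back to get_user.

```python
class DetachedUser(object):
    """Stand-in for the DetachedUser class above (no Django required)."""

    def __init__(self, username):
        self.username = self.id = username

    def is_authenticated(self):
        return True


# Hypothetical credential store standing in for the real backend system.
FAKE_ACCOUNTS = {'jane@example.org/ops': 's3cret'}


class DetachedUserBackend(object):
    """Authentication backend sketch that never touches Django's User table."""

    def authenticate(self, username=None, password=None):
        # Replace this lookup with a call to the real backend system.
        if username in FAKE_ACCOUNTS and FAKE_ACCOUNTS[username] == password:
            return DetachedUser(username)
        return None

    def get_user(self, user_id):
        # Django passes back the id it stored in the session at login;
        # for a DetachedUser that id is simply the backend user name.
        if user_id in FAKE_ACCOUNTS:
            return DetachedUser(user_id)
        return None
```

Both methods return None on failure, which is what Django expects from a backend that cannot vouch for the user.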


Django-Apache-WSGI reverse proxy and load balancing

Serving Django apps behind a reverse proxy is really pretty straightforward once you’ve set it up, but you might run into a few snags along the way, depending on your requirements.  Load-balancing only adds a little more complexity.  Here’s how I’ve done it.

Example Architecture

  • Front-end web server: Apache 2.2 + mod_proxy, mod_proxy_balancer, mod_ssl.
  • Two back-end application servers: Apache 2.2 + mod_wsgi, mod_ssl; Python 2.6; Django 1.3.1.
  • Backend database server.
  • Additional requirements: Remote user authentication; SSL and non-SSL proxies.

Let’s start with the application servers and deal with the front end later.

Application Servers

Obviously both app servers will be configured the same way.  How to keep them in sync will be discussed briefly.

Django Settings Module

In order for Django to properly create fully-qualified URLs for the front-end client, you must set:

USE_X_FORWARDED_HOST = True

This setting, new in Django 1.3.1, affects the get_host() and build_absolute_uri() methods of django.http.HttpRequest.  If not set, Django will use the value of the HTTP_HOST or SERVER_NAME variables, which are most likely set to the host name of the app server, not the front end.
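
The lookup order can be pictured with a simplified model.  This is a sketch, not Django's actual source: it assumes the setting in question is USE_X_FORWARDED_HOST, ignores the port handling Django does for SERVER_NAME, and the host names are invented.

```python
def effective_host(meta, use_x_forwarded_host):
    """Simplified model of how HttpRequest.get_host() picks a host name."""
    if use_x_forwarded_host and 'HTTP_X_FORWARDED_HOST' in meta:
        return meta['HTTP_X_FORWARDED_HOST']
    if 'HTTP_HOST' in meta:
        return meta['HTTP_HOST']
    return meta['SERVER_NAME']


# What an app server might see for a proxied request (hypothetical hosts).
meta = {
    'HTTP_X_FORWARDED_HOST': 'www.example.com',
    'HTTP_HOST': 'app1.internal.example',
    'SERVER_NAME': 'app1.internal.example',
}

print(effective_host(meta, use_x_forwarded_host=False))  # app server name
print(effective_host(meta, use_x_forwarded_host=True))   # front-end name
```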

If you’re using Django’s RemoteUserMiddleware and RemoteUserBackend for authentication, you will need to replace RemoteUserMiddleware with a custom subclass:

from django.contrib.auth.middleware import RemoteUserMiddleware

class ProxyRemoteUserMiddleware(RemoteUserMiddleware):
    header = 'HTTP_REMOTE_USER'

Then update your settings:


(It is possible to avoid this by setting REMOTE_USER on the app web server to the value of HTTP_REMOTE_USER, but here I will assume a default setup.)
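
A sketch of what that settings update might look like.  The module path myproject.middleware is hypothetical (use wherever you put the subclass), and the other middleware entries are just a typical Django 1.3 list.  The one firm requirement, per the Django docs, is that the remote-user middleware comes after AuthenticationMiddleware.

```python
# settings.py (fragment)

MIDDLEWARE_CLASSES = (
    'django.middleware.common.CommonMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    # Hypothetical module path; must come after AuthenticationMiddleware.
    'myproject.middleware.ProxyRemoteUserMiddleware',
)

AUTHENTICATION_BACKENDS = (
    'django.contrib.auth.backends.RemoteUserBackend',
)
```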

If you’re using Django’s “sites” framework, you will probably want to set SITE_ID to correspond to the front-end site.  And if your WSGIScriptAlias path differs from the proxied path on the front-end server (not covered in detail here), you may have to use FORCE_SCRIPT_NAME (check the docs).

Django Application Modules and Templates

If your code or templates contain references to REMOTE_ADDR, REMOTE_USER or other server variables (via HttpRequest.META) affected by proxies, you will probably have to change them.  If you’re using Django’s RemoteUserMiddleware or the ProxyRemoteUserMiddleware subclass shown above, you should probably code with request.user.username instead of request.META['REMOTE_USER']; otherwise, you’ll want to reference HTTP_REMOTE_USER.  On the app server, REMOTE_ADDR will be set to the IP address of the front-end proxy, not the client’s; for the client address you will have to use HTTP_X_FORWARDED_FOR, which can have multiple comma-separated values.
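
Here is one way the comma-separated header might be handled.  It is a sketch with a made-up helper name, and it assumes a single trusted front-end proxy that appends the address of whoever connected to it.  Under that assumption the last entry is the only one the client cannot forge, so that is the one returned; if you want the left-most (claimed original) address instead, take the first entry and treat it as untrusted.

```python
def client_ip(meta):
    """Best-guess client address from a request.META-like dict."""
    forwarded = meta.get('HTTP_X_FORWARDED_FOR', '')
    # The header is a chain: "client, proxy1, proxy2, ..."; each proxy
    # appends the address of the peer that connected to it.
    addresses = [a.strip() for a in forwarded.split(',') if a.strip()]
    if addresses:
        return addresses[-1]
    # No proxy header at all: fall back to the socket peer address.
    return meta.get('REMOTE_ADDR')


# Documentation-range addresses, purely illustrative.
proxied = {
    'HTTP_X_FORWARDED_FOR': '203.0.113.7, 198.51.100.2',
    'REMOTE_ADDR': '10.0.0.5',
}
print(client_ip(proxied))
print(client_ip({'REMOTE_ADDR': '10.0.0.5'}))
```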

Django Projects and Python Environments

Since we’ve got two app servers, each will have its own Python environment (created with virtualenv) and Django project.  In my setup I decided to serve the Django MEDIA_ROOT from network storage mounted at the same point on each server to avoid synchronization issues.  Otherwise, it seems OK to keep each instance separate (YMMV).  I use Fabric for ensuring that the Python environments and Django projects stay in sync across the two servers.  The precise way you do this syncing depends on your preferences, the available tools, etc.

Apache Configuration

The Apache config on each app server follows the normal Django/WSGI pattern, so I’ll skip the details here.  Note that while it is possible for the WSGIScriptAlias path on the app server to differ from the proxied path on the front-end web server (which we’ll get to), this introduces some additional complexities which we will avoid here.  Some issues can be handled on the reverse proxy (front-end) server by Apache directives such as ProxyPassReverse and ProxyPassReverseCookiePath, but you may also need to use Django’s FORCE_SCRIPT_NAME setting in your project settings module.

Front-end Server

At this point you should have working Django projects on each app server under both SSL and non-SSL virtual hosts.  Now we’re going to set up the reverse proxy and load balancing on the front-end server.

Let’s assume your apps are served under the path /webapps on both port 80 and port 443 (SSL) virtual hosts.

Then, you can add to your port 80 virtual host:

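<Proxy balancer://django-http>
    # app1/app2.example.com are placeholders; the real member URLs are missing
    BalancerMember http://app1.example.com/webapps route=http-1
    BalancerMember http://app2.example.com/webapps route=http-2
</Proxy>

<Location /webapps>
    ProxyPass balancer://django-http stickysession=sessionid
</Location>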

And to your SSL virtual host on port 443:

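<Proxy balancer://django-https>
    # app1/app2.example.com are placeholders; the real member URLs are missing
    BalancerMember https://app1.example.com/webapps route=https-1
    BalancerMember https://app2.example.com/webapps route=https-2
</Proxy>

<Location /webapps>
    ProxyPass balancer://django-https stickysession=sessionid
</Location>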

This isn’t the only way to do it, of course, and you may have different requirements, but I’ve tried to cover the basics.


SVN 1.7 breaks Python packaging

Forget my problems with Emacs and SVN 1.7.  A much worse problem is that my Python package distributions are breaking because neither setuptools nor distribute groks the new working copy format introduced in SVN 1.7, and so not all files which are actually under version control get included in the output of dist commands (without using an explicit manifest).  So, I’m forced to downgrade to SVN 1.6, which is alright by me.  First I had to uninstall from Cygwin SVN and all packages that depend on it.  Then I installed the SlikSVN Win64 distribution of SVN 1.6.  And, finally, I had to trash all my SVN 1.7 working copies (no way to downgrade those) and re-checkout everything.


My favorite Python packages

Python generally lives up to its motto, “Batteries included.”  Here I want to give credit to folks who have provided some of my extra “batteries” — freely available Python tools that make my work easier and better.

Django — The de facto standard for Python web application development.  I’ve learned a lot from studying its code.  Includes a library of useful utilities (django.utils) that can be used outside of web application contexts (e.g., check out django.utils.datastructures.SortedDict).

sphinx — Also a de facto standard in the Python universe.  It’s made me appreciate reStructuredText and improve my code documentation practices.  Ironically, I find its own documentation rather hard to use.

virtualenv — How did we manage without it?

pip — Better package management than easy_install.

ipython — Worth it for the command history alone.

decorator — Almost essential for writing decorators, especially if you’re on Python < 2.5.

Fabric — A great addition to the developer’s or sysadmin’s toolkit.

lxml — For XML processing, I almost never use Python’s builtin XML libraries.

pycurl — Brings the power of libcurl into Python, filling gaps left by urllib/urllib2 and httplib (e.g., multiple asynchronous requests, multipart form data).

xlrd, xlwt — Good API for MS Excel processing.  Unfortunately, no support (yet) for Excel 2007 XML format.

simplejson — The standard JSON library for Python < 2.6.

py.test — Anything that makes writing and running unit tests easier is very good.

unittest2 — Makes available to Python 2.4-2.6 the significant enhancements made to the standard unittest module in Python 2.7.

I also want to thank Christoph Gohlke for his “Unofficial Windows Binaries” site, since up-to-date versions of lxml and pycurl would be difficult to use on Windows without his builds.

And finally, there are essential libraries that I depend on without normally using directly: MySQL-Python, pysqlite (stuck on CentOS 5/Python 2.4), python-ldap, docutils, setuptools.


pycurl CurlMulti example

I needed a process to perform multiple web services calls and return the combined results. Efficiency was fairly important, so I needed an asynchronous solution. I had used pycurl previously, but not in this fashion, so CurlMulti was new to me. Now, I wouldn’t use pycurl where urllib/urllib2 or httplib will do, but this is just such a case. The reason I’m posting my code (modified to remove some inessential peculiarities) is that I had trouble finding a good example. The pycurl docs only give a trivial example of CurlMulti usage involving one handle (!) and no provision for marshaling response data. I briefly considered using urllib2 and threading, but I’d rather leave thread management to the experts.

import pycurl
from cStringIO import StringIO

urls = [...] # list of urls
# reqs: List of individual requests.
# Each list element will be a 3-tuple of url (string), response string buffer
# (cStringIO.StringIO), and request handle (pycurl.Curl object).
reqs = [] 

# Build multi-request object.
m = pycurl.CurlMulti()
for url in urls:
    response = StringIO()
    handle = pycurl.Curl()
    handle.setopt(pycurl.URL, url)
    handle.setopt(pycurl.WRITEFUNCTION, response.write)
    req = (url, response, handle)
    # Note that the handle must be added to the multi object
    # by reference to the req tuple (threading?).
    m.add_handle(req[2])
    reqs.append(req)

# Perform multi-request.
# This code copied from pycurl docs, modified to explicitly
# set num_handles before the outer while loop.
num_handles = len(reqs)
while num_handles:
    # 1.0 is the select timeout used in the pycurl docs example
    ret = m.select(1.0)
    if ret == -1:
        continue
    while 1:
        ret, num_handles = m.perform()
        if ret != pycurl.E_CALL_MULTI_PERFORM:
            break

for req in reqs:
    # req[1].getvalue() contains response content
    content = req[1].getvalue()
