Archive for category Django

A detached user object for Django

Django’s authentication framework (django.contrib.auth) is both pluggable and stackable, which makes integrating custom authentication requirements into Django pretty smooth and easy in most cases.  But I had an edge case: the user IDs of the backend system could not be guaranteed to conform to Django’s restriction on user names (even with the recent expansion to accommodate email addresses).  Thus the usual pattern of creating a Django user corresponding to each backend account required a workaround.  At first I tried a hack in which user IDs from the backend were base64-encoded and munged to conform to Django’s user name limits.  Communication with the backend then required de-munging and decoding the Django user name, etc.  While this seemed to work, it was ugly as hell, and in any case, I was using neither the Django admin site nor Django permissions, so I didn’t need a real Django User model instance.  On the other hand, I did want to keep the goodness of my pluggable authentication backend and the login view from django.contrib.auth.views, both of which expect User-like objects.

So, I decided to try something kinda crazy: subclassing django.contrib.auth.models.AnonymousUser and overriding the “anonymity”.  AnonymousUser really takes advantage of Python duck-typing: it mimics django.contrib.auth.models.User without being a Django model itself, and so isn’t tied to Django’s database.  Here’s what I came up with:

from django.contrib.auth.models import AnonymousUser

class DetachedUser(AnonymousUser):
    """
    Implements a user-like object for user authentication
    when linkage of a backend user identity to a Django user
    is undesirable or unnecessary.
    """

    # mark as not hashable -- see also __eq__() and __ne__(), below
    __hash__ = None 

    # is_active might matter in some contexts, so override
    is_active = True 

    #
    # AnonymousUser sets username to '' and id to None, so we need to at least
    # override those values.
    #
    def __init__(self, username):
        self.username = username
        self.id = username

    def __unicode__(self):
        return self.username

    #
    # is_anonymous() and is_authenticated() are key to distinguishing truly 
    # anonymous/unauthenticated users from known/authenticated ones.
    #
    def is_anonymous(self):
        return False

    def is_authenticated(self):
        return True

    #
    # __eq__ and __ne__ are related to hashing, so be consistent with __hash__, above.
    #
    def __eq__(self, other):
        return NotImplemented

    def __ne__(self, other):
        return NotImplemented

    #
    # Some django.contrib.auth code may call this method,
    # e.g, to update the last login time
    #
    def save(self, *args, **kwargs):
        pass

Now I can code the get_user and authenticate methods of my custom authentication backend to return DetachedUser objects instead of Django users. So far, so good.
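For illustration, here is a minimal sketch of what such a backend might look like. It is not my production code: the DetachedUser class below is a stripped-down stand-in so the example runs without Django installed, check_backend_credentials() is a hypothetical placeholder for the call to the real account system, and the authenticate() signature follows the Django 1.3-era convention (newer Django versions pass the request as the first argument).

```python
class DetachedUser(object):
    """Stand-in for the DetachedUser class above (no Django import needed)."""

    def __init__(self, username):
        self.username = username
        self.id = username


def check_backend_credentials(username, password):
    """Hypothetical placeholder: ask the external account system."""
    return (username, password) == ("jdoe@backend", "s3cret")


class DetachedUserBackend(object):
    """Authentication backend that never touches Django's user table."""

    def authenticate(self, username=None, password=None):
        # Return a user-like object on success, None on failure.
        if username and check_backend_credentials(username, password):
            return DetachedUser(username)
        return None

    def get_user(self, user_id):
        # DetachedUser.id is the backend user name, so the object can be
        # rebuilt from the value stored in the session without a lookup.
        if not user_id:
            return None
        return DetachedUser(user_id)


backend = DetachedUserBackend()
user = backend.authenticate(username="jdoe@backend", password="s3cret")
print(user.username)
```

In a real project the backend would be listed in AUTHENTICATION_BACKENDS by its dotted path, and get_user() might re-validate the id against the account system rather than trusting the session value.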


Django-Apache-WSGI reverse proxy and load balancing

Serving Django apps behind a reverse proxy is really pretty straightforward once you’ve set it up, but you might run into a few snags along the way, depending on your requirements.  Load-balancing only adds a little more complexity.  Here’s how I’ve done it.

Example Architecture

  • Front end web server (www.example.com): Apache 2.2 + mod_proxy, mod_proxy_balancer, mod_ssl.
  • Back end application servers (apps-01.example.com, apps-02.example.com): Apache 2.2 + mod_wsgi, mod_ssl; Python 2.6; Django 1.3.1.
  • Backend database server.
  • Additional requirements: Remote user authentication; SSL and non-SSL proxies.

Let’s start with the application servers and deal with the front end later.

Application Servers

Obviously both app servers will be configured the same way.  How to keep them in sync will be discussed briefly.

Django Settings Module

In order for Django to properly create fully-qualified URLs for the front-end client, you must set:

USE_X_FORWARDED_HOST = True

This setting, new in Django 1.3.1, affects the get_host() and build_absolute_uri() methods of django.http.HttpRequest.  If not set, Django will use the value of the HTTP_HOST or SERVER_NAME variables, which are most likely set to the host name of the app server, not the front end.
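To make the lookup order concrete, here is a simplified sketch of the kind of host resolution described above. It is an illustration only, not Django’s actual implementation; resolve_host() and the sample dictionary are hypothetical, and the real get_host() handles the port check a little differently.

```python
def resolve_host(meta, use_x_forwarded_host=False):
    """Return the host name a request appears to be addressed to.

    `meta` is a dict shaped like HttpRequest.META.
    """
    if use_x_forwarded_host and 'HTTP_X_FORWARDED_HOST' in meta:
        # Added by the front-end proxy (mod_proxy sets X-Forwarded-Host).
        return meta['HTTP_X_FORWARDED_HOST']
    if 'HTTP_HOST' in meta:
        return meta['HTTP_HOST']
    # Last resort: the server's own name, plus the port if non-standard.
    host = meta['SERVER_NAME']
    port = str(meta.get('SERVER_PORT', '80'))
    if port not in ('80', '443'):
        host = '%s:%s' % (host, port)
    return host


# What an app server might see for a request proxied from the front end.
proxied = {
    'HTTP_X_FORWARDED_HOST': 'www.example.com',
    'HTTP_HOST': 'apps-01.example.com',
    'SERVER_NAME': 'apps-01.example.com',
    'SERVER_PORT': '80',
}
print(resolve_host(proxied, use_x_forwarded_host=True))
print(resolve_host(proxied, use_x_forwarded_host=False))
```

With the flag on, the first call yields the front-end name; with it off, the app server’s own name leaks into any absolute URL built from the request.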

If you’re using Django’s RemoteUserMiddleware and RemoteUserBackend for authentication, you will need to replace RemoteUserMiddleware with a custom subclass:

from django.contrib.auth.middleware import RemoteUserMiddleware

class ProxyRemoteUserMiddleware(RemoteUserMiddleware):
    header = 'HTTP_REMOTE_USER'

Then update your settings:

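MIDDLEWARE_CLASSES = (
    # ... your other middleware ...
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    # the remote-user middleware must come after AuthenticationMiddleware
    'path.to.ProxyRemoteUserMiddleware',
    )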

(It is possible to avoid this by setting REMOTE_USER on the app web server to the value of HTTP_REMOTE_USER, but here I will assume a default setup.)

If you’re using Django’s “sites” framework, you will probably want to set SITE_ID to correspond to the front-end site.  And if your WSGIScriptAlias path differs from the proxied path on the front-end server (not covered in detail here), you may have to use FORCE_SCRIPT_NAME (check the docs).

Django Application Modules and Templates

If your code or templates contain references to REMOTE_ADDR, REMOTE_USER or other server variables (via HttpRequest.META) affected by proxies, you will probably have to change them.  If you’re using Django’s RemoteUserMiddleware or the ProxyRemoteUserMiddleware subclass shown above, you should probably code with request.user.username instead of request.META['REMOTE_USER']; otherwise, you’ll want to reference HTTP_REMOTE_USER.  REMOTE_ADDR will be set to the IP address of the front-end proxy, not that of the original client; to get the client address you will have to use HTTP_X_FORWARDED_FOR, which can contain multiple comma-separated values.
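Because HTTP_X_FORWARDED_FOR can hold a chain of addresses, it usually needs a little parsing. Here is a minimal sketch; get_client_ip() is a hypothetical helper, and it assumes the left-most entry is the original client, which is only trustworthy if your front end controls that header (clients can send their own X-Forwarded-For).

```python
def get_client_ip(meta):
    """Best-effort client address from a META-like dict.

    Takes the first entry of X-Forwarded-For when present and falls
    back to REMOTE_ADDR (the proxy, or the client if not proxied).
    """
    forwarded = meta.get('HTTP_X_FORWARDED_FOR', '')
    # Split the comma-separated chain and drop empty pieces.
    addresses = [a.strip() for a in forwarded.split(',') if a.strip()]
    if addresses:
        return addresses[0]
    return meta.get('REMOTE_ADDR')


meta = {
    'HTTP_X_FORWARDED_FOR': '203.0.113.7, 10.0.0.5',
    'REMOTE_ADDR': '10.0.0.1',
}
print(get_client_ip(meta))
```

In a view you would pass request.META; keeping the helper free of Django imports makes it easy to test on its own.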

Django Projects and Python Environments

Since we’ve got two app servers, each will have its own Python environment (created with virtualenv) and Django project.  In my setup I decided to serve the Django MEDIA_ROOT from network storage mounted at the same point on each server to avoid synchronization issues.  Otherwise, it seems OK to keep each instance separate (YMMV).  I use Fabric for ensuring that the Python environments and Django projects stay in sync across the two servers.  The precise way you do this syncing depends on your preferences, the available tools, etc.

Apache Configuration

The Apache config on each app server follows the normal Django/WSGI pattern, so I’ll skip the details here.  Note that while it is possible for the WSGIScriptAlias path on the app server to differ from the proxied path on the front-end web server (which we’ll get to), this introduces some additional complexities which we will avoid here.  Some issues can be handled on the reverse proxy (front-end) server by Apache directives such as ProxyPassReverse and ProxyPassReverseCookiePath, but you may also need to use Django’s FORCE_SCRIPT_NAME setting in your project settings module.

Front-end Server

At this point you should have working Django projects on each app server under both SSL and non-SSL virtual hosts.  Now we’re going to set up the reverse proxy and load balancing on the front-end server.

Let’s assume your apps are served under the path /webapps on both port 80 and port 443 (SSL) virtual hosts.

Then, you can add to your port 80 virtual host:

<Proxy balancer://django-http>
    BalancerMember http://apps-01.example.com/webapps route=http-1
    BalancerMember http://apps-02.example.com/webapps route=http-2
</Proxy>

<Location /webapps>
    ProxyPass balancer://django-http stickysession=sessionid
    ProxyPassReverse http://apps-01.example.com/webapps
    ProxyPassReverse http://apps-02.example.com/webapps
    ProxyPassReverseCookieDomain apps-01.example.com www.example.com
    ProxyPassReverseCookieDomain apps-02.example.com www.example.com
</Location>

And to your SSL virtual host on port 443:

<Proxy balancer://django-https>
    BalancerMember https://apps-01.example.com/webapps route=https-1
    BalancerMember https://apps-02.example.com/webapps route=https-2
</Proxy>

<Location /webapps>
    ProxyPass balancer://django-https stickysession=sessionid
    ProxyPassReverse https://apps-01.example.com/webapps
    ProxyPassReverse https://apps-02.example.com/webapps
    ProxyPassReverseCookieDomain apps-01.example.com www.example.com
    ProxyPassReverseCookieDomain apps-02.example.com www.example.com
</Location>

This isn’t the only way to do it, of course, and you may have different requirements, but I’ve tried to cover the basics.


Django management command barfs on sites framework

As always, the story is a bit convoluted …

I had recently made changes to the get_absolute_url() methods on a couple of application models (Django 1.3/Python 2.4).  For various reasons, the admin UI for the app is on an intranet site, which is a different host from the public site (represented by a different Django Site object).  The changes involved calling a new function that uses the current site object to determine the appropriate domain for constructing a fully-qualified URL for get_absolute_url().  As originally implemented, I set a variable at the module level in models.py:

BASE_PUBLIC_URL = 'http://%s/path/to/app' % get_public_domain()

where get_public_domain() imports the Site model class from django.contrib.sites.models, calls Site.objects.get_current() and returns the appropriate public domain.  The get_absolute_url() methods then used BASE_PUBLIC_URL in constructing the final URL.

This worked fine in the normal application contexts, i.e., the admin site and the public site.  However, a custom management command which updates app data from an external source raised an ImportError on the relevant model.  The significant part of the traceback was as follows:

  File "/path/to/pyenv/lib/python2.4/site-packages/django/contrib/sites/models.py", line 25, in get_current
    current_site = self.get(pk=sid)

Ad hoc testing showed the ImportError to be a red herring — and, in any case, it didn’t square with the fact that the rest of the app (minus the admin command) was functional.

Now, the management command doesn’t actually call get_absolute_url(), so I figured that maybe the solution was to wrap the base public URL in a memoized function, so that the current site object is accessed lazily:

@memoize
def get_base_public_url():
    from path.to.mymodule import get_public_domain
    return 'http://%s/path/to/app' % get_public_domain()

That did the trick.  I’m still not sure exactly why the sites framework barfed, and it doesn’t seem worth digging for …
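For readers wondering what a decorator like @memoize might look like, here is a minimal sketch. It is an assumption for illustration, not necessarily the decorator I used, and get_public_domain() is replaced by a stand-in that records its calls so the lazy behavior is visible without Django.

```python
import functools


def memoize(func):
    """Cache results keyed on the positional arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper


calls = []


def get_public_domain():
    """Stand-in for the real function, which queries the Site model."""
    calls.append('lookup')
    return 'www.example.com'


@memoize
def get_base_public_url():
    return 'http://%s/path/to/app' % get_public_domain()


# Nothing has been looked up at import time ...
print(len(calls))
# ... the lookup happens on first use, and only once.
print(get_base_public_url())
print(get_base_public_url())
print(len(calls))
```

The point is the timing: the module-level BASE_PUBLIC_URL forced the site lookup as soon as models.py was imported, whereas the memoized function defers it until something actually asks for the URL.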


Django 1.3 First Impressions

On the whole I am very pleased with Django 1.3.  The developers did a good job of taking care of some of the outstanding warts in Django, particularly the lack of “static file” management.  While I have not yet used the django.contrib.staticfiles app, it appears to solve the problem in a reasonable way.  Now I can retire the workarounds that I had developed to deal with the problem.  The addition of built-in logging support is certainly welcome.  The improvements and consistency fixes in certain template tags all seem good.  I previously praised the render() shortcut, which will eliminate the repetitive nonsense of render_to_response(context_instance=RequestContext(request)).  The ORM got an important patch allowing configuration of on-delete behavior for ForeignKey (and OneToOneField) fields, about which I had also posted.  One interesting, small, but nice improvement came unannounced: help_text on a form field is now wrapped by a <span></span> when rendered as HTML by the as_*() methods.  I actually filed a ticket reporting this omission from the announcement.  For some reason, it seems that getting this change into the code and documentation has been a challenge.

I suspect there may be some moaning around the deprecation of function-based generic views in favor of class-based views.   Class-based views make sense, but it looks like there will be a little pain in the transition, partly because the keyword arguments for Django’s built-in generic view functions don’t map exactly to the generic view class keyword arguments.  It would have been nice to provide a little smoother transition there.  Also, the class-based view documentation is rather dense because you have to refer to the mixin classes that compose the actual generic view class you want to use.  I’m sure it will get easier with time, but it does feel like a jump in complexity that could make generic views more difficult for new users.    For example, where I did this before in a URLconf module:

from django.conf.urls.defaults import *
from django.views.generic.simple import direct_to_template

urlpatterns = patterns('',
    (r'^status/$', direct_to_template,
     {'template': 'sitetest/status.txt', 'mimetype': 'text/plain'}),
)

I now have to do something like this:

from django.conf.urls.defaults import *
from django.views.generic.base import TemplateView

class PlainTextTemplateView(TemplateView):
    """A plain text generic template view."""
    def render_to_response(self, context, **kwargs):
        return super(PlainTextTemplateView, self).render_to_response(
            context, content_type='text/plain', **kwargs
            )

urlpatterns = patterns('',
    (r'^status/$', PlainTextTemplateView.as_view(template_name='sitetest/status.txt')),
)

While class-based views may have been a step in the right direction for the framework, I wonder how it will play out.


Django template tag to force https URL references

I needed a hack to munge some included HTML content so that URL references (href and src attributes) in <img>, <input>, <link> and <script> tags used https. Here’s what I came up with. It’s not bullet-proof, but seems good enough for the need of the moment. Note that <a> hrefs are not altered, since I only care about avoiding mixed https/http requests that prompt alarms in some browsers and indicate to users that the page might not be secure.

import re
from django import template

register = template.Library()

HTTP_RE = re.compile(r"""(<(link\s+[^>]*\bhref|(img|input|script)\s+[^>]*\bsrc)\s*=\s*["'])http://""", re.I)

class ForceHttpsNode(template.Node):

    def __init__(self, nodelist):
        self.nodelist = nodelist

    def render(self, context):
        output = self.nodelist.render(context)
        if 'request' in context and context['request'].is_secure():
            output = HTTP_RE.sub(r'\1https://', output)
        return output

@register.tag
def forcehttps(parser, token):
    """
    Re-writes ``http://`` URL references in ``<link>``, ``<img>``, ``<input>`` 
    and ``<script>`` tags to ``https://``, if the request is HTTPS.

    Outputs rendered content as-is if request is HTTP.

    Usage example::

        {% forcehttps %}
          <link href="http://example.com/example.css" rel="stylesheet" type="text/css"/>
        {% endforcehttps %}

    If the request is HTTPS, the output should be::

        <link href="https://example.com/example.css" rel="stylesheet" type="text/css"/>

    .. note:: https:// URLs are not checked for validity.

    """
    nodelist = parser.parse(('endforcehttps',))
    parser.delete_first_token()
    return ForceHttpsNode(nodelist)
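To see what the regular expression does and does not touch, it can be exercised outside of Django. This sketch reuses the same pattern; force_https() is a hypothetical helper added only for the demonstration.

```python
import re

# Same pattern as in the template tag above.
HTTP_RE = re.compile(r"""(<(link\s+[^>]*\bhref|(img|input|script)\s+[^>]*\bsrc)\s*=\s*["'])http://""", re.I)


def force_https(html):
    """Apply the tag's substitution to a plain string."""
    return HTTP_RE.sub(r'\1https://', html)


samples = [
    '<link href="http://example.com/example.css" rel="stylesheet"/>',
    '<img alt="logo" src="http://example.com/logo.png">',
    '<a href="http://example.com/">left alone</a>',
]
for sample in samples:
    print(force_https(sample))
```

The <link> and <img> references come out as https, while the <a> href is unchanged. Unquoted attribute values and URLs inside inline CSS are not matched, which is part of why I call it not bullet-proof.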


Django: Hurray for the render() shortcut

I was just thinking yesterday that one of my few annoyances with Django was having to explicitly pass a RequestContext instance to render_to_response() in order to have access to the request object in a template. Today I noticed the new render() shortcut that finally stops this violation of the DRY principle. Now they just need to add django.core.context_processors.request to the default list of TEMPLATE_CONTEXT_PROCESSORS. More than once I’ve puzzled over why the request variable in a template wasn’t working. Seriously, isn’t this something folks are going to want more often than not?


Unraveling Python packaging

When I first started developing Python apps — mostly using Django — I took a naive and simple approach to “distribution”: I just used SVN.  This worked fine while I was the only developer and had essentially two projects, an intranet site and a public site.  I built all my apps in one SVN directory, checked out that directory to all my dev, test, and prod servers, and did svn updates as needed.  As tends to happen, things got more complicated over time.  Other folks got involved in Python/Django app development.  As the number of apps increased I became more uncomfortable running all my production code off the trunk.  I wanted to restructure the SVN repo so that each app could be managed more independently. And I started working on another project outside of the scope of my previous work, but for which I wanted to re-use some common code.  This project had a public distribution goal, which prompted me to begin delving into Python packaging techniques.  I went straight for setuptools, since I was familiar as an end-user with easy_install, and it seemed like the leading quick-and-easy solution.  I happily discovered that it was, in fact, easy, and it wasn’t long before I was distributing all my apps internally as RPMs via yum and puppet.  This made my sysadmin very happy.

So, that was cool, but there were a couple of problems.  First, doing this kind of packaging for internal app distribution seemed like rather too much ceremony.  Secondly, I realized that I really should be using virtualenv and pip, and implementing those tools totally changed the way I worked.  Adding Fabric later really pulled things together for me.  I established an internal package index to which I pushed sdist tarballs, and installed all my production apps into virtualenvs using pip -f.  This works well, but still feels like too much overhead for internal dev-to-prod cycles.

Ironically, I find that I am now reconsidering the plain old SVN approach, with a twist.  Since I am now in the habit of tagging versions, I can use some fabfile magic to switch tags and reload httpd, etc.  I think, though, that what I would really like to do is use pip and editable packages installed from SVN.  Unfortunately, there is not yet an option to have pip automatically switch SVN URLs (see http://bitbucket.org/ianb/pip/issue/97/need-a-way-to-install-without-prompts), which blocks my fab mojo.  Now that I have experienced the benefits of packaging and have acquired the discipline of consistent versioning, I don’t want to go back to straight SVN working copies, although that’s not a bad option, especially if you don’t need dependency management, script installation, or the other goodies you get with setuptools/pip.

For some interesting reading on packaging issues, see James Bennett’s blog post and Ian Bicking’s response.
