Archive for category Apache

Django-Apache-WSGI reverse proxy and load balancing

Serving Django apps behind a reverse proxy is really pretty straightforward once you’ve set it up, but you might run into a few snags along the way, depending on your requirements.  Load-balancing only adds a little more complexity.  Here’s how I’ve done it.

Example Architecture

  • Front end web server (www.example.com): Apache 2.2 + mod_proxy, mod_proxy_balancer, mod_ssl.
  • Back end application servers (apps-01.example.com, apps-02.example.com): Apache 2.2 + mod_wsgi, mod_ssl; Python 2.6; Django 1.3.1.
  • Backend database server.
  • Additional requirements: Remote user authentication; SSL and non-SSL proxies.

Let’s start with the application servers and deal with the front end later.

Application Servers

Obviously both app servers will be configured the same way.  How to keep them in sync will be discussed briefly.

Django Settings Module

In order for Django to properly create fully-qualified URLs for the front-end client, you must set:

USE_X_FORWARDED_HOST = True

This setting, new in Django 1.3.1, affects the get_host() and build_absolute_uri() methods of django.http.HttpRequest.  If not set, Django will use the value of the HTTP_HOST or SERVER_NAME variables, which are most likely set to the host name of the app server, not the front end.
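To make the effect concrete, here is a simplified model of that host lookup. It is only a sketch of the documented behavior, not Django's source: the function name resolve_host and the sample header values are made up for illustration, and port handling is left out.

```python
def resolve_host(meta, use_x_forwarded_host=False):
    """Simplified model of how HttpRequest.get_host() picks a host name.

    `meta` is a dict shaped like request.META.  Illustrative only; this is
    not Django's implementation and it ignores non-standard ports.
    """
    if use_x_forwarded_host and 'HTTP_X_FORWARDED_HOST' in meta:
        # Added by mod_proxy on the front end: the host the client asked for.
        return meta['HTTP_X_FORWARDED_HOST']
    if 'HTTP_HOST' in meta:
        return meta['HTTP_HOST']
    return meta['SERVER_NAME']


# Hypothetical request as seen by an app server behind the proxy.
meta = {
    'HTTP_X_FORWARDED_HOST': 'www.example.com',
    'HTTP_HOST': 'apps-01.example.com',
    'SERVER_NAME': 'apps-01.example.com',
}

print(resolve_host(meta, use_x_forwarded_host=True))
print(resolve_host(meta, use_x_forwarded_host=False))
```

With the setting on, the model returns the front-end name; with it off, the app server's own name leaks into generated URLs.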

If you’re using Django’s RemoteUserMiddleware and RemoteUserBackend for authentication, you will need to replace RemoteUserMiddleware with a custom subclass:

from django.contrib.auth.middleware import RemoteUserMiddleware

class ProxyRemoteUserMiddleware(RemoteUserMiddleware):
    header = 'HTTP_REMOTE_USER'

Then update your settings:

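MIDDLEWARE_CLASSES = (
    # ... other middleware ...
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    # Replaces RemoteUserMiddleware; must come after AuthenticationMiddleware.
    'path.to.ProxyRemoteUserMiddleware',
    )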

(It is possible to avoid this by setting REMOTE_USER on the app web server to the value of HTTP_REMOTE_USER, but here I will assume a default setup.)

If you’re using Django’s “sites” framework, you will probably want to set SITE_ID to correspond to the front-end site.  And if your WSGIScriptAlias path differs from the proxied path on the front-end server (not covered in detail here), you may have to use FORCE_SCRIPT_NAME (check the docs).

Django Application Modules and Templates

If your code or templates contain references to REMOTE_ADDR, REMOTE_USER or other server variables (via HttpRequest.META) affected by proxies, you will probably have to change them.  If you’re using Django’s RemoteUserMiddleware or the ProxyRemoteUserMiddleware subclass shown above, you should probably code with request.user.username instead of request.META['REMOTE_USER']; otherwise, you’ll want to reference HTTP_REMOTE_USER.  REMOTE_ADDR will be set to the IP address of the proxy front-end, not the client; to get the client address you will have to use HTTP_X_FORWARDED_FOR, which can have multiple comma-separated values.
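A helper along these lines can pull a client address out of HTTP_X_FORWARDED_FOR. This is a minimal sketch: the name client_ip is hypothetical, and it assumes the left-most entry is the original client. Keep in mind that the header can be forged by the client, so only the entries appended by your own proxy are trustworthy.

```python
def client_ip(meta):
    """Best-guess client address from a request.META-like dict."""
    forwarded = meta.get('HTTP_X_FORWARDED_FOR', '')
    # Each proxy appends the address it received the request from, so the
    # left-most entry is the one nearest the original client.
    addresses = [part.strip() for part in forwarded.split(',') if part.strip()]
    if addresses:
        return addresses[0]
    # No proxy header present: fall back to the directly connected peer.
    return meta.get('REMOTE_ADDR')


# Hypothetical values: a client, one intermediate proxy, then the front end.
meta = {
    'HTTP_X_FORWARDED_FOR': '203.0.113.7, 198.51.100.2',
    'REMOTE_ADDR': '192.0.2.10',
}
print(client_ip(meta))
```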

Django Projects and Python Environments

Since we’ve got two app servers, each will have its own Python environment (created with virtualenv) and Django project.  In my setup I decided to serve the Django MEDIA_ROOT from network storage mounted at the same point on each server to avoid synchronization issues.  Otherwise, it seems OK to keep each instance separate (YMMV).  I use Fabric for ensuring that the Python environments and Django projects stay in sync across the two servers.  The precise way you do this syncing depends on your preferences, the available tools, etc.

Apache Configuration

The Apache config on each app server follows the normal Django/WSGI pattern, so I’ll skip the details here.  Note that while it is possible for the WSGIScriptAlias path on the app server to differ from the proxied path on the front-end web server (which we’ll get to), this introduces some additional complexities which we will avoid here.  Some issues can be handled on the reverse proxy (front-end) server by Apache directives such as ProxyPassReverse and ProxyPassReverseCookiePath, but you may also need to use Django’s FORCE_SCRIPT_NAME setting in your project settings module.

Front-end Server

At this point you should have working Django projects on each app server under both SSL and non-SSL virtual hosts.  Now we’re going to set up the reverse proxy and load balancing on the front-end server.

Let’s assume your apps are served under the path /webapps on both port 80 and port 443 (SSL) virtual hosts.

Then, you can add to your port 80 virtual host:

<Proxy balancer://django-http>
    BalancerMember http://apps-01.example.com/webapps route=http-1
    BalancerMember http://apps-02.example.com/webapps route=http-2
</Proxy>

<Location /webapps>
    ProxyPass balancer://django-http stickysession=sessionid
    ProxyPassReverse http://apps-01.example.com/webapps
    ProxyPassReverse http://apps-02.example.com/webapps
    ProxyPassReverseCookieDomain apps-01.example.com www.example.com
    ProxyPassReverseCookieDomain apps-02.example.com www.example.com
</Location>

And to your SSL virtual host on port 443:

<Proxy balancer://django-https>
    BalancerMember https://apps-01.example.com/webapps route=https-1
    BalancerMember https://apps-02.example.com/webapps route=https-2
</Proxy>

<Location /webapps>
    ProxyPass balancer://django-https stickysession=sessionid
    ProxyPassReverse https://apps-01.example.com/webapps
    ProxyPassReverse https://apps-02.example.com/webapps
    ProxyPassReverseCookieDomain apps-01.example.com www.example.com
    ProxyPassReverseCookieDomain apps-02.example.com www.example.com
</Location>
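As a rough illustration of what the ProxyPassReverse lines do, the sketch below models the rewriting of a redirect's Location header from a back-end URL to the front-end URL. It is a toy model of the idea, not mod_proxy's implementation, and the function name rewrite_location is made up.

```python
def rewrite_location(location, backend_urls, frontend_url):
    """Model of ProxyPassReverse: map a back-end Location header to the front end."""
    for backend in backend_urls:
        if location.startswith(backend):
            # Swap the back-end prefix for the public one, keep the rest.
            return frontend_url + location[len(backend):]
    # Not one of our back ends: pass the header through untouched.
    return location


backends = [
    'http://apps-01.example.com/webapps',
    'http://apps-02.example.com/webapps',
]
frontend = 'http://www.example.com/webapps'

print(rewrite_location('http://apps-02.example.com/webapps/login/', backends, frontend))
```

Without this rewriting, a redirect issued by an app server could send the client straight to apps-01 or apps-02, bypassing the proxy.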

This isn’t the only way to do it, of course, and you may have different requirements, but I’ve tried to cover the basics.

Apache LDAP authentication and Active Directory

I needed to authenticate users in Apache against Active Directory using mod_authnz_ldap.  Normally I would have set the URL and base DN like this:

ldaps://example.com
ou=CompanyPeople,dc=example,dc=com

In this case, however, the users spanned two different top-level containers or “domains”:

ou=CompanyPeople,dc=example,dc=com
ou=OtherPeople,dc=example,dc=com

So, I tried setting the base DN to the top level:

dc=example,dc=com

but authentication failed with this ugly error in the log:

[ldap_search_ext_s() for user failed][Operations error]

It took some hunting, but I finally found that if you want to query the Active Directory “Global Catalog” (GC) via LDAP, you have to use port 3268 or 3269 (LDAPS) instead of the usual default port 389 or 636. So, the working URL and base DN are:

ldaps://example.com:3269
dc=example,dc=com
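In mod_authnz_ldap the URL and base DN end up in a single AuthLDAPURL value of the form ldap://host:port/basedn?attribute?scope. The sketch below assembles one for either the Global Catalog or the regular directory ports. The helper name and the sAMAccountName/sub defaults are my assumptions (they are common for Active Directory, but not taken from the setup above).

```python
def auth_ldap_url(host, base_dn, attribute='sAMAccountName', scope='sub',
                  global_catalog=True, use_ssl=True):
    """Build an AuthLDAPURL-style value (hypothetical helper)."""
    if global_catalog:
        # Global Catalog listens on 3268 (LDAP) and 3269 (LDAPS).
        port = 3269 if use_ssl else 3268
    else:
        # Regular directory ports.
        port = 636 if use_ssl else 389
    scheme = 'ldaps' if use_ssl else 'ldap'
    return '%s://%s:%d/%s?%s?%s' % (scheme, host, port, base_dn, attribute, scope)


print(auth_ldap_url('example.com', 'dc=example,dc=com'))
```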

Be explicit when setting Apache host access controls

I recently discovered that I had made an incorrect assumption regarding the use of the host-based authorization directives in Apache: I thought that if, for example, a directory defined Order, Deny, and Allow directives, the use of an Allow directive in a subdirectory was simply “additive”, i.e., extending the existing rules as if the rules from the parent directory were “inherited” and the extra Allow was added to the list of Allows from the parent.

This is most definitely NOT the case, at least in Apache 2.2, and the documentation does not address this specific issue.  Worse, I think it is reasonable to believe based on the mod_authz_host docs (the module which provides the Order, Deny, and Allow directives) and the “How Sections Are Merged” section of the Apache Configuration Sections doc that in fact the configuration would behave in the way I had expected.  (Of course, one should always test.)

Here’s the problem:

<Directory /abc>
    Order Deny,Allow
    Deny from all
    Allow from example1.com
</Directory>

<Directory /abc/def>
    Allow from example2.com
</Directory>

You might think that access to /abc/def is restricted to hosts from example1.com and example2.com domains, or perhaps just example2.com, but in fact, it’s open to the world!  In other words, the /abc/def block is not equivalent to:

<Directory /abc/def>
    Order Deny,Allow
    Deny from all
    Allow from example1.com
    Allow from example2.com
</Directory>

as I thought it would be, or even to:

<Directory /abc/def>
    Order Deny,Allow
    Deny from all
    Allow from example2.com
</Directory>

The result is the same even if the Order directive in /abc is set to Allow,Deny. It seems as though mod_authz_host resets all its directives whenever one is set in a “directory” context.  The reset state is to allow all by default because neither Allow nor Deny has a default value, and the default value of Order is Deny,Allow.
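The behavior is easier to see with a toy model of how a single section's Order, Allow, and Deny directives are evaluated. This is only a sketch of the documented Apache 2.2 semantics plus the "reset" behavior observed above; it is not how mod_authz_host is implemented, and the function names are made up.

```python
def host_matches(host, patterns):
    """True if host equals, or is a subdomain of, any pattern ('all' matches everything)."""
    return any(
        p == 'all' or host == p or host.endswith('.' + p)
        for p in patterns
    )


def is_allowed(host, order='Deny,Allow', allow=(), deny=()):
    """Model of Order/Allow/Deny evaluation for one section, using Apache's defaults."""
    allow_match = host_matches(host, allow)
    deny_match = host_matches(host, deny)
    if order == 'Deny,Allow':
        # Default is to allow; a Deny match blocks unless an Allow also matches.
        return allow_match or not deny_match
    # Allow,Deny: default is to deny; a Deny match always wins.
    return allow_match and not deny_match


# /abc/def as written: only "Allow from example2.com", everything else reset.
print(is_allowed('host.example.net', allow=['example2.com']))

# /abc/def spelled out explicitly.
print(is_allowed('host.example.net', deny=['all'],
                 allow=['example1.com', 'example2.com']))
```

The first call prints True (an unrelated host gets in) and the second prints False, which is why every directive needs to be restated in the subdirectory.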

Disable WordPress Flash Uploader with Apache

OK, this doesn’t really disable it, but it does make the “browser uploader” the default …

<Directory /path/to/wp/wp-admin>
  <Files media-new.php>
    # Force browser uploader instead of Flash
    RewriteCond %{QUERY_STRING} !=flash=0
    RewriteRule ^/(.*) /$1?flash=0
  </Files>
</Directory>

(WordPress 2.8.4)

Get a fully-qualified URL for the current Django site

NOTE: This post is out of date and I’m not sure it was a good solution to begin with, but I’ll leave it here FWIW. –DCS, 18 Nov 2011

You need to generate a fully-qualified URL to a Django page, in particular outside of a web request context (in which you would have access to server variables), such as an automated process that generates e-mail with links.  You may be able to generate a root-relative URL from a reverse lookup; there’s also get_absolute_url() of course, but it’s provided on a per-model basis, and in any case shouldn’t be coupled with URL elements such as protocol and host name.  You can get the domain part of the host name from the current site object, but Django currently (as of version 1.0.2) provides no means for reliably generating a fully-qualified URL (including protocol and port) outside of a web request context.  In the function current_site_url(), below, I have used two custom settings, MY_SITE_PROTOCOL and MY_SITE_PORT.  (My current practice is to prefix custom settings with MY_,  place them in a parallel module in the project called my_settings.py, and import the custom settings into the project settings module.)

from django.conf import settings

def current_site_url():
    """Returns fully qualified URL (no trailing slash) for the current site."""
    from django.contrib.sites.models import Site
    current_site = Site.objects.get_current()
    protocol = getattr(settings, 'MY_SITE_PROTOCOL', 'http')
    port     = getattr(settings, 'MY_SITE_PORT', '')
    url = '%s://%s' % (protocol, current_site.domain)
    if port:
        url += ':%s' % port
    return url

Now, I still don’t really have enough information to construct a fully-qualified URL for the most general case, because in taking advantage of the django.root setting, my code no longer “knows” what Django’s root path is.  That was good for decoupling the URLconf from the web server conf, but again, I need to generate fully-qualified URLs outside of a web request context, so I don’t have access to the django.root setting.  My solution has been to add another custom setting, MY_DJANGO_URL_PATH, which corresponds to the django.root setting (a comment in Django’s mod_python handler module indicates that the handler must be called before importing any settings in order for os.environ to be set up correctly with respect to settings).  With that, I can get my Django root URL with this function:

def django_root_url(fq=False):
    """Returns base URL (no trailing slash) for the current project.

    Setting fq parameter to a true value will prepend the base URL
    of the current site to create a fully qualified URL.

    The name django_root_url is used in favor of alternatives
    (such as project_url) because it corresponds to the mod_python
    PythonOption django.root setting used in Apache.
    """
    url = getattr(settings, 'MY_DJANGO_URL_PATH', '')
    if fq:
        url = current_site_url() + url
    return url

With these functions and Django’s reverse URL lookup, I can construct fully-qualified URLs.
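Putting the pieces together looks roughly like the sketch below. To keep it runnable without a Django project, the three inputs are passed in as plain strings: site_url stands in for current_site_url(), root_path for the MY_DJANGO_URL_PATH setting, and view_path for the result of reverse(), which I am assuming is relative to the Django root. The function name and sample values are hypothetical.

```python
def fully_qualified_url(site_url, root_path, view_path):
    """Join the site URL, the Django root path, and a root-relative view path."""
    # Normalize the slashes at each seam so the parts can be joined safely.
    return site_url.rstrip('/') + root_path.rstrip('/') + '/' + view_path.lstrip('/')


# Hypothetical values for a site served over SSL on a non-standard port.
print(fully_qualified_url('https://www.example.com:8443', '/webapps', '/polls/3/'))
```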

Python script for Apache RewriteMap

It turned out that I didn’t need this after all, but thought I’d post it here anyway …

The use case was to base64-encode a URL so that it could be passed as a query parameter to a login page.  The login routes through a third page and returns to the login page, which redirects the client back to the original URL which was base64-encoded.  Without the encoding, the third page could mangle the original URL.

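#!/usr/bin/python

import binascii
import sys

# Use the binary streams where they exist (Python 3); plain stdin/stdout otherwise.
stdin = getattr(sys.stdin, 'buffer', sys.stdin)
stdout = getattr(sys.stdout, 'buffer', sys.stdout)

# Answer one line per lookup key; readline() returns an empty string at EOF,
# which ends the loop.
for line in iter(stdin.readline, b''):
    # b2a_base64() appends the newline that RewriteMap expects after each answer.
    stdout.write(binascii.b2a_base64(line.rstrip()))
    stdout.flush()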
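On the receiving side, the login page has to reverse the encoding before redirecting. The sketch below shows the round trip. Note that it swaps in the URL-safe base64 alphabet rather than the standard one used by the script above, because standard base64 can emit + and /, which would need further escaping inside a query string. The function names and the sample URL are made up.

```python
import base64


def encode_url(url):
    """Encode a URL so it can travel as a single query parameter value."""
    return base64.urlsafe_b64encode(url.encode('utf-8')).decode('ascii')


def decode_url(token):
    """Recover the original URL from an encode_url() token."""
    return base64.urlsafe_b64decode(token.encode('ascii')).decode('utf-8')


original = 'https://www.example.com/webapps/report/?q=a+b&page=2'
token = encode_url(original)
print(token)
print(decode_url(token) == original)
```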

Managing static files for Django applications

Update, 1 Apr 2011: The issue of managing static files has been solved in Django 1.3.

Two principles of Django development lead to a dilemma:

  1. Application code should be self-contained — i.e., not coupled with a project.
  2. Django should not serve static media files (for security and efficiency).

So, how does one manage static files (images, css, js, etc.) that are bundled with an application?  I make a couple of assumptions:

  1. You don’t want to hard-code full URL paths in templates, so you need some way to inject a base URL dynamically into your template context.
  2. You want to keep the media files in the application package — that is, not to copy or move them to a filesystem location outside the application directory.

Django’s builtin settings provide for two non-admin media settings, MEDIA_ROOT and MEDIA_URL.  One option for resolving the issue is to use MEDIA_URL and create symlinks from the MEDIA_ROOT directory to the application’s media directory (or directories).  Personally, I don’t like that, partly because I prefer not to use symlinks, but mostly because the MEDIA_ROOT space is used for uploads for model file fields, and it feels like this other static, presentation-related, content should be in its own space.  OTOH the symlink approach is probably the most flexible.

What I’ve been doing to this point is based on the assumption that my application packages all live in the same base directory. I added a custom setting APP_MEDIA_PREFIX (inspired by ADMIN_MEDIA_PREFIX) and set it to the URL path which I alias in Apache.

Django setting:

APP_MEDIA_PREFIX = '/django/apps/'

Apache conf:

# Application media
AliasMatch ^/django/apps/([^/]+)/media/(.+) /opt/django/apps/$1/media/$2
<DirectoryMatch "^/opt/django/apps/[^/]+/media">
    Allow from all
</DirectoryMatch>
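To check which URLs the AliasMatch line exposes, the same pattern can be tried out in Python. This is just a sketch: Python's re module stands in for the regex engine Apache uses, and the function name is made up.

```python
import re

# Same pattern as the AliasMatch directive.
ALIAS_PATTERN = re.compile(r'^/django/apps/([^/]+)/media/(.+)')


def map_url_to_path(url_path):
    """Return the filesystem path the alias would serve, or None if no match."""
    match = ALIAS_PATTERN.match(url_path)
    if match is None:
        return None
    # $1 is the app name, $2 the path inside its media directory.
    return '/opt/django/apps/%s/media/%s' % match.groups()


print(map_url_to_path('/django/apps/locationguide/media/css/locationguide.css'))
print(map_url_to_path('/django/apps/locationguide/views.py'))
```

Only paths under an app's media subdirectory are mapped; a request for application source such as views.py does not match the alias.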

My app packages are in /opt/django/apps and by convention put their media files in a “media” subdirectory. Then I created a custom template tag for printing APP_MEDIA_PREFIX (inspired by {% admin_media_prefix %}) in my custom template tag module (custom.py):

from django import template
from django.conf import settings

register = template.Library()

@register.simple_tag
def app_media_prefix():
    """Prints value of APP_MEDIA_PREFIX setting.

    Usage: {% app_media_prefix %}
    """
    return getattr(settings, 'APP_MEDIA_PREFIX', '')

Then, in a template, for example:

{% load custom %}
<link rel="stylesheet" type="text/css" href="{% app_media_prefix %}locationguide/media/css/locationguide.css"/>

In this case, the application name/label is “locationguide” and is located in /opt/django/apps/locationguide.

If anyone has thought of a significantly better way to manage this scenario, I’d love to hear it.
