Escape special characters for Solr/Lucene query

I spent all morning on this regexp, so I hope it holds up.

import re

# Solr/Lucene special characters: + - ! ( ) { } [ ] ^ " ~ * ? : \
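# There are also the operators && and ||; rather than match them as pairs,
# we escape each individual ampersand and pipe character.
# Backslashes themselves are not escaped, and a special character that is
# already preceded by a backslash is left alone (the negative lookbehind).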
# http://lucene.apache.org/java/2_9_1/queryparsersyntax.html#Escaping+Special+Characters
ESCAPE_CHARS_RE = re.compile(r'(?<!\\)(?P<char>[&|+\-!(){}[\]^"~*?:])')

def solr_escape(value):
    r"""Escape un-escaped special characters and return escaped value.

    >>> solr_escape(r'foo+') == r'foo\+'
    True
    >>> solr_escape(r'foo\+') == r'foo\+'
    True
    >>> solr_escape(r'foo\\+') == r'foo\\+'
    True
    """
    return ESCAPE_CHARS_RE.sub(r'\\\g<char>', value)
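
One consequence of the lookbehind is worth spelling out. As the third doctest shows, r'foo\\+' comes back unchanged, although in Lucene syntax the doubled backslash reads as an escaped backslash, which leaves the + as a live operator. If the input is raw user text that is known to contain no pre-escaped characters, a variant that also escapes backslashes may be a better fit. The following is only a sketch under that assumption; ESCAPE_ALL_RE and solr_escape_raw are hypothetical names, not part of the original snippet.

```python
import re

# Hypothetical variant: no lookbehind, and the backslash itself is in the
# character class, so every special character is escaped unconditionally.
ESCAPE_ALL_RE = re.compile(r'(?P<char>[\\&|+\-!(){}[\]^"~*?:])')

def solr_escape_raw(value):
    """Escape every special character, including backslashes.

    Assumes `value` is raw text with no characters already escaped;
    calling it twice on the same value will double-escape.
    """
    return ESCAPE_ALL_RE.sub(r'\\\g<char>', value)

print(solr_escape_raw(r'foo\+'))
```

The trade-off is that this version is no longer safe to apply to values that have already been escaped, which is exactly what the lookbehind in the original guards against.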

Note that solr_escape should be applied not to a complete Solr query, but to the individual search values used to construct one:

q = 'title:%s' % solr_escape("It's 11:00 -- Do you know where your children are?")
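
To make the result concrete, here is a small self-contained sketch that repeats the definitions above and prints the escaped value. Whitespace is not escaped, so with the standard query parser only the first word would be bound to the title field. Grouping the value in parentheses is one way around that; the field_query helper is a hypothetical addition for illustration, not part of the original snippet.

```python
import re

# Same pattern and function as in the post, repeated so this runs on its own.
ESCAPE_CHARS_RE = re.compile(r'(?<!\\)(?P<char>[&|+\-!(){}[\]^"~*?:])')

def solr_escape(value):
    return ESCAPE_CHARS_RE.sub(r'\\\g<char>', value)

def field_query(field, value):
    """Hypothetical helper: bind every term of `value` to `field`
    by grouping the escaped value in parentheses."""
    return '%s:(%s)' % (field, solr_escape(value))

escaped = solr_escape("It's 11:00 -- Do you know where your children are?")
print(escaped)
# It's 11\:00 \-\- Do you know where your children are\?

print(field_query('title', "It's 11:00"))
# title:(It's 11\:00)
```

The colon, both hyphens, and the question mark are escaped; the apostrophe is not a special character and passes through untouched.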

P.S. Thanks to KM!


  1. #1 by David Chandek-Stark on March 26, 2010 - 7:52 am

    Thanks to feedback on the solrpy list (http://groups.google.com/group/solrpy/browse_thread/thread/f4437b885ecb0037), the docstring and doctests have been fixed. Note that the docstring itself is a raw string and the strings in the doctests are raw.

  2. #2 by Will Sexton on April 20, 2011 - 11:37 am

    I’m using it. Thanks for the free stuff.

  3. #3 by David Chandek-Stark on April 20, 2011 - 12:13 pm

    Sweet! Glad to hear it’s of use on the home front.

  4. #4 by Silvio on November 27, 2012 - 1:17 pm

    I had to add the forward slash to the regexp to make it work on Solr 4. See http://lucene.472066.n3.nabble.com/Solr-4-upgrade-guide-tp4001779p4001831.html

    Thanks for the snippet!

  5. #5 by Lighton Phiri on December 20, 2012 - 7:24 pm

    How does one go about escaping index document field names that contain special characters? I tried escaping the name, but I get the following error:

    org.apache.solr.common.SolrException: undefined field dc\-identifier

  6. #6 by http://yahoo.com on February 10, 2013 - 5:52 am

    “Escape special characters for Solr/Lucene query | Fragments of Code” was in fact an incredibly pleasant article.
    Continue writing and I will continue to keep reading!
    Thanks a lot, Kian
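
Following up on comment #4: Lucene 4 introduced regular-expression queries delimited by forward slashes, which is why the slash became a special character there. A minimal sketch of the adjusted pattern might look like the following; the names ESCAPE_CHARS_SOLR4_RE and solr4_escape are hypothetical, and the only change from the post's pattern is the added '/' in the character class.

```python
import re

# Assumption: the post's pattern with '/' appended to the character class,
# as comment #4 describes for Solr 4.
ESCAPE_CHARS_SOLR4_RE = re.compile(r'(?<!\\)(?P<char>[&|+\-!(){}[\]^"~*?:/])')

def solr4_escape(value):
    """Escape un-escaped special characters, including the forward slash."""
    return ESCAPE_CHARS_SOLR4_RE.sub(r'\\\g<char>', value)

print(solr4_escape('2012/11/27'))
# 2012\/11\/27
```

As with the original, a slash that is already preceded by a backslash is left alone.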