Spatial Search in Lucene.Net - Worked Example

1st Apr

by Gary H

We've received a lot of feedback on the Spatial Search in Lucene.Net article that we published earlier this year. In this article we revisit the Spatial framework and use a worked example to explore its usage.

Getting started

We start by creating a console application for our example. Give it a reasonable name (I chose LuceneSpatialExample), make sure you are targeting the .NET Framework 4.0, and create it. The next step is to reference the Lucene packages we care about. We're going to use the same version of Spatial as we did in the previous article (2.9.4.1), so in the Package Manager Console run:

Install-Package Lucene.Net -Version 2.9.4.1
Install-Package Lucene.Net.Contrib -Version 2.9.4.1

Modelling Locations

Spatial search requires locations to search against. In the interests of making this example as useful as possible, we're going to model our locations in a manner that may be seen in the wild in existing systems. We start by creating a Location class:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace LuceneSpatialExample
{
    public class Location
    {
        public int? Id { get; set; }
        public string Name { get; set; }
        public Coordinate Coordinates { get; set; }

        public Location()
        {
            
        }

        public Location(int id, string name, Coordinate cord)
        {
            Id = id;
            Name = name;
            Coordinates = cord;
        }
    }
}

Note that the Location has an Id, a Name and a Coordinate. The Coordinate models latitude and longitude and is the next class we create:

namespace LuceneSpatialExample
{
    public class Coordinate
    {
        public double Latitude { get; set; }
        public double Longitude { get; set; }

        public Coordinate()
        {
            
        }

        public Coordinate(double latitude, double longitude)
        {
            Latitude = latitude;
            Longitude = longitude;
        }
    }
}

Finally, we want a simple data access mechanism, so we'll use a very basic repository for our example and pre-populate it with some locations:

using System.Collections.Generic;
using System.Linq;

namespace LuceneSpatialExample
{
    public static class LocationRepository
    {
        private static readonly List<Location> Locations = 
          new List<Location>(new[]
            {
                new Location(1, "Buffalo", 
                            new Coordinate(45.17614, -93.87341)),
                new Location(2, "New York", 
                            new Coordinate(40.7143, -74.006)),
                new Location(3, "San Francisco", 
                            new Coordinate(37.7752, -122.4232)),
            }); 

        public static IEnumerable<Location> All()
        {
            return Locations;
        }

        public static Location Get(int id)
        {
            return Locations.FirstOrDefault(l => l.Id == id);
        }

        public static IEnumerable<Location> GetMany(params int[] ids)
        {
            return Locations.Where(l => l.Id.HasValue && 
                                  ids.Contains(l.Id.Value)).ToList();
        }
    }
}

Building a Search Engine

We know what our locations look like and we have modelled our lat/long coordinates, so the next step is to build our search engine. This is the meat of the example, so we'll split it up. To begin with we will create the shared groundwork that we need to perform a search. We create a SearchEngine class and, within it, structs that give strong names to the fields we want to index in our search documents.

namespace LuceneSpatialExample
{
    public class SearchEngine
    {
        public struct Fields
        {
            public const string Latitude = "latitude";
            public const string Longitude = "longitude";
            public const string HasGeoCode = "hasGeoCode";
            public const string Name = "name";
            public const string Id = "id";
            public const string LocationTierPrefix = "LocTierPrefix_";
        }

        public struct FieldFlags
        {
            public const string HasField = "true";
            public const string DoesNotHaveField = "false";
        }        
    }
}

This will make it easier for us to refer to fields consistently whenever we interact with them. Next up is to add the constants, fields and properties for spatial search as outlined in the previous article.

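public const double KmsToMiles = 0.621371192;
// Note: despite the "Kms" names, these two values are in miles
// (kilometres multiplied by KmsToMiles).
private const double MaxKms = 5000 * KmsToMiles;
private const double MinKms = 1 * KmsToMiles;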
private static int _startTier;
private static int _endTier;
private static Dictionary<int, CartesianTierPlotter> Plotters 
                                                  { get; set; }
private const int MaxResults = 50;
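
The search code later in this article converts between kilometres and miles in several places. A small helper can keep those conversions in one spot. This is only a sketch: the DistanceUnits class and its method names are hypothetical and are not part of the original example.

```csharp
using System;

// Hypothetical helper (not part of the original example) that keeps
// kilometre/mile conversions in one place.
public static class DistanceUnits
{
    // Same factor as SearchEngine.KmsToMiles in the article.
    public const double KmsToMiles = 0.621371192;

    // Converts a distance in kilometres to miles.
    public static double ToMiles(double kms)
    {
        return kms * KmsToMiles;
    }

    // Converts a distance in miles back to kilometres.
    public static double ToKms(double miles)
    {
        return miles / KmsToMiles;
    }
}
```

For example, DistanceUnits.ToMiles(5000) gives roughly 3106.86, and DistanceUnits.ToKms undoes the conversion when reporting distances back to the caller.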

We also add in some instrumentation so we can tell that the engine has initialised and to get basic timing data out of it. We will also create a single shared IndexSearcher property, a Directory for it to interact with and a utility property for tracking the version of Lucene we are working against.

private static IndexSearcher _searcher;

private static readonly object SyncRoot = new object();

private static bool Initialised { get; set; }

public static long TimeTakenToInitialiseMs { get; set; }

public static int TotalIndexedDocuments
{
    get { return !Initialised ? 0 : Searcher.MaxDoc(); }
}

protected static Directory IndexDirectory { get; private set; }

protected static IndexSearcher Searcher
{
    get
    {
        if (_searcher == null)
        {
            lock (SyncRoot)
            {
                Thread.MemoryBarrier();
                if (_searcher == null)
                {
                    _searcher = new IndexSearcher(IndexDirectory, true);
                }
            }
        }

        return _searcher;
    }
}

public static Version Version
{
    get { return Version.LUCENE_29; }
}
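
The Searcher property uses double-checked locking: a cheap null check outside the lock, then a second check inside it so that only one thread creates the instance. The sketch below isolates that pattern in a hypothetical ReloadableLazy<T> class, which is not part of the original example or of Lucene.Net. It assumes a volatile field is an acceptable substitute for the explicit Thread.MemoryBarrier call, so treat it as one possible shape rather than a drop-in replacement.

```csharp
using System;

// Hypothetical stand-in for the Searcher property: creates the value on
// first access and lets a caller swap in a fresh instance later.
public sealed class ReloadableLazy<T> where T : class
{
    private readonly object _sync = new object();
    private readonly Func<T> _factory;
    private volatile T _value;

    public ReloadableLazy(Func<T> factory)
    {
        if (factory == null)
        {
            throw new ArgumentNullException("factory");
        }

        _factory = factory;
    }

    public T Value
    {
        get
        {
            // First check, outside the lock, keeps the common path cheap.
            var current = _value;
            if (current != null)
            {
                return current;
            }

            lock (_sync)
            {
                // Second check, inside the lock, so only one thread creates it.
                if (_value == null)
                {
                    _value = _factory();
                }

                return _value;
            }
        }
    }

    // Replaces the cached instance and disposes the old one if it is disposable.
    public void Reload()
    {
        lock (_sync)
        {
            var old = _value as IDisposable;
            _value = _factory();
            if (old != null)
            {
                old.Dispose();
            }
        }
    }
}
```

Repeated reads of Value return the same instance until Reload is called, which is the behaviour the article's Searcher property relies on.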

With the basic data structures prepared we can move to using the engine itself. We add an Initialise method which will retrieve all of the locations in our repository and index them as well as preparing all of the plotters we use for spatial search.

public static void Initialise()
{
    Initialised = false;
    lock (SyncRoot)
    {
        if (IndexDirectory != null)
        {
            IndexDirectory.Dispose();
        }

        Thread.MemoryBarrier();
        IndexDirectory = new RAMDirectory();
        TimeTakenToInitialiseMs = 0;

        var sw = new Stopwatch();
        sw.Start();

        IProjector projector = new SinusoidalProjector();
        var ctp = new CartesianTierPlotter(0, projector,
                                           Fields.LocationTierPrefix);
        _startTier = ctp.BestFit(MaxKms);
        _endTier = ctp.BestFit(MinKms);

        Plotters = new Dictionary<int, CartesianTierPlotter>();
        for (var tier = _startTier; tier <= _endTier; tier++)
        {
            Plotters.Add(tier, new CartesianTierPlotter(tier, 
                            projector, Fields.LocationTierPrefix));
        }
        
        IndexMany(LocationRepository.All());
        sw.Stop();
        TimeTakenToInitialiseMs = sw.ElapsedMilliseconds;
        Initialised = true;
    }
}

The key method call in the initialisation is IndexMany. This takes a collection of Location objects and indexes each of them into our Directory, and it is the next method we will implement. It is composed of a few discrete steps. First we create an Analyzer, then an IndexWriter. We then iterate over the locations, indexing each one. The indexing itself stores the ID of the location and whether the location has any geocode data; if it does, it also stores the lat/long and plots the location onto our cartesian tiers. In total this looks like:

public static void IndexMany(IEnumerable<Location> locations)
{
    if (locations == null)
    {
        return;
    }

    // Ignore indexing if there are no locations to index
    var locsToIndex = locations as List<Location> ?? 
                                              locations.ToList();
    if (!locsToIndex.Any())
    {
        return;
    }

    // Ignore indexes when initialisation is not under way
    if (IndexDirectory == null)
    {
        return;
    }

    var analyzer = CreateAnalyzer();
    using (var writer = new IndexWriter(IndexDirectory, analyzer, 
                                IndexWriter.MaxFieldLength.UNLIMITED))
    {
        foreach (var p in locsToIndex)
        {
            Index(p, writer);
        }

        writer.Commit();
        analyzer.Close();
        // No explicit writer.Dispose() is needed: the using block disposes it.
    }

    ReloadIndexesIntoSearcher();
}


public static Analyzer CreateAnalyzer()
{
    return new StandardAnalyzer(Version);
}

protected static void Index(Location location, IndexWriter writer)
{
    if (location == null || location.Id == null)
    {
        return;
    }

    // Step 1 - Purge the location from Lucene if it already exists
    var query = NumericRangeQuery.NewIntRange(Fields.Id, location.Id, 
                                            location.Id, true, true);
    writer.DeleteDocuments(query);

    // Step 2 - Index the fields we want to search on
    var doc = new Document();
    var nm = new NumericField(Fields.Id, Field.Store.YES, true);
    nm.SetIntValue(location.Id.Value);
    doc.Add(nm);

    doc.Add(new Field(Fields.Name, location.Name, Field.Store.NO, 
                                              Field.Index.ANALYZED));
    if (location.Coordinates != null)
    {
        doc.Add(new Field(Fields.HasGeoCode, FieldFlags.HasField, 
                            Field.Store.NO, Field.Index.NOT_ANALYZED));
        doc.Add(new Field(Fields.Latitude, NumericUtils
                  .DoubleToPrefixCoded(location.Coordinates.Latitude), 
                  Field.Store.YES, Field.Index.NOT_ANALYZED));
         
        doc.Add(new Field(Fields.Longitude, NumericUtils
                  .DoubleToPrefixCoded(location.Coordinates.Longitude), 
                  Field.Store.YES, Field.Index.NOT_ANALYZED));
                  
        AddCartesianTiers(location.Coordinates, doc);
    }
    
    // Step 3 - Add the location document into the search index
    writer.AddDocument(doc);
}

/// <summary>
/// Reloads the indexes into the searcher. By default the searcher takes
/// a snapshot of indexes at the time of its creation, so we need to reload
/// those indexes into the searcher after new documents are added or old
/// ones deleted in order for them to appear in future search results.
/// </summary>
protected static void ReloadIndexesIntoSearcher() 
{ 
  lock (SyncRoot) 
  { 
    if (_searcher != null) 
    { 
      _searcher.Dispose(); 
    } 
    
    Thread.MemoryBarrier(); 
    _searcher = new IndexSearcher(IndexDirectory, true); 
  } 
} 

private static void AddCartesianTiers(Coordinate cord, Document document) 
{ 
  for (var tier = _startTier; tier <= _endTier; tier++) 
  { 
    var ctp = Plotters[tier]; 
    var boxId = ctp.GetTierBoxId(cord.Latitude, cord.Longitude); 
    document.Add(
      new Field(
        ctp.GetTierFieldName(), 
        NumericUtils.DoubleToPrefixCoded(boxId), 
        Field.Store.YES, 
        Field.Index.NOT_ANALYZED_NO_NORMS)); 
  } 
}

Finally, we need a method to search our index and return results. I like to have some control over how the results are returned, so I will create a SearchResult class that contains the information about a result that I consider important.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace LuceneSpatialExample
{
    public class SearchResult
    {
        public double Score { get; set; }
        public int LocationId { get; set; }
        public double DistanceInKms { get; set; }

        public SearchResult()
        {
            
        }

        public SearchResult(double score, int id, double distanceInKms)
        {
            Score = score;
            LocationId = id;
            DistanceInKms = distanceInKms;
        }
    }
}

 

Next we create the Search method in our search engine, which queries our index and returns some meaningful results. We apply the same techniques as in the previous article. One refinement is that we access the LatLongDistanceFilter at result time to get the actual distance of each result from the starting location.

public static IEnumerable<SearchResult> Search(
                      Coordinate centrePoint, double searchRadiusInKms)
{
    if (centrePoint == null)
    {
        throw new ArgumentNullException("centrePoint");
    }

    if (!Initialised)
    {
        lock (SyncRoot)
        {
            Thread.MemoryBarrier();
            if (!Initialised)
            {
                Initialise();
            }
        }
    }

    var builder = new CartesianPolyFilterBuilder(
                                          Fields.LocationTierPrefix);
                                          
    var boundingArea = builder.GetBoundingArea( centrePoint.Latitude, 
                centrePoint.Longitude, searchRadiusInKms * KmsToMiles);
                
    var distFilter = new LatLongDistanceFilter(boundingArea,
                                        searchRadiusInKms * KmsToMiles,
                                        centrePoint.Latitude,
                                        centrePoint.Longitude,
                                        Fields.Latitude,
                                        Fields.Longitude);

    var masterQuery = new BooleanQuery();

    masterQuery.Add(new TermQuery(new Term(Fields.HasGeoCode, 
                    FieldFlags.HasField)), BooleanClause.Occur.MUST);
    
    masterQuery.Add(new ConstantScoreQuery(distFilter), 
                                          BooleanClause.Occur.MUST);

    var results = Searcher.Search(masterQuery, null, MaxResults);
    return results.ScoreDocs.Select(sd => new SearchResult(sd.score, 
         int.Parse(Searcher.Doc(sd.doc).GetField(Fields.Id)
            .StringValue()),
         distFilter.GetDistance(sd.doc) / KmsToMiles)) // Convert to KMs
         .ToList();
}
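
Because the distance filter is wrapped in a ConstantScoreQuery, the scores are unlikely to tell you which result is closest. If you want nearest-first output, one option is to sort on the distance that Search already returns. The sketch below assumes a trimmed copy of SearchResult so that it compiles on its own, and a hypothetical ResultOrdering helper that does not appear in the original code.

```csharp
using System.Collections.Generic;
using System.Linq;

// Trimmed copy of the article's SearchResult so the sketch compiles on its own.
public class SearchResult
{
    public double Score { get; set; }
    public int LocationId { get; set; }
    public double DistanceInKms { get; set; }
}

// Hypothetical helper, not part of the original example.
public static class ResultOrdering
{
    // Returns the results sorted nearest first, breaking ties on LocationId
    // so the order is the same between runs.
    public static IList<SearchResult> NearestFirst(
        IEnumerable<SearchResult> results)
    {
        if (results == null)
        {
            return new List<SearchResult>();
        }

        return results
            .OrderBy(r => r.DistanceInKms)
            .ThenBy(r => r.LocationId)
            .ToList();
    }
}
```

In the example application this could be applied to the return value of SearchEngine.Search before the results are printed.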

 

Putting it all Together

With our search engine built, we're ready to search! We will update the Main method in our console application to perform some simple searches and dump the results to the screen. These could easily be written as unit tests, but in the interests of getting some answers into your hands with the minimum of dependencies we're going to use good old-fashioned Console.WriteLine.

using System;
using System.Collections.Generic;
using System.Linq;

namespace LuceneSpatialExample
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Started - Initialising.");
            SearchEngine.Initialise();
            Console.WriteLine(
                "Initialised, took {0}ms to index {1} documents",
                SearchEngine.TimeTakenToInitialiseMs,
                SearchEngine.TotalIndexedDocuments);
            Console.WriteLine();
            
            Console.WriteLine("Searching for all areas within 4KM of " +
                "45.15,-93.85 (Expect 1 - Buffalo)");
            DumpResults(
                SearchEngine.Search(new Coordinate(45.15, -93.85), 4));

            Console.WriteLine("Searching for all areas within 2KM of " +
                "45.15,-93.85 (Expect 0)");
            DumpResults(
                SearchEngine.Search(new Coordinate(45.15, -93.85), 2));
            
            Console.WriteLine("Searching for all areas within 2000KM of " +
                "45.15,-93.85 (Expect 2 - Buffalo & New York)");
            DumpResults(
              SearchEngine.Search(new Coordinate(45.15, -93.85), 2000));
            
            Console.WriteLine("Searching for all areas within 4000KM of " +
                "45.15,-93.85 (Expect 3 - Buffalo, NY & San Fran)");
            DumpResults(
              SearchEngine.Search(new Coordinate(45.15, -93.85), 4000));
            
            Console.WriteLine();
            Console.WriteLine("Finished - Press any key to exit");
            Console.ReadKey();

        }

        private static void DumpResults(IEnumerable<SearchResult> 
                                                            results)
        {
            Console.WriteLine();
            Console.WriteLine("-----------------------------------");
            var searchResults = results as IList<SearchResult> 
                                    ?? results.ToList();
            Console.WriteLine("Found {0} results", 
                                searchResults.Count());
            foreach (var res in searchResults)
            {
                var loc = LocationRepository.Get(res.LocationId);
                Console.WriteLine(
                    "  -> {0} ({1}) was {2:F2}kms from search centre",
                    loc.Name, loc.Id, res.DistanceInKms);
            }
            Console.WriteLine();
        }
    }
}
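
If you want to sanity-check the expected result counts without running Lucene at all, the distances can be estimated by hand. The sketch below uses the haversine formula on a spherical Earth with an assumed mean radius of 6371km. The GeoCheck class is hypothetical and not part of the original example, and the figures it produces may differ slightly from those reported by LatLongDistanceFilter.

```csharp
using System;

// Hypothetical sanity check, not part of the original example. It estimates
// great-circle distance with the haversine formula on a spherical Earth.
public static class GeoCheck
{
    // Mean Earth radius in kilometres (an assumption of this sketch).
    private const double EarthRadiusKms = 6371.0;

    public static double HaversineKms(double lat1, double lon1,
                                      double lat2, double lon2)
    {
        var dLat = ToRadians(lat2 - lat1);
        var dLon = ToRadians(lon2 - lon1);

        var a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2) +
                Math.Cos(ToRadians(lat1)) * Math.Cos(ToRadians(lat2)) *
                Math.Sin(dLon / 2) * Math.Sin(dLon / 2);

        return 2 * EarthRadiusKms * Math.Asin(Math.Sqrt(a));
    }

    private static double ToRadians(double degrees)
    {
        return degrees * Math.PI / 180.0;
    }
}
```

From the search centre (45.15, -93.85) this gives about 3.4km to Buffalo, about 1,700km to New York and about 2,500km to San Francisco, which is consistent with the expected counts in the Main method above.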

 

Conclusion

Hopefully this has helped you understand how Lucene spatial search can be applied with very little effort. We've covered all of the basics, from modelling a location through to indexing and searching, so much of this is equally applicable to writing any search infrastructure in Lucene.

Full source for this example can be pulled from: https://leapinggorilla.kilnhg.com/Code/Open-Source/. Click the down arrow next to the Lucene-SpatialExample repository and select "Archive".

 

Alternatively you can access a ZIP archive of the code from: http://media.leapinggorilla.com/Files/Lucene-SpatialExample.zip

C# , Lucene , Search
