Need Quality Code? Get Silver Backed

Spatial Search in Lucene.Net

28thJan

0

by Gary H

Edit: 15th April 2013

A worked example for spatial search has been added to our site.


The core Lucene engine provides us with many powerful tools for searching text and numbers with the ability to expand this over dates. Geographical or spatial search however is a different beast and does not easily fit into Lucene's structure. To extend Lucene to cope with this we need to look outside of the core Lucene engine and turn to the contrib libraries - extensions provided by other developers that extend the main Lucene experience.

To start with spatial search we need to add the contrib libraries to our project. The simplest way to do this is to use NuGet to pull the Lucene.Net.Contrib package. However the latest build as of this post (3.0.3) does not support spatial search. Instead you must use the prior version (2.9.4.1) which also requires you to remain on the same version of Lucene.Net. Start by pulling down these packages:

Install-Package Lucene.Net -Version 2.9.4.1
Install-Package Lucene.Net.Contrib -Version 2.9.4.1

Indexing

With the packages added we can start adding the infrastructure we need for spatial search. Begin by adding a field each for your latitude and longitude data. At this point you may also want to add another optional field that will flag whether the document has any geolocation data at all. This will allow you to pre-filter on that if not all of your documents will contain a latitude and longitude.

doc.Add(new Field("Latitude", 
				  NumericUtils.DoubleToPrefixCoded(Latitude), 
				  Field.Store.YES, Field.Index.NOT_ANALYZED));
				  
doc.Add(new Field("Longitude", 
				  NumericUtils.DoubleToPrefixCoded(Longitude), 
				  Field.Store.YES, Field.Index.NOT_ANALYZED));

Next stage is to start tapping into the Spatial contrib framework. The idea behind the framework is that you take a projection of the globe (like unrolling it into a flat map) and divide it into a large grid. Each grid square is further subdivided into another grid which is itself divided and so on as you move down a series of tiers of grids. This gives us a great way to search, we start with a very coarse overhead view and gradually refine down further and further to the individual grid square that contains your document. An overview (with pictures!) can be found at the Java Community blog that explains this in more detail.

Because our spatial search makes use of these tiers we need to both initialise them and then use them as we plot our documents into the grid. We begin with initialisation, I choose to maintain a Dictionary of plotters keyed by tier and refer to them as and when necessary. This saves the overhead of newing up plotters in the loop as we add our documents. It is also important to note that the Spatial framework deals internally exclusively in Miles whereas I need to work in Kilometres. This is taken account of in the configuration:

public const double KmsToMiles = 0.621371192;
private const double MaxKms = 250 * KmsToMiles;
private const double MinKms = 1 * KmsToMiles;
private static int StartTier;
private static int EndTier;

private static Dictionary<int, CartesianTierPlotter> Plotters { get; set; }

On first access I initialise my plotters. In this sample we refer to Fields.LocationTierPrefix, this is the string prefix that will automatically be applied by all plotters across all indexed fields at every level. In the example this is a const exposed via a struct set to "LocationTierPrefix_".

IProjector projector = new SinusoidalProjector();
var ctp = new CartesianTierPlotter(0, projector, 
								   Fields.LocationTierPrefix);
StartTier = ctp.BestFit(MaxKms);
EndTier = ctp.BestFit(MinKms);

Plotters = new Dictionary<int, CartesianTierPlotter>();
for (var tier = StartTier; tier <= EndTier; tier++)
{
	Plotters.Add(tier, new CartesianTierPlotter(tier, 
											projector, 
											Fields.LocationTierPrefix));
}

With our plotters prepared we can now start indexing our geolocation data. We need to make sure we plot our document onto every grid where it appears through the tiers. To do this we need to walk the entire tier stack and add our document to each level. We bundle this into a function to make it easier to handle:

private static void AddCartesianTiers(double latitude, 
									  double longitude, 
									  Document document)
{
	for (var tier = StartTier; tier <= EndTier; tier++)
	{
		var ctp = Plotters[tier];
		var boxId = ctp.GetTierBoxId(latitude, longitude);
		document.Add(new Field(ctp.GetTierFieldName(),
						NumericUtils.DoubleToPrefixCoded(boxId),
						Field.Store.YES,
						Field.Index.NOT_ANALYZED_NO_NORMS));
	}
}

And the final step to indexing our document is to add the tiers to our document:

AddCartesianTiers(Latitude, Longitude, doc);

Searching

With the geolocation data indexed we now move onto searching our documents. To do this we create a Filter that understands the tiers we are using, further filter those coarse results with a precise distance filter and then pass the whole lot through a constant score query for use in our searches. This gives us the following code for Query creation:

/* 	Builder allows us to build a polygon which we will use to limit  
 * search scope on our cartesian tiers, this is like putting a grid 
 * over a map */
var builder = new CartesianPolyFilterBuilder(Fields.LocationTierPrefix);

/* 	Bounding area draws the polygon, this can be thought of as working  
 * out which squares of the grid over a map to search */
var boundingArea = builder.GetBoundingArea(Latitude, 
				Longitude, 
				DistanceInKilometres * ProductSearchEngine.KmsToMiles);

/*	We refine, this is the equivalent of drawing a circle on the map,  
 *	within our grid squares, ignoring the parts the squares we are  
 *  searching that aren't within the circle - ignoring extraneous corners 
 *	and such */
var distFilter = new LatLongDistanceFilter(boundingArea, 
									DistanceInKilometres * KmsToMiles,
		                            Latitude, 
									Longitude, 
									ProductSearchEngine.Fields.Latitude,
									ProductSearchEngine.Fields.Longitude);

/* 	We add a query stating we will only search against products that have 
 * GeoCode information */
var query = new TermQuery(new Term(Fields.HasGeoCode, 
								   FieldFlags.HasField));
								   
/* 	Add our filter, this will stream through our results and 
 * determine eligibility */
masterQuery.Add(new ConstantScoreQuery(distanceFilter), 
				BooleanClause.Occur.MUST);

We are now free to use the query however we see fit.

C# , Lucene , Search

Comments are Locked for this Post