Lucene.Net's Core Indexing and Search Classes

Lucene exposes a small set of classes that wrap a rich set of functionality behind a fairly straightforward interface for indexing and search operations. Understanding the role each of these classes plays is key to leveraging and extending Lucene effectively. A typical application involves about five or six indexing and search classes.

Core Indexing Classes

The following classes participate in the indexing process:

  1. Field: the building block of a Document. It has a name, a value and a series of options that control how the value is stored and treated during the indexing process. Several derived classes provide specific behaviour, e.g. TextField, StringField and FloatField (named SingleField in Lucene.Net 4.8).
  2. Document: a representation of the basic unit of indexing and search, such as an email message or a web page, that needs to be made retrievable for future use. It consists of a collection of Field objects.
  3. Analyzer: responsible for extracting meaningful terms from the provided text to build the index.
  4. IndexWriter: the central component that can open an existing index or create a new one, and add, update or delete Document objects in it.
  5. Directory: an abstract class that represents the location where the Lucene index is stored.

The following diagram illustrates the roles of these classes in the indexing process.

Classes involved in the indexing process

Core Search Classes

A basic search involves the following classes:

  1. Term: a pair of field name and field value that is the basic unit for searching (similar to Field). Note that Term objects are also created during indexing, but they are usually hidden by Lucene's internal mechanics.
  2. Query: an abstract class that is used to probe the index to find matching documents. Lucene provides a number of concrete implementations that cater for specific use cases.
  3. QueryParser: generates a Query from the provided text using a specified Analyzer. While it is possible to create instances of Query subclasses directly, QueryParser is often a convenient alternative.
  4. IndexReader: an abstract class that provides access to an index.
  5. IndexSearcher: a lightweight wrapper around an IndexReader that provides search functionality. Opening an IndexReader is relatively expensive, whereas an IndexSearcher has less overhead, and multiple instances can reuse the same underlying IndexReader.
  6. TopDocs: a list of pointers (document IDs) to the matching documents that make up the search result. The client application loops over the TopDocs to load each Document and build the desired output.

The following diagram illustrates the roles of these classes during search.

Classes involved during search

A Simple Indexing and Search Application

The following sections review a simple Lucene.Net movie search application (source code here) to demonstrate the structure of a search application:

  • Create a Lucene.Net index from a list of movies on startup
  • Capture user input as plain text or as a query in Lucene syntax
  • Find documents that match the user input in the specified field
  • List all the movies in the search results along with their relevance score

Generating the Movie Index

The movie index is generated by creating a Lucene.Net document for each movie in the list and adding it to the index using IndexWriter's AddDocument method. Each document contains a set of fields, each with a name, a value and options. Several constructors are available for the Field classes; the one used in this example specifies the storing option. The storing option (Store.YES or Store.NO) determines whether the original value is stored in the index so that it can be retrieved with the search results.

Two types of fields, StringField and TextField, are used. StringField values are not analyzed: the whole value is indexed as a single term. TextField values, on the other hand, are analyzed (i.e. broken into separate tokens).
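The difference between the two field types can be observed with a small experiment. The following is a minimal sketch rather than part of the demo application: it assumes the Lucene.Net 4.8 and Lucene.Net.Analysis.Common packages, uses an in-memory RAMDirectory, and introduces a hypothetical "title_exact" field purely for illustration.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

using var directory = new RAMDirectory();
using var analyzer = new StandardAnalyzer(MatchVersion);

using (var writer = new IndexWriter(directory, new IndexWriterConfig(MatchVersion, analyzer)))
{
    writer.AddDocument(new Document
    {
        // Analyzed: indexed as the lowercase tokens "matrix" and "reloaded"
        // ("the" is removed by StandardAnalyzer's default stop word list).
        new TextField("title", "The Matrix Reloaded", Field.Store.YES),
        // Not analyzed: indexed as the single term "The Matrix Reloaded".
        // "title_exact" is a hypothetical field name used only in this sketch.
        new StringField("title_exact", "The Matrix Reloaded", Field.Store.YES)
    });
    writer.Commit();
}

using var reader = DirectoryReader.Open(directory);
var searcher = new IndexSearcher(reader);

// Counts documents whose field contains exactly the given term (no analysis applied).
int Count(string field, string value) =>
    searcher.Search(new TermQuery(new Term(field, value)), 1).TotalHits;

int analyzedToken = Count("title", "matrix");
int analyzedWhole = Count("title", "The Matrix Reloaded");
int exactToken = Count("title_exact", "matrix");
int exactWhole = Count("title_exact", "The Matrix Reloaded");

Console.WriteLine($"TextField   token: {analyzedToken}, whole value: {analyzedWhole}");
Console.WriteLine($"StringField token: {exactToken}, whole value: {exactWhole}");
```

The TextField should match on the individual token only, while the StringField should match on the complete, case-sensitive value only. This is why the demo stores the year as a StringField and the free-text values as TextField.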

The steps to generate the index are:

  1. Create a Directory
  2. Create an Analyzer - choose one that suits the needs of the application
  3. Create an IndexWriter
  4. Create a Document for each source object (movie) and add appropriate Field information to it
  5. Add the Document to the index
  6. Commit the index
Index generation parts of the MovieIndex class
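using System;
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Members of the MovieIndex class
private const LuceneVersion MATCH_LUCENE_VERSION = LuceneVersion.LUCENE_48;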
private readonly IndexWriter writer;
private readonly Analyzer analyzer;
private readonly QueryParser queryParser;
private readonly SearcherManager searchManager;

public MovieIndex(string indexPath)
{            
    analyzer = SetupAnalyzer();
    queryParser = SetupQueryParser(analyzer);
    writer = new IndexWriter(FSDirectory.Open(indexPath), new IndexWriterConfig(MATCH_LUCENE_VERSION, analyzer));
    searchManager = new SearcherManager(writer, true, null);
}

private Analyzer SetupAnalyzer() => new StandardAnalyzer(MATCH_LUCENE_VERSION);

public void Build(IEnumerable<Movie> movies)
{
    if (movies == null) throw new ArgumentNullException(nameof(movies));

    foreach (var movie in movies)
        writer.AddDocument(BuildDocument(movie));

    // Commit also flushes buffered documents, so the explicit Flush is redundant but harmless.
    writer.Flush(true, true);
    writer.Commit();
}

private Document BuildDocument(Movie movie)
{
    Document doc = new Document
    {                
        new TextField("title", movie.Title, Field.Store.YES),
        new StringField("year", movie.Year.ToString(), Field.Store.YES),
        new TextField("cast", string.Join(", ", movie.Cast), Field.Store.YES), 
        new TextField("genres", string.Join(", ", movie.Genres), Field.Store.YES)
    };

    return doc;
}

Searching the Index

Searching involves creating a Query and executing it with an IndexSearcher. There are many types of queries that address specific use cases, for example TermQuery, MultiTermQuery, NumericRangeQuery and WildcardQuery. Choosing the right query type is important to get the desired results. In this demo, a QueryParser is used to generate a query from the user's input.

The steps to search the index are:

  1. Open the Directory
  2. Create an IndexSearcher using the Directory (this creates an IndexReader under the hood)
  3. Create a Query
  4. Invoke the IndexSearcher.Search method to get TopDocs as search results
  5. Load matching documents from TopDocs.ScoreDocs and create the desired output
Searching the index
private QueryParser SetupQueryParser(Analyzer analyzer) => new QueryParser(MATCH_LUCENE_VERSION, "title", analyzer);

public SearchResults Search(string queryString)
{
    int resultsPerPage = 100;
    Query query = queryParser.Parse(queryString);
    Console.WriteLine(query.ToString()); // echo the parsed query in Lucene syntax
    searchManager.MaybeRefreshBlocking();
    IndexSearcher searcher = searchManager.Acquire();

    try
    {
        TopDocs topDocs = searcher.Search(query, resultsPerPage);
        return CompileResults(searcher, topDocs);
    }
    finally
    {
        // Always release the searcher; it must not be used after this point.
        searchManager.Release(searcher);
    }
}

private SearchResults CompileResults(IndexSearcher searcher, TopDocs topDocs)
{
    SearchResults searchResults = new SearchResults() { TotalHits = topDocs.TotalHits };
    foreach (var result in topDocs.ScoreDocs)
    {
        Document document = searcher.Doc(result.Doc);
        Hit searchResult = new Hit
        {
            Title = document.GetField("title")?.GetStringValue(),
            Year = document.GetField("year")?.GetStringValue(),
            Cast = document.GetField("cast")?.GetStringValue(),
            Score = result.Score,
            Genres = document.GetField("genres")?.GetStringValue()
        };

        searchResults.Hits.Add(searchResult);
    }

    return searchResults;
}
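
The MovieIndex class obtains its searcher from a SearcherManager. The five steps listed earlier can also be followed literally with a DirectoryReader, as in the minimal sketch below. It is not taken from the demo: it assumes the Lucene.Net 4.8, Lucene.Net.Analysis.Common and Lucene.Net.QueryParser packages, uses an in-memory RAMDirectory instead of the demo's FSDirectory, and indexes two sample titles only so that there is something to search.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

// 1. Open the Directory (in-memory here, an assumption made for this sketch).
using var directory = new RAMDirectory();
using var analyzer = new StandardAnalyzer(MatchVersion);

// Populate the index with two sample movies.
using (var writer = new IndexWriter(directory, new IndexWriterConfig(MatchVersion, analyzer)))
{
    foreach (var sampleTitle in new[] { "The Matrix", "The Terminator" })
        writer.AddDocument(new Document { new TextField("title", sampleTitle, Field.Store.YES) });
    writer.Commit();
}

// 2. Create an IndexSearcher on top of an IndexReader for that Directory.
using var reader = DirectoryReader.Open(directory);
var searcher = new IndexSearcher(reader);

// 3. Create a Query; "title" is the default field, as in the demo.
var parser = new QueryParser(MatchVersion, "title", analyzer);
Query query = parser.Parse("matrix");

// 4. Run the search and ask for at most 10 hits.
TopDocs topDocs = searcher.Search(query, 10);

// 5. Load each matching Document through its document id.
var titles = new List<string>();
foreach (ScoreDoc hit in topDocs.ScoreDocs)
{
    Document doc = searcher.Doc(hit.Doc);
    string title = doc.Get("title");
    titles.Add(title);
    Console.WriteLine($"{title} (score {hit.Score})");
}
```

A SearcherManager, as used in MovieIndex, is generally the better fit when the index keeps changing while searches are running, because it takes care of refreshing and sharing the underlying reader.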

Executing Queries

Search is conducted against the "title" field by default (as specified in the QueryParser constructor). However, it is possible to specify a different field at runtime and to influence the search behaviour using Lucene query syntax. Running the application helps in understanding these concepts and how Lucene works.

Term Query

Find the whole term "matrix" in the default search field (title)
search:>matrix
query in lucene syntax => title:matrix 

(1) 1999: The Matrix
CAST: Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving, Joe Pantoliano
GENRE: Science Fiction
score: 9.881662

(2) 2003: The Matrix Reloaded
CAST: Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving
GENRE: Action, Science Fiction
score: 6.176039

(3) 2003: The Matrix Revolutions
CAST: Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving
GENRE: Science Fiction
score: 6.176039

3 results found
Find "matri". This does not return any result as the whole term is not matched.
search:>matri
query in lucene syntax => title:matri 

0 results found
Find "matrix" or "terminator" in the default field
search:>matrix terminator
query in lucene syntax => title:matrix title:terminator 

(1) 1999: The Matrix
CAST: Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving, Joe Pantoliano
GENRE: Science Fiction
score: 3.566091

(2) 1984: The Terminator
CAST: Arnold Schwarzenegger, Linda Hamilton, Michael Biehn, Lance Henriksen, Paul Winfield
GENRE: Science Fiction
score: 3.279447

(3) 2003: The Matrix Reloaded
CAST: Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving
GENRE: Action, Science Fiction
score: 2.228807

(4) 2003: The Matrix Revolutions
CAST: Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving
GENRE: Science Fiction
score: 2.228807

(5) 2009: Terminator Salvation
CAST: Christian Bale, Sam Worthington, Anton Yelchin, Moon Bloodgood, Bryce Dallas Howard, Common, Jadagrace Berry, Helena Bonham Carter, Jane Alexander
GENRE: Science Fiction
score: 2.049654

(6) 2015: Terminator Genisys
CAST: Arnold Schwarzenegger, Emilia Clarke, Jai Courtney
GENRE: Action, Adventure, Science Fiction
score: 2.049654

(7) 1991: Terminator 2: Judgment Day
CAST: Arnold Schwarzenegger, Linda Hamilton, Robert Patrick, Edward Furlong
GENRE: Science Fiction
score: 1.639724

(8) 2003: Terminator 3: Rise of the Machines
CAST: Arnold Schwarzenegger, Nick Stahl, Claire Danes, Kristanna Loken
GENRE: Action, Science Fiction
score: 1.639724

8 results found
Find "matrix" or "terminator" in the default field, this time expressed with the explicit field:value syntax and the OR operator. The result is the same.
search:>title:matrix OR title:terminator
query in lucene syntax => title:matrix title:terminator 

(1) 1999: The Matrix
CAST: Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving, Joe Pantoliano
GENRE: Science Fiction
score: 3.566091

(2) 1984: The Terminator
CAST: Arnold Schwarzenegger, Linda Hamilton, Michael Biehn, Lance Henriksen, Paul Winfield
GENRE: Science Fiction
score: 3.279447

(3) 2003: The Matrix Reloaded
CAST: Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving
GENRE: Action, Science Fiction
score: 2.228807

(4) 2003: The Matrix Revolutions
CAST: Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving
GENRE: Science Fiction
score: 2.228807

(5) 2009: Terminator Salvation
CAST: Christian Bale, Sam Worthington, Anton Yelchin, Moon Bloodgood, Bryce Dallas Howard, Common, Jadagrace Berry, Helena Bonham Carter, Jane Alexander
GENRE: Science Fiction
score: 2.049654

(6) 2015: Terminator Genisys
CAST: Arnold Schwarzenegger, Emilia Clarke, Jai Courtney
GENRE: Action, Adventure, Science Fiction
score: 2.049654

(7) 1991: Terminator 2: Judgment Day
CAST: Arnold Schwarzenegger, Linda Hamilton, Robert Patrick, Edward Furlong
GENRE: Science Fiction
score: 1.639724

(8) 2003: Terminator 3: Rise of the Machines
CAST: Arnold Schwarzenegger, Nick Stahl, Claire Danes, Kristanna Loken
GENRE: Action, Science Fiction
score: 1.639724

8 results found
Find all Kristanna's movies
search:>cast:kristanna
query in lucene syntax => cast:kristanna 

(1) 2006: Lime Salted Love
CAST: Kristanna Loken
GENRE: Drama
score: 6.036573

(2) 2013: Bounty Killer
CAST: Matthew Marsden, Kristanna Loken, Beverly D'Angelo
GENRE: Action, Comedy
score: 3.621944

(3) 2003: Terminator 3: Rise of the Machines
CAST: Arnold Schwarzenegger, Nick Stahl, Claire Danes, Kristanna Loken
GENRE: Action, Science Fiction
score: 3.018287

(4) 2014: Mercenaries
CAST: Brigitte Nielsen, Tim Abell, Cynthia Rothrock, Kristanna Loken
GENRE: Action
score: 3.018287

4 results found
Find all of Kristanna's movies from 2003 (the AND operator is case sensitive and must be written in uppercase)
search:>cast:kristanna AND year:2003
query in lucene syntax => +cast:kristanna +year:2003 

(1) 2003: Terminator 3: Rise of the Machines
CAST: Arnold Schwarzenegger, Nick Stahl, Claire Danes, Kristanna Loken
GENRE: Action, Science Fiction
score: 5.624253

1 results found
Some of the queries above could also be expressed using the TermQuery class
Term term = new Term(field, value);
TermQuery query = new TermQuery(term);
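
One detail matters when building a TermQuery by hand: unlike QueryParser, it does not run the value through the Analyzer, so the term has to be supplied in its indexed form. The following minimal sketch (assuming the Lucene.Net 4.8, Lucene.Net.Analysis.Common and Lucene.Net.QueryParser packages) compares a parsed query with a hand-built one by printing both in Lucene syntax.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Util;

const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

using var analyzer = new StandardAnalyzer(MatchVersion);
var parser = new QueryParser(MatchVersion, "title", analyzer);

// The parser passes the input through the analyzer, which lowercases "Matrix".
Query parsed = parser.Parse("Matrix");

// TermQuery performs no analysis, so the value is given in lowercase already.
Query handBuilt = new TermQuery(new Term("title", "matrix"));

string parsedText = parsed.ToString();
string handBuiltText = handBuilt.ToString();

Console.WriteLine(parsedText);
Console.WriteLine(handBuiltText);
```

Both lines should read title:matrix. A TermQuery built with the value "Matrix" would look for a capitalised term that StandardAnalyzer never puts into the index.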

Wildcard Query

Find movie titles that contain a word starting with "matri" (use the asterisk to match any number of characters)
search:>matri*
query in lucene syntax => title:matri* 

(1) 1919: A Fugitive from Matrimony
CAST: H. B. Warner, Seena Owen
GENRE: Comedy
score: 1

(2) 1922: Is Matrimony a Failure?
CAST: T. Roy Barnes, Lila Lee, | Comedy
GENRE: Unknown
score: 1

(3) 1923: Modern Matrimony
CAST: Owen Moore, Alice Lake
GENRE: Comedy
score: 1

(4) 1930: The Matrimonial Bed
CAST: Frank Fay, Lilyan Tashman
GENRE: Drama, Comedy
score: 1

(5) 1943: Holy Matrimony
CAST: Monty Woolley, Gracie Fields
GENRE: Comedy
score: 1

(6) 1994: Holy Matrimony
CAST: Patricia Arquette, Joseph Gordon-Levitt, Tate Donovan, Armin Mueller-Stahl
GENRE: Comedy
score: 1

(7) 1999: The Matrix
CAST: Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving, Joe Pantoliano
GENRE: Science Fiction
score: 1

(8) 2003: The Matrix Reloaded
CAST: Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving
GENRE: Action, Science Fiction
score: 1

(9) 2003: The Matrix Revolutions
CAST: Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving
GENRE: Science Fiction
score: 1

9 results found
A question mark can also be used, but it matches a single character
search:>t?ins
query in lucene syntax => title:t?ins 

(1) 1920: Twins of Suffering Creek
CAST: William Russell, Louise Lovely
GENRE: Drama
score: 1

(2) 1975: Rafferty and the Gold Dust Twins
CAST: Sally Kellerman, Alan Arkin, Mackenzie Phillips
GENRE: Comedy
score: 1

(3) 1988: Twins
CAST: Arnold Schwarzenegger, Danny DeVito, Chloe Webb, Kelly Preston
GENRE: Comedy
score: 1

(4) 2014: The Skeleton Twins
CAST: Kristen Wiig, Bill Hader, Luke Wilson, Ty Burrell
GENRE: Comedy, Drama
score: 1

4 results found

Note that, by default, the query parser does not accept "*" or "?" as the first character of the search term.

Wildcard queries can also be expressed using the WildcardQuery class.

Using WildcardQuery class
Term term = new Term(field, value);
Query query = new WildcardQuery(term);
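
To make the snippet above concrete, the following minimal sketch runs two WildcardQuery instances against a tiny in-memory index. The four sample titles and the RAMDirectory are assumptions made for illustration (Lucene.Net 4.8 and Lucene.Net.Analysis.Common packages); the patterns are written in lowercase because WildcardQuery does not analyze its term.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

using var directory = new RAMDirectory();
using var analyzer = new StandardAnalyzer(MatchVersion);

using (var writer = new IndexWriter(directory, new IndexWriterConfig(MatchVersion, analyzer)))
{
    foreach (var title in new[] { "The Matrix", "Holy Matrimony", "Twins", "The Terminator" })
        writer.AddDocument(new Document { new TextField("title", title, Field.Store.YES) });
    writer.Commit();
}

using var reader = DirectoryReader.Open(directory);
var searcher = new IndexSearcher(reader);

// '*' matches any number of characters: "matrix" and "matrimony" both qualify.
int starHits = searcher.Search(new WildcardQuery(new Term("title", "matri*")), 10).TotalHits;

// '?' matches exactly one character: only "twins" qualifies.
int questionMarkHits = searcher.Search(new WildcardQuery(new Term("title", "t?ins")), 10).TotalHits;

Console.WriteLine($"matri*: {starHits} hits, t?ins: {questionMarkHits} hits");
```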

MAY Contain, MUST Contain, MUST NOT Contain

Any query prefixed with the plus sign (+) requires that term to exist in the specified field. Any query prefixed with the minus sign (-) excludes documents that contain the search term in the specified field.

Find all movies, MUST HAVE ehle in the cast and MUST HAVE "thirty" in the title
search:>+cast:ehle +title:thirty
query in lucene syntax => +cast:ehle +title:thirty 

(1) 2012: Zero Dark Thirty
CAST: Jessica Chastain, Jason Clarke, Joel Edgerton, Mark Strong, Chris Pratt, Kyle Chandler, Taylor Kinney, Mark Duplass, Frank Grillo, Stephen Dillane, Édgar Ramírez, Harold Perrineau, Reda Kateb, Jennifer Ehle, James Gandolfini, Scott Adkins, Mark Valley, Ricky Sekhon, John Barrowman
GENRE: Action, Thriller
score: 4.322779

1 results found
Find all movies, MUST HAVE ehle in the cast and MAY HAVE "thirty" in the title
search:>+cast:ehle title:thirty
query in lucene syntax => +cast:ehle title:thirty 

(1) 2012: Zero Dark Thirty
CAST: Jessica Chastain, Jason Clarke, Joel Edgerton, Mark Strong, Chris Pratt, Kyle Chandler, Taylor Kinney, Mark Duplass, Frank Grillo, Stephen Dillane, Édgar Ramírez, Harold Perrineau, Reda Kateb, Jennifer Ehle, James Gandolfini, Scott Adkins, Mark Valley, Ricky Sekhon, John Barrowman
GENRE: Action, Thriller
score: 4.322779

(2) 2002: Possession
CAST: Aaron Eckhart, Gwyneth Paltrow, Jeremy Northam, Jennifer Ehle
GENRE: Drama
score: 1.085872

(3) 2018: The Miseducation of Cameron Post
CAST: Chloë Grace Moretz, Sasha Lane, John Gallagher, Jr., Forrest Goodluck, Jennifer Ehle
GENRE: Drama
score: 0.8686977

(4) 2011: Contagion
CAST: Marion Cotillard, Matt Damon, Laurence Fishburne, Jude Law, Gwyneth Paltrow, Kate Winslet, Bryan Cranston, Jennifer Ehle, Sanaa Lathan, Amr Waked, John Hawkes, Demetri Martin
GENRE: Action, Thriller
score: 0.6515233

4 results found
Find all movies, MUST HAVE ehle in the cast and MUST NOT HAVE "thirty" in the title
search:>+cast:ehle -title:thirty
query in lucene syntax => +cast:ehle -title:thirty 

(1) 2002: Possession
CAST: Aaron Eckhart, Gwyneth Paltrow, Jeremy Northam, Jennifer Ehle
GENRE: Drama
score: 3.018287

(2) 2018: The Miseducation of Cameron Post
CAST: Chloë Grace Moretz, Sasha Lane, John Gallagher, Jr., Forrest Goodluck, Jennifer Ehle
GENRE: Drama
score: 2.414629

(3) 2011: Contagion
CAST: Marion Cotillard, Matt Damon, Laurence Fishburne, Jude Law, Gwyneth Paltrow, Kate Winslet, Bryan Cranston, Jennifer Ehle, Sanaa Lathan, Amr Waked, John Hawkes, Demetri Martin
GENRE: Action, Thriller
score: 1.810972

3 results found
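
The same three combinations can be built programmatically with BooleanQuery, where Occur.MUST, Occur.SHOULD and Occur.MUST_NOT correspond to the plus sign, no prefix and the minus sign. The following is a minimal sketch assuming the Lucene.Net 4.8 package; the Combine helper is a hypothetical name introduced for this example.

```csharp
using System;
using Lucene.Net.Index;
using Lucene.Net.Search;

Query castClause = new TermQuery(new Term("cast", "ehle"));
Query titleClause = new TermQuery(new Term("title", "thirty"));

// Hypothetical helper: "ehle" is always required in the cast,
// while the role of the title clause varies.
BooleanQuery Combine(Occur titleOccur)
{
    var query = new BooleanQuery();
    query.Add(castClause, Occur.MUST);
    query.Add(titleClause, titleOccur);
    return query;
}

string mustHave = Combine(Occur.MUST).ToString();        // MUST contain
string mayHave = Combine(Occur.SHOULD).ToString();       // MAY contain
string mustNotHave = Combine(Occur.MUST_NOT).ToString(); // MUST NOT contain

Console.WriteLine(mustHave);
Console.WriteLine(mayHave);
Console.WriteLine(mustNotHave);
```

The printed queries should match the "query in lucene syntax" lines shown in the three searches above.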

Range Query

Find movies released between 2010 and 2018 (square brackets make both bounds inclusive)
search:>year:[2010 TO 2018]
2043 results found

actual list of movies too long to be listed here
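
Because the year is indexed as a StringField, a range over it compares terms as text rather than as numbers. That works here because every year has four digits. The following minimal sketch builds the equivalent query with TermRangeQuery; the four sample years and the in-memory RAMDirectory are assumptions made for illustration (Lucene.Net 4.8 and Lucene.Net.Analysis.Common packages).

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

using var directory = new RAMDirectory();
using var analyzer = new StandardAnalyzer(MatchVersion);

using (var writer = new IndexWriter(directory, new IndexWriterConfig(MatchVersion, analyzer)))
{
    // The year is stored as text, as in the demo's BuildDocument method.
    foreach (var year in new[] { "2009", "2010", "2018", "2019" })
        writer.AddDocument(new Document { new StringField("year", year, Field.Store.YES) });
    writer.Commit();
}

using var reader = DirectoryReader.Open(directory);
var searcher = new IndexSearcher(reader);

// Both bounds inclusive, like the square brackets in the query syntax.
Query range = TermRangeQuery.NewStringRange("year", "2010", "2018", true, true);
int rangeHits = searcher.Search(range, 10).TotalHits;

Console.WriteLine($"{range}: {rangeHits} hits");
```

For values of varying length, or for genuinely numeric data, a numeric field combined with NumericRangeQuery is usually the safer choice, since text comparison would order "300" after "2018".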

Fuzzy Query

Lucene supports fuzzy searches. This is a powerful approximation technique that finds results which are relevant to the search term even though they do not match it exactly. For example, a fuzzy search for "goat" can also match "coat". One use case is the "did you mean" feature employed by search engines: if a user mistakenly types "Torontor", search engines like Google show "Did you mean: Toronto" along with the results.

Find movies where the title somewhat looks like "terminador" (use the tilde sign "~")
search:>terminador~
query in lucene syntax => title:terminador~2 

(1) 1984: The Terminator
CAST: Arnold Schwarzenegger, Linda Hamilton, Michael Biehn, Lance Henriksen, Paul Winfield
GENRE: Science Fiction
score: 9.476197

(2) 2009: Terminator Salvation
CAST: Christian Bale, Sam Worthington, Anton Yelchin, Moon Bloodgood, Bryce Dallas Howard, Common, Jadagrace Berry, Helena Bonham Carter, Jane Alexander
GENRE: Science Fiction
score: 5.922623

(3) 2015: Terminator Genisys
CAST: Arnold Schwarzenegger, Emilia Clarke, Jai Courtney
GENRE: Action, Adventure, Science Fiction
score: 5.922623

(4) 1991: Terminator 2: Judgment Day
CAST: Arnold Schwarzenegger, Linda Hamilton, Robert Patrick, Edward Furlong
GENRE: Science Fiction
score: 4.738099

(5) 2003: Terminator 3: Rise of the Machines
CAST: Arnold Schwarzenegger, Nick Stahl, Claire Danes, Kristanna Loken
GENRE: Action, Science Fiction
score: 4.738099

5 results found
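
The parsed query title:terminador~2 corresponds to a FuzzyQuery with a maximum edit distance of 2. The following minimal sketch builds it directly; the two sample titles and the in-memory RAMDirectory are assumptions made for illustration (Lucene.Net 4.8 and Lucene.Net.Analysis.Common packages).

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

using var directory = new RAMDirectory();
using var analyzer = new StandardAnalyzer(MatchVersion);

using (var writer = new IndexWriter(directory, new IndexWriterConfig(MatchVersion, analyzer)))
{
    foreach (var title in new[] { "The Terminator", "The Matrix" })
        writer.AddDocument(new Document { new TextField("title", title, Field.Store.YES) });
    writer.Commit();
}

using var reader = DirectoryReader.Open(directory);
var searcher = new IndexSearcher(reader);

// "terminador" is one edit away from the indexed term "terminator",
// which is within the allowed maximum of 2 edits.
Query fuzzy = new FuzzyQuery(new Term("title", "terminador"), 2);
TopDocs fuzzyDocs = searcher.Search(fuzzy, 10);

string firstTitle = fuzzyDocs.TotalHits > 0
    ? searcher.Doc(fuzzyDocs.ScoreDocs[0].Doc).Get("title")
    : null;

Console.WriteLine($"{fuzzyDocs.TotalHits} hit(s), first: {firstTitle}");
```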

Proximity Query

The tilde sign can also be used for proximity searches. For example, find movie titles with the words "wolf" and "street" within 5 words of each other.

Movie titles with the words "wolf" and "street" within 5 words of each other
search:>"wolf street"~5
query in lucene syntax => title:"wolf street"~5 

(1) 1929: The Wolf of Wall Street
CAST: George Bancroft, Olga Baclanova, Nancy Carroll
GENRE: Drama
score: 3.969877

(2) 2013: The Wolf of Wall Street
CAST: Leonardo DiCaprio, Jonah Hill, Margot Robbie, Matthew McConaughey, Kyle Chandler, Rob Reiner, Jon Favreau, Jean Dujardin
GENRE: Comedy
score: 3.969877

2 results found
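
A proximity search can be expressed in code as a PhraseQuery with a slop value. The following minimal sketch compares an exact phrase (slop 0) with a sloppy one (slop 5); the single sample title and the in-memory RAMDirectory are assumptions made for illustration (Lucene.Net 4.8 and Lucene.Net.Analysis.Common packages).

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

using var directory = new RAMDirectory();
using var analyzer = new StandardAnalyzer(MatchVersion);

using (var writer = new IndexWriter(directory, new IndexWriterConfig(MatchVersion, analyzer)))
{
    writer.AddDocument(new Document
    {
        new TextField("title", "The Wolf of Wall Street", Field.Store.YES)
    });
    writer.Commit();
}

using var reader = DirectoryReader.Open(directory);
var searcher = new IndexSearcher(reader);

// Builds the phrase "wolf street" with the given slop, i.e. the number of
// position moves allowed when matching the phrase.
PhraseQuery BuildPhrase(int slop)
{
    var phrase = new PhraseQuery { Slop = slop };
    phrase.Add(new Term("title", "wolf"));
    phrase.Add(new Term("title", "street"));
    return phrase;
}

int exactHits = searcher.Search(BuildPhrase(0), 10).TotalHits;  // words must be adjacent
int sloppyHits = searcher.Search(BuildPhrase(5), 10).TotalHits; // words may be up to 5 moves apart

Console.WriteLine($"exact: {exactHits}, within 5: {sloppyHits}");
```

The exact phrase should find nothing because "of" and "wall" sit between the two words, while the sloppy phrase should match the title.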