3a5abi 🥷

Posted on • Updated on • Originally published at devtoys.io

Advanced Text Search Mastery with Apache Lucene: A Full Guide

Apache Lucene is an open-source Java search library known for its advanced text search capabilities. Developers, data analysts, and SEO professionals use its query syntax to build precise and complex search queries. This guide walks through that syntax, from single-term searches to Boolean operators and range searches, so you can make full use of Lucene in your projects.

Understanding Lucene Query Syntax: Simple vs. Full

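Lucene Query Syntax is commonly described in two flavors: Simple and Full. Both produce powerful search queries but differ in complexity and capability. The examples in this guide all use Lucene's classic QueryParser; what changes between the two flavors is how much of the syntax a query relies on.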

Simple Lucene Query Syntax

Purpose: Designed for ease of use and quick setup.
Capabilities: Supports basic text searches, including single and multiple term searches, as well as wildcard and fuzzy searches.
Limitations: Lacks the advanced features and precision of Full Lucene Query Syntax.

Usage Scenario: Ideal for straightforward search requirements where speed and simplicity are prioritized.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class SimpleLuceneExample {
    public static void main(String[] args) throws Exception {
        // Setup: create an in-memory index.
        // ByteBuffersDirectory replaces RAMDirectory, which was deprecated
        // in Lucene 8 and removed in Lucene 9.
        StandardAnalyzer analyzer = new StandardAnalyzer();
        Directory index = new ByteBuffersDirectory();

        // Add documents to the index (omitted for brevity).
        // DirectoryReader.open fails on a directory that has never been
        // written to, so this step is required before searching.

        // Simple search example: a single term in the "content" field
        String queryStr = "apple";
        Query query = new QueryParser("content", analyzer).parse(queryStr);

        // Search the index and keep the 10 best-scoring hits
        IndexReader reader = DirectoryReader.open(index);
        IndexSearcher searcher = new IndexSearcher(reader);
        TopDocs results = searcher.search(query, 10);

        // Display search results
        for (ScoreDoc hit : results.scoreDocs) {
            Document doc = searcher.doc(hit.doc);
            System.out.println(doc.get("content"));
        }

        reader.close();
        index.close();
    }
}

Full Lucene Query Syntax

Purpose: Offers an extensive set of features for complex and precise search queries.
Capabilities: Includes all features of Simple Lucene Query Syntax, plus advanced options like Boolean operators, range searches, boosting terms, proximity searches, and field-specific queries.
Usage Scenario: Best suited for complex search requirements that demand a high degree of precision and customization.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class FullLuceneExample {
    public static void main(String[] args) throws Exception {
        // Setup: create an in-memory index.
        // ByteBuffersDirectory replaces RAMDirectory, which was deprecated
        // in Lucene 8 and removed in Lucene 9.
        StandardAnalyzer analyzer = new StandardAnalyzer();
        Directory index = new ByteBuffersDirectory();

        // Add documents to the index (omitted for brevity).
        // DirectoryReader.open fails on a directory that has never been
        // written to, so this step is required before searching.

        // Full query example: parse two terms separately
        String queryStr1 = "apple";
        String queryStr2 = "banana";
        Query query1 = new QueryParser("content", analyzer).parse(queryStr1);
        Query query2 = new QueryParser("content", analyzer).parse(queryStr2);

        // Combine them: "apple" is required, "banana" is excluded.
        // This matches the same documents as the query string "apple NOT banana".
        BooleanQuery.Builder booleanQuery = new BooleanQuery.Builder();
        booleanQuery.add(query1, BooleanClause.Occur.MUST);
        booleanQuery.add(query2, BooleanClause.Occur.MUST_NOT);

        // Search the index and keep the 10 best-scoring hits
        IndexReader reader = DirectoryReader.open(index);
        IndexSearcher searcher = new IndexSearcher(reader);
        TopDocs results = searcher.search(booleanQuery.build(), 10);

        // Display search results
        for (ScoreDoc hit : results.scoreDocs) {
            Document doc = searcher.doc(hit.doc);
            System.out.println(doc.get("content"));
        }

        reader.close();
        index.close();
    }
}

Grasping the Basics of Lucene Query Syntax

Single and Multiple Term Searches

Single Term Search

Pass the bare term as the query string. For instance, apple fetches all documents whose searched field contains “apple”.

String queryStr = "apple";
Query query = new QueryParser("content", analyzer).parse(queryStr);

Multiple Term Search

Separating terms with a space, as in apple banana, retrieves documents with either “apple”, “banana”, or both, because the classic QueryParser combines terms with OR by default.

String queryStr = "apple banana";
Query query = new QueryParser("content", analyzer).parse(queryStr);

Phrase Searches

To find an exact phrase, enclose it in double quotes: "apple pie".

String queryStr = "\"apple pie\"";
Query query = new QueryParser("content", analyzer).parse(queryStr);
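
To make the difference between a phrase search and a plain multi-term search concrete, here is a minimal plain-Java sketch. It does not use Lucene at all: the class PhraseSketch and its containsPhrase helper are hypothetical names for illustration, and the whitespace tokenization is a simplifying assumption standing in for a real analyzer. It only shows the idea that a phrase matches when its terms appear next to each other and in order.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PhraseSketch {
    // Returns true when the phrase's tokens occur as one contiguous,
    // in-order run inside the text's tokens (assumption: whitespace tokenization).
    static boolean containsPhrase(String text, String phrase) {
        List<String> tokens = Arrays.asList(text.toLowerCase().split("\\s+"));
        List<String> wanted = Arrays.asList(phrase.toLowerCase().split("\\s+"));
        return Collections.indexOfSubList(tokens, wanted) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(containsPhrase("warm apple pie", "apple pie"));       // true
        System.out.println(containsPhrase("apple and cherry pie", "apple pie")); // false: terms not adjacent
        System.out.println(containsPhrase("pie with apple", "apple pie"));       // false: wrong order
    }
}
```

The last two texts would still match the multi-term query apple pie, since both terms are present; only the quoted phrase rules them out.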

Wildcard Searches

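Use * to match any number of characters (including none) and ? to match exactly one character: appl*, app?e. By default the classic QueryParser rejects a wildcard as the first character of a term.

// * wildcard: matches apple, apply, application, ...
String multiCharQueryStr = "appl*";
Query multiCharQuery = new QueryParser("content", analyzer).parse(multiCharQueryStr);

// ? wildcard: matches any five-letter term of the form app_e, such as apple
String singleCharQueryStr = "app?e";
Query singleCharQuery = new QueryParser("content", analyzer).parse(singleCharQueryStr);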
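
If the two wildcards are unfamiliar, it can help to see them expressed as regular expressions. The sketch below is plain Java, not Lucene's implementation: WildcardSketch, toRegex, and matching are hypothetical names, and the term list is made up. It translates * to "any run of characters" and ? to "exactly one character" and filters a list of terms.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class WildcardSketch {
    // Translate a pattern using * and ? into an equivalent regular expression.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");   // any number of characters, including none
            } else if (c == '?') {
                regex.append('.');    // exactly one character
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Keep only the terms that match the whole wildcard pattern.
    static List<String> matching(String wildcard, List<String> terms) {
        Pattern pattern = toRegex(wildcard);
        return terms.stream()
                .filter(term -> pattern.matcher(term).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> terms = List.of("apple", "apply", "applet", "ample", "appie");
        System.out.println(matching("appl*", terms)); // [apple, apply, applet]
        System.out.println(matching("app?e", terms)); // [apple, appie]
    }
}
```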

Fuzzy Searches

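Append ~ to a term for a fuzzy search: apple~ matches terms within a small edit distance of “apple” (up to 2 edits by default), such as “aple” or “apply”. A number after the tilde, as in apple~1, sets the maximum edit distance explicitly.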

String queryStr = "apple~";
Query query = new QueryParser("content", analyzer).parse(queryStr);
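
Fuzzy matching rests on edit distance: the number of single-character insertions, deletions, or substitutions needed to turn one term into another. The following is a minimal plain-Java sketch of that idea using the standard Levenshtein algorithm. It is not how Lucene evaluates fuzzy queries internally, and the names EditDistanceSketch, levenshtein, and fuzzyMatch are hypothetical.

```java
public class EditDistanceSketch {
    // Classic dynamic-programming Levenshtein distance, keeping two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                        prev[j - 1] + cost);                    // substitution or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // A term is a fuzzy match when it is within maxEdits of the query term.
    static boolean fuzzyMatch(String query, String term, int maxEdits) {
        return levenshtein(query, term) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("apple", "aple"));      // 1 (one deletion)
        System.out.println(levenshtein("apple", "apply"));     // 1 (one substitution)
        System.out.println(fuzzyMatch("apple", "banana", 2));  // false
    }
}
```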

Mastering Boolean Operators

Boolean operators combine terms into more complex search queries. They must be written in uppercase; lowercase and, or, and not are not recognized as operators.

AND: apple AND banana returns documents with both “apple” and “banana”.

String andQueryStr = "apple AND banana";
Query andQuery = new QueryParser("content", analyzer).parse(andQueryStr);

OR: apple OR banana retrieves documents with either “apple” or “banana”.

String orQueryStr = "apple OR banana";
Query orQuery = new QueryParser("content", analyzer).parse(orQueryStr);

NOT: apple NOT banana fetches documents with “apple” but not “banana”.

String notQueryStr = "apple NOT banana";
Query notQuery = new QueryParser("content", analyzer).parse(notQueryStr);
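
One way to picture the three operators is as set operations over the documents that contain each term. The sketch below builds a toy inverted index in plain Java and applies intersection, union, and difference. It is only an illustration under simplifying assumptions (whitespace tokenization, no scoring), and BooleanSketch and its helper methods are hypothetical names, not Lucene APIs.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class BooleanSketch {
    // Build a toy inverted index: term -> ids of the documents containing it.
    static Map<String, Set<Integer>> index(List<String> docs) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : docs.get(id).toLowerCase().split("\\s+")) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
        return postings;
    }

    // Ids of documents containing the term (empty set if the term is unknown).
    static Set<Integer> docsWith(Map<String, Set<Integer>> postings, String term) {
        return new TreeSet<>(postings.getOrDefault(term, Collections.emptySet()));
    }

    static Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b); // intersection
        return result;
    }

    static Set<Integer> or(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.addAll(b); // union
        return result;
    }

    static Set<Integer> not(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.removeAll(b); // difference
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> postings =
                index(List.of("apple pie", "banana bread", "apple banana smoothie"));
        Set<Integer> apple = docsWith(postings, "apple");   // documents 0 and 2
        Set<Integer> banana = docsWith(postings, "banana"); // documents 1 and 2

        System.out.println(and(apple, banana)); // [2]
        System.out.println(or(apple, banana));  // [0, 1, 2]
        System.out.println(not(apple, banana)); // [0]
    }
}
```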

Implementing Range Searches in Lucene

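Range searches find documents whose field values fall between a lower and an upper bound. The classic QueryParser turns them into term range queries that compare values as strings, so they work best on fixed-width values such as yyyyMMdd dates or zero-padded numbers. Fields indexed as numeric points (for example with IntPoint) need a point range query such as IntPoint.newRangeQuery instead.

Inclusive Range Searches
Use square brackets [] to include both bounds: date:[20230101 TO 20231231], price:[10 TO 50].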

String queryStr = "date:[20230101 TO 20231231]";
Query query = new QueryParser("content", analyzer).parse(queryStr);

Exclusive Range Searches
Use curly brackets {} to exclude both bounds: price:{10 TO 50}.

String queryStr = "price:{10 TO 50}";
Query query = new QueryParser("content", analyzer).parse(queryStr);
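
Because a parsed range query compares terms as text, it is worth seeing how string ordering differs from numeric ordering, and how inclusive and exclusive bounds treat the edge values. The sketch below is plain Java using String.compareTo as a stand-in for term ordering; RangeSketch and inTermRange are hypothetical names, and the sample values are made up.

```java
public class RangeSketch {
    // Check whether value lies between lower and upper when compared as strings.
    // inclusive = true mimics [lower TO upper]; false mimics {lower TO upper}.
    static boolean inTermRange(String value, String lower, String upper, boolean inclusive) {
        int vsLower = value.compareTo(lower);
        int vsUpper = value.compareTo(upper);
        return inclusive
                ? (vsLower >= 0 && vsUpper <= 0)
                : (vsLower > 0 && vsUpper < 0);
    }

    public static void main(String[] args) {
        // Fixed-width dates sort the same way as text and as dates.
        System.out.println(inTermRange("20230615", "20230101", "20231231", true)); // true

        // Unpadded numbers do not: "100" sorts between "10" and "50" as text.
        System.out.println(inTermRange("100", "10", "50", true));  // true, although 100 > 50
        System.out.println(inTermRange("9", "10", "50", true));    // false

        // Zero-padding to a fixed width restores numeric order.
        System.out.println(inTermRange("0100", "0010", "0050", true)); // false
        System.out.println(inTermRange("0030", "0010", "0050", true)); // true

        // The boundary value itself matches only the inclusive form.
        System.out.println(inTermRange("10", "10", "50", true));   // true
        System.out.println(inTermRange("10", "10", "50", false));  // false
    }
}
```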

When to Use Exclusive Range Searches

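Filtering Results: Exclude the boundary values themselves, so price:{10 TO 50} skips documents priced at exactly 10 or 50.
Avoiding Duplication: Useful in paginated search results, where a boundary value should not appear in more than one range.
Precise Numerical Filters: For conditions that are strictly greater than or strictly less than a value.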

👀 Check out the full tutorial here! ===> Advanced Text Search Mastery with Apache Lucene: A Full Guide - DevToys.io