Grouping and paging search results using Java's Lucene search tool

Author：Eve Cole Update Time：2025-06-07 20:32:01

Grouping search results using GroupingSearch
Package org.apache.lucene.search.grouping Description

This module groups Lucene search results by the value of a specified single-valued field. For example, if you group by the "author" field, documents with the same "author" value are placed in the same group.

When grouping, you need to supply the following information:

1. groupField: the field to group by. For example, if you group by the "author" field, all the books in each group have the same author. Documents that do not have this field are placed together in one separate group.

2. groupSort: how the groups themselves are sorted.

3. topNGroups: how many groups are retained. For example, 10 means that only the first 10 groups are retained.

4. groupOffset: which slice of the ranked groups to return. For example, 3 means that the first 3 groups are skipped and 7 groups are returned (assuming topNGroups equals 10). This is very useful for pagination, for example when only 5 groups are displayed per page.

5. withinGroupSort: how the documents inside each group are sorted. Note that this can differ from groupSort.

6. withinGroupOffset: which slice of the ranked documents to return from each group.
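To make the parameters above concrete, here is a minimal plain-Java sketch that imitates their effect on an in-memory list. It is not Lucene code and does not show how Lucene implements grouping. The `GroupingSketch` class, the `Hit` type, and the choice of ranking groups by their best score are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java imitation of the grouping parameters; hypothetical, not Lucene's API. */
final class GroupingSketch {

    /** A hypothetical search hit: the value of the group field plus a relevance score. */
    static final class Hit {
        final String author;
        final float score;

        Hit(String author, float score) {
            this.author = author;
            this.score = score;
        }
    }

    /**
     * Groups hits by author, ranks the groups by their best score (the assumed groupSort),
     * keeps the first topNGroups groups, then skips the first groupOffset of them.
     */
    static List<String> topGroups(List<Hit> hits, int groupOffset, int topNGroups) {
        // groupField: hits with the same author value fall into the same group.
        Map<String, Float> bestScore = new LinkedHashMap<>();
        for (Hit hit : hits) {
            bestScore.merge(hit.author, hit.score, Float::max);
        }
        // groupSort: order the groups by best score, highest first.
        List<String> ranked = new ArrayList<>(bestScore.keySet());
        ranked.sort(Comparator.comparing((String a) -> bestScore.get(a)).reversed());
        // topNGroups and groupOffset: retain the top N, then drop the leading offset.
        int end = Math.min(topNGroups, ranked.size());
        int start = Math.min(groupOffset, end);
        return new ArrayList<>(ranked.subList(start, end));
    }
}
```

With ten distinct authors, `GroupingSketch.topGroups(hits, 3, 10)` returns seven group values, which matches the groupOffset example in the list above.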

Grouping search results is simpler with GroupingSearch

GroupingSearch API documentation introduction:

Convenience class to perform grouping in a non distributed environment.

WARNING: This API is experimental and might change in incompatible ways in the next release.

Lucene 4.3.1 is used here.

Some important methods:

GroupingSearch: setCaching(int maxDocsToCache, boolean cacheScores) caches the results of the first search pass for reuse in the second pass, up to the given number of documents
GroupingSearch: setCachingInMB(double maxCacheRAMMB, boolean cacheScores) caches the results of the first search pass for reuse in the second pass, up to the given amount of RAM in MB
GroupingSearch: setGroupDocsLimit(int groupDocsLimit) specifies the number of documents returned for each group. If not specified, one document per group is returned by default.
GroupingSearch: setGroupSort(Sort groupSort) specifies how the groups are sorted

Sample code:

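1. First, the indexing code:

    import java.io.IOException;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.IntField;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;
    // IK Analyzer is a third-party Chinese analyzer
    import org.wltea.analyzer.lucene.IKAnalyzer;

    public class IndexHelper {
        private Document document;
        private Directory directory;
        private IndexWriter indexWriter;

        // Lazily create one in-memory directory shared by all writers and searchers
        public Directory getDirectory() {
            directory = (directory == null) ? new RAMDirectory() : directory;
            return directory;
        }

        private IndexWriterConfig getConfig() {
            return new IndexWriterConfig(Version.LUCENE_43, new IKAnalyzer(true));
        }

        private IndexWriter getIndexWriter() {
            try {
                return new IndexWriter(getDirectory(), getConfig());
            } catch (IOException e) {
                e.printStackTrace();
                return null;
            }
        }

        public IndexSearcher getIndexSearcher() throws IOException {
            return new IndexSearcher(DirectoryReader.open(getDirectory()));
        }

        /**
         * Create index for group test
         * @param id
         * @param author
         * @param content
         */
        public void createIndexForGroup(int id, String author, String content) {
            // A new writer is opened and closed for every document
            indexWriter = getIndexWriter();
            document = new Document();
            document.add(new IntField("id", id, Field.Store.YES));
            // StringField is indexed as a single token, so it can serve as the group field
            document.add(new StringField("author", author, Field.Store.YES));
            document.add(new TextField("content", content, Field.Store.YES));
            try {
                indexWriter.addDocument(document);
                indexWriter.commit();
                indexWriter.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }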

2. Grouping:

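    import java.io.IOException;

    import org.apache.lucene.queryparser.classic.ParseException;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.grouping.GroupDocs;
    import org.apache.lucene.search.grouping.GroupingSearch;
    import org.apache.lucene.search.grouping.TopGroups;
    import org.apache.lucene.util.BytesRef;
    import org.apache.lucene.util.Version;
    import org.wltea.analyzer.lucene.IKAnalyzer;

    public class GroupTest {

        public void group(IndexSearcher indexSearcher, String groupField, String content)
                throws IOException, ParseException {
            GroupingSearch groupingSearch = new GroupingSearch(groupField);
            // Sort the groups by relevance score
            groupingSearch.setGroupSort(new Sort(SortField.FIELD_SCORE));
            groupingSearch.setFillSortFields(true);
            // Cache first-pass results (up to 4 MB, including scores) for the second pass
            groupingSearch.setCachingInMB(4.0, true);
            groupingSearch.setAllGroups(true);
            //groupingSearch.setAllGroupHeads(true);
            // Return up to 10 documents per group
            groupingSearch.setGroupDocsLimit(10);

            QueryParser parser = new QueryParser(Version.LUCENE_43, "content", new IKAnalyzer(true));
            Query query = parser.parse(content);

            TopGroups<BytesRef> result = groupingSearch.search(indexSearcher, query, 0, 1000);
            System.out.println("Search hits: " + result.totalHitCount);
            System.out.println("Search result grouping: " + result.groups.length);

            for (GroupDocs<BytesRef> groupDocs : result.groups) {
                System.out.println("Group: " + groupDocs.groupValue.utf8ToString());
                System.out.println("In-group record: " + groupDocs.totalHits);
                //System.out.println("groupDocs.scoreDocs.length:" + groupDocs.scoreDocs.length);
                for (ScoreDoc scoreDoc : groupDocs.scoreDocs) {
                    System.out.println(indexSearcher.doc(scoreDoc.doc));
                }
            }
        }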

3. Simple test:

        public static void main(String[] args) throws IOException, ParseException {
            IndexHelper indexHelper = new IndexHelper();
            indexHelper.createIndexForGroup(1, "Sweet Potato", "Open Source China");
            indexHelper.createIndexForGroup(2, "Sweet Potato", "Open Source Community");
            indexHelper.createIndexForGroup(3, "Sweet Potato", "Code Design");
            indexHelper.createIndexForGroup(4, "Sweet Potato", "Design");
            indexHelper.createIndexForGroup(5, "Jiexian", "Lucene development");
            indexHelper.createIndexForGroup(6, "Jiexian", "Lucene practical combat");
            indexHelper.createIndexForGroup(7, "Jiexian", "Open source Lucene");
            indexHelper.createIndexForGroup(8, "Jiexian", "Open source solr");
            indexHelper.createIndexForGroup(9, "Sanxian", "Sanxian Open source Lucene");
            indexHelper.createIndexForGroup(10, "Sanxian", "Sanxian Open source solr");
            indexHelper.createIndexForGroup(11, "Sanxian", "Open source");

            GroupTest groupTest = new GroupTest();
            groupTest.group(indexHelper.getIndexSearcher(), "author", "open source");
        }
    }

4. Test results:

Two ways of paging
Lucene has two ways of paging:

1. Paginate the search results directly. This method is suitable when the data volume is relatively small, because every search still collects all the hits up to the end of the requested page. The core of the paging code looks like this:

    ScoreDoc[] sd = XXX;
    // Position of the first record on the page
    int begin = pageSize * (currentPage - 1);
    // Position just past the last record on the page
    int end = Math.min(begin + pageSize, sd.length);
    for (int i = begin; i < end && i < totalHits; i++) {
        // Code for processing search result data
    }
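The begin/end arithmetic above can be tried out without an index. The following is a minimal plain-Java sketch over an ordinary list; `OffsetPagingSketch` and `page` are hypothetical names, and the list stands in for the `ScoreDoc[]` array. In a real Lucene search, the array would need to hold at least `pageSize * currentPage` hits for the requested page to be complete.

```java
import java.util.ArrayList;
import java.util.List;

/** Offset-based paging over an in-memory result list; names are hypothetical. */
final class OffsetPagingSketch {

    /** Returns the items on the 1-based page currentPage, or an empty list if out of range. */
    static <T> List<T> page(List<T> hits, int currentPage, int pageSize) {
        if (currentPage < 1 || pageSize < 1) {
            throw new IllegalArgumentException("currentPage and pageSize must be >= 1");
        }
        // Position of the first record on the page (long avoids int overflow).
        long begin = (long) pageSize * (currentPage - 1);
        if (begin >= hits.size()) {
            return new ArrayList<>();
        }
        // Position just past the last record on the page.
        int end = (int) Math.min(begin + pageSize, hits.size());
        return new ArrayList<>(hits.subList((int) begin, end));
    }
}
```

The last page may be shorter than `pageSize`, and a page past the end of the results comes back empty rather than throwing.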

2. Use searchAfter(...)

Lucene provides five overloads of this method, which can be used as needed. The main parameters are:

ScoreDoc after: the last ScoreDoc of the previous search result, i.e. the element at index length - 1 of the previous scoreDocs array; pass null for the first page

Query query: the query to run

int n: the number of results returned by each query, i.e. the number of results per page

A simple usage example:

    // A Map can hold whatever the caller needs from the search
    Map<String, Object> resultMap = new HashMap<String, Object>();
    ScoreDoc after = null;
    Query query = XX;
    TopDocs td = search.searchAfter(after, query, size);
    // Number of hits
    resultMap.put("num", td.totalHits);
    ScoreDoc[] sd = td.scoreDocs;
    for (ScoreDoc scoreDoc : sd) {
        // Usual processing of each search result
    }
    // The last ScoreDoc of this page (index length - 1) becomes the cursor;
    // an empty page has no last element, so guard against it
    if (sd.length > 0) {
        after = sd[sd.length - 1];
    }
    // Save after so that the next search starts at the following page
    resultMap.put("after", after);
    return resultMap;
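To show how the `after` cursor carries from one page to the next, here is a small plain-Java simulation. It is only a sketch of the idea and not Lucene's `searchAfter`: the `SearchAfterSketch` class is hypothetical, the ranked results are represented by a list of unique integers, and the cursor is located with a simple `indexOf` lookup.

```java
import java.util.ArrayList;
import java.util.List;

/** Simulates cursor-style paging in plain Java; this is not Lucene's searchAfter. */
final class SearchAfterSketch {

    /** Stand-in for a ranked result list: returns up to n items that follow the cursor. */
    static List<Integer> searchAfter(List<Integer> ranked, Integer after, int n) {
        // A null cursor means "start from the first result".
        int start = (after == null) ? 0 : ranked.indexOf(after) + 1;
        int end = Math.min(start + n, ranked.size());
        return new ArrayList<>(ranked.subList(start, end));
    }

    /** Walks through every page by feeding the last item of each page back in as the cursor. */
    static List<List<Integer>> allPages(List<Integer> ranked, int pageSize) {
        List<List<Integer>> pages = new ArrayList<>();
        Integer after = null;
        while (true) {
            List<Integer> page = searchAfter(ranked, after, pageSize);
            if (page.isEmpty()) {
                break; // no more results, so there is no new cursor to take
            }
            pages.add(page);
            after = page.get(page.size() - 1);
        }
        return pages;
    }
}
```

The loop stops as soon as a page comes back empty, which is also the case where taking the last element as the next cursor would fail.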