This module enables search result grouping with Lucene, where hits with the same value in the specified single-valued group field are grouped together. For example, if you group by the author field, then all documents with the same value in the author field fall into a single group.
Grouping requires a number of inputs:
The implementation is two-pass: the first pass ({@link org.apache.lucene.search.grouping.FirstPassGroupingCollector}) gathers the top groups, and the second pass ({@link org.apache.lucene.search.grouping.SecondPassGroupingCollector}) gathers documents within those groups. If the search is costly to run you may want to use the {@link org.apache.lucene.search.CachingCollector} class, which caches hits and can (quickly) replay them for the second pass. This way you only run the query once, but you pay a RAM cost to (briefly) hold all hits. Results are returned as a {@link org.apache.lucene.search.grouping.TopGroups} instance.
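The two-pass idea can be illustrated without any Lucene classes. The following is a minimal conceptual sketch, not the module's implementation: `TwoPassGroupingSketch`, `Hit`, `firstPass`, `secondPass`, and `sampleHits` are hypothetical names invented for this illustration. It assumes groups are ranked by their best-scoring hit and that documents within a group are sorted by score. The in-memory hit list stands in for the cached hits that are replayed for the second pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch only: none of these names are Lucene classes.
class TwoPassGroupingSketch {

    /** A hypothetical search hit: document id, group field value, relevance score. */
    static final class Hit {
        final int docId;
        final String author;
        final double score;

        Hit(int docId, String author, double score) {
            this.docId = docId;
            this.author = author;
            this.score = score;
        }
    }

    /** First pass: return up to topNGroups group values, best-scoring group first. */
    static List<String> firstPass(List<Hit> hits, int topNGroups) {
        // Track the best score seen for each group value.
        Map<String, Double> bestScore = new LinkedHashMap<>();
        for (Hit h : hits) {
            bestScore.merge(h.author, h.score, Math::max);
        }
        // Rank groups by their best score, descending.
        List<String> groups = new ArrayList<>(bestScore.keySet());
        groups.sort((a, b) -> Double.compare(bestScore.get(b), bestScore.get(a)));
        return new ArrayList<>(groups.subList(0, Math.min(topNGroups, groups.size())));
    }

    /** Second pass: collect up to docsPerGroup top-scoring hits for each top group. */
    static Map<String, List<Hit>> secondPass(List<Hit> hits, List<String> topGroups, int docsPerGroup) {
        Map<String, List<Hit>> byGroup = new LinkedHashMap<>();
        for (String group : topGroups) {
            byGroup.put(group, new ArrayList<>());
        }
        for (Hit h : hits) {
            List<Hit> bucket = byGroup.get(h.author);
            if (bucket != null) { // the hit belongs to one of the top groups
                bucket.add(h);
            }
        }
        for (List<Hit> docs : byGroup.values()) {
            docs.sort((a, b) -> Double.compare(b.score, a.score));
            if (docs.size() > docsPerGroup) {
                docs.subList(docsPerGroup, docs.size()).clear(); // keep only the best docs
            }
        }
        return byGroup;
    }

    /** Made-up hits standing in for the result of a single query execution. */
    static List<Hit> sampleHits() {
        return Arrays.asList(
            new Hit(1, "alice", 0.9), new Hit(2, "bob", 0.8), new Hit(3, "alice", 0.5),
            new Hit(4, "carol", 0.3), new Hit(5, "bob", 0.95), new Hit(6, "alice", 0.7));
    }

    public static void main(String[] args) {
        List<Hit> cachedHits = sampleHits();                 // the query runs once
        List<String> topGroups = firstPass(cachedHits, 2);   // pass 1 over the hits
        Map<String, List<Hit>> grouped = secondPass(cachedHits, topGroups, 2); // pass 2 replays them
        for (Map.Entry<String, List<Hit>> e : grouped.entrySet()) {
            for (Hit h : e.getValue()) {
                System.out.println(e.getKey() + " doc=" + h.docId + " score=" + h.score);
            }
        }
    }
}
```

With the sample data, the first pass selects the groups bob and alice (carol is dropped), and the second pass keeps the two best documents in each. Both passes read the same list, which is the trade-off the caching approach makes: the query is executed once, at the cost of holding every hit in memory until the second pass has finished.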
Known limitations:
Typical usage looks like this (using the {@link org.apache.lucene.search.CachingCollector}):
  FirstPassGroupingCollector c1 = new FirstPassGroupingCollector("author", groupSort, groupOffset+topNGroups);

  boolean cacheScores = true;
  double maxCacheRAMMB = 4.0;
  CachingCollector cachedCollector = CachingCollector.create(c1, cacheScores, maxCacheRAMMB);
  s.search(new TermQuery(new Term("content", searchTerm)), cachedCollector);

  // Declared before its first use; passed to both passes
  boolean fillFields = true;
  Collection<SearchGroup> topGroups = c1.getTopGroups(groupOffset, fillFields);

  if (topGroups == null) {
    // No groups matched
    return;
  }

  boolean getScores = true;
  boolean getMaxScores = true;
  SecondPassGroupingCollector c2 = new SecondPassGroupingCollector("author", topGroups, groupSort, docSort, docOffset+docsPerGroup, getScores, getMaxScores, fillFields);

  // Collector that receives the second-pass hits; c2 is kept for reading the results
  Collector secondPassCollector = c2;

  // Optionally compute total group count
  AllGroupsCollector allGroupsCollector = null;
  if (requiredTotalGroupCount) {
    allGroupsCollector = new AllGroupsCollector("author");
    secondPassCollector = MultiCollector.wrap(c2, allGroupsCollector);
  }

  if (cachedCollector.isCached()) {
    // Cache fit within maxCacheRAMMB, so we can replay it:
    cachedCollector.replay(secondPassCollector);
  } else {
    // Cache was too large; must re-execute query:
    s.search(new TermQuery(new Term("content", searchTerm)), secondPassCollector);
  }

  TopGroups groupsResult = c2.getTopGroups(docOffset);
  if (requiredTotalGroupCount) {
    groupsResult = new TopGroups(groupsResult, allGroupsCollector.getGroupCount());
  }

  // Render groupsResult...