This module enables search result grouping with Lucene, where hits with the same value in the specified single-valued group field are grouped together. For example, if you group by the author field, then all documents with the same value in the author field fall into a single group.

Grouping requires a number of inputs:

The implementation is two-pass: the first pass ({@link org.apache.lucene.search.grouping.FirstPassGroupingCollector}) gathers the top groups, and the second pass ({@link org.apache.lucene.search.grouping.SecondPassGroupingCollector}) gathers documents within those groups. If the search is costly to run, you may want to use the {@link org.apache.lucene.search.CachingCollector} class, which caches hits and can quickly replay them for the second pass. This way you run the query only once, at the cost of briefly holding all hits in RAM. If the hits do not fit within the configured RAM limit, the cache is abandoned and the query must be run again for the second pass. Results are returned as a {@link org.apache.lucene.search.grouping.TopGroups} instance.
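
To make the two-pass idea concrete, here is a minimal plain-Java sketch that does not use Lucene at all. The class `TwoPassGroupingSketch`, its `Hit` type, and the sample data are hypothetical names introduced only for illustration, and the sketch assumes groups are ranked by their best-scoring hit. It shows the shape of the algorithm, not how the grouping collectors are actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TwoPassGroupingSketch {

    /** Hypothetical hit: a doc id, the value of its single-valued group field, and a score. */
    public static final class Hit {
        public final int docId;
        public final String groupValue;
        public final float score;

        public Hit(int docId, String groupValue, float score) {
            this.docId = docId;
            this.groupValue = groupValue;
            this.score = score;
        }
    }

    /** First pass: pick the topNGroups group values, ranked by each group's best score. */
    public static List<String> firstPass(List<Hit> hits, int topNGroups) {
        Map<String, Float> best = new HashMap<>();
        for (Hit h : hits) {
            best.merge(h.groupValue, h.score, Math::max);
        }
        List<String> groups = new ArrayList<>(best.keySet());
        // Highest best-score first; ties broken by group value so the order is deterministic.
        groups.sort((a, b) -> {
            int byScore = Float.compare(best.get(b), best.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return new ArrayList<>(groups.subList(0, Math.min(topNGroups, groups.size())));
    }

    /** Second pass: for the chosen groups only, keep the best docsPerGroup hits by score. */
    public static Map<String, List<Hit>> secondPass(List<Hit> hits, List<String> topGroups, int docsPerGroup) {
        Map<String, List<Hit>> result = new LinkedHashMap<>();
        for (String g : topGroups) {
            result.put(g, new ArrayList<>());
        }
        for (Hit h : hits) {
            List<Hit> docs = result.get(h.groupValue);
            if (docs != null) {   // hits outside the top groups are ignored
                docs.add(h);
            }
        }
        for (List<Hit> docs : result.values()) {
            docs.sort((a, b) -> Float.compare(b.score, a.score));
            if (docs.size() > docsPerGroup) {
                docs.subList(docsPerGroup, docs.size()).clear();
            }
        }
        return result;
    }

    /** Made-up sample data, used only to exercise the sketch. */
    public static List<Hit> sampleHits() {
        return Arrays.asList(
            new Hit(0, "alice", 1.0f), new Hit(1, "bob", 3.0f), new Hit(2, "alice", 2.5f),
            new Hit(3, "carol", 0.5f), new Hit(4, "bob", 1.5f), new Hit(5, "alice", 0.2f));
    }

    public static void main(String[] args) {
        List<Hit> hits = sampleHits();
        List<String> top = firstPass(hits, 2);
        for (Map.Entry<String, List<Hit>> e : secondPass(hits, top, 2).entrySet()) {
            System.out.println("group " + e.getKey());
            for (Hit h : e.getValue()) {
                System.out.println("  doc " + h.docId + " score " + h.score);
            }
        }
    }
}
```

Both passes iterate over the same hits, which is why replaying a cached hit list for the second pass can save a second query execution.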

Known limitations:

Typical usage looks like this (using the {@link org.apache.lucene.search.CachingCollector}):

  FirstPassGroupingCollector c1 = new FirstPassGroupingCollector("author", groupSort, groupOffset+topNGroups);

  boolean cacheScores = true;
  double maxCacheRAMMB = 4.0;
  CachingCollector cachedCollector = CachingCollector.create(c1, cacheScores, maxCacheRAMMB);
  s.search(new TermQuery(new Term("content", searchTerm)), cachedCollector);

  // fillFields must be declared before its first use below
  boolean fillFields = true;
  Collection topGroups = c1.getTopGroups(groupOffset, fillFields);

  if (topGroups == null) {
    // No groups matched
    return;
  }

  boolean getScores = true;
  boolean getMaxScores = true;
  SecondPassGroupingCollector c2 = new SecondPassGroupingCollector("author", topGroups, groupSort, docSort, docOffset+docsPerGroup, getScores, getMaxScores, fillFields);

  // Optionally compute total group count. MultiCollector.wrap returns a plain
  // Collector, so keep c2 typed as SecondPassGroupingCollector for getTopGroups.
  AllGroupsCollector allGroupsCollector = null;
  Collector secondPassCollector = c2;
  if (requiredTotalGroupCount) {
    allGroupsCollector = new AllGroupsCollector("author");
    secondPassCollector = MultiCollector.wrap(c2, allGroupsCollector);
  }

  if (cachedCollector.isCached()) {
    // Cache fit within maxCacheRAMMB, so we can replay it:
    cachedCollector.replay(secondPassCollector);
  } else {
    // Cache was too large; must re-execute query:
    s.search(new TermQuery(new Term("content", searchTerm)), secondPassCollector);
  }

  TopGroups groupsResult = c2.getTopGroups(docOffset);
  if (requiredTotalGroupCount) {
    groupsResult = new TopGroups(groupsResult, allGroupsCollector.getGroupCount());
  }

  // Render groupsResult...
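
The replay-or-re-execute fallback in the usage example can also be sketched independently of Lucene. The class `ReplayCacheSketch` below is a hypothetical stand-in, not the real {@link org.apache.lucene.search.CachingCollector}: it assumes a budget counted in documents rather than megabytes, and it caches only doc ids. It illustrates why the caller must check whether the cache survived before replaying.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

public class ReplayCacheSketch {
    private final IntConsumer delegate;   // plays the role of the first-pass collector
    private final int maxCachedDocs;      // assumed budget, in documents (not MB)
    private List<Integer> cached = new ArrayList<>();

    public ReplayCacheSketch(IntConsumer delegate, int maxCachedDocs) {
        this.delegate = delegate;
        this.maxCachedDocs = maxCachedDocs;
    }

    /** Forward every hit to the delegate, and remember it while the budget allows. */
    public void collect(int docId) {
        delegate.accept(docId);
        if (cached != null) {
            if (cached.size() < maxCachedDocs) {
                cached.add(docId);
            } else {
                cached = null;   // budget exceeded: give up caching and free the memory
            }
        }
    }

    /** True only if every collected hit is still held in the cache. */
    public boolean isCached() {
        return cached != null;
    }

    /** Feed the cached hits, in their original order, to another consumer. */
    public void replay(IntConsumer other) {
        if (cached == null) {
            throw new IllegalStateException("cache was discarded; re-run the search instead");
        }
        for (int docId : cached) {
            other.accept(docId);
        }
    }

    public static void main(String[] args) {
        List<Integer> firstPass = new ArrayList<>();
        List<Integer> secondPass = new ArrayList<>();
        ReplayCacheSketch cache = new ReplayCacheSketch(d -> firstPass.add(d), 10);
        for (int docId : new int[] {7, 3, 9}) {
            cache.collect(docId);
        }
        if (cache.isCached()) {
            cache.replay(d -> secondPass.add(d));   // no second search needed
        }
        System.out.println("first pass saw  " + firstPass);
        System.out.println("second pass saw " + secondPass);
    }
}
```

A cache that silently dropped hits would give the second pass an incomplete view, so the sketch discards the whole cache once the budget is exceeded and leaves the re-execution decision to the caller.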