org.openjena.atlas.data
Class DistinctDataBag<E>

java.lang.Object
  extended by org.openjena.atlas.data.AbstractDataBag<E>
      extended by org.openjena.atlas.data.SortedDataBag<E>
          extended by org.openjena.atlas.data.DistinctDataBag<E>
All Implemented Interfaces:
Iterable<E>, DataBag<E>, Closeable, Sink<E>
Direct Known Subclasses:
DistinctDataNet

public class DistinctDataBag<E>
extends SortedDataBag<E>

This data bag will gather distinct items in memory until a size threshold is passed, at which point it will write out all of the items to disk using the supplied serializer.

After adding is finished, call iterator() to set up the data bag for reading back items and iterating over them. The iterator will retrieve only distinct items.

IMPORTANT: You may not add any more items after this call. You may subsequently call iterator() multiple times; each invocation returns a new iterator. If you do not consume an iterator entirely, you should call Iter.close(Iterator) on it to close any FileInputStreams associated with that iterator.

Additionally, make sure to call SortedDataBag.close() when you are finished to free any system resources (preferably in a finally block).
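
The add-then-iterate-then-close lifecycle described above can be illustrated with a minimal sketch. SketchDistinctBag below is a hypothetical, in-memory stand-in written only for this example; it is not the real DistinctDataBag, it never spills to disk, and the IllegalStateException it throws is an assumption about how the "no adds after iterator()" rule might be enforced. With the real class, an iterator abandoned early should additionally be passed to Iter.close(Iterator), as noted above.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for illustration; NOT the real DistinctDataBag.
class SketchDistinctBag<E> implements Iterable<E>, Closeable {
    private final TreeSet<E> items;          // keeps items distinct per the comparator
    private boolean finishedAdding = false;  // set by the first iterator() call
    private boolean closed = false;

    SketchDistinctBag(Comparator<E> comparator) {
        this.items = new TreeSet<>(comparator);
    }

    void add(E item) {
        if (closed) throw new IllegalStateException("bag is closed");
        // Assumption: adding after iterator() is rejected with an exception.
        if (finishedAdding) throw new IllegalStateException("no adds after iterator()");
        items.add(item);
    }

    @Override
    public Iterator<E> iterator() {
        if (closed) throw new IllegalStateException("bag is closed");
        finishedAdding = true;
        // Each call returns a fresh iterator over a snapshot of the items.
        return new ArrayList<>(items).iterator();
    }

    @Override
    public void close() {
        closed = true;
        items.clear(); // the real bag would release files and streams here
    }

    boolean isClosed() {
        return closed;
    }
}

class BagLifecycleDemo {
    static List<String> collectDistinct(List<String> input) {
        SketchDistinctBag<String> bag =
            new SketchDistinctBag<>(Comparator.<String>naturalOrder());
        List<String> out = new ArrayList<>();
        try {
            for (String s : input) bag.add(s);  // phase 1: add everything first
            for (String s : bag) out.add(s);    // phase 2: read back (fully consumed)
        } finally {
            bag.close();                        // always free resources
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [a, b, c]
        System.out.println(collectDistinct(Arrays.asList("b", "a", "b", "c", "a")));
    }
}
```

Placing close() in the finally block mirrors the recommendation above: the bag is released even if adding or iterating throws.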

Implementation Notes: Incoming data is stored in a HashSet, which discards duplicates as items arrive. When it is time to spill, that data is sorted and written to disk. On read-back, an iterator that eliminates adjacent duplicates is used in conjunction with the SortedDataBag's iterator.
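
The strategy in the implementation notes can be sketched with plain collections. Everything below is a hypothetical illustration, not the library's code: the names SpillSketch, AdjacentDistinctIterator, and the threshold parameter are invented for the example, lists stand in for spill files, and the runs are merged by a simple concatenate-and-sort rather than a streaming merge. It also assumes the comparator is consistent with equals(), so that the HashSet and the comparator agree on what counts as a duplicate.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

// Wraps a SORTED iterator and skips elements that compare equal to the previous one.
class AdjacentDistinctIterator<E> implements Iterator<E> {
    private final Iterator<E> source;
    private final Comparator<? super E> cmp;
    private E last;                 // last element handed out
    private boolean hasLast = false;
    private E pending;              // next element to hand out, if any
    private boolean hasPending = false;

    AdjacentDistinctIterator(Iterator<E> source, Comparator<? super E> cmp) {
        this.source = source;
        this.cmp = cmp;
    }

    @Override
    public boolean hasNext() {
        // Advance until an element differs from the last one returned.
        while (!hasPending && source.hasNext()) {
            E candidate = source.next();
            if (!hasLast || cmp.compare(last, candidate) != 0) {
                pending = candidate;
                hasPending = true;
            }
        }
        return hasPending;
    }

    @Override
    public E next() {
        if (!hasNext()) throw new NoSuchElementException();
        last = pending;
        hasLast = true;
        hasPending = false;
        return last;
    }
}

class SpillSketch {
    // Sort the in-memory set into one run; the list stands in for a spill file.
    static <E> List<E> spill(Set<E> memory, Comparator<? super E> cmp) {
        List<E> run = new ArrayList<>(memory);
        run.sort(cmp);
        memory.clear();
        return run;
    }

    // threshold is a hypothetical item count that triggers a spill.
    static <E> List<E> distinctViaSpills(List<E> input, int threshold,
                                         Comparator<? super E> cmp) {
        Set<E> memory = new HashSet<>();
        List<List<E>> runs = new ArrayList<>();
        for (E item : input) {
            memory.add(item); // duplicates within the current batch vanish here
            if (memory.size() >= threshold) runs.add(spill(memory, cmp));
        }
        // Simplified merge: combine all runs and leftovers, then sort.
        List<E> merged = new ArrayList<>(memory);
        for (List<E> run : runs) merged.addAll(run);
        merged.sort(cmp);
        // Duplicates that landed in different runs are now adjacent; drop them.
        List<E> out = new ArrayList<>();
        Iterator<E> it = new AdjacentDistinctIterator<>(merged.iterator(), cmp);
        while (it.hasNext()) out.add(it.next());
        return out;
    }
}
```

The HashSet only removes duplicates within one in-memory batch; the same item can still appear in several spilled runs, which is why the adjacent-duplicate pass over the sorted output is needed.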


Constructor Summary
DistinctDataBag(ThresholdPolicy<E> policy, SerializationFactory<E> serializerFactory, Comparator<E> comparator)
           
 
Method Summary
 boolean isDistinct()
          Find out if the bag is distinct.
 boolean isSorted()
          Find out if the bag is sorted.
 Iterator<E> iterator()
          Returns an iterator over a set of elements of type E.
 
Methods inherited from class org.openjena.atlas.data.SortedDataBag
add, close, flush
 
Methods inherited from class org.openjena.atlas.data.AbstractDataBag
addAll, addAll, isEmpty, send, size
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DistinctDataBag

public DistinctDataBag(ThresholdPolicy<E> policy,
                       SerializationFactory<E> serializerFactory,
                       Comparator<E> comparator)
Method Detail

isSorted

public boolean isSorted()
Description copied from interface: DataBag
Find out if the bag is sorted.

Specified by:
isSorted in interface DataBag<E>
Overrides:
isSorted in class SortedDataBag<E>
Returns:
true if this is a sorted data bag, false otherwise.

isDistinct

public boolean isDistinct()
Description copied from interface: DataBag
Find out if the bag is distinct.

Specified by:
isDistinct in interface DataBag<E>
Overrides:
isDistinct in class SortedDataBag<E>
Returns:
true if the bag is a distinct bag, false otherwise.

iterator

public Iterator<E> iterator()
Description copied from class: SortedDataBag
Returns an iterator over a set of elements of type E. If you do not exhaust the iterator, you should call Iter.close(Iterator) to be sure any open file handles are closed.

Specified by:
iterator in interface Iterable<E>
Overrides:
iterator in class SortedDataBag<E>
Returns:
an Iterator over the distinct items in this bag


Licensed under the Apache License, Version 2.0