org.apache.pig
Interface SamplableLoader

All Superinterfaces:
LoadFunc
All Known Implementing Classes:
BinStorage, PigStorage

public interface SamplableLoader
extends LoadFunc

Implementing this interface indicates to Pig that a loader can be used by a sampling loader. To qualify, the loader must be able to handle a getNext() call without knowing its position in the file. Loaders for structured data such as XML cannot do this, because they must start at the beginning of the file in order to understand their position. Record-oriented loaders such as PigStorage can, by seeking to the next record delimiter and starting from that point. The loader must also be able to skip or seek in its input stream.


Nested Class Summary
 
Nested classes/interfaces inherited from interface org.apache.pig.LoadFunc
LoadFunc.RequiredField, LoadFunc.RequiredFieldList, LoadFunc.RequiredFieldResponse
 
Method Summary
 long getPosition()
          Get the current position in the stream.
 Tuple getSampledTuple()
          Get the next tuple from the stream starting from the current read position.
 long skip(long n)
          Skip ahead in the input stream.
 
Methods inherited from interface org.apache.pig.LoadFunc
bindTo, bytesToBag, bytesToCharArray, bytesToDouble, bytesToFloat, bytesToInteger, bytesToLong, bytesToMap, bytesToTuple, determineSchema, fieldsToRead, getNext
 

Method Detail

skip

long skip(long n)
          throws IOException
Skip ahead in the input stream.

Parameters:
n - number of bytes to skip
Returns:
number of bytes actually skipped. The return semantics are exactly the same as those of InputStream.skip(long).
Throws:
IOException
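
Because the return semantics follow InputStream, a single call may skip fewer bytes than requested. The following is a minimal sketch, not part of Pig, of how a caller might retry until the requested count is reached or the stream ends. The class name SkipSketch and the helper skipFully are hypothetical, and the sketch works on a plain java.io.InputStream rather than on a real loader.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of the Pig API.
class SkipSketch {
    /**
     * Tries to skip n bytes, retrying because InputStream.skip may skip
     * fewer bytes than requested. Returns the number of bytes actually
     * skipped, which is less than n only if the stream ended first.
     */
    static long skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() == -1) {
                break;          // end of stream reached
            } else {
                remaining--;    // read() consumed one byte
            }
        }
        return n - remaining;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[10]);
        System.out.println(skipFully(in, 4));    // prints 4
        System.out.println(skipFully(in, 100));  // prints 6 (only 6 bytes were left)
    }
}
```

The single-byte read() is used only to tell "skipped nothing this time" apart from "end of stream", since a return value of 0 from skip does not distinguish the two.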

getPosition

long getPosition()
                 throws IOException
Get the current position in the stream.

Returns:
position in the stream.
Throws:
IOException
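
One way an implementation could keep the position available is to count bytes as they are read or skipped. The sketch below is an illustration under that assumption, not how BinStorage or PigStorage are known to do it. The class PositionTrackingStream is hypothetical and wraps a plain java.io.InputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper, not part of the Pig API.
class PositionTrackingStream extends FilterInputStream {
    private long position = 0;  // bytes consumed so far by read or skip

    PositionTrackingStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            position++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            position += n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        if (skipped > 0) {
            position += skipped;
        }
        return skipped;
    }

    // mark/reset would invalidate the count, so this sketch disables it.
    @Override
    public boolean markSupported() {
        return false;
    }

    long getPosition() {
        return position;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "abcdefghij".getBytes(StandardCharsets.US_ASCII);
        PositionTrackingStream s =
                new PositionTrackingStream(new ByteArrayInputStream(data));
        s.read();
        s.skip(3);
        System.out.println(s.getPosition());  // prints 4
    }
}
```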

getSampledTuple

Tuple getSampledTuple()
                      throws IOException
Get the next tuple from the stream starting from the current read position. The loader implementation should not assume that the current read position is at the beginning of a record: this method is called for sampling, so the current read position could be anywhere in the stream.

Returns:
the next tuple from the underlying input stream, or null if there are no more tuples to be processed.
Throws:
IOException
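
The contract above, where the read position may fall in the middle of a record, can be illustrated with a small in-memory stand-in. This is a sketch under stated assumptions and not a Pig loader: the class SampledLineReader is hypothetical, records are assumed to be newline-delimited with tab-separated fields, and a List<String> stands in for Tuple. After a skip, the reader discards the remainder of the current record and returns the next complete one.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a record-oriented loader, not part of the Pig API.
class SampledLineReader {
    private final byte[] data;
    private int pos = 0;
    private boolean aligned = true;  // position 0 is known to be a record start

    SampledLineReader(byte[] data) {
        this.data = data;
    }

    long skip(long n) {
        if (n <= 0) {
            return 0;
        }
        long k = Math.min(n, (long) data.length - pos);
        pos += (int) k;
        if (k > 0) {
            aligned = false;  // we may now be in the middle of a record
        }
        return k;
    }

    long getPosition() {
        return pos;
    }

    /** Returns the next complete record as a list of fields, or null at end of input. */
    List<String> getSampledTuple() {
        if (!aligned) {
            // Discard the (possibly partial) current record up to and including
            // the next delimiter. A skip that happens to land exactly on a record
            // start also loses that record; this sketch accepts that for sampling.
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            if (pos < data.length) {
                pos++;
            }
            aligned = true;
        }
        if (pos >= data.length) {
            return null;
        }
        int start = pos;
        while (pos < data.length && data[pos] != '\n') {
            pos++;
        }
        String line = new String(data, start, pos - start, StandardCharsets.UTF_8);
        if (pos < data.length) {
            pos++;  // consume the record delimiter
        }
        return Arrays.asList(line.split("\t", -1));
    }

    public static void main(String[] args) {
        byte[] input = "a\t1\nb\t2\nc\t3\n".getBytes(StandardCharsets.UTF_8);
        SampledLineReader r = new SampledLineReader(input);
        r.skip(5);                                // lands inside the record "b\t2"
        System.out.println(r.getSampledTuple());  // prints [c, 3]
        System.out.println(r.getSampledTuple());  // prints null
    }
}
```

A stream-based loader generally cannot look backwards to check whether it already sits on a record boundary, which is why this sketch tracks alignment with a flag instead of inspecting the previous byte.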


Copyright © ${year} The Apache Software Foundation