java.lang.Object
  org.apache.pig.impl.builtin.SampleLoader
public abstract class SampleLoader
Abstract class that specifies the interface for sample loaders
Field Summary | |
---|---|
protected SamplableLoader | loader |
protected int | numSamples |
protected long | skipInterval |
Constructor Summary |
---|
SampleLoader(String funcSpec) |
Method Summary | | |
---|---|---|
void | bindTo(String fileName, BufferedPositionedInputStream is, long offset, long end) | Specifies a portion of an InputStream to read tuples. |
DataBag | bytesToBag(byte[] b) | Cast data from bytes to bag value. |
String | bytesToCharArray(byte[] b) | Cast data from bytes to chararray value. |
Double | bytesToDouble(byte[] b) | Cast data from bytes to double value. |
Float | bytesToFloat(byte[] b) | Cast data from bytes to float value. |
Integer | bytesToInteger(byte[] b) | Cast data from bytes to integer value. |
Long | bytesToLong(byte[] b) | Cast data from bytes to long value. |
Map<String,Object> | bytesToMap(byte[] b) | Cast data from bytes to map value. |
Tuple | bytesToTuple(byte[] b) | Cast data from bytes to tuple value. |
void | computeSamples(ArrayList<Pair<FileSpec,Boolean>> inputs, PigContext pc) | |
Schema | determineSchema(String fileName, ExecType execType, DataStorage storage) | Find the schema from the loader. |
void | fieldsToRead(Schema schema) | Indicate to the loader fields that will be needed. |
Tuple | getNext() | Retrieves the next tuple to be processed. |
int | getNumSamples() | |
void | setNumSamples(int n) | |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected int numSamples
protected long skipInterval
protected SamplableLoader loader
Constructor Detail |
---|
public SampleLoader(String funcSpec)
Method Detail |
---|
public void setNumSamples(int n)
public int getNumSamples()
public void bindTo(String fileName, BufferedPositionedInputStream is, long offset, long end) throws IOException
Description copied from interface: LoadFunc
Specifies a portion of an InputStream to read tuples.
A common way of handling slices that start in the middle of a record is to begin at the given offset and, if the offset is not zero, skip to the end of the first record (which may be a partial record) before reading tuples. Reading continues until a tuple has been read that ends at an offset past the ending offset.
The load function should not do any buffering on the input stream. Buffering makes the offsets returned by is.getPos() unreliable.
Specified by:
bindTo in interface LoadFunc
Parameters:
fileName - the name of the file to be read
is - the stream representing the file to be processed, which can also provide its position
offset - the offset at which to start reading tuples
end - the ending offset for reading
Throws:
IOException
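The slice-handling convention described for bindTo can be sketched in plain Java. This is only an illustration under stated assumptions, not Pig's implementation: the class SliceReaderSketch and its readSlice helper are hypothetical, records are assumed to be newline-delimited, and a byte array with an explicit position stands in for BufferedPositionedInputStream so that no buffering interferes with the offsets.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Pig): reads newline-delimited records from one slice. */
final class SliceReaderSketch {
    private final byte[] data; // stands in for the positioned input stream
    private final long end;    // ending offset of this slice
    private int pos;           // current position, tracked directly (no buffering)

    SliceReaderSketch(byte[] data, long offset, long end) {
        this.data = data;
        this.end = end;
        this.pos = (int) offset;
        if (offset != 0) {
            // The first record may be partial: skip to its end. The previous
            // slice reads it, because it keeps reading past its own end offset.
            readRecord();
        }
    }

    /** Reads up to (and consumes) the next newline, returning the record text. */
    private String readRecord() {
        int start = pos;
        while (pos < data.length && data[pos] != '\n') {
            pos++;
        }
        String record = new String(data, start, pos - start, StandardCharsets.UTF_8);
        if (pos < data.length) {
            pos++; // consume the newline itself
        }
        return record;
    }

    /** Returns the next record, or null once a record ending past 'end' has been read. */
    String getNext() {
        if (pos >= data.length || pos > end) {
            return null;
        }
        return readRecord();
    }

    /** Convenience wrapper: collects every record that belongs to the slice. */
    static List<String> readSlice(byte[] data, long offset, long end) {
        SliceReaderSketch reader = new SliceReaderSketch(data, offset, end);
        List<String> records = new ArrayList<>();
        for (String r = reader.getNext(); r != null; r = reader.getNext()) {
            records.add(r);
        }
        return records;
    }
}
```

With this convention, two adjacent slices such as (0, 6) and (6, 15) over the same file each return whole records, and every record is returned by exactly one slice, even though offset 6 falls in the middle of a record.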
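public DataBag bytesToBag(byte[] b) throws IOException
Description copied from interface: LoadFunc
Cast data from bytes to bag value.
Specified by:
bytesToBag in interface LoadFunc
Parameters:
b - byte array to be cast.
Throws:
IOException - if the value cannot be cast.

public String bytesToCharArray(byte[] b) throws IOException
Description copied from interface: LoadFunc
Cast data from bytes to chararray value.
Specified by:
bytesToCharArray in interface LoadFunc
Parameters:
b - byte array to be cast.
Throws:
IOException - if the value cannot be cast.

public Double bytesToDouble(byte[] b) throws IOException
Description copied from interface: LoadFunc
Cast data from bytes to double value.
Specified by:
bytesToDouble in interface LoadFunc
Parameters:
b - byte array to be cast.
Throws:
IOException - if the value cannot be cast.

public Float bytesToFloat(byte[] b) throws IOException
Description copied from interface: LoadFunc
Cast data from bytes to float value.
Specified by:
bytesToFloat in interface LoadFunc
Parameters:
b - byte array to be cast.
Throws:
IOException - if the value cannot be cast.

public Integer bytesToInteger(byte[] b) throws IOException
Description copied from interface: LoadFunc
Cast data from bytes to integer value.
Specified by:
bytesToInteger in interface LoadFunc
Parameters:
b - byte array to be cast.
Throws:
IOException - if the value cannot be cast.

public Long bytesToLong(byte[] b) throws IOException
Description copied from interface: LoadFunc
Cast data from bytes to long value.
Specified by:
bytesToLong in interface LoadFunc
Parameters:
b - byte array to be cast.
Throws:
IOException - if the value cannot be cast.

public Map<String,Object> bytesToMap(byte[] b) throws IOException
Description copied from interface: LoadFunc
Cast data from bytes to map value.
Specified by:
bytesToMap in interface LoadFunc
Parameters:
b - byte array to be cast.
Throws:
IOException - if the value cannot be cast.

public Tuple bytesToTuple(byte[] b) throws IOException
Description copied from interface: LoadFunc
Cast data from bytes to tuple value.
Specified by:
bytesToTuple in interface LoadFunc
Parameters:
b - byte array to be cast.
Throws:
IOException - if the value cannot be cast.

public Schema determineSchema(String fileName, ExecType execType, DataStorage storage) throws IOException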
Description copied from interface: LoadFunc
Find the schema from the loader.
Specified by:
determineSchema in interface LoadFunc
Parameters:
fileName - name of the file to be read (this will be the same as the filename in the "load" statement of the script)
execType - execution mode of the Pig script, one of ExecType.LOCAL or ExecType.MAPREDUCE
storage - the DataStorage object corresponding to the execType
Throws:
IOException
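The bytesTo* cast methods listed earlier in this section all follow the same contract: take a byte array and return a typed value, throwing IOException if the value cannot be cast. The sketch below illustrates that contract for the scalar types only. It is not the code this class delegates to: the class name Utf8CastSketch is hypothetical, and it assumes the bytes hold a UTF-8 text representation of the value and that a null array maps to a null value.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical caster (not Pig's implementation): assumes values arrive as UTF-8 text. */
final class Utf8CastSketch {

    /** Decodes the bytes as UTF-8 text; a null array is passed through as null. */
    static String bytesToCharArray(byte[] b) {
        return b == null ? null : new String(b, StandardCharsets.UTF_8);
    }

    /** Parses the text as an int, reporting a failed cast as IOException. */
    static Integer bytesToInteger(byte[] b) throws IOException {
        if (b == null) {
            return null;
        }
        try {
            return Integer.valueOf(bytesToCharArray(b).trim());
        } catch (NumberFormatException e) {
            throw new IOException("Cannot cast bytes to integer", e);
        }
    }

    /** Parses the text as a long, reporting a failed cast as IOException. */
    static Long bytesToLong(byte[] b) throws IOException {
        if (b == null) {
            return null;
        }
        try {
            return Long.valueOf(bytesToCharArray(b).trim());
        } catch (NumberFormatException e) {
            throw new IOException("Cannot cast bytes to long", e);
        }
    }

    /** Parses the text as a double, reporting a failed cast as IOException. */
    static Double bytesToDouble(byte[] b) throws IOException {
        if (b == null) {
            return null;
        }
        try {
            return Double.valueOf(bytesToCharArray(b).trim());
        } catch (NumberFormatException e) {
            throw new IOException("Cannot cast bytes to double", e);
        }
    }
}
```

Wrapping NumberFormatException in IOException keeps callers on the single checked exception that the method signatures declare.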
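public void fieldsToRead(Schema schema)
Description copied from interface: LoadFunc
Indicate to the loader fields that will be needed.
Specified by:
fieldsToRead in interface LoadFunc
Parameters:
schema - Schema indicating which columns will be needed.

public Tuple getNext() throws IOException
Description copied from interface: LoadFunc
Retrieves the next tuple to be processed.
Specified by:
getNext in interface LoadFunc
Throws:
IOException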
public void computeSamples(ArrayList<Pair<FileSpec,Boolean>> inputs, PigContext pc) throws ExecException
Throws:
ExecException