public interface LoadFunc
This interface is used to implement functions that parse records from a dataset. It also includes functions that cast raw byte data into the various data types. The cast functions are exposed on the interface because loaders should, whenever possible, delay casting of data types until the last possible moment (i.e., not cast on load). Other sections of the code can then call back to the loader to perform the cast.
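To illustrate the delayed-cast idea, here is a minimal sketch rather than Pig's own implementation. It assumes fields are stored as UTF-8 text and that a null input casts to null; `LazyCastSketch` is a hypothetical class that only mirrors the `bytesToInteger` signature from this interface.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a loader's cast methods; not part of Pig. */
final class LazyCastSketch {

    /**
     * Casts UTF-8 text bytes such as "42" to an Integer.
     * Assumption: a null byte array casts to null.
     */
    static Integer bytesToInteger(byte[] b) throws IOException {
        if (b == null) {
            return null;
        }
        String text = new String(b, StandardCharsets.UTF_8).trim();
        try {
            return Integer.valueOf(text);
        } catch (NumberFormatException e) {
            // Report the failure as IOException, as the interface's throws clause requires.
            throw new IOException("Cannot cast '" + text + "' to integer", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // At load time the field stays as raw bytes; nothing is parsed yet.
        byte[] rawField = "42".getBytes(StandardCharsets.UTF_8);

        // Later, only when an integer is actually required, call back for the cast.
        Integer value = bytesToInteger(rawField);
        System.out.println(value);
    }
}
```

Keeping the field as bytes until the cast is requested means fields that are never used as integers are never parsed.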
Method Summary | | |
---|---|---|
void | bindTo(String fileName, BufferedPositionedInputStream is, long offset, long end) | Specifies a portion of an InputStream to read tuples. |
DataBag | bytesToBag(byte[] b) | Cast data from bytes to bag value. |
String | bytesToCharArray(byte[] b) | Cast data from bytes to chararray value. |
Double | bytesToDouble(byte[] b) | Cast data from bytes to double value. |
Float | bytesToFloat(byte[] b) | Cast data from bytes to float value. |
Integer | bytesToInteger(byte[] b) | Cast data from bytes to integer value. |
Long | bytesToLong(byte[] b) | Cast data from bytes to long value. |
Map<Object,Object> | bytesToMap(byte[] b) | Cast data from bytes to map value. |
Tuple | bytesToTuple(byte[] b) | Cast data from bytes to tuple value. |
Schema | determineSchema(String fileName, ExecType execType, DataStorage storage) | Find the schema from the loader. |
void | fieldsToRead(Schema schema) | Indicate to the loader fields that will be needed. |
Tuple | getNext() | Retrieves the next tuple to be processed. |
Method Detail |
---|
void bindTo(String fileName, BufferedPositionedInputStream is, long offset, long end) throws IOException
Specifies a portion of an InputStream to read tuples.
A common way of handling slices that begin or end in the middle of a record is to start at the given offset and, if the offset is not zero, skip to the end of the first record (which may be a partial record) before reading tuples. Reading continues until a tuple has been read that ends at an offset past the ending offset.
The load function should not do any buffering on the input stream. Buffering will cause the offsets returned by is.getPos() to be unreliable.
Parameters:
fileName - the name of the file to be read.
is - the stream representing the file to be processed, and which can also provide its position.
offset - the offset to start reading tuples.
end - the ending offset for reading.
Throws:
IOException
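The slice-handling approach described for bindTo can be sketched as follows. This is an illustration under stated assumptions, not Pig's implementation: records are assumed to be newline-delimited, a plain `InputStream` with a hand-maintained `pos` counter stands in for `BufferedPositionedInputStream` and its reported position, and `SliceReaderSketch`, `next`, and `readSlice` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical slice reader for newline-delimited records; not part of Pig. */
final class SliceReaderSketch {
    private final InputStream in;
    private final long end;
    private long pos; // bytes consumed so far; stands in for the stream's reported position

    SliceReaderSketch(InputStream in, long offset, long end) throws IOException {
        this.in = in;
        this.end = end;
        // Move to the starting offset of this slice.
        long remaining = offset;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                break; // stream is shorter than the requested offset
            }
            remaining -= skipped;
        }
        this.pos = offset - remaining;
        // A non-zero offset may land mid-record, so discard up to the next
        // record boundary; the previous slice is responsible for that record.
        if (offset != 0) {
            readRecord();
        }
    }

    /** Returns the next record, or null once the slice is exhausted. */
    String next() throws IOException {
        if (pos > end) {
            return null; // the previous read already went past the ending offset
        }
        return readRecord();
    }

    /** Reads one record byte by byte (no buffering) so pos stays exact. */
    private String readRecord() throws IOException {
        ByteArrayOutputStream record = new ByteArrayOutputStream();
        boolean readAnything = false;
        int c;
        while ((c = in.read()) != -1) {
            pos++;
            readAnything = true;
            if (c == '\n') {
                break;
            }
            record.write(c);
        }
        return readAnything ? new String(record.toByteArray(), StandardCharsets.UTF_8) : null;
    }

    /** Convenience helper: reads every record belonging to one slice of data. */
    static List<String> readSlice(byte[] data, long offset, long end) throws IOException {
        SliceReaderSketch reader = new SliceReaderSketch(new ByteArrayInputStream(data), offset, end);
        List<String> records = new ArrayList<>();
        String record;
        while ((record = reader.next()) != null) {
            records.add(record);
        }
        return records;
    }
}
```

For the 12-byte input `a,1\nb,2\nc,3\n` split at byte 5, `readSlice(data, 0, 5)` yields `a,1` and `b,2`, while `readSlice(data, 5, 12)` yields only `c,3`. The first slice finishes the record that straddles the boundary, and the second slice skips it, so each record is read exactly once.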
Tuple getNext() throws IOException
Retrieves the next tuple to be processed.
Throws:
IOException
Integer bytesToInteger(byte[] b) throws IOException
Cast data from bytes to integer value.
Parameters:
b - byte array to be cast.
Throws:
IOException - if the value cannot be cast.
Long bytesToLong(byte[] b) throws IOException
Cast data from bytes to long value.
Parameters:
b - byte array to be cast.
Throws:
IOException - if the value cannot be cast.
Float bytesToFloat(byte[] b) throws IOException
Cast data from bytes to float value.
Parameters:
b - byte array to be cast.
Throws:
IOException - if the value cannot be cast.
Double bytesToDouble(byte[] b) throws IOException
Cast data from bytes to double value.
Parameters:
b - byte array to be cast.
Throws:
IOException - if the value cannot be cast.
String bytesToCharArray(byte[] b) throws IOException
Cast data from bytes to chararray value.
Parameters:
b - byte array to be cast.
Throws:
IOException - if the value cannot be cast.
Map<Object,Object> bytesToMap(byte[] b) throws IOException
Cast data from bytes to map value.
Parameters:
b - byte array to be cast.
Throws:
IOException - if the value cannot be cast.
Tuple bytesToTuple(byte[] b) throws IOException
Cast data from bytes to tuple value.
Parameters:
b - byte array to be cast.
Throws:
IOException - if the value cannot be cast.
DataBag bytesToBag(byte[] b) throws IOException
Cast data from bytes to bag value.
Parameters:
b - byte array to be cast.
Throws:
IOException - if the value cannot be cast.
void fieldsToRead(Schema schema)
Indicate to the loader fields that will be needed.
Parameters:
schema - Schema indicating which columns will be needed.
Schema determineSchema(String fileName, ExecType execType, DataStorage storage) throws IOException
Find the schema from the loader.
Parameters:
fileName - name of the file to be read (this will be the same as the file name in the "load" statement of the script).
execType - execution mode of the Pig script; one of ExecType.LOCAL or ExecType.MAPREDUCE.
storage - the DataStorage object corresponding to the execType.
Throws:
IOException