java.lang.Object
  org.apache.pig.impl.builtin.MergeJoinIndexer
public class MergeJoinIndexer
extends Object
implements LoadFunc
The Merge Join indexer generates an on-the-fly index that lets Merge Join run efficiently. It samples the first record from every block of the right-side input and returns a tuple in the format (key0, key1, ..., fileName, offset). These tuples are then sorted before being written out to the index file on HDFS.
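The indexing idea above (one entry per block, built from that block's first record, then sorted by key) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Pig's implementation: `MergeJoinIndexSketch`, `BlockSample`, `IndexEntry`, and `startEntryFor` are hypothetical names, the join key is assumed to be a single `String`, and the lookup rule in `startEntryFor` is one plausible way a sorted index could be used to pick a starting block.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class MergeJoinIndexSketch {
    // Hypothetical: the first record of one block of the right-side input,
    // already reduced to a single String join key (real keys may be composite).
    record BlockSample(String key, String fileName, long blockOffset) {}

    // Hypothetical: one index entry, shaped like (key, fileName, offset).
    record IndexEntry(String key, String fileName, long offset) {}

    // Build one entry per sampled block, then sort before "writing out" the index.
    static List<IndexEntry> buildIndex(List<BlockSample> firstRecordPerBlock) {
        List<IndexEntry> index = new ArrayList<>();
        for (BlockSample s : firstRecordPerBlock) {
            index.add(new IndexEntry(s.key(), s.fileName(), s.blockOffset()));
        }
        // Sort by key, breaking ties by file name and then offset.
        index.sort(Comparator.comparing(IndexEntry::key)
                .thenComparing(IndexEntry::fileName)
                .thenComparingLong(IndexEntry::offset));
        return index;
    }

    // Hypothetical lookup: start at the last block whose first key is strictly
    // smaller than the probe, because records equal to the probe may sit at the
    // end of that block. Falls back to the first block. Assumes a non-empty index.
    static IndexEntry startEntryFor(List<IndexEntry> sortedIndex, String probeKey) {
        IndexEntry start = sortedIndex.get(0);
        for (IndexEntry e : sortedIndex) {
            if (e.key().compareTo(probeKey) < 0) {
                start = e;
            } else {
                break; // index is sorted, so no later entry can qualify
            }
        }
        return start;
    }
}
```

Sorting is what makes the lookup possible: once entries are in key order, a reader can stop scanning at the first entry whose key is not smaller than the probe (a binary search would work equally well).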
Nested Class Summary
| Nested classes/interfaces inherited from interface org.apache.pig.LoadFunc |
|---|
| LoadFunc.RequiredField, LoadFunc.RequiredFieldList, LoadFunc.RequiredFieldResponse |
Constructor Summary
| Constructor |
|---|
| MergeJoinIndexer(String funcSpec, String innerPlan, String serializedPhyPlan) |
Method Summary
| Return type | Method | Description |
|---|---|---|
| void | bindTo(String fileName, BufferedPositionedInputStream is, long offset, long end) | Specifies a portion of an InputStream to read tuples. |
| DataBag | bytesToBag(byte[] b) | Cast data from bytes to bag value. |
| String | bytesToCharArray(byte[] b) | Cast data from bytes to chararray value. |
| Double | bytesToDouble(byte[] b) | Cast data from bytes to double value. |
| Float | bytesToFloat(byte[] b) | Cast data from bytes to float value. |
| Integer | bytesToInteger(byte[] b) | Cast data from bytes to integer value. |
| Long | bytesToLong(byte[] b) | Cast data from bytes to long value. |
| Map<String,Object> | bytesToMap(byte[] b) | Cast data from bytes to map value. |
| Tuple | bytesToTuple(byte[] b) | Cast data from bytes to tuple value. |
| Schema | determineSchema(String fileName, ExecType execType, DataStorage storage) | Find the schema from the loader. |
| LoadFunc.RequiredFieldResponse | fieldsToRead(LoadFunc.RequiredFieldList requiredFieldList) | Indicate to the loader fields that will be needed. |
| Tuple | getNext() | Retrieves the next tuple to be processed. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail
public MergeJoinIndexer(String funcSpec, String innerPlan, String serializedPhyPlan) throws ExecException
Parameters:
funcSpec - Loader specification.
innerPlan - Serialized version of the LR (local rearrange) plan. The index file should hold only the keys, not the whole tuple, so the LR and its plan are needed to extract the keys from each sampled tuple.
serializedPhyPlan - Serialized physical plan on the right side.
Throws:
ExecException
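The innerPlan description explains why only keys reach the index: each sampled tuple is reduced to its join keys and then tagged with its location. The sketch below stands in for that step with a plain column projection; it is an assumption for illustration only, since the real LR plan can compute keys from expressions rather than just selecting columns. `KeyProjectionSketch` and `toIndexTuple` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class KeyProjectionSketch {
    // Hypothetical stand-in for the LR plan: keep only the key columns of a
    // sampled record, then append fileName and offset, giving the shape
    // (key0, key1, ..., fileName, offset).
    static List<Object> toIndexTuple(List<Object> sampledRecord, int[] keyColumns,
                                     String fileName, long offset) {
        List<Object> indexTuple = new ArrayList<>(keyColumns.length + 2);
        for (int col : keyColumns) {
            indexTuple.add(sampledRecord.get(col)); // non-key fields are dropped
        }
        indexTuple.add(fileName);
        indexTuple.add(offset); // boxed to Long
        return indexTuple;
    }
}
```

For a record ("alice", 42, "NYC", 3.5) with key columns {2, 0}, the index tuple is ("NYC", "alice", fileName, offset); the values 42 and 3.5 never reach the index file.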
Method Detail
public void bindTo(String fileName, BufferedPositionedInputStream is, long offset, long end) throws IOException
Description copied from interface: LoadFunc
Specifies a portion of an InputStream to read tuples.
A common way of handling slices in the middle of records is to start at the given offset and, if the offset is not zero, skip to the end of the first record (which may be a partial record) before reading tuples. Reading continues until a tuple has been read that ends at an offset past the ending offset.
The load function should not do any buffering on the input stream. Buffering will cause the offsets returned by is.getPos() to be unreliable.
Specified by:
bindTo in interface LoadFunc
Parameters:
fileName - the name of the file to be read.
is - the stream representing the file to be processed, which can also provide its position.
offset - the offset at which to start reading tuples.
end - the ending offset for reading.
Throws:
IOException
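The slice-handling convention described for bindTo (skip the possibly partial first record unless the offset is zero, then keep reading until a record ends past the ending offset) can be sketched as follows. This is a minimal illustration, assuming newline-delimited records held in a byte array; it sidesteps the stream and `is.getPos()` entirely, and `SliceReaderSketch` and `readSlice` are hypothetical names, not part of Pig.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class SliceReaderSketch {
    // Returns the records that belong to the slice [offset, end) of a
    // newline-delimited byte array (an assumed stand-in for a real format).
    static List<String> readSlice(byte[] data, long offset, long end) {
        List<String> records = new ArrayList<>();
        int pos = (int) offset;
        if (offset != 0) {
            // Mid-file start: the first bytes belong to a record that the
            // previous slice reads, so skip past the first newline.
            while (pos < data.length && data[pos] != '\n') pos++;
            pos++; // step over the newline itself
        }
        // Read whole records that start at or before 'end'. The last one may
        // end past 'end'; the next slice skips it, so nothing is read twice.
        while (pos <= end && pos < data.length) {
            int start = pos;
            while (pos < data.length && data[pos] != '\n') pos++;
            records.add(new String(data, start, pos - start, StandardCharsets.UTF_8));
            pos++; // skip the newline
        }
        return records;
    }
}
```

Splitting "aa\nbb\ncc\n" into slices [0, 4) and [4, 9) yields ["aa", "bb"] and ["cc"]: the record "bb" straddles offset 4, so the first slice reads it and the second slice skips it.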
public Tuple getNext() throws IOException
Description copied from interface: LoadFunc
Retrieves the next tuple to be processed.
Specified by:
getNext in interface LoadFunc
Throws:
IOException
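Each of the bytesTo* methods documented next converts a raw byte array into a typed value and throws IOException when the bytes cannot be cast. This page does not say how MergeJoinIndexer itself decodes bytes, so the sketch below only illustrates that contract under an assumption: the bytes hold the UTF-8 text form of a number. `TextCaster` is a hypothetical class, and treating null input as a null value is likewise an assumption.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class TextCaster {
    // Assumes b is UTF-8 text such as "42"; returns null for null input (assumption).
    static Integer bytesToInteger(byte[] b) throws IOException {
        if (b == null) return null;
        String s = new String(b, StandardCharsets.UTF_8).trim();
        try {
            return Integer.valueOf(s);
        } catch (NumberFormatException e) {
            // Report a failed cast through the documented checked exception.
            throw new IOException("cannot cast '" + s + "' to Integer", e);
        }
    }

    // Same contract for long values.
    static Long bytesToLong(byte[] b) throws IOException {
        if (b == null) return null;
        String s = new String(b, StandardCharsets.UTF_8).trim();
        try {
            return Long.valueOf(s);
        } catch (NumberFormatException e) {
            throw new IOException("cannot cast '" + s + "' to Long", e);
        }
    }
}
```

Wrapping NumberFormatException in IOException keeps callers on the single checked exception that the interface documents for every cast.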
public Integer bytesToInteger(byte[] b) throws IOException
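Description copied from interface: LoadFunc
Cast data from bytes to integer value.
Specified by:
bytesToInteger in interface LoadFunc
Parameters:
b - byte array to be cast.
Throws:
IOException - if the value cannot be cast.
public Long bytesToLong(byte[] b) throws IOException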
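Description copied from interface: LoadFunc
Cast data from bytes to long value.
Specified by:
bytesToLong in interface LoadFunc
Parameters:
b - byte array to be cast.
Throws:
IOException - if the value cannot be cast.
public Float bytesToFloat(byte[] b) throws IOException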
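Description copied from interface: LoadFunc
Cast data from bytes to float value.
Specified by:
bytesToFloat in interface LoadFunc
Parameters:
b - byte array to be cast.
Throws:
IOException - if the value cannot be cast.
public Double bytesToDouble(byte[] b) throws IOException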
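Description copied from interface: LoadFunc
Cast data from bytes to double value.
Specified by:
bytesToDouble in interface LoadFunc
Parameters:
b - byte array to be cast.
Throws:
IOException - if the value cannot be cast.
public String bytesToCharArray(byte[] b) throws IOException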
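Description copied from interface: LoadFunc
Cast data from bytes to chararray value.
Specified by:
bytesToCharArray in interface LoadFunc
Parameters:
b - byte array to be cast.
Throws:
IOException - if the value cannot be cast.
public Map<String,Object> bytesToMap(byte[] b) throws IOException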
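Description copied from interface: LoadFunc
Cast data from bytes to map value.
Specified by:
bytesToMap in interface LoadFunc
Parameters:
b - byte array to be cast.
Throws:
IOException - if the value cannot be cast.
public Tuple bytesToTuple(byte[] b) throws IOException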
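Description copied from interface: LoadFunc
Cast data from bytes to tuple value.
Specified by:
bytesToTuple in interface LoadFunc
Parameters:
b - byte array to be cast.
Throws:
IOException - if the value cannot be cast.
public DataBag bytesToBag(byte[] b) throws IOException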
Description copied from interface: LoadFunc
Cast data from bytes to bag value.
Specified by:
bytesToBag in interface LoadFunc
Parameters:
b - byte array to be cast.
Throws:
IOException - if the value cannot be cast.
public LoadFunc.RequiredFieldResponse fieldsToRead(LoadFunc.RequiredFieldList requiredFieldList) throws FrontendException
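Description copied from interface: LoadFunc
Indicate to the loader fields that will be needed.
Specified by:
fieldsToRead in interface LoadFunc
Parameters:
requiredFieldList - RequiredFieldList indicating which columns will be needed.
Throws:
FrontendException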
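fieldsToRead lets the front end tell a loader which columns a script actually needs, so the loader can avoid materializing the rest. The sketch below shows that general projection-pushdown pattern with plain column positions and a boolean "request honored" result; it does not use Pig's RequiredFieldList or RequiredFieldResponse types, and `ProjectingLoaderSketch`, `fieldsToRead(int[])`, and `project` are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

class ProjectingLoaderSketch {
    private int[] requiredColumns; // null means "read every column"

    // Hypothetical analogue of fieldsToRead: remember the requested column
    // positions and report that the request will be honored.
    boolean fieldsToRead(int[] columns) {
        this.requiredColumns = columns.clone();
        return true;
    }

    // Apply the remembered projection to one parsed record.
    List<Object> project(List<Object> record) {
        if (requiredColumns == null) return record;
        List<Object> out = new ArrayList<>(requiredColumns.length);
        for (int c : requiredColumns) out.add(record.get(c));
        return out;
    }
}
```

Returning a flag rather than silently dropping columns lets the caller fall back to its own projection when a loader cannot honor the request.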
public Schema determineSchema(String fileName, ExecType execType, DataStorage storage) throws IOException
Description copied from interface: LoadFunc
Find the schema from the loader.
Specified by:
determineSchema in interface LoadFunc
Parameters:
fileName - Name of the file to be read (the same as the file name in the "load" statement of the script).
execType - execution mode of the Pig script: one of ExecType.LOCAL or ExecType.MAPREDUCE.
storage - the DataStorage object corresponding to the execType.
Throws:
IOException