org.apache.pig.builtin
Class BinStorage

java.lang.Object
  extended by org.apache.pig.builtin.BinStorage
All Implemented Interfaces:
LoadFunc, ReversibleLoadStoreFunc, StoreFunc
Direct Known Subclasses:
RandomSampleLoader

public class BinStorage
extends Object
implements ReversibleLoadStoreFunc


Field Summary
protected  long end
           
protected  BufferedPositionedInputStream in
           
static byte RECORD_1
           
static byte RECORD_2
           
static byte RECORD_3
           
 
Constructor Summary
BinStorage()
          Simple binary nested reader format
 
Method Summary
 void bindTo(OutputStream os)
          Specifies the OutputStream to write to.
 void bindTo(String fileName, BufferedPositionedInputStream in, long offset, long end)
          Specifies a portion of an InputStream to read tuples.
 DataBag bytesToBag(byte[] b)
          Cast data from bytes to bag value.
 String bytesToCharArray(byte[] b)
          Cast data from bytes to chararray value.
 Double bytesToDouble(byte[] b)
          Cast data from bytes to double value.
 Float bytesToFloat(byte[] b)
          Cast data from bytes to float value.
 Integer bytesToInteger(byte[] b)
          Cast data from bytes to integer value.
 Long bytesToLong(byte[] b)
          Cast data from bytes to long value.
 Map<Object,Object> bytesToMap(byte[] b)
          Cast data from bytes to map value.
 Tuple bytesToTuple(byte[] b)
          Cast data from bytes to tuple value.
 Schema determineSchema(String fileName, ExecType execType, DataStorage storage)
          Find the schema from the loader.
 boolean equals(Object obj)
           
 void fieldsToRead(Schema schema)
          Indicate to the loader fields that will be needed.
 void finish()
          Do any kind of post processing now that the last tuple has been stored.
 Tuple getNext()
          Retrieves the next tuple to be processed.
 Class getStorePreparationClass()
          Specify a backend specific class to use to prepare for storing output.
 void putNext(Tuple t)
          Write a tuple to the output stream to which this instance was previously bound.
 byte[] toBytes(DataBag bag)
           
 byte[] toBytes(Double d)
           
 byte[] toBytes(Float f)
           
 byte[] toBytes(Integer i)
           
 byte[] toBytes(Long l)
           
 byte[] toBytes(Map<Object,Object> m)
           
 byte[] toBytes(String s)
           
 byte[] toBytes(Tuple t)
           
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

RECORD_1

public static final byte RECORD_1
See Also:
Constant Field Values

RECORD_2

public static final byte RECORD_2
See Also:
Constant Field Values

RECORD_3

public static final byte RECORD_3
See Also:
Constant Field Values

in

protected BufferedPositionedInputStream in

end

protected long end

Constructor Detail

BinStorage

public BinStorage()
Simple binary nested reader format

Method Detail

getNext

public Tuple getNext()
              throws IOException
Description copied from interface: LoadFunc
Retrieves the next tuple to be processed.

Specified by:
getNext in interface LoadFunc
Returns:
the next tuple to be processed or null if there are no more tuples to be processed.
Throws:
IOException
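
Because getNext() signals the end of input by returning null, a caller typically drains a loader in a loop. The following is a minimal sketch of that loop. TupleSource and fromRows are hypothetical stand-ins (not Pig classes) so that the example runs without Pig on the classpath; a real caller would use a bound BinStorage instance and Tuple instead.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the getNext() contract: null means end of input.
interface TupleSource {
    List<Object> getNext() throws IOException;
}

final class GetNextLoopSketch {
    // Collects every tuple until the source returns null.
    static List<List<Object>> drain(TupleSource source) throws IOException {
        List<List<Object>> tuples = new ArrayList<>();
        List<Object> t;
        while ((t = source.getNext()) != null) {
            tuples.add(t);
        }
        return tuples;
    }

    // Builds an in-memory source for demonstration (an assumption, not a Pig API).
    static TupleSource fromRows(List<List<Object>> rows) {
        Iterator<List<Object>> it = rows.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) throws IOException {
        List<List<Object>> rows = Arrays.asList(
                Arrays.<Object>asList("alice", 1),
                Arrays.<Object>asList("bob", 2));
        System.out.println(drain(fromRows(rows)).size()); // prints 2
    }
}
```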

bindTo

public void bindTo(String fileName,
                   BufferedPositionedInputStream in,
                   long offset,
                   long end)
            throws IOException
Description copied from interface: LoadFunc
Specifies a portion of an InputStream to read tuples from. Because the starting and ending offsets may not fall on record boundaries, the implementor must determine the actual starting and ending offsets in such a way that an arbitrarily sliced file is processed in its entirety.

A common way of handling slices in the middle of records is to start at the given offset and, if the offset is not zero, skip to the end of the first record (which may be a partial record) before reading tuples. Reading continues until a tuple has been read that ends at an offset past the ending offset.

The load function should not do any buffering on the input stream. Buffering will cause the offsets returned by in.getPos() to be unreliable.

Specified by:
bindTo in interface LoadFunc
Parameters:
fileName - the name of the file to be read
in - the stream representing the file to be processed, and which can also provide its position.
offset - the offset to start reading tuples.
end - the ending offset for reading.
Throws:
IOException
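
The slice-handling rule described for bindTo(String, BufferedPositionedInputStream, long, long) can be sketched as follows. This is an illustration under stated assumptions, not BinStorage's implementation: records are assumed to be newline-delimited text held in a byte array, and SliceReaderSketch and readSlice are hypothetical names. BinStorage's own record framing is not described on this page and is not reproduced here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class SliceReaderSketch {
    // Assumed record delimiter for this sketch only.
    static final byte DELIMITER = '\n';

    // Reads the records that belong to the slice [offset, end] of data.
    static List<String> readSlice(byte[] data, long offset, long end) {
        int pos = (int) offset;
        if (offset != 0) {
            // Skip the first (possibly partial) record; the previous slice reads it.
            while (pos < data.length && data[pos] != DELIMITER) {
                pos++;
            }
            pos++; // step over the delimiter
        }
        List<String> records = new ArrayList<>();
        // Keep reading while a record starts at or before the ending offset,
        // so the last record read may end past it.
        while (pos < data.length && pos <= end) {
            int start = pos;
            while (pos < data.length && data[pos] != DELIMITER) {
                pos++;
            }
            records.add(new String(data, start, pos - start, StandardCharsets.UTF_8));
            pos++; // step over the delimiter
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "aa\nbb\ncc\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readSlice(data, 0, 4));           // prints [aa, bb]
        System.out.println(readSlice(data, 4, data.length)); // prints [cc]
    }
}
```

With this rule, two adjacent slices that split the file at any interior offset together return each record exactly once, even when the split falls in the middle of a record.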

bindTo

public void bindTo(OutputStream os)
            throws IOException
Description copied from interface: StoreFunc
Specifies the OutputStream to write to. This will be called before putNext(Tuple) is invoked.

Specified by:
bindTo in interface StoreFunc
Parameters:
os - The stream to write tuples to.
Throws:
IOException

finish

public void finish()
            throws IOException
Description copied from interface: StoreFunc
Do any kind of post processing now that the last tuple has been stored. DO NOT CLOSE THE STREAM in this method. The stream will be closed later outside of this function.

Specified by:
finish in interface StoreFunc
Throws:
IOException

putNext

public void putNext(Tuple t)
             throws IOException
Description copied from interface: StoreFunc
Write a tuple to the output stream to which this instance was previously bound.

Specified by:
putNext in interface StoreFunc
Parameters:
t - the tuple to store.
Throws:
IOException
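
The store-side call order described above (bind the stream, write each tuple, finish without closing) can be sketched as follows. StoreLifecycleSketch is a hypothetical stand-in, not BinStorage: it writes plain strings with DataOutputStream.writeUTF, which is an illustrative encoding only and says nothing about BinStorage's on-disk format.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical writer that follows the bindTo / putNext / finish contract.
final class StoreLifecycleSketch {
    private DataOutputStream out;

    void bindTo(OutputStream os) throws IOException {
        out = new DataOutputStream(os);
    }

    // Writes one record as a length-prefixed string (illustrative encoding only).
    void putNext(String record) throws IOException {
        out.writeUTF(record);
    }

    // Flushes pending bytes but deliberately leaves the stream open.
    void finish() throws IOException {
        out.flush();
    }

    // Runs the whole lifecycle against an in-memory stream and returns the bytes written.
    static byte[] writeAll(String... records) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            StoreLifecycleSketch store = new StoreLifecycleSketch();
            store.bindTo(sink);       // 1. bind before any tuple is written
            for (String r : records) {
                store.putNext(r);     // 2. write each tuple
            }
            store.finish();           // 3. post-process; the stream stays open
            sink.close();             // 4. the caller, not the store, closes the stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(writeAll("a", "bc").length); // prints 7
    }
}
```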

bytesToBag

public DataBag bytesToBag(byte[] b)
Description copied from interface: LoadFunc
Cast data from bytes to bag value.

Specified by:
bytesToBag in interface LoadFunc
Parameters:
b - byte array to be cast.
Returns:
Bag value.

bytesToCharArray

public String bytesToCharArray(byte[] b)
Description copied from interface: LoadFunc
Cast data from bytes to chararray value.

Specified by:
bytesToCharArray in interface LoadFunc
Parameters:
b - byte array to be cast.
Returns:
String value.

bytesToDouble

public Double bytesToDouble(byte[] b)
Description copied from interface: LoadFunc
Cast data from bytes to double value.

Specified by:
bytesToDouble in interface LoadFunc
Parameters:
b - byte array to be cast.
Returns:
Double value.

bytesToFloat

public Float bytesToFloat(byte[] b)
Description copied from interface: LoadFunc
Cast data from bytes to float value.

Specified by:
bytesToFloat in interface LoadFunc
Parameters:
b - byte array to be cast.
Returns:
Float value.

bytesToInteger

public Integer bytesToInteger(byte[] b)
Description copied from interface: LoadFunc
Cast data from bytes to integer value.

Specified by:
bytesToInteger in interface LoadFunc
Parameters:
b - byte array to be cast.
Returns:
Integer value.

bytesToLong

public Long bytesToLong(byte[] b)
Description copied from interface: LoadFunc
Cast data from bytes to long value.

Specified by:
bytesToLong in interface LoadFunc
Parameters:
b - byte array to be cast.
Returns:
Long value.

bytesToMap

public Map<Object,Object> bytesToMap(byte[] b)
Description copied from interface: LoadFunc
Cast data from bytes to map value.

Specified by:
bytesToMap in interface LoadFunc
Parameters:
b - byte array to be cast.
Returns:
Map value.

bytesToTuple

public Tuple bytesToTuple(byte[] b)
Description copied from interface: LoadFunc
Cast data from bytes to tuple value.

Specified by:
bytesToTuple in interface LoadFunc
Parameters:
b - byte array to be cast.
Returns:
Tuple value.

determineSchema

public Schema determineSchema(String fileName,
                              ExecType execType,
                              DataStorage storage)
                       throws IOException
Description copied from interface: LoadFunc
Find the schema from the loader. This function will be called at parse time (not run time) to see if the loader can provide a schema for the data. The loader may be able to do this if the data is self-describing (e.g. JSON). If the loader cannot determine the schema, it can return null.

LoadFunc implementations that need to open the input "fileName" can use FileLocalizer.open(String fileName, ExecType execType, DataStorage storage) to get an InputStream with which to initialize their loader implementation. They can then read the input data from that stream to discover the schema.

Note: this will work only when fileName represents a file on the local file system or the Hadoop file system.

Specified by:
determineSchema in interface LoadFunc
Parameters:
fileName - name of the file to be read (this will be the same as the file name in the "load" statement of the script)
execType - execution mode of the Pig script, one of ExecType.LOCAL or ExecType.MAPREDUCE
storage - the DataStorage object corresponding to the execType
Returns:
a Schema describing the data if possible, or null otherwise.
Throws:
IOException
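
The "return a schema if the data is self-describing, otherwise null" contract can be sketched as follows. Everything here is an assumption made for illustration: the "#schema:" header line is a hypothetical format, the schema is represented as a plain list of field names rather than a Pig Schema, and the stream is built in memory instead of being obtained from FileLocalizer.open.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class SchemaProbeSketch {
    // Hypothetical header prefix that marks the data as self-describing.
    static final String PREFIX = "#schema:";

    // Returns the field names from the header line, or null if no schema can be found.
    static List<String> determineSchema(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String first = reader.readLine();
        if (first == null || !first.startsWith(PREFIX)) {
            return null; // the data does not describe itself
        }
        List<String> fields = new ArrayList<>();
        for (String name : first.substring(PREFIX.length()).split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                fields.add(trimmed);
            }
        }
        return fields.isEmpty() ? null : fields;
    }

    // Convenience wrapper over in-memory content, used for demonstration.
    static List<String> probe(String content) {
        try {
            return determineSchema(
                    new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(probe("#schema: name, age\nalice,30\n")); // prints [name, age]
        System.out.println(probe("alice,30\n"));                     // prints null
    }
}
```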

fieldsToRead

public void fieldsToRead(Schema schema)
Description copied from interface: LoadFunc
Indicate to the loader fields that will be needed. This can be useful for loaders that access data that is stored in a columnar format where indicating columns to be accessed ahead of time will save scans. If the loader function cannot make use of this information, it is free to ignore it.

Specified by:
fieldsToRead in interface LoadFunc
Parameters:
schema - Schema indicating which columns will be needed.
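
The idea behind fieldsToRead can be sketched with a loader that remembers which columns were requested and materializes only those. ProjectionSketch is a hypothetical stand-in: it takes column indexes rather than a Pig Schema, parses comma-separated text, and returns only the requested columns. How a real loader shapes its tuples after this hint is not specified on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ProjectionSketch {
    // Requested column indexes; null means no hint was given, so read everything.
    private int[] needed;

    // Records the columns the caller will need (hypothetical index-based variant).
    void fieldsToRead(int... columnIndexes) {
        needed = columnIndexes.clone();
    }

    // Parses one comma-separated line, keeping only the requested columns.
    List<String> parse(String line) {
        String[] cols = line.split(",", -1);
        if (needed == null) {
            return Arrays.asList(cols); // a loader is free to ignore the hint
        }
        List<String> out = new ArrayList<>();
        for (int idx : needed) {
            out.add(idx < cols.length ? cols[idx] : null);
        }
        return out;
    }

    public static void main(String[] args) {
        ProjectionSketch loader = new ProjectionSketch();
        System.out.println(loader.parse("a,b,c")); // prints [a, b, c]
        loader.fieldsToRead(2, 0);
        System.out.println(loader.parse("a,b,c")); // prints [c, a]
    }
}
```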

toBytes

public byte[] toBytes(DataBag bag)
               throws IOException
Throws:
IOException

toBytes

public byte[] toBytes(String s)
               throws IOException
Throws:
IOException

toBytes

public byte[] toBytes(Double d)
               throws IOException
Throws:
IOException

toBytes

public byte[] toBytes(Float f)
               throws IOException
Throws:
IOException

toBytes

public byte[] toBytes(Integer i)
               throws IOException
Throws:
IOException

toBytes

public byte[] toBytes(Long l)
               throws IOException
Throws:
IOException

toBytes

public byte[] toBytes(Map<Object,Object> m)
               throws IOException
Throws:
IOException

toBytes

public byte[] toBytes(Tuple t)
               throws IOException
Throws:
IOException
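
The toBytes and bytesTo* methods form conversion pairs: a value serialized by toBytes is expected to be recoverable by the matching bytesTo* method. A minimal sketch of such a round trip follows. The encoding used here (fixed-width big-endian numbers via ByteBuffer, UTF-8 strings) is an assumption chosen for brevity; BinStorage's actual byte layout is not documented on this page and may differ. ByteCastSketch is a hypothetical class name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ByteCastSketch {
    static byte[] toBytes(Integer i) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(i).array();
    }

    static Integer bytesToInteger(byte[] b) {
        return ByteBuffer.wrap(b).getInt();
    }

    static byte[] toBytes(Long l) {
        return ByteBuffer.allocate(Long.BYTES).putLong(l).array();
    }

    static Long bytesToLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }

    static byte[] toBytes(Double d) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(d).array();
    }

    static Double bytesToDouble(byte[] b) {
        return ByteBuffer.wrap(b).getDouble();
    }

    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String bytesToCharArray(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(bytesToInteger(toBytes(Integer.valueOf(42)))); // prints 42
        System.out.println(bytesToCharArray(toBytes("pig")));             // prints pig
    }
}
```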

equals

public boolean equals(Object obj)
Overrides:
equals in class Object

getStorePreparationClass

public Class getStorePreparationClass()
                               throws IOException
Description copied from interface: StoreFunc
Specify a backend-specific class to use to prepare for storing output. In the Hadoop case, this can return an OutputFormat that will be used instead of PigOutputFormat. The framework will call this function, and if a Class is returned that implements OutputFormat, it will be used. For more details on how the OutputFormat should interact with Pig, see PigOutputFormat.getRecordWriter(org.apache.hadoop.fs.FileSystem, org.apache.hadoop.mapred.JobConf, String, org.apache.hadoop.util.Progressable).

Specified by:
getStorePreparationClass in interface StoreFunc
Returns:
Backend specific class used to prepare for storing output. If the StoreFunc implementation does not have a class to prepare for storing output, it can return null and a default Pig implementation will be used to prepare for storing output.
Throws:
IOException - if the class does not implement the expected interface(s).
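
How a framework might act on the returned class can be sketched as follows. All types are hypothetical stand-ins so the example runs without Hadoop or Pig: OutputFormatLike plays the role of OutputFormat and DefaultOutputFormatLike the role of PigOutputFormat. Rejecting a class that does not implement the interface with an IOException is an assumption modeled on the Throws clause above, not confirmed framework behavior.

```java
import java.io.IOException;

// Hypothetical stand-in for Hadoop's OutputFormat.
interface OutputFormatLike { }

// Hypothetical stand-in for the default PigOutputFormat.
final class DefaultOutputFormatLike implements OutputFormatLike { }

// Hypothetical custom format that a StoreFunc might return.
final class CustomOutputFormatLike implements OutputFormatLike { }

final class StorePreparationSketch {
    // Chooses the class used to prepare for storing output.
    static Class<?> resolve(Class<?> prepared) throws IOException {
        if (prepared == null) {
            return DefaultOutputFormatLike.class; // null means "use the default"
        }
        if (!OutputFormatLike.class.isAssignableFrom(prepared)) {
            // Assumed handling of a class with the wrong type.
            throw new IOException(prepared.getName() + " does not implement OutputFormatLike");
        }
        return prepared;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(resolve(null).getSimpleName());                         // prints DefaultOutputFormatLike
        System.out.println(resolve(CustomOutputFormatLike.class).getSimpleName()); // prints CustomOutputFormatLike
    }
}
```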


Copyright © The Apache Software Foundation