org.apache.hadoop.hive.ql.io
Class RCFile

java.lang.Object
  extended by org.apache.hadoop.hive.ql.io.RCFile

public class RCFile
extends Object

RCFiles, short for Record Columnar Files, are flat files consisting of binary key/value pairs, and they share much similarity with SequenceFile. RCFile stores the columns of a table in a record columnar way. It first partitions rows horizontally into row splits, and then it vertically partitions each row split in a columnar way. RCFile first stores the metadata of a row split as the key part of a record, and all the data of the row split as the value part. When writing, RCFile.Writer first holds records' value bytes in memory, and determines a row split once the raw byte size of the buffered records exceeds a given parameter, Writer.columnsBufferSize, which can be set like: conf.setInt(COLUMNS_BUFFER_SIZE_CONF_STR, 4 * 1024 * 1024).
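
The buffering and partitioning described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not the RCFile.Writer implementation: the class RowSplitSketch, its fields, and the choice of metadata kept in the "key" (row count and per-column byte totals) are hypothetical names introduced for illustration, and cells are plain strings rather than serialized bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Toy model of row splits. Hypothetical; not Hive's RCFile.Writer. */
class RowSplitSketch {

    /** One flushed row split: a metadata "key" plus column-major "value" data. */
    static final class RowSplit {
        final int rowCount;                 // illustrative metadata (assumption)
        final int[] columnByteLengths;      // illustrative metadata (assumption)
        final List<List<String>> columns;   // columns.get(c) = values of column c

        RowSplit(int rowCount, int[] columnByteLengths, List<List<String>> columns) {
            this.rowCount = rowCount;
            this.columnByteLengths = columnByteLengths;
            this.columns = columns;
        }
    }

    private final int columnCount;
    private final int bufferSizeBytes;      // stand-in for the columns buffer size
    private final List<String[]> bufferedRows = new ArrayList<>();
    private int bufferedBytes = 0;
    final List<RowSplit> splits = new ArrayList<>();

    RowSplitSketch(int columnCount, int bufferSizeBytes) {
        this.columnCount = columnCount;
        this.bufferSizeBytes = bufferSizeBytes;
    }

    /** Buffers one row; closes a row split once the buffered bytes exceed the limit. */
    void append(String... row) {
        if (row.length != columnCount) {
            throw new IllegalArgumentException("expected " + columnCount + " columns");
        }
        bufferedRows.add(row.clone());
        for (String cell : row) {
            bufferedBytes += cell.getBytes(StandardCharsets.UTF_8).length;
        }
        if (bufferedBytes > bufferSizeBytes) {
            flush();
        }
    }

    /** Turns the buffered rows (horizontal partition) into column-major data. */
    void flush() {
        if (bufferedRows.isEmpty()) {
            return;
        }
        int[] lengths = new int[columnCount];
        List<List<String>> columns = new ArrayList<>();
        for (int c = 0; c < columnCount; c++) {
            List<String> column = new ArrayList<>();
            for (String[] row : bufferedRows) {
                column.add(row[c]);
                lengths[c] += row[c].getBytes(StandardCharsets.UTF_8).length;
            }
            columns.add(column);
        }
        splits.add(new RowSplit(bufferedRows.size(), lengths, columns));
        bufferedRows.clear();
        bufferedBytes = 0;
    }

    public static void main(String[] args) {
        RowSplitSketch writer = new RowSplitSketch(2, 8);
        writer.append("a", "bb");
        writer.append("ccc", "dd");
        writer.append("e", "f");
        writer.flush();
        System.out.println("row splits written: " + writer.splits.size());
    }
}
```

In this model a row split is closed as soon as the buffered bytes exceed the limit, so a split can end up slightly larger than the configured size; whether the real writer behaves the same way at the boundary is not stated here.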

RCFile provides the RCFile.Writer and RCFile.Reader classes for writing and reading, respectively.

RCFile compresses values in a more fine-grained manner than record-level compression. However, it does not yet support compressing the key part. The actual compression algorithm used to compress keys and/or values can be specified by using the appropriate CompressionCodec.
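
One way to picture compression that is finer-grained than record level is to compress each column of a row split on its own, so that a single column can be decompressed without touching the others. The sketch below assumes that reading of the paragraph above. It uses java.util.zip from the JDK as a stand-in for a Hadoop CompressionCodec, and the class ColumnCompressionSketch and its methods are hypothetical names, not part of RCFile.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical illustration of per-column compression; not Hive code. */
class ColumnCompressionSketch {

    /** Compresses one column's bytes with DEFLATE (stand-in for a codec). */
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Decompresses one column's bytes; fails on truncated or invalid input. */
    static byte[] decompress(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported input");
                }
                out.write(chunk, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("invalid compressed column", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Compresses every column of a row split independently of the others. */
    static byte[][] compressColumns(byte[][] columns) {
        byte[][] compressed = new byte[columns.length][];
        for (int c = 0; c < columns.length; c++) {
            compressed[c] = compress(columns[c]);
        }
        return compressed;
    }

    public static void main(String[] args) {
        byte[][] columns = {
            "1,2,3,4,5,6,7,8".getBytes(StandardCharsets.UTF_8),
            "US,US,US,US,DE,DE,DE,DE".getBytes(StandardCharsets.UTF_8)
        };
        byte[][] compressed = compressColumns(columns);
        // Only the second column is decompressed; the first stays untouched.
        String second = new String(decompress(compressed[1]), StandardCharsets.UTF_8);
        System.out.println(second);
    }
}
```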

The RCFile.Reader is used to read and interpret the bytes of an RCFile.

RCFile Formats

RCFile Format


Nested Class Summary
static class RCFile.Reader
          Read KeyBuffer/ValueBuffer pairs from a RCFile.
static class RCFile.Writer
          Write KeyBuffer/ValueBuffer pairs to a RCFile.
 
Field Summary
static String COLUMN_NUMBER_CONF_STR
           
static String COLUMN_NUMBER_METADATA_STR
           
static String RECORD_INTERVAL_CONF_STR
           
static int SYNC_INTERVAL
          The number of bytes between sync points.
 
Constructor Summary
RCFile()
           
 
Method Summary
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

RECORD_INTERVAL_CONF_STR

public static final String RECORD_INTERVAL_CONF_STR
See Also:
Constant Field Values

COLUMN_NUMBER_METADATA_STR

public static final String COLUMN_NUMBER_METADATA_STR
See Also:
Constant Field Values

COLUMN_NUMBER_CONF_STR

public static final String COLUMN_NUMBER_CONF_STR
See Also:
Constant Field Values

SYNC_INTERVAL

public static final int SYNC_INTERVAL
The number of bytes between sync points.

See Also:
Constant Field Values
Constructor Detail

RCFile

public RCFile()


Copyright © 2010 The Apache Software Foundation