Package org.apache.tajo.storage.parquet

Provides read and write support for Parquet files.

Package org.apache.tajo.storage.parquet Description

Provides read and write support for Parquet files. Tajo schemas are converted to Parquet schemas according to the following type mapping:

Tajo type    Parquet type
---------    ------------
NULL_TYPE    No type. The field is not encoded in Parquet.
BOOLEAN      BOOLEAN
BIT          INT32
INT2         INT32
INT4         INT32
INT8         INT64
FLOAT4       FLOAT
FLOAT8       DOUBLE
CHAR         BINARY (with OriginalType UTF8)
TEXT         BINARY (with OriginalType UTF8)
PROTOBUF     BINARY
BLOB         BINARY
INET4        BINARY

Because Tajo fields can be NULL, all Parquet fields are marked as optional.
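
The mapping and the optional-field rule can be illustrated with a small, self-contained sketch. It does not use the Tajo or Parquet libraries: the class TajoParquetTypeMapping, the nested TajoType enum, and the describeField method are hypothetical names introduced only for this example, and the output merely resembles Parquet's textual schema notation.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;

public class TajoParquetTypeMapping {
    // Hypothetical stand-in for Tajo's type enum; only the types in the table are listed.
    enum TajoType {
        NULL_TYPE, BOOLEAN, BIT, INT2, INT4, INT8, FLOAT4, FLOAT8,
        CHAR, TEXT, PROTOBUF, BLOB, INET4
    }

    // Tajo type -> Parquet primitive type, copied from the table above.
    private static final Map<TajoType, String> PARQUET_TYPE = new EnumMap<>(TajoType.class);
    static {
        PARQUET_TYPE.put(TajoType.BOOLEAN, "boolean");
        PARQUET_TYPE.put(TajoType.BIT, "int32");
        PARQUET_TYPE.put(TajoType.INT2, "int32");
        PARQUET_TYPE.put(TajoType.INT4, "int32");
        PARQUET_TYPE.put(TajoType.INT8, "int64");
        PARQUET_TYPE.put(TajoType.FLOAT4, "float");
        PARQUET_TYPE.put(TajoType.FLOAT8, "double");
        PARQUET_TYPE.put(TajoType.CHAR, "binary");
        PARQUET_TYPE.put(TajoType.TEXT, "binary");
        PARQUET_TYPE.put(TajoType.PROTOBUF, "binary");
        PARQUET_TYPE.put(TajoType.BLOB, "binary");
        PARQUET_TYPE.put(TajoType.INET4, "binary");
        // NULL_TYPE is deliberately absent: such fields are not encoded.
    }

    // Describes one field; empty when the Tajo type has no Parquet counterpart.
    static Optional<String> describeField(String name, TajoType type) {
        String primitive = PARQUET_TYPE.get(type);
        if (primitive == null) {
            return Optional.empty();
        }
        // CHAR and TEXT carry the UTF8 annotation; other binary types do not.
        String annotation = (type == TajoType.CHAR || type == TajoType.TEXT) ? " (UTF8)" : "";
        // Every field is optional because any Tajo field may be NULL.
        return Optional.of("optional " + primitive + " " + name + annotation + ";");
    }

    public static void main(String[] args) {
        System.out.println(describeField("id", TajoType.INT4).orElse("(not encoded)"));
        System.out.println(describeField("name", TajoType.TEXT).orElse("(not encoded)"));
        System.out.println(describeField("unused", TajoType.NULL_TYPE).orElse("(not encoded)"));
    }
}
```

Running main prints one line per field: the INT4 field as an optional int32, the TEXT field as an optional binary annotated with UTF8, and a placeholder for the NULL_TYPE field, which is skipped.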

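The conversion from Tajo to Parquet is lossy: several Tajo types map to the same Parquet type (for example, BIT, INT2, and INT4 all become INT32), so the original Tajo schema cannot be recovered from the Parquet schema alone. Parquet files are therefore read using the Tajo schema stored in the Tajo catalog for the table they belong to, which was defined when the table was created.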
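
To see why the Parquet schema alone is not enough, the following sketch inverts the mapping from the table and lists the Tajo types that each Parquet type could have come from. The class LossyMappingDemo and its methods are hypothetical and exist only for this illustration; type names are plain strings rather than Tajo or Parquet classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LossyMappingDemo {
    // Forward mapping copied from the table (Tajo type name -> Parquet type name).
    static Map<String, String> forward() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("BOOLEAN", "BOOLEAN");
        m.put("BIT", "INT32");
        m.put("INT2", "INT32");
        m.put("INT4", "INT32");
        m.put("INT8", "INT64");
        m.put("FLOAT4", "FLOAT");
        m.put("FLOAT8", "DOUBLE");
        m.put("CHAR", "BINARY (UTF8)");
        m.put("TEXT", "BINARY (UTF8)");
        m.put("PROTOBUF", "BINARY");
        m.put("BLOB", "BINARY");
        m.put("INET4", "BINARY");
        return m;
    }

    // Groups Tajo types by the Parquet type they map to.
    static Map<String, List<String>> invert(Map<String, String> fwd) {
        Map<String, List<String>> inverse = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fwd.entrySet()) {
            inverse.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return inverse;
    }

    public static void main(String[] args) {
        // Print only the Parquet types with more than one possible Tajo origin.
        for (Map.Entry<String, List<String>> e : invert(forward()).entrySet()) {
            if (e.getValue().size() > 1) {
                System.out.println(e.getKey() + " <- " + e.getValue());
            }
        }
    }
}
```

Three Parquet types turn out to be ambiguous (INT32, BINARY with UTF8, and plain BINARY), which is why a reader needs the catalog schema to decide which Tajo type to produce.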

Copyright © 2014 Apache Software Foundation. All Rights Reserved.