Uses of Interface
org.apache.pig.LoadMetadata

Packages that use LoadMetadata
org.apache.hadoop.zebra.pig Implementation of PIG Storer/Loader Interfaces 
org.apache.pig.builtin   
org.apache.pig.piggybank.storage   
 

Uses of LoadMetadata in org.apache.hadoop.zebra.pig
 

Classes in org.apache.hadoop.zebra.pig that implement LoadMetadata
 class TableLoader
          Pig IndexableLoadFunc and Slicer for Zebra Table
 

Uses of LoadMetadata in org.apache.pig.builtin
 

Classes in org.apache.pig.builtin that implement LoadMetadata
 class BinStorage
           
 

Uses of LoadMetadata in org.apache.pig.piggybank.storage
 

Classes in org.apache.pig.piggybank.storage that implement LoadMetadata
 class HiveColumnarLoader
          Loader for Hive RC Columnar files.
Supports the following types:
Hive Type         Pig Type (from DataType)
string            CHARARRAY
int               INTEGER
bigint or long    LONG
float             FLOAT
double            DOUBLE
boolean           BOOLEAN
byte              BYTE
array             TUPLE
map               MAP

Usage 1:
To load a Hive table: uid bigint, ts long, arr ARRAY, m MAP
    a = LOAD 'file' USING HiveColumnarLoader("uid bigint, ts long, arr array, m map");
    -- to reference the fields
    b = FOREACH a GENERATE uid, ts, arr, m;

Usage 2:
To load a Hive table: uid bigint, ts long, arr ARRAY, m MAP, processing only dates 2009-10-01 to 2009-10-02 in a date-partitioned Hive table.
    a = LOAD 'file' USING HiveColumnarLoader("uid bigint, ts long, arr array, m map", "2009-10-01:2009-10-02");
    -- to reference the fields
    b = FOREACH a GENERATE uid, ts, arr, m;

Usage 3:
To load a Hive table: uid bigint, ts long, arr ARRAY, m MAP, reading only columns uid and ts.
    a = LOAD 'file' USING HiveColumnarLoader("uid bigint, ts long, arr array, m map", "", "uid,ts");
    -- to reference the fields
    b = FOREACH a GENERATE uid, ts, arr, m;

Usage 4:
To load a Hive table: uid bigint, ts long, arr ARRAY, m MAP, reading only columns uid and ts for dates 2009-10-01 to 2009-10-02.
    a = LOAD 'file' USING HiveColumnarLoader("uid bigint, ts long, arr array, m map", "2009-10-01:2009-10-02", "uid,ts");
    -- to reference the fields
    b = FOREACH a GENERATE uid, ts, arr, m;

Issues

Table schema definition
The schema definition must list each column name followed by a space and its type, then a comma with no space after it, then the next column name, and so on.
So column1 string, column2 string will not work; it must be column1 string,column2 string
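
As an illustration of the formatting rule above, the following is a minimal sketch of a helper that rewrites a loosely formatted schema string into the "name type,name type" form. The class HiveSchemaString and its normalize method are hypothetical names introduced here; they are not part of HiveColumnarLoader or of Pig.

```java
public class HiveSchemaString {

    // Hypothetical helper: trims each column definition and joins the
    // definitions with a bare comma (no space after it).
    static String normalize(String schema) {
        StringBuilder out = new StringBuilder();
        for (String column : schema.split(",")) {
            // Collapse runs of whitespace so exactly one space separates name and type.
            String trimmed = column.trim().replaceAll("\\s+", " ");
            if (trimmed.isEmpty()) {
                continue; // skip empty pieces such as a trailing comma
            }
            if (out.length() > 0) {
                out.append(',');
            }
            out.append(trimmed);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: column1 string,column2 string
        System.out.println(normalize("column1 string, column2 string"));
    }
}
```

The resulting string could then be passed as the first argument to HiveColumnarLoader in a LOAD statement.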

Date partitioning
Hive date partition folders must have the format daydate=[date].
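
The folder naming convention above can be sketched as a small check, shown here only as an illustration. The class DayDatePartition and its methods are hypothetical and are not the loader's actual code. The sketch assumes dates are written as yyyy-MM-dd, as in the usage examples, and that a range such as "2009-10-01:2009-10-02" includes both end dates; the source does not state either point.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

public class DayDatePartition {

    static final String PREFIX = "daydate=";

    // Returns the date encoded in a folder name such as "daydate=2009-10-01",
    // or null when the name does not follow the daydate=[date] format.
    static LocalDate parse(String folderName) {
        if (!folderName.startsWith(PREFIX)) {
            return null;
        }
        try {
            // Assumption: the date part is ISO formatted (yyyy-MM-dd).
            return LocalDate.parse(folderName.substring(PREFIX.length()));
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    // Checks a folder against a "from:to" range.
    // Assumption: both bounds are inclusive.
    static boolean inRange(String folderName, String range) {
        LocalDate date = parse(folderName);
        if (date == null) {
            return false;
        }
        String[] bounds = range.split(":");
        LocalDate from = LocalDate.parse(bounds[0]);
        LocalDate to = LocalDate.parse(bounds[1]);
        return !date.isBefore(from) && !date.isAfter(to);
    }

    public static void main(String[] args) {
        System.out.println(inRange("daydate=2009-10-01", "2009-10-01:2009-10-02"));
        System.out.println(inRange("daydate=2009-10-03", "2009-10-01:2009-10-02"));
    }
}
```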

 class JsonMetadata
          Reads and Writes metadata using JSON in metafiles next to the data.
 class PigStorageSchema
          This Load/Store Func reads/writes metafiles that allow the schema and aliases to be determined at load time, saving one from having to manually enter schemas for pig-generated datasets.
 



Copyright © ${year} The Apache Software Foundation