Uses of Class
org.apache.pig.LoadFunc

Packages that use LoadFunc
org.apache.hadoop.zebra.pig Implementation of PIG Storer/Loader Interfaces 
org.apache.pig Public interfaces and classes for Pig. 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer   
org.apache.pig.backend.hadoop.hbase   
org.apache.pig.builtin   
org.apache.pig.experimental.logical.relational   
org.apache.pig.impl.builtin   
org.apache.pig.impl.io   
org.apache.pig.impl.logicalLayer The logical operators that represent a pig script and tools for manipulating those operators. 
org.apache.pig.piggybank.storage   
org.apache.pig.piggybank.storage.apachelog   
 

Uses of LoadFunc in org.apache.hadoop.zebra.pig
 

Subclasses of LoadFunc in org.apache.hadoop.zebra.pig
 class TableLoader
          Pig IndexableLoadFunc and Slicer for Zebra Table
 

Uses of LoadFunc in org.apache.pig
 

Subclasses of LoadFunc in org.apache.pig
 class FileInputLoadFunc
          This class provides an implementation of the OrderedLoadFunc interface, which LoadFuncs that use FileInputFormat can optionally reuse by extending this class.
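As we understand OrderedLoadFunc, it lets a loader expose a comparable position for each input split, so that Pig can order splits the same way as the data they hold (merge join is one user of this). For files read through FileInputFormat, a natural position is the file path followed by the split's starting byte offset. The plain-Java sketch below models only that ordering. SplitPosition and SplitOrderDemo are hypothetical names, and this is not the FileInputLoadFunc source.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of a file split's position: which file, and where in it.
final class SplitPosition implements Comparable<SplitPosition> {
    final String path;   // file the split belongs to
    final long offset;   // byte offset where the split starts

    SplitPosition(String path, long offset) {
        this.path = path;
        this.offset = offset;
    }

    // Order first by file path, then by starting offset within the file.
    @Override
    public int compareTo(SplitPosition other) {
        int byPath = path.compareTo(other.path);
        return byPath != 0 ? byPath : Long.compare(offset, other.offset);
    }

    @Override
    public String toString() {
        return path + "@" + offset;
    }
}

class SplitOrderDemo {
    public static void main(String[] args) {
        List<SplitPosition> splits = new ArrayList<>(List.of(
                new SplitPosition("part-00001", 0),
                new SplitPosition("part-00000", 134217728),
                new SplitPosition("part-00000", 0)));
        splits.sort(Comparator.naturalOrder());
        System.out.println(splits); // [part-00000@0, part-00000@134217728, part-00001@0]
    }
}
```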
 

Uses of LoadFunc in org.apache.pig.backend.hadoop.executionengine.mapReduceLayer
 

Subclasses of LoadFunc in org.apache.pig.backend.hadoop.executionengine.mapReduceLayer
 class MergeJoinIndexer
          MergeJoinIndexer is used to generate an index on the fly so that merge join can be performed efficiently.
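To illustrate why such an index helps, here is a minimal sketch under stated assumptions: the input is already sorted on an integer join key, each split is modeled as a list, and the index stores the first key of every split. A lookup then binary-searches for the split where scanning should begin instead of reading from the start. MergeJoinIndexSketch, Entry, buildIndex and startSplit are hypothetical names; the real MergeJoinIndexer's output format may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an on-the-fly merge-join index: for input that is
// already sorted on the join key, record the first key of every split.
class MergeJoinIndexSketch {
    // One index entry: the first join key in a split and that split's position.
    static final class Entry {
        final int firstKey;
        final int splitIndex;

        Entry(int firstKey, int splitIndex) {
            this.firstKey = firstKey;
            this.splitIndex = splitIndex;
        }
    }

    // Build the index in split order, skipping empty splits.
    static List<Entry> buildIndex(List<List<Integer>> sortedSplits) {
        List<Entry> index = new ArrayList<>();
        for (int i = 0; i < sortedSplits.size(); i++) {
            List<Integer> split = sortedSplits.get(i);
            if (!split.isEmpty()) {
                index.add(new Entry(split.get(0), i));
            }
        }
        return index;
    }

    // Binary-search for the split where a scan for `key` should begin: the last
    // split whose first key is strictly less than `key`. Every earlier split ends
    // at or below that split's first key, so it cannot contain `key`.
    static int startSplit(List<Entry> index, int key) {
        int answer = index.isEmpty() ? 0 : index.get(0).splitIndex;
        int lo = 0, hi = index.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (index.get(mid).firstKey < key) {
                answer = index.get(mid).splitIndex;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return answer;
    }
}
```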
 

Constructors in org.apache.pig.backend.hadoop.executionengine.mapReduceLayer with parameters of type LoadFunc
PigRecordReader(org.apache.hadoop.mapreduce.RecordReader wrappedReader, LoadFunc loadFunc, org.apache.hadoop.conf.Configuration conf)
           
 

Uses of LoadFunc in org.apache.pig.backend.hadoop.hbase
 

Subclasses of LoadFunc in org.apache.pig.backend.hadoop.hbase
 class HBaseStorage
          An HBase loader.
 

Uses of LoadFunc in org.apache.pig.builtin
 

Subclasses of LoadFunc in org.apache.pig.builtin
 class BinStorage
           
 class PigStorage
          A load function that parses each line of input into fields, splitting the line on a delimiter.
 class TextLoader
          This load function creates one tuple per line of text; each tuple has a single field containing that line.
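The per-line behavior described for PigStorage and TextLoader can be contrasted with a small sketch that uses plain Java lists in place of Pig tuples. It assumes a literal (non-regex) delimiter such as tab, PigStorage's default, and it does not reproduce PigStorage's handling of empty fields or type conversion. LineLoaderSketch, splitFields and wholeLine are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: models the per-line behavior described for PigStorage
// and TextLoader using plain lists instead of Pig tuples.
class LineLoaderSketch {
    // PigStorage-style: split one line into fields on a literal delimiter.
    // Pattern.quote makes characters such as '|' literal; limit -1 keeps
    // trailing empty fields instead of dropping them.
    static List<String> splitFields(String line, String delimiter) {
        return Arrays.asList(line.split(Pattern.quote(delimiter), -1));
    }

    // TextLoader-style: the whole line becomes a tuple with a single field.
    static List<String> wholeLine(String line) {
        return List.of(line);
    }

    public static void main(String[] args) {
        String line = "alice\t42\tNYC";
        System.out.println(splitFields(line, "\t")); // [alice, 42, NYC]
        System.out.println(wholeLine(line).size());  // 1
    }
}
```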
 

Uses of LoadFunc in org.apache.pig.experimental.logical.relational
 

Methods in org.apache.pig.experimental.logical.relational that return LoadFunc
 LoadFunc LOLoad.getLoadFunc()
           
 

Uses of LoadFunc in org.apache.pig.impl.builtin
 

Subclasses of LoadFunc in org.apache.pig.impl.builtin
 class DefaultIndexableLoader
          Used by MergeJoin.
 class PoissonSampleLoader
          See "Skewed Join sampler" in http://wiki.apache.org/pig/PigSampler
 class RandomSampleLoader
          A loader that samples the data.
 class SampleLoader
          Abstract class that specifies the interface for sample loaders.
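The descriptions above do not say which algorithm RandomSampleLoader uses internally. As a point of reference only, one standard way to draw a fixed-size uniform random sample in a single pass over data of unknown length is reservoir sampling (Algorithm R). The sketch below shows that general technique, not Pig's implementation; ReservoirSampleSketch and sample are hypothetical names, and the int counter assumes fewer than 2^31 input records.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Reservoir sampling (Algorithm R): keep a uniform random sample of up to k
// items from a stream of unknown length, in one pass and O(k) memory.
class ReservoirSampleSketch {
    static <T> List<T> sample(Iterator<T> input, int k, Random rng) {
        List<T> reservoir = new ArrayList<>(k);
        int seen = 0; // assumes fewer than 2^31 items
        while (input.hasNext()) {
            T item = input.next();
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(item);             // fill the reservoir first
            } else {
                int j = rng.nextInt(seen);       // uniform in [0, seen)
                if (j < k) {
                    reservoir.set(j, item);      // keep item with probability k/seen
                }
            }
        }
        return reservoir;
    }
}
```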
 

Fields in org.apache.pig.impl.builtin declared as LoadFunc
protected  LoadFunc SampleLoader.loader
           
 

Uses of LoadFunc in org.apache.pig.impl.io
 

Subclasses of LoadFunc in org.apache.pig.impl.io
 class ReadToEndLoader
          This is a wrapper loader that wraps a real LoadFunc and allows a file to be read completely, starting from a given split (indicated by a split index, which is used to look up the split in the List returned by the underlying InputFormat's getSplits() method).
 

Methods in org.apache.pig.impl.io with parameters of type LoadFunc
 DataBag PigFile.load(LoadFunc lfunc, PigContext pigContext)
           
 

Constructors in org.apache.pig.impl.io with parameters of type LoadFunc
ReadToEndLoader(LoadFunc wrappedLoadFunc, org.apache.hadoop.conf.Configuration conf, String inputLocation, int splitIndex)
           
ReadToEndLoader(LoadFunc wrappedLoadFunc, org.apache.hadoop.conf.Configuration conf, String inputLocation, int[] toReadSplitIdxs)
          This constructor takes an array of split indexes (toReadSplitIdxs) of the splits to be read.
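The two ReadToEndLoader constructors select splits differently: one reads from splitIndex through the last split, the other reads only the splits listed in toReadSplitIdxs. The sketch below models that selection with each inner list standing in for the records of one split, in getSplits() order. ReadToEndSketch, readFrom and readSelected are hypothetical names, and reading the listed indexes in the order given is our assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of ReadToEndLoader's split selection: each inner list
// stands in for the records of one InputSplit, in getSplits() order.
class ReadToEndSketch {
    // Like ReadToEndLoader(wrappedLoadFunc, conf, inputLocation, splitIndex):
    // read every split from splitIndex through the last one.
    static <T> List<T> readFrom(List<List<T>> splits, int splitIndex) {
        List<T> out = new ArrayList<>();
        for (int i = splitIndex; i < splits.size(); i++) {
            out.addAll(splits.get(i));
        }
        return out;
    }

    // Like ReadToEndLoader(wrappedLoadFunc, conf, inputLocation, toReadSplitIdxs):
    // read only the listed splits, in the order given (assumed).
    static <T> List<T> readSelected(List<List<T>> splits, int[] toReadSplitIdxs) {
        List<T> out = new ArrayList<>();
        for (int idx : toReadSplitIdxs) {
            out.addAll(splits.get(idx));
        }
        return out;
    }
}
```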
 

Uses of LoadFunc in org.apache.pig.impl.logicalLayer
 

Methods in org.apache.pig.impl.logicalLayer that return LoadFunc
 LoadFunc LOLoad.getLoadFunc()
           
 

Uses of LoadFunc in org.apache.pig.piggybank.storage
 

Subclasses of LoadFunc in org.apache.pig.piggybank.storage
 class HiveColumnarLoader
          Loader for Hive RC Columnar files.
Supports the following types:
    Hive type         Pig type (from DataType)
    string            CHARARRAY
    int               INTEGER
    bigint or long    LONG
    float             FLOAT
    double            DOUBLE
    boolean           BOOLEAN
    byte              BYTE
    array             TUPLE
    map               MAP
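The type table above amounts to a lookup from Hive type names to Pig types. In this sketch the Pig types are plain strings; a real loader would presumably use the constants in DataType instead. It assumes Hive type names are case-insensitive and that complex types may carry element types in angle brackets, as in Hive DDL (for example array<string>), which are ignored here. HiveToPigTypes and pigTypeFor are hypothetical names.

```java
import java.util.Locale;
import java.util.Map;

// Sketch of the Hive-to-Pig type mapping listed above, as a lookup table.
// Pig type names are strings here; a real loader would use DataType constants.
class HiveToPigTypes {
    private static final Map<String, String> MAPPING = Map.of(
            "string", "CHARARRAY",
            "int", "INTEGER",
            "bigint", "LONG",
            "long", "LONG",
            "float", "FLOAT",
            "double", "DOUBLE",
            "boolean", "BOOLEAN",
            "byte", "BYTE",
            "array", "TUPLE",
            "map", "MAP");

    static String pigTypeFor(String hiveType) {
        // Compare case-insensitively and drop any <...> element types.
        String base = hiveType.toLowerCase(Locale.ROOT);
        int angle = base.indexOf('<');
        if (angle >= 0) {
            base = base.substring(0, angle);
        }
        String pig = MAPPING.get(base.trim());
        if (pig == null) {
            throw new IllegalArgumentException("Unsupported Hive type: " + hiveType);
        }
        return pig;
    }
}
```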
Usage 1:
To load a Hive table with columns uid bigint, ts long, arr ARRAY, m MAP:
    a = LOAD 'file' USING HiveColumnarLoader("uid bigint, ts long, arr array, m map");
    -- to reference the fields
    b = FOREACH a GENERATE uid, ts, arr, m;

Usage 2:
To load a Hive table with columns uid bigint, ts long, arr ARRAY, m MAP, processing only the dates 2009-10-01 to 2009-10-02 in a date-partitioned Hive table:
    a = LOAD 'file' USING HiveColumnarLoader("uid bigint, ts long, arr array, m map", "2009-10-01:2009-10-02");
    -- to reference the fields
    b = FOREACH a GENERATE uid, ts, arr, m;

Usage 3:
To load a Hive table with columns uid bigint, ts long, arr ARRAY, m MAP, reading only the columns uid and ts:
    a = LOAD 'file' USING HiveColumnarLoader("uid bigint, ts long, arr array, m map", "", "uid,ts");
    -- to reference the fields
    b = FOREACH a GENERATE uid, ts, arr, m;

Usage 4:
To load a Hive table with columns uid bigint, ts long, arr ARRAY, m MAP, reading only the columns uid and ts for the dates 2009-10-01 to 2009-10-02:
    a = LOAD 'file' USING HiveColumnarLoader("uid bigint, ts long, arr array, m map", "2009-10-01:2009-10-02", "uid,ts");
    -- to reference the fields
    b = FOREACH a GENERATE uid, ts, arr, m;

Issues

Table schema definition
Each column definition is the column name, a space, and the column type; column definitions are separated by a comma with no space after it.
So column1 string, column2 string will not work; it must be written as column1 string,column2 string
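We do not know how the loader parses its schema string internally. The sketch below is a hypothetical strict parser that follows the stated rule, and it shows one plausible reason a space after the comma breaks parsing: splitting on commas leaves a leading space, which produces an extra empty token. It handles only simple types (no commas inside complex type definitions). StrictSchemaParser and parse are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical strict parser for the schema format described above:
// "name type" pairs separated by a comma with no following space.
class StrictSchemaParser {
    static Map<String, String> parse(String schema) {
        Map<String, String> columns = new LinkedHashMap<>(); // keeps column order
        for (String column : schema.split(",")) {
            String[] parts = column.split(" ");
            // "column1 string, column2 string" yields " column2 string", whose
            // leading space produces an empty first token, so it is rejected here.
            if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
                throw new IllegalArgumentException("Bad column definition: '" + column + "'");
            }
            columns.put(parts[0], parts[1]);
        }
        return columns;
    }
}
```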

Date partitioning
Hive date partition folders must have the format daydate=[date].
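Combining the daydate=[date] folder convention with the "start:end" date argument from Usage 2 and Usage 4, partition pruning can be sketched as follows. The sketch assumes ISO dates (yyyy-MM-dd) and an inclusive range, and it works on folder names only rather than real HDFS paths; DatePartitionFilter and select are hypothetical names.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: select date-partition folders named daydate=[date]
// that fall inside a "start:end" range such as "2009-10-01:2009-10-02".
// Assumes ISO dates (yyyy-MM-dd) and an inclusive range.
class DatePartitionFilter {
    private static final String PREFIX = "daydate=";

    static List<String> select(List<String> folders, String range) {
        String[] bounds = range.split(":");
        if (bounds.length != 2) {
            throw new IllegalArgumentException("Expected start:end, got " + range);
        }
        LocalDate start = LocalDate.parse(bounds[0]);
        LocalDate end = LocalDate.parse(bounds[1]);
        List<String> selected = new ArrayList<>();
        for (String folder : folders) {
            if (!folder.startsWith(PREFIX)) {
                continue; // not a date partition folder
            }
            LocalDate day = LocalDate.parse(folder.substring(PREFIX.length()));
            if (!day.isBefore(start) && !day.isAfter(end)) {
                selected.add(folder);
            }
        }
        return selected;
    }
}
```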

 class MyRegExLoader
           
 class PigStorageSchema
          This Load/Store Func reads/writes metafiles that allow the schema and aliases to be determined at load time, saving one from having to manually enter schemas for pig-generated datasets.
 class RegExLoader
          RegExLoader is an abstract class used to parse logs based on a regular expression.
 class SequenceFileLoader
          A loader for standard Hadoop SequenceFiles.
 class XMLLoader
          A load function for loading XML files. It extends LoadFunc, which is used to parse records from a dataset.
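RegExLoader's design, an abstract base class that parses log lines with a regular expression supplied by subclasses, can be sketched in plain Java as below. RegexLineParser, pattern(), parse() and KeyValueParser are hypothetical names rather than RegExLoader's actual methods, and returning null for a non-matching line is a choice of this sketch, not a claim about how RegExLoader treats such lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical analogue of the RegExLoader design: the base class turns a
// line into fields (one per capturing group); subclasses supply the pattern.
abstract class RegexLineParser {
    protected abstract Pattern pattern();

    // Returns the captured groups, or null when the whole line does not match.
    List<String> parse(String line) {
        Matcher m = pattern().matcher(line);
        if (!m.matches()) {
            return null;
        }
        List<String> fields = new ArrayList<>(m.groupCount());
        for (int g = 1; g <= m.groupCount(); g++) {
            fields.add(m.group(g));
        }
        return fields;
    }
}

// Example subclass: lines of the form key=value.
class KeyValueParser extends RegexLineParser {
    private static final Pattern KV = Pattern.compile("(\\w+)=(.*)");

    @Override
    protected Pattern pattern() {
        return KV;
    }
}
```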
 

Uses of LoadFunc in org.apache.pig.piggybank.storage.apachelog
 

Subclasses of LoadFunc in org.apache.pig.piggybank.storage.apachelog
 class CombinedLogLoader
          CombinedLogLoader is used to load logs in Apache's combined log format, which is defined by a line like
          LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
          The log filename ends up being combined_log from a line like
          CustomLog logs/combined_log combined
          Example:
          raw = LOAD 'combined_log' USING org.apache.pig.piggybank.storage.apachelog.CombinedLogLoader AS (remoteAddr, remoteLogname, user, time, method, uri, proto, status, bytes, referer, userAgent);
 class CommonLogLoader
          CommonLogLoader is used to load logs in Apache's common log format, which is defined by a line like
          LogFormat "%h %l %u %t \"%r\" %>s %b" common
          The log filename ends up being access_log from a line like
          CustomLog logs/access_log common
          Example:
          raw = LOAD 'access_log' USING org.apache.pig.piggybank.storage.apachelog.CommonLogLoader AS (remoteAddr, remoteLogname, user, time, method, uri, proto, status, bytes);
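To show how the LogFormat directives line up with the fields in the AS clauses above, here is a sketch of regular expressions for the common and combined formats, with one capturing group per field name. These are our own patterns, not the ones the piggybank loaders use; they assume a well-formed request line ("method uri proto") and a three-digit status. ApacheLogSketch and its members are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: regular expressions for Apache's common and combined log
// formats, with field names taken from the AS clauses in the examples above.
class ApacheLogSketch {
    // %h %l %u [%t] "%r" %>s %b, with %r split into method, uri and protocol.
    static final String COMMON =
            "(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"(\\S+) (\\S+) (\\S+)\" (\\d{3}) (\\S+)";
    static final Pattern COMMON_LOG = Pattern.compile(COMMON);
    // Combined = common + quoted Referer and User-Agent.
    static final Pattern COMBINED_LOG = Pattern.compile(COMMON + " \"([^\"]*)\" \"([^\"]*)\"");

    static final List<String> COMMON_FIELDS = List.of(
            "remoteAddr", "remoteLogname", "user", "time",
            "method", "uri", "proto", "status", "bytes");
    static final List<String> COMBINED_FIELDS = List.of(
            "remoteAddr", "remoteLogname", "user", "time",
            "method", "uri", "proto", "status", "bytes", "referer", "userAgent");

    // Returns field name -> value, or null if the whole line does not match.
    static Map<String, String> parse(String line, Pattern pattern, List<String> names) {
        Matcher m = pattern.matcher(line);
        if (!m.matches()) {
            return null;
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            row.put(names.get(i), m.group(i + 1));
        }
        return row;
    }
}
```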
 



Copyright © ${year} The Apache Software Foundation