|
||||||||||
PREV NEXT | FRAMES NO FRAMES |
See:
Description
Pig is a platform for a data flow programming on large data sets in a parallel environment. It consists of a language to specify these programs, Pig Latin, a compiler for this language, and an execution engine to execute the programs.
Pig currently runs on the hadoop platform, reading data from and writing data to hdfs, and doing processing via one or more map-reduce jobs.
Pig's design is guided by our pig philosophy and by our experience with similar data processing systems.
Pig shares many similarities with a traditional RDBMS design. It has a parser, type checker, optimizer, and operators that perform the data processing. However, there are some significant differences. Pig does not have a data catalog, there are no transactions, pig does not directly manage data storage, nor does it implement the execution framework.
LogicalPlan
that defines how
the script will be executed.
Once a LogicalPlan has been generated, the backend of Pig handles executing the script. Pig supports multiple different backend implementations, in order to allow Pig to run on different systems. Currently pig comes with two backends, Map-Reduce and local. For a given run, pig selects the backend to use via configuration.
|
||||||||||
PREV NEXT | FRAMES NO FRAMES |