Thursday, January 24, 2013

ElephantBird now enables Hello World from Pig

While learning Apache Pig, I was surprised at how difficult it is to write "Hello World." Based on http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#Constants I would have expected the code below to be legal:
-- Illegal Pig syntax
A = {('Hello'),('World')};
DUMP A;

However, it produces syntax errors. The only way to create an initial relation (the basic "variable" in Pig) is by LOADing a file. Having been the "language lawyer" on every team I have worked in since the K&R era, I am bothered that a "Hello World" program cannot be written within a single code file. It makes Pig Latin seem like an incomplete language.

The people at Twitter's ElephantBird project have come up with a custom solution in response to my request on the Pig User mailing list.
http://mail-archives.apache.org/mod_mbox/pig-user/201301.mbox/%3CCAE7pYjZtwuxYZs6Ov54P-6SFRCkKPuL9Jwac9i-Rr%2BYsdhasNw%40mail.gmail.com%3E

This ElephantBird Java class converts what would normally be the filename given to the LOAD command into a tuple. A hack. That works. But not without invoking code that is not distributed with Pig, and not without ugliness.
languages = load 'en,fr,jp' using LocationAsTuple(',');
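
For anyone who wants to try this, the one-liner above probably needs a little scaffolding, since Pig has to be told where the class lives. Below is a minimal sketch, not a tested recipe. The jar path is hypothetical, the package name is my assumption about where LocationAsTuple sits in ElephantBird, and your ElephantBird version may need additional jars registered alongside it.

```pig
-- Sketch only. The jar path is hypothetical: point it at your own ElephantBird Pig jar.
REGISTER 'elephant-bird-pig.jar';

-- The string after LOAD is not a file path here; the loader is given ',' as the
-- delimiter for splitting it. The fully qualified class name is an assumption;
-- check it against your ElephantBird version.
languages = LOAD 'en,fr,jp' USING com.twitter.elephantbird.pig.load.LocationAsTuple(',');

DUMP languages;
```

This at least keeps "Hello World" within a single script file, at the cost of depending on a jar that does not ship with Pig.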
