Thursday, January 24, 2013

ElephantBird now enables Hello World from Pig

While learning Apache Pig, I was surprised at how difficult it is to write "Hello World." I would have thought the code below would be legal:
-- Illegal Pig syntax
A = {('Hello'),('World')};

However, it produces syntax errors. The only way to create a relation, the basic "variable" in Pig, is by LOADing a file. Having been the "language lawyer" on every team I have worked on since the K&R era, I am bothered that a "Hello World" program is impossible to write within a single code file. It makes Pig Latin seem like an incomplete language.

The people at Twitter's ElephantBird project have come up with a custom solution in response to my request on the Pig User mailing list.

The ElephantBird Java class LocationAsTuple converts what would normally be the filename given to the LOAD command into a tuple. It is a hack. It works, but not without invoking code that is not distributed with Pig, and not without ugliness.
languages = load 'en,fr,jp' using LocationAsTuple(',');
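To make the trick concrete, here is a rough Python sketch of the idea as I understand it: the "location" string is never opened as a file, it is simply split on the delimiter and handed back as a single tuple. The function name location_as_tuple is my own invention for illustration, and the single-tuple result is an assumption based on the description above. This is not ElephantBird's actual implementation.

```python
def location_as_tuple(location: str, delimiter: str = ",") -> tuple:
    """Hypothetical stand-in for the loader: treat the LOAD 'location'
    string as data and split it into the fields of one tuple."""
    return tuple(location.split(delimiter))


# Mirrors the Pig statement above: the "filename" is really the data.
languages = location_as_tuple("en,fr,jp", ",")
print(languages)
```

The point is only that the data travels in the slot where Pig expects a path, which is why no input file is needed.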

