Tuesday, May 28, 2013

New in Hive 0.11: Windowing and analytics functions

It's not getting a lot of press, but Hive 0.11, released May 15, 2013, introduced a ton of new keywords. Hive 0.11 supports the SQL analytic functions similar to what popular RDBMS's introduced a decade ago with the SQL2003 standard, with the following new Hive keywords:

  • LEAD
  • LAG
  • FIRST_VALUE
  • LAST_VALUE
  • OVER
  • WINDOW
  • UNBOUNDED
  • PRECEDING
  • FOLLOWING
  • RANK
  • ROW_NUMBER
  • DENSE_RANK
  • CUME_DIST
  • PERCENT_RANK
  • NTILE

Sadly, the documentation is scanty, to say the least. For now, documentation and tutorials (e.g. sqlatoms.com) on equivalent keywords in Oracle and SQL Server, for example, will have to suffice.

Being able to window obviates some of the need for subqueries (specifically, pre-SQL2003 subqueries that just provided a limited GROUP BY in a separate SELECT column), which of course Hive does not support.

No comments: