Thursday, July 18, 2013

SERDEPROPERTIES required for Avro in Hive 0.11

To define an Avro-backed Hive table, both the Apache Hive wiki and the Cloudera Avro documentation say to specify the Avro schema in TBLPROPERTIES. This no longer works in Hive 0.11; the schema must now be given in SERDEPROPERTIES:

CREATE EXTERNAL TABLE mytable
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  WITH SERDEPROPERTIES (
  'avro.schema.url'='hdfs:///user/cloudera/mytable.avsc')
  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
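
One quick way to confirm that Hive actually picked up the schema is to describe the table and read a few rows. This is only a sketch: it assumes the mytable definition above and that the .avsc file exists at the given HDFS path.

```sql
-- The column list should mirror the fields declared in mytable.avsc
DESCRIBE mytable;

-- Reading rows forces the SerDe to resolve the schema at query time
SELECT * FROM mytable LIMIT 5;
```

If the schema was not found, the second statement is where I would expect the error to surface.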

If TBLPROPERTIES is used to specify the location of the Avro schema instead, Hive 0.11 cannot find it and throws the following exception:

java.io.IOException(org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Neither avro.schema.literal nor avro.schema.url specified, can't determine table schema)
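
For contrast, the older TBLPROPERTIES form that leads to this exception looks roughly like the sketch below. It reuses the same table name and schema path as the working statement above. The ALTER TABLE at the end is one possible way to repair a table that already exists; I have not verified it on every 0.11 build, so treat it as an assumption.

```sql
-- Older, documented form: schema location in TBLPROPERTIES.
-- On Hive 0.11 this is the variant that fails with the exception above.
CREATE EXTERNAL TABLE mytable
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES (
  'avro.schema.url'='hdfs:///user/cloudera/mytable.avsc');

-- Possible fix for an existing table (assumption, untested here):
-- set the same property on the SerDe instead of recreating the table.
ALTER TABLE mytable SET SERDEPROPERTIES (
  'avro.schema.url'='hdfs:///user/cloudera/mytable.avsc');
```

The only difference between the two CREATE statements is where 'avro.schema.url' lives: in the TBLPROPERTIES clause at the end, or in WITH SERDEPROPERTIES right after the SerDe class.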
