Tip 1: INSERT
While querying is straightforward using the dot notation familiar from Java, etc.:

SELECT z.a FROM oc;

INSERTing requires use of the peculiar named_struct:
echo 1,2 > twovalues.csv
hive
CREATE TABLE tc (x INT, y INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA LOCAL INPATH 'twovalues.csv' INTO TABLE tc;
CREATE TABLE oc (z STRUCT<a: int, b: int>);
INSERT INTO TABLE oc SELECT named_struct('a', x, 'b', y) FROM tc;
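Querying it back with the dot notation should then return the single loaded row:

SELECT z.a, z.b FROM oc;

which, given the 1,2 row loaded above, prints 1 and 2.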
Tip 2: NULLs
HiveQL DDL has no option in its CREATE TABLE syntax to specify that a column be NULLable. That's because in the Hadoop/HDFS/Big Data philosophy, data is in general allowed to be missing, and that translates nicely to the SQL concept of NULL in Hive/HiveQL. However, that doesn't extend to STRUCTs in HiveQL: apparently neither the columns within STRUCTs nor the STRUCTs themselves can be NULL.
Thus there is this odd restriction: while all regular Hive columns must be NULLable, all STRUCT Hive columns must be non-NULLable. That's quite a polarization of behavior for a DDL that doesn't even have a keyword "NULL".
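To see the NULLable half in action, here's a quick sketch using the tc table from Tip 1 (the onevalue.csv name is just for illustration): loading a delimited row that is missing its second field leaves that column NULL.

echo 5 > onevalue.csv
LOAD DATA LOCAL INPATH 'onevalue.csv' INTO TABLE tc;
-- the new row comes back as 5, NULL
SELECT * FROM tc WHERE y IS NULL;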
The following:
INSERT INTO TABLE oc SELECT null FROM tc;

produces the error
FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into target table because column number/types are different 'oc': Cannot convert column 0 from void to struct<a:int,b:int>.

And the following:
INSERT INTO TABLE oc SELECT named_struct('a', null, 'b', null) FROM tc;

produces the error:
FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into target table because column number/types are different 'oc': Cannot convert column 0 from struct<a:void,b:void> to struct<a:int,b:int>.
1 comment:
You can actually do this; it's just a major pain in the backside:
INSERT INTO TABLE oc SELECT named_struct('a', cast(null as int), 'b', cast(null as int)) FROM tc;
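Presumably this works because the CASTs give the NULL literals a concrete type, so the source expression is typed struct<a:int,b:int> rather than struct<a:void,b:void>, matching the target column; querying those rows back (SELECT z.a FROM oc;) should then return NULL.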