Software and Services Group
SQL (92 and Beyond) Support for Hive
Jason Dai, Principal Engineer
Intel SSG (Software and Services Group)

What SQL support is needed?
More SQL-92 support for analytics
  Complete SQL data type system
    –Data types (e.g., Datetime, fixed-precision numbers), type conversion rules and functions (CAST), Datetime expressions and functions (e.g., extract, +/- interval), etc.
  Full subquery support
    –Subquery in WHERE clauses, correlated subquery, scalar subquery, etc.
    –New expressions (EXISTS, ALL, ANY, etc.)
  Complete set operators
    –DISTINCT UNION, INTERSECT, EXCEPT, etc.
  Multiple-table SELECT statement
  Update/delete?
    –On HBase only?
(Almost) SQL-92 compliance?
  How about transactions?

What SQL support is needed (continued)?
Additional analytics support (beyond SQL-92)
  Advanced OLAP functions for analysis & reporting
    –E.g., rank, rollup, cube, window function (SQL 2003), etc.
  Advanced SQL syntax
    –E.g., WITH clause (SQL-99)
  Procedural extensions
    –E.g., Begin, End, If…Then…Else, Loop/Exit/Continue, etc.

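To make the "rank" bullet concrete, here is a minimal pure-Python sketch of what RANK() OVER (PARTITION BY ... ORDER BY ...) computes. The function name `rank_over`, its parameters, and the `sales` rows are hypothetical and chosen only for illustration; this is not how Hive or any SQL engine implements the function.

```python
from itertools import groupby
from operator import itemgetter


def rank_over(rows, partition_by, order_by, descending=True):
    """Return copies of rows with a 'rank' key, mimicking
    RANK() OVER (PARTITION BY partition_by ORDER BY order_by)."""
    ranked = []
    by_partition = itemgetter(partition_by)
    # Sort by partition key so groupby sees each partition as one run.
    for _, part in groupby(sorted(rows, key=by_partition), key=by_partition):
        ordered = sorted(part, key=itemgetter(order_by), reverse=descending)
        prev_value, prev_rank = object(), 0  # sentinel never equals real data
        for position, row in enumerate(ordered, start=1):
            # Ties share a rank; the next distinct value skips ahead (1, 1, 3).
            rank = prev_rank if row[order_by] == prev_value else position
            ranked.append({**row, "rank": rank})
            prev_value, prev_rank = row[order_by], rank
    return ranked


# Hypothetical sample data: two departments, one tie in department "a".
sales = [
    {"dept": "a", "emp": "x", "amount": 300},
    {"dept": "a", "emp": "y", "amount": 300},
    {"dept": "a", "emp": "z", "amount": 100},
    {"dept": "b", "emp": "w", "amount": 50},
]
ranked_sales = rank_over(sales, "dept", "amount")
```

The gap after a tie (1, 1, 3) is what separates RANK from DENSE_RANK, and it is the kind of per-partition state that a plain aggregate cannot express.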
Workload Analysis
Columns: TPC-H, TPC-DS
Complex subquery: Y Y
Multiple-table SELECT: Y Y
Set operators: Y
SQL data types (especially Datetime): Y Y
Advanced OLAP functions (e.g., rank, grouping and window functions): Y
WITH clause (SQL-99): Y
UPDATE/DELETE: Y

Let’s Get Our Hands Dirty
[Diagram: Query -> Parser -> AST (Abstract Syntax Tree) -> Semantic Analyzer (Optimizer) -> Execution Plan -> Execution]
(Almost) SQL-compliant Hive parser
  A lot of work: SQL is much more complex than HiveQL
    –HiveQL grammar file: ~61KB with 2487 lines
    –SQL (with PL/SQL extensions) grammar file: ~524KB with 8583 lines
  Also complex: many existing Hive grammar rules need to be changed
    –To support more complex SQL constructs (e.g., subquery)
UDF/UDAF/UDTF
  For some operators (e.g., rank)

Let’s Get Our Hands Dirty
[Diagram: Query -> Parser -> AST (Abstract Syntax Tree) -> Semantic Analyzer (Optimizer) -> Execution Plan -> Execution]
Analysis, transformation & optimization
  SQL data type system
  Subquery support (incl. subquery unnesting)
  Multiple-table SELECT
  Set operations
  Advanced OLAP functions
  …

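Subquery unnesting can be illustrated with a small sketch. The slides do not say which rewrite rules are used, so the example below shows one textbook rewrite only: a correlated EXISTS subquery turned into a join against the deduplicated join keys. SQLite (via Python's standard `sqlite3` module) stands in for Hive here, and the `customers` and `orders` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, total INTEGER);
    INSERT INTO customers VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO orders VALUES (1, 50), (1, 70), (3, 20);
""")

# Original form: a correlated subquery in the WHERE clause.
correlated = """
    SELECT c.name FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY c.name
"""

# Unnested form: a join on the distinct keys, so a customer with
# several orders still appears once (semi-join semantics).
unnested = """
    SELECT c.name FROM customers c
    JOIN (SELECT DISTINCT customer_id FROM orders) o
      ON o.customer_id = c.id
    ORDER BY c.name
"""

correlated_rows = conn.execute(correlated).fetchall()
unnested_rows = conn.execute(unnested).fetchall()
```

Both queries return the same customers on this data. The unnested form uses only joins and a derived table, which is the kind of query shape a join-oriented engine can already plan.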
How to Leverage Existing Work?
Project Panthera: our open source efforts to enable better analytics capabilities on Hadoop/HBase
A SQL engine for Hive MapReduce
  Goal: full analytical SQL support for OLAP
    –Subquery in WHERE clause
    –Correlated subquery
    –Multiple-table SELECT statement
    –…
[Diagram: Query -> Driver; HiveQL path: Hive Parser -> Hive-AST; SQL path: (Open Source) SQL Parser* -> SQL-AST -> SQL-AST Analyzer & Translator (Multi-Table SELECT, Subquery Unnesting, …) -> Hive-AST; then Hive Semantic Analyzer (INTERSECT Support, MINUS Support, …) -> Hadoop MR]

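The INTERSECT and MINUS entries in the diagram can be related to plain joins with a short sketch. This is an assumption about how such operators can be expressed, not a description of Panthera's implementation: INTERSECT is written as a distinct inner join, and MINUS (spelled EXCEPT in SQLite, which stands in for Hive here) as a left outer join that keeps unmatched rows. The tables `a` and `b` are hypothetical, and the equivalence assumes no NULLs in the compared column, because set operators treat NULLs as equal while join predicates do not.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")


def rows(sql):
    """Run a query and return all result rows as a list of tuples."""
    return db.execute(sql).fetchall()


# INTERSECT: values present in both inputs, duplicates removed.
intersect_native = rows("SELECT x FROM a INTERSECT SELECT x FROM b ORDER BY x")
intersect_as_join = rows(
    "SELECT DISTINCT a.x FROM a JOIN b ON a.x = b.x ORDER BY a.x"
)

# MINUS / EXCEPT: values in a with no match in b, duplicates removed.
minus_native = rows("SELECT x FROM a EXCEPT SELECT x FROM b ORDER BY x")
minus_as_join = rows(
    "SELECT DISTINCT a.x FROM a LEFT OUTER JOIN b ON a.x = b.x "
    "WHERE b.x IS NULL ORDER BY a.x"
)
```

On this data each join form returns the same rows as the native set operator.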
How to Leverage Existing Work?
NexR Hive UDFs
  –UDFs for Oracle DB extensions (rank, decode, nvl, etc.)
SQL windowing functions for Hive