Random Query Generator for Hive November 2015 Hive Contributor Meetup Szehon Ho.


1 Random Query Generator for Hive
November 2015 Hive Contributor Meetup
Szehon Ho

2 Overview
© 2014 Cloudera, Inc. All rights reserved.
- Collaboration with the Impala team; work to run against Hive
- Automates generation of test cases, which solves:
  - Humans can only generate so many test queries
  - Humans focus on positive queries (what about machine-generated queries?)
- The idea is to have two databases: a test database (Hive, Impala) and a reference database (Postgres, Mysql, Oracle)
- Generate random data, then issue random queries against both

3 Data Generator
- Table-count (max, min)
- Column-count (max, min)
- Row-count (max, min)
- Column data types:
  Boolean    Float
  TinyInt    Decimal(r_precision, r_scale)
  SmallInt   Char(r_length)
  BigInt     Varchar(r_length)
  Double     Timestamp
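The data generator's knobs can be illustrated with a minimal Python sketch. This is not the tool's actual code: the names `Column`, `Table`, `random_type_spec` and `generate_schema` are hypothetical, and the precision and length ranges are assumptions. Only the (min, max) count parameters and the type names come from the slide.

```python
import random
from dataclasses import dataclass

# Type names taken from the slide; the parameter ranges below are assumptions.
COLUMN_TYPES = ["Boolean", "TinyInt", "SmallInt", "BigInt", "Double",
                "Float", "Decimal", "Char", "Varchar", "Timestamp"]


@dataclass
class Column:
    name: str
    type_spec: str


@dataclass
class Table:
    name: str
    columns: list
    row_count: int


def random_type_spec(rng):
    """Pick a type and, where needed, a random precision/scale/length."""
    type_name = rng.choice(COLUMN_TYPES)
    if type_name == "Decimal":
        precision = rng.randint(1, 38)      # assumed upper bound
        scale = rng.randint(0, precision)   # scale never exceeds precision
        return f"Decimal({precision},{scale})"
    if type_name in ("Char", "Varchar"):
        return f"{type_name}({rng.randint(1, 255)})"  # assumed length range
    return type_name


def generate_schema(rng, table_count, column_count, row_count):
    """Each *_count argument is a (min, max) pair."""
    tables = []
    for t in range(rng.randint(*table_count)):
        columns = [Column(f"col_{c}", random_type_spec(rng))
                   for c in range(rng.randint(*column_count))]
        tables.append(Table(f"table_{t}", columns, rng.randint(*row_count)))
    return tables


# A seeded generator makes a failing run reproducible.
schema = generate_schema(random.Random(0), (1, 3), (2, 5), (10, 100))
```

Passing a seeded `random.Random` rather than using the module-level functions is a design choice of this sketch: it lets the same schema be regenerated when a discrepancy needs to be investigated.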

4 Query Generator
1. Generate a QueryModel based on a QueryProfile
2. Use a ModelTranslator to translate the model into the database's SQL dialect
3. Execute the SQL via DbConnectors
4. Compare the results (sort if unsorted)
[Diagram: a QueryModel, built from a HiveProfile or ImpalaProfile, is passed to HiveTranslator, PostgresTranslator and MysqlTranslator. The output is HiveQL for the "test databases" and SQL in the Postgres and Mysql dialects for the "reference databases".]
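Step 4 can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's implementation: two in-memory SQLite databases stand in for the test and reference databases, and `sort_key`, `results_match` and `make_db` are hypothetical names. The sketch also assumes both databases return the same Python type for a given column.

```python
import sqlite3


def sort_key(row):
    # (is_null, value) pairs let None sort consistently next to real values.
    return [(v is None, v) for v in row]


def results_match(test_rows, ref_rows, ordered):
    """Compare two result sets; sort both first if the query had no ORDER BY."""
    test_rows = [tuple(r) for r in test_rows]
    ref_rows = [tuple(r) for r in ref_rows]
    if not ordered:
        test_rows = sorted(test_rows, key=sort_key)
        ref_rows = sorted(ref_rows, key=sort_key)
    return test_rows == ref_rows


def make_db(rows):
    """Stand-in for a DbConnector: an in-memory SQLite database with one table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    return conn


data = [(1, "x"), (2, None), (3, "y")]
test_db = make_db(data)
ref_db = make_db(list(reversed(data)))  # same rows, inserted in a different order

sql = "SELECT a, b FROM t"  # step 2 would produce one string per dialect
test_rows = test_db.execute(sql).fetchall()
ref_rows = ref_db.execute(sql).fetchall()
unordered_ok = results_match(test_rows, ref_rows, ordered=False)
```

Sorting only when the query has no ORDER BY matters: for an ordered query, the row order is itself part of the result being tested.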

5 Query Model, High Level
[Diagram boxes: Query, Clause, Expr (Constant/Col, Funcs, Table)]
- Represents a valid SQL query
- A query consists of one or more clauses (from, select, group-by, union)
- A clause has one or more expressions (constants, columns, functions of columns, tables), which differ by clause type
- The model is recursive in nature:
  - Funcs can be run on the output of other funcs
  - A union clause can contain another query
  - Some boolean funcs can contain a subquery
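The recursive shape of the model can be sketched with a few Python dataclasses. The class and function names here are hypothetical and much simplified (only a select list, one table, and an optional union; subqueries inside boolean funcs are left out); they are not the generator's real classes.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Column:
    name: str


@dataclass
class Constant:
    value: object


@dataclass
class Func:
    name: str
    args: List["Expr"]  # args may themselves be Funcs (recursion)


Expr = Union[Column, Constant, Func]


@dataclass
class Query:
    select: List[Expr]
    from_table: str
    union: Optional["Query"] = None  # a union clause holds another Query


def expr_sql(expr):
    """Render an expression, recursing into function arguments."""
    if isinstance(expr, Column):
        return expr.name
    if isinstance(expr, Constant):
        return repr(expr.value)  # simplification: Python repr as the SQL literal
    return f"{expr.name}({', '.join(expr_sql(a) for a in expr.args)})"


def query_sql(query):
    """Render a query, recursing into the union clause if present."""
    sql = f"SELECT {', '.join(expr_sql(e) for e in query.select)} FROM {query.from_table}"
    if query.union is not None:
        sql += " UNION ALL " + query_sql(query.union)
    return sql


q = Query([Func("abs", [Func("floor", [Column("a")])])], "t1",
          union=Query([Constant(1)], "t2"))
rendered = query_sql(q)
# rendered == "SELECT abs(floor(a)) FROM t1 UNION ALL SELECT 1 FROM t2"
```

In this sketch, a per-dialect translator would amount to a different pair of rendering functions walking the same model.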

6 Query Model, Funcs
Func types:
- Boolean funcs (isnull, and, or, in, =, !=, >, <)
- Subquery funcs (exists, not exists, in, not in): may contain another Query
- Val funcs (Trim, Length, Concat, Add, Abs, Floor, Ceil, Greatest, Least, etc.)
- Agg funcs (e.g. Max, Min, Sum, Avg, Count)
- Analytic funcs (Rank, DenseRank, RowNumber, Lead, Lag, FirstValue, LastValue, Max, Min, etc.)
  - Window specification ("rows between x and y", "rows unbounded preceding", etc.)
  - PartitionByClause ("over (partition by x)")
  - OrderByClause
Rules determine where a func can be used, based on func type and return type
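One way such placement rules might look is sketched below in Python. The rule table, the three simplified return types, and the names `FuncSig`, `ALLOWED_KINDS`, `candidates` and `pick_func` are all assumptions made for illustration; the slides do not show the actual rules. The sketch relies only on the standard SQL restrictions that aggregate and analytic functions cannot appear in WHERE and that analytic functions cannot appear in HAVING.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class FuncSig:
    name: str
    kind: str         # "boolean", "val", "agg" or "analytic"
    return_type: str  # simplified: "boolean", "number" or "string"


FUNCS = [
    FuncSig("IsNull", "boolean", "boolean"),
    FuncSig("Length", "val", "number"),
    FuncSig("Concat", "val", "string"),
    FuncSig("Abs", "val", "number"),
    FuncSig("Sum", "agg", "number"),
    FuncSig("Rank", "analytic", "number"),
]

# Hypothetical rule table: which func kinds each clause may contain.
ALLOWED_KINDS = {
    "select": {"boolean", "val", "agg", "analytic"},
    "where": {"boolean", "val"},
    "having": {"boolean", "val", "agg"},
}


def candidates(clause, return_type):
    """Funcs that are legal in the clause and produce the required type."""
    return [f for f in FUNCS
            if f.kind in ALLOWED_KINDS[clause] and f.return_type == return_type]


def pick_func(rng, clause, return_type):
    """Randomly choose one legal func for this position in the query."""
    return rng.choice(candidates(clause, return_type))


where_numeric = [f.name for f in candidates("where", "number")]
chosen = pick_func(random.Random(0), "having", "number")
```

Filtering on both kind and return type keeps a randomly generated query well-typed, so a failure points at the database under test rather than at the generator.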

7 QueryModel: Clauses
QueryModel:
- WithClause
- SelectClause
- FromClause: Table Expression
- WhereClause: Predicate (Boolean expr)
- GroupByClause: if Select (Basic or AggFunc)
- HavingClause: if Select (AggFunc); Predicate (Boolean expr)
- UnionClause (Query)
- OrderByClause
- LimitClause
SelectClause, list of Exprs:
- Constant
- Col
- Val Funcs
- AggFunc
- AnalyticFunc
  - Window
  - PartitionByClause
  - OrderByClause
WithClause: adds a table expression: "With bar as (select * from foo) select * from bar;"
GroupByClause, list of: Constant, Col
OrderByClause, list of: Constant, Col, Func

8 QueryModel: Joins
QueryModel:
- WithClause
- SelectClause
- FromClause: multiple table expressions
  - JoinClause (defines the table relationship)
- WhereClause: Predicate (Boolean function, using exprs from tables in the JoinClause)
- GroupByClause
- HavingClause
JoinClause types: Inner, Left, Right, Left semi, Right semi, Right anti, Full outer, Cross

9 Demo

10 Results 1: HiveQL Discrepancies
Language deficiencies (as of Hive 1.1):
- Missing support for "Interval" in date arithmetic operations: date + INTERVAL expr unit
- With {…} cannot be used in a subquery
- Having must have a group by
- Cannot sort by two expressions in a window function, unless a window is specified
- Negative lag or lead amount not allowed
- Only "Union all" and not "Union" (since fixed)
Null ordering:
- Hive lacks a way to specify null order (its default is the opposite of Postgres)
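The null-ordering discrepancy affects result comparison for any query with an ORDER BY on a nullable column. The Python sketch below only illustrates the two behaviours; `order_by` is a hypothetical helper, not part of the tool. PostgreSQL documents NULLS LAST as its default for ascending order, and the slide states that Hive's default is the opposite.

```python
def order_by(values, nulls_first):
    """Ascending sort with NULLs (None) placed first or last."""
    def key(v):
        is_null = v is None
        # False sorts before True, so flip the flag to move NULLs to the front.
        flag = (not is_null) if nulls_first else is_null
        return (flag, 0 if is_null else v)
    return sorted(values, key=key)


column = [3, None, 1]
nulls_first_rows = order_by(column, nulls_first=True)   # [None, 1, 3]
nulls_last_rows = order_by(column, nulls_first=False)   # [1, 3, None]
```

Both lists hold the same rows, yet a naive ordered comparison would report a mismatch, so a comparison harness has to account for each database's null placement before comparing ordered results.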

11 Results 2: JIRAs so far
Many valid issues found, fixed since 1.1:
- HIVE-12082: Null comparison for greatest and least operator
- HIVE-12070: Relax type restrictions on 'Greatest' and 'Least'
- HIVE-11737: IndexOutOfBounds compiling query with duplicated groupby keys
- HIVE-11712: Duplicate groupby keys cause ClassCastException
- HIVE-11835: Type decimal(1,1) reads 0.0, 0.00, etc. from text file as NULL
- HIVE-12296: ClassCastException when selecting constant in inner select (pending)

12 Going Forward
- Tackle non-SQL-92 query support
- Nested types
- Partitioned tables
- Multi-insert

13 Thank you.

