Random Query Generator for Hive
November 2015 Hive Contributor Meetup
Szehon Ho



© 2014 Cloudera, Inc. All rights reserved.

Slide 2: Overview

- Collaboration with the Impala team; work to run the generator against Hive.
- Automates generation of test cases, which solves two problems:
  - Humans can only generate so many test queries.
  - Humans focus on positive queries (what about machine-generated queries?).
- The idea is to have two databases: a test database (Hive, Impala) and a reference database (Postgres, Mysql, Oracle).
- Generate random data, then issue random queries against both.
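The two-database idea can be sketched in a few lines. This is only an illustration, not the generator's actual code: two in-memory SQLite databases stand in for the test and reference databases, and the helper names make_db and random_query are made up for the example.

```python
import random
import sqlite3

def make_db(rows):
    """Create an in-memory database holding the shared random rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    return conn

def random_query(rng):
    """Build one random but valid aggregate query over table t."""
    column = rng.choice(["a", "b"])
    operator = rng.choice(["<", ">", "=", "!="])
    agg = rng.choice(["COUNT", "MIN", "MAX", "SUM"])
    return f"SELECT {agg}({column}) FROM t WHERE {column} {operator} {rng.randint(0, 9)}"

rng = random.Random(2015)
rows = [(rng.randint(0, 9), rng.randint(0, 9)) for _ in range(50)]
test_db = make_db(rows)       # stand-in for Hive or Impala (assumption)
reference_db = make_db(rows)  # stand-in for Postgres, Mysql or Oracle (assumption)

# Issue the same random queries against both and record any disagreement.
mismatches = []
for _ in range(100):
    sql = random_query(rng)
    if test_db.execute(sql).fetchall() != reference_db.execute(sql).fetchall():
        mismatches.append(sql)

print(f"{len(mismatches)} mismatching queries out of 100")
```

Because both stand-ins are the same engine, no mismatch is expected here; with two different engines, every entry in mismatches would be a candidate bug or dialect difference.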

Slide 3: Data Generator

Settings:
- Table-count (max, min)
- Column-count (max, min)
- Row-count (max, min)

Column data types:
  Boolean     Float
  TinyInt     Decimal(r_precision, r_scale)
  SmallInt    Char(r_length)
  BigInt      Varchar(r_length)
  Double      Timestamp
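A rough sketch of how such settings could drive schema generation follows. The function names, the default ranges, and the upper bounds chosen for precision and length are assumptions made for the example, not values from the talk.

```python
import random

# Type names follow the slide; the parameterised ones get random arguments.
SIMPLE_TYPES = ["Boolean", "TinyInt", "SmallInt", "BigInt", "Double", "Float", "Timestamp"]

def random_type(rng):
    """Pick a column type, drawing r_precision, r_scale or r_length at random."""
    kind = rng.choice(SIMPLE_TYPES + ["Decimal", "Char", "Varchar"])
    if kind == "Decimal":
        precision = rng.randint(1, 38)     # assumed upper bound
        scale = rng.randint(0, precision)  # scale never exceeds precision
        return f"Decimal({precision},{scale})"
    if kind in ("Char", "Varchar"):
        return f"{kind}({rng.randint(1, 255)})"  # assumed upper bound
    return kind

def random_schema(rng, table_count=(1, 3), column_count=(1, 5), row_count=(0, 100)):
    """Each argument is a (min, max) pair, mirroring the generator settings."""
    tables = []
    for t in range(rng.randint(*table_count)):
        columns = [(f"col_{c}", random_type(rng)) for c in range(rng.randint(*column_count))]
        tables.append({"name": f"table_{t}", "columns": columns, "rows": rng.randint(*row_count)})
    return tables

schema = random_schema(random.Random(0))
for table in schema:
    print(table["name"], table["rows"], table["columns"])
```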

Slide 4: Query Generator

1. Generate a QueryModel based on a QueryProfile (HiveProfile, ImpalaProfile).
2. A ModelTranslator translates the model into the database’s SQL dialect.
3. Execute the SQL via DbConnectors.
4. Compare the results (sort first if unsorted).

Translators and their output:
- “Test databases”: HiveTranslator produces HiveQL.
- “Reference databases”: PostgresTranslator produces SQL (Postgres dialect); MysqlTranslator produces SQL (Mysql dialect).
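The translate and compare steps might look roughly like the sketch below. The class names echo the slide, but the bodies are hypothetical: the model is reduced to a table and a column list, and identifier quoting is used as the only dialect difference.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class QueryModel:
    """Hypothetical, heavily reduced model: one table and some columns."""
    table: str
    columns: List[str]

class Translator:
    quote = '"'

    def to_sql(self, model):
        """Render the dialect-neutral model as SQL text for one dialect."""
        cols = ", ".join(f"{self.quote}{c}{self.quote}" for c in model.columns)
        return f"SELECT {cols} FROM {self.quote}{model.table}{self.quote}"

class PostgresTranslator(Translator):
    quote = '"'  # Postgres quotes identifiers with double quotes

class HiveTranslator(Translator):
    quote = "`"  # HiveQL quotes identifiers with backticks

class MysqlTranslator(Translator):
    quote = "`"  # Mysql quotes identifiers with backticks

def results_match(test_rows, reference_rows, ordered=False):
    """Compare two result sets, sorting both first when the query had no ORDER BY."""
    if not ordered:
        test_rows = sorted(test_rows, key=repr)
        reference_rows = sorted(reference_rows, key=repr)
    return test_rows == reference_rows

model = QueryModel(table="t", columns=["a", "b"])
for translator in (HiveTranslator(), PostgresTranslator(), MysqlTranslator()):
    print(type(translator).__name__, "->", translator.to_sql(model))
```

Sorting by repr is a shortcut that keeps rows containing NULLs comparable in Python; a real comparer would need type-aware handling.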

Slide 5: Query Model, High Level

Hierarchy: Query > Clause > Expr (Constant/Col, Funcs, Table)

- Represents a valid SQL query.
- A query consists of one or more clauses (from, select, group-by, union).
- A clause has one or more expressions (constants, columns, functions of columns, tables); these differ for different clause types.
- The model is recursive in nature:
  - Funcs can be run on the output of other funcs.
  - A union clause can contain another query.
  - Some boolean funcs can contain a subquery.
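The recursive shape of the model can be illustrated with a handful of dataclasses. This is a simplified sketch: the class and field names are invented for the example and cover only the select, from and union clauses.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

@dataclass
class Column:
    name: str

@dataclass
class Constant:
    value: int

@dataclass
class Func:
    name: str
    args: List["Expr"]  # arguments may themselves be Funcs

Expr = Union[Column, Constant, Func]

@dataclass
class Query:
    select: List[Expr]
    from_table: str
    union: Optional["Query"] = None  # a union clause holds another Query

def render(node):
    """Turn a model node into SQL text by walking it recursively."""
    if isinstance(node, Column):
        return node.name
    if isinstance(node, Constant):
        return str(node.value)
    if isinstance(node, Func):
        return f"{node.name}({', '.join(render(a) for a in node.args)})"
    sql = f"SELECT {', '.join(render(e) for e in node.select)} FROM {node.from_table}"
    if node.union is not None:
        sql += " UNION ALL " + render(node.union)
    return sql

query = Query(
    select=[Func("abs", [Func("floor", [Column("x")])])],
    from_table="foo",
    union=Query(select=[Constant(1)], from_table="bar"),
)
print(render(query))
```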

Slide 6: Query Model, Funcs

Func types:
- Boolean funcs (isnull, and, or, in, =, !=, >, <)
- Subquery funcs (exists, not exists, in, not in): may contain another Query
- Val funcs (Trim, Length, Concat, Add, Abs, Floor, Ceil, Greatest, Least, etc.)
- Agg funcs (e.g. Max, Min, Sum, Avg, Count)
- Analytic funcs (Rank, DenseRank, RowNumber, Lead, Lag, FirstValue, LastValue, Max, Min, etc.)
  - Window specification (“rows between x and y”, “rows unbounded preceding”, etc.)
  - PartitionByClause (“over (partition by x)”)
  - OrderByClause

Rules determine where a func can be used, based on func type and return type.
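One plausible reading of the return-type rule is that the generator asks for an expression of a given type and only considers funcs that return it. The sketch below works that way; the signature table, the column names and the 30% chance of stopping at a column are all invented for illustration.

```python
import random

# Hypothetical signature table: func name -> (argument types, return type).
FUNCS = {
    "length": (["string"], "int"),
    "abs": (["int"], "int"),
    "concat": (["string", "string"], "string"),
    "trim": (["string"], "string"),
    "isnull": (["int"], "boolean"),
}
# Hypothetical columns available per type; note there is no boolean column.
COLUMNS = {"int": ["int_col"], "string": ["str_col"]}

def random_expr(rng, return_type, depth=2):
    """Build an expression of the requested type, nesting funcs up to `depth`."""
    candidates = sorted(name for name, (_, ret) in FUNCS.items() if ret == return_type)
    # Fall back to a plain column when nesting is exhausted, no func fits, or by chance.
    use_column = return_type in COLUMNS and (
        depth <= 0 or not candidates or rng.random() < 0.3
    )
    if use_column:
        return rng.choice(COLUMNS[return_type])
    name = rng.choice(candidates)
    arg_types, _ = FUNCS[name]
    args = ", ".join(random_expr(rng, t, depth - 1) for t in arg_types)
    return f"{name}({args})"

rng = random.Random(7)
for wanted in ("boolean", "int", "string"):
    print(wanted, "->", random_expr(rng, wanted))
```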

Slide 7: QueryModel: Clauses

QueryModel:
- WithClause
- SelectClause
- FromClause: Table Expression
- WhereClause: Predicate (Boolean expr)
- GroupByClause: if Select (Basic or AggFunc)
- HavingClause: if Select (AggFunc); Predicate (Boolean expr)
- UnionClause (Query)
- OrderByClause
- LimitClause

SelectClause, list of Exprs:
- Constant
- Col
- Val Funcs
- AggFunc
- AnalyticFunc: Window, PartitionByClause, OrderByClause

WithClause: adds a table expression, e.g. “With bar as (select * from foo) select * from bar;”

GroupByClause, list of: Constant, Col

OrderByClause, list of: Constant, Col, Func

Slide 8: QueryModel: Joins

QueryModel:
- WithClause
- SelectClause
- FromClause: multiple table expressions
  - JoinClause (defines the table relationship)
- WhereClause: Predicate (Boolean function, using exprs from tables in the JoinClause)
- GroupByClause
- HavingClause

JoinClause types: Inner, Left, Right, Left semi, Right semi, Right anti, Full outer, Cross

Slide 9: Demo

Slide 10: Results 1: HiveQL Discrepancies

Language deficiencies (as of Hive 1.1):
- No support for “Interval” in date arithmetic operations: date + INTERVAL expr unit
- With {…} cannot be used in a subquery
- Having must have a group by
- Cannot sort by two expressions in a window function, unless a window is specified
- Negative lag or lead amount not allowed
- Only “Union all” and not “Union” (since fixed)

Null ordering:
- Hive lacks a way to specify null order, and its default is the opposite of Postgres
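One way a result comparer could cope with the null-ordering difference is to re-sort both result sets with a fixed null placement before comparing them. The snippet is a minimal sketch under that assumption; the helper name nulls_first_key and the sample rows are made up, and the talk does not say how the tool itself handles this.

```python
def nulls_first_key(row):
    """Sort key that places NULLs (None) before any value, column by column."""
    return tuple((value is not None, value) for value in row)

# Made-up results of the same ascending sort on two engines with opposite
# default null placement.
nulls_first_rows = [(None,), (1,), (2,)]
nulls_last_rows = [(1,), (2,), (None,)]

print("raw results equal:", nulls_first_rows == nulls_last_rows)
print(
    "equal after normalising:",
    sorted(nulls_first_rows, key=nulls_first_key) == sorted(nulls_last_rows, key=nulls_first_key),
)
```

Re-sorting discards the check on the engine's own ordering, so this works around the discrepancy without testing ORDER BY itself.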

Slide 11: Results 2: JIRAs so far

Many valid issues found, fixed since 1.1:
- HIVE : Null comparison for greatest and least operator
- HIVE : Relax type restrictions on ‘Greatest’ and ‘Least’
- HIVE-11737: IndexOutOfBounds compiling query with duplicated groupby keys
- HIVE-11712: Duplicate groupby keys cause ClassCastException
- HIVE-11835: Type decimal(1,1) reads 0.0, 0.00, etc. from text file as NULL
- HIVE : ClassCastException when selecting constant in inner select (pending)

Slide 12: Going Forward

Tackle non-SQL-92 query support:
- Nested types
- Partitioned tables
- Multi-insert

Thank you.