Scaling SQL with different approaches


Credits: Luca Canali

2nd attempt with Oracle: it scales up!
- The initial query is CPU-bound: a big table join, 39M x 10M ===> 374M rows
- By default it is performed on a single core
- It can be sped up significantly by using more cores – parallel query
- IMPORTANT: by default, parallel queries are disabled on Oracle production clusters
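Since parallel execution is disabled by default on these clusters, it has to be switched on per session before hinted queries fan out. A minimal sketch (the degree of parallelism here is illustrative, not the value used on the next slide):

alter session enable parallel query;
-- request a parallel plan via a hint; parallel(8) is an illustrative degree
select /*+ parallel(8) */ count(*) from csdb.csdb;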

Parallel query on a 60-core machine

Creating a view with the timestamps of interest, then joining the view with the fact table:

with fundamentals as (
    select /*+ materialize parallel(120) */ distinct service_id, valid_from dates
    from csdb.csdb)
select /*+ parallel(120) */ a.service_id, max(a.devs_per_serv_and_fdate)
from (
    select f.service_id, count(1) devs_per_serv_and_fdate
    from csdb.csdb c, fundamentals f
    where f.service_id = c.service_id
      and f.dates between c.valid_from and c.valid_to
    group by f.service_id, f.dates
) a
group by a.service_id
order by a.service_id;

Elapsed: 00:15:21.13
Efficiency: 374M rows on 120 cores = 3.3k rows/s/core

Execution plan

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                               | Name                           | Rows | Bytes | Cost (%CPU)| TQ    |IN-OUT| PQ Distrib |
|  0 | SELECT STATEMENT                        |                                |    1 |    26 |     8 (13) |       |      |            |
|  1 |  TEMP TABLE TRANSFORMATION              |                                |      |       |            |       |      |            |
|  2 |   PX COORDINATOR                        |                                |      |       |            |       |      |            |
|  3 |    PX SEND QC (RANDOM)                  | :TQ10001                       |    1 |    22 |     5 (20) | Q1,01 | P->S | QC (RAND)  |
|  4 |     LOAD AS SELECT (TEMP SEGMENT MERGE) | SYS_TEMP_0FD9D6F7C_D780440F    |      |       |            | Q1,01 | PCWP |            |
|  5 |      HASH UNIQUE                        |                                |    1 |    22 |     5 (20) | Q1,01 | PCWP |            |
|  6 |       PX RECEIVE                        |                                |    1 |    22 |     5 (20) | Q1,01 | PCWP |            |
|  7 |        PX SEND HASH                     | :TQ10000                       |    1 |    22 |     5 (20) | Q1,00 | P->P | HASH       |
|  8 |         HASH UNIQUE                     |                                |    1 |    22 |     5 (20) | Q1,00 | PCWP |            |
|  9 |          PX BLOCK ITERATOR              |                                |    1 |    22 |     4  (0) | Q1,00 | PCWC |            |
| 10 |           TABLE ACCESS FULL             | F2P_IP_INFO_EXT                |    1 |    22 |     4  (0) | Q1,00 | PCWP |            |
| 11 |  PX COORDINATOR                         |                                |      |       |            |       |      |            |
| 12 |   PX SEND QC (ORDER)                    | :TQ20002                       |    1 |    26 |     4  (0) | Q2,02 | P->S | QC (ORDER) |
| 13 |    SORT GROUP BY                        |                                |    1 |    26 |     4  (0) | Q2,02 | PCWP |            |
| 14 |     PX RECEIVE                          |                                |    1 |    26 |     4  (0) | Q2,02 | PCWP |            |
| 15 |      PX SEND RANGE                      | :TQ20001                       |    1 |    26 |     4  (0) | Q2,01 | P->P | RANGE      |
| 16 |       HASH GROUP BY                     |                                |    1 |    26 |     4  (0) | Q2,01 | PCWP |            |
| 17 |        VIEW                             |                                |    1 |    26 |     4  (0) | Q2,01 | PCWP |            |
| 18 |         HASH GROUP BY                   |                                |    1 |    53 |     4  (0) | Q2,01 | PCWP |            |
| 19 |          PX RECEIVE                     |                                |    1 |    53 |     4  (0) | Q2,01 | PCWP |            |
| 20 |           PX SEND HASH                  | :TQ20000                       |    1 |    53 |     4  (0) | Q2,00 | P->P | HASH       |
| 21 |            HASH GROUP BY                |                                |    1 |    53 |     4  (0) | Q2,00 | PCWP |            |
| 22 |             NESTED LOOPS                |                                |    1 |    53 |     4  (0) | Q2,00 | PCWP |            |
| 23 |              NESTED LOOPS               |                                |    1 |    53 |     4  (0) | Q2,00 | PCWP |            |
| 24 |               VIEW                      |                                |    1 |    22 |     4  (0) | Q2,00 | PCWP |            |
| 25 |                PX BLOCK ITERATOR        |                                |    1 |    22 |     4  (0) | Q2,00 | PCWC |            |
| 26 |                 TABLE ACCESS FULL       | SYS_TEMP_0FD9D6F7C_D780440F    |    1 |    22 |     4  (0) | Q2,00 | PCWP |            |
|* 27|              INDEX RANGE SCAN           | F2P_IP_INFO_EXT_SERVICE_ID_IDX |    1 |       |     0  (0) | Q2,00 | PCWP |            |
|* 28|             TABLE ACCESS BY INDEX ROWID | F2P_IP_INFO_EXT                |    1 |    31 |     0  (0) | Q2,00 | PCWP |            |

Slide annotations: the first part of the plan creates a temp table with the timestamps; the second part then, for each subquery, takes part of the temp table and joins it with the fact table (using the index).
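For reference, PLAN_TABLE_OUTPUT of this form comes from DBMS_XPLAN; a minimal sketch of how to reproduce it (shown here with a simplified query rather than the full statement above):

explain plan for
  select /*+ parallel(8) */ count(*) from csdb.csdb;
select * from table(dbms_xplan.display);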

Need more resources? Use Hadoop!
It is not only because of horizontal scalability…

Typical options available for data ingestion into Hadoop
- Apache Sqoop
  - uses JDBC – many databases supported
  - can write in Avro or Parquet formats directly
- Kite
  - from JSON or CSV
- Custom scripts
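As an illustration, a hedged sketch of a Sqoop import that writes Parquet directly to HDFS (the JDBC URL, credentials, table name, and paths are placeholders, not values from this study):

sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/service \
  --username scott -P \
  --table CSDB \
  --as-parquetfile \
  --target-dir /user/etl/csdb \
  --num-mappers 4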

Options for querying the data on Hadoop?
Querying the data with SQL (declarative):
- Apache Hive
- Apache Impala
- Apache SparkSQL

Approach with Impala

Create a Hive table:

create external table csdb
like parquet 'hdfspath'
stored as parquet
location 'hdfsdir';

Query it (the SQL statement is unchanged!):

with fundamentals as (
    select distinct valid_from dates, service_id from csdb)
select a.service_id, max(c)
from (
    select c.service_id, f.dates, count(1) c
    from csdb c, fundamentals f
    where f.service_id = c.service_id
      and f.dates between c.valid_from and c.valid_to
    group by c.service_id, f.dates
) a
group by a.service_id
order by 1;

Impala’s runtime execution plan

Operator             #Hosts  Avg Time   Max Time   #Rows   Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------
14:MERGING-EXCHANGE  0       0.000ns    0.000ns    0       -1          0          -1.00 B        UNPARTITIONED
06:SORT              4       6.346ms    8.682ms    0       -1          24.00 MB   0
13:AGGREGATE         4       151.256ms  158.305ms  0       -1          2.25 MB    128.00 MB      FINALIZE
12:EXCHANGE          4       0.000ns    0.000ns    0       -1          0          0              HASH(a.service_id)
05:AGGREGATE         4       160.179ms  166.378ms  0       -1          1.25 MB    128.00 MB      STREAMING
11:AGGREGATE         4       2.100ms    2.293ms    0       -1          2.25 MB    128.00 MB      FINALIZE
10:EXCHANGE          4       0.000ns    0.000ns    0       -1          0          0              HASH(c.service_id,f.dates)
04:AGGREGATE         4       0.000ns    0.000ns    0       -1          155.05 MB  128.00 MB      STREAMING
03:HASH JOIN         4       9m30s      9m32s      35.22M  -1          932.16 MB  2.00 GB        INNER JOIN, BROADCAST
|--09:EXCHANGE       4       198.163ms  203.434ms  10.21M  -1          0          0              BROADCAST
|  08:AGGREGATE      4       584.498ms  636.290ms  10.21M  -1          204.04 MB  128.00 MB      FINALIZE
|  07:EXCHANGE       4       59.790ms   72.221ms   10.21M  -1          0          0              HASH(valid_from,service_id)
|  02:AGGREGATE      4       1s203ms    1s792ms    10.21M  -1          395.15 MB  128.00 MB      STREAMING
|  01:SCAN HDFS      4       251.675ms  344.008ms  39.22M  -1          58.03 MB   96.00 MB       default.csdb
00:SCAN HDFS         4       93.435ms   110.983ms  5.75M   -1          86.07 MB   144.00 MB      default.csdb c

Took: 103m41s
Only 4 machines were used in the processing, because the input file is small -> only 4 blocks
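As an aside, a per-operator table like the one above can be printed in impala-shell right after a query finishes (a sketch):

> summary;   -- per-operator execution summary, as shown above
> profile;   -- the full runtime profile of the last query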

2nd approach: making Impala run on all nodes
- The number of workers involved is data driven
- The table has to be repartitioned (see the sketch below):
  - number of blocks >= the number of machines
  - data should be evenly distributed between blocks/partitions (hash partitioning)
- Scaled out on 12 machines it took 39m
  - only one core per machine was used by Impala
  - Efficiency: 374M rows produced on 12 cores = 13k rows/s/core
- Compute table stats to scale it up (use more resources per host):
  compute stats csdb
  Estimated Per-Host Requirements: Memory=2.03GB VCores=3
- With table statistics it took 17m
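A hedged sketch of one way to rewrite the table so the data is spread over more blocks (the table name csdb_repart is illustrative; note that Impala's [shuffle] insert hint mainly affects inserts into partitioned tables, so this may need adapting to the actual schema):

create table csdb_repart stored as parquet as select * from csdb;
-- or, into an existing table, requesting row redistribution across nodes:
insert overwrite table csdb_repart [shuffle] select * from csdb;
-- recompute statistics so the planner can use more resources per host:
compute stats csdb_repart;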

How about SparkSQL?
The same query can run on SparkSQL (1.6); like Impala, it requires the number of blocks >= the number of machines.

pyspark --driver-memory 2g --num-executors 12

import time
starttime = time.time()
sqlContext.sql("""
    with fundamentals as (
        select distinct valid_from dates, service_id from luca_test.csdb2)
    select a.service_id, max(a.c)
    from (
        select c.service_id, f.dates, count(1) c
        from luca_test.csdb2 c, fundamentals f
        where f.service_id = c.service_id
          and f.dates between c.valid_from and c.valid_to
        group by c.service_id, f.dates
    ) a
    group by a.service_id
    order by 1""").show()
print("Delta time = %f" % (time.time() - starttime))

Delta time = 1342.786748

- Took 22 minutes on 12 machines, 1 core used per machine
- Efficiency: 23k rows/s/core
- Scaling up with --executor-memory 2g --executor-cores 3 took 13 minutes

SparkSQL on Spark 2.0
- Spark 2.0 was released at the end of July 2016
- The same query took just 2 minutes!
- Efficiency: 245k rows/s/core
Why is Spark 2.0 so fast?
- Spark 2.0 introduces a lot of optimizations, including:
  - Code Generation (already available in Impala)
  - "Vectorization" of operations on rows
- For details see https://databricks.com/blog/2016/05/23/apache-spark-as-a-compiler-joining-a-billion-rows-per-second-on-a-laptop.html

About CodeGeneration
Query execution is compiled down to JVM bytecode at runtime (whole-stage code generation).
Source: databricks.com
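A quick way to see it at work from the pyspark shell on Spark 2.x (a sketch; the query is illustrative):

df = spark.range(1000).filter("id > 100").groupBy().count()
df.explain()   # operators covered by whole-stage codegen are prefixed with '*'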

About "Vectorization" in Spark 2.0 Batching multiple rows together and apply operators vertically (on columns) Example: parquet reader Source: databricks.com

Spark benchmarks by Databricks
Source: databricks.com

Execution plan: Spark 1.6 vs Spark 2.0

Profiling a Spark 1.6 worker

Profiling a Spark 2.0 worker

Conclusions
- Even small data problems (200MB) can make big data platforms struggle ;)
  - the right data distribution across blocks and machines is the key to obtaining scalability
- Oracle can run quite fast, but is less efficient than Impala or Spark
  - due to its row-by-row processing model – it lacks CodeGen
- CodeGen speeds up CPU-bound workloads significantly
  - especially when operating on simple scalar types
- It seems that Spark 2.0 outperforms all the other players
  - looking forward to having it on the clusters
More detailed info: http://db-blog.web.cern.ch/blog/luca-canali/2016-09-spark-20-performance-improvements-investigated-flame-graphs