TPC-H Studies Joe Chang


About Joe Chang
SQL Server Execution Plan Cost Model
True cost structure by system architecture
Decoding statblob (distribution statistics)
SQL Clone – statistics-only database
Tools
ExecStats – cross-reference index use by SQL-execution plan
Performance Monitoring, Profiler/Trace aggregation

TPC-H

TPC-H
DSS – 22 queries, geometric mean
60X range plan cost, comparable actual range
Power – single stream
Tests ability to scale parallel execution plans
Throughput – multiple streams
Scale Factor 1 – Line Item data is 1GB
875MB with DATE instead of DATETIME
Only single column indexes allowed, Ad-hoc

SF 10, test studies
Not valid for publication
Auto-Statistics enabled, excludes compile time
Big Queries – Line Item Scan
Super Scaling – Mission Impossible
Small Queries & High Parallelism
Other queries, negative scaling
Did not apply T2301, or disallow page locks

Big Q: Plan Cost vs Actual
Plan Cost reduction from DOP 1 to 16/32: Q1 28%, Q9 44%, Q18 70%, Q21 20%
Plan Cost says scaling is poor except for Q18, memory affects Hash IO onset
Chart labels: Plan 10GB; Actual Query time in seconds
Plan Cost is a poor indicator of true parallelism scaling
Q18 & Q21 > 3X Q1, Q9

Big Query: Speed Up and CPU
Q13 has slightly better than perfect scaling?
In general, excellent scaling to DOP 8-24, weak afterwards
Holy Grail
Chart labels: CPU time in seconds; Speed up relative to DOP 1

Super Scaling
Suppose at DOP 1, a query runs for 100 seconds, with one CPU fully pegged
CPU time = 100 sec, elapsed time = 100 sec
What is the best case for DOP 2?
Assuming nearly zero Repartition Threads cost
CPU time = 100 sec, elapsed time = 50?
Super Scaling: CPU time decreases going from Non-Parallel to Parallel plan!
No, I have not started drinking, yet
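The DOP 2 best case above is simple arithmetic, and a short sketch makes the super-scaling condition explicit. The function names are illustrative and do not come from the slides, and the second set of timings is made up purely to show what CPU time falling below the DOP 1 figure looks like.

```python
def scaling_metrics(cpu_dop1, elapsed_dop1, cpu_dopn, elapsed_dopn):
    """Return (speedup, normalized_cpu), both relative to the DOP 1 run."""
    speedup = elapsed_dop1 / elapsed_dopn
    normalized_cpu = cpu_dopn / cpu_dop1
    return speedup, normalized_cpu


def is_super_scaling(normalized_cpu):
    """Super scaling: total CPU time drops below the DOP 1 CPU time."""
    return normalized_cpu < 1.0


# Best case from the slide: same total CPU, half the elapsed time.
speedup, cpu_norm = scaling_metrics(100.0, 100.0, 100.0, 50.0)
print(speedup, cpu_norm, is_super_scaling(cpu_norm))

# Hypothetical measurement (not from the slides): CPU time also drops.
speedup, cpu_norm = scaling_metrics(100.0, 100.0, 70.0, 35.0)
print(is_super_scaling(cpu_norm))
```

With ordinary parallelism the normalized CPU stays at or above 1.0 and the speedup at DOP 2 cannot exceed 2X; any speedup beyond the DOP itself implies the parallel plan is doing less total work.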

Super Scaling
CPU-sec goes down from DOP 1 to 2 and higher (typically 8)
3.5X speedup from DOP 1 to 2 (Normalized to DOP 1)
Chart labels: CPU normalized to DOP 1; Speed up relative to DOP 1

CPU and Query time in seconds CPU time Query time

Super Scaling Summary
Most probable cause: Bitmap Operator in Parallel Plan
Bitmap Filters are great
Question for Microsoft: Can I use Bitmap Filters in OLTP systems with non-parallel plans?

Small Queries – Plan Cost vs Act
Query 3 and 16 have lower plan cost than Q17, but not included
Q4, 6, 17 great scaling to DOP 4, then weak
Negative scaling also occurs
Chart labels: Query time; Plan Cost

Small Queries CPU & Speedup
What did I get for all that extra CPU?
Interpretation: sharp jump in CPU means poor scaling, disproportionate means negative scaling
Query 2 negative at DOP 2, Q4 is good, Q6 gets speedup but at a CPU premium, Q17 and 20 negative after DOP 8
Chart labels: CPU time; Speed up

High Parallelism – Small Queries
Why? Almost no value
TPC-H geometric mean scoring
Small queries have as much impact as large
Linear sum of weights large queries
OLTP with 32, 64+ cores
Parallelism good if super-scaling
Default max degree of parallelism 0
Seriously bad news, especially for small Q
Increase cost threshold for parallelism?
Sometimes you do get lucky
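The point about geometric mean scoring versus a linear sum can be checked numerically. The sketch below uses four made-up query times (not TPC-H results) and illustrates only the mathematical property, not the full TPC-H metric definition: halving a 1-second query moves the geometric mean exactly as much as halving a 100-second query, while the linear sum barely notices the small query.

```python
import math


def geometric_mean(values):
    """Geometric mean computed through logs to avoid overflow on long lists."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical query times in seconds: two small queries, two large ones.
baseline = [1.0, 2.0, 50.0, 100.0]
small_halved = [0.5, 2.0, 50.0, 100.0]   # smallest query runs 2X faster
large_halved = [1.0, 2.0, 50.0, 50.0]    # largest query runs 2X faster

for label, times in [("baseline", baseline),
                     ("small halved", small_halved),
                     ("large halved", large_halved)]:
    print(f"{label:13s} geomean={geometric_mean(times):8.3f} sum={sum(times):7.1f}")
```

Under the geometric mean both improvements are worth the same factor, which is why tuning small queries matters for the benchmark score even when it saves almost no wall-clock time.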

Q that go Negative Query time “Speedup”

CPU

Other Queries – CPU & Speedup Q3 has problems beyond DOP 2 CPU time Speedup

Other - Query Time seconds Query time

Scaling Summary
Some queries show excellent scaling
Super-scaling, better than 2X
Sharp CPU jump on last DOP doubling
Need strategy to cap DOP
To limit negative scaling
Especially for some smaller queries?
Other anomalies
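One possible way to turn "need strategy to cap DOP" into something testable is to walk the measured elapsed times and stop at the first doubling that no longer pays off. This is only a sketch of one such heuristic, not a method from the slides: the function name, the 1.2X minimum gain threshold, and the timings are all assumptions chosen for illustration.

```python
def pick_dop_cap(elapsed_by_dop, min_gain=1.2):
    """Return the highest DOP reached before a step gains less than min_gain.

    elapsed_by_dop maps DOP -> elapsed seconds. The gain of a step is the
    previous elapsed time divided by the current one, so a gain below 1.0
    means negative scaling.
    """
    dops = sorted(elapsed_by_dop)
    cap = dops[0]
    for prev, cur in zip(dops, dops[1:]):
        gain = elapsed_by_dop[prev] / elapsed_by_dop[cur]
        if gain < min_gain:
            break  # further parallelism is negligible or harmful
        cap = cur
    return cap


# Hypothetical timings: good scaling to DOP 8, flat at 16, negative at 32.
measurements = {1: 100.0, 2: 52.0, 4: 28.0, 8: 20.0, 16: 19.0, 32: 24.0}
print(pick_dop_cap(measurements))
```

A per-query cap found this way could then be applied with a query-level MAXDOP hint or a workload-level setting; which mechanism fits depends on the system and is outside what the slides specify.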

Compression PAGE

Compression Overhead - Overall
40% overhead for compression at low DOP, 10% overhead at max DOP???
Chart labels: Query time compressed relative to uncompressed; CPU time compressed relative to uncompressed

Query time compressed relative to uncompressed CPU time compressed relative to uncompressed

Compressed Table
LINEITEM – real data may be more compressible
Uncompressed: 8,749,760KB, Average Bytes per row: 149
Compressed: 4,819,592KB, Average Bytes per row: 82
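The two LINEITEM figures can be cross-checked against each other: the ratio of table sizes should roughly match the ratio of average row sizes. The sketch below uses only the numbers on this slide; the function and variable names are illustrative.

```python
def compression_ratio(compressed, uncompressed):
    """Fraction of the original size that remains after compression."""
    return compressed / uncompressed


# Figures from the slide (table size in KB, average bytes per row).
size_ratio = compression_ratio(4_819_592, 8_749_760)
row_ratio = compression_ratio(82, 149)

print(f"size ratio: {size_ratio:.3f}")
print(f"row ratio:  {row_ratio:.3f}")
print(f"space saved: {(1 - size_ratio) * 100:.1f}%")
```

Both ratios come out at about 0.55, so PAGE compression saves roughly 45% of the space on this generated data set.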

Partitioning Orders and Line Item on Order Key

Partitioning Impact - Overall Query time partitioned relative to not partitioned CPU time partitioned relative to not partitioned

Query time partitioned relative to not partitioned CPU time partitioned relative to not partitioned

Plan for Partitioned Tables

Scaling DW Summary
Massive IO bandwidth
Parallel options for data load, updates etc
Investigate Parallel Execution Plans
Scaling from DOP 1, 2, 4, 8, 16, 32 etc
Scaling with and w/o HT
Strategy for limiting DOP with multiple users

Fixes from Microsoft Needed
Contention issues in parallel execution
Table scan, Nested Loops
Better plan cost model for scaling
Back-off on parallelism if gain is negligible
Fix throughput degradation with multiple users running big DW queries
Sybase and Oracle: Throughput is close to Power or better

Query Plans

Big Queries

Q1 Pricing Summary Report

Q1 Plan
Non-Parallel, Parallel
Parallel plan 28% lower than scalar, IO is 70%, no parallel plan cost reduction

Q9 Product Type Profit Measure IO from 4 tables contribute 58% of plan cost, parallel plan is 39% lower Non-Parallel Parallel

Q9 Non-Parallel Plan Table/Index Scans comprise 64%, IO from 4 tables contribute 58% of plan cost Join sequence: Supplier, (Part, PartSupp), Line Item, Orders

Q9 Parallel Plan Non-Parallel: (Supplier), (Part, PartSupp), Line Item, Orders Parallel: Nation, Supplier, (Part, Line Item), Orders, PartSupp

Q9 Non-Parallel Plan details Table Scans comprise 64%, IO from 4 tables contribute 58% of plan cost

Q9 Parallel reg vs Partitioned

Q13 Why does Q13 have perfect scaling?

Q18 Large Volume Customer Non-Parallel Parallel

Q18 Graphical Plan Non-Parallel Plan: 66% of cost in Hash Match, reduced to 5% in Parallel Plan

Q18 Plan Details Non-Parallel Parallel Non-Parallel Plan Hash Match cost is 1245 IO, CPU DOP 16/32: size is below IO threshold, CPU reduced by >10X

Q21 Suppliers Who Kept Orders Waiting Note 3 references to Line Item Non-Parallel Parallel

Q21 Non-Parallel Plan H1 H2 H3 H2 H3

Q21 Parallel

Q21 3 full Line Item clustered index scans Plan cost is approx 3X Q1, single “scan”

Super Scaling

Q7 Volume Shipping Non-Parallel Parallel

Q7 Non-Parallel Plan Join sequence: Nation, Customer, Orders, Line Item

Q7 Parallel Plan Join sequence: Nation, Customer, Orders, Line Item

Q8 National Market Share Non-Parallel Parallel

Q8 Non-Parallel Plan Join sequence: Part, Line Item, Orders, Customer

Q8 Parallel Plan Join sequence: Part, Line Item, Orders, Customer

Q11 Important Stock Identification Non-Parallel Parallel

Q11 Join sequence: A) Nation, Supplier, PartSupp, B) Nation, Supplier, PartSupp

Q11

Small Queries

Query 2 Minimum Cost Supplier Wordy, but only touches the small tables, second lowest plan cost (Q15)

Q2 Clustered Index Scan on Part and PartSupp have highest cost (48%+42%)

Q2 PartSupp is now Index Scan + Key Lookup

Q6 Forecasting Revenue Change
Not sure why this blows CPU
Scalar values are pre-computed, pre-converted

Q20?
This query may get a poor execution plan
Date functions are usually written as
because Line Item date columns are “date” type
CAST helps the DOP 1 plan, but gets a bad plan for parallel

Q20

Q20

Q20 alternate - parallel Statistics estimation error here Penalty for mistake applied here

Other Queries

Q3

Q3

Q12 Random IO? Will this generate random IO?

Query 12 Plans Non-Parallel Parallel

Queries that go Negative

Q17 Small Quantity Order Revenue

Q17 Table Spool is concern

Q17 the usual suspects

Q19

Q19

Q22

Q22

Speedup from DOP 1 query time CPU relative to DOP 1