Download presentation
Presentation is loading. Please wait.
Published byMarquise Coiner Modified over 9 years ago
1
To Share or Not to Share? Ryan Johnson Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki, Babak Falsafi PARALLEL DATA LABORATORY Carnegie Mellon University **HP LABS
2
Ryan Johnson © April 15http://www.pdl.cmu.edu/2 Motivation For Work Sharing scan join aggregate scan output StudentDept Query: What is the highest undergraduate GPA? aggregate output scan Student Query: What is the average GPA in the ECE dept.?
3
Ryan Johnson © April 15http://www.pdl.cmu.edu/3 Motivation For Work Sharing Many queries in system Similar requests Redundant work Work Sharing Detect redundant work Compute results once and share Big win for I/O, uniprocessors scan join aggregate scan output StudentDept aggregate output scan Student 2x speedup for TPC-H queries [hariz05]
4
Ryan Johnson © April 15http://www.pdl.cmu.edu/4 0.0 0.5 1.0 1.5 2.0 0153045 Shared Queries Speedup due to WS 1 CPU 7x Work Sharing on Modern Hardware 8 CPU Memory L1 L2 core L1 L2 core L2 Memory Work sharing can hurt performance!
5
Ryan Johnson © April 15http://www.pdl.cmu.edu/5 Contributions Observation Work sharing can hurt performance on parallel hardware Analysis Develop intuitive analytical model of work sharing Identify trade-off between total work, critical path Application Model-based policy outperforms static ones by up to 6x
6
Ryan Johnson © April 15http://www.pdl.cmu.edu/6 Outline Introduction Part I: Intuition and Model Part II: Analysis and Experiments
7
Ryan Johnson © April 15http://www.pdl.cmu.edu/7 Challenges of Exploiting Work Sharing Independent execution only? Load reduction from work sharing can be useful Work sharing only? Indiscriminate application can hurt performance To share or not to share? System and workload dependent Adapt decisions at runtime Must understand work sharing to exploit it fully
8
Ryan Johnson © April 15http://www.pdl.cmu.edu/8 Work Sharing vs. Parallelism Query 2 Independent Execution Query 1 Query 2 response timeQuery 1 response time Scan Join Aggregate P = 4.33 Critical Paths
9
Ryan Johnson © April 15http://www.pdl.cmu.edu/9 Work Sharing vs. Parallelism Query 2 Query 1 Query 2 response timeQuery 1 response time Shared Execution Penalty Scan Join Aggregate P = 2.75 P = 4.33 Critical path now longer Total work and critical path both important
10
Ryan Johnson © April 15http://www.pdl.cmu.edu/10 Understanding Work Sharing Performance depends on two factors: Work sharing presents a trade-off Reduces total work Potentially lengthens critical path Balance both factors or performance suffers
11
Ryan Johnson © April 15http://www.pdl.cmu.edu/11 Basis for a Model “Closed” system Consistent high load Throughput computing Assumed in most benchmarks Fixed number of clients Little’s Law governs throughput Higher response time = lower throughput Total work not a direct factor! Load reduction secondary to response time
12
Ryan Johnson © April 15http://www.pdl.cmu.edu/12 Predicting Response Time Case 1: Compute-bound Case 2: Critical path-bound Larger bottleneck determines response time Model provides u and p max
13
Ryan Johnson © April 15http://www.pdl.cmu.edu/13 An Analytical Model of Work Sharing Sharing helpful when X shared > X alone Throughput for m queries and n processors Potentially worsened by work sharing Improved by work sharing U = requested utilization Pmax = longest pipe stage
14
Ryan Johnson © April 15http://www.pdl.cmu.edu/14 Outline Introduction Part I: Intuition and Model Part II: Analysis and Experiments
15
Ryan Johnson © April 15http://www.pdl.cmu.edu/15 Experimental Setup Hardware Sun T2000 “Niagara” with 16 GB RAM 8 cores (32 threads) Solaris processor sets vary effective CPU count Cordoba Staged DBMS Naturally exposes work sharing Flexible work sharing policies 1GB TPCH dataset Fixed Client and CPU counts per run
16
Ryan Johnson © April 15http://www.pdl.cmu.edu/16 Predicted vs. Measured Performance 0 0.2 0.4 0.6 0.8 1 1.2 1.4 0153045 Shared Queries Speedup due to WS 1 CPU 1 CPU model 2 CPU 2 CPU model 8 CPU 8 CPU model 32 CPU 32 CPU model Model Validation: TPCH Q1 Avg/max error: 5.7% / 22%
17
Ryan Johnson © April 15http://www.pdl.cmu.edu/17 Model Validation: TPCH Q4 Behavior varies with both system and workload
18
Ryan Johnson © April 15http://www.pdl.cmu.edu/18 Exploring WS vs. Parallelism Work sharing splits query into three parts Independent work –Per-query, parallel –Total work Serial work –Per-query, serial –Critical path Shared work –Computed once –“Free” after first query Serial - 4% Independent - 37% Shared - 59% Example:
19
Ryan Johnson © April 15http://www.pdl.cmu.edu/19 0 0.5 1 1.5 2 2.5 08162432 Shared Queries Benefit from Work Sharing 4 8 16 32 Exploring WS vs. Parallelism Behavior matches previously published results Potential Speedup CPUs
20
Ryan Johnson © April 15http://www.pdl.cmu.edu/20 0 0.5 1 1.5 2 2.5 08162432 Shared Queries Benefit from Work Sharing 4 8 Exploring WS vs. Parallelism Potential Speedup CPUs Saturated
21
Ryan Johnson © April 15http://www.pdl.cmu.edu/21 0 0.5 1 1.5 2 2.5 08162432 Shared Queries Benefit from Work Sharing 4 8 16 32 Exploring WS vs. Parallelism Potential Speedup CPUs Saturated
22
Ryan Johnson © April 15http://www.pdl.cmu.edu/22 0 0.5 1 1.5 2 2.5 08162432 Shared Queries Benefit from Work Sharing 4 8 16 32 Exploring WS vs. Parallelism More processors shift bottleneck to critical path Saturated Potential Speedup CPUs
23
Ryan Johnson © April 15http://www.pdl.cmu.edu/23 Performance Impact of Serial Work Critical path quickly becomes major bottleneck Impact of Serial Work (32 CPU) 0 0.5 1 1.5 2 2.5 010203040 Shared Queries Benefit from Work Sharing 0% 1% 2% 7%
24
Ryan Johnson © April 15http://www.pdl.cmu.edu/24 Model-guided Work Sharing Integrate predictive model into Cordoba Predict benefit of work sharing for each new query Consider multiple groups of queries at once Shorter critical path, increased parallelism Experimental setup Profile run with 2 clients, 2 CPUs Extract model parameters with profiling tools 20 clients submit mix of TPCH Q1 and Q4 Compare against always-, never-share policies
25
Ryan Johnson © April 15http://www.pdl.cmu.edu/25 Comparison of Work Sharing Strategies Model-based policy balances critical path and load 2 CPU 0 50 100 150 200 All Q1 50/50All Q4 Query Ratio Queries/min always share model guided never share 32 CPU 0 50 100 150 200 250 Queries/min All Q1 50/50All Q4 Query Ratio
26
Ryan Johnson © April 15http://www.pdl.cmu.edu/26 Related Work Many existing work sharing schemes Identification occurs at different stages in the query’s lifetime All allow pipelined query execution Materialized Views [rouss82] Multiple Query Optimization [roy00] Staged DBMS [hariz05] Synchronized Scanning [lang07] Schema design Buffer Pool Access Query compilation Query execution Early Late Model describes all types of work sharing
27
Ryan Johnson © April 15http://www.pdl.cmu.edu/27 Conclusions Work sharing can hurt performance Highly parallel, memory resident machines Intuitive analytical model captures behavior Trade-off between load reduction and critical path Model-guided work sharing highly effective Outperforms static policies by up to 6x http://www.cs.cmu.edu/~StagedDB/
28
Ryan Johnson © April 15http://www.pdl.cmu.edu/28 References [hariz05] S. Harizopoulos, V. Shkapenyuk, and A. Ailamaki. “QPipe: A Simultaneously Pipelined Relational Query Engine.” In Proc. SIGMOD, 2005. [lang07] C. Lang, B. Bhattacharjee, T. Malkemus, S. Padmanabhan, and K. Wong. “Increasing Buffer-Locality for Multiple Relational Table Scans through Grouping and Throttling.” In Proc. ICDE, 2007. [rouss82] N. Roussopoulos. “View Indexing in Relational databases.” In ACM TODS, 7(2):258-290, 1982. [roy00] P. Roy, S. Seshadri, S. Sudarshan, and S. Bhobe. “Efficient and Extensible Algorithms for Multi Query Optimization.” In Proc. SIGMOD, 2000. http://www.cs.cmu.edu/~StagedDB/
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.