Published by Hortense Morton. Modified over 9 years ago.
1
Scaling up analytical queries with column-stores
Ioannis Alagiannis, Manos Athanassoulis, Anastasia Ailamaki
École Polytechnique Fédérale de Lausanne
2
Drinking from a data firehose
Fast, high-quality data analysis for smart business decisions
Data warehouses: 1/3 of the database market ($$$)
Column-stores are here to stay!
Need for multiple concurrent users: 100s to 1000s of queries*
Many concurrent queries + column-stores = ???
*"High-performance data warehousing", TDWI best practices report
3
Multiple concurrent queries
[Figure: concurrent queries sharing one DBMS on a server with 8 cores, memory (MEM) and disk (HDD)]
Example: "Find all restaurants with a rating over 3.5 close to East Village": steak? pasta? Indian? vegan?
High contention for resources
4
[Figure: throughput and response time]
5
Throughput (memory-resident workload)
[Figure: TPC-H (SF 30) throughput, with the total number of HW contexts and the saturation point marked]
Concurrency can hurt performance
6
Experimental setup
Column-stores: System-A and System-B (commercial), System-C (open-source)
Hardware: dual-socket Intel(R) Xeon(R) CPU E5-2660
2 sockets x 8 cores x 2 threads (32 HW contexts)
128 GB RAM, 1600 MHz DIMMs
Caches: L1 64 KB and L2 256 KB (per core), L3 20 MB (shared)
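The 32 hardware contexts quoted for the test machine follow from multiplying sockets, cores per socket and SMT threads per core. A trivial Python check (the helper name is ours, for illustration only):

```python
def hardware_contexts(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Number of hardware threads (contexts) the OS can schedule work on."""
    return sockets * cores_per_socket * threads_per_core

# Dual-socket Xeon E5-2660: 2 sockets x 8 cores x 2 SMT threads per core
print(hardware_contexts(2, 8, 2))  # → 32
```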
7
Workloads
TPC-H: scale factor 30 (32 GB on disk); Q_TPCH = {10 query templates}
SSB (Star Schema Benchmark): scale factor 30 (18 GB on disk); Q_SSB = {all 13 query templates}
Throughput experiments with 25 query instances
Memory-resident data, hot runs
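The deck does not say how the 25 query instances are drawn from the templates. One plausible setup, assumed here purely for illustration, gives each client a reproducible stream that cycles through the templates from a client-specific starting point. The 13 SSB query names are the benchmark's standard four "flights"; `query_stream` and its round-robin policy are hypothetical.

```python
from itertools import cycle, islice

# The 13 Star Schema Benchmark queries, grouped in four flights.
SSB_TEMPLATES = ["Q1.1", "Q1.2", "Q1.3",
                 "Q2.1", "Q2.2", "Q2.3",
                 "Q3.1", "Q3.2", "Q3.3", "Q3.4",
                 "Q4.1", "Q4.2", "Q4.3"]

def query_stream(templates, n_instances=25, offset=0):
    """Return n_instances template names, cycling through templates from position offset.

    Hypothetical policy: round-robin with a per-client offset so that concurrent
    clients do not all run the same template at the same moment.
    """
    start = offset % len(templates)
    rotated = templates[start:] + templates[:start]
    return list(islice(cycle(rotated), n_instances))

# Streams for 4 clients, each starting at a different template.
streams = {client: query_stream(SSB_TEMPLATES, 25, client) for client in range(4)}
print(streams[1][:3])  # → ['Q1.2', 'Q1.3', 'Q2.1']
```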
8
Experiment 1: How does increased concurrency affect response time?
9
Scaling up TPC-H Q1
Linear increase in response time
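Why a linear trend is plausible can be seen with a deliberately crude processor-sharing model (our illustration, not an analysis from the deck): with C hardware contexts and N concurrent single-threaded queries that each take S seconds alone, each query finishes after roughly S·max(1, N/C) seconds. If each query already used all contexts on its own, the estimate would instead be S·N, linear from the start. Function and parameter names are hypothetical; the default of 32 contexts matches the test machine.

```python
def estimated_response_time(solo_time_s: float, n_clients: int, hw_contexts: int = 32) -> float:
    """Idealized fair-sharing estimate of per-query response time.

    Assumes (hypothetically) single-threaded queries, fair sharing of hw_contexts
    among n_clients queries, and no cache or memory-bandwidth contention. Under
    these assumptions the estimate is flat until every context is busy and then
    grows linearly with n_clients.
    """
    if n_clients < 1 or hw_contexts < 1:
        raise ValueError("n_clients and hw_contexts must be positive")
    return solo_time_s * max(1.0, n_clients / hw_contexts)

# With 32 contexts, doubling the clients beyond saturation doubles the estimate.
print(estimated_response_time(2.0, 64))   # → 4.0
print(estimated_response_time(2.0, 128))  # → 8.0
```

Real systems also contend for caches and memory bandwidth, which this model ignores and which can only push response times higher.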
10
Scaling up SSB Q3.1
Similar behavior in SSB
11
Experiment 2: What is the variability of query response time?
12
Variability of System-A
TPC-H (64 clients)
Groups of short, medium and long-running queries
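The deck shows variability graphically. One common way to put a number on it, chosen here for illustration and not necessarily the metric behind the figures, is the coefficient of variation (standard deviation over mean) of per-query response times, optionally computed per duration group. The cut-offs in `bucket_by_duration` are arbitrary placeholders.

```python
from statistics import mean, pstdev

def coefficient_of_variation(times):
    """Population standard deviation divided by the mean; 0.0 means no variability."""
    m = mean(times)
    return pstdev(times) / m if m > 0 else 0.0

def bucket_by_duration(times, short_max=1.0, medium_max=10.0):
    """Split response times (seconds) into short/medium/long groups.

    short_max and medium_max are hypothetical cut-offs chosen for illustration.
    """
    groups = {"short": [], "medium": [], "long": []}
    for t in times:
        if t <= short_max:
            groups["short"].append(t)
        elif t <= medium_max:
            groups["medium"].append(t)
        else:
            groups["long"].append(t)
    return groups
```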
13
Variability of System-B
TPC-H (64 clients)
Balanced resource allocation leads to lower variation
14
Variability of System-C
TPC-H (64 clients)
System-C uses an admission control mechanism
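The slide only states that System-C uses admission control. A common form of that idea, sketched here with Python's `threading.Semaphore` and not taken from System-C, caps how many queries execute at once and makes the remaining clients wait for a free slot. The limit of 8 and the `run_query` stand-in are hypothetical.

```python
import threading
import time

class AdmissionController:
    """Lets at most max_active queries execute concurrently; others block until a slot frees up."""

    def __init__(self, max_active: int):
        self._slots = threading.Semaphore(max_active)
        self._lock = threading.Lock()
        self.active = 0       # queries currently executing
        self.peak_active = 0  # highest concurrency observed

    def run(self, query_fn, *args):
        with self._slots:  # blocks while max_active queries are already running
            with self._lock:
                self.active += 1
                self.peak_active = max(self.peak_active, self.active)
            try:
                return query_fn(*args)
            finally:
                with self._lock:
                    self.active -= 1

def run_query(duration_s: float) -> float:
    """Hypothetical stand-in for executing a query."""
    time.sleep(duration_s)
    return duration_s

# 64 clients, as in the variability experiments, but at most 8 admitted at a time.
controller = AdmissionController(max_active=8)
clients = [threading.Thread(target=controller.run, args=(run_query, 0.01)) for _ in range(64)]
for c in clients:
    c.start()
for c in clients:
    c.join()
print("peak concurrently executing queries:", controller.peak_active)
```

Capping active queries moves part of the delay into a queue in front of the engine; whether that narrows the spread of response times depends on the cap and on the mix of short and long queries.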
15
Experiment 3: How does increasing concurrency affect throughput?
16
Throughput - TPC-H
[Figure: TPC-H throughput, annotated with drops of 48%, 32% and 35%]
Throughput decreases after the saturation point
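The percentage drops annotated on the TPC-H throughput figure are most likely measured relative to each system's peak. Assuming that definition (the deck does not state it), the drop can be computed as below; `drop_from_peak` is a hypothetical helper.

```python
def drop_from_peak(throughputs):
    """Percentage by which the last measurement falls below the peak of a throughput series.

    throughputs: queries per hour (or any rate), ordered by increasing client count.
    """
    if not throughputs:
        raise ValueError("empty series")
    peak = max(throughputs)
    if peak <= 0:
        raise ValueError("peak throughput must be positive")
    return 100.0 * (peak - throughputs[-1]) / peak
```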
17
Throughput - SSB
Exploiting sharing sustains peak performance: throughput plateaus
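The slide credits sharing without detailing the mechanism. As an illustration of the general work-sharing idea only (in the spirit of shared or cooperative scans, not System-B's implementation), the sketch below answers several concurrent range-count queries with a single pass over one column instead of one pass per query. `RangeCount`, the two scan functions and the toy ratings are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RangeCount:
    """Query of the form: SELECT COUNT(*) FROM t WHERE low <= col < high."""
    low: float
    high: float

def shared_scan(column, queries):
    """Evaluate all queries in one pass over the column; returns one count per query."""
    counts = [0] * len(queries)
    for value in column:  # the column is read once, however many queries share the pass
        for i, q in enumerate(queries):
            if q.low <= value < q.high:
                counts[i] += 1
    return counts

def independent_scans(column, queries):
    """Baseline: each query scans the column on its own."""
    return [sum(1 for v in column if q.low <= v < q.high) for q in queries]

ratings = [4.1, 3.2, 4.8, 3.6, 2.9, 4.0]  # toy restaurant ratings
queries = [RangeCount(3.5, 5.1), RangeCount(4.0, 5.1), RangeCount(0.0, 3.0)]
print(shared_scan(ratings, queries))  # → [4, 3, 1]
```

Predicate evaluation is still done per query; what is shared is reading the column, which is often the dominant cost for memory-resident scans.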
18
When concurrency in column-stores is increased:
Response time increases linearly … with high variability
After saturation, peak performance is not sustained (except for System-B on SSB)
19
Where do we go from here?
[Figure: saturation point]
Adaptive resource (re)allocation
Work sharing techniques
Contention-aware scheduling
Related work: QPipe, Datapath, CJoin, SharedDB, Blink, Recycler (MonetDB), cooperative scans, CCM (cracking)
Thank you!