Scaling up analytical queries with column-stores Ioannis Alagiannis Manos Athanassoulis Anastasia Ailamaki École Polytechnique Fédérale de Lausanne.


1 Scaling up analytical queries with column-stores
Ioannis Alagiannis, Manos Athanassoulis, Anastasia Ailamaki
École Polytechnique Fédérale de Lausanne

2 Drinking from a data firehose
• Fast and high quality data analysis for smart business decisions
• Data warehouses
  ◦ 1/3 of the database market ($$$)
  ◦ Column-stores are here to stay!
• Need for multiple concurrent users
  ◦ 100s to 1000s queries*
Many concurrent queries + column-stores = ???
* "High-performance data warehousing", TDWI best practices report

3 Multiple concurrent queries
[Diagram: DBMS running on two groups of eight cores (CORE 1 to CORE 8), each with MEM, plus HDD]
Example query: Find all restaurants with rating over 3.5 and close to East Village (steak? pasta? indian? vegan?)
High contention for resources

4 Throughput and response time

5 Throughput (memory-resident workload)
[Figure: TPCH (sf:30); annotations: total #HW contexts, saturation point]
Concurrency can hurt performance

6 Experimental setup
• Column stores
  ◦ System-A and System-B (Commercial)
  ◦ System-C (Open-source)
• Hardware
  ◦ Dual socket Intel(R) Xeon(R) CPU E5-2660: 2 sockets x 8 cores x 2 threads (32 HW contexts)
  ◦ 128 GB RAM, 1600 MHz DIMMs
  ◦ L1: 64KB and L2: 256KB (per core), L3: 20MB (shared)

7 Workloads
• TPC-H
  ◦ Scale factor: 30 (32GB on disk)
  ◦ Q_tpch = {10 query templates}
• SSB (Star Schema Benchmark)
  ◦ Scale factor: 30 (18GB on disk)
  ◦ Q_ssb = {all of 13 query templates}
• Throughput exp. with 25 query instances
Memory-resident, hot runs
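The throughput experiment described on this slide could be driven by a harness along the following lines. This is only a minimal sketch, not the authors' tooling: `measure_throughput`, `run_query`, and the round-robin choice of templates are hypothetical, and it assumes the 25 query instances are a total shared across all clients, which the slide does not state.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(run_query, templates, clients, instances=25):
    """Run `instances` queries on `clients` concurrent workers; return queries per second.

    `run_query` is a hypothetical callable that submits one query to the DBMS
    and blocks until the result is back.
    """
    # Assumption: instances are drawn round-robin from the template list.
    work = [templates[i % len(templates)] for i in range(instances)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        # list() forces every query to finish before the clock is stopped.
        list(pool.map(run_query, work))
    elapsed = time.perf_counter() - start
    return instances / elapsed
```

Repeating the call for an increasing `clients` value gives one throughput point per concurrency level.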

8 Experiment 1: How does increased concurrency affect response time?

9 Scaling up TPCH Q1
Linear increase in response time
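One way to check a claim of linear growth against measured data is an ordinary least-squares fit of response time against the number of concurrent clients. The sketch below is an illustration only; the function name `fit_line` is hypothetical and the slides do not say how linearity was assessed.

```python
def fit_line(clients, response_times):
    """Least-squares fit of response_time = slope * clients + intercept.

    Returns (slope, intercept, r_squared). An r_squared close to 1 is
    consistent with a linear increase. Requires at least two distinct
    client counts.
    """
    n = len(clients)
    mean_x = sum(clients) / n
    mean_y = sum(response_times) / n
    sxx = sum((x - mean_x) ** 2 for x in clients)
    syy = sum((y - mean_y) ** 2 for y in response_times)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(clients, response_times))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # If all response times are identical, the flat line fits perfectly.
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return slope, intercept, r_squared
```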

10 Scaling up SSB Q3.1
Similar behavior in SSB

11 Experiment 2: What is the variability of query response time?
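The slides do not say which statistic is used for variability. One common choice, shown here purely as an assumption, is the coefficient of variation (standard deviation divided by mean) of the response times observed for the same query under a fixed number of clients.

```python
import statistics


def coefficient_of_variation(response_times):
    """Population standard deviation divided by the mean.

    0 means every run took the same time; larger values mean the response
    time of the same query varies more from run to run.
    """
    mean = statistics.fmean(response_times)
    return statistics.pstdev(response_times) / mean
```

Because the result is unitless, it can be compared across short, medium and long running queries.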

12 Variability of System-A
Groups of short, medium and long running queries
TPCH (64 clients)

13 Variability of System-B
Balanced resource allocation → lower variation
TPCH (64 clients)

14 Variability of System-C
System-C uses an admission control mechanism
TPCH (64 clients)
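The slide does not describe how System-C's admission control works. As a generic illustration of the idea only, the sketch below caps the number of queries executing at once with a semaphore and makes the rest wait; the class name `AdmissionController` and the `max_active` parameter are hypothetical.

```python
import threading


class AdmissionController:
    """Let at most `max_active` queries execute at the same time."""

    def __init__(self, max_active):
        self._slots = threading.BoundedSemaphore(max_active)

    def run(self, query_fn, *args):
        # Block until a slot is free, run the query, then release the slot
        # (the `with` block releases it even if the query raises).
        with self._slots:
            return query_fn(*args)
```

Queries that arrive while all slots are taken queue up instead of competing for cores and memory, which trades waiting time for less contention among the queries that are running.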

15 Experiment 3: How does increasing concurrency affect throughput?

16 Throughput - TPCH
Throughput decreases after the saturation point
[Figure annotations: 48%, 32% drop, 35% drop]
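Given a measured throughput curve, the saturation point and the size of the drop after it can be computed as follows. This is a sketch under assumptions: the saturation point is taken to be the client count with the highest throughput, the drop is measured at the largest client count tested, and `saturation_and_drop` is a hypothetical name. The slides do not state how the percentages in the figure were derived.

```python
def saturation_and_drop(clients, throughput):
    """Return (client count at peak throughput, fractional drop at the last point).

    Assumes `clients` is sorted ascending and `throughput[i]` was measured
    with `clients[i]` concurrent clients.
    """
    peak_idx = max(range(len(throughput)), key=throughput.__getitem__)
    peak = throughput[peak_idx]
    last = throughput[-1]
    return clients[peak_idx], (peak - last) / peak
```

A drop of 0 means peak throughput is sustained up to the highest concurrency level tested.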

17 Throughput - SSB
Exploiting sharing → sustain peak performance
[Figure annotation: throughput plateaus]

18 When concurrency in column-stores is increased:
• Response time increases linearly
  ◦ … with high variability
• After saturation, peak performance is not sustained
  ◦ Except for System-B on SSB

19 Where do we go from here?
• Adaptive resource (re)allocation
• Work sharing techniques
  ◦ QPipe, DataPath, CJoin, SharedDB, Blink
  ◦ Recycler (MonetDB), cooperative scans, CCM (cracking)
• Contention-aware scheduling
[Figure annotation: saturation point]
Thank you!

