(A Research Proposal for Optimizing DBMS on CMP)


Cache-Aware Query Optimization, Scheduling, and Partitioning for Multicore
(A Research Proposal for Optimizing DBMS on CMP)
(Focusing on Decision Support Queries)
Rubao Lee

Moore’s Law in 37 Years (IEEE Spectrum, May 2008)
However, all proposed schemes were evaluated by simulation, and simulation-based studies have several limitations. First, simulation time is extremely long: it can take several weeks or months to complete a single SPEC CPU2006 application, and as the number of cores continues to increase, simulation becomes even more limited. Second, it is hard to observe long-term OS activities in simulation, yet the interactions between processes and the OS may affect performance significantly. Without whole-program execution and without OS interactions, how much can we trust the accuracy? Last, simulation is prone to simulator bugs, and it is impossible to model many of the dynamics and details of a real system.

The Development of Relational DBMS
1970: Relational data model (E. F. Codd)
1976: DBMS: System R and Ingres (IBM, UC Berkeley)
1977 – 1997: Parallel DBMS: DIRECT, Gamma, Paradise (UW-Madison)
Architecture-optimized DBMS (MonetDB): Database Architecture Optimized for the New Bottleneck: Memory Access (VLDB’99)
Now, what should we do when DBMS meets CMP?

An Overview of Multicore System
[Figure: memory hierarchy with a shared last-level cache, main memory, and disk]

Current Efforts
Reducing I/O overhead by exploiting column store and data compression
- C-Store: A Column Oriented DBMS (VLDB’05)
Increasing parallelism with new algorithms and architectures to exploit multi-cores
- QPipe: Operator-level parallelism (SIGMOD’05)
- Exploiting data partitioning to create concurrent subtasks (Adaptive Aggregation on CMP, VLDB’07, DaMoN’08)
Exploiting shared cache to reduce off-chip accesses
- Coordinating concurrent main-memory scans (VLDB’08)
- Scheduling threads for sharing (SPAA’07, EuroSys’07)

Cache Contention Problem
Intuitively, concurrent queries access common datasets, so multi-cores have a good opportunity to exploit constructive data sharing in the last-level cache (LLC). However, that is not the whole story: private data structures created during query execution cause cache contention in the LLC. For example:
- Private hash table during hash join
- Private hash table during hash aggregation
- Private sorted result during sort-merge join

A Case of Hash Join
Consider: select * from ta, tb where ta.x = tb.y
[Figure: Query 1 on Core 1 and Query 2 on Core 2 each run a hash join (HJ). The tables ta and tb are read from memory and shared in the LLC (benefit!), while the two private hash tables (H) compete for LLC space (conflict!).]
The shared cache is two-fold: constructive data sharing and cache contention.
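
The hash join in this example can be sketched as follows. This is a minimal in-memory illustration, not the code of any particular DBMS; the function name `hash_join` and the representation of tuples as Python dicts are assumptions made for the sketch. The point is that the table built in the first phase is private to the query, so two co-running joins each bring their own table into the shared cache.

```python
from collections import defaultdict

def hash_join(ta, tb, x="x", y="y"):
    """Join rows of ta and tb where ta[x] == tb[y] (hypothetical helper)."""
    # Build phase: this hash table is private to the query. It is the
    # structure that competes for LLC space when two joins co-run.
    table = defaultdict(list)
    for row in ta:
        table[row[x]].append(row)

    # Probe phase: every tuple of tb performs a lookup in the hash table,
    # so the table is touched repeatedly and benefits from staying cached.
    result = []
    for row in tb:
        for match in table.get(row[y], ()):
            result.append({**match, **row})
    return result

ta = [{"x": 1, "a": "p"}, {"x": 2, "a": "q"}]
tb = [{"y": 2, "b": "r"}, {"y": 3, "b": "s"}, {"y": 2, "b": "t"}]
joined = hash_join(ta, tb)
```

Here `joined` holds the two combined tuples whose `x` and `y` values are both 2.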

Cache Contention
During concurrent query executions, cache contention in the LLC causes the following issues:
A: Suboptimal Query Plan (unbalanced resource usage)
B: Incorrect Scheduling Policy (unnecessary cache conflict)
C: Wasted Cache Space (ineffective cache allocation)

A: Suboptimal Query Plan
The query optimizer selects the best plan for a query, but it is not shared-cache aware. A plan that is best in isolation may be suboptimal when several such plans are co-running. The query optimizer does not consider potential cache contention and the resulting performance degradation on multicores.

A: Suboptimal Query Plan (cont.)
Consider: select * from ta, tb where ta.x = tb.y
HJ (hash join): build a hash table on ta, then probe it with each tuple of tb. Cache sensitive!
IJ (index join): for each tuple of ta, look up tb via the index on tb.
[Figure: performance degradation as the allocated cache is reduced]
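
For contrast with the hash join, the index join plan can be sketched as below. This is an illustration under stated assumptions: the index on tb is modeled as a sorted list searched with `bisect`, standing in for whatever index structure a real system would use, and the names `build_index` and `index_join` are hypothetical. Unlike the hash join, this plan builds no per-query private structure; it reuses an index that already exists.

```python
import bisect

def build_index(tb, y="y"):
    """Stand-in for an existing index on tb.y: sorted (key, row position) pairs."""
    entries = sorted((row[y], pos) for pos, row in enumerate(tb))
    keys = [key for key, _ in entries]
    return keys, entries

def index_join(ta, tb, index, x="x"):
    """For each tuple of ta, look up matching tb tuples via the index."""
    keys, entries = index
    result = []
    for row in ta:
        # Find the run of index entries whose key equals ta.x.
        lo = bisect.bisect_left(keys, row[x])
        hi = bisect.bisect_right(keys, row[x])
        for _, pos in entries[lo:hi]:
            result.append({**row, **tb[pos]})
    return result

ta = [{"x": 1, "a": "p"}, {"x": 2, "a": "q"}]
tb = [{"y": 2, "b": "r"}, {"y": 3, "b": "s"}, {"y": 2, "b": "t"}]
joined = index_join(ta, tb, build_index(tb))
```

Both plans return the same tuples; they differ in which data structures they keep hot in the cache, which is the property a shared-cache-aware optimizer would need to weigh.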

B: Incorrect Scheduling Policy
(1) Default scheduling is not shared-cache aware.
(2) We need to co-schedule queries to minimize cache contention.
Example: two hash joins and two table scans on dual cores.
Worst scheduling: co-schedule the two hash joins (conflict!)
Smart scheduling: co-schedule a hash join with a table scan
30% improvement!
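
One way to picture the co-scheduling idea is a simple pairing heuristic: sort queries by their estimated cache footprint and pair the largest with the smallest. This is only a sketch of one possible policy, not the scheduler proposed in the talk; the function `pair_queries` and the footprint numbers are hypothetical, and a real system would obtain such estimates from the query optimizer.

```python
def pair_queries(footprints):
    """Pair queries for co-running on two cores that share an LLC.

    footprints maps a query name to an estimated LLC working-set size
    (hypothetical units). Cache-hungry queries are paired with
    cache-light ones so that two hungry queries never co-run when a
    lighter partner is available.
    """
    ordered = sorted(footprints, key=footprints.get)
    pairs = []
    while len(ordered) >= 2:
        light = ordered.pop(0)   # smallest remaining footprint
        heavy = ordered.pop()    # largest remaining footprint
        pairs.append((heavy, light))
    if ordered:                  # odd number of queries: one runs alone
        pairs.append((ordered[0], None))
    return pairs

# Made-up estimates: hash joins need cache, table scans barely reuse it.
estimates = {"hashjoin1": 8, "hashjoin2": 8, "tablescan1": 1, "tablescan2": 1}
schedule = pair_queries(estimates)
```

With these estimates every pair in `schedule` contains one hash join and one table scan, which corresponds to the smart scheduling case above.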

C: Wasted Cache Space
(1) Different queries have different cache utilization.
(2) Cache allocation is demand-based by default, but we should allocate more cache to the queries that benefit most from a large cache.
When co-scheduling a hash join and a table scan, allocate more cache to the hash join! (The table scan will pollute the cache.)
16% improvement!
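
The allocation principle can be illustrated with a greedy marginal-gain heuristic: hand out cache units one at a time to whichever query would save the most misses from the next unit. This is a swapped-in illustration, not necessarily the partitioning policy the proposal intends; `partition_cache` and the miss curves are hypothetical, and real miss curves would have to be measured or predicted.

```python
def partition_cache(miss_curves, total_units):
    """Split total_units of cache among queries by marginal benefit.

    miss_curves[q][k] is the estimated number of misses of query q
    when it owns k cache units (hypothetical input).
    """
    alloc = {q: 0 for q in miss_curves}

    def gain(q):
        # Misses saved by giving query q one more unit.
        curve, k = miss_curves[q], alloc[q]
        return curve[k] - curve[k + 1] if k + 1 < len(curve) else 0

    for _ in range(total_units):
        best = max(alloc, key=gain)
        alloc[best] += 1
    return alloc

# Made-up curves: the hash join keeps improving with more cache, while
# the table scan streams data and gains almost nothing.
curves = {
    "hashjoin":  [100, 70, 45, 30, 30],
    "tablescan": [100, 98, 98, 98, 98],
}
allocation = partition_cache(curves, 4)
```

With these curves the hash join receives three of the four units and the table scan one, in contrast to demand-based sharing, where the streaming scan would keep evicting the hash table.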

Our Objectives
(1) Query Optimization: let the query optimizer generate optimal query plans based on usage of the shared cache.
(2) Query Scheduling: pair queries for co-running on multi-cores to minimize cache contention in the shared cache.
(3) Cache Partitioning: allocate cache space to maximize cache utilization for co-scheduled queries.
[Figure: queries pass through the query optimizer and the query scheduler to cores that share a last-level cache managed by cache partitioning]

Challenges and Opportunities
Challenges in designing a shared-cache-optimized query execution engine on multicore platforms:
- The DBMS runs in user space and cannot directly control cache allocation in the CPU.
- Scheduling policy: predict potential cache conflicts among co-running queries.
- Partitioning policy: predict cache utilization for different query operations (join, scan, aggregation, sorting, ...).
Opportunities:
- The query optimizer can provide hints about data access patterns and estimate working set sizes during query execution.
- The operating system can manage cache allocation by using page coloring during virtual-to-physical address mapping.
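
Page coloring relies on the fact that, in a physically indexed set-associative cache, the physical page number determines which group of cache sets a page maps to. The arithmetic can be sketched as below; the cache geometry (4 MB, 16-way, 4 KB pages) and the helper names are assumptions chosen for illustration, not parameters from the talk.

```python
PAGE_BYTES = 4096  # assumed page size

def num_colors(cache_bytes, ways, page_bytes=PAGE_BYTES):
    """Number of page colors: the size of one cache way divided by the page size."""
    return cache_bytes // (ways * page_bytes)

def page_color(phys_addr, colors, page_bytes=PAGE_BYTES):
    """Color of the physical page that contains phys_addr."""
    return (phys_addr // page_bytes) % colors

def frames_with_colors(free_frames, allowed, colors):
    """Free physical frame numbers whose color lies in the allowed set.

    An OS that restricts a query to `allowed` colors would map the
    query's virtual pages only to frames returned here, confining the
    query to the matching slice of the cache.
    """
    return [f for f in free_frames if f % colors in allowed]

colors = num_colors(4 * 2**20, ways=16)   # hypothetical 4 MB, 16-way LLC
color = page_color(0x41000, colors)       # physical page number 65
usable = frames_with_colors(range(200), allowed={0, 1}, colors=colors)
```

With this geometry there are 64 colors, so restricting a table scan to 2 colors would confine it to 1/32 of the cache while leaving the remaining colors for a co-running hash join.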

Summary
Optimizing DBMS on CMP is an important research issue.
No prior work considers how to reduce cache contention for concurrent query executions.
We propose a solution framework containing query optimization, scheduling, and cache partitioning.
Our solution combines the application knowledge of the DBMS with the cache-control ability of the OS.