Database Servers on Chip Multiprocessors: Limitations and Opportunities Nikos Hardavellas With Ippokratis Pandis, Ryan Johnson, Naju Mancheril, Anastassia.

Presentation transcript:

1 Database Servers on Chip Multiprocessors: Limitations and Opportunities
Nikos Hardavellas
With Ippokratis Pandis, Ryan Johnson, Naju Mancheril, Anastassia Ailamaki, Babak Falsafi

© Hardavellas
2 Hardware Integration Trends
- Moore’s Law: 2x transistors = 2x cores, 2x caches
- Trend toward larger but slower caches
[Diagram: traditional multiprocessors vs. chip multiprocessors; labels: core, L1, L2, Memory]

3 Contributions
We show that:
1. L2 caches are growing bigger and slower
   - Bottleneck shifts from memory to L2
   - DBMS absolute performance drops
   - Must enhance DBMS L1 locality
2. HW parallelism scales exponentially
   - DBMS cannot exploit parallelism under light load
   - Need inherent DBMS parallelism

4 Methodology
- Flexus simulator (developed at CMU)
  - Cycle-accurate, full-system
- OLTP: TPC-C, 100 warehouses, in memory
- DSS: TPC-H throughput, 1 GB database, in memory
  - Scan- and join-bound queries (1, 6, 13, 16)
- Saturated: 64/16 clients (OLTP/DSS)
- Unsaturated (light load): 1 client

5 Observation #1: Bottleneck Shift to L2-hit Stalls
[Chart: processors compared: PIII Xeon 500 (1999), Itanium (2006), Xeon 7100 (2006)]
- Bottleneck shifts from memory stalls to L2-hit stalls

6 Impact of L2-hit Stalls
- Increasing cache size reduces throughput
- Must enhance L1 locality
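
The claim that a bigger L2 can hurt may be easier to see with a back-of-the-envelope model. The sketch below splits the average cost of one L1 miss into an L2-hit part and a memory part. All latencies and miss rates are hypothetical values chosen only to illustrate the trend; they are not measurements from the talk, and the function name is an assumption.

```python
def l1_miss_penalty(l2_hit_cycles, l2_miss_rate, mem_cycles):
    """Average cycles to service one L1 miss, as (L2-hit part, memory part).

    Simplified model: every L1 miss pays the L2 lookup latency, and the
    fraction that also misses in L2 additionally pays the memory latency.
    """
    l2_part = l2_hit_cycles
    mem_part = l2_miss_rate * mem_cycles
    return l2_part, mem_part


# Hypothetical configurations (illustrative numbers, not from the slides).
small_fast = l1_miss_penalty(l2_hit_cycles=4, l2_miss_rate=0.10, mem_cycles=100)
large_slow = l1_miss_penalty(l2_hit_cycles=20, l2_miss_rate=0.05, mem_cycles=100)

for name, (l2_part, mem_part) in [("small, fast L2", small_fast),
                                  ("large, slow L2", large_slow)]:
    total = l2_part + mem_part
    print(f"{name}: L2-hit={l2_part:.1f}  memory={mem_part:.1f}  total={total:.1f}")
```

Under these assumed numbers the larger cache halves the L2 miss rate, yet the total penalty per L1 miss rises and the L2-hit part becomes the larger share. In this model the only way to avoid that cost is to take fewer L1 misses, which is the case for better L1 locality.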

7 Observation #2: Parallelism in Modern CMPs
- Fat Camp (FC): wide-issue, out-of-order (OOO), e.g., IBM Power5
- Lean Camp (LC): in-order, multi-threaded, e.g., Sun UltraSparc T1
- FC: parallelism within a thread; LC: parallelism across threads
[Diagram label: one core]

8 How Camps Address Stalls
- LC: stalls can dominate under unsaturated load
- FC: stalls are exposed in all cases
[Diagram: thread timelines (thread1, thread2, thread3) for LC and FC under unsaturated and saturated load; legend: computation, data stall]
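
A toy model can show why a lean-camp core hides stalls only when it has enough threads. The sketch below assumes each thread alternates a fixed burst of computation with a fixed data stall, and that the core switches to another ready thread at no cost. The function name and cycle counts are hypothetical, and the model covers only the LC case, not out-of-order execution in FC cores.

```python
def core_utilization(threads, compute_cycles, stall_cycles):
    """Fraction of cycles an in-order multithreaded core spends computing.

    Toy model: each thread computes for compute_cycles, then stalls on data
    for stall_cycles; the core runs any other ready thread during a stall.
    """
    per_thread = compute_cycles / (compute_cycles + stall_cycles)
    return min(1.0, threads * per_thread)


# Hypothetical workload: 20 cycles of work followed by a 60-cycle data stall.
unsaturated = core_utilization(threads=1, compute_cycles=20, stall_cycles=60)
saturated = core_utilization(threads=4, compute_cycles=20, stall_cycles=60)

print(f"unsaturated (1 thread): {unsaturated:.2f}")
print(f"saturated (4 threads):  {saturated:.2f}")
```

With a single client the modeled core is idle for most cycles, while four threads are enough to cover the stall entirely. This is one way to read the slide's point that LC stalls dominate under light load.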

9 Prevalence of Data Stalls
- DBMSs need parallelism and L1D locality

10 Impact
- L2 caches are growing bigger and slower
- HW parallelism scales exponentially
  - Bottlenecks shift, data stalls are exposed
- DBMS must provide both
  - Fine-grain parallelism across and within queries
  - L1 locality