An Analysis of Database Workload Performance on Simultaneous Multithreaded Processors
Jack L. Lo, Luiz André Barroso, Susan Eggers, Kourosh Gharachorloo, Henry Levy, Sujay Parekh

Motivation
- DBMS and scientific workloads behave differently
- DBMS workloads are intrinsically multithreaded
- DBMS workloads are memory intensive, which leads to low processor utilization
- Cache sharing under SMT could make memory performance worse

Objectives
- Characterize the memory-system behavior of database systems
- Evaluate the negative effects of the cache sharing introduced by SMT, and try to eliminate them
- Evaluate SMT performance for DBMS workloads

Methodology
- SMT model
  - Based on an out-of-order, superscalar architecture
  - During each cycle, 8 instructions can be fetched from up to 2 of the 8 hardware contexts
  - Functional units: 6 integer, 4 FP
  - Caches: 128 KB instruction + 128 KB data, 16 MB L2
- Workloads
  - Oracle DBMS and Digital UNIX
  - On-line transaction processing (OLTP)
  - Decision support system (DSS)
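The fetch rule in the SMT model (8 instructions per cycle, drawn from at most 2 of the 8 contexts) can be sketched in a few lines. This is only an illustration: the slides do not say how the two contexts are chosen, so the sketch assumes a priority rule that favors the contexts with the fewest instructions already in flight. The function name `fetch_cycle` and the sample numbers are hypothetical.

```python
def fetch_cycle(available, inflight, width=8, max_threads=2):
    """Return {context_id: instructions fetched} for one cycle.

    available[c]: instructions context c could supply this cycle
    inflight[c]:  instructions context c already has in the pipeline
    The lowest-inflight-first priority is an assumption, not from the slides.
    """
    ready = [c for c in range(len(available)) if available[c] > 0]
    chosen = sorted(ready, key=lambda c: inflight[c])[:max_threads]
    fetched, slots = {}, width
    for c in chosen:
        n = min(available[c], slots)  # never exceed the shared fetch width
        if n:
            fetched[c] = n
            slots -= n
    return fetched

# Eight hardware contexts with made-up occupancy numbers.
available = [3, 0, 5, 5, 0, 2, 8, 1]
inflight = [12, 0, 4, 9, 0, 20, 6, 15]
result = fetch_cycle(available, inflight)
print(result)
```

With these inputs, contexts 2 and 6 are selected: context 2 supplies 5 instructions and context 6 fills the remaining 3 slots.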

Database Workload Characterization
- Three segments of memory are accessed by the dominant processes:
  - Instruction text
  - Program Global Area (PGA)
  - Shared Global Area (SGA)
    - SGA buffer cache
    - SGA other

Memory Behavior
- High instruction miss rate for OLTP
  - Large memory footprint
  - High instruction/data reuse
  - Replacement is too frequent

Locality Profiles

Multi-Thread Cache Interference
- Two types of interference
  - Destructive interference
    - One thread's data replaces another thread's data
    - More conflict misses
  - Constructive interference
    - Data loaded by one thread is used by another simultaneously scheduled thread
    - Fewer misses
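The two kinds of interference can be reproduced with a toy shared cache. The sketch below assumes a tiny direct-mapped cache (4 lines of 64 bytes) and made-up addresses; none of these parameters come from the presentation, and `count_misses` and `interleave` are hypothetical helper names.

```python
def count_misses(accesses, num_lines=4, line_size=64):
    """Count misses in a tiny direct-mapped cache shared by all threads."""
    lines = [None] * num_lines  # block number currently held by each line
    misses = 0
    for addr in accesses:
        block = addr // line_size
        index = block % num_lines
        if lines[index] != block:
            misses += 1
            lines[index] = block
    return misses

def interleave(a, b):
    """Alternate accesses from two threads, as if they were co-scheduled."""
    return [addr for pair in zip(a, b) for addr in pair]

# Destructive: different blocks (0 and 4) that map to the same cache line.
thread_a = [0x0000] * 4
thread_b = [0x0100] * 4
alone = count_misses(thread_a)
destructive = count_misses(interleave(thread_a, thread_b))

# Constructive: both threads read the same shared block.
shared = [0x0040] * 4
constructive = count_misses(interleave(shared, shared))

print(alone, destructive, constructive)
```

Run alone, thread A misses once. Interleaved with thread B, every access evicts the other thread's block, so all 8 accesses miss. When both threads read the same block, the 8 accesses share a single miss.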

Identifying the Source of Misses
- PGA misses are the dominant factor
  - Caused by destructive interference

Page-mapping Policies
- Affect L2 cache conflicts
- Two policies
  - Page coloring: exploits spatial locality
  - Bin hopping: exploits temporal locality
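A simplified sketch of how the two policies assign cache bins (colors) to pages. It assumes the textbook forms of the policies: page coloring derives the color from the virtual page number, so consecutive virtual pages land in consecutive bins, while bin hopping hands out colors round-robin in page-fault order. `NUM_COLORS`, the class name, and the page numbers are hypothetical, and real operating systems may add refinements not shown here.

```python
from itertools import count

NUM_COLORS = 8  # hypothetical number of page-sized bins in the L2 cache

def page_coloring(vpn):
    """Color follows the virtual page number (spatial locality)."""
    return vpn % NUM_COLORS

class BinHopping:
    """Assign colors round-robin in page-fault order (temporal locality)."""

    def __init__(self):
        self._next = count()
        self._color = {}  # (pid, vpn) -> color chosen at first touch

    def color(self, pid, vpn):
        key = (pid, vpn)
        if key not in self._color:
            self._color[key] = next(self._next) % NUM_COLORS
        return self._color[key]

# Two processes fault in the same virtual pages, e.g. a private area
# that sits at the same virtual address in every process.
vpns = [16, 17]
coloring = {pid: [page_coloring(v) for v in vpns] for pid in (0, 1)}
hopper = BinHopping()
hopping = {pid: [hopper.color(pid, v) for v in vpns] for pid in (0, 1)}

print(coloring)
print(hopping)
```

Under this simplified page coloring, both processes get colors 0 and 1 and therefore compete for the same bins. Under bin hopping, the second process receives colors 2 and 3, so the pages no longer collide.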

Effect of Page-mapping policies

Application-Level Offsetting
- Affects L1 cache conflicts
- Offset the conflicting structures of different processes
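The idea behind offsetting can be shown with a little index arithmetic. The 128 KB cache size is taken from the methodology slide. Everything else is an assumption made for illustration: a 64-byte line, a direct-mapped cache indexed by virtual address, a shared base address, and an 8 KB per-process offset.

```python
L1_SIZE = 128 * 1024             # from the slides: 128 KB L1 data cache
LINE_SIZE = 64                   # assumed line size
NUM_SETS = L1_SIZE // LINE_SIZE  # assumes a direct-mapped cache for simplicity

BASE = 0x4000_0000               # hypothetical base address used by every process
OFFSET_STEP = 8 * 1024           # hypothetical per-process offset

def l1_set(addr):
    """Cache set that an address maps to."""
    return (addr // LINE_SIZE) % NUM_SETS

def structure_base(pid, offsetting):
    """Start address of the per-process structure, optionally shifted by pid."""
    return BASE + (pid * OFFSET_STEP if offsetting else 0)

without = {l1_set(structure_base(pid, False)) for pid in range(8)}
with_offset = {l1_set(structure_base(pid, True)) for pid in range(8)}

print(len(without), len(with_offset))
```

Without offsets, the structures of all 8 processes start in the same cache set. With a per-process offset they start in 8 different sets, which removes the systematic conflict.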

SMT Performance on DBMS Workloads
- SMT is highly effective in tolerating the high miss rates

Architecture Metrics

Conclusions
- While database workloads have large footprints, there is substantial reuse that results in a small, cacheable "critical" working set
- Additional data cache conflicts caused by SMT can be nearly eliminated
- SMT's latency tolerance is highly effective for database applications