Memory System Characterization of Commercial Workloads

Memory System Characterization of Commercial Workloads Luiz Andre Barroso, Kourosh Gharachorloo, and Edouard Bugnion Presented by Jerry Wu

Introduction
- Motivation
  - Commercial workloads have become the largest market segment for multiprocessor servers.
  - The design of these systems has not kept pace with changes in the market.
  - Little is known about the performance requirements of commercial workloads.
- This paper presents performance studies of three classes of commercial workloads.

Complications
- Limited availability and usage restrictions
  - Commercial database engines are not easily accessible.
- Large scale
  - Large hardware cost
- Complexity
  - Non-trivial OS and I/O interactions; no access to source code
- Moving target
  - Commercial database engines improve at a very fast pace.

Commercial Workloads
- OLTP workload
  - Modeled after the TPC-B benchmark
  - Models a banking system
- DSS workload
  - Modeled after the TPC-D benchmark
  - Simulates the decision support system of a supplier
- The OLTP and DSS workloads run on the Oracle database engine.
- Web index search workload
  - AltaVista, a state-of-the-art search engine (no Google yet)
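
TPC-B's transaction is a single debit/credit: update one account row, update the teller and branch rows, and append a history record. The sketch below shows that access pattern under stated assumptions. In-memory Python lists stand in for Oracle's tables. make_bank, run, the scaled-down table sizes, and the ±999,999 delta range are all hypothetical choices for illustration. For simplicity, every transaction picks a random account instead of following the benchmark's branch-locality rules. It shows why OLTP touches a few small, hot, shared rows (tellers, branches) alongside a large, randomly accessed account table.

```python
import random
from dataclasses import dataclass, field

# Hypothetical, scaled-down table sizes; the real benchmark is far larger.
TELLERS_PER_BRANCH = 10
ACCOUNTS_PER_BRANCH = 1_000


@dataclass
class Bank:
    """In-memory stand-in for the TPC-B account/teller/branch/history tables."""
    accounts: list[int]
    tellers: list[int]
    branches: list[int]
    history: list[tuple[int, int, int, int]] = field(default_factory=list)


def make_bank(n_branches: int) -> Bank:
    return Bank(
        accounts=[0] * (n_branches * ACCOUNTS_PER_BRANCH),
        tellers=[0] * (n_branches * TELLERS_PER_BRANCH),
        branches=[0] * n_branches,
    )


def tpcb_transaction(bank: Bank, aid: int, tid: int, bid: int, delta: int) -> int:
    """One debit/credit transaction; returns the new account balance."""
    bank.accounts[aid] += delta                  # update the account row
    balance = bank.accounts[aid]                 # read back the balance
    bank.tellers[tid] += delta                   # update the teller row (hot)
    bank.branches[bid] += delta                  # update the branch row (hottest)
    bank.history.append((aid, tid, bid, delta))  # append a history record
    return balance


def run(bank: Bank, n_txns: int, seed: int = 0) -> None:
    rng = random.Random(seed)
    for _ in range(n_txns):
        tid = rng.randrange(len(bank.tellers))
        bid = tid // TELLERS_PER_BRANCH           # the teller's own branch
        aid = rng.randrange(len(bank.accounts))   # simplification: any account
        delta = rng.randint(-999_999, 999_999)
        tpcb_transaction(bank, aid, tid, bid, delta)
```

After any run, the account, teller, and branch totals all equal the sum of the history deltas, which is a quick consistency check on the transaction logic.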

Methodology
- Monitoring
  - The OLTP and DSS benchmarks were run with Oracle on an Alpha 21164-based system.
  - The IPROBE monitoring tool was used to access the processor's event counters.
- Simulation
  - Used an Alpha port of SimOS
- Key issues
  - Amount of physical memory required
  - Bandwidth requirement for the I/O
  - Total runtime

Monitoring Results
- OLTP
  - Hits and misses to the secondary caches, and the latency of dirty misses, are all important.
- DSS and AltaVista
  - Hits in the secondary on-chip cache are the only significant memory component.
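
A breakdown like the one above comes from weighting each counted memory event by its latency. The sketch below assumes that stall cycles for each event class are simply count times average penalty and that the classes add up without overlapping. Real hardware partially hides some latency, so this overstates stalls. The function name stall_breakdown and any event-class names are hypothetical. The latencies have to come from the machine being measured.

```python
def stall_breakdown(counts: dict[str, int], latency: dict[str, float]) -> dict[str, float]:
    """Fraction of estimated memory stall cycles per event class.

    counts:  number of events per class (e.g. read from hardware event counters)
    latency: assumed average penalty in cycles for one event of that class
    Assumes penalties are additive and never overlap with each other.
    """
    cycles = {name: counts[name] * latency[name] for name in counts}
    total = sum(cycles.values())
    if total == 0:
        return {name: 0.0 for name in cycles}
    return {name: c / total for name, c in cycles.items()}
```

For example, counts keyed by hypothetical classes such as "l2_hit", "l3_hit", and "dirty_miss" would show whether on-chip hits dominate, as on the slide for DSS and AltaVista, or whether the time is spread across off-chip misses and dirty misses, as for OLTP.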

Simulation Results
- Uses an Alpha port of SimOS
- Observations on OLTP
  - Small kernel component
  - Benefits from larger caches and higher associativity
  - Cache and memory system stalls have a large effect on performance
  - Idle time increases with bigger caches
  - Higher processing rates result in less demand on I/O
  - Only a small fraction of communication in Oracle is due to false sharing
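
The trade-offs above can be illustrated with a trace-driven cache model, a much simpler cousin of SimOS's memory system models. The sketch below is a minimal tag-only, set-associative cache with LRU replacement. SetAssocCache and miss_rate are hypothetical names, and the traces in the note below are synthetic, not taken from the study.

```python
from collections import OrderedDict


class SetAssocCache:
    """Tag-only set-associative cache with LRU replacement."""

    def __init__(self, size_bytes: int, line_bytes: int, ways: int):
        n_lines = size_bytes // line_bytes
        if n_lines % ways:
            raise ValueError("number of lines must be a multiple of ways")
        self.line_bytes = line_bytes
        self.ways = ways
        self.n_sets = n_lines // ways
        # One OrderedDict per set: keys are tags, ordered from LRU to MRU.
        self.sets = [OrderedDict() for _ in range(self.n_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        """Look up a byte address; return True on a hit."""
        line = addr // self.line_bytes
        index, tag = line % self.n_sets, line // self.n_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)        # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) == self.ways:
            lines.popitem(last=False)     # evict the least recently used tag
        lines[tag] = None
        return False


def miss_rate(cache: SetAssocCache, trace) -> float:
    """Run a trace of byte addresses through the cache and return its miss rate."""
    for addr in trace:
        cache.access(addr)
    total = cache.hits + cache.misses
    return cache.misses / total if total else 0.0
```

With a trace alternating between addresses 0 and 1024, a direct-mapped 1 KB cache with 64-byte lines misses on every access, while the 2-way version of the same size misses only twice. Similarly, a loop over 32 lines thrashes a 16-line cache but fits in a 64-line one after 32 cold misses. These are the associativity and capacity effects the slide reports for OLTP.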

Cache Hierarchy Performance
- Primary cache misses are important for OLTP, but even more so for DSS
- OLTP and DSS have very different cache performance
- A large on-chip cache captures most of the misses in DSS, but not in OLTP
- False sharing increases with cache line size
- Replacement and instruction miss rates are not visibly affected
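
Why false sharing grows with line size can be shown with a small miss classifier. The sketch below is a simplified scheme, not the paper's methodology. It assumes a write-invalidate protocol with infinite caches, so the only misses are cold and coherence misses. A re-fetch after an invalidation counts as true sharing if another CPU wrote the same word in the meantime, and as false sharing otherwise. The trace format, WORD_BYTES, and classify_misses are all hypothetical.

```python
from collections import defaultdict

WORD_BYTES = 8  # assumed word size used to decide whether two accesses overlap


def classify_misses(trace, line_bytes: int) -> dict[str, int]:
    """Classify misses under write-invalidate with infinite caches.

    trace: iterable of (cpu, byte_address, is_write) tuples.
    """
    holders = defaultdict(set)        # line -> CPUs holding a valid copy
    seen = defaultdict(set)           # line -> CPUs that have ever fetched it
    remote_writes = defaultdict(set)  # (cpu, line) -> words others wrote since cpu lost it
    counts = {"cold": 0, "true": 0, "false": 0}

    for cpu, addr, is_write in trace:
        line, offset = divmod(addr, line_bytes)
        word = offset // WORD_BYTES
        if cpu not in holders[line]:
            if cpu not in seen[line]:
                counts["cold"] += 1
            elif word in remote_writes[(cpu, line)]:
                counts["true"] += 1   # needed a value another CPU produced
            else:
                counts["false"] += 1  # lost the line only to writes of other words
            remote_writes.pop((cpu, line), None)
            holders[line].add(cpu)
            seen[line].add(cpu)
        if is_write:
            holders[line] = {cpu}     # invalidate every other copy
            for other in seen[line] - {cpu}:
                remote_writes[(other, line)].add(word)
    return counts
```

When two CPUs alternately write adjacent 8-byte words, 8-byte lines produce only two cold misses. With 64-byte lines, both words share a line, so nearly every access becomes a false-sharing miss.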

Questions