Presented by: Eric Carty-Fickes

Presentation transcript:

Memory System Characterization of Commercial Workloads
L.A. Barroso, K. Gharachorloo and E. Bugnion
Western Research Laboratory
Digital Equipment Corporation
Presented by: Eric Carty-Fickes

Introduction
  commercial workloads are a bigger market than engineering workloads, but most studies still use scientific benchmarks (in 1998)
  difficult to create commercial benchmarks: large, expensive, proprietary, changing
  paper uses commercial workloads to study current trends

Database Workloads
  first two run on Oracle DB server
  OLTP
    small read/write queries on part of DB
    models banking requests
    in dedicated mode
    more kernel time; hides I/O
  DSS (decision support systems)
    long read-only queries on much of DB
    models wholesaler's SQL queries
    fewer context switches

Database Workloads
  Web Index Search
    doesn't require DB server
    multiple threads hide misses
    read-only requests and cached recent searches

Test Systems
  4-processor AlphaServer 4100 and 8-processor 8400 for hardware testing
    IPROBE tool for event counting
    DCPI for profiling
    ATOM for studying Oracle
  SimOS for testing architectural changes
    models Alpha 21164
    simplified, but still with some detail

Aspects of Testing
  3 issues: memory size, I/O bandwidth, runtime
  scale down DB
  change block buffer cache sizes
  OLTP and DSS: need to warm up SGA before testing; need to scale DB to be resident
  Web Index: no scaling, same system

Hardware Results
  OLTP: higher CPI, maybe due to TPC-B
    long secondary cache latency
    lots of primary cache misses, esp. Icache
    dirty miss latency significant, lots of communication
  DSS: lower CPI means this config works
    only suggestion is larger 1st-level caches
  AltaVista: use 8400 just like original
    good CPI, well-written code
    1st-level caches important
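The CPI comparisons above rest on the usual stall accounting: total CPI is a base CPI plus, for each event type, events per instruction times the penalty in cycles. A minimal sketch follows. The function name `cpi_breakdown`, the event names, and every number are hypothetical placeholders chosen for illustration; none are measurements from the paper.

```python
# Illustrative CPI breakdown. Every number below is a made-up placeholder,
# not a measurement from the paper or the slides.

def cpi_breakdown(base_cpi, events):
    """Return (total CPI, per-event stall cycles per instruction).

    events maps a name to (events per instruction, penalty in cycles).
    """
    stalls = {name: rate * penalty for name, (rate, penalty) in events.items()}
    return base_cpi + sum(stalls.values()), stalls


# Hypothetical OLTP-like profile: many L1 misses, a few costly dirty misses.
example_events = {
    "l1_icache_miss": (0.05, 10),   # served by the secondary cache
    "l1_dcache_miss": (0.04, 10),   # served by the secondary cache
    "memory_miss": (0.005, 80),     # misses every cache level
    "dirty_miss": (0.002, 120),     # line supplied by another processor's cache
}

total, stalls = cpi_breakdown(1.0, example_events)
for name, cycles in sorted(stalls.items(), key=lambda kv: -kv[1]):
    print(f"{name:15s} {cycles:.2f} cycles/instruction")
print(f"{'total CPI':15s} {total:.2f}")
```

With placeholder values like these, a rare but expensive event such as a dirty miss can cost a noticeable share of CPI even at a low rate, which is the shape of the OLTP observation on the slide.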

Simulator Results
  simulator like hardware; some cache and consistency differences = different timing
  close cycle counts, miss rates
  OLTP: test associativity and Bcache size
  idle time increases when servers can't hide I/O
  lots of cache intricacies…
  bigger caches = fewer replacement, instruction misses; more important for OLTP than DSS
  bigger lines = more true sharing, less cold missing
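The cache size and line size effects above can be reproduced on a toy scale. The sketch below is a single-processor, direct-mapped cache model written for illustration only. It is not SimOS, the function `count_misses` and the synthetic trace are assumptions, and it shows only cold and replacement misses (sharing misses need a multiprocessor model).

```python
def count_misses(addresses, cache_bytes, line_bytes):
    """Count misses in a direct-mapped cache for a list of byte addresses."""
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines          # line address currently held in each slot
    misses = 0
    for addr in addresses:
        line_addr = addr // line_bytes
        slot = line_addr % num_lines
        if tags[slot] != line_addr:    # cold miss or replacement miss
            misses += 1
            tags[slot] = line_addr
    return misses


# Synthetic trace: two sequential passes over 8 KiB, one 8-byte word at a time.
trace = list(range(0, 8192, 8)) * 2

# Data fits in a 16 KiB cache, so only cold misses remain;
# doubling the line size halves them.
small_lines = count_misses(trace, cache_bytes=16384, line_bytes=32)
large_lines = count_misses(trace, cache_bytes=16384, line_bytes=64)

# A 4 KiB cache cannot hold the data, so the second pass misses again
# (replacement misses).
small_cache = count_misses(trace, cache_bytes=4096, line_bytes=32)

print(small_lines, large_lines, small_cache)
```

A sequential trace is the best case for long lines; real OLTP and DSS reference streams are far less regular, so this only illustrates the direction of the effect.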

Conclusions
  scaled OLTP and DSS give a decent estimate of real performance
  fairly narrow range of architectural issues explored
  more processes/processor = less I/O latency, fewer dirty misses
  simulators gloss over important details for ease of use (timing, OS, etc.)
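The point that more server processes per processor hide I/O latency can be illustrated with the textbook multiprogramming approximation: if each process waits on I/O a fraction p of the time, independently of the others, the processor is busy about 1 - p^n of the time with n processes. This model is brought in here as an assumption. It does not come from the paper, and the 0.8 wait fraction is a placeholder.

```python
def cpu_utilization(io_wait_fraction, processes):
    """Approximate CPU utilization, assuming independent I/O waits.

    The CPU is idle only when every process is waiting on I/O at once,
    which happens with probability io_wait_fraction ** processes.
    """
    return 1.0 - io_wait_fraction ** processes


# Hypothetical server process that spends 80% of its time waiting on I/O.
for n in (1, 2, 4, 8):
    print(f"{n} processes per processor: {cpu_utilization(0.8, n):.0%} busy")
```

The model ignores context-switch cost and the extra cache pressure that more processes bring, so it says nothing about the dirty-miss half of the claim.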

Questions
  Can you get enough information by scaling down the DB and playing tricks with block buffer sizes?