Using Information About Cache Evictions to Measure the Interactions of Application Data Structures (Bryan R. Buck and Jeffrey K. Hollingsworth, University of Maryland)

1. Using Information About Cache Evictions to Measure the Interactions of Application Data Structures
Bryan R. Buck, Jeffrey K. Hollingsworth
University of Maryland, Department of Computer Science

2. Introduction
Cache behavior information is important
– Processor speed is increasing faster than memory speed
Cache information should be related to data structures
– This is more useful to the programmer when tuning applications
Collect the information using hardware
– Software techniques, such as simulation, are slow
– In the past, hardware support was limited
– The situation is changing; hardware support is becoming more common

3. Outline
Measuring cache misses
– Sampling
Information about evictions
– What is required
– Sampling
Simulation-based study
– The simulator and applications used
– Results
Conclusions and future work

4. Finding Objects With the Most Cache Misses
Handling every cache miss is slow
– Use sampling; this requires a periodic interrupt on cache misses and the ability to determine the miss address
Associate a count with each object
– An object is a variable or a block of dynamically allocated memory
Interrupt after every n cache misses
– Obtain the address of the miss
– Find the object containing that address and increment its count
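The sampling scheme on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class name `MissSampler`, the `(name, start, size)` object table, and the software countdown standing in for a hardware overflow interrupt are all assumptions. Objects are assumed not to overlap, so a binary search over their sorted start addresses finds the object containing a miss.

```python
import bisect
from collections import Counter


class MissSampler:
    """Hypothetical sketch: charge one sample per `period` misses to the
    object whose address range contains the sampled miss address."""

    def __init__(self, objects, period):
        # objects: list of (name, start, size); assumed non-overlapping.
        self.objects = sorted(objects, key=lambda o: o[1])
        self.starts = [start for _, start, _ in self.objects]
        self.period = period
        self.misses_until_interrupt = period
        self.counts = Counter()

    def find_object(self, addr):
        """Return the name of the object containing addr, or None."""
        i = bisect.bisect_right(self.starts, addr) - 1
        if i >= 0:
            name, start, size = self.objects[i]
            if addr < start + size:
                return name
        return None

    def on_miss(self, addr):
        """Called on every miss; the countdown models a hardware overflow counter."""
        self.misses_until_interrupt -= 1
        if self.misses_until_interrupt == 0:
            # The "interrupt": read the miss address and charge its object.
            self.misses_until_interrupt = self.period
            name = self.find_object(addr)
            self.counts[name if name is not None else "<unknown>"] += 1


sampler = MissSampler([("U", 0x1000, 0x400), ("R", 0x2000, 0x400)], period=2)
for addr in [0x1000, 0x1010, 0x2000, 0x2040, 0x1100, 0x3000]:
    sampler.on_miss(addr)
print(dict(sampler.counts))  # {'U': 1, 'R': 1, '<unknown>': 1}
```

Only every second miss is examined here, so the misses at 0x1010, 0x2040, and 0x3000 are the ones charged; the last falls outside every known object.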

5. Interactions Between Objects
Why does data leave the cache?
– Which object caused it to be replaced?
Hardware could provide eviction information
– When a miss occurs, save the address of the evicted data
Providing the physical address is not difficult
– It can be calculated from the tag of the evicted cache line
– Information in the OS can map physical to virtual addresses, although the mapping may be imprecise due to paging
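To illustrate why recovering the evicted address is cheap, here is a sketch that assumes a conventional set-associative cache with a power-of-two line size and set count, where an address splits into tag, set index, and byte offset. The function names are hypothetical, and the OS step that maps the resulting physical address back to a virtual one is not modeled.

```python
def _log2_exact(n):
    """Return k such that n == 2**k, or raise ValueError."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def split_address(addr, line_size, num_sets):
    """Split an address into (tag, set index, byte offset)."""
    offset_bits = _log2_exact(line_size)
    index_bits = _log2_exact(num_sets)
    offset = addr & (line_size - 1)
    set_index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, set_index, offset


def evicted_line_address(tag, set_index, line_size, num_sets):
    """Rebuild the base address of an evicted line from its stored tag and
    the index of the set it was evicted from."""
    offset_bits = _log2_exact(line_size)
    index_bits = _log2_exact(num_sets)
    return (tag << (index_bits + offset_bits)) | (set_index << offset_bits)


tag, set_index, _ = split_address(0x12345678, line_size=64, num_sets=512)
print(hex(evicted_line_address(tag, set_index, line_size=64, num_sets=512)))  # 0x12345640
```

The set index is known implicitly from where the replacement happened, so only the tag needs to be saved; the rebuilt address is the start of the 64-byte line.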

6. Measuring Eviction Information
Use sampling, but store more information at each sampled miss
– The object that caused the miss
– The object containing the data that was evicted
– The part of the code in which the miss occurred
Questions
– The “buckets” are much smaller; will sampling be accurate?
– The data structure is more complicated; how efficient will it be?
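One way to picture the larger data structure is a table keyed by (missing object, evicted object, code region) triples instead of by object alone. The sketch below is an assumption for illustration: `EvictionSampler`, the `find_object` callback, and the string region names are hypothetical, and `None` marks a miss that filled an empty line.

```python
from collections import Counter
from typing import Callable, Optional


class EvictionSampler:
    """Hypothetical sketch: at each sampled miss, record which object missed,
    which object's data was evicted, and the code region of the miss."""

    def __init__(self, find_object: Callable[[int], Optional[str]], period: int):
        self.find_object = find_object
        self.period = period
        self.countdown = period
        self.counts = Counter()  # (missing, evicted, region) -> samples

    def on_miss(self, miss_addr: int, evicted_addr: Optional[int], region: str):
        self.countdown -= 1
        if self.countdown:
            return  # not a sampled miss
        self.countdown = self.period
        missing = self.find_object(miss_addr)
        # evicted_addr is None when the miss filled an empty (invalid) line.
        evicted = self.find_object(evicted_addr) if evicted_addr is not None else None
        self.counts[(missing, evicted, region)] += 1


sampler = EvictionSampler(lambda a: "U" if a < 0x2000 else "V", period=1)
events = [(0x1000, 0x2000, "resid"), (0x1040, 0x2040, "resid"),
          (0x2080, 0x1080, "interp"), (0x20C0, None, "interp")]
for miss, evicted, region in events:
    sampler.on_miss(miss, evicted, region)
```

With many objects and regions the number of triples grows quickly, so each bucket receives far fewer samples than a per-object count would; this is the accuracy concern the slide raises.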

7. Experiments
Implemented in simulation
– The simulator uses the ATOM binary rewriting tool
– Loads and stores are instrumented for cache simulation
– Basic blocks are instrumented for a virtual cycle count
– The simulator provides the necessary hardware support
– Miss and eviction sampling run under simulation
Tested using SPEC95/2000 applications
– su2cor, applu, equake, gzip, mgrid, swim, wupwise, …
– Sampled 1 in 25,000 misses
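A cache simulator that supports eviction sampling must report, on every miss, which line it displaced. The following is a Python stand-in for that piece only, not the ATOM-based simulator used in the study; the `LRUCache` name, LRU replacement, and the default geometry are assumptions.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal set-associative cache with LRU replacement that reports the
    line evicted on each miss (a stand-in for the simulated hardware support)."""

    def __init__(self, line_size=64, num_sets=256, ways=4):
        self.line_size = line_size
        self.num_sets = num_sets
        self.ways = ways
        # One OrderedDict per set: line number -> None, least recently used first.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, addr):
        """Simulate one load or store. Returns (hit, evicted_line_address)."""
        line = addr // self.line_size
        s = self.sets[line % self.num_sets]
        if line in s:
            s.move_to_end(line)  # mark as most recently used
            return True, None
        evicted = None
        if len(s) == self.ways:
            victim, _ = s.popitem(last=False)  # least recently used line
            evicted = victim * self.line_size
        s[line] = None
        return False, evicted


cache = LRUCache(line_size=64, num_sets=1, ways=2)
results = [cache.access(a) for a in [0, 64, 0, 128, 64]]
# [(False, None), (False, None), (True, None), (False, 64), (False, 0)]
```

In a full setup, each `(False, evicted)` result would be fed to a miss counter, and only one miss in every sampling period (25,000 in these experiments) would be recorded.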

8. Accuracy of Sampling Cache Misses
Table columns: Application, Variable, Actual (rank, %), Sample (rank, %)
su2cor: U, R-loops, S, W2-intact, W2-sweep
swim: UNEW, PNEW, VNEW, CU, H 56.99
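The table compares each variable's rank and share of misses under full counting with the same figures derived from samples. A minimal sketch of that comparison follows; the function name is hypothetical, and the counts in the example are invented for illustration, not taken from the study.

```python
def rank_and_percent(counts):
    """Map each variable to (rank, percent of total); rank 1 = most misses."""
    total = sum(counts.values())
    ordered = sorted(counts, key=counts.get, reverse=True)
    return {name: (i + 1, 100.0 * counts[name] / total)
            for i, name in enumerate(ordered)}


# Invented counts: exact misses vs. samples taken at a fixed sampling rate.
actual = rank_and_percent({"A": 600_000, "B": 300_000, "C": 100_000})
sampled = rank_and_percent({"A": 23, "B": 13, "C": 4})
for var in ("A", "B", "C"):
    print(var, actual[var], sampled[var])
# A (1, 60.0) (1, 57.5)
# B (2, 30.0) (2, 32.5)
# C (3, 10.0) (3, 10.0)
```

Sampling is judged accurate when ranks match and percentages are close; multiplying a sample count by the sampling period gives a rough estimate of the absolute miss count.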

9. Eviction Results: mgrid

10. Evictions by Code Region: mgrid
Table columns: Variable, Function, Line, Actual (rank, %), Sample (rank, %)
U: resid, interp, interp, interp, interp
V: resid
R: psinv, resid
Percentage of the total evictions of U caused by U, V, and R in each line of code.

11. Cache Misses Due to Instrumentation

12. Instrumentation Overhead

13. Simulation Overhead

14. Using Dyninst
Better knowledge about objects
– Local variables
– FORTRAN common blocks
Can instrument memory allocation routines
– Track objects as they are created and destroyed
Measure by code region using hardware counters
– Save counts at significant points, as Paradyn does: function entries, exits, and calls
– Turn counting on and off around areas of interest
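Attributing a hardware counter to code regions amounts to reading it at region entry and exit and charging the difference. The sketch below assumes a monotonically increasing counter exposed as a plain callable; `RegionCounter` and the simulated counter are hypothetical, and in a real tool the `enter`/`exit` calls would be inserted by instrumentation rather than written by hand. Nested regions receive inclusive counts.

```python
class RegionCounter:
    """Hypothetical sketch: charge counter deltas between entry and exit
    to each code region (counts for nested regions are inclusive)."""

    def __init__(self, read_counter):
        self.read_counter = read_counter  # callable returning the current count
        self.totals = {}
        self.stack = []

    def enter(self, region):
        self.stack.append((region, self.read_counter()))

    def exit(self, region):
        name, start = self.stack.pop()
        if name != region:
            raise ValueError(f"exit({region!r}) does not match enter({name!r})")
        self.totals[region] = self.totals.get(region, 0) + self.read_counter() - start


misses = [0]  # simulated miss counter
rc = RegionCounter(lambda: misses[0])
rc.enter("resid")
misses[0] += 100
rc.enter("interp")
misses[0] += 40
rc.exit("interp")
misses[0] += 10
rc.exit("resid")
print(rc.totals)  # {'interp': 40, 'resid': 150}
```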

15. Instrumenting Loads and Stores
New BPatch_point type
– BPatch_loadStore
– New method, isStore(), returns true or false
New expression type
– BPatch_effectiveAddr
– Only valid at BPatch_loadStore points
– Returns the effective address being accessed

16. Future Work
Run miss sampling on real hardware
– IBM POWER3, POWER4
– Use Dyninst
Visualization tool
– Save all data in a compact format the tool understands; for the tested applications, the largest file is 15 MB
– Filter by objects and parts of code
– Compare data from different runs
Use the results to optimize applications

17. Future Work (Continued)
More uses of eviction information
– Estimating the portion of an object that is in the cache: use the difference between misses and evictions
– Finding lost opportunities for reuse: track evicted data until it is next loaded, and measure the interval in time, cache misses, etc.

18. Conclusions
Features are appearing in new processors
– Cache miss sampling can be implemented now
– It is much more efficient than software simulation
Eviction information in hardware is practical
– Sampling is efficient and accurate
Could use Dyninst
– For simulation or for hardware