Memory Hierarchy Adaptivity: An Architectural Perspective
Alex Veidenbaum
AMRM Project, sponsored by DARPA/ITO



Opportunities for Adaptivity
Cache organization
Cache performance "assist" mechanisms
Hierarchy organization
Memory organization (DRAM, etc.)
Data layout and address mapping
Virtual memory
Compiler assist

Opportunities - Cont’d
Cache organization: adapt what?
–Size: NO
–Associativity: NO
–Line size: MAYBE
–Write policy: YES (fetch, allocate, write-back/through)
–Mapping function: MAYBE
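As an illustration of the write-policy knob above, here is a minimal sketch (a hypothetical model for exposition, not the AMRM hardware) of a direct-mapped cache whose write-allocate policy is a run-time parameter:

```python
class DirectMappedCache:
    """Toy direct-mapped cache model; write_allocate is an adaptable knob."""

    def __init__(self, n_lines=64, line_size=32, write_allocate=True):
        self.n_lines = n_lines
        self.line_size = line_size
        self.write_allocate = write_allocate
        self.tags = [None] * n_lines   # one tag per line, None = invalid
        self.misses = 0

    def _index_tag(self, addr):
        line = addr // self.line_size
        return line % self.n_lines, line // self.n_lines

    def access(self, addr, is_write=False):
        """Return True on hit; count and (optionally) fill on miss."""
        idx, tag = self._index_tag(addr)
        if self.tags[idx] == tag:
            return True
        self.misses += 1
        # Fill the line on a read miss, or on a write miss if allocating.
        if not is_write or self.write_allocate:
            self.tags[idx] = tag
        return False
```

Running the same write-heavy reference stream through two instances that differ only in `write_allocate` shows how a single policy bit changes the miss count, which is the kind of adaptation the slide marks YES.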

Opportunities - Cont’d
Cache “assist” mechanisms: prefetch, write buffer, victim cache, etc., between different levels. Adapt what?
–Which mechanism(s) to use
–Mechanism “parameters”

Opportunities - Cont’d
Hierarchy organization:
–Where are cache assist mechanisms applied?
Between L1 and L2
Between L1 and memory
Between L2 and memory
–What are the data paths like?
Is prefetch, victim cache, or write buffer data written into the cache?
How much parallelism is possible in the hierarchy?

Opportunities - Cont’d
Memory organization:
–Cached DRAM?
–Interleave change?
–PIM (processing-in-memory)?

Opportunities - Cont’d
Data layout and address mapping:
–In theory, something can be done, but...
–The MP case is even worse
–Adaptive address mapping or hashing: based on what?

Opportunities - Cont’d
Compiler assist:
–Can select the initial configuration
–Can pass hints on to hardware
–Can generate code to collect run-time information and adjust execution
–Can adapt the configuration when “called” at certain intervals during execution
–Can select or run-time-optimize code

Opportunities - Cont’d
Virtual memory can adapt:
–Page size?
–Mapping?
–Page prefetching/read-ahead
–Write buffering (file cache)
–All of the above under multiprogramming?

Applying Adaptivity
What drives adaptivity?
–Performance impact, overall and/or relative
–“Effectiveness”, e.g., miss rate
–Processor stall cycles introduced
–Program characteristics
When to perform the adaptive action:
–Run time: use feedback from hardware
–Compile time: insert code, set up hardware
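The run-time feedback path above can be sketched as a once-per-interval decision over hardware counters. The counter inputs and the 5% threshold are illustrative assumptions, not AMRM specifics:

```python
def should_enable_assist(accesses, misses, threshold=0.05):
    """Decide, once per observation interval, whether an assist mechanism
    should be enabled, based on the miss rate seen in that interval."""
    miss_rate = misses / accesses if accesses else 0.0
    return miss_rate > threshold
```

The same shape of decision works for any of the drivers listed: swap the miss-rate input for stall cycles or a relative-performance estimate, and the threshold for a per-driver setting.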

Where to Implement
In software: compiler and/or OS
+(Static) knowledge of program behavior
+Can be factored into optimization and scheduling
-Extra code, overhead
-Lack of dynamic run-time information
-Limited rate of adaptivity
-Requires recompilation, OS changes

Where to Implement - Cont’d
In hardware:
+Dynamic information available
+Fast decision mechanism possible
+Transparent to software (thus safe)
-Delay and clock rate limit algorithm complexity
-Difficult to maintain long-term trends
-Little knowledge of program behavior

Where to Implement - Cont’d
Hardware/software combined:
+Software can set coarse hardware parameters
+Hardware can supply software with dynamic information
+Perhaps more complex algorithms can be used
-Software modification required
-Communication mechanism required

Current Investigation
L1 cache assist:
–We see wide variability in assist mechanism effectiveness, both between individual programs and within a program as a function of time
–We propose hardware mechanisms to select between assist types and allocate buffer space
–We give the compiler an opportunity to set parameters

Mechanisms Used
Prefetching:
–Stream buffers
–Stride-directed, based on addresses alone
–Miss stride: prefetch the same address again after the observed number of intervening misses
Victim cache
Write buffer
All are placed after L1.
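The stride-directed scheme above, which works on miss addresses alone (no PC), can be sketched as follows. This is an illustrative model, not the hardware studied:

```python
class StridePrefetcher:
    """Detect a constant stride in the L1 miss address stream and,
    once the stride repeats, predict the next miss address."""

    def __init__(self):
        self.last_addr = None
        self.last_stride = None

    def on_miss(self, addr):
        """Called on each L1 miss; returns a prefetch address or None."""
        pred = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            # Same nonzero stride twice in a row: confident enough to prefetch.
            if stride != 0 and stride == self.last_stride:
                pred = addr + stride
            self.last_stride = stride
        self.last_addr = addr
        return pred
```

A stream buffer would extend this by prefetching several consecutive lines ahead into a small FIFO; the miss-stride scheme would instead key its prediction on the count of intervening misses.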

Mechanisms Used - Cont’d
A mechanism can be used by itself, or all can be used at once
Buffer space size and organization are fixed
No adaptivity is involved

Observed Behavior
Programs benefit differently from each mechanism; none is a consistent winner
Within a program, the same holds between mechanisms over time

Observed Behavior - Cont’d
Both of the above facts indicate a likely improvement from adaptivity:
–Select the better mechanism among those available
Even more can be expected from adaptively reallocating the combined buffer pool:
–To reduce stall time
–To reduce the number of misses

Proposed Adaptive Mechanism
Hardware:
–A common pool of 2-4-word buffers
–A set of possible policies, a subset of:
Stride-directed prefetch
PC-based prefetch
History-based prefetch
Victim cache
Write buffer

Adaptive Hardware - Cont’d
Performance monitors for each type/buffer:
–Misses, stall time on hit, thresholds
Dynamic buffer allocator among mechanisms
Allocation and monitoring policy:
–Predict future behavior from observed past behavior
–Observe over a time interval dT, then set parameters for the next interval
–Save performance trends in next-level tags (<8 bits)
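The interval-based allocation policy can be sketched as follows, assuming an illustrative proportional rule over hypothetical per-mechanism usefulness counters (e.g., hits each assist buffer supplied during the last interval dT):

```python
def reallocate(pool_size, useful_hits):
    """Redistribute the common buffer pool for the next interval in
    proportion to each mechanism's observed usefulness.

    useful_hits: {mechanism: hits observed over the last interval dT}
    Returns {mechanism: buffers allocated for the next interval}."""
    total = sum(useful_hits.values())
    if total == 0:
        # No signal this interval: split the pool evenly.
        even = pool_size // len(useful_hits)
        return {m: even for m in useful_hits}
    alloc = {m: (pool_size * h) // total for m, h in useful_hits.items()}
    # Hand any rounding remainder to the most useful mechanism.
    best = max(useful_hits, key=useful_hits.get)
    alloc[best] += pool_size - sum(alloc.values())
    return alloc
```

Proportional sharing is only one candidate rule; a threshold-based winner-take-most policy, or one weighted by stall time saved rather than raw hits, would fit the same monitor-then-reallocate loop.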

Further Opportunities to Adapt
L2 cache organization:
–Variable-size lines
L2 non-sequential prefetch
In-memory assists (DRAM)

MP Opportunities
Even longer latencies
Coherence, hardware or software
Synchronization
Prefetch under and beyond the above:
–Avoid coherence traffic if possible
–Prefetch past synchronization points
Assist adaptive scheduling