Microbenchmarks for Memory Hierarchy

Presentation transcript:

Microbenchmarks for Memory Hierarchy
Brooks Mattox, Matthew Sweet

Overview
- Objective
- Microbenchmarks
- Verifying known P4 specifications
- VTune data observations
- Tree-based Pseudo Least Recently Used policy
- Conclusions

Objective
- Verify known attributes of the Pentium 4 memory hierarchy using VTune and microbenchmarks
- Determine the LRU policy of the Pentium 4 using similar benchmarks

Microbenchmark

for (i = 0; i < iterations; i++) {
    for (j = 0; j < vectorSize; j = j + stride) {
        vector[j] = vector[j] + 1;
    }
}
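The slide shows only the inner loops. A minimal, self-contained sketch of a timed version is given below; the names iterations, vectorSize, and stride come from the slide, while the int element type, the concrete sizes, and the clock_gettime timing are illustrative assumptions rather than the original harness.

/* Minimal sketch: time the slide's loop and report nanoseconds per access. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    long iterations = 100;        /* repeat to amortize timer overhead */
    long vectorSize = 1L << 20;   /* elements, not bytes: 4 MB of 4-byte ints */
    long stride     = 1;          /* step between touched elements */

    int *vector = calloc((size_t)vectorSize, sizeof *vector);
    if (!vector)
        return 1;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    for (long i = 0; i < iterations; i++)
        for (long j = 0; j < vectorSize; j = j + stride)
            vector[j] = vector[j] + 1;

    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    long accesses = iterations * ((vectorSize + stride - 1) / stride);

    /* Printing an element keeps the compiler from discarding the loop. */
    printf("%.2f ns per access (vectorSize=%ld, stride=%ld, v[0]=%d)\n",
           secs * 1e9 / (double)accesses, vectorSize, stride, vector[0]);

    free(vector);
    return 0;
}

Sweeping vectorSize (and stride) while recording the time per access is what exposes the cache boundaries described on the next slide.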

Verify L1 & L2 Cache Size
Measure the number of cache misses as the vector size is increased. The point at which cache misses begin to increase sharply marks the corresponding cache size.
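The slides gathered their miss counts with VTune. As a rough, Linux-only stand-in, the sketch below counts hardware cache misses around the same loop with the perf_event_open interface; the counter choice, vector size, and iteration count are assumptions for illustration, and the original measurements did not come from this code.

/* Sketch: count hardware cache misses around the benchmark loop on Linux. */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static int open_counter(unsigned long long config)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof attr;
    attr.config = config;            /* e.g. PERF_COUNT_HW_CACHE_MISSES */
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    return (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}

int main(void)
{
    long vectorSize = 1L << 18;      /* sweep this (e.g. 1 KB .. 4 MB) and watch the misses */
    long stride = 1;
    int *vector = calloc((size_t)vectorSize, sizeof *vector);
    if (!vector)
        return 1;

    int fd = open_counter(PERF_COUNT_HW_CACHE_MISSES);
    if (fd < 0) { perror("perf_event_open"); return 1; }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    for (long i = 0; i < 100; i++)
        for (long j = 0; j < vectorSize; j = j + stride)
            vector[j] = vector[j] + 1;

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    long long misses = 0;
    if (read(fd, &misses, sizeof misses) != (ssize_t)sizeof misses)
        return 1;
    printf("vectorSize=%ld  cache misses=%lld\n", vectorSize, misses);

    close(fd);
    free(vector);
    return 0;
}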

L1 Cache Size Benchmark Results: ~8 KB

L2 Cache Size Benchmark Results: ~512 KB

Suspected P4 LRU Policy: Tree-based Pseudo-LRU (PLRUt)
Characteristics:
- Requires only one tracking bit for 2-way associativity (N-1 bits for an N-way set)
- With higher associativity, PLRUt still offers better performance and lower complexity than the basic LRU, Round Robin, or Random policies (a small code sketch follows the next slide)

Tree-based Pseudo Least Recently Used Policy
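The original slide showed the PLRUt tree diagram. As a stand-in, here is a minimal C sketch of the policy for a single 4-way set: one root bit chooses between the two pairs of ways, and one bit per pair chooses within it; an access flips the bits on its path to point away from the touched way, and the victim is found by following the bits from the root. The bit convention and the example trace are assumptions for illustration, not a statement of how the Pentium 4 implements the policy.

/* Minimal sketch of tree-based Pseudo-LRU (PLRUt) for one 4-way set.
 * Three bits form a binary tree: b0 picks a pair, b1 picks within ways {0,1},
 * b2 picks within ways {2,3}.  An N-way set needs N-1 bits; 2-way needs just
 * one, as the slide notes.  Convention here: bit = 0 means the victim path
 * goes left. */
#include <stdio.h>

struct plru_set {
    int b0, b1, b2;
};

/* On an access to `way`, set the bits on its path to point away from it. */
static void plru_touch(struct plru_set *s, int way)
{
    if (way < 2) {            /* left pair {0,1} */
        s->b0 = 1;            /* next victim should come from the right pair */
        s->b1 = (way == 0);   /* point at the other way within the pair */
    } else {                  /* right pair {2,3} */
        s->b0 = 0;
        s->b2 = (way == 2);
    }
}

/* Follow the bits from the root to find the pseudo-LRU victim. */
static int plru_victim(const struct plru_set *s)
{
    if (s->b0 == 0)
        return s->b1 == 0 ? 0 : 1;
    return s->b2 == 0 ? 2 : 3;
}

int main(void)
{
    struct plru_set set = {0, 0, 0};
    int trace[] = {0, 1, 2, 3, 0, 2};

    for (unsigned i = 0; i < sizeof trace / sizeof trace[0]; i++) {
        plru_touch(&set, trace[i]);
        printf("access way %d -> next victim would be way %d\n",
               trace[i], plru_victim(&set));
    }
    return 0;
}

Because only N-1 bits are kept instead of a full recency ordering, the chosen victim can differ from true LRU, which is the trade-off the slide's "lower complexity" point refers to.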

Sources
- Aleksandar Milenkovic, "Cache Replacement Policies for Future Processors"
- Rafael H. Saavedra, Chapter 5, "Locality Effects and Characterization of the Memory Hierarchy," in "CPU Performance Evaluation and Execution Time Prediction Using Narrow Spectrum Benchmarking"