
Lec17.1 What happens on a Cache miss?
°For an in-order pipeline, 2 options:
  - Freeze pipeline in Mem stage (popular early on: Sparc, R4000)
      IF  ID  EX  Mem stall stall stall … stall Mem Wr
          IF  ID  EX  stall stall stall … stall stall Ex  Wr
  - Use Full/Empty bits in registers + MSHR queue
      - MSHR = "Miss Status/Handler Registers" (Kroft). Each entry in this queue keeps track of the status of outstanding memory requests to one complete memory line.
          – Per cache line: keep info about the memory address.
          – For each word: the register (if any) that is waiting for the result.
          – Used to "merge" multiple requests to one memory line.
      - A new load creates an MSHR entry and sets its destination register to "Empty"; the load is "released" from the pipeline.
      - An attempt to use the register before the result returns causes the instruction to block in the decode stage.
      - Limited "out-of-order" execution with respect to loads. Popular with in-order superscalar architectures.
°Out-of-order pipelines already have this functionality built in… (load queues, etc.)
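The Kroft-style merging described above can be sketched in software. This is a minimal behavioral model, not the Sparc/R4000 hardware: the line size, entry count, and register names are illustrative assumptions.

```python
LINE_BYTES = 32  # assumed cache-line size

class MSHRFile:
    """Toy MSHR file: one entry per outstanding cache line, merging
    multiple load misses to the same line (Kroft-style)."""

    def __init__(self, num_entries=4):
        self.num_entries = num_entries
        # line address -> list of (word offset, destination register)
        self.entries = {}

    def load_miss(self, addr, dest_reg):
        """Record a load miss. Returns True if accepted; False means all
        MSHRs are busy and the pipeline must stall (structural hazard)."""
        line = addr // LINE_BYTES
        if line in self.entries:                   # secondary miss: merge
            self.entries[line].append((addr % LINE_BYTES, dest_reg))
            return True
        if len(self.entries) >= self.num_entries:  # no free MSHR
            return False
        self.entries[line] = [(addr % LINE_BYTES, dest_reg)]  # primary miss
        return True

    def line_fill(self, addr):
        """Memory returned the line: free the entry and report every
        waiting register so it can be marked Full."""
        return self.entries.pop(addr // LINE_BYTES, [])

mshr = MSHRFile()
mshr.load_miss(0x1000, "r1")
mshr.load_miss(0x1004, "r2")   # same line as 0x1000, so it merges
print(mshr.line_fill(0x1000))  # [(0, 'r1'), (4, 'r2')]
```

Merging is what lets a single line fill release several blocked instructions at once, which is why MSHRs pair naturally with Full/Empty bits on the register file.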

Lec17.2 Improving Cache Performance: 3 general options
1. Reduce the miss rate,
2. Reduce the miss penalty, or
3. Reduce the time to hit in the cache.
Time = IC x CCT x (ideal CPI + memory stalls/instruction)
Average Memory Access Time = Hit Time + (Miss Rate x Miss Penalty)
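The two formulas above can be turned directly into code. The numeric values in the example are illustrative assumptions, not measurements from the slide.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty."""
    return hit_time + miss_rate * miss_penalty

def cpu_time(ic, cct, ideal_cpi, mem_stalls_per_instr):
    """Time = IC * CCT * (ideal CPI + memory stalls/instruction)."""
    return ic * cct * (ideal_cpi + mem_stalls_per_instr)

# Assumed example: 1-cycle hit, 5% miss rate, 40-cycle miss penalty.
print(amat(1, 0.05, 40))  # 3.0 cycles
```

Plugging different miss rates or penalties into `amat` shows why each of the three options (miss rate, miss penalty, hit time) attacks a separate term of the same product-sum.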

Lec17.3 3Cs Absolute Miss Rate (SPEC92)
[Figure: absolute miss rate vs. cache size, broken down by miss type; the Conflict component is labeled, and the Compulsory component is vanishingly small.]
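The 3Cs breakdown in the chart follows a standard classification that can be sketched in a toy simulator. This is an assumed model, not the SPEC92 measurement: a miss is Compulsory if the block was never referenced before, Capacity if a fully associative LRU cache of the same total size would also miss, and Conflict otherwise.

```python
from collections import OrderedDict

def classify_misses(trace, num_blocks, block=16):
    """Classify misses of a direct-mapped cache (num_blocks blocks of
    `block` bytes) into the 3Cs, using a fully associative LRU cache of
    the same size as the capacity reference."""
    seen = set()                  # blocks ever touched (compulsory check)
    full = OrderedDict()          # fully associative LRU model
    direct = {}                   # direct-mapped cache: index -> tag
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}
    for addr in trace:
        blk = addr // block
        idx, tag = blk % num_blocks, blk // num_blocks
        fa_hit = blk in full      # probe fully associative model first
        if fa_hit:
            full.move_to_end(blk)
        else:
            full[blk] = True
            if len(full) > num_blocks:
                full.popitem(last=False)   # evict LRU block
        if direct.get(idx) != tag:         # miss in direct-mapped cache
            if blk not in seen:
                counts["compulsory"] += 1
            elif not fa_hit:
                counts["capacity"] += 1
            else:
                counts["conflict"] += 1
            direct[idx] = tag
        seen.add(blk)
    return counts

# Two blocks pinging the same index are pure conflict after warm-up:
print(classify_misses([0, 32, 0, 32], num_blocks=2))
# {'compulsory': 2, 'capacity': 0, 'conflict': 2}
```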

Lec17.4 2:1 Cache Rule
miss rate of 1-way associative cache of size X = miss rate of 2-way associative cache of size X/2
[Figure: conflict miss rate vs. cache size]
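The rule can be checked with a toy simulator. The parameters below (16-byte blocks, a 1 KB direct-mapped cache vs. a 512 B 2-way cache, a random trace) are illustrative assumptions; the 2:1 rule is an empirical observation over real workloads, so a synthetic trace only roughly approximates it.

```python
import random

BLOCK = 16  # assumed block size in bytes

def simulate(ways, num_blocks, trace):
    """Return the miss rate of a set-associative LRU cache."""
    sets = num_blocks // ways
    cache = [[] for _ in range(sets)]  # each set: tags in LRU -> MRU order
    misses = 0
    for addr in trace:
        blk = addr // BLOCK
        s, tag = blk % sets, blk // sets
        if tag in cache[s]:
            cache[s].remove(tag)       # hit: refresh to MRU position
        else:
            misses += 1
            if len(cache[s]) == ways:
                cache[s].pop(0)        # evict LRU tag
        cache[s].append(tag)
    return misses / len(trace)

random.seed(0)
trace = [random.randrange(4096) for _ in range(10000)]
dm = simulate(1, 64, trace)   # direct-mapped, size X = 1 KB
tw = simulate(2, 32, trace)   # 2-way, size X/2 = 512 B
print(f"direct-mapped 1KB: {dm:.3f}   2-way 512B: {tw:.3f}")
```

On traces with more locality the two numbers tend to track each other, which is the content of the rule: halving the capacity roughly trades off against doubling the associativity.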

Lec17.5 3Cs Relative Miss Rate
[Figure: relative miss rate vs. cache size, Conflict component labeled]
Flaws: for fixed block size
Good: insight => invention