Realistic Memories and Caches – Part III
Li-Shiuan Peh
Computer Science & Artificial Intelligence Lab, Massachusetts Institute of Technology
April 4, 2012

2-level Data Cache Interface (0,n)

  interface mkL2Cache;
    method ActionValue#(Bool) req(MemReq r);
    method ActionValue#(Maybe#(Data)) resp;
  endinterface

[Diagram: Processor → L1 cache → L2 cache → DRAM, each link a ReqRespMem interface with req / accepted / resp signals]

2-level Data Cache Internals

[Diagram: the L1 cache (TagData store, hit wire, miss wire, evict wire, missFifo) connected over req / accepted / resp to an L2 cache containing reqFifo, respFifo, missFifo, and its own TagData store, which in turn talks to memory through a ReqRespMem interface]

Let's build an inclusive L2 with our earlier direct-mapped L1.
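To make the block diagram concrete, here is a minimal skeleton of the state such an inclusive L2 module might declare. This is a sketch under assumptions, not the lecture's exact code: the TagDataStore type, its mkTagDataStore constructor, and the Index type are assumed names, and the rules named in the comments are the ones developed on the next slides.

  import FIFOF::*;

  // Sketch only: state elements of an inclusive L2 behind a ReqRespMem interface
  module mkL2Cache#(ReqRespMem mem)(ReqRespMem);
    FIFOF#(MemReq)                 reqFifo  <- mkFIFOF;        // requests accepted from L1
    FIFOF#(Maybe#(Data))           respFifo <- mkFIFOF;        // responses returned to L1
    FIFOF#(Tuple2#(Index, MemReq)) missFifo <- mkFIFOF;        // the outstanding miss, if any
    TagDataStore                   store    <- mkTagDataStore; // tag array + data lines
    // rule procReq  : hit / dirty-eviction / miss cases (next three slides)
    // rule procMiss : completes the fill when memory responds
    // the req/resp methods wrap reqFifo/respFifo (omitted in this sketch)
  endmodule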

L2 cache: hit case

  rule procReq(!missFifo.notEmpty);
    let req = reqFifo.first;
    let hitOrMiss <- store.tagLookup(req.addr);
    if (hitOrMiss.hit) begin
      let line <- store.dataAccess(getCacheDataReqLine(hitOrMiss.id, req));
      if (req.op == Ld)
        respFifo.enq(unpack(pack(line.Line)));
      reqFifo.deq;
    end
    ... // dirty and miss cases follow on the next slides
  endrule

L2 cache: dirty case

    else if (hitOrMiss.dirty) begin
      let line <- store.dataAccess(CacheDataReq{id: hitOrMiss.id,
                                                reqType: tagged InvalidateLine});
      mem.reqEnq(MemReq{op: St, addr: hitOrMiss.addr,
                        data: unpack(pack(line.Line))});
    end
    else ...

What about the corresponding line in L1?

L2 cache: miss case

    else begin
      missFifo.enq(tuple2(hitOrMiss.id, req));
      req.op = Ld;
      mem.reqEnq(req);
      reqFifo.deq;
    end

Process miss from memory

  rule procMiss;
    match {.id, .req} = missFifo.first;
    Line line = req.op == Ld ? unpack(pack(mem.respFirst))
                             : unpack(pack(req.data));
    Bool dirty = req.op == Ld ? False : True;
    let l <- store.dataAccess(CacheDataReq{id: id,
                 reqType: tagged FillLine (tuple3(line, req.addr, dirty))});
    if (req.op == Ld)
      respFifo.enq(unpack(pack(line)));
    missFifo.deq;
    mem.respDeq;
  endrule

How will the inclusive L2 need to change to interface with an n-way set-associative L1?

[Diagram: the same L1/L2 internals as before]

When an L2 cache line is evicted, the corresponding L1 cache line needs to be invalidated:
- Interface: add invalidate information
- L1 cache logic: tagLookup, invalidate
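One possible shape for that extra interface information, shown only as a hedged sketch: extend the L1 interface with an invalidate method that the L2 calls when it evicts a line. The method name and the Addr type are assumptions for illustration, not the lecture's code.

  interface L1Cache;
    method ActionValue#(Bool) req(MemReq r);
    method ActionValue#(Maybe#(Data)) resp;
    // assumed extension: L2 asks L1 to look up addr with its tagLookup logic
    // and invalidate the matching way, writing the line back first if dirty
    method Action invalidate(Addr addr);
  endinterface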

Multi-Level Cache Organizations

Inclusive vs. Non-Inclusive vs. Exclusive

[Diagram: Processor with registers, split L1 I-cache and L1 D-cache, a shared L2 cache, then memory and disk]

Separate data and instruction caches vs. a unified cache

Write-Back vs. Write-Through Caches in Multi-Level Caches

Write back
- Writes only go into the top level of the hierarchy
- Maintain a record of “dirty” lines
- Faster write speed (a write only has to reach the top level to be considered complete)

Write through
- All writes go into the L1 cache and then also write through into subsequent levels of the hierarchy
- Better for “cache coherence” issues
- No dirty/clean bit records required
- Faster evictions
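As a rough illustration of the difference (a sketch with assumed names, not the lecture's code): on a store hit, a write-back L1 only updates its own data store and remembers the line is dirty, while a write-through L1 additionally forwards the store to the next level right away.

  // write-back: update the L1 line and mark it dirty; the data reaches the
  // next level only when the line is eventually evicted
  // (WriteLine is an assumed request variant for this illustration)
  let l <- store.dataAccess(CacheDataReq{id: hitOrMiss.id,
                                         reqType: tagged WriteLine req.data});

  // write-through (alternative): also send the store down immediately, so no
  // dirty bit is needed and evictions never require a writeback
  // (nextLevel is an assumed name for the lower-level ReqRespMem)
  nextLevel.reqEnq(req);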

Summary

Choice of cache interfaces
- Combinational vs. non-combinational responses
- Blocking vs. non-blocking
- FIFO vs. non-FIFO responses

Many possible internal organizations
- Direct-mapped and n-way set-associative are the most basic ones
- Write-back vs. write-through
- Write allocate or not

Multi-level caches: likely organizations? Associativity? Block size?

Next: integrating caches into the pipeline

Recap: Six-Stage Pipeline

F Fetch → D Decode → R Reg Read → X Execute → M Memory → W Write-back

Writeback feeds back directly to Reg Read rather than through a FIFO.

Multi-cycle execute

Incorporating a multi-cycle operation into a stage

[Diagram: the Execute stage X contains a three-stage unit M1 → M2 → M3 alongside the single-cycle path; instructions A, B, C, ζ, η are in flight across the D, R, X, and M stages]

Multi-cycle function unit

  module mkMultistage(Multistage);
    FIFO#(State1) m1 <- mkFIFO;
    FIFO#(State2) m2 <- mkFIFO;

    method Action request(Operand operand);  // perform first stage of pipeline
      m1.enq(doM1(operand));
    endmethod

    rule s1;                                 // perform middle stage(s) of pipeline
      m2.enq(doM2(m1.first));
      m1.deq();
    endrule

    method ActionValue#(Result) response();  // perform last stage of pipeline and return result
      m2.deq();
      return doM3(m2.first);
    endmethod
  endmodule

Single/Multi-cycle pipe stage

  rule doExec;
    ... // initialization
    let done = False;
    if (!waiting) begin
      if (isMulticycle(...)) begin   // if not busy, start multicycle operation, or...
        multicycle.request(...);
        waiting <= True;
      end
      else begin                     // ...perform single-cycle operation
        result = singlecycle.exec(...);
        done = True;
      end
    end
    else begin                       // if busy, wait for response
      result <- multicycle.response();
      done = True;
      waiting <= False;
    end
    if (done) ... // finish up if we have a result
  endrule

Adding a cache (a multi-cycle unit!) into the pipeline

Architectural state:

  InstCache  iCache  <- mkInstCache(dualMem.iMem);
  ReqRespMem l2Cache <- mkL2Cache(dualMem.dMem);
  DataCache  dCache  <- mkDataCache(l2Cache);

Which stages need to be modified?

Instruction fetch stage

Processor sends a request to the I$:

  if (!inFlight)
    inFlight <- iCache.req(fetchPc);

Processor receives a response from the I$:

  let instResp <- iCache.resp;
  if (instResp matches tagged Valid .d)
    ...

Memory read stage

Processor sends a request to the D$:

  if (!poisoned && (decInst.i_type == Ld || decInst.i_type == St)
      && !memInFlight && mr.notFull)
    inFlight <- dCache.req(MemReq{op:   execInst.itype == Ld ? Ld : St,
                                  addr: execInst.addr,
                                  data: execInst.data});

Processor receives a response from the D$:

  let data <- dCache.resp;
  if (!poisoned && (decInst.i_type == Ld || decInst.i_type == St))
    if (isValid(data))
      ...

Put it all together: Pipeline + Caches (instruction fetch hit)

[Diagram: the six-stage pipeline (F Fetch, D Decode, R Reg Read, X Execute, M Memory, W Write-back) in front of Processor → L1 cache → L2 cache → DRAM, each link a ReqRespMem with req / accepted / resp; here the fetch request hits in the L1 cache]

Put it all together: Pipeline + Caches (instruction fetch miss)

[Diagram: same pipeline and memory hierarchy; here the fetch request misses in the L1 cache and goes on to the L2 and DRAM]

Put it all together: Pipeline + Caches (instruction fetch not accepted)

[Diagram: same pipeline and memory hierarchy; here the L1 cache does not accept the fetch request]

Put it all together: Pipeline + Caches (multiple instructions with non-blocking caches)

[Diagram: same pipeline and memory hierarchy, with a missFifo in the L1 cache allowing multiple outstanding requests]

Where are the caches?