Memristor memory on its way (hopefully)


inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 13 – Caches II 2014-02-21 Guest Lecturer Alan Christopher. Memristor memory on its way (hopefully): HP has begun testing research prototypes of a novel non-volatile memory element, the memristor. Memristors have double the storage density of flash and 10x more read-write cycles than flash (10^6 vs. 10^5). Memristors are (in principle) also capable of acting as both memory and logic; how cool is that? Originally slated to be ready by 2013, HP later pushed that date to some time in 2014. www.technologyreview.com/computing/25018 http://www.technologyreview.com/view/512361/can-hp-save-itself/

Review: New-School Machine Structures. Software and hardware harness parallelism to achieve high performance:
Parallel Requests, assigned to a computer, e.g., search "Katz" (Warehouse Scale Computer)
Parallel Threads, assigned to a core, e.g., lookup, ads (Computer)
Parallel Instructions, >1 instruction @ one time, e.g., 5 pipelined instructions (Core: Instruction Unit(s), Functional Unit(s))
Parallel Data, >1 data item @ one time, e.g., add of 4 pairs of words (A0+B0, A1+B1, A2+B2, A3+B3)
Hardware descriptions, all gates @ one time (Logic Gates)
Programming Languages
Today's Lecture: the memory system (Memory/Cache, Input/Output). [Fall 2012 -- Lecture #14]

Review: Direct-Mapped Cache. All fields are read as unsigned integers.
Index specifies the cache index (or "row"/block)
Tag distinguishes between the addresses that map to the same location
Offset specifies which byte within the block we want
tttttttttttttttttt iiiiiiiiii oooo
tag (to check if we have the correct block) | index (to select the block) | offset (byte within the block)

TIO: Dan's great cache mnemonic
AREA (cache size, B) = HEIGHT (# of blocks) * WIDTH (size of one block, B/block)
2^(H+W) = 2^H * 2^W
The address (often 32 bits) splits into Tag | Index | Offset: the Index bits give the HEIGHT and the Offset bits give the WIDTH.

Memory Access without Cache
Load word instruction: lw $t0, 0($t1)
$t1 contains 1022ten, Memory[1022] = 99
1. Processor issues address 1022ten to Memory
2. Memory reads word at address 1022ten (99)
3. Memory sends 99 to Processor
4. Processor loads 99 into register $t0

Memory Access with Cache
Load word instruction: lw $t0, 0($t1)
$t1 contains 1022ten, Memory[1022] = 99
With cache (similar to a hash):
1. Processor issues address 1022ten to Cache
2. Cache checks to see if it has a copy of the data at address 1022ten
2a. If it finds a match (Hit): cache reads 99, sends it to the processor
2b. No match (Miss): cache sends address 1022ten to Memory; Memory reads 99 at address 1022ten; Memory sends 99 to Cache; Cache replaces a word with the new 99; Cache sends 99 to the processor
3. Processor loads 99 into register $t0

Caching Terminology
When reading memory, 3 things can happen:
cache hit: cache block is valid and contains the desired address, so read the desired word
cache miss: nothing in the cache at the appropriate block, so fetch from memory
cache miss, block replacement: wrong data is in the cache at the appropriate block, so discard it and fetch the desired data from memory (the cache always holds a copy)

Cache Terms
Hit rate: fraction of accesses that hit in the cache
Miss rate: 1 – Hit rate
Miss penalty: time to bring a block from the lower level of the memory hierarchy into the cache
Hit time: time to access cache memory (including tag comparison)
Abbreviation: "$" = cache (A Berkeley innovation!)

Accessing data in a direct mapped cache
Ex.: 16 KB of data, direct-mapped, 4 word blocks. Can you work out height, width, area?
Read 4 addresses: 0x00000014, 0x0000001C, 0x00000034, 0x00008014
Memory values here (Address (hex): Value of Word):
00000010: a, 00000014: b, 00000018: c, 0000001C: d
...
00000030: e, 00000034: f, 00000038: g, 0000003C: h
...
00008010: i, 00008014: j, 00008018: k, 0000801C: l

Accessing data in a direct mapped cache 4 Addresses: 0x00000014, 0x0000001C, 0x00000034, 0x00008014 4 Addresses divided (for convenience) into Tag, Index, Byte Offset fields 000000000000000000 0000000001 0100 000000000000000000 0000000001 1100 000000000000000000 0000000011 0100 000000000000000010 0000000001 0100 Tag Index Offset

16 KB Direct Mapped Cache, 16B blocks Valid bit: determines whether anything is stored in that row (when computer initially turned on, all entries invalid) ... Valid Tag 0xc-f 0x8-b 0x4-7 0x0-3 1 2 3 4 5 6 7 1022 1023 Index

1. Read 0x00000014 000000000000000000 0000000001 0100 Tag field Index field Offset ... Valid Tag 1 2 3 4 5 6 7 1022 1023 0xc-f 0x8-b 0x4-7 0x0-3 Index

So we read block 1 (0000000001) 000000000000000000 0000000001 0100 Tag field Index field Offset ... Valid Tag 1 2 3 4 5 6 7 1022 1023 Index 0xc-f 0x8-b 0x4-7 0x0-3

No valid data 000000000000000000 0000000001 0100 Tag field Index field Offset ... Valid Tag 1 2 3 4 5 6 7 1022 1023 0xc-f 0x8-b 0x4-7 0x0-3 Index

So load that data into cache, setting tag, valid 000000000000000000 0000000001 0100 Tag field Index field Offset ... Valid Tag 1 2 3 4 5 6 7 1022 1023 0xc-f 0x8-b 0x4-7 0x0-3 Index 1 d c b a

Read from cache at offset, return word b 000000000000000000 0000000001 0100 Tag field Index field Offset ... Valid Tag 1 2 3 4 5 6 7 1022 1023 0xc-f 0x8-b 0x4-7 0x0-3 Index 1 d c b a

2. Read 0x0000001C = 0…00 0..001 1100 000000000000000000 0000000001 1100 Tag field Index field Offset ... Valid Tag 1 2 3 4 5 6 7 1022 1023 0xc-f 0x8-b 0x4-7 0x0-3 Index 1 d c b a

Index is Valid 000000000000000000 0000000001 1100 Tag field Index field Offset ... Valid Tag 1 2 3 4 5 6 7 1022 1023 0xc-f 0x8-b 0x4-7 0x0-3 Index 1 d c b a

Index valid, Tag Matches 000000000000000000 0000000001 1100 Tag field Index field Offset ... Valid Tag 1 2 3 4 5 6 7 1022 1023 0xc-f 0x8-b 0x4-7 0x0-3 Index 1 d c b a

Index Valid, Tag Matches, return d 000000000000000000 0000000001 1100 Tag field Index field Offset ... Valid Tag 1 2 3 4 5 6 7 1022 1023 0xc-f 0x8-b 0x4-7 0x0-3 Index 1 d c b a

3. Read 0x00000034 = 0…00 0..011 0100 000000000000000000 0000000011 0100 Tag field Index field Offset ... Valid Tag 1 2 3 4 5 6 7 1022 1023 0xc-f 0x8-b 0x4-7 0x0-3 Index 1 d c b a

So read block 3 000000000000000000 0000000011 0100 Tag field Index field Offset ... Valid Tag 1 2 3 4 5 6 7 1022 1023 0xc-f 0x8-b 0x4-7 0x0-3 Index 1 d c b a

No valid data 000000000000000000 0000000011 0100 Tag field Index field Offset ... Valid Tag 1 2 3 4 5 6 7 1022 1023 0xc-f 0x8-b 0x4-7 0x0-3 Index 1 d c b a

Load that cache block, return word f 000000000000000000 0000000011 0100 Tag field Index field Offset ... Valid Tag 1 2 3 4 5 6 7 1022 1023 0xc-f 0x8-b 0x4-7 0x0-3 Index 1 d c b a 1 h g f e

4. Read 0x00008014 = 0…10 0..001 0100 000000000000000010 0000000001 0100 Tag field Index field Offset ... Valid Tag 1 2 3 4 5 6 7 1022 1023 0xc-f 0x8-b 0x4-7 0x0-3 Index 1 d c b a 1 h g f e

So read Cache Block 1, Data is Valid 000000000000000010 0000000001 0100 Tag field Index field Offset ... Valid Tag 1 2 3 4 5 6 7 1022 1023 0xc-f 0x8-b 0x4-7 0x0-3 Index 1 d c b a 1 h g f e

Cache Block 1 Tag does not match (0 != 2) 000000000000000010 0000000001 0100 Tag field Index field Offset ... Valid Tag 1 2 3 4 5 6 7 1022 1023 0xc-f 0x8-b 0x4-7 0x0-3 Index 1 d c b a 1 h g f e

Miss, so replace block 1 with new data & tag 000000000000000010 0000000001 0100 Tag field Index field Offset ... Valid Tag 1 2 3 4 5 6 7 1022 1023 0xc-f 0x8-b 0x4-7 0x0-3 Index 1 2 l k j i 1 h g f e

And return word j 000000000000000010 0000000001 0100 Tag field Index field Offset ... Valid Tag 1 2 3 4 5 6 7 1022 1023 0xc-f 0x8-b 0x4-7 0x0-3 Index 1 2 l k j i 1 h g f e

Do an example yourself. What happens?
Choose from: Cache: Hit, Miss, Miss w. replace. Values returned: a, b, c, d, e, ..., k, l
Read address 0x00000030? 000000000000000000 0000000011 0000
Read address 0x0000001c? 000000000000000000 0000000001 1100
Cache contents (Index: Valid Tag | 0xc-f 0x8-b 0x4-7 0x0-3):
1: 1 2 | l k j i
3: 1 0 | h g f e
(all other indices invalid)

Answers
0x00000030: a hit! Index = 3, Tag matches, Offset = 0, value = e
0x0000001c: a miss. Index = 1, Tag mismatch, so replace from memory, Offset = 0xc, value = d
Since these are reads, the values returned must equal the memory values whether or not they are cached:
0x00000030 = e
0x0000001c = d
Memory (Address (hex): Value of Word):
00000010: a, 00000014: b, 00000018: c, 0000001C: d
...
00000030: e, 00000034: f, 00000038: g, 0000003C: h
...
00008010: i, 00008014: j, 00008018: k, 0000801C: l

Administrivia Proj 1-2 due Sunday

Multiword-Block Direct-Mapped Cache
Four words/block, cache size = 4K words.
[Figure: the 32-bit address (bits 31..0) splits into an 18-bit Tag, a 10-bit Index, a 2-bit Block offset, and a 2-bit Byte offset; the valid bit ANDed with the tag comparison produces Hit, and a multiplexor uses the block offset to select the 32-bit Data word from the block.]
To take advantage of spatial locality, we want a cache block that is larger than one word in size. What kind of locality are we taking advantage of?

Peer Instruction
1) Mem hierarchies were invented before 1950. (UNIVAC I wasn't delivered 'til 1951)
2) All caches take advantage of spatial locality.
3) All caches take advantage of temporal locality.
Choices (1-2-3):
a) FFF  b) FFT  c) FTF  d) FTT  e) TFF  f) TFT  g) TTF  h) TTT
Answer: TFF

And in Conclusion…
Mechanism for transparent movement of data among levels of a storage hierarchy:
set of address/value bindings
address -> index to set of candidates
compare desired address with tag
service hit or miss; load new block and binding on miss
Example (address = tag | index | offset = 000000000000000000 | 0000000001 | 1100):
Index: Valid Tag | 0xc-f 0x8-b 0x4-7 0x0-3
1: 1 0 | d c b a