Published by Milton Palfrey. Modified over 9 years ago.
1
Class Presentation for the Advanced VLSI Course
Presented by: Ali Shahabi
Presentation date: 2004/12/30
Major reference: Architecture and Circuit Techniques for a Reconfigurable Memory Block, by Ken Mai, Ron Ho, Elad Alon, Dean Liu, Younggon Kim, Dinesh Patil, and Mark Horowitz
2
Custom ASICs are expensive
High design complexity
High non-recurring engineering costs
Need high-volume, high-profit market
Hard to modify or fix
3
Reconfigurable computing
Growing interest in reconfigurable solutions – FPGAs – Structured ASICs – Coarse-grain architectures
Reconfigurable computing characteristics – Low non-recurring engineering costs – Good performance and efficiency – Reconfigurability overheads
4
FPGA with hardwired blocks
CLB: Configurable Logic Block [1]
5
Coarse-grain architecture Chip multi-processor Compute, memory, interconnect, control Reconfigure tile and global network [1]
6
Current memory systems
Traditional emphasis on compute side – Memory system important
FPGAs – Fine grain with sizable overheads – Use CLBs for extra functionality – Slow compared to cutting-edge SRAMs
Coarse-grain architectures – Large grain – Low flexibility
7
Design goal
Low overhead, fast, reconfigurable memory system
Reconfigure along natural SRAM partition boundaries – Add hardwired blocks for extra functionality
Modern SRAM circuit techniques – Pulse-mode self-resetting logic – Replica timing paths
Design targets – Cache – FIFO
8
Reconfigurable memory system Array of homogeneous memory mats Each memory has a port into the interconnect Mat size chosen based on natural partition boundary Small inter-mat control network [1]
9
Smart Memories chip [2]
10
Tile floor plan [2]
11
Sample configuration: caches Mats configured as tag or data Direct mapped or set-associative caches Use inter-mat control network to pass hit/miss [1]
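The cache configuration above can be illustrated with a small behavioral sketch. This is only a functional model under stated assumptions, not the circuit from [1]: the `Mat` class, the `lookup` function, and the convention that `None` marks an invalid entry are all hypothetical names introduced for illustration.

```python
class Mat:
    """Hypothetical stand-in for one memory mat: a flat array of entries."""

    def __init__(self, words=512):
        self.cells = [None] * words  # None marks an invalid (empty) entry


def lookup(tag_mats, data_mats, index, tag):
    """Probe every way; return the data of the hitting way, or None on a miss."""
    for tag_mat, data_mat in zip(tag_mats, data_mats):
        # The tag mat produces a hit/miss signal; in the slides this signal
        # travels to the data mat over the inter-mat control network.
        hit = tag_mat.cells[index] == tag
        if hit:
            return data_mat.cells[index]
    return None


# Two tag mats and two data mats form a 2-way set-associative cache.
tags = [Mat(), Mat()]
data = [Mat(), Mat()]
tags[1].cells[5] = 0xAB
data[1].cells[5] = "line in way 1"
```

Passing a single tag mat and a single data mat to the same function would model a direct-mapped cache.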
12
Mats configured as 2-way set-associative cache [2]
13
Sample configuration: FIFOs
Data FIFOs, instruction store, and scratchpad
Completely self-contained FIFOs
A single FIFO can be smaller or larger than 1 mat [1]
14
Multi-porting Some configurations need >1 access per cycle – Cache tag with snooping – FIFOs with independent read and write ports Multi-porting each cell is expensive – Multiple ports not always needed Run memory system faster than processor – Time multiplex single-port – Memory cycle = 10 fan-out of 4 inverter delays
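Time multiplexing a single port can be sketched as follows. This is a behavioral illustration only: the slides say the memory runs faster than the processor but do not give the clock ratio here, so `ratio=2` is an assumption, and `SinglePortMemory` and `processor_cycle` are hypothetical names.

```python
class SinglePortMemory:
    """A single-ported array: exactly one read or write per memory cycle."""

    def __init__(self, words):
        self.cells = [0] * words
        self.memory_cycles = 0

    def access(self, op, addr, value=None):
        self.memory_cycles += 1  # every access occupies the one port for a cycle
        if op == "write":
            self.cells[addr] = value
            return None
        return self.cells[addr]


def processor_cycle(mem, requests, ratio=2):
    """Serve up to `ratio` requests in one processor cycle, one after another.

    `ratio` (memory cycles per processor cycle) is an assumed value.
    """
    if len(requests) > ratio:
        raise ValueError("more requests than memory cycles in this processor cycle")
    return [mem.access(*request) for request in requests]


mem = SinglePortMemory(8)
# A FIFO-style write and read issued in the same processor cycle.
results = processor_cycle(mem, [("write", 3, 42), ("read", 3)])
```

From the processor's point of view the mat appears to have two ports, although each cell still has only one.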
15
Virtual multi-porting [1]
16
Mat latency Total mat latency = 2 cycles – 20 FO4 – SRAM access = 1 cycle – Peripheral logic = 1 cycle Fully pipelined – Accepts one access every cycle
17
Added features [1]
18
Meta-data [1]
19
Mat details 2KB SRAM array – 512 x 36b logical, 128 x 144b physical – 32b main data, 4b meta-data – Scan tunable replica bitline
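The array dimensions on this slide can be cross-checked with simple arithmetic. The constant names are chosen here for illustration; the numbers come from the slide.

```python
LOGICAL_WORDS, LOGICAL_BITS = 512, 36    # logical organization
PHYSICAL_ROWS, PHYSICAL_BITS = 128, 144  # physical organization
DATA_BITS, META_BITS = 32, 4             # per-word split

# Each logical word is main data plus meta-data.
assert DATA_BITS + META_BITS == LOGICAL_BITS

# Both organizations hold the same number of bits.
total_bits = LOGICAL_WORDS * LOGICAL_BITS
assert total_bits == PHYSICAL_ROWS * PHYSICAL_BITS

# Logical words packed into one physical row.
words_per_row = PHYSICAL_BITS // LOGICAL_BITS

# The "2KB" figure counts main data only, excluding meta-data.
data_bytes = LOGICAL_WORDS * DATA_BITS // 8

print(total_bits, words_per_row, data_bytes)
```

The check shows that four logical words share each physical row and that the 2KB capacity refers to the 32b main data of each word.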
20
Meta-data bits
Cache: valid, dirty, LRU, cache coherence state
FIFO: valid
Special operations – Gang – Read modify write
Word layout (figure): meta-data 4b, data 32b
21
Gang operation
Can gang set or clear columns of meta-data bits
Single cycle operation
Figure labels: mask, clear, set, meta-data, data [1]
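A functional sketch of the gang operation is given below. It models only the logical effect (every row updated at once under a column mask), not the bit-cell circuit; the function name `gang` and the representation of the meta-data as a list of 4-bit integers are assumptions made for illustration.

```python
META_BITS = 4
META_MASK = (1 << META_BITS) - 1


def gang(meta_rows, column_mask, op):
    """Set or clear the masked meta-data columns in every row.

    meta_rows:   one 4-bit integer per word in the mat
    column_mask: 1 bits select which meta-data columns are affected
    op:          "set" or "clear"
    """
    column_mask &= META_MASK
    if op == "set":
        return [meta | column_mask for meta in meta_rows]
    if op == "clear":
        return [meta & ~column_mask & META_MASK for meta in meta_rows]
    raise ValueError("op must be 'set' or 'clear'")


rows = [0b1010, 0b0111, 0b0011]
cleared = gang(rows, 0b0010, "clear")  # e.g. drop one status bit in all rows
```

In software this loop takes time proportional to the number of rows; the slide states that the hardware does it in a single cycle.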
23
Meta-data bit cell [1]
25
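Read modify write (figure: meta-data and data fields) [1]
26
Read modify write: read (figure: meta-data and data fields) [1]
27
Read modify write: modify (figure: meta-data and data fields) [1]
28
Read modify write: write (figure: meta-data and data fields) [1]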
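The three phases shown on these slides (read, modify, write) can be sketched behaviorally as below. This is an illustration under assumptions: the meta-data is placed in the upper 4 bits of the 36b word, and the update rule is passed in as a hypothetical `modify` callback because the slides do not specify it.

```python
DATA_BITS, META_BITS = 32, 4
DATA_MASK = (1 << DATA_BITS) - 1
META_MASK = (1 << META_BITS) - 1


def read_modify_write(mat, addr, modify):
    """Return (data, old_meta) and write back updated meta-data for one word.

    Assumed layout: meta-data in bits 35..32, main data in bits 31..0.
    """
    word = mat[addr]                              # read
    old_meta = (word >> DATA_BITS) & META_MASK
    data = word & DATA_MASK
    new_meta = modify(old_meta) & META_MASK       # modify
    mat[addr] = (new_meta << DATA_BITS) | data    # write (data unchanged)
    return data, old_meta


mat = [0] * 4
mat[1] = (0b0001 << DATA_BITS) | 0xDEADBEEF
# Example update: set meta-data bit 1 while reading the word.
data, old_meta = read_modify_write(mat, 1, lambda meta: meta | 0b0010)
```

Only the meta-data field changes; the main data is returned to the requester and written back untouched.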
29
RMW decoder circuits [1]
31
RMW decoder circuits: read [1]
32
RMW decoder circuits: modify [1]
33
RMW decoder circuits: write [1]
35
Timing [1]
36
PLA Reconfigurable NOR-NOR PLA 1st NOR plane = ternary-CAM 2nd NOR plane = SRAM [1]
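The two planes can be illustrated with a purely functional model. It is a sketch, not the NOR-NOR circuit: by De Morgan's laws a NOR-NOR structure computes a sum of products, so the model below evaluates product terms and ORs the selected ones. The `(care, value)` encoding of a ternary entry and the name `pla_eval` are assumptions for illustration.

```python
def pla_eval(terms, output_rows, inputs):
    """Evaluate a small reconfigurable PLA.

    terms:       list of (care, value) pairs, one ternary-CAM entry per
                 product term; a 0 bit in `care` means "don't care"
    output_rows: one bit mask per term (the SRAM plane) naming the outputs
                 that the term drives
    inputs:      input bits packed into an integer
    """
    outputs = 0
    for (care, value), row in zip(terms, output_rows):
        matches = ((inputs ^ value) & care) == 0  # first plane: ternary match
        if matches:
            outputs |= row                        # second plane: OR into outputs
    return outputs


# Two product terms over 6 inputs driving 4 outputs.
terms = [
    (0b000011, 0b000001),  # in1 == 0 and in0 == 1, other inputs ignored
    (0b100000, 0b100000),  # in5 == 1
]
output_rows = [0b0001, 0b0110]
```

Reconfiguring the PLA amounts to rewriting `terms` (the CAM contents) and `output_rows` (the SRAM contents).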
37
PLA: 1st NOR plane [1]
38
PLA: normal delay chain [1]
39
PLA: early reset-off delay chain [1]
40
PLA: 2nd NOR plane [1]
41
Pointer logic [1]
42
Pointer logic Pointer cells are 2-ported For FIFO configurations we add pointer logic 4 pointer/stride pairs – 11b pointer – 4b stride [1]
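A pointer/stride pair can be sketched as a small register pair that advances on each access. The field widths come from the slide; the wrap-around at 2^11 and the class name `PointerStridePair` are assumptions, since the slides do not describe the update rule in detail.

```python
POINTER_BITS, STRIDE_BITS = 11, 4
POINTER_MASK = (1 << POINTER_BITS) - 1
STRIDE_MAX = (1 << STRIDE_BITS) - 1


class PointerStridePair:
    """One of the four pointer/stride pairs (behavioral sketch only)."""

    def __init__(self, pointer=0, stride=1):
        if not 0 <= pointer <= POINTER_MASK:
            raise ValueError("pointer does not fit in 11 bits")
        if not 0 <= stride <= STRIDE_MAX:
            raise ValueError("stride does not fit in 4 bits")
        self.pointer = pointer
        self.stride = stride

    def advance(self):
        """Return the current pointer, then step it by the stride."""
        current = self.pointer
        # Assumed behavior: the pointer wraps modulo 2**11.
        self.pointer = (self.pointer + self.stride) & POINTER_MASK
        return current


head = PointerStridePair(pointer=2046, stride=3)
first = head.advance()
```

A FIFO would typically use one pair as its read pointer and another as its write pointer.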
43
Write buffer [1]
44
Write buffer
Pipeline writes for single-cycle cache writes
On a write, the data mat stores incoming data in the write buffer (WB)
Tag check – Cache miss: WB entry is invalidated – Cache hit: WB entry is allowed to write
Buffered data is written into the data mat on the next write
On every write, the WB and mat are both active [3]
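The write-buffer behavior listed on this slide can be modeled as below. This is a behavioral sketch with hypothetical names (`DataMatWithWriteBuffer`, `tag_result`); it does not model reads, so any forwarding from the buffer to a read of a pending address is left out.

```python
class DataMatWithWriteBuffer:
    """Data mat with a one-entry write buffer (WB)."""

    def __init__(self, words):
        self.cells = [0] * words
        self.wb = None  # pending (addr, data), or None when invalid

    def write(self, addr, data):
        # The previous entry, if it survived its tag check, drains into the
        # array now, while the new data is captured in the WB.
        if self.wb is not None:
            pending_addr, pending_data = self.wb
            self.cells[pending_addr] = pending_data
        self.wb = (addr, data)

    def tag_result(self, hit):
        # A miss invalidates the buffered entry; a hit leaves it valid.
        if not hit:
            self.wb = None


mat = DataMatWithWriteBuffer(4)
mat.write(1, 11)
mat.tag_result(hit=True)    # entry for address 1 may be written later
mat.write(2, 22)            # drains address 1, buffers address 2
mat.tag_result(hit=False)   # entry for address 2 is dropped
mat.write(3, 33)            # nothing to drain, buffers address 3
```

The data mat never waits for the tag check before accepting a write, which is what allows single-cycle cache writes.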
45
Comparator [1]
46
Comparator Maskable comparator – Can mask out any combination of meta-data bits – Can mask out the main data as a chunk Example use: cache tag compare – Want to check valid state of line (in meta-data) – Want to check tag itself (in main data)
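The maskable comparison can be sketched in a few lines. The function name `masked_compare`, the placement of meta-data in the upper 4 bits, and the use of meta-data bit 0 as the valid bit are assumptions for illustration only.

```python
DATA_BITS, META_BITS = 32, 4
DATA_MASK = (1 << DATA_BITS) - 1
META_MASK = (1 << META_BITS) - 1


def masked_compare(stored, probe, meta_care=META_MASK, compare_data=True):
    """Compare two 36b words, ignoring masked-out fields.

    meta_care:    1 bits select which meta-data bits take part (any combination)
    compare_data: the 32b main data takes part as a whole, or not at all
    """
    care = (meta_care & META_MASK) << DATA_BITS
    if compare_data:
        care |= DATA_MASK
    return ((stored ^ probe) & care) == 0


VALID = 0b0001  # assumed position of the valid bit
stored = (0b0101 << DATA_BITS) | 0x1234  # valid line with tag 0x1234
probe = (VALID << DATA_BITS) | 0x1234    # looking for a valid line, same tag
hit = masked_compare(stored, probe, meta_care=VALID)
```

Here the comparison checks the valid bit and the tag while ignoring the other meta-data bits, which matches the cache tag compare example on the slide.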
47
Putting it all together [1]
48
Testchip
0.18µm 6M TSMC
3mm x 3.3mm die
4 memory blocks
Low swing crossbar
Test vector storage
1.1GHz (10 FO4)
1.8V, room temp. [1]
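The timing figures can be related to each other with a short calculation. It uses the 1.1GHz and 10 FO4 values from this slide and the 2-cycle mat latency stated on the mat latency slide; the implied FO4 delay is derived here and is not a number quoted in the slides.

```python
CLOCK_HZ = 1.1e9     # measured clock frequency
FO4_PER_CYCLE = 10   # cycle time in fan-out-of-4 inverter delays
LATENCY_CYCLES = 2   # total mat latency from the mat latency slide

cycle_ps = 1e12 / CLOCK_HZ                      # one memory cycle in picoseconds
fo4_ps = cycle_ps / FO4_PER_CYCLE               # implied FO4 delay
latency_ps = LATENCY_CYCLES * cycle_ps          # total mat latency
latency_fo4 = LATENCY_CYCLES * FO4_PER_CYCLE    # same latency in FO4 units

print(f"cycle = {cycle_ps:.0f} ps, FO4 = {fo4_ps:.0f} ps, "
      f"latency = {latency_ps:.0f} ps ({latency_fo4} FO4)")
```

The cycle works out to roughly 909 ps, so one FO4 delay is about 91 ps under these conditions and the 2-cycle latency is about 1.8 ns.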
49
Testchip mat details 2KB SRAM array – 512 x 36b logical, 128 x 144b physical – 32b main data, 4b meta-data 16 AND-term PLA – 6 inputs, 4 outputs 4 pointer/stride pairs – 11b pointer – 4b stride
50
Mat area breakdown (mm²) 32% mat area in peripheral logic [1]
51
Mat power breakdown (mW) 26% power in peripheral logic [1]
52
Conclusions Reconfigurable memory block – Multiple memory configurations – Performance on par with modern SRAMs – Modest overheads Future uses – Reconfigurable computing – General purpose computing – Designs with shifting memory requirements
53
References
[1] K. Mai et al., “Architecture and Circuit Techniques for a Reconfigurable Memory Block,” ISSCC, 2004.
[2] K. Mai et al., “Smart Memories: a Modular, Reconfigurable Architecture,” Intl. Symp. on Comp. Arch., pp. 161-171, 2000.
[3] J. Hennessy and D. Patterson, “Computer Architecture: A Quantitative Approach,” 2nd Ed., 1996.