Class Presentation of Advanced VLSI Course. Presented by: Ali Shahabi. Major reference: Architecture and Circuit Techniques for a Reconfigurable Memory.


1 Class Presentation of Advanced VLSI Course
Presented by: Ali Shahabi
Major reference: Architecture and Circuit Techniques for a Reconfigurable Memory Block, by Ken Mai, Ron Ho, Elad Alon, Dean Liu, Younggon Kim, Dinesh Patil, and Mark Horowitz
Presentation date: 2004/12/30

2 Custom ASICs are expensive
High design complexity
High non-recurring engineering costs
Need high-volume, high-profit market
Hard to modify or fix

3 Reconfigurable computing
Growing interest in reconfigurable solutions
– FPGAs
– Structured ASICs
– Coarse-grain architectures
Reconfigurable computing characteristics
– Low non-recurring engineering costs
– Good performance and efficiency
– Reconfigurability overheads

4 FPGA with hardwired blocks
CLBs (CLB: Configurable Logic Block) [1]

5 Coarse-grain architecture
Chip multi-processor
Compute, memory, interconnect, control
Reconfigure tile and global network [1]

6 Current memory systems
Traditional emphasis on compute side
– Memory system important
FPGAs
– Fine grain with sizable overheads
– Use CLBs for extra functionality
– Slow compared to cutting-edge SRAMs
Coarse-grain architectures
– Large grain
– Low flexibility

7 Design goal
Low overhead, fast, reconfigurable memory system
Reconfigure along natural SRAM partition boundaries
– Add hardwired blocks for extra functionality
Modern SRAM circuit techniques
– Pulse-mode self-resetting logic
– Replica timing paths
Design targets
– Cache
– FIFO

8 Reconfigurable memory system
Array of homogeneous memory mats
Each memory has a port into the interconnect
Mat size chosen based on natural partition boundary
Small inter-mat control network [1]

9 Smart Memories chip [2]

10 Tile floor plan [2]

11 Sample configuration: caches
Mats configured as tag or data
Direct mapped or set-associative caches
Use inter-mat control network to pass hit/miss [1]
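As a rough illustration of the cache configuration above, the following Python sketch pairs one tag mat with one data mat to form a direct-mapped cache. The class names, the one-word-per-line organization, and the index/tag address split are assumptions made for the example; the slides do not specify them.

```python
NUM_ENTRIES = 512  # logical words per mat, as stated on the mat details slide


class TagMat:
    """Mat configured as tag storage (valid bit kept alongside the tag)."""

    def __init__(self):
        self.tags = [0] * NUM_ENTRIES
        self.valid = [False] * NUM_ENTRIES

    def check(self, index, tag):
        return self.valid[index] and self.tags[index] == tag


class DataMat:
    """Mat configured as data storage."""

    def __init__(self):
        self.words = [0] * NUM_ENTRIES


def split(word_addr):
    # Assumed address split: low bits index the mat, high bits form the tag.
    return word_addr % NUM_ENTRIES, word_addr // NUM_ENTRIES


def cache_fill(tag_mat, data_mat, word_addr, value):
    index, tag = split(word_addr)
    tag_mat.tags[index], tag_mat.valid[index] = tag, True
    data_mat.words[index] = value


def cache_read(tag_mat, data_mat, word_addr):
    index, tag = split(word_addr)
    hit = tag_mat.check(index, tag)  # hit/miss would travel on the inter-mat network
    return hit, (data_mat.words[index] if hit else None)


tags, data = TagMat(), DataMat()
cache_fill(tags, data, 513, 0xBEEF)
hit_result = cache_read(tags, data, 513)   # same index, same tag
miss_result = cache_read(tags, data, 1)    # same index, different tag
```

A set-associative variant would use one tag/data mat pair per way and select the way whose tag mat reports a hit.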

12 Mats configured as 2-way set-associative cache [2]

13 Sample configuration: FIFOs
Data FIFOs, instruction store, and scratchpad
Completely self-contained FIFOs
A single FIFO can be smaller or larger than 1 mat [1]

14 Multi-porting
Some configurations need >1 access per cycle
– Cache tag with snooping
– FIFOs with independent read and write ports
Multi-porting each cell is expensive
– Multiple ports not always needed
Run memory system faster than processor
– Time multiplex single-port
– Memory cycle = 10 fan-out of 4 inverter delays
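The time-multiplexing idea can be sketched in Python as follows. This is a behavioural illustration only: the 2:1 ratio between memory and processor clocks (`slots_per_cycle=2`) and all names are assumptions, since the slides say only that the memory runs faster than the processor.

```python
class SinglePortMemory:
    """A memory that services exactly one read or write per memory cycle."""

    def __init__(self, num_words):
        self.words = [0] * num_words
        self.memory_cycles = 0  # one access consumes one memory cycle

    def access(self, op, addr, data=None):
        self.memory_cycles += 1
        if op == "write":
            self.words[addr] = data
            return None
        return self.words[addr]


def processor_cycle(mem, requests, slots_per_cycle=2):
    """Serve several requests in one processor cycle by issuing them
    back to back on the single physical port (virtual multi-porting)."""
    if len(requests) > slots_per_cycle:
        raise ValueError("more requests than memory cycles per processor cycle")
    return [mem.access(*req) for req in requests]


mem = SinglePortMemory(512)
# One processor cycle: a FIFO-style write plus an independent read.
results = processor_cycle(mem, [("write", 3, 0xABCD), ("read", 3)])
```

The processor sees two ports, while the array itself keeps a single-ported cell.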

15 Virtual multi-porting [1]

16 Mat latency
Total mat latency = 2 cycles
– 20 FO4
– SRAM access = 1 cycle
– Peripheral logic = 1 cycle
Fully pipelined
– Accepts one access every cycle

17 Added features [1]

18 Meta-data [1]

19 Mat details
2KB SRAM array
– 512 x 36b logical, 128 x 144b physical
– 32b main data, 4b meta-data
– Scan tunable replica bitline
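The array dimensions above are mutually consistent, which a few lines of Python can confirm. The `locate` helper is a hypothetical logical-to-physical mapping (low address bits select the column group); the slides do not say which address bits drive the column multiplexer.

```python
LOGICAL_WORDS, LOGICAL_BITS = 512, 36
PHYSICAL_ROWS, PHYSICAL_BITS = 128, 144
DATA_BITS, META_BITS = 32, 4

total_bits = LOGICAL_WORDS * LOGICAL_BITS           # bits in the logical view
words_per_row = PHYSICAL_BITS // LOGICAL_BITS       # logical words per physical row
main_data_bytes = LOGICAL_WORDS * DATA_BITS // 8    # capacity excluding meta-data


def locate(word_addr):
    """Assumed mapping: returns (physical row, word position within the row)."""
    if not 0 <= word_addr < LOGICAL_WORDS:
        raise ValueError("word address out of range")
    return divmod(word_addr, words_per_row)
```

Under this reading, the "2KB" figure counts the 32b main data only; the 4b of meta-data per word add a further 256 bytes.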

20 Meta-data bits
Cache: valid, dirty, LRU, cache coherence state
FIFO: valid
Special operations
– Gang
– Read modify write
[Figure: each word holds 4b of meta-data and 32b of data]

21 Gang operation
Can gang set or clear columns of meta-data bits
Single cycle operation
[Figure labels: mask, clear, set, meta-data, data] [1]

22 Gang operation
Can gang set or clear columns of meta-data bits
Single cycle operation
[Figure labels: mask, clear, set, meta-data, data] [1]
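A behavioural sketch of the gang operation in Python: a column mask selects which meta-data bits are set or cleared in every row at once. The function name and the choice of bit 0 as the valid bit are assumptions for illustration; the hardware does this in a single cycle, whereas the loop here is only a software model.

```python
META_BITS = 4
META_MASK = (1 << META_BITS) - 1


def gang_op(meta_rows, column_mask, set_bits):
    """Return new meta-data rows with the masked columns set or cleared."""
    column_mask &= META_MASK
    if set_bits:
        return [m | column_mask for m in meta_rows]
    return [m & ~column_mask & META_MASK for m in meta_rows]


meta = [0b1011, 0b0001, 0b1111]
VALID = 0b0001  # hypothetical position of the valid bit
invalidated = gang_op(meta, VALID, set_bits=False)   # e.g. flush a whole cache
marked = gang_op(meta, 0b0100, set_bits=True)
```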

23 Meta-data bit cell [1]

24 Meta-data bit cell [1]

25 Read modify write [Figure labels: mdata, data] [1]

26 Read modify write: read [Figure labels: mdata, data] [1]

27 Read modify write: modify [Figure labels: mdata, data] [1]

28 Read modify write: write [Figure labels: mdata, data] [1]
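The three phases named on these slides can be modelled in Python as below. This is a sketch under assumptions: the slides do not state what computes the new meta-data value, so the `modify` callable and the `mark_dirty` example (with bit 1 as the dirty bit) are hypothetical.

```python
META_BITS = 4
META_MASK = (1 << META_BITS) - 1


def read_modify_write(meta_rows, row, modify):
    """Read a row's meta-data, compute a new value, and write it back."""
    old = meta_rows[row]                 # read phase
    new = modify(old) & META_MASK        # modify phase
    meta_rows[row] = new                 # write phase
    return old


def mark_dirty(meta):
    # Hypothetical update: set an assumed dirty bit (bit 1) on a cache write.
    return meta | 0b0010


rows = [0b0001, 0b0001]
previous = read_modify_write(rows, 0, mark_dirty)
```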

29 RMW decoder circuits [1]

30 RMW decoder circuits [1]

31 RMW decoder circuits: read [1]

32 RMW decoder circuits: modify [1]

33 RMW decoder circuits: write [1]

34 RMW decoder circuits: write [1]

35 Timing [1]

36 PLA
Reconfigurable NOR-NOR PLA
1st NOR plane = ternary-CAM
2nd NOR plane = SRAM [1]
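Functionally, the two planes can be modelled as a ternary match followed by a programmable OR. The Python sketch below is a behavioural model only, not the NOR-NOR circuit: each first-plane term is an assumed `(value, care)` pair in which a 0 care bit means "don't care", and each second-plane row is the output pattern that term contributes. The 6-input, 4-output sizes follow the testchip slide; the example terms are invented.

```python
NUM_INPUTS, NUM_OUTPUTS = 6, 4


def pla_eval(inputs, and_terms, or_rows):
    """OR together the output rows of every term whose ternary pattern matches."""
    outputs = 0
    for (value, care), row in zip(and_terms, or_rows):
        if ((inputs ^ value) & care) == 0:   # ternary-CAM style match
            outputs |= row
    return outputs & ((1 << NUM_OUTPUTS) - 1)


# Two illustrative terms (a full PLA on the testchip would hold 16).
and_terms = [
    (0b000001, 0b000011),  # matches when the low two input bits are 01
    (0b100000, 0b100000),  # matches when input bit 5 is 1
]
or_rows = [0b0001, 0b1000]

both = pla_eval(0b100001, and_terms, or_rows)
none = pla_eval(0b000010, and_terms, or_rows)
```

Reprogramming the PLA then amounts to rewriting the `(value, care)` entries and the output rows.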

37 PLA: 1st NOR plane [1]

38 PLA: normal delay chain [1]

39 PLA: early reset-off delay chain [1]

40 PLA: 2nd NOR plane [1]

41 Pointer logic [1]

42 Pointer logic
Pointer cells are 2-ported
For FIFO configurations we add pointer logic
4 pointer/stride pairs
– 11b pointer
– 4b stride [1]
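One pointer/stride pair might behave like the Python sketch below. The field widths come from the slide; the post-increment behaviour and the wrap-around modulo 2^11 are assumptions, as the slides do not describe the update rule.

```python
POINTER_BITS, STRIDE_BITS = 11, 4


class PointerStride:
    """One pointer/stride pair: the pointer advances by the stride per access."""

    def __init__(self, pointer=0, stride=1):
        if not 0 <= pointer < (1 << POINTER_BITS):
            raise ValueError("pointer does not fit in 11 bits")
        if not 0 <= stride < (1 << STRIDE_BITS):
            raise ValueError("stride does not fit in 4 bits")
        self.pointer = pointer
        self.stride = stride

    def advance(self):
        """Return the current address, then step the pointer (assumed to wrap)."""
        current = self.pointer
        self.pointer = (self.pointer + self.stride) % (1 << POINTER_BITS)
        return current


head = PointerStride(pointer=2046, stride=3)
addresses = [head.advance() for _ in range(3)]
```

A FIFO would use one such pair as its read pointer and another as its write pointer.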

43 Write buffer [1]

44 Write buffer
Pipeline writes for single-cycle cache writes
On write, data mat stores incoming data in the write buffer (WB)
Tag check
– Cache miss: WB entry is invalidated
– Cache hit: WB entry is allowed to write
WB entry is written into the data mat on the next write
On every write, the WB and mat are both active [3]
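The write-buffer flow can be sketched in Python as follows. This is a simplified model with assumptions: the tag-check result is passed in as an argument rather than arriving later in the cycle, and the read path forwards from the buffer when addresses match, which the slides do not mention.

```python
class DataMatWithWriteBuffer:
    """Data mat with a one-entry write buffer (WB)."""

    def __init__(self, num_words):
        self.words = [0] * num_words
        self.wb = None  # (addr, data) while a validated write is pending

    def write(self, addr, data, tag_hit):
        # The previously buffered write retires into the array now, so the
        # WB and the mat are both active on every write.
        if self.wb is not None:
            wb_addr, wb_data = self.wb
            self.words[wb_addr] = wb_data
        # Incoming data is buffered; a tag miss invalidates the entry.
        self.wb = (addr, data) if tag_hit else None

    def read(self, addr):
        # Assumed forwarding so a read sees a still-buffered write.
        if self.wb is not None and self.wb[0] == addr:
            return self.wb[1]
        return self.words[addr]


mat = DataMatWithWriteBuffer(512)
mat.write(1, 10, tag_hit=True)       # buffered, array not yet updated
pending_in_array = mat.words[1]
mat.write(2, 20, tag_hit=False)      # retires the first write; this one is dropped
```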

45 Comparator [1]

46 Comparator
Maskable comparator
– Can mask out any combination of meta-data bits
– Can mask out the main data as a chunk
Example use: cache tag compare
– Want to check valid state of line (in meta-data)
– Want to check tag itself (in main data)
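The masking behaviour can be illustrated with a short Python sketch. The function and parameter names are hypothetical, as is the choice of bit 0 for the valid bit; `meta_care` selects the meta-data bits that take part in the compare, while `data_care` enables or disables the main data as a whole.

```python
META_BITS = 4
VALID = 0b0001  # hypothetical position of the valid bit


def maskable_compare(stored_meta, stored_data, key_meta, key_data,
                     meta_care=0b1111, data_care=True):
    """Compare only the unmasked meta-data bits and, optionally, the main data."""
    meta_equal = ((stored_meta ^ key_meta) & meta_care) == 0
    data_equal = (stored_data == key_data) if data_care else True
    return meta_equal and data_equal


# Cache tag compare: the line must be valid and the tag must match;
# the other meta-data bits (dirty, LRU, ...) are masked out.
hit = maskable_compare(0b0011, 0x1A2B, VALID, 0x1A2B, meta_care=VALID)
stale = maskable_compare(0b0010, 0x1A2B, VALID, 0x1A2B, meta_care=VALID)
```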

47 Putting it all together [1]

48 Testchip
0.18 µm, 6-metal TSMC process
3 mm x 3.3 mm die
4 memory blocks
Low swing crossbar
Test vector storage
1.1 GHz (10 FO4) at 1.8 V, room temperature [1]
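The clock figures on this slide can be related to the 2-cycle mat latency stated earlier with some quick arithmetic, shown here in Python. The derived FO4 delay is only what the two quoted numbers imply; it is not a value given in the slides.

```python
CLOCK_HZ = 1.1e9        # testchip clock rate
FO4_PER_CYCLE = 10      # memory cycle expressed in FO4 inverter delays
LATENCY_CYCLES = 2      # SRAM access + peripheral logic

cycle_ps = 1e12 / CLOCK_HZ                    # one memory cycle in picoseconds
fo4_ps = cycle_ps / FO4_PER_CYCLE             # implied delay of one FO4 inverter
latency_fo4 = LATENCY_CYCLES * FO4_PER_CYCLE  # total mat latency in FO4
latency_ps = LATENCY_CYCLES * cycle_ps        # total mat latency in picoseconds

print(f"cycle = {cycle_ps:.0f} ps, FO4 = {fo4_ps:.1f} ps, "
      f"latency = {latency_fo4} FO4 = {latency_ps / 1000:.2f} ns")
```

Because the mat is fully pipelined, the latency is two cycles but a new access can still start every cycle.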

49 Testchip mat details
2KB SRAM array
– 512 x 36b logical, 128 x 144b physical
– 32b main data, 4b meta-data
16 AND-term PLA
– 6 inputs, 4 outputs
4 pointer/stride pairs
– 11b pointer
– 4b stride

50 Mat area breakdown (mm²)
32% of mat area is in peripheral logic [1]

51 Mat power breakdown (mW)
26% of power is in peripheral logic [1]

52 Conclusions
Reconfigurable memory block
– Multiple memory configurations
– Performance on par with modern SRAMs
– Modest overheads
Future uses
– Reconfigurable computing
– General purpose computing
– Designs with shifting memory requirements

53 References
[1] K. Mai et al., “Architecture and Circuit Techniques for a Reconfigurable Memory Block,” ISSCC, 2004.
[2] K. Mai et al., “Smart Memories: A Modular Reconfigurable Architecture,” Intl. Symp. on Comp. Arch., pp. 161-171, 2000.
[3] J. Hennessy and D. Patterson, “Computer Architecture: A Quantitative Approach,” 2nd Ed., 1996.

