Class Presentation of Advanced VLSI Course. Presented by: Ali Shahabi. Major reference: "Architecture and Circuit Techniques for a Reconfigurable Memory Block".

Class Presentation of Advanced VLSI Course
Presented by: Ali Shahabi
Major reference: "Architecture and Circuit Techniques for a Reconfigurable Memory Block"
Presentation date: 2004/12/30
By: Ken Mai, Ron Ho, Elad Alon, Dean Liu, Younggon Kim, Dinesh Patil, and Mark Horowitz

Custom ASICs are expensive
– High design complexity
– High non-recurring engineering costs
– Need high-volume, high-profit market
– Hard to modify or fix

Reconfigurable computing
Growing interest in reconfigurable solutions
– FPGAs
– Structured ASICs
– Coarse-grain architectures
Reconfigurable computing characteristics
– Low non-recurring engineering costs
– Good performance and efficiency
– Reconfigurability overheads

FPGA with hardwired blocks. CLB: Configurable Logic Block [1]

Coarse-grain architecture
– Chip multi-processor
– Compute, memory, interconnect, control
– Reconfigure tile and global network [1]

Current memory systems
Traditional emphasis on compute side
– Memory system important
FPGAs
– Fine grain with sizable overheads
– Use CLBs for extra functionality
– Slow compared to cutting-edge SRAMs
Coarse-grain architectures
– Large grain
– Low flexibility

Design goal
Low overhead, fast, reconfigurable memory system
Reconfigure along natural SRAM partition boundaries
– Add hardwired blocks for extra functionality
Modern SRAM circuit techniques
– Pulse-mode self-resetting logic
– Replica timing paths
Design targets
– Cache
– FIFO

Reconfigurable memory system
– Array of homogeneous memory mats
– Each memory mat has a port into the interconnect
– Mat size chosen based on natural partition boundary
– Small inter-mat control network [1]

Smart Memories chip [2]

Tile floor plan [2]

Sample configuration: caches
– Mats configured as tag or data
– Direct-mapped or set-associative caches
– Use the inter-mat control network to pass hit/miss (a sketch follows the figure below) [1]

Mats configured as 2-way set-associative cache [2]
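
The following is a minimal behavioral sketch (in Python) of how tag mats and data mats could cooperate in the 2-way set-associative configuration above, with the hit result standing in for the signal passed over the inter-mat control network. The mat geometry and field widths here are illustrative assumptions, not the paper's circuit.

```python
# Behavioral sketch of a 2-way set-associative lookup built from tag mats
# and data mats. Sizes and field widths are illustrative assumptions.

class Mat:
    """One homogeneous memory mat: 512 words of 32b data + 4b meta-data."""
    def __init__(self):
        self.data = [0] * 512
        self.meta = [0] * 512              # bit 0 used here as the valid bit

def cache_lookup(tag_mats, data_mats, addr):
    index = addr & 0x1FF                   # 9-bit index selects one of 512 sets
    tag   = addr >> 9
    for way, (tag_mat, data_mat) in enumerate(zip(tag_mats, data_mats)):
        valid = tag_mat.meta[index] & 1
        if valid and tag_mat.data[index] == tag:
            # "hit in this way" is what travels on the inter-mat control network
            return True, data_mat.data[index]
    return False, None                     # miss

ways = 2
tag_mats  = [Mat() for _ in range(ways)]
data_mats = [Mat() for _ in range(ways)]
hit, value = cache_lookup(tag_mats, data_mats, addr=0x1234)
```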

Sample configuration: FIFOs
– Data FIFOs, instruction store, and scratchpad
– Completely self-contained FIFOs
– A single FIFO can span more or less than one mat [1]

Multi-porting
Some configurations need >1 access per cycle
– Cache tag with snooping
– FIFOs with independent read and write ports
Multi-porting each cell is expensive
– Multiple ports not always needed
Run the memory system faster than the processor (see the sketch after the figure below)
– Time-multiplex the single port
– Memory cycle = 10 fan-out-of-4 (FO4) inverter delays

Virtual multi-porting [1]
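
A behavioral model of the virtual multi-porting idea: the single-ported mat runs at twice the processor clock, so two logical accesses are served per processor cycle. This is an illustrative sketch only; the chip realizes it with pulse-mode circuits, not software.

```python
# Sketch of virtual multi-porting: two logical ports are served per processor
# cycle by time-multiplexing one physical port over two fast memory cycles.

mat = [0] * 512                                  # one single-ported mat

def processor_cycle(port_a, port_b):
    """Each request is ('read', addr), ('write', addr, value), or None."""
    results = []
    for req in (port_a, port_b):                 # two back-to-back memory cycles
        if req is None:
            results.append(None)
        elif req[0] == "read":
            results.append(mat[req[1]])
        else:
            mat[req[1]] = req[2]
            results.append(None)
    return results

processor_cycle(("write", 7, 42), None)
assert processor_cycle(("read", 7), None)[0] == 42
```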

Mat latency
Total mat latency = 2 cycles (20 FO4)
– SRAM access = 1 cycle
– Peripheral logic = 1 cycle
Fully pipelined
– Accepts one access every cycle

Added features [1]

Meta-data [1]

Mat details
2KB SRAM array
– 512 x 36b logical, 128 x 144b physical
– 32b main data, 4b meta-data
– Scan-tunable replica bitline
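
As a quick consistency check on the mat geometry (my arithmetic, not from the slides):

\[
512 \times 36\,\mathrm{b} \;=\; 128 \times 144\,\mathrm{b} \;=\; 18{,}432\,\mathrm{b},
\qquad
512 \times 32\,\mathrm{b} \;=\; 16{,}384\,\mathrm{b} \;=\; 2\,\mathrm{KB},
\]

so the logical and physical organizations hold the same number of bits, the main-data portion is exactly the stated 2KB, and the ratio \(144/36 = 4\) suggests four logical words folded into each physical row (a 4:1 column organization), though the slides do not state this explicitly.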

Meta-data bits
Cache: valid, dirty, LRU, cache coherence state
FIFO: valid
Special operations
– Gang
– Read-modify-write

Gang operation
– Can gang-set or gang-clear columns of meta-data bits
– Single-cycle operation [1]

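A behavioral sketch of the gang operation: one cycle sets or clears a selected column of meta-data bits across every row of the mat, for example flash-clearing all valid bits. The bit positions and mask encoding are illustrative assumptions.

```python
# Sketch of a gang set/clear on a column of meta-data bits.

ROWS = 512
meta = [0b1111] * ROWS                 # 4 meta-data bits per word

def gang(meta_bits, column_mask, set_not_clear):
    """Set or clear the meta-data columns selected by column_mask in every row."""
    for row in range(len(meta_bits)):
        if set_not_clear:
            meta_bits[row] |= column_mask
        else:
            meta_bits[row] &= ~column_mask & 0xF

VALID_BIT = 0b0001                     # assumed position of the valid bit
gang(meta, VALID_BIT, set_not_clear=False)    # invalidate every line at once
assert all(m & VALID_BIT == 0 for m in meta)
```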

Meta-data bit cell [1]

Read-modify-write [1]

Read-modify-write: read [1]

Read-modify-write: modify [1]

Read-modify-write: write [1]
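
A behavioral sketch of the meta-data read-modify-write: the stored meta-data bits are read, updated, and written back as one operation. The update rule shown (mark a valid line dirty) and the bit encoding are illustrative assumptions, not the paper's.

```python
# Sketch of an atomic read-modify-write on a word's meta-data bits.

VALID = 0b0001
DIRTY = 0b0010

meta = [VALID] * 512                    # all lines valid and clean

def meta_rmw(meta_bits, row, update):
    old = meta_bits[row]                # read
    new = update(old)                   # modify
    meta_bits[row] = new                # write back
    return old, new

old, new = meta_rmw(meta, 7, lambda m: m | DIRTY if m & VALID else m)
assert new == VALID | DIRTY             # a write hit marks the line dirty
```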

RMW decoder circuits [1]

RMW decoder circuits: read [1]

RMW decoder circuits: modify [1]

RMW decoder circuits: write [1]

Timing [1]

PLA
Reconfigurable NOR-NOR PLA
– 1st NOR plane = ternary CAM
– 2nd NOR plane = SRAM [1]
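
Logically, a NOR-NOR array computes a sum-of-products, so the PLA can be modeled as ternary-CAM product terms feeding an OR plane held in SRAM. The sketch below is a functional model only; the encodings (care/value masks, output words) are assumptions for illustration.

```python
# Functional sketch of the reconfigurable PLA: plane 1 behaves as a ternary
# CAM (match with don't-cares), plane 2 ORs the matching terms' SRAM words
# into the outputs.

def pla_eval(inputs, terms, or_plane):
    """inputs: int; terms: list of (care_mask, value); or_plane: list of ints."""
    out = 0
    for (care, value), or_word in zip(terms, or_plane):
        if (inputs & care) == (value & care):    # ternary match ('X' where care=0)
            out |= or_word                       # OR this term into the outputs
    return out

# 6 inputs, 4 outputs; one term matching input pattern 1x01xx -> output 0b0010
terms    = [(0b101100, 0b100100)]
or_plane = [0b0010]
assert pla_eval(0b110111, terms, or_plane) == 0b0010
```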

PLA: 1st NOR plane [1]

PLA: normal delay chain [1]

PLA: early reset-off delay chain [1]

PLA: 2nd NOR plane [1]

Pointer logic [1]

Pointer logic
Pointer cells are 2-ported
For FIFO configurations we add pointer logic
4 pointer/stride pairs
– 11b pointer
– 4b stride [1]
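
A behavioral sketch of one pointer/stride pair driving FIFO addressing: an 11-bit pointer advances by a programmable 4-bit stride and wraps modulo 2^11. The wrap rule and FIFO depth are illustrative assumptions.

```python
# Sketch of FIFO addressing with an 11-bit pointer and a 4-bit stride.

PTR_BITS = 11
DEPTH = 1 << PTR_BITS

class PointerStride:
    def __init__(self, stride=1):
        self.ptr = 0
        self.stride = stride & 0xF              # 4-bit stride

    def advance(self):
        addr = self.ptr
        self.ptr = (self.ptr + self.stride) % DEPTH
        return addr

fifo_mem = [0] * DEPTH
write_ptr, read_ptr = PointerStride(stride=1), PointerStride(stride=1)

fifo_mem[write_ptr.advance()] = 0xABCD          # push
assert fifo_mem[read_ptr.advance()] == 0xABCD   # pop
```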

Write buffer [1]

Write buffer
Pipeline writes for single-cycle cache writes
On a write, the data mat stores the incoming data in the WB
Tag check
– Cache miss: WB entry is invalidated
– Cache hit: WB entry is allowed to write
WB writes into the data mat on the next write
On every write, the WB and the mat are both active [3]
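
A behavioral sketch of the write-buffer pipelining: a new write is parked in the WB while its tag check proceeds, and a later write drains the hit-confirmed entry into the data array. The single-entry depth is an assumption for illustration.

```python
# Sketch of the data-mat write buffer used to pipeline single-cycle writes.

class DataMatWithWB:
    def __init__(self, words=512):
        self.array = [0] * words
        self.wb = None                          # pending (addr, value) or None

    def tag_check(self, hit):
        """Result of the tag mats' check for the buffered write."""
        if not hit:
            self.wb = None                      # miss: invalidate the WB entry

    def write(self, addr, value):
        if self.wb is not None:                 # drain the hit-confirmed entry
            old_addr, old_value = self.wb
            self.array[old_addr] = old_value
        self.wb = (addr, value)                 # park the new write

mat = DataMatWithWB()
mat.write(3, 0x11)
mat.tag_check(hit=True)                         # write to addr 3 may commit
mat.write(4, 0x22)                              # drains addr 3, parks addr 4
assert mat.array[3] == 0x11 and mat.wb == (4, 0x22)
```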

Comparator [1]

Comparator
Maskable comparator
– Can mask out any combination of meta-data bits
– Can mask out the main data as a chunk
Example use: cache tag compare
– Want to check valid state of line (in meta-data)
– Want to check tag itself (in main data)
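
A functional sketch of the maskable compare used for the tag check. The word layout (4 meta-data bits above a 32-bit data chunk, valid bit at bit 32) is an illustrative assumption.

```python
# Sketch of the maskable comparator: compare a stored 36-bit word against a
# reference, masking out individual meta-data bits or the whole data chunk.

DATA_MASK = (1 << 32) - 1
VALID_BIT = 1 << 32                       # assumed valid-bit position

def masked_compare(stored_word, reference, compare_mask):
    return (stored_word & compare_mask) == (reference & compare_mask)

# Cache tag compare: check the tag (main data) plus the valid bit, ignoring
# the other meta-data bits (e.g. dirty, LRU).
stored    = VALID_BIT | 0x1A2B3C4D        # valid line holding tag 0x1A2B3C4D
reference = VALID_BIT | 0x1A2B3C4D
mask      = VALID_BIT | DATA_MASK
assert masked_compare(stored, reference, mask)            # hit
assert not masked_compare(stored, reference ^ 1, mask)    # tag mismatch: miss
```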

Putting it all together [1]

Testchip
– 0.18µm 6-metal TSMC process
– 3mm x 3.3mm die
– 4 memory blocks
– Low-swing crossbar
– Test vector storage
– 1.1GHz (10 FO4) at 1.8V, room temperature
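
As a rough cross-check (the ~500 ps/µm FO4 rule of thumb is an assumption, not from the slides), a 10 FO4 cycle in a 0.18µm process is consistent with the quoted 1.1GHz:

\[
t_{\mathrm{FO4}} \approx 500\,\tfrac{\mathrm{ps}}{\mu\mathrm{m}} \times 0.18\,\mu\mathrm{m} \approx 90\,\mathrm{ps},
\qquad
T_{\mathrm{cycle}} = 10 \times t_{\mathrm{FO4}} \approx 900\,\mathrm{ps}
\;\Rightarrow\;
f \approx 1.1\,\mathrm{GHz}.
\]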

Testchip mat details
2KB SRAM array
– 512 x 36b logical, 128 x 144b physical
– 32b main data, 4b meta-data
16 AND-term PLA
– 6 inputs, 4 outputs
4 pointer/stride pairs
– 11b pointer
– 4b stride

Mat area breakdown (mm²) – 32% of mat area in peripheral logic [1]

Mat power breakdown (mW) – 26% of power in peripheral logic [1]

Conclusions
Reconfigurable memory block
– Multiple memory configurations
– Performance on par with modern SRAMs
– Modest overheads
Future uses
– Reconfigurable computing
– General-purpose computing
– Designs with shifting memory requirements

References
[1] K. Mai et al., "Architecture and Circuit Techniques for a Reconfigurable Memory Block," ISSCC, 2004.
[2] K. Mai et al., "Smart Memories: A Modular, Reconfigurable Architecture," Intl. Symp. on Computer Architecture, 2000.
[3] J. Hennessy and D. Patterson, "Computer Architecture: A Quantitative Approach," 2nd Ed., 1996.