경종민 1 System Functionality Verification using FPGA.

Presentation transcript:

경종민 1 System Functionality Verification using FPGA

2 Contents
Section I
– Introduction to reconfigurable computing
– FPGA logic/routing architecture
Section II
– Core-embedded FPGA
– ALTERA/XILINX/TRISCEND/SiDSA
Section III
– Multiple-FPGA architecture
– Emulation/simulation acceleration using FPGAs

3 Introduction
Design execution methodology
– Hardware
    – Very fast and efficient
    – No alteration after fabrication
    – Redesign and refabrication are expensive
– Software-programmed processors
    – A set of instructions determines a specific operation.
    – Functionality can be easily changed.
    – Performance is far below that of an ASIC.

4 Reconfigurable Computing
Fills the gap between hardware and software
– An FPGA is an array of computational elements and the routing wires among them.
– The configuration is determined by programmable configuration bits.
Development
– 1963: The concept of "restructurable computing" appeared.
– 1980s: FPGA technology was developed by Xilinx, Altera, Lucent, QuickLogic, and others as a hybrid device between PALs and MPGAs (Mask Programmable Gate Arrays).
– SRAM-programmable FPGA: high density
– 1999 to now: Core-embedded FPGAs incorporate both a programmable processor and FPGA logic.

5 Logic Block
LUT-based logic block
– Efficient logic block architecture adopted in many commercial FPGAs
– Composed of a LUT, a DFF (latch), and mux carry logic
[Figure: logic block with a 4-LUT and a DFF; inputs I1, I2, I3, I4; carry Cin/Cout; output Out]

6 Logic Block
4-LUT
– Any function of 4 input variables can be implemented.
FF
– Used for pipelining and registers
– Can be configured as a latch
– Clock signals come from global signals routed on special resources (global nets)
Carry logic
– Speeds up carry-based arithmetic functions
– Bypasses the general routing resources and connects directly to the neighboring CLB
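
The claim that a 4-LUT can implement any function of 4 input variables can be illustrated with a short Python sketch. It assumes the LUT is modeled as 16 configuration bits indexed by the input values; the names `make_lut4` and `lut4_eval` are hypothetical and are not taken from any vendor tool.

```python
from itertools import product

def make_lut4(func):
    """Build the 16 configuration bits of a 4-input LUT from any Boolean function."""
    bits = 0
    for i1, i2, i3, i4 in product((0, 1), repeat=4):
        index = i1 | (i2 << 1) | (i3 << 2) | (i4 << 3)
        if func(i1, i2, i3, i4):
            bits |= 1 << index
    return bits

def lut4_eval(bits, i1, i2, i3, i4):
    """The inputs only select one of the 16 stored bits; no logic is computed."""
    index = i1 | (i2 << 1) | (i3 << 2) | (i4 << 3)
    return (bits >> index) & 1

# Example: configure the LUT as a 4-input XOR (parity) function.
parity_bits = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
```

Because there are 16 entries and each can be 0 or 1, the same structure covers all 2^16 possible 4-input functions just by changing `bits`.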

7 Interconnection Architecture
Island-style FPGA routing architecture
– The routing architecture used by most FPGAs
– A sea of routing resources connects the rows and columns of logic blocks.
– Connection blocks: programmable multiplexers that select which signals in a given routing channel connect to the logic block's terminals
– Switch boxes: connections between horizontal and vertical routing resources

8 Interconnection Architecture
[Figure: island-style routing architecture]

9 Interconnection Architecture
Routing resources with various lengths
– Local interconnections: routing between logic blocks (e.g., a dedicated carry chain)
– Medium-length lines: routing wires that span the width of several logic blocks
– Long lines: routing wires that run the whole chip height or width
– Global lines: routing wires that cover the entire area of the chip
    – High-speed, low-skew connections to all logic blocks
    – Usually used for clocks and resets

10 Two Routing Architectures
Segmented routing architecture
– Short wires carry local communication traffic.
– Long wires are frequently used to travel long distances without passing through many switches.
– Research questions
    – How many wires should each channel contain?
    – How many types of long wires would be efficient?
    – What proportion of the routing resources should each wire type take?
– Companies: Xilinx, Lucent, Vantis
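
The benefit of long wires in a segmented architecture can be sketched by counting the switches a connection passes through. This is a rough Python model under stated assumptions: distances and segment lengths are measured in logic-block spans, each hop between consecutive segments costs one switch, and segments are chosen greedily from longest to shortest. The function name and the lengths 8, 4, and 1 are hypothetical, not figures from the slides.

```python
def route_switch_count(distance, segment_lengths):
    """Cover `distance` with wire segments, longest first.

    Returns (segments_used, switches), where one switch is counted for each
    hop between consecutive segments. The greedy choice is a simplification;
    it always succeeds when a length-1 segment is available.
    """
    remaining = distance
    used = []
    for length in sorted(segment_lengths, reverse=True):
        count, remaining = divmod(remaining, length)
        used.extend([length] * count)
    if remaining:
        raise ValueError("distance cannot be covered by the given segment lengths")
    switches = max(len(used) - 1, 0)
    return used, switches

# A 10-block connection with only unit-length wires versus mixed lengths.
short_only = route_switch_count(10, [1])
mixed = route_switch_count(10, [1, 4, 8])
```

With only unit-length wires the connection crosses 9 switches; with the mixed set it crosses 2, which is the trade-off behind the research questions on the slide.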

11 Two Routing Architectures
Hierarchical routing architecture
– Cluster-based routing architecture
    – Routing within a cluster is at the local level and connects only within that cluster.
    – Longer wires connect different clusters together.
– Each routing level contains several clusters.
– Background
    – Most connections between logic blocks are local; only a limited amount of communication travels long distances.
– A good placement algorithm is required.
– Company: ALTERA
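
A minimal Python sketch of the hierarchical idea: the routing level a connection needs is the lowest level whose cluster contains both endpoints. It assumes logic blocks are numbered linearly and that every level groups a fixed number of lower-level units; `routing_level` and the cluster size of 4 are hypothetical choices for illustration.

```python
def routing_level(block_a, block_b, cluster_size=4):
    """Return the lowest hierarchy level whose cluster contains both blocks.

    Level 0 means the same block, level 1 the same cluster, level 2 the same
    cluster of clusters, and so on.
    """
    level = 0
    while block_a != block_b:
        block_a //= cluster_size
        block_b //= cluster_size
        level += 1
    return level

# Blocks 0 and 3 share a cluster; blocks 0 and 4 need the next level up.
same_cluster = routing_level(0, 3)
next_level = routing_level(0, 4)
```

This also shows why placement matters: two heavily connected blocks placed in different top-level clusters pay for the highest routing level on every signal.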

12 Two Routing Architectures
[Figure: segmented routing vs. hierarchical routing; labels: logic blocks, connection switches, cluster]

13 Heterogeneous Architecture
Multiplier embedding
– Implementing multipliers in FPGA logic is usually inefficient.
– Custom, configurable multiplication hardware supporting various operand widths and a choice of signed/unsigned operation can be embedded using a reconfigurable array of FABs (special full adder blocks).
– (Haynes, Field-Programmable Custom Computing Machines, 1998)
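
To show how a multiplier reduces to an array of full adders, here is a Python sketch of an unsigned array multiplier built only from AND terms and full adders. It is a generic textbook structure, not the FAB design from the cited paper, and it does not cover the signed mode mentioned on the slide; `full_adder` and `array_multiply` are hypothetical names.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def array_multiply(x, y, width=4):
    """Unsigned multiply using one row of full adders per multiplier bit."""
    acc = [0] * (2 * width)          # product bits, least significant first
    for j in range(width):
        yj = (y >> j) & 1
        carry = 0
        for i in range(width):
            partial = ((x >> i) & 1) & yj      # AND gate forms the partial product bit
            acc[i + j], carry = full_adder(acc[i + j], partial, carry)
        acc[j + width] = carry                 # carry out becomes the row's top bit
    return sum(bit << k for k, bit in enumerate(acc))

product_value = array_multiply(13, 11)
```

A `width`-bit multiply uses `width * width` full adders in this form, which hints at why mapping it onto general-purpose LUTs is costly compared with dedicated blocks.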

15 Heterogeneous Architecture
Embedded memory blocks
– Available LUTs used as RAM structures (Xilinx XC4000, Virtex FPGAs)
– Dedicated memory blocks within the array (Xilinx Virtex and Altera FPGAs)
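
Using a LUT as a RAM structure works because a 4-input LUT already stores 16 bits; making those bits writable at run time gives a 16x1 memory. The Python sketch below models that idea only; the class name `LutRam16x1` is hypothetical and the real devices add clocking and write-enable details not shown here.

```python
class LutRam16x1:
    """A 4-input LUT whose 16 configuration bits are writable at run time."""

    def __init__(self, init=0):
        self.bits = init & 0xFFFF

    def read(self, addr):
        # Same operation as a LUT lookup: the address selects one stored bit.
        return (self.bits >> (addr & 0xF)) & 1

    def write(self, addr, value):
        mask = 1 << (addr & 0xF)
        if value:
            self.bits |= mask
        else:
            self.bits &= ~mask & 0xFFFF

ram = LutRam16x1()
ram.write(5, 1)
ram.write(9, 1)
ram.write(5, 0)
```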

16 Xilinx Virtex Architecture
Block SelectRAM is embedded as a column within the logic block array.

17 Heterogeneous Architecture
Processor embedding
– In late 2000, several commercial FPGA companies announced plans to include entire microprocessors in their devices.
– Altera: ARM9-based Excalibur device
– Xilinx: PowerPC-based Virtex-II device
– Triscend: 8051/ARM-based SoC integration platform

18 SoC Verification through FPGAs
Core-Embedded FPGA

19 Core-Embedded FPGAs
ALTERA
– Excalibur™
    – ARM-embedded FPGA
– Stratix™
    – Currently without an ARM core. The next version of Excalibur is under development.
XILINX
– Virtex-II Pro™
    – IBM PowerPC-embedded FPGA
Triscend
– A7
    – ARM-embedded FPGA
– E5
    – 8051-embedded FPGA

20 ALTERA's Excalibur
ARM9 core integrated with FPGA
– "SOPC (System On Programmable Chip)"
– C/C++ compiler/debugger integrated into the FPGA compiler
Interface between processor and FPGA
– AMBA (Advanced Microcontroller Bus Architecture)
– A widely used internal bus architecture for SoCs
– The ARM processor and the FPGA block are connected through the AMBA bus.

21 ALTERA's Excalibur
[Figure: Excalibur clock domains]
– Clock Domain 1 (AHB1): up to 180 MHz
– Clock Domain 2 (AHB2): up to 90 MHz
– Clock Domain 3 (PLD): up to 100 MHz

23 ALTERA's Excalibur
AHB1
– Bridge to AHB2
– Interrupt controller, watchdog timer
– Single-port and dual-port SRAM
– The embedded processor is the only bus master on AHB1.

24 ALTERA's Excalibur
AHB2
– The PLD exchanges data with memories, the UART, or a PLD slave.
– Dedicated interfaces between the stripe (processor and peripherals) and the PLD

26 XILINX's Virtex-II Pro
PowerPC core integrated with FPGA
– "Platform FPGA architecture"
– Up to four PPC cores can be integrated.
Interface between processor and FPGA
– CoreConnect bus
    – PLB (Processor Local Bus)
    – DCR (Device Control Register) bus
– OCM (On-Chip Memory) interface
    – Dedicated interface between the block RAM and the OCM signals of the PPC core

27 Virtex-II Pro Block Diagram
[Figure: Virtex-II Pro block diagram; this diagram contains two PPC cores]
– PowerPC core
– Block RAM and multiplier blocks
– Configurable logic block array

28 PPC Core Block
[Figure labels: PPC 405 core, OCM controller, control, block RAM, DCR bus]
– The OCM controller is a dedicated interface between the PPC and block RAM.
– Block RAM can be configured as Instruction-Side Block RAM (ISBRAM) or Data-Side Block RAM (DSBRAM).
– The fixed latency of memory access guarantees higher-speed execution.
– Block RAM can be configured as dual-port RAM (data communication between the PPC and the FPGA).
– PLB master interface ports are at the boundary of the PPC core.

29 Triscend's E5/A7
E5/A7
– "CSoC (Configurable System-on-Chip)"
– E5 contains an 8051 core, a CSL (Configurable System Logic) matrix, and peripheral interfaces (JTAG, DMA, timer, FIFO).
– A7 contains an ARM core instead of the 8051.
CSI (Configurable System Interconnect)
– Bus developed by Triscend
– Pipelined bus architecture for performance optimization

30 Triscend E5/A7
The bus architecture allows the bus to be expanded throughout the whole chip while preserving high performance.
– The internal system bus is extended throughout the user-configurable system logic.
Objectives
– Any processor can be included.
– High performance is assured regardless of the CSL size.

31 Triscend's A7 Architecture
CSI bus
– Configurable System Interconnect
– Masters of CSI
    – ARM
    – JTAG (configuration)
    – DMA0, DMA1, DMA2, DMA3
– Sideband signals
    – A small number of dedicated signals for the UART and timer

32 Triscend's CSL Matrix
Vertical/horizontal breakers
1. Vertical: address decoder part of CSI
2. Horizontal: data read/write port of CSI
Selector
1. Decodes the address
2. Registers are arranged in a vertical column of CSL cells
3. Pre-programmed at initialization
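
The selector's role (decode an address, with its match condition programmed at initialization) can be sketched in Python. The slides do not say how the match is encoded, so the base/mask comparison below is an assumption, and `Selector`, `base`, and `mask` are hypothetical names.

```python
class Selector:
    """Address decoder that is programmed once and then only compares."""

    def __init__(self, base, mask):
        # Assumed encoding: the selector matches when the masked address equals the base.
        self.mask = mask
        self.base = base & mask

    def selected(self, address):
        return (address & self.mask) == self.base

# A selector programmed to respond to the 256-byte window starting at 0x4200.
sel = Selector(base=0x4200, mask=0xFF00)
hit = sel.selected(0x42A7)
miss = sel.selected(0x4300)
```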

33 Triscend's System Architecture
[Figure labels: CPU, DMA, JTAG, bus FIFO/arbiter for multiple masters, CSL, RAM, ROM, memory interface]
– A bus master requires a grant signal from the arbiter.
– The CPU runs boot code initially.
– The boot code configures the CSL and stores program/data.
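
The statement that a bus master needs a grant from the arbiter can be sketched as a fixed-priority arbiter in Python. The master names follow the CSI master list in this deck; the priority order is purely hypothetical, since the slides do not describe the arbitration policy.

```python
# Hypothetical priority order; the slides name the masters but not the policy.
PRIORITY = ("JTAG", "DMA0", "DMA1", "DMA2", "DMA3", "ARM")

def arbitrate(requests, priority=PRIORITY):
    """Return the single master granted the bus this cycle, or None if idle."""
    for master in priority:
        if master in requests:
            return master
    return None

# The ARM core and DMA2 request the bus in the same cycle.
grant = arbitrate({"ARM", "DMA2"})
```

Only the granted master drives the bus in that cycle; the others keep their requests asserted and are considered again in the next cycle.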

34 CSI Bus Architecture
[Figure: CSI bus architecture; labels: master, arbiter, bus FIFO, master write (address/data/control), slave write (address/data/control), master read (data/control), slave read (data/control), selectors and pipe registers, dedicated slave, CSL]

35 Pipelined Write Transaction
[Figure: the CSI bus architecture diagram annotated with time slots T, T+1, and T+2 for a write transaction]

36 Pipelined Read Transaction
[Figure: the CSI bus architecture diagram annotated with time slots T, T+1, T+2, and T+3 for a read transaction]

37 Pipeline in View of Bus Logic
[Figure: pipeline stages across time slots T, T+1, T+2, T+3; labels: master, arbiter, address/data, configure selector, decode, read from CSL, bus FIFO, data from CSL to master]
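
The effect of pipelining on throughput can be estimated with a small Python sketch. It assumes a read occupies the four time slots T to T+3 shown for read transactions, that a new transaction can be issued every slot, and that no wait states occur; `completion_slots` is a hypothetical helper, not part of any Triscend tool.

```python
READ_STAGES = 4  # time slots T, T+1, T+2, T+3 of a read transaction

def completion_slots(num_reads, stages=READ_STAGES, pipelined=True):
    """Number of time slots needed to finish `num_reads` back-to-back reads."""
    if num_reads == 0:
        return 0
    if pipelined:
        # The first read fills the pipeline; each later one completes one slot after the previous.
        return stages + (num_reads - 1)
    # Without pipelining every read waits for the previous one to finish.
    return stages * num_reads

pipelined_slots = completion_slots(8)
serial_slots = completion_slots(8, pipelined=False)
```

Under these assumptions eight reads finish in 11 slots instead of 32: the latency of a single read stays at four slots, but throughput approaches one transfer per slot.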

38 Wait State
Why is it generated?
– 1. The logic implemented in the CSL performs a handshake operation.
– 2. The CSL logic is too slow to respond in one cycle.
Sequence of generation
– 1. The "Address Selector" in the CSL generates a wait state if the system accesses the Selector's address.
– 2. If more than one wait state is required, the CSL function inserts additional wait states.
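
The two-step sequence can be sketched as the per-cycle wait signal seen by the bus during one access. This Python model assumes that additional wait states only ever follow a first wait state raised by the selector; `wait_sequence` and its parameters are hypothetical names.

```python
def wait_sequence(selector_generates_wait, extra_waits=0):
    """Return the wait flag for each cycle of one access, ending with the transfer cycle."""
    if selector_generates_wait:
        # Step 1: the selector raises the first wait state.
        # Step 2: the CSL function holds wait for `extra_waits` further cycles.
        total_waits = 1 + extra_waits
    else:
        total_waits = 0
    return [True] * total_waits + [False]   # False marks the cycle where the transfer completes

slow_access = wait_sequence(True, extra_waits=2)
fast_access = wait_sequence(False)
```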

39 Wait State Insertion
[Figure: pipeline stages across time slots T, T+1, T+2, T+3; labels: master, arbiter, address/data, configure selector, decode, read from CSL, bus FIFO, data from CSL to master, OR, Waitnow]

40 CSL Physical Structure
– Bus pipeline registers sit at each bank boundary, so the time slots for user logic are independent of the signal transport time between banks.
– The write/read bus is distributed throughout the CSL and is buffered and piped into each bank, as shown by the red arrows.
– The wait signals generated in each bank are propagated to the pipeline registers in all other banks.
[Figure labels: bank, logic tile, logic cell, wait distribution, 16x8 RAM, system logic, 8K RAM]

41 Structure: Bank/Bus/Selector
[Figure labels: tile, selector, bank]
– The horizontal data line writes data to a CSL cell.
– Read data is OR'ed onto the horizontal read data line.
– 4 wires per tile
– Configured initially for column selection/wait generation.
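
OR'ing read data onto a shared line works when every unselected cell drives zeros, so the OR of all drivers equals the selected cell's data. The Python sketch below illustrates this under that assumption; the cell list, addresses, and the name `read_bus` are hypothetical.

```python
def read_bus(cells, address):
    """OR together what every cell drives onto the shared read data line.

    `cells` is a list of (selector_address, data) pairs. A cell drives its
    data only when its selector matches; otherwise it drives 0.
    """
    value = 0
    for selector_address, data in cells:
        drive = data if selector_address == address else 0
        value |= drive
    return value

cells = [(0x10, 0xA5), (0x11, 0x3C), (0x12, 0xFF)]
read_value = read_bus(cells, 0x11)
```

No tri-state drivers or wide multiplexers are needed in this scheme, at the cost of requiring that at most one selector matches a given address.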

42 E5 Physical Implementation
– 8051 CPU core
– 0.35 µm, 40 MHz CSL operation
[Figure labels: 8051 CPU core and RAM/ROM; CSL matrix]

43 SiDSA's FIPSOC
Integration of CAB (Configurable Analog Block)
– 8051 microcontroller
– FPGA
– Configurable analog cells optimized for data acquisition applications
Dynamic reconfiguration
– Two configuration bits for each CLB
– The user can download extra configuration data while the cells are in operation.
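
Dynamic reconfiguration with two stored configurations can be sketched as a cell that keeps an active and an inactive context: the inactive one is loaded while the active one keeps running, and a swap makes it take effect. This Python model is an illustration only; the swap step and the names `DualContextCell`, `load_inactive`, and `swap` are assumptions, not the FIPSOC interface.

```python
class DualContextCell:
    """A LUT-like cell holding two configurations, only one of which is active."""

    def __init__(self, active_bits):
        self.contexts = [active_bits, 0]
        self.active = 0

    def load_inactive(self, bits):
        # Downloading new configuration data does not disturb the running context.
        self.contexts[1 - self.active] = bits

    def swap(self):
        self.active = 1 - self.active

    def evaluate(self, index):
        return (self.contexts[self.active] >> index) & 1

cell = DualContextCell(active_bits=0b1000)   # behaves as a 2-input AND
cell.load_inactive(0b0110)                   # 2-input XOR, loaded in the background
before_swap = cell.evaluate(0b11)
cell.swap()
after_swap = cell.evaluate(0b11)
```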

44 Analog Subsystem
Configurable Analog Blocks (CAB)
– Differential amplification
– Comparison
– Data conversion (ADC, DAC)
Digital part
– The digital part that configures the CABs is controlled by the µP or the programmable logic.

45 Comparison
Xilinx
– Uses the CoreConnect bus to connect processor and FPGA.
– Multiple processor cores can be used simultaneously.
ALTERA
– Uses the AMBA bus to connect processor and FPGA.
Triscend
– The processor can read/write any register inside the CSL matrix (the CSL matrix can be considered a functional block of the processor).
– Intensive pipelining is adopted to maintain or increase throughput, since the latency caused by distributing the bus throughout the CSL matrix could otherwise be excessive.