ECE 506 Reconfigurable Computing Lecture 3 Reconfigurable Architectures Ali Akoglu.


1 ECE 506 Reconfigurable Computing http://www.ece.arizona.edu/~ece506 Lecture 3 Reconfigurable Architectures Ali Akoglu

2 Complex Programmable Logic Device °Hierarchical design against size explosion of PLAs Combinational logic with Flip Flops (registered output) Organized into logic blocks connected in an interconnect matrix Usually enough logic for simple counters, state machines, decoders, etc.

3 Xilinx CoolRunner II CPLD °PLA and Macrocell combination °1.8 V device, estimated current draw of less than 100 µA °Up to 12,000 gates, 512 macrocells

4 CPLD °Multiple Function Blocks (FBs) and I/O Blocks (IOBs) Fully interconnected (FB outputs and input signals to the FB Inputs) Each FB provides programmable logic 54 inputs,18 outputs. °The IOB provides buffering for device inputs and outputs. °Output enable signals drive directly to the IOBs.

5 Function Block °Comprised of 18 independent macrocells, each of which can implement a combinatorial or registered function. °Logic within the FB is implemented using a sum-of-products representation. Fifty-four inputs (108 true and complement signals) feed the programmable AND-array to form 90 product terms. Any number of these product terms can be allocated to each macrocell by the product term allocator. How many product terms would you assign for each Macrocell?
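As a rough illustration of the sum-of-products structure described on this slide, the sketch below models product terms as ANDs of true/complement literals and a macrocell as the OR of its terms. The names (`ProductTerm`, `eval_sum_of_products`, `XOR_TERMS`, `EVEN_SHARE`) are hypothetical and not taken from any vendor tool; only the figures 90 and 18 come from the slide.

```python
from typing import Dict, List

# Hypothetical model: a product term maps an input name to the polarity it
# requires (True = true signal, False = complemented signal).
ProductTerm = Dict[str, bool]


def eval_product_term(term: ProductTerm, inputs: Dict[str, bool]) -> bool:
    """AND of the selected true/complement literals (one AND-array row)."""
    return all(inputs[name] == polarity for name, polarity in term.items())


def eval_sum_of_products(terms: List[ProductTerm], inputs: Dict[str, bool]) -> bool:
    """OR of the product terms allocated to one macrocell."""
    return any(eval_product_term(term, inputs) for term in terms)


# XOR written as a sum of two product terms: A·B' + A'·B
XOR_TERMS: List[ProductTerm] = [
    {"A": True, "B": False},
    {"A": False, "B": True},
]

# Figures quoted on the slide: 90 product terms shared by 18 macrocells.
PRODUCT_TERMS_PER_FB = 90
MACROCELLS_PER_FB = 18
EVEN_SHARE = PRODUCT_TERMS_PER_FB // MACROCELLS_PER_FB  # terms per macrocell
```

An even split of the 90 terms over the 18 macrocells gives 5 terms each, which matches the five direct product terms mentioned on the following slides.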

6 Macrocell °The Product Term Allocator selects how the 5 direct product terms are used: as primary data inputs to the OR gate for combinatorial functions, or as control inputs (clock, clock enable, set, reset, output enable). °Each macrocell can be configured for a combinatorial or registered function.

7 Product Term Allocator °Controls how the five direct product terms are assigned to each MC. For example, all five direct terms can drive the OR function.

8 Product Term Allocator °Can re-assign other product terms within the FB to increase the logic capacity of a macrocell beyond five direct terms. °Any macrocell requiring additional product terms can access uncommitted product terms in other macrocells within the FB. °Up to 15 product terms can be available to a single macrocell with only a small incremental delay (t_PTA)

9 Product Term Allocator

10 °Can re-assign product terms from any macrocell within the FB by combining partial sums of products over several macrocells. What is the incremental delay in this example? 2 * t_PTA. If all 90 product terms are available to any macrocell, what is the maximum incremental delay?

11 Programmability Options °PLDs, CPLDs have different types of programmability. initial programming and reprogramming °One-time programmable: device is programmed once and holds its programming "forever" usually uses fuses to make/break links not reusable, but usually the cheapest discard device if changes are to be made

12 Programmability Options °UV-Erasable (EPROM) a floating gate positioned between regular MOS transistor control gate and the channel. floating gate is uncharged °To program the cell: a high voltage (e.g. 14 volts) applied to the control gate (drain is at ~12 volts). causes current to flow between the source and drain. accelerates electrons to high velocity and a small fraction of them traverse the thin oxide and become trapped on the floating gate. floating gate, surrounded by an insulating layer, becomes “permanently” negatively charged and the transistor is permanently turned off. °“Permanent” means about 10 years at 125 degrees C; at higher temperatures this time is reduced. °Cells erased by Ultra-Violet (UV) light. electrons on floating gates are excited and discharged to the substrate.

13 Programmability Options °Electrically Erasable (EEPROM) uses a floating gate structure with a control gate on top. Both erasing and reprogramming are accomplished with an electrical current. Device can be programmed/erased on the circuit board; no special packaging or IC socket is needed. Erase time is much faster than UV erase. Programming is retained after power down (non-volatile). Programming/erasing is limited to 1000s of cycles.

15 Electrically Erasable PLDs °Conventional PLDs are either one-time programmable or UV erasable °They must be placed in a programmer to program them °EE PLDs can be programmed and erased in place A small (four wire) connection to a computer is needed Once programmed, will retain program indefinitely Never have to take the chip out of its circuit

16 FPGA °Introduced in 1985 by Xilinx °Similar to CPLDs °A function to be implemented in an FPGA is partitioned into modules, each implemented in a logic block. The logic blocks are connected through the programmable interconnect.

17 FPGA Technology °1) Antifuse-based: realization of interconnections °2) Memory-based (flash, SRAM): realization of interconnections and computation

18 FPGA Technology °Antifuse FPGAs: configured by burning a set of fuses. Once configured, cannot be altered any more. Bug fixes and updates possible for new PCBs, but hardly for already manufactured boards. ASIC replacement for small volumes. °Flash FPGAs: may be re-programmed several thousand times and are non-volatile. Expensive; re-configuration takes several seconds. °SRAM FPGAs: dominating technology. Unlimited re-programming. Additional circuitry is required to load the configuration into the FPGA after power on. Re-configuration is very fast; some devices even allow partial re-configuration during operation.

19 Antifuse (Actel FPGA) °An antifuse is normally an open circuit. °Two-terminal element: the terminals connect to the upper and lower layers of the antifuse, with a dielectric (oxide-nitride-oxide, ONO) layer in the middle. °Initial state: the high resistance of the dielectric does not allow any current to flow. °Applying a high voltage causes large power dissipation and melts the dielectric. This drastically reduces the resistance, so a link is formed that permanently connects the two layers.

20 Antifuse chips °Advantage ! Small area -With metal-to-metal anti-fuses, no silicon area is required to make connections, decreasing the area overhead of programmability. Much lower resistance and parasitic capacitance than transistors -possible to include more switches per device -reduces the RC delays in the routing. No bitstream can be intercepted in the field (no bitstream transfer) -Need a Scanning Electron Microscope to try to know antifuse states (an Actel AX2000 antifuse FPGA contains 53 million antifuses with only 2-5% programmed in an average design) Interconnect structure is naturally “rad hard” -relatively immune to the effects of radiation (except flip-flops!) -an SRAM-based component can be “flipped” if hit by radiation

21 Antifuse chips °Disadvantage ! not suitable for devices that must be frequently reprogrammed one-time programmable FPGAs. special programmers must be used to program a device before it is mounted on a final product involves significant changes to the properties of the materials in the fuse, -leads to scaling challenges when new IC fabrication processes are considered

22 Programmability Options °Static Random Access Memory (SRAM) Programming: Switch is a pass transistor controlled by the state of the SRAM bit. Logic block configuration bits are stored in SRAM. Can be reprogrammed an unlimited number of times. Use of standard CMOS process technology -SRAM cells are created using exactly the same CMOS technologies as the rest of the device, -No special processing steps are required in order to create these components. -benefit from the increased integration, higher speeds and lower dynamic power consumption of new processes with smaller minimum geometries.

23 Programmability Options °SRAM Volatility programming contents NOT retained after power down external non-volatile memory device required on power up °SRAM Size SRAM cell requires either 5 or 6 transistors and the programmable element used to interconnect signals requires at least a single transistor. °SRAM Security Since the configuration information must be loaded into the device at power up, there is the possibility that the configuration information could be intercepted and stolen for use in a competing system.

24 Programmability Options °Flash Programming: alternative that addresses some of the shortcomings of SRAM °Use of floating gate programming technologies inject charge onto a gate that “floats” above the transistor. °Non-volatile eliminates the need for the external storage for configuration data can function immediately upon power-up °Area efficiency Area overhead: The programming circuitry (high and low voltage buffers) needed to program the cell, Cost is relatively modest as it is amortized across numerous programmable elements.

25 Programmability Options °Cannot be reprogrammed an infinite number of times. Charge buildup in the oxide eventually prevents a flash-based device from being properly erased and programmed °Non-standard CMOS process. Around five additional process steps on top of standard CMOS, which leaves flash-based devices behind SRAM-based devices by one or more process generations. °Programming time is about three times that of an SRAM-based component. °High resistance and capacitance due to the use of transistor-based switches. °Solution: on-chip flash memory to provide non-volatile storage with SRAM cells to control the programmable elements in the design.

26 Programmability Options °An ideal technology non-volatile reprogrammable using a standard CMOS process offer low on resistances and low parasitic capacitances.

27 FPGA Components °How can we implement any circuit in an FPGA? Example: Half adder -Combinational logic represented by truth table -What kind of hardware can implement a truth table?

Input   Out
A  B    S
0  0    0
0  1    1
1  0    1
1  1    0

Input   Out
A  B    C
0  0    0
0  1    0
1  0    0
1  1    1

28 FPGA Components °Lookup Table (LUT) °Implement truth table in small memories (LUTs) Usually SRAM °A function is implemented by writing all possible values that the function can take in the LUT °The input values are used to address the LUT and retrieve the value of the function corresponding to the input values [Figure: two 2-input, 1-output LUTs addressed by A and B. At addresses 00, 01, 10, 11 the S LUT stores 0, 1, 1, 0 and the C LUT stores 0, 0, 0, 1.]
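The lookup described on this slide can be sketched in a few lines: the stored contents play the role of the SRAM cells, and the inputs form the address. This is only an illustrative model; `make_lut`, `sum_lut`, `carry_lut` and `half_adder` are hypothetical names, and the first input is assumed to be the most significant address bit.

```python
def make_lut(contents):
    """Return a function that looks up `contents` using the inputs as address.

    `contents` holds one stored bit per address, in the order 00, 01, 10, 11
    for a 2-input LUT. The first input is treated as the most significant bit.
    """
    def lut(*inputs):
        addr = 0
        for bit in inputs:
            addr = (addr << 1) | bit
        return contents[addr]
    return lut


# Two 2-input, 1-output LUTs holding the half-adder truth tables.
sum_lut = make_lut([0, 1, 1, 0])    # S = A xor B
carry_lut = make_lut([0, 0, 0, 1])  # C = A and B


def half_adder(a, b):
    """Both LUTs are addressed by the same inputs A and B."""
    return sum_lut(a, b), carry_lut(a, b)
```

Changing the function only means rewriting the stored contents; the lookup hardware stays the same.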

29 FPGA Components °Alternatively, could have used a 2-input, 2-output LUT Outputs commonly use same inputs [Figure: the separate S and C LUTs merged into one 2-input, 2-output LUT addressed by A and B; addresses 00, 01, 10, 11 store S = 0, 1, 1, 0 and C = 0, 0, 0, 1.]

30 FPGA Components °Slightly bigger example: Full adder Combinational logic can be implemented in a LUT with the same number of inputs and outputs -3-input, 2-output LUT

Truth Table
Inputs      Outputs
A  B  Cin   S  Cout
0  0  0     0  0
0  0  1     1  0
0  1  0     1  0
0  1  1     0  1
1  0  0     1  0
1  0  1     0  1
1  1  0     0  1
1  1  1     1  1

[Figure: 3-input, 2-output LUT addressed by A, B, Cin; it stores the S, Cout pairs 00, 10, 10, 01, 10, 01, 01, 11 at addresses 000 through 111.]
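A minimal sketch of the 3-input, 2-output LUT for the full adder, assuming A is the most significant address bit and each location stores the pair (S, Cout). The names `FULL_ADDER_LUT` and `full_adder` are illustrative only.

```python
# One (S, Cout) pair per address; address = A B Cin with A as the MSB.
FULL_ADDER_LUT = [
    (0, 0),  # 000
    (1, 0),  # 001
    (1, 0),  # 010
    (0, 1),  # 011
    (1, 0),  # 100
    (0, 1),  # 101
    (0, 1),  # 110
    (1, 1),  # 111
]


def full_adder(a, b, cin):
    """Look up (S, Cout) for the given inputs."""
    addr = (a << 2) | (b << 1) | cin
    return FULL_ADDER_LUT[addr]


def matches_arithmetic():
    """Check every stored pair against ordinary binary addition."""
    for addr in range(8):
        a, b, cin = (addr >> 2) & 1, (addr >> 1) & 1, addr & 1
        total = a + b + cin
        if full_adder(a, b, cin) != (total & 1, total >> 1):
            return False
    return True
```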

31 FPGA Components °LUT Example: Implement the function ABD+BCD+ABC 2-input LUTs 3-input LUTs 4-input LUTs

32 FPGA Components °LUTs are used as function generators °How many SRAM locations does a k-input LUT have? 2^k °How many different functions can a k-input LUT implement? 2^(2^k) [Figure: 2-input LUT addressed by A and B; the S column stores 0, 1, 1, 0 and the C column stores 0, 0, 0, 1 at addresses 00, 01, 10, 11.]
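The two counts can be checked by brute force for small k: each of the 2^k locations independently holds 0 or 1, so enumerating every possible memory content enumerates every function. The helper names below are hypothetical.

```python
from itertools import product


def lut_locations(k):
    """Number of SRAM locations in a k-input, 1-output LUT."""
    return 2 ** k


def lut_function_count(k):
    """Number of distinct Boolean functions of k inputs."""
    return 2 ** (2 ** k)


# Every possible content of a 2-input LUT, one tuple per function.
ALL_2_INPUT_FUNCTIONS = list(product([0, 1], repeat=lut_locations(2)))
```

For k = 2 this yields 16 contents, including `(0, 1, 1, 0)` for XOR and `(0, 0, 0, 1)` for AND.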

33 FPGA Components °Why aren’t FPGAs just a big LUT? °Size of truth table grows exponentially based on # of inputs 3 inputs = 8 rows, 4 inputs = 16 rows, 5 inputs = 32 rows, etc. Same number of rows in truth table and LUT LUTs grow exponentially based on # of inputs °Number of SRAM bits in a LUT = 2^i * o, where i = # of inputs, o = # of outputs Example: 64 input combinational logic with 1 output would require 2^64 SRAM bits -about 1.84 x 10^19 °Clearly, not feasible to use large LUTs So, how do FPGAs implement logic with many inputs?
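The growth formula is easy to tabulate. The sketch below simply evaluates 2^i * o; the function name `lut_sram_bits` is an assumption made for illustration.

```python
def lut_sram_bits(num_inputs, num_outputs):
    """SRAM bits needed by one LUT: 2^i rows, o bits per row."""
    return (2 ** num_inputs) * num_outputs


# Rows quoted on the slide for 3, 4 and 5 inputs (single output).
ROWS = {i: lut_sram_bits(i, 1) for i in (3, 4, 5)}

# The 64-input, 1-output example.
BITS_64_INPUT = lut_sram_bits(64, 1)
```

`BITS_64_INPUT` evaluates to 18,446,744,073,709,551,616, which is the roughly 1.84 x 10^19 figure on the slide.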

34 FPGA Components °Fortunately, we can map circuits onto multiple LUTs Divide circuit into smaller circuits that fit in LUTs (same # of inputs and outputs) Example: 3-input, 2-output LUTs
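One way to picture this partitioning, under the assumption that the circuit is a 2-bit adder (this example circuit is not from the slides): a single LUT would need 5 inputs and 3 outputs, but the same logic fits in two chained 3-input, 2-output LUTs, each holding the full-adder table.

```python
# (S, Cout) per address; address = A B Cin with A as the MSB.
FULL_ADDER_LUT = [(0, 0), (1, 0), (1, 0), (0, 1), (1, 0), (0, 1), (0, 1), (1, 1)]


def lut3(a, b, cin):
    """One 3-input, 2-output LUT configured as a full adder."""
    return FULL_ADDER_LUT[(a << 2) | (b << 1) | cin]


def add_2bit(a1, a0, b1, b0, cin=0):
    """2-bit adder mapped onto two LUTs; the carry links them."""
    s0, c0 = lut3(a0, b0, cin)  # first LUT: low bits
    s1, c1 = lut3(a1, b1, c0)   # second LUT: high bits plus carry-in
    return c1, s1, s0


# SRAM bits: one 5-input, 3-output LUT versus two 3-input, 2-output LUTs.
SINGLE_LUT_BITS = (2 ** 5) * 3
TWO_LUT_BITS = 2 * (2 ** 3) * 2
```

The partitioned version needs 32 SRAM bits instead of 96, at the cost of passing through two LUTs in series.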

35 FPGA Components °Large LUTs Fast when using all inputs Wastes transistors otherwise °Must also consider total chip area Wasting transistors may be ok if there are plenty of LUTs

36 FPGA Components °What if circuit doesn’t map perfectly? More inputs in LUT than in circuit -Truth table handles this problem More outputs in LUT than in circuit -Extra outputs simply not used –Space is wasted, so should use multiple outputs whenever possible °Important Point The number of gates in a circuit has no effect on the mapping into a LUT -All that matters is the number of inputs and outputs -Unfortunately, it isn’t common to see large circuits with a few inputs [Figure: a 1-gate circuit and a 1,000,000-gate circuit.]

37 FPGA Components °LUT-Realization °A LUT is basically a multiplexer that evaluates the truth table stored in the configuration SRAM cells (can be seen as a one bit wide ROM).
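The multiplexer view can be sketched as a tree of 2-to-1 muxes whose data inputs are the configuration bits. This is an illustrative model, not a description of any particular device; `mux2` and `lut_as_mux_tree` are hypothetical names, and the inputs are assumed to be ordered most significant bit first.

```python
def mux2(sel, d0, d1):
    """2-to-1 multiplexer: pass d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def lut_as_mux_tree(config_bits, inputs):
    """Evaluate a LUT by reducing its configuration bits with mux levels.

    `config_bits` has 2^k entries ordered by address; `inputs` has k bits,
    most significant first. The least significant input drives the first
    mux level, because adjacent addresses differ only in that bit.
    """
    level = list(config_bits)
    for sel in reversed(inputs):
        level = [mux2(sel, level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

With `config_bits = [0, 1, 1, 0]` the tree behaves as XOR, and the result always equals a direct read of the configuration memory at the address formed by the inputs.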

38 °QUIZ2

39 FPGA Components °Example: Determine best LUTs for following circuit -Choices –4-input, 2-output LUT (delay = 2 ns) –6-input, 2-output LUT (delay = 3 ns) -Assume each SRAM cell is 6 transistors –4-input LUT = 6 * 2^4 * 2 = 192 transistors –6-input LUT = 6 * 2^6 = 384 transistors per output (6 * 2^6 * 2 = 768 for both outputs)

40 FPGA Components °Example: Determine best LUTs for following circuit -Choices –4-input, 2-output LUT (delay = 2 ns) –6-input, 2-output LUT (delay = 3 ns) -Assume each SRAM cell is 6 transistors –4-input LUT = 6 * 2^4 * 2 = 192 transistors –6-input LUT = 6 * 2^6 = 384 transistors per output (6 * 2^6 * 2 = 768 for both outputs) 6-input LUT: Propagation delay = 3 ns, Total transistors = 384 per output

41 FPGA Components °Example: Determine best LUTs for following circuit -Choices –4-input, 2-output LUT (delay = 2 ns) –6-input, 2-output LUT (delay = 3 ns) -Assume each SRAM cell is 6 transistors –4-input LUT = 6 * 2^4 * 2 = 192 transistors –6-input LUT = 6 * 2^6 = 384 transistors per output (6 * 2^6 * 2 = 768 for both outputs) 4-input LUTs: Propagation delay = 4 ns, Total transistors = 384 6-input LUTs are 1.3x faster (4 ns vs. 3 ns); the areas match only if a single 6-input LUT output (384 transistors) is counted
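The transistor estimates in this example can be checked with a short calculation. The sketch uses the slide's assumption of 6 transistors per SRAM cell and counts only the SRAM cells; it further assumes (inferred from the 4 ns and 384-transistor totals, since the circuit figure is missing) that the 4-input solution uses two LUTs in series. All names are illustrative.

```python
TRANSISTORS_PER_SRAM_CELL = 6  # assumption stated on the slide


def lut_transistors(num_inputs, num_outputs):
    """SRAM transistors in one LUT: 6 per cell, 2^i cells per output."""
    return TRANSISTORS_PER_SRAM_CELL * (2 ** num_inputs) * num_outputs


FOUR_INPUT_LUT = lut_transistors(4, 2)             # 4-input, 2-output
SIX_INPUT_LUT = lut_transistors(6, 2)              # 6-input, 2-output
SIX_INPUT_LUT_ONE_OUTPUT = lut_transistors(6, 1)   # one output only

# Assumed mapping: two 4-input LUTs in series (2 ns each).
TWO_STAGE_DELAY_NS = 2 * 2
TWO_STAGE_TRANSISTORS = 2 * FOUR_INPUT_LUT

# One 6-input LUT (3 ns).
SPEEDUP = TWO_STAGE_DELAY_NS / 3
```

Under these assumptions the formula gives 192 transistors for the 4-input LUT and 768 for the full 6-input, 2-output LUT; the 384 figure corresponds to one output of the 6-input LUT or to two 4-input LUTs.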

42 FPGA Components °Problem: How to handle sequential logic Truth tables don’t work °Possible solution: Add a flip-flop to the output of LUT °BLEs: the basic logic element Circuit can now use output from LUT or from FF Where does select come from?
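A basic logic element of this kind can be modeled as a LUT, a flip-flop, and an output multiplexer. In the sketch below the mux select is a stored configuration bit (`use_ff`), which is how such elements are commonly configured; the class and its method names are hypothetical.

```python
class BLE:
    """Illustrative basic logic element: LUT + flip-flop + output mux."""

    def __init__(self, lut_contents, use_ff):
        self.lut_contents = list(lut_contents)  # configuration SRAM of the LUT
        self.use_ff = use_ff                    # configuration bit: mux select
        self.ff_state = 0                       # flip-flop output, reset to 0

    def lut_out(self, inputs):
        """Combinational LUT output; first input is the MSB of the address."""
        addr = 0
        for bit in inputs:
            addr = (addr << 1) | bit
        return self.lut_contents[addr]

    def output(self, inputs):
        """Output mux: registered value or direct LUT value."""
        return self.ff_state if self.use_ff else self.lut_out(inputs)

    def clock(self, inputs):
        """Rising clock edge: the flip-flop captures the LUT output."""
        self.ff_state = self.lut_out(inputs)


def run_toggle(cycles):
    """Registered BLE with an inverter LUT fed by its own output."""
    ble = BLE([1, 0], use_ff=True)
    seen = []
    for _ in range(cycles):
        q = ble.output((0,))  # inputs are ignored in registered mode
        seen.append(q)
        ble.clock((q,))       # feed the registered output back into the LUT
    return seen
```

`run_toggle(4)` produces the sequence 0, 1, 0, 1, a simple piece of sequential logic that a truth table alone cannot express.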

