ECE 697F Reconfigurable Computing, Lecture 4: Contrasting Processors: Fixed and Configurable (September 20, 2004)


1 ECE 697F Reconfigurable Computing, Lecture 4: Contrasting Processors: Fixed and Configurable

2 Overview
- Three types of FPGAs
  - EEPROM
  - SRAM
  - Antifuse
- SRAM FPGA architectural choices
- FPGA logic blocks: size versus performance
- FPGA switch boxes
- State of the art
  - Research issues in architecture

3 What is Computation?
- Calculating predictable data outputs from data inputs.
- What should we expect from a computing device?
  - Gives the correct answer
  - Takes up finite space
  - Computes in finite time
  - Can solve all problems?
- Compilation
- Implementation
- Other issues

4 Compilation
- How long does it take to “map” an idea to hardware?
- Why is the processor so “easy” to target for compilation?
[Figure: Processor, FPGA, Gate Array, and Full Custom compared on compilation time and performance, each axis running from low to high.]

5 What are the variables in Computation?
- Time: how long does it take to compute the answer?
- Area: how much silicon space is required to determine the answer?
- A processor generally fixes the computing area. The problem is evaluated over time through instructions.
- An FPGA can create a flexible amount of computing area. Effectively, the configuration memory is the computing instruction.

6 Measuring Feature Size
- Current FPGAs follow the same technology curve as microprocessors.
- Device sizes are difficult to compare across generations, so we use a fixed metric, lambda (λ).
- Lambda defines the basic feature sizes in the VLSI device.
[Figure: layout dimensions in units of λ. Labels: λ; 8λ; 5λ spacing; metal 3; 8λ; 3λ overlap; metal 2+3.]

7 Toward Computational Comparison
- DeHon metrics: computational density of a device, in 4-input gate-evaluations per unit area per unit time:
  $\dfrac{\text{4-input gate-evaluations}}{\lambda^2 \times s}$
- Processor: $\dfrac{2 \times N_{ALU} \times W_{ALU}}{A_{proc} \times t_{cycle}}$
- FPGA: $\dfrac{N_{4LUT}}{A_{array} \times t_{cycle}}$

8 Degradation
- An FPGA can’t really be clocked at 1/(7 ns) due to interconnect.
- Consider the Bubblesort block from the first class:
  if (A > B) { H = A; L = B; } else { H = B; L = A; }
- Truth table (inputs Ci, A, B; outputs S, Co), one column per input combination, as for a one-bit full adder:
  Ci 0 0 0 0 1 1 1 1
  A  0 0 1 1 0 0 1 1
  B  0 1 0 1 0 1 0 1
  S  0 1 1 0 1 0 0 1
  Co 0 0 0 1 0 1 1 1
- Requires 33 LUT delays.
[Figure labels: A, B, compare, H.]
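As a rough sketch of the two pieces on this slide, the Python below reproduces the Ci/A/B/S/Co truth table with a one-bit full adder and models the Bubblesort block as a compare-and-swap. The names `full_adder` and `compare_swap` are illustrative and not from the lecture, and the sketch does not attempt to model how the block maps onto 33 LUT delays.

```python
from itertools import product


def full_adder(ci, a, b):
    """One-bit full adder: returns (sum, carry_out)."""
    s = ci ^ a ^ b
    co = (a & b) | (ci & (a ^ b))
    return s, co


def compare_swap(a, b):
    """Bubblesort block: returns (H, L), the larger and the smaller input."""
    if a > b:
        return a, b
    return b, a


# Enumerate inputs in the slide's column order: Ci changes slowest, B fastest.
rows = [(ci, a, b) + full_adder(ci, a, b) for ci, a, b in product((0, 1), repeat=3)]
s_row = "".join(str(r[3]) for r in rows)
co_row = "".join(str(r[4]) for r in rows)

print("S ", s_row)
print("Co", co_row)
print(compare_swap(3, 7))
```

A hardware comparator for multi-bit A and B is typically built from a chain of such adder (subtractor) cells, which is one plausible reason the truth table appears next to the compare block.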

9 New Comparison
- The processor required three cycles at 500 MHz.
- The FPGA requires 33 LUT delays per computation.
- Could consider other parts of the design.
Design      | Organization       | Area (λ²) | Cycle | ge/(λ² × s)
1994 MIPS   | 1 x 32             | 1.7G      | 2 ns  | 19
1992 Xilinx | 49 CLB (2 x 4-LUT) | 61M       | 7 ns  | 230
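The density column of this table can be checked against the processor and FPGA formulas from the "Toward Computational Comparison" slide. The sketch below is only an arithmetic check; the function names are made up for illustration, and it assumes the table entries mean one 32-bit ALU for the MIPS and 49 CLBs with two 4-LUTs each for the Xilinx part.

```python
def processor_density(n_alu, w_alu, area_lambda2, t_cycle_s):
    """Gate-evaluations per (lambda^2 * s) for a processor: 2 * N_ALU * W_ALU / (A * t)."""
    return 2 * n_alu * w_alu / (area_lambda2 * t_cycle_s)


def fpga_density(n_4lut, area_lambda2, t_cycle_s):
    """Gate-evaluations per (lambda^2 * s) for an FPGA: N_4LUT / (A * t)."""
    return n_4lut / (area_lambda2 * t_cycle_s)


# Values read from the table: area in lambda^2, cycle time in seconds.
mips = processor_density(n_alu=1, w_alu=32, area_lambda2=1.7e9, t_cycle_s=2e-9)
xilinx = fpga_density(n_4lut=49 * 2, area_lambda2=61e6, t_cycle_s=7e-9)

print(round(mips), round(xilinx))
```

Under those assumptions the results round to the 19 and 230 listed in the table.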

10 Parallelization
- How does this performance factor change over time? Through parallelization.
- For a given operation, ge/(λ² × s) seems about the same: roughly 7.
- However, multiple comparisons could be performed in parallel.
- Now the FPGA metric is 28.
- Of course, the device may be only partially filled.
[Figure: four compare blocks side by side, taking input pairs (A0, A1), (A2, A3), (A4, A5), (A6, A7), each producing H and L.]
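The slides do not spell out where 7 and 28 come from. One reading, offered here as an assumption, is that the peak densities from the previous slide (19 for the processor, 230 for the FPGA) are divided by the number of cycles or LUT delays needed per result (3 and 33), and that four parallel comparators then multiply the FPGA figure by four. The helper name `effective_density` is hypothetical.

```python
def effective_density(peak_density, steps_per_result, parallel_units=1):
    """Peak density derated by steps per result, scaled by parallel copies."""
    return peak_density / steps_per_result * parallel_units


processor = effective_density(19, 3)                       # three cycles per compare
fpga_single = effective_density(230, 33)                   # 33 LUT delays per compare
fpga_four = effective_density(230, 33, parallel_units=4)   # four compares at once

print(round(processor, 1), round(fpga_single, 1), round(fpga_four, 1))
```

With this reading the processor and a single FPGA comparator land close together (about 6 and 7), and four comparators in parallel give about 28, consistent with the numbers on the slide.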

11 Specialization
- Example: encryption
[Figure labels: constant, variable.]

12 Instructions
- Many applications have little parallelism or have variable hardware requirements during execution.
- Here, using more area doesn’t increase computational density.
- It is better to reuse hardware through instructions.
[Figure: a unit with inputs A and B and a selectable operation: +, -, |, x.]

13 Single-Instruction Multiple Data
- The same instruction is distributed to fine-grained cells.
- Typically organized as a 2-D array.
- Ideal for image processing.
- Typically fixed hardware located in each cell.
[Figure labels: op, multi-bit.]

14 Computation Unit for SIMD
- Performs a different operation on every cycle.
- Easy to distribute instructions on the device (use global lines).
- Some local storage for data in each tile.
[Figure: computation unit. Inputs come from local state or other array elements; outputs go to local state or other array elements; a global instruction is common to all elements.]

15 Computation Unit for FPGA
- Performs the same operation on every cycle.
- No global distribution of instructions at all (stored locally).
- Also has local storage for data.
[Figure: computation unit. Inputs come from local state or other array elements; outputs go to local state or other array elements; a static instruction is distinct for each array element.]
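The contrast between the SIMD and FPGA computation units can be sketched in a few lines of Python. This is a toy behavioral model under simplifying assumptions (four elements, two operands per element, no inter-element communication); the names `simd_step`, `fpga_step`, and `OPS` are illustrative only.

```python
import operator

# The operation set mirrors the +, -, |, x operations from the "Instructions" slide.
OPS = {"add": operator.add, "sub": operator.sub, "or": operator.or_, "mul": operator.mul}


def simd_step(instruction, a_values, b_values):
    """SIMD: one global instruction is broadcast to every element this cycle."""
    op = OPS[instruction]
    return [op(a, b) for a, b in zip(a_values, b_values)]


def fpga_step(static_config, a_values, b_values):
    """FPGA: each element applies its own locally stored, unchanging operation."""
    return [OPS[name](a, b) for name, a, b in zip(static_config, a_values, b_values)]


a = [1, 2, 3, 4]
b = [5, 6, 7, 8]

# SIMD: the instruction changes from cycle to cycle but is shared by all elements.
simd_trace = [simd_step(instr, a, b) for instr in ("add", "mul")]

# FPGA: the per-element configuration is fixed, so every cycle repeats the same operations.
config = ["add", "sub", "or", "mul"]
fpga_trace = [fpga_step(config, a, b) for _ in range(2)]

print(simd_trace)
print(fpga_trace)
```

In the SIMD trace the operation varies across cycles and is uniform across elements; in the FPGA trace it varies across elements and is uniform across cycles.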

16 Hybrid Architecture
- Configuration selects the operation of the computation unit.
- The context identifier changes over time to allow a change in functionality.
- DPGA: Dynamically Programmable Gate Array.
- Programming may differ for each element.
[Figure labels: in, Computation Unit (LUT), out, Address Inputs (Inst. Store), Context Identifier.]
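A minimal behavioral sketch of such a multi-context element follows, assuming each context is simply a separate truth table for the same LUT and the context identifier picks which table is active. The class name `MultiContextLUT` and the two example contexts are hypothetical; a real DPGA also switches interconnect configuration, which is not modeled here.

```python
class MultiContextLUT:
    """A k-input LUT whose truth table is selected by a context identifier."""

    def __init__(self, contexts):
        # contexts: list of truth tables, each a list of 2**k output bits.
        self.contexts = contexts

    def evaluate(self, inputs, context_id):
        # Pack the input bits into a table address, most significant bit first.
        address = 0
        for bit in inputs:
            address = (address << 1) | bit
        return self.contexts[context_id][address]


AND2 = [0, 0, 0, 1]  # context 0: 2-input AND
XOR2 = [0, 1, 1, 0]  # context 1: 2-input XOR
lut = MultiContextLUT([AND2, XOR2])

# Same inputs, different context: the element's function changes over time.
print(lut.evaluate((1, 1), context_id=0))
print(lut.evaluate((1, 1), context_id=1))
```

Changing `context_id` between cycles changes the element's function without rewriting its configuration memory, which is the point of storing several contexts locally.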

17 DPGA
- How many applications require this flexibility?
- Efficient techniques are needed to schedule when functionality shifts.
- Added configuration allows functionality to change quickly.
- Doubles the SRAM storage requirement.
[Figure: four adder cells (+) with inputs A3..A0 and B3..B0, outputs O3..O0, and a context identifier.]

18 Multicontext Organization/Area
- $A_{ctxt} \approx 80K\lambda^2$ (dense encoding)
- $A_{base} \approx 800K\lambda^2$
- $A_{ctxt} : A_{base} = 1:10$
- Slides: courtesy DeHon
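To see what the 1:10 ratio implies, the sketch below assumes an additive area model in which an element costs a fixed base area plus one context area per stored context. That model is an assumption for illustration and is not stated on the slide; only the two area values come from it. The function names are hypothetical.

```python
A_BASE = 800_000  # lambda^2, base area per element (from the slide)
A_CTXT = 80_000   # lambda^2 per stored context, dense encoding (from the slide)


def element_area(n_contexts):
    """Assumed additive model: fixed base area plus storage for each context."""
    return A_BASE + n_contexts * A_CTXT


def area_ratio_vs_single_context(n_contexts):
    """Area of n single-context elements relative to one n-context element."""
    return n_contexts * element_area(1) / element_area(n_contexts)


for n in (1, 2, 4, 8):
    print(n, element_area(n), round(area_ratio_vs_single_context(n), 2))
```

Under this model a 4-context element is about 27% larger than a single-context one but replaces four of them, provided the computation can actually be time-multiplexed across the contexts.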

19 Example: DPGA Prototype

20 FPGA vs. DPGA Compare

21 Example: DPGA Area

22 Configuration Caching
- What if I swap out some configurations while they are not in use?
- Separate hardware writes given locations in the configuration memory without interrupting circuit operation.
- Just like cache prefetching.
[Figure labels: LUT, Config Store, Context ID, Out.]

23 Hierarchical FPGA
- Predictable delay
- Two-dimensional layout
- Limited connectivity

24 Buffering
- Pipelining interconnect comes at an area cost.
- Could also consider buffering.
[Figure: unpipelined versus pipelined interconnect. Labels: s, DQ, Unpipelined, Pipelined, 18 transistors.]

25 What about this circuit?
- Retiming is needed for a hierarchical device.
- The number of registers is proportional to the longest path.
- Complicates design (software, debugging).
- Need to schedule communication.
[Figure label: LUT.]

26 PLD (Programmable Logic Device)
- All layers already exist
  - Designers can purchase an IC.
  - Connections on the IC are either created or destroyed to implement the desired functionality.
  - Field-Programmable Gate Array (FPGA) very popular.
- Benefits
  - Low NRE costs, almost instant IC availability.
- Drawbacks
  - Penalty on area, cost (perhaps $30 per unit), performance, and power.
- Acknowledgement: Mishra

27 Design Technology
- The manner in which we convert our concept of desired system functionality into an implementation.
- Compilation/Synthesis: automates exploration and insertion of implementation details for the lower level.
- Libraries/IP: incorporates pre-designed implementations from a lower abstraction level into a higher level.
- Test/Verification: ensures correct functionality at each level, thus reducing costly iterations between levels.
Specification level      | Compilation/Synthesis | Libraries/IP  | Test/Verification
System specification     | System synthesis      | Hw/Sw/OS      | Model simulators/checkers
Behavioral specification | Behavior synthesis    | Cores         | Hw-Sw cosimulators
RT specification         | RT synthesis          | RT components | HDL simulators
Logic specification      | Logic synthesis       | Gates/Cells   | Gate simulators
(Logic specification leads to the final implementation.)

28 Design productivity gap
- 1981: a leading-edge chip required 100 man-months (10,000 transistors / 100 transistors/month).
- 2002: a leading-edge chip requires 30K man-months (150,000,000 transistors / 5,000 transistors/month).
- Designer cost increased from $1M to $300M.
[Figure: IC capacity (logic transistors per chip, in millions) and productivity (K transistors per staff-month) plotted by year, 1981 to 2009, with the gap between the two curves marked.]
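The man-month figures follow directly from dividing chip size by per-designer productivity. The short check below uses only numbers from the slide; the cost per man-month it prints is implied by those numbers rather than stated in the lecture, and the function name is illustrative.

```python
def man_months(transistors, transistors_per_month):
    """Design effort in man-months at a given per-designer productivity."""
    return transistors / transistors_per_month


effort_1981 = man_months(10_000, 100)
effort_2002 = man_months(150_000_000, 5_000)

# Implied cost per man-month, from the slide's $1M and $300M totals.
cost_rate_1981 = 1_000_000 / effort_1981
cost_rate_2002 = 300_000_000 / effort_2002

print(effort_1981, effort_2002)
print(cost_rate_1981, cost_rate_2002)
```

Productivity rose 50-fold while chip size rose 15,000-fold, so effort grew 300-fold; the implied cost per man-month is the same $10K in both years.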

29 The mythical man-month
- In theory, adding designers to a team reduces project completion time.
- In reality, productivity per designer decreases due to the complexities of team management and communication overhead.
- In the software community, this is known as “the mythical man-month” (Brooks 1975).
- At some point, adding designers can actually lengthen project completion time!
- Example: a 1M-transistor design, where one designer alone produces 5,000 transistors/month.
  - Each additional designer reduces every designer's output by 100 transistors/month.
  - So 2 designers produce 4,900 transistors/month each.
[Figure: months until completion versus number of designers (0 to 40), with team and individual productivity curves. Data labels: 43, 24, 19, 16, 15, 16, 18, 23.]
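The slide's productivity model is easy to evaluate directly. The sketch below assumes each designer's output is 5,000 minus 100 for every additional team member, and that the eight data labels in the figure correspond to team sizes 5, 10, ..., 40; that mapping is an inference from the arithmetic, not something the transcript states. Function names are illustrative.

```python
def per_designer_rate(n, base=5000, penalty=100):
    """Transistors per month per designer on a team of n."""
    return base - penalty * (n - 1)


def months_to_complete(n, transistors=1_000_000, base=5000, penalty=100):
    """Months for a team of n to finish the design under the slide's model."""
    return transistors / (n * per_designer_rate(n, base, penalty))


team_sizes = list(range(5, 45, 5))
months = [round(months_to_complete(n)) for n in team_sizes]

for n, m in zip(team_sizes, months):
    print(f"{n:2d} designers: {m} months")
```

Completion time falls until the team reaches roughly 25 designers and then rises again, which is the effect the slide describes.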

30 Summary
- Interesting similarities between processors and reconfigurable devices.
- Processors are reconfigured on every clock cycle using an instruction.
- FPGAs are configured once at the beginning of computation.
- DPGAs blur the line: run-time reconfiguration.
- Numerous challenges to reconfiguration
  - When
  - How
  - Performance benefit?

