ESE534: Computer Organization Day 11: October 10, 2016 Instruction Space Modeling 1 Penn ESE534 Fall 2016 -- DeHon
Last Time Instruction Requirements Instruction Space Penn ESE534 Fall 2016 -- DeHon
Architecture Taxonomy PCs Pints/PC depth width Architecture N 1 FPGA N (48,640) 8 Tabula ABAX (A1EC04) 1024 32 Scalar Processor (RISC) D W VLIW (superscalar) Small W*N SIMD, GPU, Vector MIMD 16 1 (4?) 2048 64 16-core Penn ESE534 Fall 2016 -- DeHon
Today Model Architecture from Instruction Parameters implied costs gross application characteristics Focus on width as example Instruction depth on Wed. Penn ESE534 Fall 2016 -- DeHon
Quotes If it can’t be expressed in figures, it is not science; it is opinion. -- Lazarus Long Penn ESE534 Fall 2016 -- DeHon
Modeling Why do we model? Penn ESE534 Fall 2016 -- DeHon
Motivation Need to understand How costly is a solution Big, slow, hot, energy hungry…. How compare to alternatives Cost and benefit of flexibility Penn ESE534 Fall 2016 -- DeHon
What we really want: Complete implementation of our application For each architectural alternatives In same implementation technology w/ multiple area-time points Penn ESE534 Fall 2016 -- DeHon
Reality Seldom get it packaged that nicely We must deal with much work to do so technology keeps moving We must deal with estimation from components technology differences few area-time points Penn ESE534 Fall 2016 -- DeHon
Modeling Instruction Effects Restrictions from “ideal” save area and energy limit usability (yield) of PE May cost more energy, area in the end… Want to understand effects Area, energy model utilization/yield model Penn ESE534 Fall 2016 -- DeHon
Review What motivated us to explore SIMD word width? Why did we believe this simplification might be reasonable? Penn ESE534 Fall 2016 -- DeHon
Application Needs What are common application datawidths? Penn ESE534 Fall 2016 -- DeHon
Preclass Energies? 16-bit on 32-bit? 8-bit, 16-bit, 32-bit, 64-bit 16-bit on 32-bit? Sources of inefficiency? 8-bit operations per 64-bit operation? 64-bit on 8-bit? Penn ESE534 Fall 2016 -- DeHon
Area Target Switch focus to area-efficiency Equivalently computational density Come back to energy-efficiency at end of lecture Penn ESE534 Fall 2016 -- DeHon
Efficiency/Yield Intuition (Area) What happens when Datapath is too wide? Datapath is too narrow? Instruction memory is too deep? Penn ESE534 Fall 2016 -- DeHon
Computing Device Composition Bit Processing elements Interconnect: space Interconnect: time Instruction Memory Tile together to build device Penn ESE534 Fall 2016 -- DeHon
Computing Device Penn ESE534 Fall 2016 -- DeHon
Computing Device Composition Bit Processing elements Interconnect: space Interconnect: time Instruction Memory Tile together to build device Penn ESE534 Fall 2016 -- DeHon
Relative Sizes Bit Operator 3-5KF2 Bit Operator Interconnect 200K-250KF2 Instruction (w/ interconnect) 20KF2 Memory bit (SRAM) 250-500F2 Shown here are the ballpark sizes for each of the major components in these devices. Here, and elsewhere, I’ll be talking about feature sizes in normalized units of lambda, where lambda is half the minimum feature size in a VLSI process. This way I can talk about areas without tying my general results to any particular process. Lambda-squared, then is a normalized measure of area. These numbers are come from looking at the composition in VLSI layouts both constructively and decompositionally looking at existing artifacts. A bit operator, like an ALU bit or an FPGA LUT or even a gate, along with a reasonable slice of interconnect will take up 500K to 1M lambda-squared. The instruction which controls this compute and interconnect will be about one 10th to one 12th that size. A single memory bit for holding data will be fairly small, but these do add up when we collect a large number of them together. Shown here is the relative area for these components. Note that the operator itself is generally negligible in area compared to the interconnect. As we’ve noted the instruction is an order of magnitude smaller than the interconnect, but a large number of these can easily dominate the interconnect. Similarly for memory bits, they’re small by themselves but can rapidly add up to consume significant amounts of area. So, now let’s look at how these areas impact what we can build in fixed area. Penn ESE534 Fall 2016 -- DeHon
Model Area Penn ESE534 Fall 2016 -- DeHon
Architectures Fall in Space Penn ESE534 Fall 2016 -- DeHon
Calibrate Model Arch. w d c k Area (F2) FPGA 1 4 220K Altera 8K 230K Xilinx 4K 160K SIMD 1000 2 43K Abacus 48K Processor 32 1024 650K MIPS-X 530K Penn ESE534 Fall 2016 -- DeHon
Peak Densities from Model Penn ESE534 Fall 2016 -- DeHon
Peak Densities from Model Only 2 of 4 parameters small slice of space 100 density across Large difference in peak densities large design space! Penn ESE534 Fall 2016 -- DeHon
Architecture Points Penn ESE534 Fall 2016 -- DeHon
Architectural parameters Peak Densities Penn ESE534 Fall 2016 -- DeHon
Efficiency What do we really want to maximize? Yield Fraction / Area Not peak, “guaranteed not to exceed” performance, but… Useful work per unit silicon [per Joule] Yield Fraction / Area (or minimize (Area/Yielded performance) ) Penn ESE534 Fall 2016 -- DeHon
Efficiency For comparison, look at relative efficiency to ideal. Ideal = architecture exactly matched to application requirements Efficiency = Aideal/Aarch Aarch = Area Op/Yield Penn ESE534 Fall 2016 -- DeHon
Width Mismatch Efficiency Calculation Penn ESE534 Fall 2016 -- DeHon
Efficiency: Width Mismatch 16K PEs Penn ESE534 Fall 2016 -- DeHon
Efficiency: Width Mismatch 16K PEs Penn ESE534 Fall 2016 -- DeHon
Efficiency: Width Mismatch 16K PEs Penn ESE534 Fall 2016 -- DeHon
Energy Target Switch to energy focus for rest of lecture Penn ESE534 Fall 2016 -- DeHon
Efficiency for Preclass Preclass 6 table Penn ESE534 Fall 2016 -- DeHon
Robust Point Can we find an architecture that Is reasonably efficient regardless of application needs? never less efficient than 50% ? Penn ESE534 Fall 2016 -- DeHon
Robust Point What is Energy Robust Point for preclass model? Penn ESE534 Fall 2016 -- DeHon
Big Ideas [MSB Ideas] Applications typically have structure Exploit this structure to reduce resource requirements Architecture is about understanding and exploiting structure and costs to reduce requirements Penn ESE534 Fall 2016 -- DeHon
Admin Drop date today HW4 graded HW5.1 Due Wed. Reading for Wed. Finish chapter Penn ESE534 Fall 2016 -- DeHon