ESE534: Computer Organization

1 ESE534: Computer Organization
Day 11: October 10, 2016 Instruction Space Modeling 1 Penn ESE534 Fall DeHon

2 Last Time
Instruction Requirements; Instruction Space

3 Architecture Taxonomy
Columns: PCs | Pinsts/PC | depth | width | Architecture
N, 1: FPGA; N (48,640), 8: Tabula ABAX (A1EC04); 1024, 32: Scalar Processor (RISC); D, W: VLIW (superscalar); Small, W*N: SIMD, GPU, Vector; MIMD; 16, 1 (4?), 2048, 64: 16-core

4 Today
Model Architecture from Instruction Parameters: implied costs; gross application characteristics. Focus on width as example. Instruction depth on Wed.

5 Quotes
“If it can’t be expressed in figures, it is not science; it is opinion.” Lazarus Long

6 Modeling
Why do we model?

7 Motivation
Need to understand: How costly is a solution? (Big, slow, hot, energy hungry…) How does it compare to alternatives? Cost and benefit of flexibility.

8 What we really want:
Complete implementation of our application for each architectural alternative, in the same implementation technology, w/ multiple area-time points.

9 Reality
Seldom get it packaged that nicely: much work to do so; technology keeps moving. We must deal with: estimation from components; technology differences; few area-time points.

10 Modeling Instruction Effects
Restrictions from “ideal”: save area and energy; limit usability (yield) of PE. May cost more energy, area in the end… Want to understand effects: area, energy model; utilization/yield model.

11 Review
What motivated us to explore SIMD word width? Why did we believe this simplification might be reasonable?

12 Application Needs
What are common application datawidths?

13 Preclass
Energies? 8-bit, 16-bit, 32-bit, 64-bit. 16-bit on 32-bit? Sources of inefficiency? 8-bit operations per 64-bit operation? 64-bit on 8-bit?

14 Area Target
Switch focus to area-efficiency (equivalently, computational density). Come back to energy-efficiency at end of lecture.

15 Efficiency/Yield Intuition (Area)
What happens when: Datapath is too wide? Datapath is too narrow? Instruction memory is too deep?

16 Computing Device
Composition: bit processing elements; interconnect: space; interconnect: time; instruction memory. Tile together to build device.

17 Computing Device

18 Computing Device
Composition: bit processing elements; interconnect: space; interconnect: time; instruction memory. Tile together to build device.

19 Relative Sizes
Bit Operator: 3-5KF²
Bit Operator Interconnect: K-250KF²
Instruction (w/ interconnect): KF²
Memory bit (SRAM): F²
Shown here are the ballpark sizes for each of the major components in these devices. Here, and elsewhere, I’ll be talking about feature sizes in normalized units of lambda, where lambda is half the minimum feature size in a VLSI process. This way I can talk about areas without tying my general results to any particular process. Lambda-squared, then, is a normalized measure of area. These numbers come from looking at the composition in VLSI layouts, both constructively and decompositionally by looking at existing artifacts.
A bit operator, like an ALU bit or an FPGA LUT or even a gate, along with a reasonable slice of interconnect, will take up 500K to 1M lambda-squared. The instruction which controls this compute and interconnect will be about one 10th to one 12th that size. A single memory bit for holding data will be fairly small, but these do add up when we collect a large number of them together.
Shown here is the relative area for these components. Note that the operator itself is generally negligible in area compared to the interconnect. As we’ve noted, the instruction is an order of magnitude smaller than the interconnect, but a large number of these can easily dominate the interconnect. Similarly for memory bits: they’re small by themselves but can rapidly add up to consume significant amounts of area. So, now let’s look at how these areas impact what we can build in fixed area.
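The unit relationship in these notes can be checked with a short sketch. It assumes F denotes the minimum feature size, so that F = 2 × lambda and one F² equals four lambda-squared; the function names are illustrative and do not come from the lecture. The second helper uses the quoted "one 10th to one 12th" ratio to estimate how many stored instructions it takes to match the operator-plus-interconnect area.

```python
def lambda2_to_f2(area_lambda2: float) -> float:
    """Convert an area in lambda^2 to F^2, assuming F = 2 * lambda (so F^2 = 4 * lambda^2)."""
    return area_lambda2 / 4.0


def instructions_to_match(instr_fraction: float) -> float:
    """Number of stored instructions whose combined area equals the
    operator-plus-interconnect area, given one instruction's relative size
    (the notes quote roughly 1/10 to 1/12)."""
    if instr_fraction <= 0:
        raise ValueError("instr_fraction must be positive")
    return 1.0 / instr_fraction


# The 500K to 1M lambda^2 range quoted for a bit operator with interconnect.
low_f2 = lambda2_to_f2(500e3)
high_f2 = lambda2_to_f2(1e6)
print(low_f2, high_f2)
print(instructions_to_match(1 / 10), instructions_to_match(1 / 12))
```

Under that assumption, 500K to 1M lambda-squared corresponds to 125K to 250K F², and roughly 10 to 12 instructions per bit operator already occupy as much area as the operator and its interconnect.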

20 Model Area

21 Architectures Fall in Space

22 Calibrate Model
Columns: Arch. | w | d | c | k | Area (F²)
FPGA: 1, 4; 220K. Altera 8K: 230K. Xilinx 4K: 160K. SIMD: 1000, 2; 43K. Abacus: 48K. Processor: 32, 1024; 650K. MIPS-X: 530K.

23 Peak Densities from Model

24 Peak Densities from Model
Only 2 of 4 parameters: small slice of space; 100× density across. Large difference in peak densities: large design space!

25 Architecture Points

26 Architectural parameters → Peak Densities

27 Efficiency
What do we really want to maximize? Not peak, “guaranteed not to exceed” performance, but… useful work per unit silicon [per Joule]: Yield Fraction / Area (or minimize Area / Yielded performance).

28 Efficiency
For comparison, look at relative efficiency to ideal. Ideal = architecture exactly matched to application requirements. $\text{Efficiency} = A_{\text{ideal}} / A_{\text{arch}}$, where $A_{\text{arch}} = \text{Area}_{\text{op}} / \text{Yield}$.
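The two definitions on this slide can be written out as a small sketch. The function and parameter names are illustrative, and the numbers in the example are hypothetical rather than taken from the lecture.

```python
def arch_area(area_per_op: float, yield_fraction: float) -> float:
    """Effective area per useful operation: A_arch = Area_op / Yield."""
    if not 0.0 < yield_fraction <= 1.0:
        raise ValueError("yield_fraction must be in (0, 1]")
    return area_per_op / yield_fraction


def efficiency(ideal_area: float, area_per_op: float, yield_fraction: float) -> float:
    """Relative efficiency to ideal: A_ideal / A_arch."""
    return ideal_area / arch_area(area_per_op, yield_fraction)


# Hypothetical case: the operator is the same size as the ideal one,
# but only half of it does useful work for this application.
print(efficiency(ideal_area=100.0, area_per_op=100.0, yield_fraction=0.5))
```

In this hypothetical case a yield of one half halves the efficiency even though the raw operator area matches the ideal, which is why the slide distinguishes yielded performance from peak performance.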

29 Width Mismatch Efficiency Calculation

30 Efficiency: Width Mismatch
16K PEs

31 Efficiency: Width Mismatch
16K PEs

32 Efficiency: Width Mismatch
16K PEs

33 Energy Target
Switch to energy focus for rest of lecture.

34 Efficiency for Preclass
Preclass 6 table

35 Robust Point
Can we find an architecture that is reasonably efficient regardless of application needs? Never less efficient than 50%?
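One way to explore the robust-point question is a brute-force search over candidate widths. The sketch below is not the lecture's model or the preclass model. It assumes a deliberately simple cost: each processing element pays one shared instruction cost plus a per-bit cost, a narrow task occupies a whole wide PE, and a wide task is split across several narrow PEs. The names and the cost values (`bit_cost=1.0`, `instr_cost=8.0`) are hypothetical.

```python
def pe_cost(width, bit_cost, instr_cost):
    """Assumed cost of one PE: one shared instruction plus `width` bit slices."""
    return instr_cost + width * bit_cost


def width_efficiency(w_app, w_arch, bit_cost, instr_cost):
    """Ideal cost over actual cost when a w_app-bit task runs on w_arch-bit PEs."""
    pes_needed = -(-w_app // w_arch)  # integer ceiling division
    ideal = pe_cost(w_app, bit_cost, instr_cost)
    actual = pes_needed * pe_cost(w_arch, bit_cost, instr_cost)
    return ideal / actual


def robust_width(app_widths, arch_widths, bit_cost, instr_cost):
    """Return (width, worst-case efficiency) for the candidate width whose
    worst-case efficiency over all application widths is highest."""
    def worst_case(w_arch):
        return min(width_efficiency(w, w_arch, bit_cost, instr_cost)
                   for w in app_widths)

    best = max(arch_widths, key=worst_case)
    return best, worst_case(best)


# Hypothetical costs: the instruction costs as much as 8 bit slices.
WIDTHS = [1, 2, 4, 8, 16, 32, 64]
best_width, worst_eff = robust_width(WIDTHS, WIDTHS, bit_cost=1.0, instr_cost=8.0)
print(best_width, worst_eff)
```

With these hypothetical costs the search picks width 8, with a worst-case efficiency of 0.5625. Under this assumed model the robust width sits where the per-bit cost of the datapath balances the shared instruction cost, which keeps efficiency above 50% for both narrower and wider applications.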

36 Robust Point
What is Energy Robust Point for preclass model?

37 Big Ideas [MSB Ideas]
Applications typically have structure. Exploit this structure to reduce resource requirements. Architecture is about understanding and exploiting structure and costs to reduce requirements.

38 Admin
Drop date today. HW4 graded. HW5.1 due Wed. Reading for Wed.: finish chapter.
