ESE534: Computer Organization

Presentation transcript:

ESE534: Computer Organization Day 11: October 10, 2016 Instruction Space Modeling 1 Penn ESE534 Fall 2016 -- DeHon

Last Time Instruction Requirements Instruction Space Penn ESE534 Fall 2016 -- DeHon

Architecture Taxonomy
Parameters: PCs, Pinstrs/PC, instruction depth, datapath width
FPGA: N, 1
Tabula ABAX (A1EC04): N (48,640), 8
Scalar Processor (RISC): depth 1024, width 32
VLIW (superscalar): D, W
SIMD, GPU, Vector: Small, W*N
MIMD
16-core: 16 PCs, 1 (4?) instructions per PC, depth 2048, width 64
Penn ESE534 Fall 2016 -- DeHon

Today Model Architecture from Instruction Parameters implied costs gross application characteristics Focus on width as example Instruction depth on Wed. Penn ESE534 Fall 2016 -- DeHon

Quotes If it can’t be expressed in figures, it is not science; it is opinion. -- Lazarus Long Penn ESE534 Fall 2016 -- DeHon

Modeling Why do we model? Penn ESE534 Fall 2016 -- DeHon

Motivation Need to understand: How costly is a solution? (big, slow, hot, energy hungry…) How does it compare to alternatives? Cost and benefit of flexibility Penn ESE534 Fall 2016 -- DeHon

What we really want: Complete implementation of our application For each architectural alternative In the same implementation technology w/ multiple area-time points Penn ESE534 Fall 2016 -- DeHon

Reality Seldom get it packaged that nicely We must deal with: much work to do so, technology keeps moving We must deal with: estimation from components, technology differences, few area-time points Penn ESE534 Fall 2016 -- DeHon

Modeling Instruction Effects Restrictions from “ideal”: save area and energy, limit usability (yield) of PE, may cost more energy and area in the end… Want to understand effects: area/energy model, utilization/yield model Penn ESE534 Fall 2016 -- DeHon

Review What motivated us to explore SIMD word width? Why did we believe this simplification might be reasonable? Penn ESE534 Fall 2016 -- DeHon

Application Needs What are common application datawidths? Penn ESE534 Fall 2016 -- DeHon

Preclass Energies? 8-bit, 16-bit, 32-bit, 64-bit 16-bit on 32-bit? Sources of inefficiency? 8-bit operations per 64-bit operation? 64-bit on 8-bit? Penn ESE534 Fall 2016 -- DeHon
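
To put rough numbers behind these questions, here is a hedged back-of-the-envelope sketch (my own illustration, not the preclass model): it assumes operation energy grows roughly linearly with datapath width and ignores fixed instruction-fetch and control overheads, so the ratios are only first-order.

```python
# Hedged back-of-the-envelope: assume ALU energy grows ~linearly with width.
# (Illustrative assumption; real datapaths also pay fixed control/fetch energy.)

E_PER_BIT = 1.0  # arbitrary energy unit per datapath bit per operation

def op_energy(datapath_width):
    """Energy of one operation on a datapath of the given width (linear model)."""
    return E_PER_BIT * datapath_width

# 16-bit computation on a 32-bit datapath: the whole 32-bit datapath switches,
# but only 16 bits are useful, so roughly half the energy is wasted.
print("16b on 32b efficiency ~", op_energy(16) / op_energy(32))   # ~0.5

# 8-bit operations per 64-bit operation: a 64-bit datapath has room for
# eight 8-bit lanes, so one 64-bit issue could ideally perform 8 such ops.
print("8b ops per 64b op:", 64 // 8)                              # 8

# 64-bit computation on an 8-bit datapath: ~8 sequential 8-bit ops (carry chained).
ops_needed = 64 // 8
print("64b on 8b: ~", ops_needed, "ops; energy ~", ops_needed * op_energy(8))
# Under the linear model the total datapath energy is comparable to one 64-bit op,
# but each of the 8 issues also pays instruction/control overhead not modeled here.
```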

Area Target Switch focus to area-efficiency Equivalently computational density Come back to energy-efficiency at end of lecture Penn ESE534 Fall 2016 -- DeHon

Efficiency/Yield Intuition (Area) What happens when Datapath is too wide? Datapath is too narrow? Instruction memory is too deep? Penn ESE534 Fall 2016 -- DeHon

Computing Device Composition Bit Processing elements Interconnect: space Interconnect: time Instruction Memory Tile together to build device Penn ESE534 Fall 2016 -- DeHon

Computing Device Penn ESE534 Fall 2016 -- DeHon

Computing Device Composition Bit Processing elements Interconnect: space Interconnect: time Instruction Memory Tile together to build device Penn ESE534 Fall 2016 -- DeHon

Relative Sizes
Bit Operator: 3-5K F2; Bit Operator Interconnect: 200K-250K F2; Instruction (w/ interconnect): 20K F2; Memory bit (SRAM): 250-500 F2
Speaker notes: Shown here are the ballpark sizes for each of the major components in these devices. Here, and elsewhere, I’ll be talking about feature sizes in normalized units of lambda, where lambda is half the minimum feature size in a VLSI process. This way I can talk about areas without tying my general results to any particular process. Lambda-squared, then, is a normalized measure of area. These numbers come from looking at the composition of VLSI layouts, both constructively and by decomposing existing artifacts. A bit operator, like an ALU bit, an FPGA LUT, or even a gate, along with a reasonable slice of interconnect, will take up 500K to 1M lambda-squared. The instruction which controls this compute and interconnect will be about one tenth to one twelfth that size. A single memory bit for holding data is fairly small, but these do add up when we collect a large number of them together. Shown here is the relative area for these components. Note that the operator itself is generally negligible in area compared to the interconnect. As we’ve noted, the instruction is an order of magnitude smaller than the interconnect, but a large number of these can easily dominate the interconnect. Similarly for memory bits: they’re small by themselves but can rapidly add up to consume significant amounts of area. So, now let’s look at how these areas impact what we can build in fixed area.
Penn ESE534 Fall 2016 -- DeHon
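
The component sizes above suggest a simple additive area model. The sketch below is my own minimal illustration of such a model, not the calibrated model used later in the lecture: a PE with datapath width w, instruction depth (contexts) c, and data-memory depth d is charged w bit operators plus their interconnect, c instructions shared across all w bits, and d*w SRAM bits. The constants and function names are assumptions drawn only from the ballpark figures on this slide.

```python
# Assumed additive area model (units: F^2), built from the ballpark component
# sizes on the "Relative Sizes" slide. Illustrative only.

A_BITOP   = 4_000     # one bit operator (3-5K F^2)
A_INTERCX = 225_000   # interconnect associated with one bit operator (200-250K F^2)
A_INSTR   = 20_000    # one instruction, including its interconnect configuration
A_MEMBIT  = 375       # one SRAM bit (250-500 F^2)

def pe_area(w, c, d):
    """Area of one PE: w-bit datapath, c instruction contexts, d data words."""
    datapath = w * (A_BITOP + A_INTERCX)   # bit operators plus their interconnect
    instrs   = c * A_INSTR                 # one instruction per context, shared by all w bits
    data_mem = d * w * A_MEMBIT            # d words of w bits each
    return datapath + instrs + data_mem

# FPGA-like point (w=1, c=1) vs. processor-like point (w=32, c=1024).
print("FPGA-like PE area:     ", pe_area(w=1, c=1, d=16))
print("Processor-like PE area:", pe_area(w=32, c=1024, d=1024))
```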

Model Area Penn ESE534 Fall 2016 -- DeHon

Architectures Fall in Space Penn ESE534 Fall 2016 -- DeHon

Calibrate Model (columns: Arch., w, d, c, k, Area in F2)
FPGA: 1, 4, 220K; Altera 8K: 230K; Xilinx 4K: 160K
SIMD: 1000, 2, 43K; Abacus: 48K
Processor: 32, 1024, 650K; MIPS-X: 530K
Penn ESE534 Fall 2016 -- DeHon

Peak Densities from Model Penn ESE534 Fall 2016 -- DeHon

Peak Densities from Model Only 2 of 4 parameters, a small slice of the space, yet roughly 100x density range across it Large difference in peak densities means a large design space! Penn ESE534 Fall 2016 -- DeHon
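
To see where a spread like this can come from, the short sweep below varies only width w and instruction depth c under the assumed additive model sketched earlier (constants repeated so the snippet stands alone; they are illustrative, not the lecture's calibration).

```python
# Sweep width w and instruction depth c under the assumed additive area model.
A_BITOP, A_INTERCX, A_INSTR = 4_000, 225_000, 20_000

def density(w, c):
    """Bit operators per F^2 for a PE of width w with c instruction contexts."""
    return w / (w * (A_BITOP + A_INTERCX) + c * A_INSTR)

points = {(w, c): density(w, c) for w in (1, 8, 64) for c in (1, 16, 1024)}
spread = max(points.values()) / min(points.values())
print("peak-density spread across this 2-parameter slice: ~%.0fx" % spread)
```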

Architecture Points Penn ESE534 Fall 2016 -- DeHon

Architectural parameters → Peak Densities Penn ESE534 Fall 2016 -- DeHon

Efficiency What do we really want to maximize? Not peak (“guaranteed not to exceed”) performance, but useful work per unit silicon [per Joule]: Yield Fraction / Area (or minimize Area / Yielded performance) Penn ESE534 Fall 2016 -- DeHon

Efficiency For comparison, look at efficiency relative to the ideal. Ideal = architecture exactly matched to application requirements Efficiency = A_ideal / A_arch, where A_arch = (Area per Op) / Yield Penn ESE534 Fall 2016 -- DeHon
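
As one concrete instance of this definition, here is a hedged sketch of the width-mismatch calculation developed on the next slides. It reuses the assumed additive area model from the earlier snippets and a simple yield argument: when the application width w_app is at most the architecture width w_arch, only w_app of the w_arch bits do useful work; when it is wider, the operation takes ceil(w_app/w_arch) issues. This shows the style of the calculation, not the lecture's exact formula or numbers.

```python
import math

# Assumed additive area model (illustrative constants, as in the earlier sketches).
A_BITOP, A_INTERCX, A_INSTR = 4_000, 225_000, 20_000

def pe_area(w, c=1):
    """Area (F^2) of a PE with datapath width w and c instruction contexts."""
    return w * (A_BITOP + A_INTERCX) + c * A_INSTR

def width_efficiency(w_app, w_arch):
    """A_ideal / A_arch for running w_app-bit operations on a w_arch-wide PE."""
    a_ideal = pe_area(w_app)             # architecture exactly matched to the app
    issues  = math.ceil(w_app / w_arch)  # sequentialization when the PE is too narrow
    a_arch  = issues * pe_area(w_arch)   # area-time actually consumed per operation
    return a_ideal / a_arch

for w_app in (1, 8, 16, 32, 64):
    print(f"w_app={w_app:2d}  on  8-bit PE: {width_efficiency(w_app, 8):.2f}"
          f"   on 64-bit PE: {width_efficiency(w_app, 64):.2f}")
```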

Width Mismatch Efficiency Calculation Penn ESE534 Fall 2016 -- DeHon

Efficiency: Width Mismatch 16K PEs Penn ESE534 Fall 2016 -- DeHon

Efficiency: Width Mismatch 16K PEs Penn ESE534 Fall 2016 -- DeHon

Efficiency: Width Mismatch 16K PEs Penn ESE534 Fall 2016 -- DeHon

Energy Target Switch to energy focus for rest of lecture Penn ESE534 Fall 2016 -- DeHon

Efficiency for Preclass Preclass 6 table Penn ESE534 Fall 2016 -- DeHon

Robust Point Can we find an architecture that is reasonably efficient regardless of application needs? Never less efficient than 50%? Penn ESE534 Fall 2016 -- DeHon

Robust Point What is Energy Robust Point for preclass model? Penn ESE534 Fall 2016 -- DeHon
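
The preclass energy model itself is not reproduced in this transcript, so the sketch below only illustrates the search procedure for a robust point, using the assumed area model from the earlier snippets: pick the architecture width whose worst-case efficiency over a range of application widths is largest. The application-width range and all constants are assumptions.

```python
import math

# Same assumed additive area model as the earlier sketches (illustrative only).
A_BITOP, A_INTERCX, A_INSTR = 4_000, 225_000, 20_000

def pe_area(w, c=1):
    return w * (A_BITOP + A_INTERCX) + c * A_INSTR

def width_efficiency(w_app, w_arch):
    issues = math.ceil(w_app / w_arch)
    return pe_area(w_app) / (issues * pe_area(w_arch))

APP_WIDTHS = [1, 2, 4, 8, 16, 32, 64]   # assumed range of application datawidths

def worst_case(w_arch):
    """Worst efficiency this architecture width sees over all application widths."""
    return min(width_efficiency(w, w_arch) for w in APP_WIDTHS)

# Robust point: the architecture width with the best worst case.
robust = max(APP_WIDTHS, key=worst_case)
print("robust width:", robust, " worst-case efficiency: %.2f" % worst_case(robust))

# Note: under this area-only model, narrow widths come out robust because extra
# issues are charged only as reused area-time; an energy model that charges
# per-operation instruction overhead (as in the preclass) can shift the robust
# point to a wider datapath.
```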

Big Ideas [MSB Ideas] Applications typically have structure Exploit this structure to reduce resource requirements Architecture is about understanding and exploiting structure and costs to reduce requirements Penn ESE534 Fall 2016 -- DeHon

Admin Drop date today HW4 graded HW5.1 Due Wed. Reading for Wed. Finish chapter Penn ESE534 Fall 2016 -- DeHon