Caltech CS184 Winter2003 -- DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 7: January 24, 2003 Instruction Space.

Slides:



Advertisements
Similar presentations
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day6: October 11, 2000 Instruction Taxonomy VLSI Scaling.
Advertisements

Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day17: November 20, 2000 Time Multiplexing.
Reconfigurable Computing: What, Why, and Implications for Design Automation André DeHon and John Wawrzynek June 23, 1999 BRASS Project University of California.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day20: November 29, 2000 Review.
CS294-6 Reconfigurable Computing Day 5 September 8, 1998 Comparing Computing Devices.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 21: April 2, 2007 Time Multiplexing.
CS294-6 Reconfigurable Computing Day 6 September 10, 1998 Comparing Computing Devices.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 15: March 12, 2007 Interconnect 3: Richness.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day10: October 25, 2000 Computing Elements 2: Cascades, ALUs,
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day8: October 18, 2000 Computing Elements 1: LUTs.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 9: February 7, 2007 Instruction Space Modeling.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day4: October 4, 2000 Memories, ALUs, and Virtualization.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 11: February 14, 2007 Compute 1: LUTs.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 4: January 22, 2007 Memories.
Trends toward Spatial Computing Architectures Dr. André DeHon BRASS Project University of California at Berkeley.
CS294-6 Reconfigurable Computing Day 3 September 1, 1998 Requirements for Computing Devices.
CS294-6 Reconfigurable Computing Day 19 October 27, 1998 Multicontext.
Penn ESE Spring DeHon 1 FUTURE Timing seemed good However, only student to give feedback marked confusing (2 of 5 on clarity) and too fast.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 5: January 24, 2007 ALUs, Virtualization…
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 13: February 4, 2005 Interconnect 1: Requirements.
February 12, 1998 Aman Sareen DPGA-Coupled Microprocessors Commodity IC’s for the Early 21st Century by Aman Sareen School of Electrical Engineering and.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 11: March 3, 2014 Instruction Space Modeling 1.
Caltech CS184b Winter DeHon 1 CS184b: Computer Architecture [Single Threaded Architecture: abstractions, quantification, and optimizations] Day3:
Caltech CS184b Winter DeHon 1 CS184b: Computer Architecture [Single Threaded Architecture: abstractions, quantification, and optimizations] Day12:
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 9: February 24, 2014 Operator Sharing, Virtualization, Programmable Architectures.
CBSSS 2002: DeHon Costs André DeHon Wednesday, June 19, 2002.
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR FPGA Fabric n Elements of an FPGA fabric –Logic element –Placement –Wiring –I/O.
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR Topics n FPGA fabric architecture concepts.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 7: February 6, 2012 Memories.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 18: February 18, 2005 Interconnect 6: MoT.
CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 13: February 20, 2002 Routing 1.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day18: November 22, 2000 Control.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 16: February 14, 2003 Interconnect 6: MoT.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 5: February 1, 2010 Memories.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day7: October 16, 2000 Instruction Space (computing landscape)
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 20: March 3, 2003 Control.
FPGA-Based System Design: Chapter 1 Copyright  2004 Prentice Hall PTR Moore’s Law n Gordon Moore: co-founder of Intel. n Predicted that number of transistors.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day16: November 15, 2000 Retiming Structures.
CALTECH cs184c Spring DeHon CS184c: Computer Architecture [Parallel and Multithreaded] Day 11: May10, 2001 Data Parallel (SIMD, SPMD, Vector)
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 13: February 6, 2003 Interconnect 3: Richness.
Penn ESE370 Fall DeHon 1 ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Day 28: November 7, 2014 Memory Overview.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 8: January 27, 2003 Empirical Cost Comparisons.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 21: April 4, 2012 Lossless Data Compression.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 5: January 17, 2003 What is required for Computation?
Penn ESE370 Fall DeHon 1 ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Day 28: November 8, 2013 Memory Overview.
Caltech CS184 Winter DeHon CS184a: Computer Architecture (Structure and Organization) Day 4: January 15, 2003 Memories, ALUs, Virtualization.
Lecture 17: Dynamic Reconfiguration I November 10, 2004 ECE 697F Reconfigurable Computing Lecture 17 Dynamic Reconfiguration I Acknowledgement: Andre DeHon.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 11: January 31, 2005 Compute 1: LUTs.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 11: February 20, 2012 Instruction Space Modeling.
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR Topics n FPGA fabric architecture concepts.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 12: February 5, 2003 Interconnect 2: Wiring Requirements.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 8: February 19, 2014 Memories.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 22: April 16, 2014 Time Multiplexing.
Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 ECE 697F Reconfigurable Computing Lecture 4 Contrasting Processors: Fixed.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 10: January 28, 2005 Empirical Comparisons.
ESE534: Computer Organization
ESE534: Computer Organization
ESE532: System-on-a-Chip Architecture
CS184a: Computer Architecture (Structure and Organization)
CS137: Electronic Design Automation
ESE534: Computer Organization
ESE534: Computer Organization
ESE534: Computer Organization
ESE534: Computer Organization
CS184a: Computer Architecture (Structures and Organization)
CS184a: Computer Architecture (Structure and Organization)
ESE534: Computer Organization
ESE534: Computer Organization
ESE534: Computer Organization
ESE534: Computer Organization
Presentation transcript:

Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 7: January 24, 2003 Instruction Space

Caltech CS184 Winter DeHon 2 Previously Computing Requirements –Compute –Interconnect space time –Instructions VLSI Scaling

Caltech CS184 Winter DeHon 3 Today Instructions –Requirements –Taxonomy –Model Architecture implied costs gross application characteristics

Caltech CS184 Winter DeHon 4 Instruction Taxonomy

Caltech CS184 Winter DeHon 5 Instructions Distinguishing feature of programmable architectures? –Instructions -- bits which tell the device how to behave Compute net0 add mem slot#6

Caltech CS184 Winter DeHon 6 Focus on Instructions Instruction organization has a large effect on: –size or compactness of an architecture –realm of efficient utilization for an architecture

Caltech CS184 Winter DeHon 7 Terminology Primitive Instruction (pinst) –Collection of bits which tell a single bit- processing element what to do –Includes: select compute operation input sources in space –(interconnect) input sources in time –(retiming) Compute net0 add mem slot#6

Caltech CS184 Winter DeHon 8 Computational Array Model Collection of computing elements –compute operator –local storage/retiming Interconnect Instruction

Caltech CS184 Winter DeHon 9 “Ideal” Instruction Control Issue a new instruction to every computational bit operator on every cycle

Caltech CS184 Winter DeHon 10 “Ideal” Instruction Distribution Why don’t we do this?

Caltech CS184 Winter DeHon 11 “Ideal” Instruction Distribution Problem: Instruction bandwidth (and storage area) quickly dominates everything else –Compute Block ~ 1M   x  –Instruction ~ 64 bits –Wire Pitch ~  –Memory bit ~  

Caltech CS184 Winter DeHon 12 Instruction Distribution 64x8 =512 Two instructions in 1024

Caltech CS184 Winter DeHon 13 Instruction Distribution Distribute from both sides = 2x

Caltech CS184 Winter DeHon 14 Instruction Distribution Distribute X and Y = 2x

Caltech CS184 Winter DeHon 15 Instruction Distribution Room to distribute 2 instructions across PE per metal layer (1024 = 2  8  64) Feed top and bottom (left and right) = 2  Two complete metal layers = 2   8 instructions / PE Side

Caltech CS184 Winter DeHon 16 Instruction Distribution Maximum of 8 instructions per PE side Saturate wire channels at 8  N = N  at 64 PE –beyond this: instruction distribution dominates area Instruction consumption goes with area Instruction bandwidth goes with perimeter

Caltech CS184 Winter DeHon 17 Instruction Distribution Beyond 64 PE, instruction bandwidth dictates PE size PE area =16K   N = N (64  8 )  PE area  4  N As we build larger arrays  processing elements become less dense

Caltech CS184 Winter DeHon 18 Instruction Memory Requirements Idea: put instruction memory in array Problem: Instruction memory can quickly dominate area, too –Memory Area = 64  1.2K  /instruction –PE area = 1M  + (Instructions)  80K 

Caltech CS184 Winter DeHon 19 Instruction Pragmatics Instruction requirements could dominate array size. Standard architecture trick: –Look for structure to exploit in “typical computations”

Caltech CS184 Winter DeHon 20 Typical Structure? What structure do we usually expect?

Caltech CS184 Winter DeHon 21 Two Extremes SIMD Array (microprocessors) –Instruction/cycle –share instruction across array of PEs –uniform operation in space –operation variance in time

Caltech CS184 Winter DeHon 22 Two Extremes SIMD Array (microprocessors) –Instruction/cycle –share instruction across array of PEs –uniform operation in space –operation variance in time FPGA –Instruction/PE –assume temporal locality of instructions (same) –operation variance in space –uniform operations in time

Caltech CS184 Winter DeHon 23 Hybrids VLIW (SuperScalar) –Few pinsts/cycle –Share instruction across w bits DPGA –Small instruction store / PE

Caltech CS184 Winter DeHon 24 Architecture Instruction Taxonomy

Caltech CS184 Winter DeHon 25 Instruction Message Architectures fall out of: –general model too expensive –structure exists in common problems –exploit structure to reduce resource requirements Architectures can be viewed in a unified design space

Caltech CS184 Winter DeHon 26 Quotes If it can’t be expressed in figures, it is not science; it is opinion. -- Lazarus Long

Caltech CS184 Winter DeHon 27 Modeling Why do we model?

Caltech CS184 Winter DeHon 28 Motivation Need to understand –How costly (big) is a solution –How compare to alternatives –Cost and benefit of flexbility

Caltech CS184 Winter DeHon 29 What we really want: Complete implementation of our application For each architectural alternatives –In same implementation technology –w/ multiple area-time points

Caltech CS184 Winter DeHon 30 Reality Seldom get it packaged that nicely –much work to do so –technology keeps moving Deal with –estimation from components –technology differences –few area-time points

Caltech CS184 Winter DeHon 31 Modeling Instruction Effects Restrictions from “ideal” save area Restriction from “ideal” limits usability (yield) of PE Want to understand effects –area model –utilization/yield model

Caltech CS184 Winter DeHon 32 Efficiency/Yield Intuition What happens when –Datapath is too wide? –Datapath is too narrow? –Instruction memory is too deep? –Instruction memory is too shallow?

Caltech CS184 Winter DeHon 33 Computing Device Composition –Bit Processing elements –Interconnect: space –Interconnect: time –Instruction Memory Tile together to build device

Caltech CS184 Winter DeHon 34 Relative Sizes Bit Operator 10-20K 2 Bit Operator Interconnect 500K-1M 2 Instruction (w/ interconnect) 80K 2 Memory bit (SRAM) 1-2K 2

Caltech CS184 Winter DeHon 35 Model Area

Caltech CS184 Winter DeHon 36 Calibrate Model

Caltech CS184 Winter DeHon 37 Peak Densities from Model Only 2 of 4 parameters –small slice of space –100  density across Large difference in peak densities –large design space!

Caltech CS184 Winter DeHon 38 Efficiency What do we want to maximize? –Useful work per unit silicon –(not potential/peak work) Yield Fraction / Area (or minimize (Area/Yield) )

Caltech CS184 Winter DeHon 39 Efficiency For comparison, look at relative efficiency to ideal. Ideal = architecture exactly matched to application requirements Efficiency = A ideal /A arch A arch = Area Op/Yield

Caltech CS184 Winter DeHon 40 Efficiency Calculation

Caltech CS184 Winter DeHon 41 Efficiency: Width Mismatch c=1, 16K PEs

Caltech CS184 Winter DeHon 42 Path Length How many primitive-operator delays before can perform next operation? –Reuse the resource

Caltech CS184 Winter DeHon 43 Reuse Pipeline and reuse at primitive-operator delay level. Path Length: How much sequentialization Is allowed (required)? How many times can I reuse each primitive operator?

Caltech CS184 Winter DeHon 44 Context Depth

Caltech CS184 Winter DeHon 45 Efficiency with fixed Width w=1, 16K PEs Path Length Context Depth

Caltech CS184 Winter DeHon 46 Ideal Efficiency (different model) Two resources here: active processing elements operation description/state Applications need in different proportions. Application Requirement

Caltech CS184 Winter DeHon 47 Robust Point depend on Width w=1 w=8 w=64

Caltech CS184 Winter DeHon 48 Processors and FPGAs FPGA c=d=1, w=1, k=4 “Processor” c=d=1024, w=64, k=2

Caltech CS184 Winter DeHon 49 Intermediate Architecture w=8 c=64 16K PEs Hard to be robust across entire space…

Caltech CS184 Winter DeHon 50 Caveats Model abstracts away many details which are important –interconnect (day ) –control (day 20) –specialized functional units (next time) Applications are a heterogeneous mix of characteristics

Caltech CS184 Winter DeHon 51 Modeling Message Architecture space is huge Easy to be very inefficient Hard to pick one point robust across entire space Why we have so many architectures?

Caltech CS184 Winter DeHon 52 General Message Parameterize architectures Look at continuum –costs –benefits Often have competing effects –leads to maxima/minima

Caltech CS184 Winter DeHon 53 Big Ideas [MSB Ideas] Instruction resources can be significant –dominant/limiting resource Applications typically have structure Exploit this structure to reduce resource requirements Architecture is about understanding and exploiting structure and costs to reduce requirements

Caltech CS184 Winter DeHon 54 Big Ideas [MSB Ideas] Instruction organization induces a design space (taxonomy) for programmable architectures Arch. structure and application requirements mismatch  inefficiencies Model  visualize efficiency trends Architecture space is huge –can be very inefficient –need to learn to navigate