ESE534: Computer Organization

Slides:

Advertisements

Similar presentations

Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 14: March 19, 2014 Compute 2: Cascades, ALUs, PLAs.

Advertisements

Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day6: October 11, 2000 Instruction Taxonomy VLSI Scaling.

Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day17: November 20, 2000 Time Multiplexing.

Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 21: April 2, 2007 Time Multiplexing.

CS294-6 Reconfigurable Computing Day 6 September 10, 1998 Comparing Computing Devices.

Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 15: March 12, 2007 Interconnect 3: Richness.

Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day10: October 25, 2000 Computing Elements 2: Cascades, ALUs,

Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day8: October 18, 2000 Computing Elements 1: LUTs.

Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 9: February 7, 2007 Instruction Space Modeling.

Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 26: April 18, 2007 Et Cetera…

Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day4: October 4, 2000 Memories, ALUs, and Virtualization.

Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 11: February 14, 2007 Compute 1: LUTs.

Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 4: January 22, 2007 Memories.

Trends toward Spatial Computing Architectures Dr. André DeHon BRASS Project University of California at Berkeley.

Penn ESE Spring DeHon 1 FUTURE Timing seemed good However, only student to give feedback marked confusing (2 of 5 on clarity) and too fast.

Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 5: January 24, 2007 ALUs, Virtualization…

Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 11: March 3, 2014 Instruction Space Modeling 1.

Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 9: February 24, 2014 Operator Sharing, Virtualization, Programmable Architectures.

CBSSS 2002: DeHon Costs André DeHon Wednesday, June 19, 2002.

Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 7: February 6, 2012 Memories.

Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 7: January 24, 2003 Instruction Space.

Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 5: February 1, 2010 Memories.

Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day7: October 16, 2000 Instruction Space (computing landscape)

Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 8: January 27, 2003 Empirical Cost Comparisons.

Caltech CS184 Winter DeHon CS184a: Computer Architecture (Structure and Organization) Day 4: January 15, 2003 Memories, ALUs, Virtualization.

Lecture 17: Dynamic Reconfiguration I November 10, 2004 ECE 697F Reconfigurable Computing Lecture 17 Dynamic Reconfiguration I Acknowledgement: Andre DeHon.

Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 11: January 31, 2005 Compute 1: LUTs.

Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 11: February 20, 2012 Instruction Space Modeling.

Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 8: February 19, 2014 Memories.

Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 22: April 16, 2014 Time Multiplexing.

Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 10: January 28, 2005 Empirical Comparisons.

ESE532: System-on-a-Chip Architecture

ESE534: Computer Organization

ESE534: Computer Organization

ESE534: Computer Organization

ESE534: Computer Organization

ESE532: System-on-a-Chip Architecture

ESE532: System-on-a-Chip Architecture

ESE534: Computer Organization

ESE534: Computer Organization

ESE532: System-on-a-Chip Architecture

ESE532: System-on-a-Chip Architecture

CS184a: Computer Architecture (Structure and Organization)

ESE534 Computer Organization

ESE532: System-on-a-Chip Architecture

ESE532: System-on-a-Chip Architecture

ESE532: System-on-a-Chip Architecture

CS184a: Computer Architecture (Structure and Organization)

Day 1: August 28, 2013 Introduction and Overview

ESE534: Computer Organization

ESE534: Computer Organization

ESE534: Computer Organization

ESE534: Computer Organization

ESE534: Computer Organization

ESE534: Computer Organization

ESE534: Computer Organization

ESE534: Computer Organization

ESE534: Computer Organization

ESE534: Computer Organization

CS184a: Computer Architecture (Structure and Organization)

ESE534: Computer Organization

ESE535: Electronic Design Automation

ESE534: Computer Organization

ESE535: Electronic Design Automation

ESE534: Computer Organization

ESE532: System-on-a-Chip Architecture

ESE532: System-on-a-Chip Architecture

ESE532: System-on-a-Chip Architecture

Presentation transcript:

ESE534: Computer Organization Day 12: March 5, 2014 Instruction Space Modeling 2 Penn ESE534 Spring 2014 -- DeHon

Previously Instruction Space Modeling {Area, Energy} from Architecture Application Structure Mismatch Net area, energy Efficiency Example: Wtask vs. Warch Penn ESE534 Spring 2014 -- DeHon

Today Application vs. Architecture Mismatch Net area, energy Efficiency Path Length vs. Context Depth Impact of multiple dimensions Penn ESE534 Spring 2014 -- DeHon

Context Depth Penn ESE534 Spring 2014 -- DeHon

Two Application Drivers Latency goal Throughput goal E.g. operate in real time Penn ESE534 Spring 2014 -- DeHon

Throughput Goals Examples of application throughput goals we might have? Penn ESE534 Spring 2014 -- DeHon

Throughput Goal Have 1GHz datapath (1 ns operator) Want to compute one result every 10ns How many operations can it perform per bit operator in the 10ns period? Throughput-driven Path Length Path Length: How much sequentialization is allowed? Penn ESE534 Spring 2014 -- DeHon

Latency Path Length How many primitive-operator delays before can perform next operation? Reuse the resource Penn ESE534 Spring 2014 -- DeHon

Latency Goal Reuse How many times can I reuse each primitive operator without Increasing latency? Path Length: How much sequentialization is required? Penn ESE534 Spring 2014 -- DeHon

Critical Path Single cycle multiply and add What is critical path? si=ai*qi-1+(1-ai)*xi ti=bi*ri-1+(1-bi)*xi qi=qi-1+si ri=ri-1+ti What is critical path? Penn ESE534 Spring 2014 -- DeHon

Application Needs Common critical paths? Penn ESE534 Spring 2014 -- DeHon

Examples Application Wapp Lcritpath Lpath Notes Conway LIFE 1 Run as fast as possible ECC 1-10 100 100ns memory interface Video 8 1-6 24 1GHz for 1024x1024x30 frames/s Audio 16 20,000 44KHz for 1GHz FDTD 35 1-5 Penn ESE534 Spring 2014 -- DeHon

Model Area Penn ESE534 Spring 2014 -- DeHon

Relative Sizes Bit Operator 4KF2 Bit Operator Interconnect 240KF2 Instruction (w/ interconnect) 20KF2 Memory bit (SRAM) 300F2 Shown here are the ballpark sizes for each of the major components in these devices. Here, and elsewhere, I’ll be talking about feature sizes in normalized units of lambda, where lambda is half the minimum feature size in a VLSI process. This way I can talk about areas without tying my general results to any particular process. Lambda-squared, then is a normalized measure of area. These numbers are come from looking at the composition in VLSI layouts both constructively and decompositionally looking at existing artifacts. A bit operator, like an ALU bit or an FPGA LUT or even a gate, along with a reasonable slice of interconnect will take up 500K to 1M lambda-squared. The instruction which controls this compute and interconnect will be about one 10th to one 12th that size. A single memory bit for holding data will be fairly small, but these do add up when we collect a large number of them together. Shown here is the relative area for these components. Note that the operator itself is generally negligible in area compared to the interconnect. As we’ve noted the instruction is an order of magnitude smaller than the interconnect, but a large number of these can easily dominate the interconnect. Similarly for memory bits, they’re small by themselves but can rapidly add up to consume significant amounts of area. So, now let’s look at how these areas impact what we can build in fixed area. Penn ESE534 Spring 2014 -- DeHon

Context (Instruction) Depth Penn ESE534 Spring 2014 -- DeHon

Efficiency with fixed Width Path Length Context Depth w=1, 16K PEs Penn ESE534 Spring 2014 -- DeHon

Robust Point for C Abitelm = 240K + C*30K Given compute model: HW5.2c What is area efficient robust point for C? HW5.2c “energy competitive” = robust point Penn ESE534 Spring 2014 -- DeHon

Ideal Efficiency (different model) Two resources here: active processing elements operation description/state Applications need in different proportions. Application Requirement …make sure to define lpath Penn ESE534 Spring 2014 -- DeHon

Technology Change Abitelm = 240K + C*30K What happens to the robust oint if we have a technology change that reduces the cost of memory cells? E.g. from 300F2 SRAM cells to 100F2 DRAM cells (logic process)? Penn ESE534 Spring 2014 -- DeHon

Multiple Parameters Penn ESE534 Spring 2014 -- DeHon

C Robust Point depends on Width Penn ESE534 Spring 2014 -- DeHon

Model Area Penn ESE534 Spring 2014 -- DeHon

Robust Point for C given W Abitelm = 240K + (C/W)*30K Given compute model: What is area efficient robust point for C: When W=8? When W=64? Penn ESE534 Spring 2014 -- DeHon

Robust Point depend on Width Penn ESE534 Spring 2014 -- DeHon

Intermediate Architecture w=8 c=64 16K PEs Hard to be robust across entire space… Penn ESE534 Spring 2014 -- DeHon

Processors and FPGAs (architecture vs. two application axes) c=d=1024, w=64, k=2 FPGA c=d=1, w=1, k=4 Penn ESE534 Spring 2014 -- DeHon

Caveats Model abstracts away many details that are important interconnect (day 16—20, 25) control (day 23) specialized functional units (day 15) Applications are a heterogeneous mix of characteristics Penn ESE534 Spring 2014 -- DeHon

Modeling Message Architecture space is huge Easy to be very inefficient Valuable to understand to pick right point Hard to pick one point robust across entire space Why we have so many architectures? Penn ESE534 Spring 2014 -- DeHon

General Message Parameterize architectures Look at continuum costs benefits Often have competing effects leads to maxima/minima Penn ESE534 Spring 2014 -- DeHon

Big Ideas [MSB Ideas] Instruction organization induces a design space (taxonomy) for programmable architectures Arch. structure and application requirements mismatch  inefficiencies Model  visualize efficiency trends Architecture space is huge can be very inefficient need to learn to navigate Penn ESE534 Spring 2014 -- DeHon

Admin Homework HW5.1 Due Today Reading Spring Break Marked 4.1 TODO 4.2, 4.3 HW5.1 Due Today Reading Spring Break Penn ESE534 Spring 2014 -- DeHon