Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Slides:



Advertisements
Similar presentations
Network II.5 simulator ..
Advertisements

Multi-Level Caches Vittorio Zaccaria. Preview What you have seen: Data organization, Associativity, Cache size Policies -- how to manage the data once.
Dynamic Topology Optimization for Supercomputer Interconnection Networks Layer-1 (L1) switch –Dumb switch, Electronic “patch panel” –Establishes hard links.
COS 461 Fall 1997 Routing COS 461 Fall 1997 Typical Structure.
Router Architecture : Building high-performance routers Ian Pratt
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day17: November 20, 2000 Time Multiplexing.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 11: March 4, 2008 Placement (Intro, Constructive)
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 21: April 2, 2007 Time Multiplexing.
Packet-Switched vs. Time-Multiplexed FPGA Overlay Networks Kapre et. al RC Reading Group – 3/29/2006 Presenter: Ilya Tabakh.
1 Router Construction II Outline Network Processors Adding Extensions Scheduling Cycles.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 15: March 12, 2007 Interconnect 3: Richness.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 24: April 21, 2010 Interconnect 6: Dynamically Switched Interconnect.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 9: February 7, 2007 Instruction Space Modeling.
Penn ESE Fall DeHon 1 ESE (ESE534): Computer Organization Day 19: March 26, 2007 Retime 1: Transformations.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 13: February 26, 2007 Interconnect 1: Requirements.
CMPE 80N - Introduction to Networks and the Internet 1 CMPE 80N Winter 2004 Lecture 13 Introduction to Networks and the Internet.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 11: February 14, 2007 Compute 1: LUTs.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 19: April 9, 2008 Routing 1.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 22: April 4, 2007 Interconnect 7: Time Multiplexed Interconnect.
1 Lecture 24: Interconnection Networks Topics: communication latency, centralized and decentralized switches (Sections 8.1 – 8.5)
Dynamic NoC. 2 Limitations of Fixed NoC Communication NoC for reconfigurable devices:  NOC: a viable infrastructure for communication among task dynamically.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 8: February 11, 2009 Dataflow.
Issues in System-Level Direct Networks Jason D. Bakos.
CS294-6 Reconfigurable Computing Day 19 October 27, 1998 Multicontext.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 16: March 14, 2007 Interconnect 4: Switching.
1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,
Penn ESE Spring DeHon 1 FUTURE Timing seemed good However, only student to give feedback marked confusing (2 of 5 on clarity) and too fast.
Switching, routing, and flow control in interconnection networks.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 13: February 4, 2005 Interconnect 1: Requirements.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 15: February 12, 2003 Interconnect 5: Meshes.
ESE Spring DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes.
High-Level Interconnect Architectures for FPGAs An investigation into network-based interconnect systems for existing and future FPGA architectures Nick.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 9: February 24, 2014 Operator Sharing, Virtualization, Programmable Architectures.
High-Level Interconnect Architectures for FPGAs Nick Barrow-Williams.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 2: January 21, 2015 Heterogeneous Multicontext Computing Array Work Preclass.
Switching breaks up large collision domains into smaller ones Collision domain is a network segment with two or more devices sharing the same Introduction.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 7: February 6, 2012 Memories.
Computer Networks with Internet Technology William Stallings
Anshul Kumar, CSE IITD CSL718 : Multiprocessors Interconnection Mechanisms Performance Models 20 th April, 2006.
Anshul Kumar, CSE IITD ECE729 : Advanced Computer Architecture Lecture 27, 28: Interconnection Mechanisms In Multiprocessors 29 th, 31 st March, 2010.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 14: February 10, 2003 Interconnect 4: Switching.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 18: April 2, 2014 Interconnect 4: Switching.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 16: February 14, 2005 Interconnect 4: Switching.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 5: February 1, 2010 Memories.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 21: April 4, 2012 Lossless Data Compression.
Introduction Computer networks: – definition – computer networks from the perspectives of users and designers – Evaluation criteria – Some concepts: –
1 Switching and Forwarding Sections Connecting More Than Two Hosts Multi-access link: Ethernet, wireless –Single physical link, shared by multiple.
Penn ESE534 Spring DeHon 1 ESE534 Computer Organization Day 9: February 13, 2012 Interconnect Introduction.
Lecture 17: Dynamic Reconfiguration I November 10, 2004 ECE 697F Reconfigurable Computing Lecture 17 Dynamic Reconfiguration I Acknowledgement: Andre DeHon.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 15: March 13, 2013 High Level Synthesis II Dataflow Graph Sharing.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 11: February 20, 2012 Instruction Space Modeling.
ESE Spring DeHon 1 ESE534: Computer Organization Day 20: April 9, 2014 Interconnect 6: Direct Drive, MoT.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 8: February 19, 2014 Memories.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 22: April 16, 2014 Time Multiplexing.
CALTECH CS137 Winter DeHon 1 CS137: Electronic Design Automation Day 8: January 27, 2006 Cellular Placement.
ESE534: Computer Organization
ESE532: System-on-a-Chip Architecture
ESE534: Computer Organization
ESE532: System-on-a-Chip Architecture
ESE532: System-on-a-Chip Architecture
ESE532: System-on-a-Chip Architecture
ESE534: Computer Organization
Switching, routing, and flow control in interconnection networks
ESE534: Computer Organization
ESE534: Computer Organization
ESE534: Computer Organization
ESE534: Computer Organization
ESE534: Computer Organization
CS184a: Computer Architecture (Structure and Organization)
Presentation transcript:

Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect

Penn ESE534 Spring DeHon 2 Previously Configured Interconnect –Lock down route between source and sink Multicontext Interconnect –Switched cycle-by-cycle from Instr. Mem. Interconnect Topology Data-dependent control for computation

Penn ESE534 Spring DeHon 3 Today Data-dependent control/sharing of interconnect Dynamic Sharing (Packet Switching) –Motivation –Formulation –Design –Assessment

Penn ESE534 Spring DeHon 4 Motivation

Consider: Preclass 2 How might this be inefficient with configured interconnect? Penn ESE534 Spring DeHon 5

Consider: Preclass 2 How might this be more efficient? What might be inefficient about it? Penn ESE534 Spring DeHon 6

Consider: Preclass 2 How might this be more efficient? What undesirable behavior might this have? Penn ESE534 Spring DeHon 7

8 Opportunity Interconnect major area, energy, delay Instantaneous communication << potential communication Question: can we reduce interconnect requirements by only routing instantaneous communication needs?

Examples Data dependent, unpredictable results –Search filter, compression Slowly changing values –Surveillance, converging values Penn ESE534 Spring DeHon 9

10 Formulation

Penn ESE534 Spring DeHon 11 Dynamic Sharing Don’t reserve resources –Hold a resource for a single source/sink pair –Allocate cycles for a particular route Request as needed Share amongst potential users

Penn ESE534 Spring DeHon 12 Bus Example Time Multiplexed version –Allocate time slot on bus for each communication Dynamic version –Arbitrate for bus on each cycle

Penn ESE534 Spring DeHon 13 Bus Example Time Multiplexed version –Allocate time slot on bus for each communication Dynamic version –Arbitrate for bus on each cycle

Penn ESE534 Spring DeHon 14 Dynamic Bus Example 4 PEs –Potentially each send out result on change –Value only changes with probability 0.1 on each “cycle” –TM: Slot for each PE0 PE1 PE2 PE3 –Dynamic: arbitrate based on need None PE0 none PE1 PE1 none PE3 …. TM either runs slower (4 cycles/compute) or needs 4 busses Dynamic single bus seldom bottleneck –Probability two need bus on a cycle?

Penn ESE534 Spring DeHon 15 Network Example Time Multiplexed –As assumed so far in class –Memory says how to set switches on each cycle Dynamic –Attach address or route designation –Switches forward data toward destination

Penn ESE534 Spring DeHon 16 Penn ESE534 Spring DeHon 16 Butterfly Log stages Resolve one bit per stage bit3bit2bit1bit0

Penn ESE534 Spring DeHon 17 Penn ESE534 Spring DeHon 17 Tree Route Downpath resolves one bit per stage

Penn ESE534 Spring DeHon 18 ESE Spring DeHon 18 Mesh Route Destination (dx,dy) Current location (cx,cy) Route up/down left/right based on (dx-cx,dy-cy)

Prospect Use less interconnect get higher throughput across fixed interconnect By –Dynamically sharing limited interconnect Bottleneck on instantaneous data needs –Not worst-case data needs Penn ESE534 Spring DeHon 19

Penn ESE534 Spring DeHon 20 Design

Penn ESE534 Spring DeHon 21 Ring Ring: 1D Mesh

Penn ESE534 Spring DeHon 22 How like a bus?

Penn ESE534 Spring DeHon 23 Advantage over bus?

Penn ESE534 Spring DeHon 24 Preclass 1 Cycles from simulation? Max delay of single link? Best case possible?

Penn ESE534 Spring DeHon 25 Issue 1: Local online vs Global Offline Dynamic must make local decision –Often lower quality than offline, global decision Quality of decision will reduce potential benefits of reducing instantaneous requiements

Penn ESE534 Spring DeHon 26 Experiment Send-on-Change for spreading activation task Run on Linear-Population Tree network Same topology both cases Fixed size graph Vary physical tree size –Smaller trees  more serial Many “messages” local to cluster, no routing –Large trees  more parallel

Penn ESE534 Spring DeHon 27 Spreading Activation Start with few nodes active Propagate changes along edges

Penn ESE534 Spring DeHon 28 Butterfly Fat Trees (BFTs) Familiar from Day 17, 18, HW8 Similar phenomena with other topologies Directional version

Penn ESE534 Spring DeHon 29 Iso-PEs [Kapre et al., FCCM 200]

Penn ESE534 Spring DeHon 30 Iso-PEs PS vs. TM ratio at same PE counts –Small number of PEs little difference Dominated by serialization (self-messages) Not stressing the network –Larger PE counts TM ~60% better TM uses global congestion knowledge while scheduling

Penn ESE534 Spring DeHon 31 Preclass 3 Logic for muxsel ? Logic for Arouted? Logic for Brouted? Gates? Gate Delay?

Penn ESE534 Spring DeHon 32 Issue 2: Switch Complexity Requires area/delay/energy to make decisions Also requires storage area Avoids instruction memory High cost of switch complexity may diminish benefit.

Penn ESE534 Spring DeHon 33 Mesh Congest Preclass 2 ring similar to slice through mesh A,B – corner turns May not be able to route on a cycle

Penn ESE534 Spring DeHon 34 Mesh Congest What happens when inputs from 2 (or 3) sides want to travel out same output?

Penn ESE534 Spring DeHon 35 FIFO Buffering Store inputs that must wait until path available –Typically store in FIFO buffer How big do we make the FIFO?

Penn ESE534 Spring DeHon 36 PS Hardware Primitives T-Switch π -Switch Red arrows backpressure when FIFOs are full

Penn ESE534 Spring DeHon 37 FIFO Buffer Full? What happens when FIFO fills up?

Penn ESE534 Spring DeHon 38 FIFO Buffer Full? What happens when FIFO fills up?

Penn ESE534 Spring DeHon 39 FIFO Buffer Full? What happens when FIFO fills up?

Penn ESE534 Spring DeHon 40 FIFO Buffer Full? What happens when FIFO fills up? Maybe backup network Prevent other routes from using –If not careful, can create deadlock

Penn ESE534 Spring DeHon 41 TM Hardware Primitives

Penn ESE534 Spring DeHon 42 Area in PS/TM Switches Packet (32 wide, 16 deep) –3split + 3 merge –Split ctrl, 33 fifo buffer –Merge ctrl, 66 fifo buffer –Total: 244 Time Multiplexed (16b) –9+(contexts/16) –E.g. 41 at 1024 contexts Both use SRL16s for memory (16b/4-LUT) Area in FPGA slice counts

Penn ESE534 Spring DeHon 43 Area Effects Based on FPGA overlay model i.e. build PS or TM on top of FPGA

Penn ESE534 Spring DeHon 44 Preclass 4 Gates in static design: 8 Gates in dynamic design: 8+? Which energy best? –P d =1 –P d =0.1 –P d =0.5

Penn ESE534 Spring DeHon 45 PS vs TM Switches PS switches can be larger/slower/more energy Larger: –May compete with PEs for area on limited capacity chip

Penn ESE534 Spring DeHon 46 Assessment Following from Kapre et al. / FCCM 2006

Penn ESE534 Spring DeHon 47 Analysis PS v/s TM for same area –Understand area tradeoffs (PEs v/s Interconnect) PS v/s TM for dynamic traffic –PS routes limited traffic, TM has to route all traffic

Penn ESE534 Spring DeHon 48 Area Analysis Evaluate PS and TM for multiple BFTs –Tradeoff Logic Area for Interconnect –Fixed Area of 130K slices p=0, BFT => 128 PS PEs => 1476 cycles p=0.5, BFT => 64 PS PEs => 943 cycles Extract best topologies for PS and TM at each area point –BFT of different p best at different area points Compare performance achieved at these bests at each area point

Penn ESE534 Spring DeHon 49 PS Iso-Area: Topology Selection

Penn ESE534 Spring DeHon 50 TM Iso-Area

Penn ESE534 Spring DeHon 51 Iso-Area

Penn ESE534 Spring DeHon 52 Iso-Area Ratio

Penn ESE534 Spring DeHon 53 Iso-Area Iso-PEs = TM 1~2x better With Area –PS 2x better at small areas –TM 4-5x better at large areas –PS catches up at the end

Penn ESE534 Spring DeHon 54 Activity Factors Activity = Fraction of traffic to be routed TM needs to route all PS can route fraction Variable activity queries in ConceptNet –Simple queries ~1% edges –Complex queries ~40% edges

Penn ESE534 Spring DeHon 55 Activity Factors

Penn ESE534 Spring DeHon 56 Crossover could be less

Penn ESE534 Spring DeHon 57 Lessons Latency –PS could achieve same clock rate –But took more cycles –Didn’t matter for this workload Quality of Route –PS could be 60% worse Area –PS larger, despite all the TM instrs –Big factor –Will be “technology” and PE-type dependent Depends on relative ratio of PE to switches Depends on relative ratio of memory and switching

Penn ESE534 Spring DeHon 58 Big Ideas [MSB Ideas] Communication often data dependent When unpredictable, unlikely to use potential connection –May be more efficient to share dynamically Dynamic may cost more per communication –More logic, buffer area, more latency –Less efficient due to local view Net win if sufficiently unpredictable

Penn ESE534 Spring DeHon 59 Admin FM1 marked – see feedback Wednesday –Last class –FM2 due –End of discussion period Reading on Canvas