ESE534: Computer Organization

Slides:



Advertisements
Similar presentations
COS 461 Fall 1997 Routing COS 461 Fall 1997 Typical Structure.
Advertisements

Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day17: November 20, 2000 Time Multiplexing.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 21: April 2, 2007 Time Multiplexing.
Packet-Switched vs. Time-Multiplexed FPGA Overlay Networks Kapre et. al RC Reading Group – 3/29/2006 Presenter: Ilya Tabakh.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 15: March 12, 2007 Interconnect 3: Richness.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 24: April 21, 2010 Interconnect 6: Dynamically Switched Interconnect.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day8: October 18, 2000 Computing Elements 1: LUTs.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 11: February 14, 2007 Compute 1: LUTs.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 22: April 4, 2007 Interconnect 7: Time Multiplexed Interconnect.
Issues in System-Level Direct Networks Jason D. Bakos.
Penn ESE Spring DeHon 1 FUTURE Timing seemed good However, only student to give feedback marked confusing (2 of 5 on clarity) and too fast.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 15: February 12, 2003 Interconnect 5: Meshes.
ESE Spring DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes.
High-Level Interconnect Architectures for FPGAs An investigation into network-based interconnect systems for existing and future FPGA architectures Nick.
High-Level Interconnect Architectures for FPGAs Nick Barrow-Williams.
Computer Networks with Internet Technology William Stallings
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.
Anshul Kumar, CSE IITD ECE729 : Advanced Computer Architecture Lecture 27, 28: Interconnection Mechanisms In Multiprocessors 29 th, 31 st March, 2010.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 14: February 10, 2003 Interconnect 4: Switching.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 18: April 2, 2014 Interconnect 4: Switching.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 13: February 6, 2003 Interconnect 3: Richness.
Penn ESE534 Spring DeHon 1 ESE534 Computer Organization Day 9: February 13, 2012 Interconnect Introduction.
Lecture 17: Dynamic Reconfiguration I November 10, 2004 ECE 697F Reconfigurable Computing Lecture 17 Dynamic Reconfiguration I Acknowledgement: Andre DeHon.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 11: January 31, 2005 Compute 1: LUTs.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 11: February 20, 2012 Instruction Space Modeling.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 22: April 16, 2014 Time Multiplexing.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 21: April 12, 2010 Retiming.
McGraw-Hill©The McGraw-Hill Companies, Inc., 2000 Muhammad Waseem Iqbal Lecture # 20 Data Communication.
ESE532: System-on-a-Chip Architecture
ESE534: Computer Organization
Delay-Tolerant Networks (DTNs)
ESE532: System-on-a-Chip Architecture
ESE534: Computer Organization
ESE534: Computer Organization
Dynamic connection system
Lecture 23: Interconnection Networks
ESE532: System-on-a-Chip Architecture
ESE534 Computer Organization
ESE532: System-on-a-Chip Architecture
ESE532: System-on-a-Chip Architecture
Pablo Abad, Pablo Prieto, Valentin Puente, Jose-Angel Gregorio
ESE532: System-on-a-Chip Architecture
ESE532: System-on-a-Chip Architecture
CS184a: Computer Architecture (Structure and Organization)
Azeddien M. Sllame, Amani Hasan Abdelkader
Switching Techniques In large networks there might be multiple paths linking sender and receiver. Information may be switched as it travels through various.
Mechanics of Flow Control
Architecture of Parallel Computers CSC / ECE 506 Summer 2006 Scalable Programming Models Lecture 11 6/19/2006 Dr Steve Hunter.
ESE534: Computer Organization
Switching, routing, and flow control in interconnection networks
ESE534: Computer Organization
ESE534: Computer Organization
ESE534: Computer Organization
Day 26: November 1, 2013 Synchronous Circuits
ESE534: Computer Organization
On-time Network On-chip
Switching Techniques.
CS184a: Computer Architecture (Structures and Organization)
CS184a: Computer Architecture (Structures and Organization)
ESE534: Computer Organization
ESE534: Computer Organization
ESE535: Electronic Design Automation
HopliteBuf: FPGA NoCs with Provably Stall-Free FIFOs
ESE534: Computer Organization
ESE534: Computer Organization
CS184a: Computer Architecture (Structure and Organization)
ESE532: System-on-a-Chip Architecture
ESE532: System-on-a-Chip Architecture
Presentation transcript:

ESE534: Computer Organization Day 27: December 7, 2016 Interconnect 7: Dynamically Switched Interconnect Penn ESE534 Fall 2016 -- DeHon

Previously Configured Interconnect Multicontext Interconnect Lock down route between source and sink Multicontext Interconnect Switched cycle-by-cycle from Instr. Mem. Interconnect Topology Data-dependent control for computation Penn ESE534 Fall 2016 -- DeHon

Today Data-dependent control/sharing of interconnect Dynamic Sharing (Packet Switching) Motivation Formulation Design Assessment NoC (Network-on-a-Chip) Most contemporary uses mean Packet Switching Penn ESE534 Fall 2016 -- DeHon

Motivation Penn ESE534 Fall 2016 -- DeHon

Consider: Preclass 2 How might this be inefficient with configured interconnect? Penn ESE534 Fall 2016 -- DeHon

Consider: Preclass 2 How might this be more efficient? What might be inefficient about it? Penn ESE534 Fall 2016 -- DeHon

Consider: Preclass 2 How might this be more efficient? What undesirable behavior might this have? Penn ESE534 Fall 2016 -- DeHon

Opportunity Interconnect major area, energy, delay Instantaneous communication << potential communication Question: can we reduce interconnect requirements by only routing instantaneous communication needs? Penn ESE534 Fall 2016 -- DeHon

Examples Data dependent, unpredictable results Slowly changing values Search filter, compression Slowly changing values Surveillance, converging values Penn ESE534 Fall 2016 -- DeHon

Formulation Penn ESE534 Fall 2016 -- DeHon

Dynamic Sharing Don’t reserve resources Request as needed Hold a resource for a single source/sink pair Allocate cycles for a particular route Request as needed Share amongst potential users Penn ESE534 Fall 2016 -- DeHon

Bus Example Time Multiplexed version Dynamic version Allocate time slot on bus for each communication Dynamic version Arbitrate for bus on each cycle Penn ESE534 Fall 2016 -- DeHon

Bus Example Time Multiplexed version Dynamic version Allocate time slot on bus for each communication Dynamic version Arbitrate for bus on each cycle Penn ESE534 Fall 2016 -- DeHon

Dynamic Bus Example 4 PEs Potentially each send out result on change Value only changes with probability 0.1 on each “cycle” TM: Slot for each PE0 PE1 PE2 PE3 PE0 PE1 PE2 PE3 Dynamic: arbitrate based on need None PE0 none PE1 PE1 none PE3 …. TM either runs slower (4 cycles/compute) or needs 4 busses Dynamic single bus seldom bottleneck Probability two need bus on a cycle? Penn ESE534 Fall 2016 -- DeHon

Network Example Time Multiplexed Dynamic As assumed so far in class Memory says how to set switches on each cycle Dynamic Attach address or route designation Switches forward data toward destination (did see for Data Local case) Penn ESE534 Fall 2016 -- DeHon

Butterfly Log stages Resolve one bit per stage bit3 bit2 bit1 bit0 16 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111 16 Penn ESE534 Spring 2010 -- DeHon Penn ESE534 Fall 2016 -- DeHon

Tree Route Downpath resolves one bit per stage 17 Penn ESE534 Spring 2010 -- DeHon Penn ESE534 Fall 2016 -- DeHon

Mesh Route Destination (dx,dy) Current location (cx,cy) Route up/down left/right based on (dx-cx,dy-cy) 18 ESE534 -- Spring 2010 -- DeHon Penn ESE534 Fall 2016 -- DeHon

Prospect Use less interconnect <OR> get higher throughput across fixed interconnect By Dynamically sharing limited interconnect Bottleneck on instantaneous data needs Not worst-case data needs Penn ESE534 Fall 2016 -- DeHon

Design Penn ESE534 Fall 2016 -- DeHon

Directional Ring Ring: 1D Mesh Penn ESE534 Fall 2016 -- DeHon

How like a bus? Consider sending PE4PE1 on each Penn ESE534 Fall 2016 -- DeHon

Advantage over bus? Consider sending 21, 43 Penn ESE534 Fall 2016 -- DeHon

Preclass 1 Cycles from simulation? Max delay of single link? Best case possible? Penn ESE534 Fall 2016 -- DeHon

Assessment: Local vs. Global Penn ESE534 Fall 2016 -- DeHon

Issue 1: Local online vs Global Offline Dynamic must make local decision Often lower quality than offline, global decision Static, Time-multiplexed Can figure out best global decision offline Quality of decision will reduce potential benefits of reducing instantaneous requirements Penn ESE534 Fall 2016 -- DeHon

Experiment Send-on-Change for spreading activation task Run on Linear-Population Tree network Same topology both cases Fixed size graph Vary physical tree size Smaller trees  more serial Many “messages” local to cluster, no routing Large trees  more parallel Penn ESE534 Fall 2016 -- DeHon

Spreading Activation Start with few nodes active Propagate changes along edges Penn ESE534 Fall 2016 -- DeHon

Butterfly Fat Trees (BFTs) Familiar from Day 17, 18, HW8 Similar phenomena with other topologies Directional version Penn ESE534 Fall 2016 -- DeHon

Iso-PEs [Kapre et al., FCCM 200] Penn ESE534 Fall 2016 -- DeHon

Iso-PEs PS vs. TM ratio at same PE counts Small number of PEs little difference Dominated by serialization (self-messages) Not stressing the network Larger PE counts TM ~60% better TM uses global congestion knowledge while scheduling Penn ESE534 Fall 2016 -- DeHon

Assessment: Switch Complexity Penn ESE534 Fall 2016 -- DeHon

Preclass 3 Logic for muxsel<0>? Logic for Arouted? Logic for Brouted? Gates? Gate Delay? Penn ESE534 Fall 2016 -- DeHon

Issue 2: Switch Complexity Requires area/delay/energy to make decisions Also requires storage area Avoids instruction memory High cost of switch complexity may diminish benefit. Penn ESE534 Fall 2016 -- DeHon

Mesh Congest Preclass 1 ring similar to slice through mesh A,B – corner turns May not be able to route on a cycle Penn ESE534 Fall 2016 -- DeHon

Mesh Congest What happens when inputs from 2 (or 3) sides want to travel out same output? Penn ESE534 Fall 2016 -- DeHon

Congestion Store inputs that must wait until path available Typically store in FIFO buffer How big do we make the FIFO? Misroute: (deflection routing) Send in to an available (wrong) direction Requires balance of ins and outs Can make work on mesh How much more traffic do we create misrouting? Penn ESE534 Fall 2016 -- DeHon

PS Hardware Primitives T-Switch π -Switch Red arrows backpressure when FIFOs are full Penn ESE534 Fall 2016 -- DeHon

FIFO Buffer Full? What happens when FIFO fills up? Penn ESE534 Fall 2016 -- DeHon

FIFO Buffer Full? What happens when FIFO fills up? Penn ESE534 Fall 2016 -- DeHon

FIFO Buffer Full? What happens when FIFO fills up? Penn ESE534 Fall 2016 -- DeHon

FIFO Buffer Full? What happens when FIFO fills up? Maybe backup network Prevent other routes from using If not careful, can create deadlock Penn ESE534 Fall 2016 -- DeHon

TM Hardware Primitives Penn ESE534 Fall 2016 -- DeHon

Area Tree T switch in PS/TM Switches Packet (32 wide, 16 deep) 3split + 3 merge Split 79 30 ctrl, 33 fifo buffer Merge 165 60 ctrl, 66 fifo buffer Total: 244 Time Multiplexed (16b) 9+(contexts/16) E.g. 41 at 1024 contexts Both use SRL16s for memory (16b/4-LUT) Area in FPGA slice counts Penn ESE534 Fall 2016 -- DeHon

Area Effects Based on FPGA overlay model i.e. build PS or TM on top of FPGA Penn ESE534 Fall 2016 -- DeHon

Bidirectional vs. Directional [Kapre+Gray, FPL 2015] Penn ESE534 Fall 2016 -- DeHon

Bidirectional vs. Directional [Kapre+Gray, FPL 2015] Penn ESE534 Fall 2016 -- DeHon

Mesh Packet Switched 32b Split-Merge FIFO Hoplite Deflection bidrectional 1800 LUTs Hoplite Deflection undirectional 60 LUTs [Kapre+Gray, FPL 2015] Penn ESE534 Fall 2016 -- DeHon

Mesh Area Deflection PS/TM [Kapre FCCM 2015] Penn ESE534 Fall 2016 -- DeHon

Assessment: Local-Global/Switch-Complexity Penn ESE534 Fall 2016 -- DeHon

Deflection Route What concerns might we have about deflection route? Penn ESE534 Fall 2016 -- DeHon

Latency vs. Traffic [Kapre+Gray, FPL 2015] Penn ESE534 Fall 2016 -- DeHon

Take 2, they are small [Kapre+Gray, FPL 2015] Penn ESE534 Fall 2016 -- DeHon

Importance of Global Perspective Routing 142K message add20 benchmark Marathon statically schedule PS [Kapre FCCM 2015] Penn ESE534 Fall 2016 -- DeHon

Observations Wires dominate energy Packet Switch Global scheduling Can use less interconnect when traffic light and variables Global scheduling Can be more efficient when traffic heavy Can do with time-multiplexed or packet-switched substrate Wires dominate energy Packet Switch adds header bits that get routed over network wires Amortized out for larger packet Wpacket architecture mismatch effect Penn ESE534 Fall 2016 -- DeHon

Big Ideas [MSB Ideas] Communication often data dependent When unpredictable, unlikely to use potential connection May be more efficient to share dynamically Dynamic may cost more per communication More logic, buffer area, more latency Packet headers give up locality of instruction Less efficient due to local view Net win if sufficiently unpredictable Penn ESE534 Fall 2016 -- DeHon

Admin FM2 due today Monday Reading on Canvas Last class End of discussion period Reading on Canvas Penn ESE534 Fall 2016 -- DeHon