December 5, 2006http://csg.csail.mit.edu/6.827/L22-1 Dynamic Dataflow Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of.

Slides:



Advertisements
Similar presentations
© Krste Asanovic, 2014CS252, Spring 2014, Lecture 5 CS252 Graduate Computer Architecture Spring 2014 Lecture 5: Out-of-Order Processing Krste Asanovic.
Advertisements

Lecture 3: Instruction Set Principles Kai Bu
Dynamic ILP: Scoreboard Professor Alvin R. Lebeck Computer Science 220 / ECE 252 Fall 2008.
Instruction-Level Parallelism (ILP)
DATAFLOW ARHITEKTURE. Dataflow Processors - Motivation In basic processor pipelining hazards limit performance –Structural hazards –Data hazards due to.
February 28, 2012CS152, Spring 2012 CS 152 Computer Architecture and Engineering Lecture 11 - Out-of-Order Issue, Register Renaming, & Branch Prediction.
PART 4: (2/2) Central Processing Unit (CPU) Basics CHAPTER 13: REDUCED INSTRUCTION SET COMPUTERS (RISC) 1.
Computer Organization. This module surveys the physical resources of a computer system. –Basic components CPUMemoryBus I/O devices –CPU structure Registers.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture ILP I Steve Ko Computer Sciences and Engineering University at Buffalo.
CSCI 8150 Advanced Computer Architecture Hwang, Chapter 2 Program and Network Properties 2.3 Program Flow Mechanisms.
1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.
1 Lecture 5: Pipeline Wrap-up, Static ILP Basics Topics: loop unrolling, VLIW (Sections 2.1 – 2.2) Assignment 1 due at the start of class on Thursday.
(Page 554 – 564) Ping Perez CS 147 Summer 2001 Alternative Parallel Architectures  Dataflow  Systolic arrays  Neural networks.
7/2/ _23 1 Pipelining ECE-445 Computer Organization Dr. Ron Hayne Electrical and Computer Engineering.
Where do We Go from Here? Thoughts on Computer Architecture TIP For additional advice see Dale Canegie Training® Presentation Guidelines Jack Dennis MIT.
Processor Structure & Operations of an Accumulator Machine
1 DATAFLOW ARCHITECTURES JURIJ SILC, BORUT ROBIC, THEO UNGERER.
Computer Organization and Architecture Reduced Instruction Set Computers (RISC) Chapter 13.
CMPE 511 DATA FLOW MACHINES Mustafa Emre ERKOÇ 11/12/2003.
High Performance Architectures Dataflow Part 3. 2 Dataflow Processors Recall from Basic Processor Pipelining: Hazards limit performance  Structural hazards.
Computer Architecture Dataflow Machines. Data Flow Conventional programming models are control driven Instruction sequence is precisely specified Sequence.
1 Arvind Computer Science and Artificial Intelligence Lab M.I.T. Dataflow: Passing the Token ISCA 2005, Madison, WI June 6, 2005.
March 4, 2008 DF4-1http://csg.csail.mit.edu/arvind/ From Id/pH to Dataflow Graphs Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute.
Copyright © 2005 Elsevier Chapter 8 :: Subroutines and Control Abstraction Programming Language Pragmatics Michael L. Scott.
1 Multithreaded Architectures Lecture 3 of 4 Supercomputing ’93 Tutorial Friday, November 19, 1993 Portland, Oregon Rishiyur S. Nikhil Digital Equipment.
Instruction Level Parallelism Pipeline with data forwarding and accelerated branch Loop Unrolling Multiple Issue -- Multiple functional Units Static vs.
1 Chapter 2 Dataflow Processors. 2 Dataflow processors Recall from basic processor pipelining: Hazards limit performance. – Structural hazards – Data.
March 4, 2008 DF5-1http://csg.csail.mit.edu/arvind/ The Monsoon Project Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of.
Different parallel processing architectures Module 2.
Chapter 7 Dataflow Architecture. 7.1 Dataflow Models  A dataflow program: the sequence of operations is not specified, but depends upon the need and.
Lecture 04: Instruction Set Principles Kai Bu
Represents different voltage levels High: 5 Volts Low: 0 Volts At this raw level a digital computer is instructed to carry out instructions.
Von Neumann Computers Article Authors: Rudolf Eigenman & David Lilja
March 4, 2008http://csg.csail.mit.edu/arvindDF3-1 Dynamic Dataflow Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Computer Architecture Dataflow Machines Sunset over Lifou, New Caledonia.
Autumn 2006CSE P548 - Dataflow Machines1 Von Neumann Execution Model Fetch: send PC to memory transfer instruction from memory to CPU increment PC Decode.
Computer Architecture: Dataflow (Part I) Prof. Onur Mutlu Carnegie Mellon University.
CSE431 L13 SS Execute & Commit.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 13: SS Backend (Execute, Writeback & Commit) Mary Jane.
Processes and threads.
Dataflow Machines CMPE 511
Instruction Level Parallelism
Prof. Onur Mutlu Carnegie Mellon University
Fall 2012 Parallel Computer Architecture Lecture 22: Dataflow I
CS203 – Advanced Computer Architecture
Microprocessor Microarchitecture Dynamic Pipeline
Chapter 4 The Von Neumann Model
Advantages of Dynamic Scheduling
Computer Programming Machine and Assembly.
Dataflow Models.
Chap. 8 :: Subroutines and Control Abstraction
Chap. 8 :: Subroutines and Control Abstraction
Krste Asanovic Electrical Engineering and Computer Sciences
Adapted from the slides of Prof
Dataflow: Passing the Token
Lecture 7: Dynamic Scheduling with Tomasulo Algorithm (Section 2.4)
Adapted from the slides of Prof
Topic B (Cont’d) Dataflow Model of Computation
15-740/ Computer Architecture Lecture 10: Out-of-Order Execution
CS510 Operating System Foundations
Dataflow Model of Computation (From Dataflow to Multithreading)
How does the CPU work? CPU’s program counter (PC) register has address i of the first instruction Control circuits “fetch” the contents of the location.
Lecture 4: Instruction Set Design/Pipelining
Well-behaved Dataflow Graphs
Lecture 5: Pipeline Wrap-up, Static ILP
Samira Khan University of Virginia Jan 23, 2019
Advanced Computer Architecture Dataflow Processing
Prof. Onur Mutlu Carnegie Mellon University
Prof. Onur Mutlu Carnegie Mellon University
Chapter 4 The Von Neumann Model
Presentation transcript:

December 5, 2006http://csg.csail.mit.edu/6.827/L22-1 Dynamic Dataflow Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology

December 5, 2006 L22-2http://csg.csail.mit.edu/6.827/ Static - mostly for signal processing NEC - NEDIP and IPP Hughes, Hitachi, AT&T, Loral, TI, Sanyo M.I.T. Engineering model... Dynamic Manchester (‘81) M.I.T. - TTDA, Monsoon (‘88) M.I.T./Motorola - Monsoon (‘91) (8 PEs, 8 IS) ETL - SIGMA-1 (‘88) (128 PEs,128 IS) ETL - EM4 (‘90) (80 PEs), EM-X (‘96) (80 PEs) Sandia - EPS88, EPS-2 IBM - Empire... Dataflow Machines Related machines: Burton Smith’s Denelcor HEP, Horizon, Tera Shown at Supercomputing 96 Shown at Supercomputing 91 S. Sakai Y. Kodama EM4: single-chip dataflow micro, 80PE multiprocessor, ETL, Japan K. Hiraki T. Shimada Sigma-1: The largest dataflow machine, ETL, Japan John Gurd Greg Papadopoulos Monsoon Andy Boughton Chris Joerg Jack Costanza

December 5, 2006 L22-3http://csg.csail.mit.edu/6.827/ Outline Static Dataflow Machines Not general-purpose enough Dynamic Dataflow Machines As easy to build as a simple pipelined processor The software view The memory model: I-structures Monsoon and its performance

December 5, 2006 L22-4http://csg.csail.mit.edu/6.827/ Dataflow Graphs {x = a + b; y = b * 7 in (x-y) * (x+y)} a b +*7 -+ * y x Values in dataflow graphs are represented as tokens An operator executes when all its input tokens are present; copies of the result token are distributed to the destination operators token instruction ptr portdata ip = 3 p = L no separate control flow

December 5, 2006 L22-5http://csg.csail.mit.edu/6.827/ Static Dataflow Machine: Instruction Templates Each arc in the graph has a operand slot in the program a b +*7 -+ * y x Destination 1 Destination 2 Presence bits L 4L * 3R 4R - 5L + 5R * out Opcode Operand 1Operand 2

December 5, 2006 L22-6http://csg.csail.mit.edu/6.827/ Static Dataflow Machine Jack Dennis, 1973, Many such processors can be connected together Programs can be statically divided among the processor Receive Send FU Op dest1 dest2 p1 src1 p2 src2 Instruction Templates

December 5, 2006 L22-7http://csg.csail.mit.edu/6.827/ Static Dataflow: Problems/Limitations Mismatch between the model and the implementation The model requires unbounded FIFO token queues per arc but the architecture provides storage for one token per arc The architecture does not ensure FIFO order in the reuse of an operand slot The merge operator has a unique firing rule The static model does not support Function calls Data Structures - No easy solution in the static framework - Dynamic dataflow provided a framework for solutions

December 5, 2006 L22-8http://csg.csail.mit.edu/6.827/ Outline Static Dataflow Machines  Not general-purpose enough Dynamic Dataflow Machines  As easy to build as a simple pipelined processor The software view The memory model: I-structures Monsoon and its performance

December 5, 2006 L22-9http://csg.csail.mit.edu/6.827/ Dynamic Dataflow Architectures Allocate instruction templates, i.e., a frame, dynamically to support each loop iteration and procedure call termination detection needed to deallocate frames The code can be shared if we separate the code and the operand storage frame pointer instruction pointer a token

December 5, 2006 L22-10http://csg.csail.mit.edu/6.827/ A Frame in Dynamic Dataflow Program * L, 4L 3R, 4R 5L 5R out * a b +*7 -+ * y x Need to provide storage for only one operand/operator Frame

December 5, 2006 L22-11http://csg.csail.mit.edu/6.827/ Monsoon Processor Greg Papadopoulos Instruction Fetch Operand Fetch ip fp+r Network Frames op rd1,d2 Code Form Token ALU Token Queue

December 5, 2006 L22-12http://csg.csail.mit.edu/6.827/ Temporary Registers & Threads Robert Iannucci Registers evaporate when an instruction thread is broken n sets of registers (n = pipeline depth) Instruction Fetch Operand Fetch Network Frames op rS1,S2 Code Form Token ALU Token Queue Registers Registers are also used for exceptions & interrupts Robert Iannucci

December 5, 2006 L22-13http://csg.csail.mit.edu/6.827/ Actual Monsoon Pipeline: Eight Stages Instruction Fetch Form Token System Queue Registers Network Instruction Memory Effective Address Presence Bit Operation Frame Operation Presence bits Frame Memory User Queue R, 2W ALU

December 5, 2006 L22-14http://csg.csail.mit.edu/6.827/ Instructions directly control the pipeline The opcode specifies an operation for each pipeline stage: opcode r dest1 [dest2] EA - effective address FP + r: frame relative r: absolute IP + r: code relative (not supported) WM - waiting matching Unary; Normal; Sticky; Exchange; Imperative PBs X port  PBs X Frame op X ALU inhibit Register ops: ALU: V L X V R  V’ L X V’ R, CC Form token: V L X V R X Tag 1 X Tag 2 X CC  Token 1 X Token 2 EA WM RegOp ALU FormToken Easy to implement; no hazard detection

December 5, 2006 L22-15http://csg.csail.mit.edu/6.827/ Procedure Linkage Operators f get frame extract tag change Tag 0 Graph for f change Tag 1 a1 1: change Tag n an n:... change Tag 1 Fork token in frame 0 token in frame 1 Like standard call/return but caller & callee can be active simultaneously

December 5, 2006 L22-16http://csg.csail.mit.edu/6.827/ Outline Static Dataflow Machines  Not general-purpose enough Dynamic Dataflow Machines  As easy to build as a simple pipelined processor The software view  The memory model: I-structures Monsoon and its performance

December 5, 2006 L22-17http://csg.csail.mit.edu/6.827/ Parallel Language Model Global Heap of Shared Objects Tree of Activation Frames h:g: f: loop active threads asynchronous and parallel at all levels

December 5, 2006 L22-18http://csg.csail.mit.edu/6.827/ Dataflow Graphs + I-Structures +... TTDA Monsoon *T *T-Voyager Id World implicit parallelism Id

December 5, 2006 L22-19http://csg.csail.mit.edu/6.827/ Id World people Rishiyur Nikhil, Keshav Pingali, Vinod Kathail, David Culler Ken Traub Steve Heller, Richard Soley, Dinart Mores Jamey Hicks, Alex Caro, Andy Shaw, Boon Ang Shail Anditya R Paul Johnson Paul Barth Jan Maessen Christine Flood Jonathan Young Derek Chiou Arun Iyangar Zena Ariola Mike Bekerle K. Eknadham (IBM), Wim Bohm (Colorado), Joe Stoy (Oxford),... Steve Heller Ken TraubR.S. Nikhil Keshav Pingali David Culler Boon S. AngDerek ChiouJamey Hicks

December 5, 2006 L22-20http://csg.csail.mit.edu/6.827/ Data Structures in Dataflow.. P P Memory Data structures reside in a structure store  tokens carry pointers I-structures: Write-once, Read multiple times or allocate, write, read,..., read, deallocate  No problem if a reader arrives before the writer at the memory location I-fetch a I-store a v

December 5, 2006 L22-21http://csg.csail.mit.edu/6.827/ I-Structure Storage: Split-phase operations & Presence bits I-Fetch t s a a a a v2v2 fp.ip I-structure Memory Need to deal with multiple deferred reads other operations: fetch/store, take/put, clear  v1v1 address to be read t I-Fetch s split phase forwarding address

December 5, 2006http://csg.csail.mit.edu/6.827/L22-22 Next time Compiling Id/pH into dataflow graphs Monsoon Performance