Prof. Onur Mutlu Carnegie Mellon University

Presentation transcript:

18-742 Spring 2011, Parallel Computer Architecture
Lecture 22: Data Flow II
Prof. Onur Mutlu, Carnegie Mellon University

Data Flow

Review: Data Flow
- In a data flow machine, a program consists of data flow nodes
- A data flow node fires (is fetched and executed) when all its inputs are ready, i.e., when all inputs have tokens
- Figure: data flow node and its ISA representation
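
The firing rule above can be sketched as a tiny interpreter. This is only an illustration, not the ISA representation from the slide: the `Node` class, the `run` function, and the example graph for (a + b) * (a - b) are all hypothetical names chosen here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Node:
    """A data flow node: an operation plus one token slot per input port."""
    name: str
    op: Callable[..., float]
    num_inputs: int
    dests: List[Tuple[str, int]]          # (destination node name, input port)
    slots: Dict[int, float] = field(default_factory=dict)

    def ready(self) -> bool:
        # Firing rule: every input port holds a token.
        return len(self.slots) == self.num_inputs

def run(nodes, initial_tokens):
    """Deliver tokens and fire nodes until no tokens remain in flight."""
    pending = list(initial_tokens)        # entries are (node name, port, value)
    results = {}
    while pending:
        name, port, value = pending.pop()
        node = nodes[name]
        node.slots[port] = value
        if node.ready():
            out = node.op(*(node.slots[p] for p in range(node.num_inputs)))
            node.slots.clear()            # firing consumes the input tokens
            results[name] = out
            pending.extend((d, p, out) for d, p in node.dests)
    return results

# (a + b) * (a - b) with a = 5, b = 3 (assumed example values)
graph = {
    "add": Node("add", lambda x, y: x + y, 2, [("mul", 0)]),
    "sub": Node("sub", lambda x, y: x - y, 2, [("mul", 1)]),
    "mul": Node("mul", lambda x, y: x * y, 2, []),
}
tokens = [("add", 0, 5), ("add", 1, 3), ("sub", 0, 5), ("sub", 1, 3)]
result = run(graph, tokens)
```

The order in which tokens are taken from `pending` is arbitrary here; the result does not depend on it, because a node only fires once all of its operands have arrived.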

Review: Static Dataflow
- Allows only one instance of a node to be enabled for firing
- A dataflow node is fired only when all of the tokens are available on its input arcs and no tokens exist on any of its output arcs
- Dennis and Misunas, “A Preliminary Architecture for a Basic Data Flow Processor,” ISCA 1974.
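
A minimal sketch of the static firing rule, assuming each arc holds at most one token. The `Arc` class and the `can_fire` and `fire` helpers are hypothetical names used only for illustration.

```python
class Arc:
    """An arc that can hold at most one token (static dataflow)."""
    def __init__(self):
        self.token = None

    def empty(self):
        return self.token is None

def can_fire(inputs, outputs):
    # Static firing rule: all input arcs full AND all output arcs empty.
    return all(not a.empty() for a in inputs) and all(a.empty() for a in outputs)

def fire(op, inputs, outputs):
    """Consume one token from each input arc and place the result on each output arc."""
    args = [a.token for a in inputs]
    for a in inputs:
        a.token = None
    value = op(*args)
    for a in outputs:
        a.token = value

a, b, out = Arc(), Arc(), Arc()
a.token, b.token = 2, 3
first = can_fire([a, b], [out])          # inputs full, output empty
fire(lambda x, y: x + y, [a, b], [out])
a.token, b.token = 10, 20                # the next operands arrive
blocked = not can_fire([a, b], [out])    # output arc still holds 5, so the node waits
out.token = None                         # the consumer removes the token
unblocked = can_fire([a, b], [out])
```

The `blocked` case is what limits a static machine to one enabled instance per node: the second pair of operands cannot be processed until the first result has been consumed.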

Review: Static versus Dynamic Dataflow Machines

Review: Dynamic Dataflow Architectures
- Allocate instruction templates, i.e., a frame, dynamically to support each loop iteration and procedure call
  - Termination detection is needed to deallocate frames
- The code can be shared if we separate the code and the operand storage
- A token: <fp, ip, port, data>, where fp is the frame pointer and ip is the instruction pointer
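
The separation of shared code from per-activation operand storage can be sketched as follows. The `CODE` table encoding, the frame numbers, and the `execute` function are assumptions made for this example; frame deallocation and termination detection are left out.

```python
from collections import namedtuple

Token = namedtuple("Token", ["fp", "ip", "port", "data"])

# Shared code: ip -> (operation, destination ip or None). Hypothetical encoding.
CODE = {
    0: (lambda l, r: l + r, 1),
    1: (lambda l, r: l * r, None),
}

def execute(tokens):
    frames = {}     # fp -> {ip: {port: data}}: operand storage per activation
    outputs = {}    # fp -> final result of that activation
    work = list(tokens)
    while work:
        t = work.pop(0)
        slot = frames.setdefault(t.fp, {}).setdefault(t.ip, {})
        slot[t.port] = t.data
        if len(slot) == 2:                   # both operands present in this frame
            op, dest = CODE[t.ip]
            value = op(slot[0], slot[1])
            del frames[t.fp][t.ip]           # the operands are consumed
            if dest is None:
                outputs[t.fp] = value
            else:
                work.append(Token(t.fp, dest, 0, value))
    return outputs

# Two activations (frames 100 and 200) of the same code, with tokens interleaved.
# Each activation computes (x + y) * z.
tokens = [
    Token(100, 0, 0, 1), Token(200, 0, 0, 10),
    Token(200, 0, 1, 20), Token(100, 1, 1, 4),
    Token(100, 0, 1, 2), Token(200, 1, 1, 2),
]
outputs = execute(tokens)
```

Because every token carries its own frame pointer, the two activations never mix operands even though they run the same instructions at the same time.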

Review: Loops and Function Calls Summary

Review: Control of Parallelism
- Problem: Many loop iterations can be present in the machine at any given time
  - 100K iterations on a 256-processor machine can swamp the machine (thrashing in token matching units)
  - Not enough bits to represent frame id
- Solution: Throttle loops. Control how many loop iterations can be in the machine at the same time.
  - Requires changes to the loop dataflow graph to inhibit token generation when the number of iterations is greater than N
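
Loop throttling can be illustrated with a small simulation. This is a sketch under simplifying assumptions (every iteration takes the same fixed number of steps, and `run_throttled` and its parameters are hypothetical), not a model of any particular machine.

```python
from collections import deque

def run_throttled(total_iterations, max_in_flight, latency):
    """Simulate a loop where each iteration occupies the machine for `latency` steps.

    Returns the peak number of iterations in flight and the number of steps taken.
    """
    in_flight = deque()      # remaining steps for each active iteration
    issued = completed = peak = steps = 0
    while completed < total_iterations:
        # Throttle: start a new iteration only while fewer than N are active.
        while issued < total_iterations and len(in_flight) < max_in_flight:
            in_flight.append(latency)
            issued += 1
        peak = max(peak, len(in_flight))
        # Advance one step and retire the iterations that have finished.
        in_flight = deque(r - 1 for r in in_flight)
        while in_flight and in_flight[0] == 0:
            in_flight.popleft()
            completed += 1
        steps += 1
    return peak, steps

peak, steps = run_throttled(total_iterations=1000, max_in_flight=8, latency=4)
unbounded_peak, unbounded_steps = run_throttled(1000, 1000, 4)
```

With the bound set to 8, at most 8 iterations ever hold matching-store entries, at the cost of more steps; with no effective bound, all 1000 iterations are in the machine at once.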

Data Structures
- Dataflow by nature has write-once semantics
- Each arc (token) represents a data value
- An arc (token) gets transformed by a dataflow node into a new arc (token) → No persistent state…
- Data structures as we know them (in imperative languages) are structures with persistent state
- Why do we want persistent state?
  - More natural representation for some tasks? (bank accounts, databases, …)
  - To exploit locality

Data Structures in Dataflow
- Data structures reside in a structure store → tokens carry pointers
- I-structures: write-once, read multiple times, or allocate, write, read, ..., read, deallocate → no problem if a reader arrives before the writer at the memory location
- Figure labels: P, Memory, I-fetch a, I-store a v

I-Structures

Dynamic Data Structures
- Write-multiple-times data structures
- How can you support them in a dataflow machine?
  - Can you implement a linked list?
- What are the ordering semantics for writes and reads?
- Functional vs. imperative languages
  - Side effects and state vs. no side effects and no state

MIT Tagged Token Data Flow Architecture
- Resource Manager Nodes: responsible for function allocation (allocation of context identifiers), heap allocation, etc.

MIT Tagged Token Data Flow Architecture
- Wait-Match Unit: tries to match an incoming token with a waiting token that has the same context id and instruction address
  - Success: Both tokens are forwarded
  - Fail: The incoming token is stored in Waiting Token Memory and a bubble (no-op) is forwarded
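
The matching step can be sketched with a dictionary keyed by tag. This assumes two-input instructions whose operands arrive on distinct ports 0 and 1; the `WaitMatchUnit` class and its `accept` method are hypothetical names, and a bubble is represented by `None`.

```python
class WaitMatchUnit:
    """Pairs tokens that share a tag (context id, instruction address)."""
    def __init__(self):
        self.waiting = {}    # tag -> (port, data) of the token waiting for its partner

    def accept(self, context, address, port, data):
        tag = (context, address)
        partner = self.waiting.pop(tag, None)
        if partner is None:
            # Fail: park the incoming token; the pipeline sees a bubble.
            self.waiting[tag] = (port, data)
            return None
        # Success: forward both operands, ordered by port.
        operands = dict([partner, (port, data)])
        return tag, operands[0], operands[1]

wm = WaitMatchUnit()
r1 = wm.accept(context=7, address=0x40, port=0, data=3)   # no partner yet
r2 = wm.accept(context=8, address=0x40, port=1, data=9)   # same instruction, other context
r3 = wm.accept(context=7, address=0x40, port=1, data=4)   # matches the first token
```

The second call shows why the context id is part of the tag: a token for the same instruction address in a different context must not pair with the waiting one.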

TTDA Data Flow Example

TTDA Synchronization
- Heap memory locations have FULL/EMPTY bits
- When an "I-Fetch" request arrives (instead of "Fetch"), if the heap location is EMPTY, the heap memory module queues the request at that location
- Later, when "I-Store" arrives, pending requests are discharged
- No busy waiting
- No extra messages
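
The FULL/EMPTY mechanism can be sketched as a memory module that defers reads. This is an illustration only: the `IStructureMemory` class, the callback-style `reply` argument, and the choice to raise an error on a second write are assumptions, not details taken from the TTDA design.

```python
class IStructureMemory:
    """Heap with a FULL/EMPTY state per location and a queue of deferred reads."""
    def __init__(self, size):
        self.full = [False] * size
        self.value = [None] * size
        self.deferred = [[] for _ in range(size)]   # queued readers per location

    def i_fetch(self, addr, reply):
        """Deliver the value to `reply` now if FULL, otherwise queue the request."""
        if self.full[addr]:
            reply(self.value[addr])
        else:
            self.deferred[addr].append(reply)

    def i_store(self, addr, value):
        """Write once, mark the location FULL, and discharge every pending request."""
        if self.full[addr]:
            # Assumption: a second write to a write-once location is an error.
            raise RuntimeError(f"location {addr} written twice")
        self.value[addr] = value
        self.full[addr] = True
        pending, self.deferred[addr] = self.deferred[addr], []
        for reply in pending:
            reply(value)

mem = IStructureMemory(4)
seen = []
mem.i_fetch(2, seen.append)      # reader arrives before the writer: queued
mem.i_fetch(2, seen.append)
early = list(seen)               # nothing has been delivered yet
mem.i_store(2, 42)               # both deferred reads are answered here
mem.i_fetch(2, seen.append)      # location is FULL: answered immediately
```

The readers never poll: each request is sent once and is answered either immediately or when the store arrives, which is the sense of "no busy waiting" and "no extra messages".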

Manchester Data Flow Machine
- Matching Store: Pairs together tokens destined for the same instruction
- Large data set → overflow in overflow unit
- Paired tokens fetch the appropriate instruction from the node store

Data Flow Characteristics
- Data-driven execution of instruction-level graphical code
  - Nodes are operators
  - Arcs are data (I/O)
  - As opposed to control-driven execution
- Only real dependencies constrain processing
- No sequential I-stream
  - No program counter
- Operations execute asynchronously
- Execution triggered by the presence of data
- Single assignment languages and functional programming
  - E.g., SISAL in Manchester Data Flow Computer
  - No mutable state

Data Flow Advantages/Disadvantages
- Advantages
  - Very good at exploiting irregular parallelism
  - Only real dependencies constrain processing
- Disadvantages
  - Interrupt/exception handling is difficult
  - Context switching overhead and precise state
  - High bookkeeping overhead (tag matching, data storage)
  - Debugging difficult (no precise state)
  - Too much parallelism? (Parallelism control needed)
  - Overflow of the node tables
  - Implementing dynamic data structures difficult
  - Instruction cycle is inefficient (delay between dependent instructions)

Data Flow Summary
- Availability of data determines order of execution
- A data flow node fires when its sources are ready
- Programs represented as data flow graphs (of nodes)
- Data Flow at the ISA level has not been (as) successful
- Data Flow implementations under the hood (while preserving sequential ISA semantics) have been successful
  - Out of order execution
  - Hwu and Patt, “HPSm, a high performance restricted data flow architecture having minimal functionality,” ISCA 1986.

Combining Data Flow and Control Flow
- Can we get the best of both worlds?
- Two possibilities
  - Keep control flow at the ISA level, do dataflow underneath, preserving sequential semantics
  - Keep dataflow at the ISA level, do limited control flow underneath
    - Incorporate threads into dataflow: statically ordered instructions; when the first instruction is fired, the remaining instructions execute without interruption