Computer Architecture Dataflow Machines. Data Flow Conventional programming models are control driven Instruction sequence is precisely specified Sequence.


1 Computer Architecture Dataflow Machines

2 Data Flow
- Conventional programming models are control driven
  - The instruction sequence is precisely specified
  - The sequence specifies control: which instruction the CPU will execute next
- Execution rule: execute an instruction when its predecessor has completed
    s1: r = a*b;
    s2: s = c*d;
    s3: y = r + s;
  - s2 executes when s1 is complete
  - s3 executes when s2 is complete

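3 Data Flow
- Consider the calculation y = a*b + c*d
- Represent it by a graph
  - Nodes represent computations
  - Data flows along arcs
- Execution rule: execute an instruction when its data is available (the data-driven rule)
[Figure: dataflow graph. a and b feed one multiply node, c and d feed another; both products feed an add node that produces y.]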

4 Data Flow
- Dataflow firing rule: an instruction fires (executes) when its data is available
  - Exposes all possible parallelism
  - Either multiplication can fire as soon as its data arrives
  - The addition must wait
- Data dependence analysis!
  - Instruction issue units: fire (issue) each instruction when its operands (registers) have been written
[Figure: the dataflow graph for y = a*b + c*d]
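The firing rule can be sketched in a few lines of Python. This is only an illustration of the rule for y = a*b + c*d, not a model of any of the machines in these slides; the graph encoding (`GRAPH`) and the function name `run_dataflow` are assumptions made for the example.

```python
import operator

# Hypothetical graph encoding: node name -> (operation, names of its inputs).
GRAPH = {
    "r": (operator.mul, ("a", "b")),
    "s": (operator.mul, ("c", "d")),
    "y": (operator.add, ("r", "s")),
}

def run_dataflow(graph, inputs):
    """Repeatedly fire every node whose operands are all available."""
    values = dict(inputs)      # data that has arrived so far
    pending = dict(graph)      # nodes that have not fired yet
    firing_order = []          # one sorted list of node names per step
    while pending:
        ready = sorted(
            name for name, (_, sources) in pending.items()
            if all(src in values for src in sources)
        )
        if not ready:
            raise ValueError("deadlock: some operands never arrive")
        # All ready nodes read the same snapshot, so they could run in parallel.
        results = {
            name: pending[name][0](*(values[src] for src in pending[name][1]))
            for name in ready
        }
        values.update(results)
        for name in ready:
            del pending[name]
        firing_order.append(ready)
    return values, firing_order
```

With a = 2, b = 3, c = 4, d = 5, both multiplications fire in the first step and the addition fires in the second, which is the parallelism the control-driven sequence s1, s2, s3 hides.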

5 Data Flow - Realisations
Several experimental machines were built:
- Manchester: Gurd & Watson
- Tagged Token: Arvind, MIT
- Sigma: ETL, Tsukuba
- EMC-4: ETL, Tsukuba
- Monsoon: Arvind, MIT
- EMX: ETL, Tsukuba
- RAPID: Osaka/Sharp/Mitsubishi (asynchronous!)
- Naiad: Tasmania
- and some others

6 Data Flow - Realisations Manchester

7 Data Flow - Program
- Program word: a matching store entry
- Fields: operation (+, -, *, / etc), left and right operands, presence flags, destination address, destination side (left or right)
- When both presence flags are Y, this packet is despatched to a PE (any PE!)

8 Data Flow - Matching Store
- Special-purpose memory
- Limited processing capability
  - Detects full slots
  - Despatches operation packets to any idle PE
- Entry fields: operation (+, -, *, / etc), left and right operands, presence flags, destination address, destination side (left or right)

9 Data Flow - Processing Elements
- Receive operation packets
- Generate the result
- Form a result packet
- Despatch it to the matching store
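The loop between matching store and processing elements can be sketched as follows. This is a simplified software illustration of the slot fields listed on the previous slides, not the Manchester hardware design: the class and function names, the dictionary-based slots, and the use of a destination address of None to mark a program output are all assumptions for the example.

```python
import operator
from collections import deque

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

class MatchingStore:
    def __init__(self, program):
        # program: address -> (operation, destination address, destination side)
        self.program = program
        # One slot per address; a side's key being present is its presence flag.
        self.slots = {addr: {} for addr in program}
        self.ready = deque()   # operation packets waiting for an idle PE

    def receive(self, addr, side, value):
        slot = self.slots[addr]
        slot[side] = value
        if "L" in slot and "R" in slot:          # both presence flags are Y
            op, dest, dest_side = self.program[addr]
            self.ready.append((op, slot["L"], slot["R"], dest, dest_side))
            self.slots[addr] = {}                # free the slot

def processing_element(packet):
    """Any PE can take any packet: compute the result, form a result packet."""
    op, left, right, dest, dest_side = packet
    return dest, dest_side, OPS[op](left, right)

def run(program, tokens):
    store = MatchingStore(program)
    for addr, side, value in tokens:
        store.receive(addr, side, value)
    outputs = []
    while store.ready:
        dest, side, value = processing_element(store.ready.popleft())
        if dest is None:                         # assumed marker for a program output
            outputs.append(value)
        else:
            store.receive(dest, side, value)     # result goes back to the matching store
    return outputs

# y = a*b + c*d: slots 0 and 1 multiply, slot 2 adds their results.
PROGRAM = {0: ("*", 2, "L"), 1: ("*", 2, "R"), 2: ("+", None, None)}
TOKENS = [(0, "L", 2), (0, "R", 3), (1, "L", 4), (1, "R", 5)]
```

Calling `run(PROGRAM, TOKENS)` sends the two products back into slot 2, which only becomes full, and is only despatched, once both have arrived.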

10 Data Flow - EM4
- Architects: Yamaguchi, Sakai, Kodama, Sato et al
- ElectroTechnical Laboratory, Tsukuba, Japan
- PE (EM-Y): CMOS gate array, 80k gates / 1.0 µm
- f = 20MHz
- ~1992

11 Data Flow - Monsoon
- Architects: Papadopoulos, Culler et al
- MIT, Cambridge
- PE: f = 10MHz
- ~1990
- I-Structure processor

12 Data Flow - I-Structures
- Memory with a presence bit
  - Tag each memory location with a bit indicating its validity
- Valid bit set -> normal read (no wait)
- Data not yet written (valid bit not set)
  -> Wait
  -> Read requests queued
- Data driven execution: operations proceed when data is available
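The presence-bit behaviour can be sketched in software as below. The class name, the callback-style `read`, and the write-once check are assumptions for illustration (the slide does not mention rewriting a location); real I-structure memory implements the queueing in hardware.

```python
class IStructure:
    """Memory in which every location carries a valid (presence) bit."""

    def __init__(self, size):
        self.valid = [False] * size
        self.data = [None] * size
        self.waiting = [[] for _ in range(size)]   # queued read requests per location

    def read(self, index, consumer):
        """Deliver the value to `consumer` now if valid, otherwise queue the request."""
        if self.valid[index]:
            consumer(self.data[index])             # normal read, no wait
        else:
            self.waiting[index].append(consumer)   # wait until the data is written

    def write(self, index, value):
        if self.valid[index]:
            # Assumption: locations are write-once, so a second write is an error.
            raise ValueError("location already written")
        self.data[index] = value
        self.valid[index] = True
        queued, self.waiting[index] = self.waiting[index], []
        for consumer in queued:                    # release every deferred reader
            consumer(value)
```

A read issued before the write simply sits in the queue; the write then drives the waiting operation, which is the data-driven rule applied to memory.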

13 Data Flow - Monsoon Pipeline
- 8 stage pipeline
- “Presence bits” check operand availability
- Frame (coarse grain) basis

14 Data Flow - Summary
- Fine-grain dataflow
  - Suffered from comms network overload!
- Coarse-grain dataflow
  - Monsoon ...
  - Overtaken by commercial technology!!
- A sad “fact-of-life”
  - It’s almost impossible to generate the funds for non-“mainstream” computer architecture research
  - $n x 10^8 required
  - Non-mainstream = interesting!

15 Data Flow - Summary
- As a software model ...
  - Functional languages: dataflow in a different guise!
    - Theoretically important
    - Practically? Inefficient (= slow!!) ... Ask your CS colleagues!
  - Cilk, based on C
    - Used on CIIPS Myrmidons
    - Uses a dataflow model: threads become ready for execution when their data is generated
    - Message passing efficiency without explicit data transfer & synchronisation!

16 Networks
- Network topology (or shape)
  - Vital to efficient parallel algorithms
  - Communication is the limiting factor!
- Ideal: cross-bar
  - Any-to-any
  - Non-blocking, except for two sources sending to the same receiver
  - Realisable, but only for limited order (number of ports)

17 Networks
- Cross-bars: Achilles
  - 8 x 8
  - Full duplex: simultaneous input and output at each port
  - 32 bit data-path
  - Target: 1Gbyte / second total throughput
  - but we needed the 3-D arrangement to achieve bandwidth and high order

18 Networks
- Cross-bars: Achilles
  - Hardware almost trivial!
    - Single FPGA on each level
  - Programmable
    - VHDL models
    - Several topologies, just by changing the software!

19 Networks - More than 8 PEs
- Simple: use 2 8x8 routers!
- but ... the link between the two routers gets a lot of traffic!

20 Networks - Fat tree
- Problem: high-traffic links between PEs can become a bottleneck
- Solution: fat-tree
  - Links higher up the tree are “fatter”
  - Sustainable bandwidth between all PEs is the same

21 Networks - Performance Metrics
- Metrics for comparing network topologies
  - Diameter: maximum distance between any pair of nodes; determines latency
  - Bisection bandwidth: aggregate bandwidth over any “cut” which divides the network in half; determines throughput
- Crossbar
  - Diameter: 1 (every PE is directly connected to the router, so a single “hop” suffices)
  - Bisection bandwidth: b bytes/sec, where b is the bandwidth of a single link

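22 Networks - Performance Metrics
- Metrics for comparing network topologies
  - To connect n PEs with m x m crossbars
  - Single link bandwidth: b bytes/s
- Simple: n = 14 (2 switches)
  - Diameter: 3
  - Bisection bandwidth: b
[Figure: two linked switches, with the three hops between PEs on different switches numbered 1, 2, 3]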

23 Networks - Performance Metrics
- Fat-tree
  - Diameter: 2 log_m n
    - Height is log_m n
    - Worst case distance: up and down
  - Bisection bandwidth: b n/2 bytes/sec
    - Links are fatter higher up the tree
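As a quick numerical check of the fat-tree figures, the sketch below evaluates them for given n, m and b. The function names are made up for the example, and the height is taken as the smallest integer h with m^h >= n, which equals log_m n when n is an exact power of m.

```python
def tree_height(n, m):
    """Smallest h with m**h >= n: levels of m-way switches above n PEs."""
    height, reach = 0, 1
    while reach < n:
        reach *= m
        height += 1
    return height

def fat_tree_metrics(n, m, b):
    """Diameter in hops and bisection bandwidth in the units of b."""
    height = tree_height(n, m)
    return {
        "diameter": 2 * height,              # worst case: up to the root and down again
        "bisection_bandwidth": b * n / 2,    # fatter upper links keep n/2 links' worth
    }
```

For example, `fat_tree_metrics(64, 8, 1.0)` gives a diameter of 4 and a bisection bandwidth of 32.0, compared with a bisection bandwidth of b for the simple two-switch arrangement.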

24 Networks - Performance Metrics
- Mesh
  - Diameter: 2√n - 2
  - Bisection bandwidth: b√n bytes/sec
  - Order: 4
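The mesh figures can be checked by brute force on a small example. The sketch below assumes a square √n x √n mesh without wrap-around links and an even side length for the bisection cut; the function names are illustrative only.

```python
from collections import deque

def mesh_neighbours(node, side):
    """Up to four neighbours of a (row, col) node in a side x side mesh."""
    row, col = node
    for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < side and 0 <= c < side:
            yield (r, c)

def eccentricity(start, side):
    """Largest hop count from `start` to any other node (breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in mesh_neighbours(node, side):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return max(dist.values())

def mesh_diameter(side):
    return max(eccentricity((r, c), side)
               for r in range(side) for c in range(side))

def mesh_bisection_links(side):
    """Links crossing a vertical cut that splits the mesh into two equal halves."""
    left_cols = side // 2
    return sum(1
               for r in range(side) for c in range(side)
               for (_, nc) in mesh_neighbours((r, c), side)
               if c < left_cols <= nc)
```

For a 4 x 4 mesh (n = 16) this gives a diameter of 6 = 2√16 - 2 and 4 = √16 links across the cut, so the bisection bandwidth is 4b.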

25 Networks - Performance Metrics
- Hypercube
  - Hypercube of order m: link 2 order m-1 hypercubes with 2^(m-1) links
  - Number of PEs: n = 2^m
  - Order: log_2 n = m
[Figures: order 2 hypercube; order 3 hypercube]

26 Networks - Hypercubes
- Embedding property
  - In an n PE hypercube, we have hypercubes of size n/2, n/4, ...
  - Number the PEs with binary numbers: 000, 001, 010, 011, 100, ...
  - Joining two hypercubes adds one binary digit to the numbering
  - Each PE is connected to every PE whose index differs in only one bit
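The binary numbering makes neighbour and distance calculations one-liners. In the sketch below the function names are made up for the example, and `route` uses dimension-order routing (correcting differing bits from the lowest upwards), which is one possible routing choice rather than something stated on the slide.

```python
def neighbours(pe, order):
    """PEs whose index differs from `pe` in exactly one bit."""
    return [pe ^ (1 << bit) for bit in range(order)]

def hops(src, dst):
    """Minimum hop count: the number of bit positions in which the indices differ."""
    return bin(src ^ dst).count("1")

def route(src, dst, order):
    """One shortest path: fix the differing bits one at a time, lowest bit first."""
    path = [src]
    current = src
    for bit in range(order):
        if (current ^ dst) & (1 << bit):
            current ^= 1 << bit
            path.append(current)
    return path
```

In an order 3 hypercube, PE 000 is connected to 001, 010 and 100, and the farthest PE, 111, is 3 hops away, so the diameter equals the order.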

27 Networks - Hypercubes
- Embedding property
  - Partitioning tasks
    - Allocate to sub-cubes
    - Sub-tasks allocated to sub-cubes of that cube, etc
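One way to picture the partitioning is to fix the leading bits of the PE index: every choice of prefix selects a smaller hypercube. The sketch below assumes this prefix scheme and uses hypothetical function names; other ways of choosing sub-cubes are equally valid.

```python
def subcube(prefix, prefix_bits, order):
    """PEs of the sub-cube whose top `prefix_bits` index bits equal `prefix`."""
    free_bits = order - prefix_bits
    return [(prefix << free_bits) | low for low in range(1 << free_bits)]

def partition(order, depth):
    """Split an order-`order` hypercube into 2**depth sub-cubes of order `order - depth`."""
    return [subcube(prefix, depth, order) for prefix in range(1 << depth)]
```

For an order 3 hypercube, `partition(3, 1)` yields the two order 2 sub-cubes [0, 1, 2, 3] and [4, 5, 6, 7] for two tasks, and `partition(3, 2)` splits each of those again for their sub-tasks.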

