(Page 554 – 564) Ping Perez CS 147 Summer 2001
Alternative Parallel Architectures Dataflow Systolic arrays Neural networks
To understand how dataflow computers work, it is first necessary to understand dataflow graphs. As a computer program is compiled, it is converted into its equivalent dataflow graph, which shows the data dependencies between statements and is used by the dataflow computer to generate the structures it needs to execute the program.
A code segment and its dataflow graph
1. A ← B + C
2. D ← E + F
3. G ← A + H
4. I ← D + G
5. J ← I + K
[Figure: dataflow graph with inputs B, C, E, F, H, K]
As shown in the figure, each vertex of the graph corresponds to the operation performed by one of the instructions. The directed edges going to a vertex correspond to the operands of the function performed by the vertex, and the directed edge leaving the vertex represents the result generated by the function.
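The mapping from statements to graph edges can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple encoding of statements and the function name `build_dataflow_graph` are hypothetical, every statement is assumed to be an addition, and each variable is assumed to be assigned at most once.

```python
# Each statement is (destination, operand 1, operand 2); all statements use "+".
# This encoding is an assumption made for the sketch, not a compiler format.
statements = [
    ("A", "B", "C"),   # 1. A <- B + C
    ("D", "E", "F"),   # 2. D <- E + F
    ("G", "A", "H"),   # 3. G <- A + H
    ("I", "D", "G"),   # 4. I <- D + G
    ("J", "I", "K"),   # 5. J <- I + K
]

def build_dataflow_graph(stmts):
    """Return (edges, inputs) for a list of (dest, op1, op2) statements.

    edges maps each statement number to the statement numbers that
    consume its result; inputs lists operands that no statement produces.
    """
    producer = {dest: n for n, (dest, _, _) in enumerate(stmts, start=1)}
    edges = {n: [] for n in range(1, len(stmts) + 1)}
    inputs = []
    for n, (_, op1, op2) in enumerate(stmts, start=1):
        for operand in (op1, op2):
            if operand in producer:
                # A directed edge runs from the producing vertex to this one.
                edges[producer[operand]].append(n)
            elif operand not in inputs:
                inputs.append(operand)
    return edges, inputs

edges, inputs = build_dataflow_graph(statements)
print(edges)
print(inputs)
```

For this code segment, the graph inputs come out as B, C, E, F, H, and K, the same values that enter the graph from outside in the figure.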
Single Assignment Rule
This code segment has four violations of the single assignment rule, starting with statement 2. The value stored by this statement, B, was used as an operand in statement 1, so it must be renamed. We can rename it B1 and change all later references to it in this code. Similarly, values C and D, set by statements 3 and 4, are also used as operands in prior statements and must be renamed.
1. A ← B + C
2. B ← A + D
3. C ← A + B
4. D ← C + B
5. A ← A + C
Single Assignment Rule (cont'd)
Finally, statement 5 stores its result in A, the same variable used to store the result of statement 1, so we must also change this variable's name. Note that statements 2, 3, and 5 all use A as an operand. This is not a violation of the single assignment rule: an operand can be used many times.
1. A ← B + C
2. B1 ← A + D
3. C1 ← A + B1
4. D1 ← C1 + B1
5. A1 ← A + C1
[Figure: dataflow graph of the renamed code, with inputs B, C, D]
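The renaming procedure described above can be sketched as a short Python function. This is a minimal sketch, not a compiler pass: the function name `to_single_assignment` and the tuple encoding are hypothetical, and the numeric suffix (B1, C1, ...) simply follows the naming used on the slides.

```python
def to_single_assignment(stmts):
    """Rename destinations so that no name is assigned after it has been
    used as an operand or assigned before."""
    seen = set()      # names already used as operands or destinations
    current = {}      # original name -> its latest renamed version
    counts = {}       # original name -> how many times it was renamed
    result = []
    for dest, op1, op2 in stmts:
        # Operands refer to the latest version of each name.
        ops = [current.get(op, op) for op in (op1, op2)]
        seen.update(ops)
        new_dest = dest
        while new_dest in seen:          # violation: pick a fresh name
            counts[dest] = counts.get(dest, 0) + 1
            new_dest = f"{dest}{counts[dest]}"
        current[dest] = new_dest
        seen.add(new_dest)
        result.append((new_dest, ops[0], ops[1]))
    return result

original = [
    ("A", "B", "C"),
    ("B", "A", "D"),
    ("C", "A", "B"),
    ("D", "C", "B"),
    ("A", "A", "C"),
]
renamed = to_single_assignment(original)
for number, (dest, op1, op2) in enumerate(renamed, start=1):
    print(f"{number}. {dest} <- {op1} + {op2}")
```

Applied to the five-statement segment, the sketch renames the four offending destinations to B1, C1, D1, and A1 and leaves the repeated uses of A as an operand untouched.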
Single Assignment Rule
The dataflow graph describes the dependencies between statements and how data will flow between statements. An edge, however, does not show when data flows from one statement to another. The data that traverses an edge is called a token. When a token is available, it is represented as a dot on the edge.
A vertex is ready to fire, that is, to execute its instruction, when all of its input edges have tokens, meaning that all of the instruction's operands are available.
[Figure: dataflow graph with inputs B, C, D]
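The firing rule can be illustrated with a small Python sketch that groups vertices into "waves": every vertex in a wave has tokens on all of its input edges once the previous waves have fired. The dictionary encoding and the name `firing_waves` are assumptions for illustration, and the sketch ignores timing details of a real dataflow machine.

```python
# Renamed (single-assignment) code segment: destination -> (operand 1, operand 2).
program = {
    "A":  ("B", "C"),
    "B1": ("A", "D"),
    "C1": ("A", "B1"),
    "D1": ("C1", "B1"),
    "A1": ("A", "C1"),
}

def firing_waves(program, initial_tokens):
    """Group vertices into waves; a vertex fires once all operands have tokens."""
    available = set(initial_tokens)   # values that currently carry a token
    pending = dict(program)           # vertices that have not fired yet
    waves = []
    while pending:
        ready = [dest for dest, ops in pending.items()
                 if all(op in available for op in ops)]
        if not ready:
            break                     # the remaining vertices can never fire
        waves.append(ready)
        for dest in ready:
            available.add(dest)       # the result becomes a token for later vertices
            del pending[dest]
    return waves

waves = firing_waves(program, {"B", "C", "D"})
print(waves)
```

With tokens initially on B, C, and D, only the vertex for A can fire at first; the last two statements become ready together, which is the kind of parallelism a dataflow machine can exploit.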
I-Structures
Within the computer system, dataflow vertices are usually stored as I-structures. Each I-structure includes the operation to be performed, its operands, and a list of destinations for its result.
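An I-structure and the dataflow graph with I-structures
[Figure: I-structures for the + vertices. Each shows the operation, two operand slots holding a value (2, 4) or empty parentheses ( ), and a destination list: {2/1}, {2/1, 3/1, 4/2}, {3/2, 4/2}, {4/1, 5/2}.]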
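A record with the three parts named above (operation, operands, destination list) can be sketched in Python as follows. Several things here are assumptions for illustration: the class name `IStructure`, the reading of each destination as a (vertex number, operand position) pair, the input values B = 2, C = 3, D = 4, and the restriction to addition.

```python
from dataclasses import dataclass, field

@dataclass
class IStructure:
    operation: str
    operands: list                     # None marks an operand that has not arrived
    destinations: list = field(default_factory=list)  # (vertex, operand position)

    def ready(self):
        return all(value is not None for value in self.operands)

    def fire(self):
        if self.operation != "+":
            raise ValueError("only + is modeled in this sketch")
        return self.operands[0] + self.operands[1]

def run(vertices):
    """Fire ready vertices until none remain; return {vertex number: result}."""
    results = {}
    progress = True
    while progress:
        progress = False
        for number, vertex in vertices.items():
            if number not in results and vertex.ready():
                results[number] = vertex.fire()
                # Deliver the result token to every listed destination.
                for dest, position in vertex.destinations:
                    vertices[dest].operands[position - 1] = results[number]
                progress = True
    return results

# Renamed code segment with assumed inputs B = 2, C = 3, D = 4.
vertices = {
    1: IStructure("+", [2, 3], [(2, 1), (3, 1), (5, 1)]),   # A  <- B + C
    2: IStructure("+", [None, 4], [(3, 2), (4, 2)]),        # B1 <- A + D
    3: IStructure("+", [None, None], [(4, 1), (5, 2)]),     # C1 <- A + B1
    4: IStructure("+", [None, None]),                       # D1 <- C1 + B1
    5: IStructure("+", [None, None]),                       # A1 <- A + C1
}
results = run(vertices)
print(results)
```

Storing the destinations with the producer means a vertex never searches for its consumers: firing simply copies the result into the listed operand slots, which may in turn make those vertices ready.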
Architectures of dataflow systems
1. Static architectures
2. Dynamic architectures
Static dataflow computer organization
This figure shows the organization of the static dataflow computer. The I-store unit has two sections. The memory section stores the I-structures of the dataflow program.
[Figure: I-store unit (memory section, update/ready section), firing queue, processors]
What Are Systolic Arrays?
A systolic array incorporates several processing elements into a regular structure, such as a linear array or mesh. Each processing element performs a single, fixed function and communicates only with its neighboring processing elements.
A 2 × 2 systolic array to multiply two matrices
[Figure: processing elements 1,1, 1,2, 2,1, and 2,2, each with ports labeled U, L, R, and D]
During the first clock cycle, we input A1,1 to input L and B1,1 to input U of processing element 1,1. This processing element calculates A1,1 B1,1 and adds it to its running total. The running totals of the other processing elements remain 0.
Totals after clock cycle 1:
Element 1,1: A1,1 B1,1
Element 1,2: 0
Element 2,1: 0
Element 2,2: 0
During the second clock cycle, we input A1,2 to L and B2,1 to U of processing element 1,1. This processing element multiplies them and adds the product to its running total, which becomes A1,1 B1,1 + A1,2 B2,1, the final value of C1,1.
Totals after clock cycle 2:
Element 1,1: A1,1 B1,1 + A1,2 B2,1
Element 1,2: A1,1 B1,2
Element 2,1: A2,1 B1,1
Element 2,2: 0
Clock cycle 3 continues the matrix multiplication. Since C1,1 has already been calculated, we input 0 to the inputs of processing element 1,1 so that its running total is not changed. The final values of C1,2 and C2,1 are calculated during this clock cycle, and the first part of C2,2 is generated.
Totals after clock cycle 3:
Element 1,1: A1,1 B1,1 + A1,2 B2,1
Element 1,2: A1,1 B1,2 + A1,2 B2,2
Element 2,1: A2,1 B1,1 + A2,2 B2,1
Element 2,2: A2,1 B1,2
The final value of C2,2 is calculated during clock cycle 4, as shown in the figure. At this point, the multiplication of the two matrices is complete.
Totals after clock cycle 4:
Element 1,1: A1,1 B1,1 + A1,2 B2,1
Element 1,2: A1,1 B1,2 + A1,2 B2,2
Element 2,1: A2,1 B1,1 + A2,2 B2,1
Element 2,2: A2,1 B1,2 + A2,2 B2,2
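The four clock cycles walked through above can be reproduced with a small cycle-by-cycle simulation. This is a sketch under stated assumptions: the function name `systolic_matmul` is hypothetical, each processing element is assumed to pass its L input on through R and its U input on through D one cycle later, and row i of A and column j of B are fed in delayed by i and j cycles respectively, with 0 supplied whenever no real value is due.

```python
def systolic_matmul(A, B):
    """Multiply two n x n matrices on a simulated n x n systolic array."""
    n = len(A)
    total = [[0] * n for _ in range(n)]   # running total in each element
    a_out = [[0] * n for _ in range(n)]   # value each element passes right (R)
    b_out = [[0] * n for _ in range(n)]   # value each element passes down (D)
    for cycle in range(3 * n - 2):        # 4 clock cycles for a 2 x 2 array
        new_a = [[0] * n for _ in range(n)]
        new_b = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if j == 0:
                    k = cycle - i         # row i of A is delayed by i cycles
                    a_in = A[i][k] if 0 <= k < n else 0
                else:
                    a_in = a_out[i][j - 1]    # from the left neighbor
                if i == 0:
                    k = cycle - j         # column j of B is delayed by j cycles
                    b_in = B[k][j] if 0 <= k < n else 0
                else:
                    b_in = b_out[i - 1][j]    # from the neighbor above
                total[i][j] += a_in * b_in
                new_a[i][j] = a_in
                new_b[i][j] = b_in
        a_out, b_out = new_a, new_b       # all elements update in lockstep
    return total

product = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product)
```

Each element only ever reads its left and upper neighbors' outputs from the previous cycle, which mirrors the rule that a processing element communicates only with its neighbors.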
1) Neural networks are different from any other computing structure.
2) They incorporate thousands or millions of simple processing elements called neurons.
3) Each neuron has far less processing power than a CPU.
Unlike traditional computers, which are programmed, neural networks are trained. Training consists of defining system input data and defining the desired system outputs for that input data.
System outputs are generated as a function of the outputs of individual neurons. Each neuron's output, in turn, is a function of the outputs of the neurons to which it is connected. The output of each connected neuron is multiplied by its weighting factor, and all of these weighted values are added together:
weighted value = w1·x1 + w2·x2 + ... + wn·xn (1)
where xi is the output of connected neuron i and wi is its weighting factor.
This value is compared to the threshold value for that neuron. If the weighted value is greater than or equal to the threshold value, the neuron's output is 1; otherwise its output is 0:
output = 1 if weighted value ≥ threshold value, otherwise 0 (2)
[Table: label, weight, and value of each input to neuron N]
Weighted value = 0.7 > 0.65 (N's threshold value)
Since the weighted value, 0.7, is greater than the threshold value, neuron N outputs a logical value of 1.
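The weighted sum and threshold comparison can be sketched as a single Python function. The function name `neuron_output` is hypothetical, and the input values and weights below are made up for illustration; they are chosen only so that the weighted value is about 0.7, and they are not the entries of the table above. Only the threshold of 0.65 comes from the example.

```python
def neuron_output(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    weighted_value = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_value >= threshold else 0

# Hypothetical inputs and weights (not taken from the slide's table).
example_inputs = [1, 0, 1]
example_weights = [0.3, 0.5, 0.4]

fires = neuron_output(example_inputs, example_weights, threshold=0.65)
stays_off = neuron_output([0, 1, 0], example_weights, threshold=0.65)
print(fires, stays_off)
```

In the first call the weighted value is roughly 0.3 + 0.4 = 0.7, which is above the threshold, so the neuron outputs 1; in the second call only the 0.5 weight contributes, so the output is 0.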
Where are neural networks used?
A neural network is not appropriate for general-purpose computing; you won't find a neural network running Windows on a personal computer. Instead, neural networks have found applications in tasks that do not run well on conventional architectures. They are also being used in control systems and artificial intelligence applications.