1 Trident Processor: A Scalable Architecture for Scalar, Vector, and Matrix Operations
Eng. M. Soliman
Prof. S. Sedukhin
2 Contents
- The factors that impact processor architecture
- The idea of our proposed Trident processor
- The Trident parallelism
- The architecture of the Trident processor
- The features of the Trident processor
- Conclusion and future work
3 The Important Factors That Impact the Processor Architecture
[Diagram labels: Technology; Application Characteristics; Processor Architecture]
4 Fast-improving Technology
Moore's law: the number of transistors per integrated circuit doubles roughly every 18 months.
5 Application Characteristics
In response to the increasing importance of multimedia applications, major processor vendors have announced extensions to their general-purpose processors in an effort to improve their multimedia performance.

Processor | Multimedia extension
Intel Pentium II, III, and 4 | MMX, SSE, and SSE2
Motorola PowerPC | AltiVec
Silicon Graphics MIPS | MDMX
Sun SPARC | VIS
Hewlett-Packard PA-RISC | MAX
6 The Idea of the Trident Processor
- The huge transistor budget: within a few years it will be possible to integrate a billion transistors on a single chip.
- The requirements of future applications: scientific, engineering, and multimedia applications, among others, are based on vector and matrix operations.
7 We Propose the Trident Processor
Trident: a general-purpose processor with three instruction sets (IS): scalar, vector, and matrix.
- Scalar IS (1 operation)
- Vector IS (n operations)
- Matrix IS (n^2/n^3 operations)
8 The Trident Instruction Sets
Ins. set | Example | Scalar code | Scalar ops
Scalar | Addition | z=x+y; | 1
Vector | Addition | for(i=0;i<n;i++) z[i]=x[i]+y[i]; | O(n)
Vector | Dot product | s=0; for(i=0;i<n;i++) s+=x[i]*y[i]; | O(n)
Matrix | Addition | for(i=0;i<n;i++) for(j=0;j<n;j++) z[i][j]=x[i][j]+y[i][j]; | O(n^2)
Matrix | Matrix-vector multiplication | for(i=0;i<n;i++){ s=0; for(j=0;j<n;j++) s+=x[i][j]*y[j]; z[i]=s; } | O(n^2)
Matrix | Matrix-matrix multiplication | for(i=0;i<n;i++) for(j=0;j<n;j++){ s=0; for(k=0;k<n;k++) s+=x[i][k]*y[k][j]; z[i][j]=s; } | O(n^3)
9 The Trident Parallelism
The Trident processor exploits a significant amount (up to three levels) of data parallelism.
The advantages of using data parallelism:
- Compact: a single short instruction can describe an array of scalar operations.
- Expressive: a single instruction can pass valuable information about an array of scalar operations to the hardware.
- Scalable: adding more hardware can increase performance by processing longer arrays.
10 The Trident Architecture
11 Vector Processing
- A vector pipeline can perform the fundamental vector operations, such as addition, subtraction, multiplication, and division.
- Vector data are stored in ring vector registers.
- Multiple vector instructions can execute concurrently on the parallel vector pipelines.
12 Example: vector addition
VR2 ← VR0 + VR1

Step | Input | Output
0 | a0, b0 | (none)
1 | a1, b1 | a0 + b0
2 | a2, b2 | a1 + b1
3 | a3, b3 | a2 + b2
4 | a0, b0 | a3 + b3
13 Matrix Processing
By using parallel vector pipelines and a ring matrix register file, the fundamental matrix operations, such as addition, subtraction, multiplication, and inversion, can be performed.
14 Example: Matrix addition
MR2 ← MR0 + MR1
Each step feeds one element pair to each of the four pipelines (P0 to P3); each sum appears at the output one step later.

Step | Input | Output
0 | a00, b00; a10, b10; a20, b20; a30, b30 | (none)
1 | a01, b01; a11, b11; a21, b21; a31, b31 | a00 + b00; a10 + b10; a20 + b20; a30 + b30
2 | a02, b02; a12, b12; a22, b22; a32, b32 | a01 + b01; a11 + b11; a21 + b21; a31 + b31
3 | a03, b03; a13, b13; a23, b23; a33, b33 | a02 + b02; a12 + b12; a22 + b22; a32 + b32
15 Matrix-matrix Multiplication
The basic matrix operation is matrix-matrix multiplication.
16 Chaining
[Diagram labels: Matrix-matrix multiplication; Matrix-vector multiplication; Dot product]
17 Matrix-matrix Multiplication Complexity
Category | Scalar IS | Vector IS | Matrix IS
Instructions | O(n^3) | O(n^2) | O(1)
Load | O(n^3) | O(n^3) | O(n^2)
Store | O(n^2) | O(n^2) | O(n^2)
Mul-acc. | O(n^3) | O(n^3) | O(n^2)
Branch | O(n^3) | O(n^2) | 0
Address comp. | O(n^3) | O(n^2) | O(1)
Add/sub. | O(n^3) | O(n^2) | 0
Reg. initialization | O(n^2) | O(n) | 0
18 Matrix-matrix Multiplication: Number of instructions
[Chart; labels: 8 × 8; series: scalar, vector, matrix]
19 Continued
[Chart; series: scalar, vector, matrix. Categories: (1) load, (2) store, (3) multiply-accumulate steps, (4) branch, (5) address computations, (6) addition/subtraction, and (7) register initializations]
20 What does this mean?
- Fewer instruction cache misses
- Fewer instruction fetches and decodes
- Fewer branches and fewer mispredicted branches
- More predictable memory accesses
- Fewer hazards
In short, Trident code is compact code with powerful instructions for high performance.
21 The Trident Processor Features
- The Trident processor consists mainly of datapath circuitry and register files.
- Advances in VLSI fabrication technology can be directly applied to support more parallelism.
- The control unit is simple.
- Many applications can benefit from executing on the Trident processor, including scientific, engineering, and multimedia applications.
22 Future Work
- Simulating the Trident processor
- Evaluating the performance of the Trident processor on some multimedia and numerical applications
- Comparing the performance of the Trident processor with that of superscalar processors
23 Thank you