Download presentation
Presentation is loading. Please wait.
Published byΣεραφείμ Κυπραίος Modified over 5 years ago
1
Intel MMX™ Technology Accelerating 3D Geometry Transformation Implementing & Accelerating 3D Geometry Transformations with MMX™ Technology Pei Qi & Yang Wang Electrical&Computer Engineering University of Wisconsin-Madison May, 2nd, 2006 ECE734 VLSI Array Structure for Digital Signal Processing Instructed by: Professor Hu
2
Implementation&Optimization 1. Identify the critical part of code
Intel MMX™ Technology Accelerating 3D Geometry Transformation Background Motivation Implementation&Optimization 1. Identify the critical part of code 2. Reversing (Renaming) Register to find more paired instructions 3. Re-ordering instructions to break the dependency chains Simulation 1. Correctness 2. Performance improvement (Execution time) Conclusion
3
What is 3D Geometry Transformation?
Intel MMX™ Technology Accelerating 3D Geometry Transformation What is 3D Geometry Transformation? X Translation Rotation Shrink/Expands Y Z
4
Accelerating 3D Geometry Transformation
Intel MMX™ Technology Accelerating 3D Geometry Transformation Representation of 3D object Matrix multiplication of vectors X,Y,Z : Coordinates W: Perspective Corrective Information X Y Z W Original vertex Transformation matrix Transformed vertex Translation Rotation Scaling
5
Accelerating 3D Geometry Transformation
Intel MMX™ Technology Accelerating 3D Geometry Transformation Each matrix-vector multiplication amounts to a series of vector-vector multiplications, each of which is a series of scalar multiplies and adds: - Requires 16 multiplies and 12 adds for each vertex pixel - Lots of inherent parallelism
6
Accelerating 3D Geometry Transformation
Intel MMX™ Technology Accelerating 3D Geometry Transformation PMADDWD instruction – Packed Multiply and Add a3 a2 a1 a0 * * * * b3 b2 b1 b0 a3*b3+a2*b2 a1*b1+a0*b0 One PMADDWD instruction per row in the matrix, such that reduce the previous workload to 4 multiplies and 2 adds for each of vertex pixel
7
Accelerating 3D Geometry Transformation
Intel MMX™ Technology Accelerating 3D Geometry Transformation MOVQ instruction – Move 64 bits PSRLQ – Packed Shift Right Logical PSRAD – Packed Shift Right Arithmetic
8
Accelerating 3D Geometry Transformation
Intel MMX™ Technology Accelerating 3D Geometry Transformation mov eax,[esp]+ 4 mov ebx,[esp]+ 8 mov ecx,[esp]+12 mov edx,[esp]+16 movq mm0, 0[eax] NextVect: movq mm3,[ebx] movq mm4,mm3 pmaddwd mm4,mm0 10 movq mm5,mm4 11 psrlq mm5,32 12 paddd mm4,mm5 13 moved [edx],mm4 16bits a0 a1 a2 a3 X Y Z 1 a0*X + a1*Y a2*Z + a3*1 (a0*X + a1*Y) + (a2*Z + a3*1)
9
Accelerating 3D Geometry Transformation
Intel MMX™ Technology Accelerating 3D Geometry Transformation NextVect: 1 movq mm3, [ebx] 2 movq mm4, mm3 3 pmaddwd mm4, mm0 4 movq mm5, mm4 5 psrlq mm5, 32 6 paddd mm4, mm5 7 psrad mm4, 13 8 movd [edx], mm4 9 movq mm4, mm3 * 10 pmaddwd mm4, mm1 11 movq mm5, mm4 12 psrlq mm5, 32 13 paddd mm4, mm5 14 psrad mm4, 13 15 movd [edx+2], mm4 16 movq mm4, mm3 * 17 pmaddwd mm4,mm2 18 movq mm5, mm4 19 psrlq mm5, 32 20 paddd mm4, mm5 21 movd [edx+4], mm4 22 add ebx, 8 23 add edx, 6 * 24 dec ecx 1 25 jnz NextVect * NextVect: 1 movq mm4, [ebx] 2 movq mm3, mm4 3 pmaddwd mm4, mm0 * 4 movq mm5, mm4 5 psrlq mm4, 32 * 6 paddd mm5, mm4 7 psrad mm5, 13 8 movd [edx], mm5 9 movq mm4, mm3 * 10 pmaddwd mm4, mm1 11 movq mm5, mm4 12 psrlq mm4, 32 * 13 paddd mm5, mm4 14 psrad mm5, 13 15 movd [edx+2], mm5 16 movq mm4, mm3 * 17 pmaddwd mm4, mm2 18 movq mm5, mm4 19 psrlq mm4, 32 * 20 paddd mm5, mm4 21 movd [edx+4], mm5 22 add ebx, 8 23 add edx, 6 * 24 dec ecx 1 25 jnz NextVect * Reversing register to get more paired instructions
10
Re-ordering instructions to break the dependency chains
Intel MMX™ Technology Accelerating 3D Geometry Transformation NextVect: 1 movq mm4, [ebx] 2 movq mm3, mm4 3 pmaddwd mm4, mm 4 movq mm5, mm4 5 psrlq mm4, 6 paddd mm5, mm4 8 movd [edx], mm5 9 movq mm4, mm 10 pmaddwd mm4, mm1 11 movq mm5, mm4 12 psrlq mm4, 13 paddd mm5, mm4 15 movd [edx+2], mm5 16 movq mm4, mm 17 pmaddwd mm4, mm2 18 movq mm5, mm4 19 psrlq mm4, 20 paddd mm5, mm4 21 movd [edx+4], mm5 22 add ebx, 8 23 add edx, 24 dec ecx 1 25 jnz NextVect 1 - 1 NextVect: 1 movq mm3, [ebx] 2 movq mm4, mm3 3 movq mm5, mm4 4 pmaddwd mm3, mm0 1- 1 5 pmaddwd mm4, mm1 6 pmaddwd mm5, mm2 7 movq mm6, mm 8 psrlq mm3, 32 9 paddd mm3, mm6 10 movq mm6, mm 11 psrlq mm4, 32 12 paddd mm4, mm6 13 movq mm6, mm 14 psrlq mm5, 32 15 paddd mm5, mm6 19 movd [edx], mm3 20 movd [edx+2], mm4 21 movd [edx+4], mm5 22 add edx, 6 23 add ebx, 24 dec ecx 1 25 jnz NextVect 1 - 1 Re-ordering instructions to break the dependency chains
11
1. Correctness (first concern)
Intel MMX™ Technology Accelerating 3D Geometry Transformation Simulation 1. Correctness (first concern) To testify if the optimized code yields the correct result, we compare output of optimized code with the result calculated by directly multiplies two matrix in Matlab. We found that the result of optimized code is coincident with what we expected. This proved that the optimization would not influence the correctness of program.
12
2. Performance Improvement (Execution time)
Intel MMX™ Technology Accelerating 3D Geometry Transformation Simulation 2. Performance Improvement (Execution time) Test Case Execution Time (sec) Un-Optimized Optimized Vertices number: 320 0.644 0.6118 Vertices number: 3200 5.6943 5.601 Vertices number: 32000
13
Intel MMX™ Technology Accelerating 3D Geometry Transformation Conclusion The new instructions in MMX™ Technology can effectively accelerate 3D geometry transformation. In addition, we can improve the performance of execution through some optimizations at instruction level, such as reversing register, re-ordering instruction without sacrificing the correctness of code.
14
Thank you Accelerating 3D Geometry Transformation
Intel MMX™ Technology Accelerating 3D Geometry Transformation Thank you ECE734 VLSI Array Structure for Digital Signal Processing Instructed by: Professor Hu
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.