Intel MMX™ Technology Accelerating 3D Geometry Transformation

Implementing & Accelerating 3D Geometry Transformations with MMX™ Technology
Pei Qi & Yang Wang
Electrical & Computer Engineering, University of Wisconsin-Madison
May 2, 2006
ECE734 VLSI Array Structure for Digital Signal Processing
Instructed by: Professor Hu

Outline
Background
Motivation
Implementation & Optimization
  1. Identify the critical part of the code
  2. Reversing (renaming) registers to find more paired instructions
  3. Re-ordering instructions to break the dependency chains
Simulation
  1. Correctness
  2. Performance improvement (execution time)
Conclusion

What is 3D Geometry Transformation?
Changing the position, orientation, or size of an object in 3D space (X, Y, Z axes) by translation, rotation, and shrinking/expanding (scaling).

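Representation of a 3D object
Each vertex is a vector (X, Y, Z, W), where X, Y, Z are the coordinates and W carries perspective-correction information. A transformation (translation, rotation, scaling) is applied as a matrix-vector multiplication, transformed vertex = transformation matrix × original vertex:

$$
\begin{bmatrix} X' \\ Y' \\ Z' \\ W' \end{bmatrix}
=
\begin{bmatrix}
a_{00} & a_{01} & a_{02} & a_{03} \\
a_{10} & a_{11} & a_{12} & a_{13} \\
a_{20} & a_{21} & a_{22} & a_{23} \\
a_{30} & a_{31} & a_{32} & a_{33}
\end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix}
$$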
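As a worked example using the standard homogeneous-coordinate forms (not reproduced from the slides), translation by $(t_x, t_y, t_z)$ and scaling by $(s_x, s_y, s_z)$ of a vertex with $W = 1$ are:

```latex
\begin{bmatrix} X + t_x \\ Y + t_y \\ Z + t_z \\ 1 \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & t_x \\
0 & 1 & 0 & t_y \\
0 & 0 & 1 & t_z \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix},
\qquad
\begin{bmatrix} s_x X \\ s_y Y \\ s_z Z \\ 1 \end{bmatrix}
=
\begin{bmatrix}
s_x & 0 & 0 & 0 \\
0 & s_y & 0 & 0 \\
0 & 0 & s_z & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
```

Translation can only be written as a matrix product because of the fourth component, which is why each vertex in the MMX code is stored as X, Y, Z, 1.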

Each matrix-vector multiplication amounts to a series of vector-vector multiplications (one dot product per matrix row), each of which is a series of scalar multiplies and adds:
- 16 multiplies and 12 adds per vertex (4 rows × (4 multiplies + 3 adds))
- Lots of inherent parallelism: the four row products are independent of each other, and so are the four multiplies within each row
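For reference, a plain scalar version makes the operation count concrete. This is a hypothetical C sketch (the names `mat4_mul_vec4` and `op_count` are illustrative, not from the slides) that counts each multiply and add:

```c
#include <stdint.h>

/* Running totals of scalar operations (illustrative instrumentation). */
typedef struct {
    int multiplies;
    int adds;
} op_count;

/* Scalar 4x4 matrix times 4-vector, out = m * v, with m stored row-major.
 * Each output is the dot product of one matrix row with v: 4 multiplies
 * and 3 adds per row, so 16 multiplies and 12 adds per vertex. */
void mat4_mul_vec4(const int32_t m[16], const int32_t v[4],
                   int32_t out[4], op_count *ops)
{
    for (int row = 0; row < 4; ++row) {
        int32_t sum = m[4 * row] * v[0];   /* first product needs no add */
        ops->multiplies += 1;
        for (int col = 1; col < 4; ++col) {
            sum += m[4 * row + col] * v[col];
            ops->multiplies += 1;
            ops->adds += 1;
        }
        out[row] = sum;
    }
}
```

With a 4×4 matrix the counters end at 16 multiplies and 12 adds. The four iterations of the outer loop do not depend on each other, which is the parallelism the MMX version exploits.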

PMADDWD instruction – Packed Multiply and Add
PMADDWD multiplies four pairs of signed 16-bit words and adds adjacent products into two 32-bit results:
  operands: a3 a2 a1 a0 and b3 b2 b1 b0
  high doubleword: a3*b3 + a2*b2
  low doubleword: a1*b1 + a0*b0
One PMADDWD per matrix row performs all four of that row's multiplies and two of its three adds, so a vertex needs four PMADDWD instructions (plus one PADDD per row to add the two partial sums) instead of 16 scalar multiplies and 12 scalar adds.
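The lane arithmetic of PMADDWD can be checked with a small portable C model. `pmaddwd_model` is a hypothetical name, and the code mimics the instruction's results rather than using the instruction itself:

```c
#include <stdint.h>

/* Portable model of PMADDWD's lane arithmetic (not the instruction itself).
 * Four signed 16-bit pairs are multiplied and adjacent products are added:
 *   dst[0] = a[0]*b[0] + a[1]*b[1]   (low doubleword)
 *   dst[1] = a[2]*b[2] + a[3]*b[3]   (high doubleword) */
void pmaddwd_model(int32_t dst[2], const int16_t a[4], const int16_t b[4])
{
    for (int half = 0; half < 2; ++half) {
        /* Each 16x16-bit product fits in 32 bits. */
        int32_t p0 = (int32_t)a[2 * half] * b[2 * half];
        int32_t p1 = (int32_t)a[2 * half + 1] * b[2 * half + 1];
        /* Add in unsigned arithmetic so the single overflowing case (all four
         * words of a half equal to -32768) wraps to 0x80000000 as on the CPU,
         * instead of being undefined behaviour in C. */
        dst[half] = (int32_t)((uint32_t)p0 + (uint32_t)p1);
    }
}
```

Passing one matrix row as `a` and a vertex (X, Y, Z, 1) as `b` yields the two partial sums that the following instructions combine.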

Other MMX instructions used:
- MOVQ – Move Quadword (64 bits)
- PSRLQ – Packed Shift Right Logical (the whole 64-bit quadword)
- PSRAD – Packed Shift Right Arithmetic (each signed 32-bit doubleword)
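MOVQ is a plain 64-bit copy; the two shifts are easier to see in a small model. This is a hedged C sketch that treats a 64-bit MMX register as a `uint64_t` (the `_model` and `sar32` names are hypothetical):

```c
#include <stdint.h>

/* PSRLQ: logical right shift of the whole 64-bit quadword.
 * Counts greater than 63 clear the register. */
uint64_t psrlq_model(uint64_t x, unsigned count)
{
    return count > 63 ? 0 : x >> count;
}

/* Arithmetic right shift of one signed 32-bit value, written so that it
 * does not depend on the implementation-defined behaviour of >> on
 * negative numbers in C. */
static int32_t sar32(int32_t v, unsigned count)
{
    return v < 0 ? ~(~v >> count) : v >> count;
}

/* PSRAD: arithmetic right shift of each of the two signed 32-bit lanes.
 * Counts greater than 31 behave like 31 (each lane becomes 0 or -1). */
uint64_t psrad_model(uint64_t x, unsigned count)
{
    if (count > 31)
        count = 31;
    uint32_t lo = (uint32_t)sar32((int32_t)(uint32_t)x, count);
    uint32_t hi = (uint32_t)sar32((int32_t)(uint32_t)(x >> 32), count);
    return (uint64_t)hi << 32 | lo;
}
```

In the loop, `psrlq mm5, 32` moves the high partial sum into the low doubleword, and `psrad ..., 13` divides a sum by 2^13 while keeping its sign (rounding toward minus infinity).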

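Computing one matrix row per vertex with PMADDWD:

```asm
    mov     eax, [esp+4]    ; pointer to the matrix rows
    mov     ebx, [esp+8]    ; pointer to the input vertices
    mov     ecx, [esp+12]   ; number of vertices
    mov     edx, [esp+16]   ; pointer to the output
    movq    mm0, [eax]      ; mm0 = a0 a1 a2 a3 (16 bits each, a0 in the low word)
NextVect:
    movq    mm3, [ebx]      ; mm3 = X Y Z 1 (X in the low word)
    movq    mm4, mm3
    pmaddwd mm4, mm0        ; low dword = a0*X + a1*Y, high dword = a2*Z + a3*1
    movq    mm5, mm4
    psrlq   mm5, 32         ; move the high partial sum into the low dword
    paddd   mm4, mm5        ; low dword = (a0*X + a1*Y) + (a2*Z + a3*1)
    movd    [edx], mm4      ; store the result
```

Each 64-bit register holds four 16-bit words: the matrix row a0 a1 a2 a3 and the vertex X Y Z 1. PMADDWD produces the two partial sums a0*X + a1*Y and a2*Z + a3*1, and the shift and add combine them into (a0*X + a1*Y) + (a2*Z + a3*1).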
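The register-level data flow of this loop body can be traced with a portable C model. It is a sketch, not the authors' code: `pack4`, `word`, `pmaddwd`, `paddd`, and `row_times_vertex` are hypothetical helpers that emulate the instructions on a `uint64_t`:

```c
#include <stdint.h>

/* A uint64_t stands in for a 64-bit MMX register, with word 0
 * (a0 or X) in the lowest 16 bits. */
uint64_t pack4(int16_t w0, int16_t w1, int16_t w2, int16_t w3)
{
    return (uint64_t)(uint16_t)w0 | (uint64_t)(uint16_t)w1 << 16 |
           (uint64_t)(uint16_t)w2 << 32 | (uint64_t)(uint16_t)w3 << 48;
}

static int32_t word(uint64_t r, int i)
{
    return (int16_t)(uint16_t)(r >> (16 * i));   /* sign-extend word i */
}

/* PMADDWD: low dword = w0*w0' + w1*w1', high dword = w2*w2' + w3*w3'. */
static uint64_t pmaddwd(uint64_t a, uint64_t b)
{
    uint32_t lo = (uint32_t)(word(a, 0) * word(b, 0)) + (uint32_t)(word(a, 1) * word(b, 1));
    uint32_t hi = (uint32_t)(word(a, 2) * word(b, 2)) + (uint32_t)(word(a, 3) * word(b, 3));
    return (uint64_t)hi << 32 | lo;
}

/* PADDD: independent wrap-around adds of the two 32-bit lanes. */
static uint64_t paddd(uint64_t a, uint64_t b)
{
    uint32_t lo = (uint32_t)a + (uint32_t)b;
    uint32_t hi = (uint32_t)(a >> 32) + (uint32_t)(b >> 32);
    return (uint64_t)hi << 32 | lo;
}

/* One row of the transform, following the loop body instruction by
 * instruction; the return value is the doubleword that movd stores. */
int32_t row_times_vertex(uint64_t row /* mm0 */, uint64_t vertex /* mm3 */)
{
    uint64_t mm4 = vertex;           /* movq    mm4, mm3 */
    mm4 = pmaddwd(mm4, row);         /* pmaddwd mm4, mm0 */
    uint64_t mm5 = mm4;              /* movq    mm5, mm4 */
    mm5 >>= 32;                      /* psrlq   mm5, 32  */
    mm4 = paddd(mm4, mm5);           /* paddd   mm4, mm5 */
    return (int32_t)(uint32_t)mm4;   /* movd    [edx], mm4 */
}
```

For the row (1, 2, 3, 4) and the vertex (10, 20, 30, 1), PMADDWD leaves 50 in the low doubleword and 94 in the high one, and the shift and add produce 144.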

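Reversing (renaming) registers to get more paired instructions

Original loop:

```asm
NextVect:
    movq    mm3, [ebx]      ; load vertex X Y Z 1
    movq    mm4, mm3
    pmaddwd mm4, mm0        ; row 0
    movq    mm5, mm4
    psrlq   mm5, 32
    paddd   mm4, mm5
    psrad   mm4, 13         ; divide the sum by 2^13
    movd    [edx], mm4      ; 4-byte store; the next store overwrites its upper 2 bytes
    movq    mm4, mm3
    pmaddwd mm4, mm1        ; row 1
    movq    mm5, mm4
    psrlq   mm5, 32
    paddd   mm4, mm5
    psrad   mm4, 13
    movd    [edx+2], mm4
    movq    mm4, mm3
    pmaddwd mm4, mm2        ; row 2
    movq    mm5, mm4
    psrlq   mm5, 32
    paddd   mm4, mm5
    psrad   mm4, 13
    movd    [edx+4], mm4    ; upper 2 bytes fall past this 6-byte record
    add     ebx, 8          ; next vertex (four 16-bit words)
    add     edx, 6          ; next output record (three 16-bit words)
    dec     ecx
    jnz     NextVect
```

With reversed registers:

```asm
NextVect:
    movq    mm4, [ebx]      ; load vertex into mm4
    movq    mm3, mm4        ; keep a copy in mm3 for rows 1 and 2
    pmaddwd mm4, mm0        ; row 0
    movq    mm5, mm4
    psrlq   mm4, 32         ; shift the original, add into the copy
    paddd   mm5, mm4
    psrad   mm5, 13
    movd    [edx], mm5
    movq    mm4, mm3
    pmaddwd mm4, mm1        ; row 1
    movq    mm5, mm4
    psrlq   mm4, 32
    paddd   mm5, mm4
    psrad   mm5, 13
    movd    [edx+2], mm5
    movq    mm4, mm3
    pmaddwd mm4, mm2        ; row 2
    movq    mm5, mm4
    psrlq   mm4, 32
    paddd   mm5, mm4
    psrad   mm5, 13
    movd    [edx+4], mm5
    add     ebx, 8
    add     edx, 6
    dec     ecx
    jnz     NextVect
```

In the original loop, adjacent pairs such as movq mm4, mm3 / pmaddwd mm4, mm0 and movq mm5, mm4 / psrlq mm5, 32 have a read-after-write dependency: the second instruction needs the register the first one has just written. Swapping the roles of the two registers (movq mm3, mm4 / pmaddwd mm4, mm0 and movq mm5, mm4 / psrlq mm4, 32) removes that dependency, so these pairs can be issued together.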
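The dependency rule behind the register reversal can be sketched in C. This assumes a simplified pairing rule (the second instruction of a pair may not read or write the register the first one writes) and ignores the other pairing restrictions; `insn` and `has_pairing_hazard` are hypothetical names:

```c
#include <stdbool.h>
#include <string.h>

/* Simplified register view of one MMX instruction: the destination and up to
 * two source registers ("" when unused). Memory operands and immediates are
 * left out. */
typedef struct {
    const char *dst;
    const char *src1;
    const char *src2;
} insn;

static bool same_reg(const char *a, const char *b)
{
    return a[0] != '\0' && strcmp(a, b) == 0;
}

/* True if `second` reads or writes the register that `first` writes
 * (read-after-write or write-after-write). Under the simplified rule
 * assumed here, such a pair cannot issue together, while a
 * write-after-read pair can. Execution-unit limits (for example, one MMX
 * multiply or one MMX shift per cycle) are not modelled. */
bool has_pairing_hazard(insn first, insn second)
{
    return same_reg(first.dst, second.src1) ||
           same_reg(first.dst, second.src2) ||
           same_reg(first.dst, second.dst);
}
```

Applied to the two instruction pairs changed above, the check reports a hazard for the original order and none after the registers are reversed.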

Re-ordering instructions to break the dependency chains

Starting from the register-reversed loop, the three rows are interleaved so that their independent multiplies and shift/add steps follow one another instead of each row waiting on its own previous instruction:

```asm
NextVect:
    movq    mm3, [ebx]      ; vertex copy for row 0
    movq    mm4, mm3        ; vertex copy for row 1
    movq    mm5, mm4        ; vertex copy for row 2
    pmaddwd mm3, mm0        ; the three multiplies are independent
    pmaddwd mm4, mm1
    pmaddwd mm5, mm2
    movq    mm6, mm3
    psrlq   mm3, 32
    paddd   mm3, mm6        ; row 0 sum in the low dword
    movq    mm6, mm4
    psrlq   mm4, 32
    paddd   mm4, mm6        ; row 1 sum
    movq    mm6, mm5
    psrlq   mm5, 32
    paddd   mm5, mm6        ; row 2 sum
    psrad   mm3, 13
    psrad   mm4, 13
    psrad   mm5, 13
    movd    [edx], mm3
    movd    [edx+2], mm4
    movd    [edx+4], mm5
    add     edx, 6
    add     ebx, 8
    dec     ecx
    jnz     NextVect
```
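A portable C model of the reordered loop body is below, as a sketch for checking results off-target. It assumes Q13 fixed-point matrix coefficients (8192 == 1.0), which the shift by 13 suggests but the slides do not state, and every name in it is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* A uint64_t stands in for each MMX register; word 0 is the lowest 16 bits. */
uint64_t pack_words(int16_t w0, int16_t w1, int16_t w2, int16_t w3)
{
    return (uint64_t)(uint16_t)w0 | (uint64_t)(uint16_t)w1 << 16 |
           (uint64_t)(uint16_t)w2 << 32 | (uint64_t)(uint16_t)w3 << 48;
}

static uint64_t join32(uint32_t lo, uint32_t hi) { return (uint64_t)hi << 32 | lo; }

static uint64_t pmaddwd(uint64_t a, uint64_t b)
{
    uint32_t p[4];
    for (int i = 0; i < 4; ++i) {
        int32_t wa = (int16_t)(uint16_t)(a >> (16 * i));
        int32_t wb = (int16_t)(uint16_t)(b >> (16 * i));
        p[i] = (uint32_t)(wa * wb);
    }
    return join32(p[0] + p[1], p[2] + p[3]);
}

static uint64_t paddd(uint64_t a, uint64_t b)
{
    return join32((uint32_t)a + (uint32_t)b,
                  (uint32_t)(a >> 32) + (uint32_t)(b >> 32));
}

static int32_t sar32(int32_t v, unsigned n)   /* arithmetic shift, n < 32 */
{
    return v < 0 ? ~(~v >> n) : v >> n;
}

static uint64_t psrad(uint64_t a, unsigned n)
{
    return join32((uint32_t)sar32((int32_t)(uint32_t)a, n),
                  (uint32_t)sar32((int32_t)(uint32_t)(a >> 32), n));
}

/* row[0..2] play the role of mm0..mm2; in[] holds one packed (X, Y, Z, 1)
 * vertex per element; out[] receives three 16-bit results per vertex. */
void transform_reordered(const uint64_t row[3], const uint64_t *in,
                         int16_t *out, size_t count)
{
    for (size_t v = 0; v < count; ++v) {
        uint64_t mm3 = in[v];            /* movq mm3, [ebx] */
        uint64_t mm4 = mm3;              /* movq mm4, mm3   */
        uint64_t mm5 = mm4;              /* movq mm5, mm4   */
        mm3 = pmaddwd(mm3, row[0]);      /* three independent multiplies */
        mm4 = pmaddwd(mm4, row[1]);
        mm5 = pmaddwd(mm5, row[2]);
        uint64_t mm6 = mm3; mm3 >>= 32; mm3 = paddd(mm3, mm6);  /* row 0 sum */
        mm6 = mm4;          mm4 >>= 32; mm4 = paddd(mm4, mm6);  /* row 1 sum */
        mm6 = mm5;          mm5 >>= 32; mm5 = paddd(mm5, mm6);  /* row 2 sum */
        mm3 = psrad(mm3, 13);            /* Q13 back to integer */
        mm4 = psrad(mm4, 13);
        mm5 = psrad(mm5, 13);
        /* The overlapping movd stores leave only each low word in place. */
        out[3 * v + 0] = (int16_t)(uint16_t)mm3;
        out[3 * v + 1] = (int16_t)(uint16_t)mm4;
        out[3 * v + 2] = (int16_t)(uint16_t)mm5;
    }
}
```

Because each row works in its own registers, interleaving the rows only changes the order of independent operations, so the stored values are the same as in the row-by-row loop.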

Simulation
1. Correctness (first concern)
To test whether the optimized code yields the correct result, we compared its output with the result of multiplying the two matrices directly in Matlab. The output of the optimized code matched the expected result, which indicates that the optimizations did not affect the correctness of the program.
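A C analogue of this comparison might look like the sketch below. It is hypothetical (the slides used Matlab): it recomputes each output in double precision from Q13 coefficients, an assumption suggested by the shift by 13, and reports the largest absolute difference from the fixed-point results:

```c
#include <stddef.h>
#include <stdint.h>

/* coeff:    3 rows x 4 columns of Q13 coefficients, row-major (value / 8192.0)
 * vertices: 4 words (X, Y, Z, 1) per vertex
 * results:  3 words per vertex, as written by the MMX loop
 * Returns the largest absolute difference from the double-precision product. */
double max_abs_error(const int16_t coeff[12], const int16_t *vertices,
                     const int16_t *results, size_t count)
{
    double worst = 0.0;
    for (size_t v = 0; v < count; ++v) {
        for (int r = 0; r < 3; ++r) {
            double expected = 0.0;
            for (int c = 0; c < 4; ++c)
                expected += coeff[4 * r + c] / 8192.0 * vertices[4 * v + c];
            double got = results[3 * v + r];
            double err = expected > got ? expected - got : got - expected;
            if (err > worst)
                worst = err;
        }
    }
    return worst;
}
```

Because PSRAD rounds toward minus infinity, each fixed-point result can be up to (but not including) 1 below the exact value, so a check of this kind would accept errors below 1.0 rather than require an exact match.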

Simulation
2. Performance improvement (execution time)

Test case         Un-optimized (s)   Optimized (s)
320 vertices      0.644              0.6118
3200 vertices     5.6943             5.601
32000 vertices    78.0654            77.0027
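The size of the improvement is easier to judge as a relative reduction. A small C helper (hypothetical names) computes it from the measured times:

```c
#include <stddef.h>

/* Measured times from the table above (seconds). */
typedef struct {
    int vertices;
    double unoptimized;
    double optimized;
} timing;

static const timing measurements[] = {
    {320, 0.644, 0.6118},
    {3200, 5.6943, 5.601},
    {32000, 78.0654, 77.0027},
};

/* Relative reduction in execution time, in percent. */
double percent_reduction(double before, double after)
{
    return 100.0 * (before - after) / before;
}

/* Percent reduction for measurement i (0, 1 or 2). */
double measured_reduction(size_t i)
{
    return percent_reduction(measurements[i].unoptimized, measurements[i].optimized);
}
```

The reductions work out to about 5.0%, 1.6%, and 1.4% for 320, 3200, and 32000 vertices.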

Conclusion
The new instructions in MMX™ Technology can effectively accelerate 3D geometry transformation. In addition, instruction-level optimizations such as reversing registers and re-ordering instructions further reduced execution time (by roughly 1% to 5% in the tests above) without sacrificing the correctness of the code.

Thank you