Implementation of MPEG2 Codec with MMX/SSE/SSE2 Technology
Speakers: Rong Jiang, Xu Jin
Instructor: Yu-Hen Hu

Outline
  Introduction
    MMX/SSE/SSE2
    MPEG 2 Video Compression
  What we have done
  Conclusion

MMX/SSE/SSE2
  MMX
    57 new instructions;
    8 64-bit wide MMX registers;
    4 new data types (3 packed data types and one 64-bit entity).
  SSE
    8 new 128-bit SIMD floating-point registers;
    50 new instructions that work on packed floating-point data;
    8 new instructions to control data cacheability;
    12 new instructions that extend the MMX instruction set.
  SSE2
    Supports 64-bit (double-precision) floating-point values.

MPEG 2 video compression

Project outline
  1. Find an MPEG2 encoder/decoder C implementation
  2. Generate profiling information
  3. Identify the kernels
  4. Rewrite kernels using SSE
  5. Performance results

Profiling results of the original code
  [Figure: profiling charts for mpeg2decode and mpeg2encode; labeled kernels: idct(), dist1(), fdct()]

Example 1 – optimizing dist1()

This code segment computes the sum of absolute differences over one 16-pixel row. It is used to calculate residual matrices in the prediction stage of the encoder.

Original C code:

    if ((v = p1[0] - p2[0])<0) v = -v; s+= v;
    if ((v = p1[1] - p2[1])<0) v = -v; s+= v;
    if ((v = p1[2] - p2[2])<0) v = -v; s+= v;
    if ((v = p1[3] - p2[3])<0) v = -v; s+= v;
    if ((v = p1[4] - p2[4])<0) v = -v; s+= v;
    if ((v = p1[5] - p2[5])<0) v = -v; s+= v;
    if ((v = p1[6] - p2[6])<0) v = -v; s+= v;
    if ((v = p1[7] - p2[7])<0) v = -v; s+= v;
    if ((v = p1[8] - p2[8])<0) v = -v; s+= v;
    if ((v = p1[9] - p2[9])<0) v = -v; s+= v;
    if ((v = p1[10] - p2[10])<0) v = -v; s+= v;
    if ((v = p1[11] - p2[11])<0) v = -v; s+= v;
    if ((v = p1[12] - p2[12])<0) v = -v; s+= v;
    if ((v = p1[13] - p2[13])<0) v = -v; s+= v;
    if ((v = p1[14] - p2[14])<0) v = -v; s+= v;
    if ((v = p1[15] - p2[15])<0) v = -v; s+= v;

SSE2 version (GCC inline assembly):

    asm volatile (
        "movdqu  (%1), %%xmm0\n\t"     /* load 16 pixels from p1 */
        "movdqu  (%2), %%xmm1\n\t"     /* load 16 pixels from p2 */
        "psadbw  %%xmm0, %%xmm1\n\t"   /* two partial sums: low and high quadword */
        "movdq2q %%xmm1, %%mm0\n\t"    /* low partial sum */
        "psrldq  $8, %%xmm1\n\t"       /* shift right to bring the high partial sum down */
        "movdq2q %%xmm1, %%mm1\n\t"    /* high partial sum */
        "paddd   %%mm1, %%mm0\n\t"     /* add the two partial sums */
        "movd    %%mm0, %0\n\t"
        "emms"                         /* leave MMX state before any x87 code */
        : "=r"(s)
        : "r"(p1), "r"(p2)
        : "xmm0", "xmm1", "mm0", "mm1", "memory");

4-5X speed-up, but it can be faster!
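The same row computation can also be written with compiler intrinsics instead of inline assembly. The following is only a sketch, not the code from the project: the function names sad16_scalar and sad16_sse2 are hypothetical, and the intrinsic path assumes a compiler that provides <emmintrin.h> and defines __SSE2__ (otherwise it falls back to the scalar loop).

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Scalar reference: sum of absolute differences over one 16-pixel row. */
static int sad16_scalar(const uint8_t *p1, const uint8_t *p2)
{
    assert(p1 && p2);
    int s = 0;
    for (int i = 0; i < 16; i++) {
        int v = p1[i] - p2[i];
        s += (v < 0) ? -v : v;
    }
    return s;
}

/* Same row using PSADBW through intrinsics (hypothetical helper name). */
static int sad16_sse2(const uint8_t *p1, const uint8_t *p2)
{
    assert(p1 && p2);
#if defined(__SSE2__)
    __m128i a   = _mm_loadu_si128((const __m128i *)p1);  /* unaligned 16-byte load */
    __m128i b   = _mm_loadu_si128((const __m128i *)p2);
    __m128i sad = _mm_sad_epu8(a, b);       /* one partial sum per 64-bit half */
    __m128i hi  = _mm_srli_si128(sad, 8);   /* byte shift right: high half to low half */
    return _mm_cvtsi128_si32(_mm_add_epi32(sad, hi));
#else
    return sad16_scalar(p1, p2);            /* no SSE2 on this target */
#endif
}
```

Keeping the scalar routine next to the SIMD one gives a reference to compare against when testing.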

Four ways to write super-fast code
  Rearrange data fetching to maximize cache hits;
  Unroll loops to eliminate unnecessary branches;
  Utilize SSE instructions to take full advantage of parallelism;
  Apply code scheduling to exploit the multiple-issue capability of the Pentium 4's superscalar micro-architecture.

Example 2 – optimizing idct()

Three nested loops form the kernel of the DCT:

    for (i=0; i<8; i++)
        for (j=0; j<8; j++) {
            partial_product = 0.0;
            for (k=0; k<8; k++)
                partial_product += c[k][j]*block[i][k];
            tmp[i][j] = partial_product;
        }
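One way to expose parallelism in this loop nest is to compute a whole output row at once: broadcast block[i][k] and multiply it by row k of c, four floats per SSE register. The sketch below is an illustration under stated assumptions, not the project's kernel: it assumes single-precision data, the names matmul8_scalar and matmul8_sse are hypothetical, and the SSE path is compiled only when __SSE__ is defined.

```c
#include <assert.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics */
#endif

/* Scalar reference: tmp = block * c, same loop nest as on the slide. */
static void matmul8_scalar(float block[8][8], float c[8][8], float tmp[8][8])
{
    assert(block && c && tmp);
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++) {
            float acc = 0.0f;
            for (int k = 0; k < 8; k++)
                acc += c[k][j] * block[i][k];
            tmp[i][j] = acc;
        }
}

/* SSE sketch: the j loop is replaced by two 4-wide vector accumulators. */
static void matmul8_sse(float block[8][8], float c[8][8], float tmp[8][8])
{
    assert(block && c && tmp);
#if defined(__SSE__)
    for (int i = 0; i < 8; i++) {
        __m128 lo = _mm_setzero_ps();   /* accumulates tmp[i][0..3] */
        __m128 hi = _mm_setzero_ps();   /* accumulates tmp[i][4..7] */
        for (int k = 0; k < 8; k++) {
            __m128 b = _mm_set1_ps(block[i][k]);  /* broadcast one coefficient */
            lo = _mm_add_ps(lo, _mm_mul_ps(b, _mm_loadu_ps(&c[k][0])));
            hi = _mm_add_ps(hi, _mm_mul_ps(b, _mm_loadu_ps(&c[k][4])));
        }
        _mm_storeu_ps(&tmp[i][0], lo);
        _mm_storeu_ps(&tmp[i][4], hi);
    }
#else
    matmul8_scalar(block, c, tmp);      /* no SSE on this target */
#endif
}
```

Reading c row by row also keeps memory accesses contiguous, which fits the cache advice given earlier in the talk.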

A verbatim translation from C to assembly doesn’t do much better. It misses the whole point of manually writing an assembly procedure.

We need parallelism!

Results
  [Figure: results chart; extracted labels: 50.1s, 16.34s, 2.45s, 3.83s; 68.72%, 13.04%, 34.39%, 9.99%]
  Experimental results are averaged over 3 runs.
  Speed-up: 25X in idct(), 4X in dist1()

Platform Compatibility (1)
Algorithm for Checking Availability of MMX

    bool isMMXSupported()
    {
        int fSupported;
        asm {
            mov eax, 1            // CPUID level 1
            cpuid                 // EDX = feature flag
            and edx, 0x800000     // test bit 23 of feature flag
            mov fSupported, edx   // != 0 if MMX is supported
        }
        if (fSupported != 0) return true;
        else return false;
    }
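For GCC or Clang, where the block-style inline assembly above is not accepted, the same CPUID leaf-1 test can be sketched with the compiler's <cpuid.h> helper. This is an assumption about the toolchain, not part of the original project; the name cpu_has_feature and the FEATURE_* constants are hypothetical. Bit 23 (MMX) matches the slide; bits 25 (SSE) and 26 (SSE2) are the corresponding EDX feature bits.

```c
#include <assert.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* Bit positions in EDX returned by CPUID leaf 1. */
enum { FEATURE_MMX = 23, FEATURE_SSE = 25, FEATURE_SSE2 = 26 };

/* Returns 1 if the given leaf-1 EDX feature bit is set, 0 otherwise. */
static int cpu_has_feature(int bit)
{
    assert(bit >= 0 && bit < 32);
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                      /* leaf 1 not available */
    return (int)((edx >> bit) & 1u);
#else
    (void)bit;
    return 0;                          /* not an x86 target */
#endif
}
```

A call such as cpu_has_feature(FEATURE_MMX) then plays the role of isMMXSupported().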

Platform Compatibility (2)
Algorithm for Checking Availability of SSE

    bool isISSESupported()
    {
        int processor;
        int features;
        int extfeatures = 0;
        asm {
            pusha
            mov eax, 1
            cpuid
            mov processor, eax      // Store processor family/model/step
            mov features, edx       // Store features bits
            mov eax, 080000000h
            cpuid                   // Check which extended functions can be called
            cmp eax, 080000001h     // Extended Feature Bits
            jb nofeatures           // Jump if not supported
            mov eax, 080000001h     // Select function 0x80000001
            cpuid
            mov extfeatures, edx    // Store extended features bits
        nofeatures:
            popa
        }
        if (((features >> 25) & 1) != 0) return true;
        else if (((extfeatures >> 22) & 1) != 0) return true;
        else return false;
    }

Routine selection (flowchart):
  SSE?  Y: SSE Routine;  N: check MMX
  MMX?  Y: MMX Routine;  N: Normal Routine
  END
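The SSE?/MMX? decision chain on this slide can be sketched as a small selection function. The names routine_t and select_routine are hypothetical, and the two flags are assumed to come from checks such as isISSESupported() and isMMXSupported(); this illustrates the dispatch order only, not the project's actual code.

```c
#include <assert.h>

/* Which implementation of a kernel to call (hypothetical names). */
typedef enum { ROUTINE_NORMAL, ROUTINE_MMX, ROUTINE_SSE } routine_t;

/* Prefer SSE, then MMX, then the plain C routine. */
static routine_t select_routine(int has_sse, int has_mmx)
{
    assert((has_sse == 0 || has_sse == 1) && (has_mmx == 0 || has_mmx == 1));
    if (has_sse)
        return ROUTINE_SSE;
    if (has_mmx)
        return ROUTINE_MMX;
    return ROUTINE_NORMAL;
}
```

Running the selection once at start-up and caching the result avoids repeating the CPUID queries on every kernel call.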

Thank you!
