CS/EE 5810 CS/EE 6810 F00: 1 Multimedia
CS/EE 5810 CS/EE 6810 F00: 2 New Architecture Direction
“… media processing will become the dominant force in computer architecture and microprocessor design”
“… new media-rich applications … involve significant real-time processing of continuous media streams and make heavy use of vectors of packed 8-, 16-, and 32-bit integer and f.p.”
  –“How Multimedia Workloads will Change Processor Design,” Diefendorff & Dubey, IEEE Computer (9/97)
Needs include high memory bandwidth, high network bandwidth, continuous media data types, real-time response, fine-grain parallelism
Also significant focus on system bus performance
  –Common bridge to the memory system and I/O
  –Critical performance component for SMP server platforms
CS/EE 5810 CS/EE 6810 F00: 3 Multimedia Workloads
Multimedia
  –Video conferencing
  –Video authoring
  –Animation
  –Games
Algorithms
  –Image compression (JPEG)
  –Video compression (MPEG)
  –3-D graphics
  –Encryption
CS/EE 5810 CS/EE 6810 F00: 4 Multimedia Characteristics
Real-time response
  –Video, audio
Continuous media data types
  –8-16 bits sufficient for many applications
Data parallelism
  –E.g. apply the same operation to a whole image
  –Vector or SIMD work well here
Coarse-grained parallelism
  –E.g. video encoding/decoding, audio encoding/decoding
Small loops
  –Most time spent in a few small kernels
  –Amenable to hand-optimization
High memory bandwidth
  –Video, 3-D graphics
  –Caches not large enough
CS/EE 5810 CS/EE 6810 F00: 5 Multimedia ISA Extensions
HP PA-RISC
  –MAX-2
SUN SPARC
  –VIS
Intel x86
  –MMX
MIPS
  –MDMX
PowerPC
  –Altivec
CS/EE 5810 CS/EE 6810 F00: 6 MMX
“MMX Technology Extension to the Intel Architecture,” Alex Peleg and Uri Weiser, IEEE Micro, August 1996
Goals
  –Improve performance of multimedia applications
    »Graphics, MPEG video
    »Image processing, speech recognition
  –Remain completely compatible with Intel x86 ISA
  –Minimize cost
Approach
  –Use packed data types
  –Exploit SIMD parallelism
  –Make use of existing wide data paths
CS/EE 5810 CS/EE 6810 F00: 7 Data Types and Operands
Three fixed-point integer types packed into a 64-bit quadword
  –Packed Byte: 8 8-bit bytes
  –Packed Word: 4 16-bit words
  –Packed Doubleword: 2 32-bit doublewords
User-controlled fixed point
Eight 64-bit general-purpose MMX registers (mm0-mm7)
MMX shares the FPU (the MMX registers are aliased onto the FP registers)
  –Can’t do FP and MMX at the same time
Random access to registers
  –Lesson learned from the stack-based FP unit design
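The three packed types on this slide can be pictured as three views of the same 64 bits. The sketch below is plain portable C, not MMX code; the type `packed64` and the function `packed_add_words` are names invented here for illustration. It shows a wraparound add applied independently to each of the four 16-bit lanes. Which bytes of `q` correspond to which lane depends on the machine's byte order, so the example only reads lanes through the same view it wrote them with.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration type: one 64-bit quadword seen three ways. */
typedef union {
    uint64_t q;     /* the whole quadword                   */
    uint8_t  b[8];  /* packed byte: eight 8-bit lanes       */
    uint16_t w[4];  /* packed word: four 16-bit lanes       */
    uint32_t d[2];  /* packed doubleword: two 32-bit lanes  */
} packed64;

/* Add the four 16-bit lanes independently, wrapping on overflow.
   A carry out of one lane never spills into its neighbour. */
packed64 packed_add_words(packed64 a, packed64 b)
{
    packed64 r;
    for (int i = 0; i < 4; i++) {
        r.w[i] = (uint16_t)(a.w[i] + b.w[i]);
    }
    return r;
}
```

With lanes {1, 2, 3, 0xFFFF} plus {1, 1, 1, 1}, the last lane wraps to 0 while the other three are unaffected, which is the per-lane independence that a single 64-bit integer add would not give.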
CS/EE 5810 CS/EE 6810 F00: 8 MMX Operations
57 MMX instructions covering the packed data types
Support for saturation arithmetic
  –Simplifies handling of underflow and overflow
  –Matches physical behavior (e.g. a pixel brighter than maximum stays at maximum)
Packed operations
  –Addition/subtraction, multiplication, compares, shifts
Conversion operations
  –Pack/unpack
Performance improvement
  –Fewer loads and stores
  –Fewer arithmetic operations, but more conversion operations
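The difference between wraparound and saturation arithmetic is easiest to see on a single lane. The following is a minimal sketch in portable C, assuming one 8-bit unsigned lane and one 16-bit signed lane; the function names are invented for this example and are not MMX mnemonics. MMX applies the same rule to every lane of a packed operand at once.

```c
#include <assert.h>
#include <stdint.h>

/* Wraparound add on one unsigned byte lane: the result is taken modulo 256. */
uint8_t wrap_add_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Unsigned saturating add on one byte lane: clamp at 255 instead of wrapping. */
uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/* Signed saturating add on one 16-bit lane: clamp to [-32768, 32767]. */
int16_t sat_add_s16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}
```

For pixel values 200 and 100, the wraparound add yields 44 (a bright pixel turns dark), while the saturating add yields 255, which is the behavior an image or audio sample usually wants.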
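CS/EE 5810 CS/EE 6810 F00: 9 MMX Operations
Packed multiply-add (word to doubleword)
  –Sources: A3 A2 A1 A0 and B3 B2 B1 B0 (four 16-bit words each)
  –Lane products: A3×B3, A2×B2, A1×B1, A0×B0
  –Results: A3×B3 + A2×B2 and A1×B1 + A0×B0 (two 32-bit doublewords)
Packed compare greater-than (word)
  –Each word of the first operand is compared (>) with the matching word of the second
  –A result lane is all zeros (00…0) where the test fails and all ones (11…1) where it holds, e.g. 00…0 11…1 00…0 11…1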
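The two operations pictured on this slide can be emulated lane by lane in ordinary C. This is a sketch of the semantics only, with invented function names, and it uses arrays indexed from lane 0 rather than real MMX registers. Adjacent pairs of 16-bit products are summed into 32-bit results, and the compare writes an all-ones or all-zeros mask per lane.

```c
#include <assert.h>
#include <stdint.h>

/* Packed multiply-add: four signed 16-bit lanes in, two 32-bit sums out.
   out[0] = a[0]*b[0] + a[1]*b[1],  out[1] = a[2]*b[2] + a[3]*b[3]. */
void packed_madd_words(const int16_t a[4], const int16_t b[4], int32_t out[2])
{
    for (int i = 0; i < 2; i++) {
        /* Accumulate in 64 bits so the sum itself cannot overflow in C.
           Only the corner case where all four inputs are -32768 exceeds
           the 32-bit range; the narrowing cast is then implementation-defined. */
        int64_t sum = (int64_t)a[2 * i] * b[2 * i]
                    + (int64_t)a[2 * i + 1] * b[2 * i + 1];
        out[i] = (int32_t)sum;
    }
}

/* Packed compare greater-than on words: each result lane is 0xFFFF where
   a[i] > b[i] (signed compare) and 0x0000 otherwise. */
void packed_cmpgt_words(const int16_t a[4], const int16_t b[4], uint16_t mask[4])
{
    for (int i = 0; i < 4; i++) {
        mask[i] = (a[i] > b[i]) ? 0xFFFFu : 0x0000u;
    }
}
```

The multiply-add is the core step of dot products, so it fits kernels such as DCT and matrix multiply; the compare produces a mask rather than a branch, which lets later AND/OR instructions select data without control flow.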
CS/EE 5810 CS/EE 6810 F00: 10 Using MMX
Assembly language coding
Use of libraries
  –E.g. IDCT, DCT, matrix multiply…
Use of C macros (“intrinsics”)
  –Generate optimized assembly code
  –Compiler performs register allocation and instruction scheduling
    »MMX64 t0, t1; t0 = padd(t0, t1);
  –Still requires intimate knowledge of MMX
Could a compiler generate MMX code?
CS/EE 5810 CS/EE 6810 F00: 11 Chroma Keying
Weatherman example
    »for (i = 0; i < imagesize; i++) new_image[i] = (x[i] == blue) ? y[i] : x[i];
  –MMX code (mm1 is assumed to already hold the blue key value in every byte)
    movq    mm3, mem1   ; load 8 pixels from weatherman
    movq    mm4, mem2   ; load 8 pixels from map
    pcmpeqb mm1, mm3    ; generate select mask
    pand    mm4, mm1    ; AND map with mask
    pandn   mm1, mm3    ; AND weatherman with inverse mask
    por     mm4, mm1    ; OR masked images together
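The compare, AND, AND-NOT, OR sequence above can be written in portable C to make the masking idea explicit. This is a sketch under assumptions: one byte per pixel, and the names `eq_mask`, `chroma_key`, `fg`, `bg`, and `key` are invented for the example. It processes one pixel per iteration, where the MMX version handles eight per instruction, but the select logic is the same and contains no data-dependent branch in the combining step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-byte equality mask: 0xFF where the bytes match, 0x00 otherwise.
   This is the per-lane effect of a packed compare-equal on bytes. */
static uint8_t eq_mask(uint8_t a, uint8_t b)
{
    return (a == b) ? 0xFF : 0x00;
}

/* Replace every foreground pixel equal to `key` with the background pixel.
   fg  = weatherman image, bg = map image, out = composited result. */
void chroma_key(const uint8_t *fg, const uint8_t *bg, uint8_t *out,
                size_t n, uint8_t key)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t mask = eq_mask(fg[i], key);            /* select mask          */
        uint8_t from_bg = bg[i] & mask;                /* map AND mask         */
        uint8_t from_fg = fg[i] & (uint8_t)~mask;      /* weatherman AND ~mask */
        out[i] = (uint8_t)(from_bg | from_fg);         /* OR the two together  */
    }
}
```

With foreground {1, 9, 9, 4}, background {10, 20, 30, 40}, and key 9, the result is {1, 20, 30, 4}: the two key-colored pixels are replaced by the map and the others pass through.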