HW-Accelerated HD video playback under Linux
Zou Nan hai
Open Source Technology Center
2 [Diagram: media engine. Labels: 3D, Media (Video Front End), Command Streamer, Thread Spawner, Thread Dispatcher, URB, EU Kernel, Sampler, Data port, Video memory, Indirect data, Thread payload]
3 Mode of operation
[Diagram: coded data -> VLD -> IS -> IQ -> IDCT -> MC -> output pixel. Stage labels: VFE or host, EU Kernels]
4 Current XvMC implementation
[Diagram: coded data -> VLD -> IS -> IQ -> IDCT -> MC -> output pixel. Stage labels: Host Software, EU Kernels. Data labels: per slice data, per macroblock data]
5 XvMC
[Diagram: software stack. Boxes: Media Application, XvMC lib, DRI interface, X Server, Graphics Hardware. Labels: mpeg stream decode; slice of macroblocks; render, sync, resource management; media commands, video memory management]
6 Video Memory Layout
[Diagram labels: command stream, media pointer command, media object command, flush command, VFE state, interface descriptors, selected interface, EU kernel instructions, binding tables, surface state, media surface]
7 Execution Unit introduction
SIMD code (variable execution size up to 16) with predication and control mask
Float and integer data types
Region-based direct and indirect register addressing
Supports scalar and immediate source operands
8 EU Registers
GRF (General Register File)
–256 bits per register (g0, g1, g2, gxx)
MRF (Message Register File)
–256 bits per register (m0, m1, m2, mx), write only
–Used to pass payload from a thread to a shared function unit
ARF (Architecture Register File)
–e.g. null, ip and flag registers
Immediate
–encoded in the instruction
9 Register Region
[Diagram: a region laid out over 256-bit registers (g0 ...), with Width=8, VertStride=16, HorzStride=2, Type=w]
Operand notation: Regnum.Subregnum followed by Type
–g5.2 w: origin regnum=5, subregnum=2
–g15.3 UB
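The region parameters on this slide can be read as an addressing rule. The following is a minimal sketch, not the hardware definition. It assumes that channel i sits at row i // Width and column i % Width, that strides and the subregister number are counted in elements of the operand type, and that each register is 256 bits (32 bytes). The function name `region_byte_offsets` and the execution size of 16 are hypothetical.

```python
# Sketch only: strides and subregnum are assumed to be in units of the
# operand type; each GRF register is assumed to be 256 bits (32 bytes).
REG_BYTES = 32
TYPE_BYTES = {"ub": 1, "w": 2, "f": 4}

def region_byte_offsets(regnum, subregnum, vert_stride, width, horz_stride,
                        type_name, exec_size):
    """Return (register, byte within register) for each channel."""
    size = TYPE_BYTES[type_name]
    origin = regnum * REG_BYTES + subregnum * size
    out = []
    for ch in range(exec_size):
        row, col = divmod(ch, width)  # row advances every `width` channels
        byte = origin + (row * vert_stride + col * horz_stride) * size
        out.append((byte // REG_BYTES, byte % REG_BYTES))
    return out

# The slide's example: origin g5.2, type w, Width=8, VertStride=16, HorzStride=2
offsets = region_byte_offsets(5, 2, 16, 8, 2, "w", 16)
for ch, (reg, byte) in enumerate(offsets):
    print(f"channel {ch:2d}: g{reg}, byte {byte}")
```

Under these assumptions, a horizontal stride of 2 words picks every other word, and a vertical stride of 16 words (32 bytes) starts the second row of eight channels exactly one register further on.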
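10 Data operation
[Diagram: vectors with components X, Y, Z, W laid out across registers 0 to 3 in two ways]
Array of structure (vertex shader): each register holds the X, Y, Z, W of one vector
Structure of array (pixel shader and media code): each register holds one component (all X, all Y, all Z or all W)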
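The two layouts are transposes of each other. A minimal sketch of the conversion, using plain Python lists in place of registers; the function names `aos_to_soa` and `soa_to_aos` are hypothetical and the values are illustrative only.

```python
def aos_to_soa(vectors):
    """Transpose [(x, y, z, w), ...] into one list per component."""
    xs, ys, zs, ws = zip(*vectors)
    return {"x": list(xs), "y": list(ys), "z": list(zs), "w": list(ws)}

def soa_to_aos(soa):
    """Inverse transpose: rebuild one (x, y, z, w) tuple per vector."""
    return list(zip(soa["x"], soa["y"], soa["z"], soa["w"]))

# Array of structure: one "register" per vector
aos = [(1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12), (13, 14, 15, 16)]

# Structure of array: one "register" per component
soa = aos_to_soa(aos)
print(soa["x"])
```

In the structure-of-array form a single SIMD instruction can apply the same operation to the same component of many elements, which is the access pattern of per-pixel media code.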
11 Instruction sample
(f0) add.sat(16) g28.0 ub g3.0 f g10.0 w {align1}
[Callouts on the instruction: predicate register, execution size, register number, subregister number, VertStride, Width, HorizStride, type, access mode]
12 Instruction set
Normal SIMD instructions
–add, mul, avg, mov, etc.
–dp3, dp4, etc.
Branch control instructions
–if, else, do, while, jmpi, etc.
–branching is needed in media code
Send instructions
–communicate with shared function units
–media kernels use them to control the thread life cycle and to read from and write to surfaces
13 Instruction example
add.sat(16) g28.0 UB g3.0 f g10.0 W {align1}
[Diagram: 16 channels added element by element. Rows: X (16 elements), Y (two groups of 8 elements), Z (16 elements). Register labels: g28, g3, g4, g10]
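What this instruction computes can be imitated in a few lines. This is a sketch under assumptions, not a model of the hardware: float-to-integer conversion is done by truncation (the slide does not state the rounding rule), and a predicated-off channel is shown as `None` to mean the destination is left untouched. The function name `add_sat_ub` is hypothetical. Note that 16 float sources occupy 16 x 4 = 64 bytes, that is two 256-bit registers, which would explain why the diagram labels both g3 and g4.

```python
def add_sat_ub(src0, src1, pred=None):
    """Per-channel src0 + src1, clamped to the unsigned-byte range 0..255.

    Channels whose predicate bit is False yield None, standing in for
    "destination channel not written".
    """
    if pred is None:
        pred = [True] * len(src0)
    out = []
    for a, b, p in zip(src0, src1, pred):
        # int() truncates; the real rounding mode is an assumption here
        out.append(min(255, max(0, int(a + b))) if p else None)
    return out

src_f = [float(v) for v in range(0, 256, 16)]  # 16 float channels (g3, g4)
src_w = [-20, 20] * 8                          # 16 word channels (g10)
dst = add_sat_ub(src_f, src_w)                 # 16 unsigned bytes (g28)
print(dst)

# With a predicate mask, only the enabled channels are written
masked = add_sat_ub(src_f, src_w, pred=[True, False] * 8)
```

Saturation is what makes the instruction usable for pixel arithmetic: a sum below 0 or above 255 is clamped instead of wrapping around.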
14 An example
[Diagram: register usage of a media kernel. Labels:]
–Input and output payload registers
–Registers passed from inline data: x, y, mv, field flags, etc.
–Indirect data payload: input Y0-Y3, input U, input V
–Media read from reference surface: reference Y, reference U, reference V
–Constant data
–tmp registers
–Result registers, organized in YUV420 format; media write to destination surface
15 Planar data vs Packed data
Easy to handle by media kernel
Hard to apply some filters
Cannot be directly used as a sampler source in hardware implementation
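The difference between the two layouts is easiest to see in a conversion. The slide names no specific formats, so this sketch assumes planar 4:2:0 with separate Y, U and V planes as the planar case and packed Y0 U Y1 V byte order as the packed case. The function name `planar420_to_packed` is hypothetical, and width and height are assumed to be even.

```python
def planar420_to_packed(y, u, v, width, height):
    """Interleave separate Y, U, V planes into packed Y0 U Y1 V bytes.

    y holds width*height bytes; u and v hold (width//2)*(height//2) bytes.
    Each chroma row is reused for two luma rows (no filtering).
    """
    out = bytearray()
    chroma_width = width // 2
    for row in range(height):
        chroma_row = (row // 2) * chroma_width
        for col in range(0, width, 2):
            c = chroma_row + col // 2
            out += bytes((y[row * width + col], u[c],
                          y[row * width + col + 1], v[c]))
    return bytes(out)

# A 4x2 picture: 8 luma samples, 2 U samples, 2 V samples
y_plane = bytes(range(8))
u_plane = bytes([100, 101])
v_plane = bytes([200, 201])
packed = planar420_to_packed(y_plane, u_plane, v_plane, 4, 2)
print(list(packed))
```

In the planar form each plane is a plain 2D array of one component, so a kernel can process Y, U and V with the same code; in the packed form the components of neighbouring pixels are interleaved in memory.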
16 Work flow
[Diagram: a slice of macroblocks is processed by the kernel. Labels: DCT data, indirect data, inline data, forward reference frame, backward reference frame, media read message, media write message, destination surface; frame types I, P, B]
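For a B-frame macroblock, the per-pixel work in this flow can be sketched as follows. The slide does not spell out the arithmetic, so this assumes MPEG-2 style bidirectional prediction: the forward and backward reference samples are averaged with rounding, the inverse-DCT residual is added, and the result is clamped to 8 bits. Motion-vector addressing and half-pel interpolation are left out; the inputs stand for samples already fetched from the reference frames. The function names are hypothetical.

```python
def clamp8(value):
    """Clamp to the 8-bit pixel range."""
    return min(255, max(0, value))

def reconstruct_b_block(fwd, bwd, residual):
    """Average forward and backward predictions, add residual, clamp."""
    return [clamp8(((f + b + 1) >> 1) + r)
            for f, b, r in zip(fwd, bwd, residual)]

fwd = [100, 200, 10, 250]      # samples read from the forward reference
bwd = [101, 100, 20, 254]      # samples read from the backward reference
residual = [5, -10, -30, 20]   # inverse-DCT output for the same pixels
pixels = reconstruct_b_block(fwd, bwd, residual)
print(pixels)
```

In terms of the diagram, `fwd` and `bwd` correspond to the media read messages, `residual` to the DCT data delivered as indirect data, and the returned list to what the media write message stores in the destination surface.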
17 About the XvMC API
Post-processing is missing from the XvMC API design
Video output mixer
18 High Level Language
Why is a high-level language preferred for media kernels?
–Easy to debug
–Easy to reuse code
–Hides platform details; easy to understand and maintain
Possible choices
–GLSL is not OK
–Simple C extension?
19 H.264
Kernels become much more complex because of the different MC and DCT block-size combinations
Not suitable for a slice-level API, because of intra prediction
Needs scheduling and dependency control for media threads, because of intra prediction
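One common way to satisfy such dependencies is a wavefront order. The slide does not say which scheme is intended, so the following is only a sketch: it assumes each macroblock depends on its left, top-left, top and top-right neighbours, and groups macroblocks by the index x + 2*y so that every dependency falls into an earlier group. The function name `wavefronts` is hypothetical.

```python
from collections import defaultdict

def wavefronts(mb_width, mb_height):
    """Group macroblocks (x, y) into waves that can run in parallel.

    With wave index x + 2*y, the left, top-left, top and top-right
    neighbours of a macroblock always have a smaller wave index.
    """
    waves = defaultdict(list)
    for y in range(mb_height):
        for x in range(mb_width):
            waves[x + 2 * y].append((x, y))
    return [waves[k] for k in sorted(waves)]

# A picture of 4x3 macroblocks
schedule = wavefronts(4, 3)
for step, wave in enumerate(schedule):
    print(step, wave)
```

All macroblocks in one wave are independent of each other, so their threads can be dispatched together; a thread in wave n may start only once waves 0 to n-1 have finished.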
20 VAAPI
Picture-level API
Covers MPEG-2, H.264 and VC-1 through different entry points
Post-processing and a video output mixer are missing
21 TODO
Optimize the IDCT code
MPEG-2 XvMC VLD extension
VAAPI for MPEG-2
VAAPI for AVC
Video post-processing and mixer
22 Q&A Thank You!