HW-Accelerated HD video playback under Linux Zou Nan hai Open Source Technology Center.


1 HW-Accelerated HD video playback under Linux
Zou Nan hai
Open Source Technology Center

2 (Diagram labels) 3D, EU, Kernel, Media Engine, URB, URB, Media (Video Front End), Command Streamer, Thread Spawner, Thread Dispatcher, Indirect data, Thread payload, Video memory, Data port, Sampler

3 Mode of operation
(Diagram labels) Coded data, Output pixel, MC, IDCT, VLD, IS, IQ, VFE or host, EU Kernels

4 Current XvMC implementation
(Diagram labels) Coded data, Output pixel, MC, IDCT, VLD, IS, IQ, Host Software, per-slice data, per-macroblock data, EU Kernels
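
As a reference for what the IDCT stage in this pipeline computes, here is a minimal sketch of the textbook 8x8 inverse DCT in Python. It is a slow floating-point reference, not the kernel code from the talk, and the function name `idct_8x8` is hypothetical.

```python
import math


def idct_8x8(coeffs):
    """Textbook 8x8 inverse DCT (floating point, direct O(n^4) form).

    coeffs[v][u] is the coefficient at vertical frequency v and
    horizontal frequency u; the result is indexed as out[y][x].
    """
    def c(k):
        # Normalisation factor: 1/sqrt(2) for the DC term, 1 otherwise.
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for y in range(8):
        for x in range(8):
            acc = 0.0
            for v in range(8):
                for u in range(8):
                    acc += (c(u) * c(v) * coeffs[v][u]
                            * math.cos((2 * x + 1) * u * math.pi / 16.0)
                            * math.cos((2 * y + 1) * v * math.pi / 16.0))
            out[y][x] = acc / 4.0
    return out
```

A block that contains only a DC coefficient decodes to a flat block, which makes a convenient sanity check for any optimized version.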

5 XvMC
(Diagram labels) XvMC lib, Media Application, DRI interface, X Server, Graphic Hardware; render, sync, resource management; mpeg stream; decode; slice of macroblocks; media commands, video memory management

6 Video Memory Layout
(Diagram labels) command stream, VFE state, interface descriptors, media surface, EU kernel instruction, media object command, selected interface, media pointer command, media surface, surface state, binding tables, flush command

7 Execution Unit introduction
• SIMD code (variable execution size up to 16) with predication and control mask
• Float and integer data types
• Region-based direct and indirect register addressing
• Supports scalar and immediate source operands

8 EU Registers
• GRF (General Register File)
  - 256 bits per register (g0, g1, g2, gxx)
• MRF (Message Register File)
  - 256 bits per register (m0, m1, m2, mx), write only
  - Used to pass payload from a thread to a shared function unit
• ARF (Architecture Register File)
  - e.g. null, ip and flag register
• Immediate
  - encoded in the instruction

9 Register Region
(Figure: element layout of a register region inside 256-bit registers.)
Example region: Width=8, VertStride=16, HorzStride=2, Type=w
Operand form: Regnum.Subregnum Type, e.g. g5.2 w (origin regnum=5, subregnum=2) and g15.3 UB
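
The region parameters on this slide can be read as a recipe for which bytes each SIMD channel touches. The Python sketch below computes those byte offsets. It assumes 32-byte (256-bit) registers, as stated for the GRF, and that the subregister number and both strides are counted in elements of the operand type. The execution size of 16 and the function name `region_byte_offsets` are also assumptions made for illustration, not details from the talk.

```python
REG_BYTES = 32  # 256 bits per GRF register


def region_byte_offsets(regnum, subregnum, vert_stride, width,
                        horz_stride, elem_size, exec_size=16):
    """Return the byte offset (within the register file) of each channel.

    Assumed convention: channel i sits in row i // width and column
    i % width; rows are vert_stride elements apart, columns are
    horz_stride elements apart, and subregnum is in elements too.
    """
    offsets = []
    for ch in range(exec_size):
        row, col = divmod(ch, width)
        elem_index = subregnum + row * vert_stride + col * horz_stride
        offsets.append(regnum * REG_BYTES + elem_index * elem_size)
    return offsets


# g5.2 with word (2-byte) elements, Width=8, VertStride=16, HorzStride=2
print(region_byte_offsets(5, 2, 16, 8, 2, 2))
```

Taking the slide's values together (itself an assumption, since the figure did not survive extraction), channel 0 starts at byte 164 of the register file and channel 8 at byte 196, one full register further on.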

10 Data operation
• Array of structures (vertex shader): each register holds one whole vector (X, Y, Z, W)
• Structure of arrays (pixel shader and media code): register 0 holds X X X X, register 1 holds Y Y Y Y, register 2 holds Z Z Z Z, register 3 holds W W W W
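
The difference between the two layouts is easy to see in a few lines of Python. This is only an illustration of the data arrangement, with lists standing in for registers; the helper name `aos_to_soa` is hypothetical.

```python
def aos_to_soa(vectors):
    """Convert an array of (x, y, z, w) structures to a structure of arrays.

    Input:  [(x0, y0, z0, w0), (x1, y1, z1, w1), ...]   one vector per "register"
    Output: {"x": [x0, x1, ...], "y": [...], ...}        one component per "register"
    """
    return {name: [v[i] for v in vectors] for i, name in enumerate("xyzw")}


aos = [(1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12), (13, 14, 15, 16)]
soa = aos_to_soa(aos)
print(soa["x"])  # all X components side by side, ready for one SIMD operation
```

In the structure-of-arrays form a single SIMD instruction can process the same component of many elements at once, which is the layout the slide associates with pixel shaders and media code.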

11 Instruction sample
(f0) add.sat(16) g28.0 ub g3.0 f g10.0 w {align1}
(Callout labels) execution size, type, register number, subregister number, VertStride, HorzStride, Width, access mode, predicate register

12 Instruction set
• Normal SIMD instructions
  - add, mul, avg, mov etc.
  - dp3, dp4 etc.
• Branch control instructions
  - if, else, do, while, jmpi etc.
  - branching is needed in media code
• Send instructions
  - communicate with shared function units
  - media kernels use them to control the thread life cycle and to read from and write to surfaces

13 Instruction example
add.sat(16) g28.0 UB g3.0 f g10.0 W {align1}
(Figure: 16 X values + 16 Y values = 16 Z values; registers shown: g28, g3, g4, g10.)
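
To make the semantics concrete, the following Python sketch emulates a 16-channel saturating add with an unsigned-byte destination, a float source and a word source, plus an optional per-channel predicate mask like the (f0) prefix on the earlier sample. The function name `add_sat_ub` is hypothetical, and the float-to-integer conversion (truncation toward zero here) is an assumption; the hardware's rounding behaviour is not stated in the slides.

```python
def add_sat_ub(src0, src1, dst, flag_mask=0xFFFF):
    """Emulate a predicated, saturating add into unsigned-byte channels.

    src0: float channels, src1: integer (word) channels, dst: current
    destination bytes. Channels whose bit in flag_mask is 0 keep their
    old destination value.
    """
    if not (len(src0) == len(src1) == len(dst)):
        raise ValueError("all operands must have the same channel count")
    out = list(dst)
    for ch in range(len(src0)):
        if (flag_mask >> ch) & 1:
            total = int(src0[ch] + src1[ch])   # assumed conversion: truncate
            out[ch] = max(0, min(255, total))  # .sat clamps to the UB range
    return out


result = add_sat_ub([200.0] * 16, [100] * 16, [0] * 16)
print(result)  # every channel saturates at 255
```

Saturation matters in media kernels because prediction plus residual can overshoot the 0..255 pixel range.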

14 An example
(Figure: register usage of a media kernel.) Input and output payload; registers passed from inline data (x, y, mv, field flags etc.); input Y0-Y3; input U; input V; reference Y; reference U; reference V; tmp registers; result registers, organized in YUV420 format; indirect data payload; media read from reference surface; media write to destination surface; constant data

15 Planar data vs Packed data
• Easy to handle by media kernel
• Hard to apply some filters
• Cannot be directly used as a sampler source in hardware implementation
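
For readers unfamiliar with planar storage, this small Python sketch shows how a planar YUV420 buffer is commonly laid out: a full-resolution Y plane followed by quarter-size U and V planes. The Y, U, V plane order, the tightly packed planes (no row padding) and the helper name `yuv420_plane_layout` are assumptions for illustration; the slides only say the data is YUV420.

```python
def yuv420_plane_layout(width, height):
    """Return (offset, size) in bytes for each plane of a planar YUV420 buffer.

    Assumes 8-bit samples, even dimensions, no padding, and Y, U, V order.
    """
    if width % 2 or height % 2:
        raise ValueError("width and height must be even for 4:2:0 subsampling")
    y_size = width * height
    c_size = (width // 2) * (height // 2)  # chroma is subsampled 2x2
    return {
        "Y": (0, y_size),
        "U": (y_size, c_size),
        "V": (y_size + c_size, c_size),
        "total": y_size + 2 * c_size,
    }


# One 16x16 macroblock: 256 luma bytes plus two 64-byte chroma blocks.
print(yuv420_plane_layout(16, 16))
```

A packed format would instead interleave luma and chroma samples in a single plane, which changes how a kernel has to address them.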

16 Work flow
(Diagram labels) B, DCT Data, I, kernel, PP, forward reference frame, backward reference frame, kernel, IP, indirect data, inline data, media read message, media write message, destination surface, slice of macroblocks
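
The work flow combines reference-frame pixels with the decoded residual. The Python sketch below shows that reconstruction step for a run of pixels, following the usual MPEG-2 rule: no prediction for intra data, one reference for forward or backward prediction, and a rounded average of both for bidirectional prediction. It assumes the reference pixels have already been fetched at the motion-compensated position, leaves out half-pel interpolation, and uses the hypothetical name `reconstruct`; it is not the kernel from the talk.

```python
def reconstruct(residual, forward=None, backward=None):
    """Add the IDCT residual to the prediction and clamp to 0..255.

    residual: signed integers from the IDCT stage.
    forward/backward: optional lists of reference pixels, same length.
    """
    out = []
    for i, r in enumerate(residual):
        if forward is not None and backward is not None:
            # Bidirectional: average of both references, rounding up.
            pred = (forward[i] + backward[i] + 1) >> 1
        elif forward is not None:
            pred = forward[i]
        elif backward is not None:
            pred = backward[i]
        else:
            pred = 0  # intra: the residual is the pixel value itself
        out.append(max(0, min(255, pred + r)))
    return out


print(reconstruct([10, -20], forward=[100, 10], backward=[101, 11]))
```

The averaging and clamping steps correspond to the kind of work the avg and .sat forms in the instruction set are suited for.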

17 About XvMC API
• Post processing is missing from the XvMC API design
• Video output mixer

18 High Level Language
• Why is a high-level language preferred for media kernels?
  - Easy to debug
  - Easy to reuse code
  - Hides platform details, easy to understand and maintain
• Possible choices
  - GLSL is not OK
  - Simple C extension?

19 H.264
• Kernels become much more complex because of the different MC and DCT size combinations
• Not suitable for a slice-level API, because of intra prediction
• Needs scheduling and dependency control for media threads, because of intra prediction
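
One common way to handle this kind of dependency is a wavefront schedule, sketched below in Python. It assumes each macroblock depends on its left, top-left, top and top-right neighbours, so macroblock (x, y) can run in wave x + 2*y. This scheme and the name `wavefront_schedule` are illustrative assumptions; the talk does not say how scheduling would be implemented.

```python
def wavefront_schedule(mb_width, mb_height):
    """Group macroblocks into waves that can be decoded in parallel.

    Macroblock (x, y) is placed in wave x + 2 * y, which is always later
    than the waves of its left, top-left, top and top-right neighbours.
    """
    waves = {}
    for y in range(mb_height):
        for x in range(mb_width):
            waves.setdefault(x + 2 * y, []).append((x, y))
    return [waves[k] for k in sorted(waves)]


for step, group in enumerate(wavefront_schedule(4, 2)):
    print(step, group)
```

All macroblocks within one wave are independent of each other, so they could be dispatched as separate media threads once the previous wave has finished.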

20 VAAPI
• Picture-level API
• Covers MPEG-2, H.264 and VC-1 from different entry points
• Post processing and a video output mixer are missing

21 TODO
• IDCT code optimization
• MPEG-2 XvMC VLD extension
• VAAPI for MPEG-2
• VAAPI for AVC
• Video post processing and mixer

22 Q&A
Thank You!

