Published by Dakota Mannion. Modified over 9 years ago.
1
NCTU, EE, Vision Lab
Implementation and Parallelization of an H.264-Based System on a Multi-DSP Board
陳奕安
2008.06.11
2
Outline
- System architecture
- Multithreading of this system
- Reference Framework 5 (RF5)
- Parallelism of H.264
- Memory issues
3
System Architecture
[Figure: PC 1 with MEX Board 1 and PC 2 with MEX Board 2. Blocks: Capture Frame → H.264 Encode → Send to Network; Receive from Network → H.264 Decode → Display.]
4
System Architecture
[Figure: Camera → input task → H.264 encode processing task → TX networking task; RX networking task → H.264 decode processing task → output task → computer.]
5
Host/MEX Communication
[Figure: data transfer from the four DSPs (SDRAM) to PCI [7].
DSP (MEX) side: DSP started (fill memory) → initialize transfer → DSP-to-PCI transfer request → start transfer → transfer finished. Steps shown: set DSP FIFO direction, set FIFO full-flag value, DSP FIFO is reset, start EDMA, unreset DSP1 FIFO, clear PCI interrupt.
PC (PCI) side: PCI started (wait for interrupt) → initialize transfer → PCI-to-DSP start-transfer request → wait for transfer finished → transfer finished. Steps shown: set transfer size, set PCI FIFO direction, select DSP data sources, set transfer destination address, start PCI FIFO, clear DSP interrupt.]
6
Host/MEX Communication
[Figure: host/MEX transfer; labels: Data, Image.]
7
System Architecture
[Figure: Camera → input task → H.264 encode processing task → TX networking task; RX networking task → H.264 decode processing task → output task → computer.]
8
Networking of H.264 Video
[Figure: H.264 VCL and NAL [6]; H.264 high-level architecture. Video application → Video Coding Layer (VCL data, reconstructed picture) → Network Abstraction Layer (NAL units carrying VCL data, parameter sets, and Supplemental Enhancement Information) → AVC/H.264 transport: H.320 system, MPEG-2 system, AVC storage, RTP payload. Labels: bitstream adaptation, packet adaptation.]
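The NAL units are what the networking tasks actually handle. As a hedged illustration (not code from this system), the Python sketch below splits an H.264 Annex B byte stream at its 0x000001 start codes and decodes the one-byte NAL unit header (forbidden_zero_bit, nal_ref_idc, nal_unit_type). The function names split_annexb and parse_nal_header are hypothetical, and the example stream uses made-up payload bytes.

```python
START_CODE = b"\x00\x00\x01"

# A few nal_unit_type values defined by the H.264 standard.
NAL_TYPE_NAMES = {1: "non-IDR slice", 5: "IDR slice", 6: "SEI", 7: "SPS", 8: "PPS"}


def split_annexb(stream):
    """Split an Annex B byte stream into NAL units with start codes removed."""
    units = []
    pos = stream.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = stream.find(START_CODE, begin)
        end = len(stream) if nxt == -1 else nxt
        # Zero bytes just before the next 3-byte start code belong to a
        # 4-byte start code or trailing padding, never to the NAL unit.
        unit = stream[begin:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
        pos = nxt
    return units


def parse_nal_header(unit):
    """Decode the one-byte NAL unit header."""
    first = unit[0]
    nal_type = first & 0x1F
    return {
        "forbidden_zero_bit": first >> 7,
        "nal_ref_idc": (first >> 5) & 0x03,
        "nal_unit_type": nal_type,
        "name": NAL_TYPE_NAMES.get(nal_type, "other"),
    }


if __name__ == "__main__":
    # SPS, PPS and IDR-slice headers followed by made-up payload bytes.
    stream = (b"\x00\x00\x00\x01\x67\x42\x00\x1e"
              b"\x00\x00\x01\x68\xce\x38\x80"
              b"\x00\x00\x01\x65\x88")
    for nal in split_annexb(stream):
        print(parse_nal_header(nal)["name"], nal.hex())
```

Emulation-prevention bytes guarantee that 0x000001 never occurs inside a NAL unit, which is why a plain byte search is enough here.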
9
Networking of H.264 Video
Video packetization with the TMS320C6000 Network Developer's Kit
[Figure: each NAL unit of H.264 becomes a video packet that gains one header per layer:
Application layer: video packet
Session layer: RTP header + video packet
Transport layer: UDP header + RTP header + video packet
Network layer: IP header + UDP header + RTP header + video packet
Data link layer: MAC header + IP header + UDP header + RTP header + video packet
Physical layer]
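On the DSP the UDP/IP layers come from the NDK; the step specific to video is prefixing each NAL unit with an RTP header. A minimal Python sketch of that step, assuming RFC 6184 single-NAL-unit mode (one NAL unit per packet), the 12-byte RTP fixed header of RFC 3550, the 90 kHz video clock, a constant frame rate, and dynamic payload type 96. The names rtp_packetize and frame_timestamp are hypothetical. NAL units larger than the network MTU would need FU-A fragmentation, which is not shown.

```python
import struct

RTP_VERSION = 2
VIDEO_CLOCK_HZ = 90_000  # RTP timestamp rate for H.264 video (RFC 6184)


def rtp_packetize(nal_unit, seq, timestamp, ssrc, payload_type=96, marker=False):
    """Prefix one NAL unit with a 12-byte RTP fixed header (single NAL unit mode)."""
    byte0 = RTP_VERSION << 6                      # padding=0, extension=0, CSRC count=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack(
        "!BBHII",                                 # network byte order, no padding
        byte0,
        byte1,
        seq & 0xFFFF,                             # 16-bit sequence number wraps
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )
    return header + nal_unit


def frame_timestamp(frame_index, fps=30):
    """RTP timestamp of a frame, assuming a constant frame rate."""
    return frame_index * VIDEO_CLOCK_HZ // fps
```

The marker bit is normally set on the last packet of a frame, and every NAL unit of one frame shares that frame's timestamp; the resulting bytes would then be handed to a UDP socket.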
10
System Architecture
[Figure: Camera → input task → H.264 encode processing task → TX networking task; RX networking task → H.264 decode processing task → output task → computer.]
11
I/O Buffer Management
[Figure: separate input buffers and output buffers, each managed as a ring with Head and Tail pointers; slots are marked as inputting or outputting.]
12
I/O Buffer Management
[Figure: input/output buffers; successive snapshots show the Head and Tail pointers advancing as slots move between inputting and outputting.]
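The head/tail snapshots describe circular buffers shared by a producing task and a consuming task. A minimal single-threaded Python sketch, assuming the producer writes at head and the consumer reads at tail (the slides do not say which pointer is which) and that a full buffer makes the producer wait rather than overwrite. The class name FrameRing is hypothetical; on the DSP the two pointers would be updated by different tasks, so a real version would also need a semaphore or similar guard.

```python
class FrameRing:
    """Fixed-size circular buffer of frame slots with head and tail indices."""

    def __init__(self, slots):
        self.slots = [None] * slots
        self.head = 0    # next slot the producer fills (assumed convention)
        self.tail = 0    # next slot the consumer drains
        self.count = 0   # occupied slots; tells "full" apart from "empty"

    def is_full(self):
        return self.count == len(self.slots)

    def is_empty(self):
        return self.count == 0

    def put(self, frame):
        """Producer side; returns False instead of overwriting when full."""
        if self.is_full():
            return False
        self.slots[self.head] = frame
        self.head = (self.head + 1) % len(self.slots)
        self.count += 1
        return True

    def get(self):
        """Consumer side; returns None when nothing is ready."""
        if self.is_empty():
            return None
        frame = self.slots[self.tail]
        self.slots[self.tail] = None
        self.tail = (self.tail + 1) % len(self.slots)
        self.count -= 1
        return frame
```

Keeping an explicit count avoids the classic ambiguity where head == tail could mean either a full or an empty ring.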
13
System Architecture: Multithreading of This System
[Figure: Camera → input task → H.264 encode processing task → TX networking task; RX networking task → H.264 decode processing task → output task → computer.]
14
Reference Framework for DSP
Reference Framework 5 (RF5) is built on DSP/BIOS and the TMS320 DSP Algorithm Standard.
[Figure: processing flow of RF5: split → channels of cells (F0, F1, F2) and (V0, V1, V2) → join. Legend: cell, channel, task; each Fi or Vi is an XDAIS algorithm.]
15
Reference Framework for DSP: Data Communication of RF5
- SIO (task and device): a task and a device driver exchange data buffers through an SIO object, which passes data pointers.
- SCOM (task and task): a writer task sends an SCOM message holding a data pointer to a reader task through an SCOM queue; the data buffer itself is not copied.
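A portable Python analogy of the SCOM pattern, not the DSP/BIOS API: two queues circulate a fixed pool of messages, so the writer task fills a buffer in place and passes only the message (the pointer) to the reader task, which hands it back when done. The names Message, writer_task, reader_task and run are hypothetical, and queue.Queue stands in for an SCOM queue.

```python
import queue
import threading


class Message:
    """Stand-in for an SCOM message: it carries a reference to a buffer, not a copy."""

    def __init__(self, buf):
        self.buf = buf
        self.length = 0


def writer_task(to_reader, to_writer, frames):
    try:
        for frame in frames:
            msg = to_writer.get()            # block until an empty buffer returns
            msg.buf[:len(frame)] = frame     # fill the shared buffer in place
            msg.length = len(frame)          # assumes each frame fits in a buffer
            to_reader.put(msg)               # hand over the pointer only
    finally:
        to_reader.put(None)                  # end-of-stream marker (sketch only)


def reader_task(to_reader, to_writer, out):
    while True:
        msg = to_reader.get()
        if msg is None:
            break
        out.append(bytes(msg.buf[:msg.length]))  # consume the data
        to_writer.put(msg)                   # return the buffer for reuse


def run(frames, n_buffers=2, buf_size=16):
    to_reader, to_writer = queue.Queue(), queue.Queue()
    for _ in range(n_buffers):
        to_writer.put(Message(bytearray(buf_size)))
    out = []
    tasks = [
        threading.Thread(target=writer_task, args=(to_reader, to_writer, frames)),
        threading.Thread(target=reader_task, args=(to_reader, to_writer, out)),
    ]
    for t in tasks:
        t.start()
    for t in tasks:
        t.join()
    return out
```

With only two buffers in circulation, the writer automatically stalls when the reader falls behind, which is the same back-pressure a fixed SCOM message pool provides.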
16
Reference Framework for DSP: Data Communication of RF5
- ICC (cell and cell): cells 1, 2 and 3 each have an input and an output ICC object. An ICC object describes a buffer (a data pointer to a data buffer), and each cell keeps a list of pointers to its ICC objects; one cell's output buffer is the next cell's input.
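A Python analogy of a channel of cells linked by ICC objects, under the assumption that neighbouring cells share one ICC object so data is written once and read in place. The names ICC, Cell, build_channel and run_channel are hypothetical and do not reproduce RF5's C API; the lambdas stand in for XDAIS algorithms.

```python
class ICC:
    """Hypothetical stand-in for an RF5 ICC object: describes one data buffer."""

    def __init__(self, size):
        self.buf = [0] * size


class Cell:
    """Wraps one algorithm; reads its input ICC and writes its output ICC."""

    def __init__(self, name, algorithm, icc_in, icc_out):
        self.name = name
        self.algorithm = algorithm
        self.icc_in = icc_in
        self.icc_out = icc_out

    def execute(self):
        result = self.algorithm(self.icc_in.buf)
        self.icc_out.buf[:] = result          # write in place; the buffer is shared


def build_channel(algorithms, size):
    """Connect cells so that cell k's output ICC is cell k+1's input ICC."""
    iccs = [ICC(size) for _ in range(len(algorithms) + 1)]
    cells = [
        Cell(f"cell{k + 1}", alg, iccs[k], iccs[k + 1])
        for k, alg in enumerate(algorithms)
    ]
    return iccs, cells


def run_channel(iccs, cells, data):
    """Load the first buffer, execute every cell in order, return the last buffer."""
    iccs[0].buf[:] = data
    for cell in cells:
        cell.execute()
    return list(iccs[-1].buf)
```

For example, a channel of "add one", "double" and "reverse" cells turns [1, 2, 3] into [8, 6, 4].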
17
Reference Framework for DSP: Application Control of RF5
A task receives both SCOM messages and control messages: data messages arrive on an SCOM queue, and control messages arrive in an MBX mailbox.
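One way such a task loop can combine the two inputs, sketched in Python under stated assumptions: the control mailbox is polled without blocking at the start of every iteration, and the task then blocks on its data queue. The function task_iteration and the "qp" (quantization parameter) control key are hypothetical examples; two queue.Queue objects stand in for the SCOM queue and the MBX mailbox.

```python
import queue


def task_iteration(data_queue, control_mailbox, state):
    """One pass of a task loop: apply pending control messages, then one data message."""
    # Poll the control mailbox without blocking, so control never stalls the data path.
    while True:
        try:
            key, value = control_mailbox.get_nowait()
        except queue.Empty:
            break
        state[key] = value
    # Block until the next data message arrives, then process it with current settings.
    frame = data_queue.get()
    state["log"].append((frame, state["qp"]))
```

A control message posted between two frames therefore takes effect on the next frame the task processes.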
18
System Architecture: The Present System
[Figure: input task, H.264 encode processing task, TX networking task, and a control task; labels: frame i, frame i+1, slice, NAL, Rx.]
19
System Architecture: Multithreading of This System
[Figure: input task, H.264 encode processing task, TX networking task, and a control task; labels: frame i, frame i+1, MB, NAL, Rx.]
20
Parallelizing H.264
- Task-level decomposition: divide the algorithm into balanced tasks and accelerate each task.
- Data-level decomposition: GOP-level, frame-level, slice-level, and macroblock-level parallelism.
21
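H.264 Encoder Block Diagram
[Figure: the current frame Fn is predicted either by ME/MC from the reference frame F'n-1 (inter) or by intra prediction (choose intra prediction, intra); the residual Dn = Fn - P is transformed (T), quantized (Q) into X, reordered, and entropy encoded into NAL units. The reconstruction path applies Q^-1 and T^-1 to obtain D'n, adds the prediction P to form uF'n, and filters it into the reconstructed frame F'n.]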
22
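H.264 Decoder Block Diagram
[Figure: NAL units are entropy decoded and reordered; Q^-1 and T^-1 produce the residual D'n, which is added to the prediction P (MC from the reference frame F'n-1 for inter, intra prediction for intra) to form uF'n; the filter produces the reconstructed frame F'n.]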
23
Task-level Decomposition
[Figure: task profile for H.264 [2]]
24
Parallelizing H.264: H.264 Data Structure
[Figure: video sequence = GOP0, GOP1, GOP2, ..., GOPn; group of pictures = frames F0, F1, F2, ..., Fn; frame = slices 0, 1, 2, 3, ...; slice = macroblocks MB0, MB1, MB2, ..., MBn; macroblock = Y, Cb, Cr.]
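To put numbers on the lower levels of this hierarchy, a small Python sketch assuming 8-bit 4:2:0 sampling and CIF or D1 frames as examples (the slides do not state the system's resolution). The names mb_grid and mb_bytes_420 are hypothetical.

```python
MB_SIZE = 16  # luma samples per macroblock side


def mb_grid(width, height):
    """Macroblock columns and rows needed to cover a frame (partial MBs round up)."""
    return -(-width // MB_SIZE), -(-height // MB_SIZE)


def mb_bytes_420():
    """Bytes per macroblock: one 16x16 Y block plus two 8x8 chroma blocks (Cb, Cr)."""
    luma = MB_SIZE * MB_SIZE
    chroma = 2 * (MB_SIZE // 2) * (MB_SIZE // 2)
    return luma + chroma
```

A CIF frame (352 × 288) is thus a 22 × 18 grid of 396 MBs and a D1 frame (720 × 480) a 45 × 30 grid of 1,350 MBs, each MB holding 384 bytes of samples.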
25
Data-level Decomposition
- GOP-level parallelism: high latency and large memory requirements.
- Frame-level parallelism: load imbalance among I, P, and B frames.
- Slice-level parallelism: the bitrate increases, because prediction cannot cross slice boundaries.
- Macroblock-level parallelism
26
Macroblock-level Parallelism
- Spatial parallelism
- Temporal parallelism
- Spatial and temporal parallelism
[Figure: possible data dependencies for a macroblock. Within a frame, the current MB depends on neighboring MBs through intra prediction, MV prediction, and the deblocking filter; an MB in frame i + 1 also depends on the search window in frame i.]
27
Macroblock-level Parallelism: Spatial Parallelism
Time slot in which each MB(x, y) is processed (x = column, y = row):
y\x  0    1    2    3    4
0    T1   T2   T3   T4   T5
1    T3   T4   T5   T6   T7
2    T5   T6   T7   T8   T9
3    T7   T8   T9   T10  T11
4    T9   T10  T11  T12  T13
[Figure legend: MBs processed, MBs being processed, MBs to be processed.]
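The table follows a wavefront rule: an MB can start once its left, top-left, top, and top-right neighbors are done, which yields T(x, y) = x + 2y + 1. The Python sketch below derives the slots from that dependency set (assumed from the dependency figure) instead of hard-coding the formula, and counts how many MBs share a slot. For this 5 × 5 example at most three MBs are in flight, and the frame finishes in 13 slots instead of 25. The function names are hypothetical.

```python
from collections import Counter

# Neighbors an MB waits for: left, top-left, top, top-right (offsets dx, dy).
DEPENDENCIES = [(-1, 0), (-1, -1), (0, -1), (1, -1)]


def wavefront_schedule(width, height):
    """Earliest time slot (starting at 1) of every MB(x, y) in one frame."""
    slot = {}
    for y in range(height):
        for x in range(width):
            # Raster order guarantees every existing neighbor is already scheduled.
            ready = [slot[(x + dx, y + dy)] for dx, dy in DEPENDENCIES
                     if (x + dx, y + dy) in slot]
            slot[(x, y)] = max(ready, default=0) + 1
    return slot


def peak_parallelism(slot):
    """Largest number of MBs that share one time slot."""
    return max(Counter(slot.values()).values())
```

The top-right dependency is what forces the two-column lag between rows; without it, T(x, y) would be x + y + 1.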
28
Macroblock-level Parallelism: Temporal Parallelism
Time slot in which each MB(x, y) is processed (x = column, y = row).
Frame i:
y\x  0    1    2    3    4
0    T1   T2   T3   T4   T5
1    T6   T7   T8   T9   T10
2    T11  T12  T13  T14  T15
3    T16  T17  T18  T19  T20
4    T21  T22  T23  T24  T25
Frame i + 1:
y\x  0    1    2    3    4
0    T11  T12  T13  T14  T15
1    T16  T17  T18  T19  T20
2    T21  T22  T23  T24  T25
3    T26  T27  T28  T29  T30
4    T31  T32  T33  T34  T35
[Figure legend: MBs processed, MBs being processed, MBs to be processed.]
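Read as formulas, with W = 5 MBs per row: each frame is processed in raster order, and frame i + 1 starts 2W = 10 slots later, once frame i has finished its first two MB rows. The reason for a two-row lag (presumably keeping the search window of frame i + 1 inside the already reconstructed part of frame i) is an interpretation, not stated on the slide.

```latex
T_i(x, y) = x + W y + 1, \qquad
T_{i+1}(x, y) = T_i(x, y) + 2W = x + W y + 2W + 1, \qquad W = 5
```

So from slot 11 to slot 25 one MB of each frame is processed concurrently, and both frames finish in 35 slots instead of 50.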
29
Macroblock-level Parallelism: Spatial and Temporal Parallelism
Time slot in which each MB(x, y) is processed (x = column, y = row).
Frame i:
y\x  0    1    2    3    4
0    T1   T2   T3   T4   T5
1    T3   T4   T5   T6   T7
2    T5   T6   T7   T8   T9
3    T7   T8   T9   T10  T11
4    T9   T10  T11  T12  T13
Frame i + 1:
y\x  0    1    2    3    4
0    T5   T6   T7   T8   T9
1    T7   T8   T9   T10  T11
2    T9   T10  T11  T12  T13
3    T11  T12  T13  T14  T15
4    T13  T14  T15  T16  T17
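Combining both ideas: inside each frame the wavefront rule T = x + 2y + 1 applies, and frame i + 1 is offset by 4 slots, as in the grids. The Python sketch below counts MBs per slot under those assumptions (offset taken from the slide; function names hypothetical). Two overlapped 5 × 5 frames finish in 17 slots and reach six concurrent MBs, versus three for a single frame.

```python
from collections import Counter


def combined_schedule(width, height, frames, frame_offset):
    """Slot of MB(x, y) in frame f: intra-frame wavefront plus a per-frame offset."""
    return {
        (f, x, y): x + 2 * y + 1 + f * frame_offset
        for f in range(frames)
        for y in range(height)
        for x in range(width)
    }


def peak_concurrency(schedule):
    """Most MBs, across all frames, that share one time slot."""
    return max(Counter(schedule.values()).values())
```

The peak occurs at slot 9, where three MBs of frame i and three of frame i + 1 are processed together.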
30
System Architecture: Multithreading of This System
[Figure: input task, H.264 encode processing task, TX networking task, and a control task; labels: frame i, frame i+1, MB, NAL, Rx.]
31
Memory Issue
[Figure: two-level cache architecture of the DM642. The DSP core has a 16-Kbyte direct-mapped L1P cache and a 16-Kbyte 2-way set-associative L1D cache, backed by 256 Kbytes of L2 cache/memory, plus the EDMA controller and peripherals.]
- The DM642 has limited on-chip memory.
- Use memory buffers to reduce external memory accesses.
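Why the on-chip memory is "limited", as a quick Python size check assuming 8-bit 4:2:0 video and CIF or D1 frames (the system's resolution is not stated). A single D1 frame (518,400 bytes) is about twice the 256-Kbyte L2, and the current, reference, and reconstructed CIF frames an encoder typically keeps (an assumed count of three) do not fit together either, so whole frames stay in external SDRAM. The names are hypothetical.

```python
L1D_BYTES = 16 * 1024    # DM642 L1D cache
L2_BYTES = 256 * 1024    # DM642 L2 (upper bound, as if none of it were used as cache)


def frame_bytes_420(width, height):
    """Bytes in one 8-bit 4:2:0 frame: full-size luma plus two quarter-size chroma planes."""
    return width * height * 3 // 2


def encoder_frames_bytes(width, height, frames=3):
    """Current, reference and reconstructed frames held at once (assumed count)."""
    return frames * frame_bytes_420(width, height)
```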
32
Memory Issue
Memory hierarchy for inter prediction
[Figure: memory hierarchy [4]]
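A rough sizing sketch for the inter-prediction levels of such a hierarchy, assuming a ±16-pixel full search on 8-bit luma and a D1 width of 720 (both assumptions; the slides give no search range). The reference area per MB is 48 × 48 = 2,304 bytes; if the window slides right by one MB and keeps its overlap, only a 16 × 48 strip (768 bytes) is fetched per MB. A full-width band of 48 reference rows (34,560 bytes) fits in L2 but not in the 16-Kbyte L1D. Function names are hypothetical.

```python
MB_SIZE = 16


def search_window_bytes(search_range, mb=MB_SIZE):
    """Luma bytes of the reference area for one MB with a +/- search_range pixel search."""
    side = mb + 2 * search_range
    return side * side


def new_bytes_per_mb(search_range, mb=MB_SIZE):
    """Bytes fetched per MB when the window slides right and keeps the overlap."""
    return mb * (mb + 2 * search_range)


def reference_band_bytes(width, search_range, mb=MB_SIZE):
    """Full-width band of reference rows covering one MB row (ignores frame padding)."""
    return width * (mb + 2 * search_range)
```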
33
Memory Issue
Slice memory buffer for intra prediction and the deblocking filter
[Figure: slice memory [5]]
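A sizing sketch for such a buffer, assuming 8-bit 4:2:0 video. Intra prediction uses unfiltered samples from the bottom row of the MB row above, while deblocking the top edge of an MB reads up to four luma rows and two chroma rows above it. Keeping one unfiltered row plus those filtered rows gives five luma rows and three rows per chroma plane; these row counts are an assumption about this kind of design, not taken from [5]. At D1 width that is 5,760 bytes, compared with 518,400 bytes for a whole frame. The function name is hypothetical.

```python
def line_buffer_bytes(width, luma_rows=5, chroma_rows=3):
    """Bytes kept from the MB row above: luma_rows of Y plus chroma_rows of Cb and Cr."""
    chroma_width = width // 2          # 4:2:0 chroma planes are half width
    return luma_rows * width + 2 * chroma_rows * chroma_width
```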
34
References
[1] Texas Instruments, Inc., "Reference Frameworks for eXpressDSP Software: RF5, An Extensive, High-Density System," SPRU795A.
[2] T. C. Chen, H. C. Fang, C. J. Lian, and C. H. Tsai, "Algorithm analysis and architecture design for HDTV applications: a look at the H.264/AVC video compressor system," IEEE Circuits & Devices Magazine, May/June 2006.
[3] C. Meenderinck, A. Azevedo, and B. Juurlink, "Parallel Scalability of Video Decoders," April 29, 2008.
[4] Denolf, K. De Vleeschouwer, et al., "Memory centric design of an MPEG-4 video encoder," IEEE Trans. Circuits and Systems for Video Technology, vol. 15, no. 5, pp. 609-619, May 2005.
[5] T.-M. Liu et al., "A 125μW, Fully Scalable MPEG-2 and H.264/AVC Video Decoder for Mobile Applications," ISSCC Digest of Technical Papers, pp. 402-403, Feb. 2006.
[6] T. Wiegand et al., "Overview of H.264/AVC Video Coding Standard," IEEE Trans. Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, July 2003.
[7] VITEC Multimedia, "MEX User Manual, Revision 1.7."