Fully Scalable Multiview Wavelet Video Coding


1 Fully Scalable Multiview Wavelet Video Coding
Yu Liu and King Ngi Ngan
Department of Electronic Engineering, The Chinese University of Hong Kong
ISCAS 2009, Taipei, Taiwan, May 24-27, 2009
Y. Liu and K.N. Ngan, Fully Scalable Multiview Wavelet Video Coding

2 Outline
  Introduction
  Proposed Method
  Experimental Results
  Conclusion

3 Introduction
Multiview video
  Captured simultaneously by several cameras from different viewpoints
  Straightforward solution: simulcast coding of each view with H.264/AVC [1] or wavelet-based VIDWAV [2]
  Inter-view statistical dependencies motivate multiview video coding (MVC)
Full scalability in multiview video coding
  Quality, spatial, temporal and view dimensions
  Scalable coding with H.264/AVC-based JSVM [3] or wavelet-based SVC [2] meets part of the requirement, but view scalability is missing
  H.264/AVC-based JMVM [4] offers some degree of temporal and view scalability, but does not support spatial or quality scalability
Multiview video, captured simultaneously by several cameras from different viewpoints, enables a wide variety of future multimedia applications. However, it also produces huge amounts of data to be stored or transmitted. The straightforward solution is to encode each view independently with a state-of-the-art video codec. Multiview video, however, contains a large amount of inter-view statistical dependency, which multiview video coding (MVC) should exploit. In addition, an important research topic in multiview video coding is achieving full scalability in all four dimensions: quality, spatial, temporal and view. Most scalable video codecs meet part of this requirement, but view scalability is missing. Although JMVM offers some degree of temporal and view scalability, it does not support spatial or quality scalability at all.

4 Related Work
Schemes for wavelet-based MVC
  Inherent scalability due to subband/wavelet decomposition
    2-D wavelet transform in the spatial domain
    Motion compensated temporal filtering (MCTF) in the temporal domain
    Disparity compensated view filtering (DCVF) in the view domain
  But no view scalability [5] or only limited view scalability [6] is supported, due to global MCTF/DCVF selection on a Group of GOP (GoGOP) basis
  Garbas et al. [7] attempted to provide full view scalability, but the coding efficiency is not satisfactory
Several schemes for wavelet-based MVC have been devised. They share a common principle: the correlation in multiview video is exploited by MCTF in the temporal domain, DCVF in the view domain and a 2-D DWT in the spatial domain. The inherent scalability of such a wavelet decomposition is appealing. Unfortunately, these schemes support no view scalability or only limited view scalability, because the MCTF/DCVF selection is made globally on a GoGOP basis. Garbas et al. attempted to provide full view scalability, but the coding efficiency is not satisfactory.
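To make the lifting idea behind MCTF and DCVF concrete, the following minimal Python sketch applies one predict/update pair to two frames and then inverts it. It is an illustration under stated assumptions, not the scheme from the slides: it assumes Haar filters, omits motion and disparity compensation entirely, and the function names are hypothetical.

```python
def haar_lift_forward(even, odd):
    """One Haar lifting step on two frames given as flat lists of pixels.

    Assumption: no motion or disparity compensation; pixels are paired
    by position only.
    """
    # Predict step: the high band is the odd frame minus its prediction.
    high = [o - e for e, o in zip(even, odd)]
    # Update step: the low band is the even frame plus half the residual.
    low = [e + h / 2 for e, h in zip(even, high)]
    return low, high


def haar_lift_inverse(low, high):
    """Undo haar_lift_forward by reversing the steps in opposite order."""
    even = [l - h / 2 for l, h in zip(low, high)]
    odd = [h + e for e, h in zip(even, high)]
    return even, odd
```

Applied to two consecutive frames of one view, this plays the role of a temporal (MCTF-like) step; applied to the same time instant in two neighbouring views, it plays the role of a view (DCVF-like) step. Because each step is undone exactly by its inverse, the decomposition is perfectly invertible.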

5 Related Work
Wavelet-based MVC frameworks
  Different wavelet decomposition structures along the temporal and view axes
  (a) Simulcast scheme [2]: only MCTF, no DCVF
  (b) Regular scheme [5]: multilevel MCTF for the temporal subbands; DCVF only in the lowest temporal subbands
  (c) Adaptive scheme [5,6]: MCTF and DCVF interleaved; global MCTF/DCVF selection on a Group of GOP basis
Wavelet-based video coding provides an elegant and flexible MVC framework through a high-dimensional wavelet transform. Figure (a) illustrates the simulcast-based wavelet decomposition structure along the temporal and view axes: there is only MCTF along the temporal axis and no DCVF along the view axis. The regular scheme in Figure (b) assumes that the temporal correlation is always stronger than the inter-view correlation, so DCVF is performed only in the lowest temporal subbands. This assumption does not always hold. A reasonable way to achieve better coding efficiency is to derive the optimal decomposition structure from the coding cost, which leads to the adaptive scheme in Figure (c). Instead of fixing the order of MCTF and DCVF, it interleaves them according to a global correlation analysis on a GoGOP basis.
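The ordering used by the regular scheme (multilevel temporal filtering per view, then view filtering on the lowest temporal subband only) can be sketched as follows. This is a toy model under several assumptions: each frame is reduced to a single number, Haar lifting without motion or disparity compensation stands in for MCTF and DCVF, and all names are hypothetical.

```python
def lift_1d(samples):
    """One Haar lifting level over an even-length list of scalar 'frames'."""
    evens, odds = samples[0::2], samples[1::2]
    highs = [o - e for e, o in zip(evens, odds)]
    lows = [e + h / 2 for e, h in zip(evens, highs)]
    return lows, highs


def multilevel(samples, levels):
    """Split the low band repeatedly; return (final lows, high bands per level)."""
    lows, bands = list(samples), []
    for _ in range(levels):
        lows, highs = lift_1d(lows)
        bands.append(highs)
    return lows, bands


def regular_scheme(video, temporal_levels):
    """video[v][t] is frame t of view v.

    Temporal filtering runs on every view; view filtering runs only on the
    lowest temporal subband, as in the regular scheme.
    """
    per_view = [multilevel(view, temporal_levels) for view in video]
    lowest = [lows for lows, _ in per_view]
    # Transpose so that each list runs along the view axis.
    across_views = [list(column) for column in zip(*lowest)]
    view_bands = [lift_1d(column) for column in across_views]
    temporal_highs = [bands for _, bands in per_view]
    return view_bands, temporal_highs
```

An adaptive scheme would differ in that the choice between a temporal and a view step is made at each level from a cost measure rather than fixed in advance.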

6 Locally Adaptive Inter-View-Temporal Structure
Analysis of temporal and inter-view correlation [8]
  Temporal prediction mode is dominant for all sequences
  But for a number of blocks, inter-view prediction is more efficient than temporal prediction
Figure: (a) prediction modes with first-order inter-view and temporal neighbor pictures; (b) probabilities of prediction modes on the macroblock level (T: temporal mode, V: inter-view mode) [8]
According to the macroblock-level analysis of temporal and inter-view correlation, the temporal prediction mode is dominant for all sequences, but for a number of blocks inter-view prediction is more efficient than temporal prediction. As the statistical dependencies between the temporal and inter-view pictures confirm, a local correlation analysis model could further improve the coding efficiency.

7 Locally Adaptive Inter-View-Temporal Structure
Locally adaptive inter-view-temporal decomposition
  Adopts locally adaptive inter-view-temporal correlation analysis on the macroblock level, instead of global temporal and inter-view correlation analysis on the whole picture
  Problem: how to implement the lifting steps of MCTF and DCVF on the macroblock level within the same framework
  Solution: the weighting lifting [9] is employed for MCTF or DCVF
Instead of a global temporal and inter-view correlation analysis based on the whole picture, the proposed wavelet decomposition structure adopts a locally adaptive inter-view-temporal correlation analysis on the macroblock level. The problem is then how to implement the lifting steps of MCTF and DCVF on the macroblock level within the same framework. The solution is to employ the weighting lifting scheme.
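The per-macroblock choice between a temporal and an inter-view reference can be illustrated with the short Python sketch below. It does not reproduce the weighting lifting of [9], which the slides do not spell out; it only shows a local (block-by-block) mode decision followed by a predict step. The cost measure (sum of absolute differences), the tie-breaking rule and all names are assumptions made for illustration.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two blocks (flat pixel lists)."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def choose_modes(current, temporal_ref, view_ref):
    """Pick a reference per block and compute the prediction residual.

    Each argument is a frame given as a list of blocks. Returns the chosen
    modes ('T' for temporal, 'V' for inter-view) and the residual blocks.
    Ties go to the temporal mode, an arbitrary choice for this sketch.
    """
    modes, residuals = [], []
    for cur, t_blk, v_blk in zip(current, temporal_ref, view_ref):
        mode = "T" if sad(cur, t_blk) <= sad(cur, v_blk) else "V"
        ref = t_blk if mode == "T" else v_blk
        modes.append(mode)
        residuals.append([c - r for c, r in zip(cur, ref)])
    return modes, residuals
```

A global scheme would make this decision once per picture or per GoGOP; making it per block lets a picture mix temporal and inter-view prediction where each is locally more efficient.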

8 Locally Adaptive Inter-View-Temporal Structure
Locally adaptive inter-view-temporal wavelet decomposition structure
The figure illustrates the proposed locally adaptive inter-view-temporal wavelet decomposition structure. The major difference from the conventional structure is that the inter-view transform is fully integrated with the temporal transform on the macroblock level within the same framework. The number of inter-view transform levels at each temporal subband is related to the number of temporal transform levels, but does not exceed the predetermined maximum number of inter-view transform levels.

9 2-D Pipeline-Based Lifting Scheme
Selection of optimal decomposition structure
  Adaptive wavelet decomposition scheme on the basis of GoGOP (Group of GOP)
    Demands significant memory and computational complexity
    Suffers from temporal boundary effects across GOPs, so its coding performance depends on the temporal GOP size
  To solve these problems, a new 2-D pipeline-based lifting scheme is proposed
    An extension of the 1-D pipeline-based lifting scheme [10], on the macroblock level
    Does not physically break the multiview sequence into GoGOPs but processes it without intermission
    At most 4x4 frames are involved in a one-level transform
The adaptive wavelet decomposition scheme is based on the hierarchical wavelet structure, and the optimal decomposition structure is selected on a GoGOP basis. This significantly increases the memory requirement and computational complexity. In addition, the scheme suffers from temporal boundary effects across GOPs, so its coding performance depends on the temporal GOP size. To solve these problems, a new 2-D pipeline processing-based lifting scheme on the macroblock level is proposed, which extends the 1-D pipeline processing-based lifting scheme. The 2-D lifting scheme does not physically break the multiview sequence into GoGOPs but processes it without intermission.

10 2-D Pipeline-Based Lifting Scheme
Pipeline processing-based lifting schemes
Figure: (a), (b) 1-D case of forward and inverse temporal transform; (c), (d) 2-D case of forward and inverse inter-view-temporal transform
Figures (a) and (b) show the pipeline-based lifting scheme for the 1-D case of forward and inverse temporal transform. We extend the scheme from the 1-D temporal transform to the 2-D inter-view-temporal transform, as shown in Figures (c) and (d). The 2-D scheme buffers at most 4x4 frames. Video frames along the temporal axis are first pushed into the 4x4 buffering window sequentially, followed by those along the view axis. In other words, the 2-D pipeline-based lifting scheme consists of two separable 1-D pipeline processes, one along the temporal axis and one along the view axis. The multiview sequence can therefore be processed continuously without being broken into GoGOPs.
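The idea of processing a sequence continuously with a small buffer, rather than cutting it into groups, can be sketched for the 1-D case with a Python generator. This is a hedged illustration, not the scheme of [10] or of the slides: it assumes the 5/3 lifting filter with symmetric extension at the two ends of the stream, treats each frame as a single number, ignores motion compensation, and uses a hypothetical function name.

```python
def stream_lift_53(frames):
    """Streaming one-level 5/3 lifting over an iterable of scalar 'frames'.

    Yields ('L', value) and ('H', value) as soon as each coefficient can be
    computed. Only a few frames are held at once, and the input is never
    split into fixed groups, so no interior boundaries are introduced.
    """
    it = iter(frames)
    even = next(it, None)
    if even is None:
        return
    prev_high = None
    while True:
        odd = next(it, None)
        if odd is None:
            # Odd-length stream: mirror the previous high band at the end.
            tail = prev_high if prev_high is not None else 0.0
            yield ("L", even + tail / 2)
            return
        nxt = next(it, None)
        right = even if nxt is None else nxt  # mirror at the end of stream
        # Predict step: odd frame minus the average of its even neighbours.
        high = odd - (even + right) / 2
        left = high if prev_high is None else prev_high  # mirror at start
        # Update step: even frame plus a quarter of the adjacent high bands.
        yield ("L", even + (left + high) / 4)
        yield ("H", high)
        if nxt is None:
            return
        prev_high, even = high, nxt
```

A 2-D version in the spirit of the slides would run one such process along the temporal axis and a second one along the view axis; the 4x4 buffering window described above is specific to the authors' scheme and is not modelled here.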

11 Experimental Results
Two multiview QVGA (320x240) sequences: (a) Ballroom and (b) Race1
  Eight views with a frame rate of 25 Hz or 30 Hz, captured by a parallel camera array with 20 cm spacing
  The first 128 frames of each sequence are encoded to generate a single bitstream per multiview video (including all eight views)
  PSNR is averaged over all views and bit rate is given per view
Different temporal/inter-view wavelet decomposition schemes are compared:
  Simulcast scheme
  Regular scheme
  Adaptive scheme
  Proposed scheme
In the experiments, two multiview QVGA video sequences are used. The temporal/inter-view wavelet decomposition schemes under comparison are all built upon the wavelet-based SVC.
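The reporting convention (PSNR averaged over all views, bit rate given per view) can be expressed as a small Python sketch. The slides do not give the exact averaging procedure, so the details here are assumptions: 8-bit samples with a peak of 255, a plain arithmetic mean of per-view PSNR values, bit rate in kbit/s, and hypothetical function names.

```python
import math


def psnr(original, decoded, peak=255):
    """PSNR in dB between two equally sized flat lists of samples."""
    mse = sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def average_psnr(per_view_psnr):
    """Arithmetic mean of the PSNR values of all views (an assumed convention)."""
    return sum(per_view_psnr) / len(per_view_psnr)


def bitrate_per_view_kbps(total_bits, num_frames, frame_rate, num_views):
    """Bit rate of the joint bitstream divided evenly across the views."""
    seconds = num_frames / frame_rate
    return total_bits / seconds / num_views / 1000
```

For example, a bitstream of 1,024,000 bits covering 128 frames at 25 Hz and eight views corresponds to 25 kbit/s per view under these assumptions.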

12 Experimental Results
Figure: performance comparison of the different wavelet decomposition schemes for the tested multiview sequences, Ballroom and Race1
The figure shows the R-D curves for the two multiview videos with eight views. The proposed scheme always achieves the best coding performance among the tested schemes.

13 Experimental Results
Figure: performance comparison between the proposed scheme and the simulcast scheme with different quality, temporal and view scalability for the tested multiview sequences, Ballroom and Race1
The figure compares the proposed scheme with the simulcast scheme for the two multiview videos at different bit rates, frame rates and view rates. As the R-D curves at the different test points show, the proposed scheme consistently outperforms the simulcast scheme, even in the two-view case.

14 Experimental Results
Coding Performance Comparison
  The PSNR gain can be up to 1.18 dB and 0.88 dB over the regular and adaptive schemes, respectively, at low bit rates
  Up to 2.49 dB coding gain is observed at low bit rates compared with the simulcast scheme
Notes: Coding results for spatial scalability are not shown, because all of these schemes naturally inherit this scalability from the wavelet-based SVC.

15 Experimental Results
Memory and Complexity Comparison
  Simulcast scheme: always the lowest memory and computation requirements, because it exploits only temporal correlation
  Regular scheme: slightly higher memory and computation requirements than the simulcast scheme, because it exploits inter-view correlation only at the lowest temporal subbands
  Adaptive scheme: significantly higher memory and computational complexity, due to the selection on a GoGOP basis
  Proposed scheme: moderate memory and complexity, due to the 2-D pipeline-based inter-view-temporal lifting on a macroblock basis

16 Conclusion
An inter-view-temporal lifting-based wavelet coding technique is proposed for fully scalable multiview video coding. It provides the following advantages:
  View scalability is supported in addition to the other three scalabilities
  A 2-D pipeline-based lifting scheme removes both temporal and view boundary effects
  A locally adaptive inter-view-temporal structure exploits local inter-view-temporal correlation on the macroblock level

17 Thank You! Q&A

