User-Oriented Approach in Spatial and Temporal Domain Video Coding

2003/12/18
Chia-Chiang Ho, Wei-Ta Chu, Chen-Hsiu Huang and Ja-Ling Wu
Communication and Multimedia Laboratory, Department of Computer Science and Information Engineering, National Taiwan University

Introduction
Video encoding challenge: reducing the storage or transmission bandwidth required by the compressed bitstream while preserving its perceived quality as far as possible. Typical video encoding schemes treat all parts of the source video as equally important. By combining user attention and foveation techniques, we develop both scalable and non-scalable coding schemes that preserve perceived quality as far as possible. Various human visual system (HVS) based approaches have been proposed to meet this challenge. In our work, we adopt two models, the user attention model and the foveation model, to develop a scalable coding scheme based on the MPEG-4 FGS standard. The presentation is divided into two parts: first we discuss the adopted models, and then we show how they are integrated with the video coding schemes.

User Attention Model
Attention refers to a person's ability to focus and concentrate on a visual or auditory ‘object’. Attention can be modeled in two directions: bottom-up and top-down. Bottom-up attention models what people are attracted to see. Top-down attention models what people are willing to see, and is usually modeled by detecting meaningful objects or features.

Foveation Model
The retina is responsible for detecting light. It contains two kinds of photoreceptor cells: rods and cones. Cones are responsible for daylight vision. The density of cone cells is highest at the fovea and drops with increasing eccentricity (the viewing angle measured from the fovea).

Foveation Function
According to empirical experiments:
The larger the viewing distance, the larger the region that can be foveated.
The larger the contrast threshold, the larger the region that can be foveated.
The foveation model is defined as a function of viewing distance (D) and pixel contrast.
[Figure: viewing geometry with labels Fovea, Retina, Lens, D, Foveation point, e, and Foveated region]
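
The slides do not give the foveation function itself, so the following is only a sketch under stated assumptions. It uses the Geisler-Perry contrast threshold model, CT(f, e) = CT0 * exp(ALPHA * f * (e + E2) / E2), with commonly quoted constants; the authors may have used a different formula or different values. The function names, the assumption that e in the figure is the eccentricity, and the convention that D is measured in image widths are all illustrative choices.

```python
import math

# Assumed constants for the Geisler-Perry contrast threshold model;
# the slides do not state which values the authors used.
CT0 = 1.0 / 64.0   # minimum contrast threshold
ALPHA = 0.106      # spatial frequency decay constant
E2 = 2.3           # half-resolution eccentricity in degrees


def eccentricity_deg(pixel, focus, image_width, distance):
    """Visual angle between a pixel and the foveation point.

    `distance` is the viewing distance D measured in image widths
    (an assumed convention, not taken from the slides).
    """
    d = math.hypot(pixel[0] - focus[0], pixel[1] - focus[1])
    return math.degrees(math.atan(d / (image_width * distance)))


def cutoff_frequency(ecc_deg, ct0=CT0):
    """Spatial frequency (cycles/degree) at which the threshold reaches 1."""
    return E2 * math.log(1.0 / ct0) / (ALPHA * (ecc_deg + E2))


# Hypothetical CIF-sized frame with the foveation point at its center.
focus = (176, 144)
for pixel in [(176, 144), (276, 144), (351, 144)]:
    e = eccentricity_deg(pixel, focus, image_width=352, distance=3.0)
    print(pixel, round(e, 2), round(cutoff_frequency(e), 2))
```

In this assumed model, the cutoff frequency falls as eccentricity grows, and raising the contrast threshold lowers the cutoff at every eccentricity, so more detail can be discarded.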

Foveation in Brief
The foveation model can be regarded as a kind of region-of-interest (ROI) concept. Object segmentation techniques are widely applied for ROI description, but satisfactory results are not easy to obtain. The foveation model implicitly relaxes the object boundary restriction, and we think it may serve as a compromise mechanism for object-based applications.

User-Oriented Video Coding
Based on MPEG-4 FGS, foveation is exploited to perform spatially selective enhancement, and the user attention model is used to facilitate temporal scalability.

Spatial Domain Approach
The proposed architecture for user-oriented video encoding:
First, the input raw video goes through the focus detection module, which is built on the attention model described earlier, to find the focus points.
According to the focus point information, the raw video then goes through the foveation filter. Every 8x8 block in an input frame is processed by the DCT, foveation filter, and IDCT modules in turn to produce the foveated frames.
The foveated frames are then sent to the video encoder for normal video encoding.
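
The DCT, foveation filter, IDCT chain for 8x8 blocks can be sketched as follows. This is a minimal illustration, not the authors' filter: the rule in `coefficients_to_keep` and its `falloff` parameter are hypothetical stand-ins for the foveation model, and the focus point is assumed to be supplied by the focus detection module.

```python
import math

N = 8  # block size used by the DCT stage


def dct_matrix(n=N):
    """Orthonormal DCT-II basis matrix."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


C = dct_matrix()
CT = transpose(C)


def foveate_block(block, keep):
    """DCT, then zero every coefficient with u + v >= keep, then IDCT."""
    coeffs = matmul(matmul(C, block), CT)
    for u in range(N):
        for v in range(N):
            if u + v >= keep:
                coeffs[u][v] = 0.0
    return matmul(matmul(CT, coeffs), C)


def coefficients_to_keep(block_xy, focus_xy, falloff=40.0):
    """Hypothetical rule: keep fewer coefficients farther from the focus."""
    d = math.hypot(block_xy[0] - focus_xy[0], block_xy[1] - focus_xy[1])
    return max(1, round((2 * N - 1) / (1.0 + d / falloff)))


def foveate_frame(frame, focus_xy):
    """Filter a grayscale frame whose sides are multiples of 8."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for by in range(0, h, N):
        for bx in range(0, w, N):
            block = [row[bx:bx + N] for row in frame[by:by + N]]
            keep = coefficients_to_keep((bx + N / 2, by + N / 2), focus_xy)
            filtered = foveate_block(block, keep)
            for y in range(N):
                out[by + y][bx:bx + N] = filtered[y]
    return out
```

Blocks centered on the focus point keep all coefficients and pass through unchanged, while distant blocks collapse toward their mean value, which is what makes the foveated frame cheaper to encode.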

Proposed coding schemes
Non-scalable coding: With the foveation model, encoders can discard as much unimportant visual information as possible. Thus, the compression gain can be increased without sacrificing perceived quality.
Scalable coding: Encoders can selectively preserve higher quality for focused regions.

Scalable Coding
Scalable coding based on the foveation model: the foveated video forms the base layer. The difference between the original video and the foveated video is then compressed as enhancement bitstream(s). When more bandwidth is available, the streaming server can improve video quality by selectively adding enhancement layers according to the foveation model. The extra bitstream is added only to some regions, rather than uniformly enhancing the whole frame.
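
The relationship between the layers can be illustrated in the pixel domain. This is a simplified sketch: it ignores quantization and the DCT-domain bit-plane coding that MPEG-4 FGS applies to the enhancement layer, and the rectangular `region` argument is a hypothetical stand-in for the region selected by the foveation model.

```python
def residual(original, foveated):
    """Enhancement-layer data: original frame minus foveated (base) frame."""
    return [[o - f for o, f in zip(orow, frow)]
            for orow, frow in zip(original, foveated)]


def enhance_region(base, resid, region):
    """Add the residual back only inside region = (x0, y0, x1, y1).

    The end coordinates x1 and y1 are exclusive.
    """
    x0, y0, x1, y1 = region
    out = [row[:] for row in base]
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] += resid[y][x]
    return out


# Tiny made-up frames: the foveated frame has lost detail on the right side.
original = [[10, 20, 30, 40],
            [50, 60, 70, 80]]
foveated = [[10, 20, 28, 35],
            [50, 60, 66, 72]]

resid = residual(original, foveated)
# Spend the extra bandwidth on the top-right region only.
partial = enhance_region(foveated, resid, (2, 0, 4, 1))
print(partial)
```

Only the selected region is restored to the original values; the rest of the frame stays at base-layer quality.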

User Attention based Temporal Coding
According to the user attention model, the saliency value of a video segment is obtained from intensity, color, motion, and face features. Segments with small saliency variations should be preserved when transmission bandwidth is insufficient.

Temporal Domain Approach
In our work, the saliency value of each video frame is calculated from different features. We can then construct a saliency curve to illustrate the saliency variation of a video clip; the example here is from a news video clip. Segments with high saliency variation are candidates for encoding in the enhancement layer. For each frame, an integrated score is calculated, where Pi denotes the value of the i-th pixel and Score denotes the saliency score of the frame.

Temporal Reduction Steps
To meet the bandwidth limit for the base layer, we design a window-based approach that adaptively skips inconspicuous frames in the following steps:
Quantization: quantize the saliency curve into several stages, mainly according to its standard deviation.
Variance calculation: the variance of the frames within each window is calculated to form the basis of the saliency measure.
Scalable coding: if the variance of a video shot is larger than a pre-defined threshold, we say that it dazzles users and does not carry high semantic meaning. Because of storage or transmission restrictions, such video segments are encoded in the enhancement layer and are used to enhance the whole video when more bandwidth is available.
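
The three steps can be sketched as follows. The slides do not specify the number of stages, the window size, or the threshold, so `levels`, `window`, and `threshold` are hypothetical parameters, and tying the quantization step directly to one standard deviation is only one possible reading of "mainly according to its standard deviation".

```python
import statistics


def quantize(curve, levels=4):
    """Map saliency values to integer stages, one std. dev. per stage."""
    step = statistics.pstdev(curve) or 1.0  # avoid dividing by zero
    low = min(curve)
    return [min(levels - 1, int((v - low) / step)) for v in curve]


def window_variances(stages, window=5):
    """Population variance of each non-overlapping window of stages."""
    return [statistics.pvariance(stages[i:i + window])
            for i in range(0, len(stages), window)]


def assign_layers(curve, window=5, threshold=0.5, levels=4):
    """Label each window 'base' (low variance) or 'enhancement' (high)."""
    variances = window_variances(quantize(curve, levels), window)
    return ["enhancement" if v > threshold else "base" for v in variances]


# Made-up saliency curve: five steady frames, then five erratic ones.
curve = [0.50, 0.51, 0.50, 0.52, 0.51, 0.10, 0.90, 0.20, 0.80, 0.15]
print(assign_layers(curve))
```

Quantizing first means that small fluctuations within one stage contribute no variance, so steady segments such as an anchorperson shot stay in the base layer.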

Experimental Results: Non-scalable Spatial Coding
We increase the minimum contrast threshold by modifying CT0 as: CT1(k) = CT0 + kS. D is the viewing distance.
[Figure: foveated frames compared with the original. Panels: Original; D = 1, k = 2; D = 1, k = 6; D = 6, k = 2; D = 6, k = 6]

Non-Scalable Experimental Results
Bitrate savings from applying foveation filters to various MPEG-1 encoded sequences

Sequence    Original size (bytes)   Original bitrate (kbps)   Foveated size (bytes)   Foveated bitrate (kbps)   Bitrate saving ratio (%)
foreman     -                       831                       -                       769                       7.4
mobile      -                       2445                      -                       2078                      15.0
butterfly   392447                  721                       374541                  688                       4.5

About 9% bitrate saving on average.
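
The saving ratios and their average can be checked from the bitrates in the table. This sketch assumes that the two numbers listed for foreman and mobile are the original and foveated bitrates in kbps, which is consistent with the stated ratios.

```python
# Bitrates in kbps as listed in the table: (original, foveated).
bitrates = {
    "foreman": (831, 769),
    "mobile": (2445, 2078),
    "butterfly": (721, 688),
}


def saving_percent(original, foveated):
    """Relative bitrate reduction in percent."""
    return 100.0 * (original - foveated) / original


savings = {name: saving_percent(o, f) for name, (o, f) in bitrates.items()}
average = sum(savings.values()) / len(savings)

for name, value in savings.items():
    print(f"{name}: {value:.2f}%")
print(f"average: {average:.2f}%")
```

The computed ratios are slightly above the tabulated 7.4 and 4.5 for foreman and butterfly, which suggests the table values were truncated rather than rounded; the average still comes out at about 9%.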

Experimental Results: Scalable Temporal Coding
In our preliminary experiments, we found that this approach provides satisfactory results for some categories of video. For example, in a news video, segments with smooth frames, such as anchorperson scenes and close-up shots, are preserved as the base layer. Other segments with frequent scene changes are encoded as the enhancement layer.

Conclusion
We proposed a user-oriented approach that combines user attention and foveation models to facilitate scalable coding in the spatial and temporal domains. This framework could be extended to develop a transcoder that selectively transcodes part of a video frame to meet the requirements of different devices.
