User-Oriented Approach in Spatial and Temporal Domain Video Coding


2003/12/18
Chia-Chiang Ho, Wei-Ta Chu, Chen-Hsiu Huang and Ja-Ling Wu
Communication and Multimedia Laboratory, Department of Computer Science and Information Engineering, National Taiwan University

Introduction
Video encoding challenge: reducing storage or transmission bandwidth while preserving most of the perceived quality. Typical video encoding schemes treat all parts of the source video as equally important. By combining user attention and foveation techniques, we develop both scalable and non-scalable coding schemes that preserve perceived quality as far as possible.
One of the most challenging problems in video encoding is reducing the storage and transmission bandwidth required by the compressed bitstream while preserving its perceived quality. To this end, various approaches based on the human visual system (HVS) have been proposed. In our work, we adopt two models, the user attention model and the foveation model, to develop a scalable coding scheme based on the MPEG-4 Fine Granularity Scalability (FGS) standard. The rest of this presentation is divided roughly into two parts: first we discuss the adopted models, and then we illustrate how they are integrated with the video coding schemes.

User Attention Model
Attention refers to a person's ability to focus and concentrate on some visual or auditory 'object'. Attention can be modeled from two directions: bottom-up and top-down. Bottom-up attention models what people are attracted to look at. Top-down attention models what people intend to look at, and is usually modeled by detecting meaningful objects or features.

Foveation Model
The retina is responsible for detecting light. It contains two kinds of photoreceptor cells, rods and cones, and the cones are responsible for daylight vision. The density of cone cells is highest at the fovea and drops with increasing eccentricity (the viewing angle away from the fovea).
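The slides do not say how a pixel's eccentricity is computed. The sketch below is a minimal version that assumes a convention common in foveated image coding: the viewing distance v is measured in image widths. Under that assumption, a pixel d pixels from the nearest focus point of an N-pixel-wide frame has eccentricity e = arctan(d / (N v)) degrees. The function name and the unit convention are illustrative assumptions. Taking the minimum over several focus points is one simple way to handle multiple fixations.

```python
import numpy as np


def eccentricity_map(height, width, focus_points, viewing_distance):
    """Eccentricity (degrees) of every pixel w.r.t. the nearest focus point.

    Assumptions (not from the slides): focus_points is a list of (row, col)
    pixel positions, and viewing_distance is given in multiples of the
    image width.
    """
    rows, cols = np.mgrid[0:height, 0:width]
    ecc = np.full((height, width), np.inf)
    for r0, c0 in focus_points:
        # On-screen distance from this focus point, in pixels.
        d = np.hypot(rows - r0, cols - c0)
        # Visual angle subtended by that distance at the assumed viewing distance.
        angle = np.degrees(np.arctan(d / (width * viewing_distance)))
        ecc = np.minimum(ecc, angle)
    return ecc


# Usage: a CIF-sized frame with one focus point at its centre, viewed at 1 and 6 widths.
near = eccentricity_map(288, 352, [(144, 176)], viewing_distance=1.0)
far = eccentricity_map(288, 352, [(144, 176)], viewing_distance=6.0)
```

Viewing from farther away shrinks every pixel's eccentricity, because the whole frame subtends a smaller visual angle.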

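Foveation Function
According to empirical experiments, the larger the viewing distance, the larger the region that can be foveated; likewise, the larger the contrast threshold, the larger the region that can be foveated. The foveation model is defined as a function of the viewing distance (D) and pixel contrast.
Figure: geometry of foveation, showing the lens, retina, and fovea, the viewing distance D, the foveation point, the eccentricity e, and the foveated region.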
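The slides do not spell out the foveation function, so the sketch below assumes one widely cited choice: the Geisler-Perry contrast-threshold model CT(f, e) = CT0 · exp(α f (e + e2) / e2), with α = 0.106, e2 = 2.3 degrees and CT0 = 1/64. Setting CT = 1 (the maximum possible contrast) gives the highest visible frequency, fc(e) = e2 ln(1/CT0) / (α (e + e2)), in cycles per degree. The code converts this to cycles per pixel and caps it at the display's Nyquist limit of 0.5. The constants, function names and the "viewing distance in image widths" unit are all assumptions.

```python
import numpy as np

# Assumed Geisler-Perry model constants (the slides do not give a formula).
ALPHA = 0.106    # spatial-frequency decay constant
E2 = 2.3         # half-resolution eccentricity, in degrees
CT0 = 1.0 / 64   # minimum contrast threshold


def cutoff_cpd(ecc_deg, ct0=CT0):
    """Highest visible frequency (cycles/degree): solve CT(f, e) = 1 for f."""
    return E2 * np.log(1.0 / ct0) / (ALPHA * (np.asarray(ecc_deg) + E2))


def cutoff_cpp(ecc_deg, width, viewing_distance, ct0=CT0):
    """Cutoff in cycles/pixel, capped at the display Nyquist limit (0.5).

    Assumes viewing_distance is in image widths, so the display shows roughly
    pi * width * viewing_distance / 180 pixels per degree.
    """
    pixels_per_degree = np.pi * width * viewing_distance / 180.0
    return np.minimum(cutoff_cpd(ecc_deg, ct0) / pixels_per_degree, 0.5)


# Usage: cutoffs at a few eccentricities for a 352-pixel-wide frame viewed at 6 widths.
print(cutoff_cpp([0.0, 5.0, 10.0], width=352, viewing_distance=6.0))
```

In this model, passing a larger `ct0` lowers the cutoff at every eccentricity, so more of the frame is low-passed. At a fixed eccentricity, a larger viewing distance also lowers the per-pixel cutoff. Both effects agree with the two empirical observations above.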

Foveation in Brief
The foveation model can be regarded as a kind of region-of-interest (ROI) concept. Object segmentation techniques are widely applied to describe ROIs; however, satisfactory segmentation results are not easy to obtain. The foveation model implicitly relaxes the need for exact object boundaries, so we think it may serve as a compromise mechanism for object-based applications.

User-Oriented Video Coding
Based on MPEG-4 FGS, foveation is exploited to perform spatially selective enhancement, and the user attention model is used to facilitate temporal scalability.

Spatial Domain Approach
Figure: the proposed architecture for user-oriented video encoding.
First, the input raw video goes through the focus detection module, which is built on the attention model described earlier, to find the focus points. Based on the focus points, the raw video then passes through the foveation filter: every 8x8 block of an input frame is processed by the DCT, foveation filter, and IDCT modules in turn to produce the foveated frame. The foveated frames are then sent to the video encoder for normal video encoding.
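The DCT, foveation filter and IDCT path described above can be sketched in NumPy. The sketch assumes the foveation filter is an ideal low-pass filter in the DCT domain; the slides do not say how their filter weights the coefficients. Each 8x8 block is transformed with an orthonormal DCT-II. Coefficients whose radial frequency exceeds the block's cutoff are zeroed, and the block is inverse-transformed. `cutoff_for_block` is a hypothetical caller-supplied function that returns the cutoff, in cycles per pixel, for the block centred at a given position. For example, it could be derived from the distance to the nearest focus point.

```python
import numpy as np

N = 8  # block size used in the slides


def dct_matrix(n=N):
    """Orthonormal DCT-II matrix, so that coefficients = C @ block @ C.T."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)
    return mat


C = dct_matrix()
u, v = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
# Basis function (u, v) oscillates at u/(2N) and v/(2N) cycles per pixel.
FREQ = np.hypot(u, v) / (2 * N)


def foveate_frame(frame, cutoff_for_block):
    """Low-pass every 8x8 block in the DCT domain according to its cutoff.

    frame: 2-D array whose sides are multiples of 8 (an assumption).
    cutoff_for_block(row_centre, col_centre) -> cycles/pixel (hypothetical).
    """
    h, w = frame.shape
    out = np.empty_like(frame, dtype=float)
    for top in range(0, h, N):
        for left in range(0, w, N):
            block = frame[top:top + N, left:left + N].astype(float)
            coeffs = C @ block @ C.T                          # forward DCT
            fc = cutoff_for_block(top + N / 2, left + N / 2)
            coeffs[FREQ > fc] = 0.0                           # foveation filter
            out[top:top + N, left:left + N] = C.T @ coeffs @ C  # inverse DCT
    return out


# Usage with a hypothetical two-level cutoff around a focus point at (32, 32).
rng = np.random.default_rng(0)
frame = rng.uniform(0, 255, size=(64, 64))
focus = (32, 32)


def toy_cutoff(row, col):
    # Full resolution within 16 pixels of the focus, heavy blur elsewhere.
    return 1.0 if np.hypot(row - focus[0], col - focus[1]) < 16 else 0.1


foveated = foveate_frame(frame, toy_cutoff)
```

The DC coefficient has frequency 0, so it always survives. Block means, and hence overall brightness, are therefore preserved even where the cutoff is very low.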

Proposed Coding Schemes
Non-scalable coding: with the foveation model, encoders can discard as much perceptually unimportant visual information as possible, which increases the compression gain without sacrificing perceived quality.
Scalable coding: encoders can selectively preserve higher quality for the focused regions.

Scalable Coding
In foveation-model-based scalable coding, the foveated video is encoded as the base layer, and the difference between the original video and the foveated video is compressed as the enhancement bitstream(s). When more bandwidth is available, the streaming server can improve video quality by selectively adding enhancement layers according to the foveation model: the extra bitstream is added only to some regions, rather than uniformly enhancing the whole frame.
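The selective enhancement described above can be sketched at the block level. This is a simplification and not the MPEG-4 FGS bit-plane syntax:

- The enhancement signal is the pixel residual between the original and foveated frames.
- The available bandwidth is modelled as a hypothetical budget of 8x8 residual blocks.
- Blocks are sent in order of distance from the focus point, so the focused region is enhanced first.

The function name and the budget parameter are illustrative assumptions.

```python
import numpy as np


def enhance_selectively(original, base, focus, block_budget, n=8):
    """Add residual blocks to the base layer, nearest to the focus first.

    original, base: 2-D arrays of equal shape (sides multiples of n).
    focus: (row, col) of the focus point in pixels.
    block_budget: hypothetical number of residual blocks the link can carry.
    """
    residual = original.astype(float) - base.astype(float)  # enhancement signal
    h, w = residual.shape
    blocks = [(r, c) for r in range(0, h, n) for c in range(0, w, n)]
    # Priority follows the foveation model: blocks closer to the focus go first.
    blocks.sort(key=lambda rc: np.hypot(rc[0] + n / 2 - focus[0],
                                        rc[1] + n / 2 - focus[1]))
    out = base.astype(float)  # astype returns a copy, so base is not modified
    for r, c in blocks[:block_budget]:
        out[r:r + n, c:c + n] += residual[r:r + n, c:c + n]
    return out
```

With a budget of zero the client sees only the base layer. With a budget covering every block it recovers the original frame. Intermediate budgets sharpen the frame outward from the focus point rather than uniformly.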

User-Attention-Based Temporal Coding
According to the user attention model, the saliency value of a video segment is obtained from intensity, color, motion, and face features. Segments with small saliency variations should be preserved when the transmission bandwidth is insufficient.

Temporal Domain Approach
In our work, the saliency value of each video frame is calculated from several features, and these values form a saliency curve that illustrates how saliency varies over a video clip.
Figure: saliency curve of an example news video clip.
Segments with high saliency variation are candidates for encoding in the enhancement layer. For each frame, an integrated score is calculated as follows, where P_i denotes the value of the i-th pixel and Score denotes the saliency score of the frame:

Temporal Reduction Steps
To meet the bandwidth limit of the base layer, we design a window-based approach that adaptively skips inconspicuous frames in the following steps:
Quantization: quantize the saliency curve into several stages, mainly according to its standard deviation.
Variance calculation: calculate the variance of the frames within each window; this forms the basis of the saliency decision.
Scalable coding: if the variance of a video shot is larger than a predefined threshold, we regard the shot as dazzling to users and as lacking high semantic meaning. Because of storage or transmission constraints, such a segment is encoded in the enhancement layer.
These video segments are first encoded in the enhancement layer and are used to enhance the whole video when more bandwidth is available.
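The three steps can be sketched as follows. Every numeric choice here is an assumption, since the slides give none:

- The curve is quantized into stages one standard deviation wide.
- Fixed-length windows stand in for video shots.
- `window` and `threshold` are hypothetical tuning parameters.
- A window whose quantized-saliency variance exceeds the threshold is assigned entirely to the enhancement layer.

```python
import numpy as np


def split_layers(saliency, window=30, threshold=0.5):
    """Return a boolean mask: True = frame goes to the enhancement layer.

    saliency: 1-D sequence of per-frame saliency scores.
    window, threshold: hypothetical parameters (not given in the slides).
    """
    s = np.asarray(saliency, dtype=float)
    # Step 1: quantize the curve into stages one standard deviation wide.
    std = s.std()
    stages = np.floor((s - s.min()) / std) if std > 0 else np.zeros_like(s)
    enhancement = np.zeros(len(s), dtype=bool)
    for start in range(0, len(s), window):
        seg = stages[start:start + window]
        # Step 2: variance of the quantized saliency inside the window.
        # Step 3: high-variance windows are moved to the enhancement layer.
        enhancement[start:start + window] = seg.var() > threshold
    return enhancement
```

Frames marked `False` form the base layer. When bandwidth is tight, a server would drop the `True` frames first, which matches the slides' goal of preserving segments with small saliency variation.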

Experimental Results: Non-scalable Spatial Coding
Figure: the original frame and foveated frames for (D = 1, k = 2), (D = 1, k = 6), (D = 6, k = 2), and (D = 6, k = 6).
We increase the minimum contrast threshold by modifying $CT_0$ as $CT_1(k) = CT_0 + kS$; D denotes the viewing distance.
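Why a larger k blurs more can be seen under an assumed threshold model; the slides do not state one. Suppose the contrast threshold follows the Geisler-Perry form $CT(f, e) = CT_0 \exp\bigl(\alpha f (e + e_2)/e_2\bigr)$. Replacing $CT_0$ with $CT_1(k) = CT_0 + kS$ and solving $CT(f, e) = 1$ for the cutoff frequency gives:

```latex
f_c(e; k) = \frac{e_2 \,\ln\!\bigl(1 / (CT_0 + kS)\bigr)}{\alpha\,(e + e_2)}
```

For $S > 0$ and $CT_0 + kS < 1$, $f_c$ decreases as k grows. At every eccentricity, more high-frequency detail then falls above the cutoff and is removed, which is consistent with the stronger blurring expected for k = 6 than for k = 2.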

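Non-Scalable Experimental Results
Bitrate savings of applying foveation filters to various MPEG-1 encoded sequences:

Sequence  | Original size (bytes) | Original bitrate (kbps) | Foveated size (bytes) | Foveated bitrate (kbps) | Bitrate saving ratio (%)
----------|-----------------------|-------------------------|-----------------------|-------------------------|-------------------------
foreman   | 1329545               | 831                     | 1230589               | 769                     | 7.4
mobile    | 3913246               | 2445                    | 3326153               | 2078                    | 15.0
butterfly | 392447                | 721                     | 374541                | 688                     | 4.5

About 9% bitrate saving on average.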
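As a quick arithmetic check, the saving ratios can be recomputed from the byte counts in the table as (original - foveated) / original. Using the sizes rather than the rounded bitrates reproduces the listed ratios to within about 0.1 percentage point, and their unweighted mean is about 9%.

```python
# Sizes in bytes from the table: (original, foveated).
SIZES = {
    "foreman": (1329545, 1230589),
    "mobile": (3913246, 3326153),
    "butterfly": (392447, 374541),
}


def saving_percent(original, foveated):
    """Relative size reduction, in percent."""
    return 100.0 * (original - foveated) / original


savings = {name: saving_percent(*pair) for name, pair in SIZES.items()}
average = sum(savings.values()) / len(savings)  # unweighted mean over sequences

for name, pct in savings.items():
    print(f"{name:10s} {pct:5.2f}%")
print(f"{'average':10s} {average:5.2f}%")
```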

Experimental Results: Scalable Temporal Coding
In our preliminary experiments, this approach gave satisfactory results for some categories of video. For example, in a news video, segments with smooth frames, such as anchorperson scenes and close-up shots, are preserved in the base layer, while segments with frequent scene changes are encoded in the enhancement layer.

Conclusion
We proposed a user-oriented approach that combines user attention and foveation models to facilitate scalable coding in the spatial and temporal domains. This framework could be extended to a transcoder that selectively transcodes parts of a video frame to meet the requirements of different devices.