Arko Barman
Computer Vision & Artificial Intelligence Lab
Department of Electrical Engineering
Indian Institute of Science, Bangalore

- New paradigm for video compression
- Based on information-theoretic results proposed in the 1970s
- A radical departure from traditional video compression techniques
- Well suited to applications that require many encoders and a single decoder

- Low-complexity, low-power encoder
- Possibly higher-complexity decoder
- Should achieve coding efficiency similar to that of conventional video compression techniques
- Should try to achieve the rate-distortion performance of conventional schemes

Wireless low-power surveillance networks
- The number of encoders is usually much higher than the number of decoders (usually one)
- The video sequences cover partially overlapping areas
- The decoder should exploit the correlation between the multiple encoded video sequences
- Low-complexity encoders are required
- The decoder may be of higher complexity

Wireless Mobile Video
- Both encoder and decoder must be of low cost and low complexity
- The encoder must be Wyner-Ziv for low complexity
- The decoder must be MPEG-x or H.26x for low complexity
- A base station receives the Wyner-Ziv encoded bitstream from the transmitter, decodes it, re-encodes it as MPEG-x or H.26x and transmits it to the receiver

Multi-view Acquisition
- Neighbouring cameras of a large camera array capture overlapping, and hence correlated, video sequences
- Each camera encodes its video independently
- Joint decoding at a central station must exploit the correlation between the different views
- Used for image-based rendering (3D reconstruction with texture mapping)

Video-based Sensor Networks
- Applications include tracking a person throughout an environment, monitoring activities, tracking events and raising alarms
- Multiple sensors with video-acquisition capabilities, which must be low-cost, low-power and low-complexity
- A central decoding device with high computational capability and storage

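Information Theory Fundamentals
- Entropy: $H(X) = -\sum_{x} p(x) \log_2 p(x)$, where $p(x)$ is the probability that $X$ takes the value $x$
- Joint entropy: $H(X,Y) = -\sum_{x}\sum_{y} p(x,y) \log_2 p(x,y)$
- Conditional entropy: $H(X|Y) = -\sum_{x}\sum_{y} p(x,y) \log_2 p(x|y)$
- Lower bound on the bitrate of each signal when coded on its own: $R_X \ge H(X)$, $R_Y \ge H(Y)$
- Lower bound on the total bitrate when the signals are coded jointly: $R_X + R_Y \ge H(X,Y)$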
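As a small numerical illustration of these quantities, the sketch below computes the entropies of a pair of correlated binary sources. The joint distribution (a fair bit X, and Y equal to X flipped with probability 0.1) and all function names are assumptions chosen for the example.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginal(joint, axis):
    """Marginal pmf of one coordinate of a joint pmf given as {(x, y): p}."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out

# Hypothetical joint pmf p(x, y): X is a fair bit, Y differs from X with probability 0.1.
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}

h_x = entropy(marginal(joint, 0).values())
h_y = entropy(marginal(joint, 1).values())
h_xy = entropy(joint.values())
h_x_given_y = h_xy - h_y  # chain rule: H(X,Y) = H(Y) + H(X|Y)

print(f"H(X)={h_x:.3f}  H(Y)={h_y:.3f}  H(X,Y)={h_xy:.3f}  H(X|Y)={h_x_given_y:.3f}")
```

Because the sources are dependent, H(X,Y) comes out smaller than H(X) + H(Y); that gap is what joint coding (or, later, joint decoding) can save.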

Slepian-Wolf Theorem
- Consider two statistically dependent sequences X and Y that are separately encoded but jointly decoded
- Is it possible to recover these dependent sequences with arbitrarily low reconstruction error probability?
- In 1973, Slepian and Wolf determined the rate combinations $R_X$ and $R_Y$ that allow X and Y to be reconstructed with an arbitrarily small error probability
- These bounds are given by the conditional entropies of the signals X and Y, and by their joint entropy
- The bounds on the rates are: $R_X \ge H(X|Y)$, $R_Y \ge H(Y|X)$, $R_X + R_Y \ge H(X,Y)$
- Even when correlated sources are encoded independently, a total bitrate equal to the joint entropy is enough
- In theory, separate encoding in distributed video coding therefore need not lose any compression efficiency compared to conventional video coding
- The theorem defines an achievable rate region for reconstructing dependent sequences with arbitrarily small probability of error
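To make the rate region concrete, the following sketch tests whether a rate pair satisfies the three Slepian-Wolf inequalities. The source model (a fair bit X and a copy Y that differs with probability 0.1) and the function names are assumptions made for this example only.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a binary variable with P(1) = p, for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Assumed example: X is a fair bit and Y differs from X with probability 0.1.
H_X_GIVEN_Y = binary_entropy(0.1)
H_Y_GIVEN_X = binary_entropy(0.1)
H_XY = 1.0 + binary_entropy(0.1)  # H(Y) + H(X|Y)

def in_slepian_wolf_region(r_x, r_y, eps=1e-12):
    """True if (r_x, r_y) satisfies all three Slepian-Wolf rate bounds."""
    return (r_x >= H_X_GIVEN_Y - eps
            and r_y >= H_Y_GIVEN_X - eps
            and r_x + r_y >= H_XY - eps)

for rates in [(H_X_GIVEN_Y, 1.0), (1.0, 1.0), (0.7, 0.7), (0.4, 1.0)]:
    print(rates, in_slepian_wolf_region(*rates))
```

The first pair is a corner point of the region: Y is sent at its full entropy and X at only H(X|Y) bits per symbol, even though the X encoder never sees Y.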

Wyner-Ziv Theorem
- In 1976, Wyner and Ziv studied a special case of Slepian-Wolf coding corresponding to the rate point $(R_X, R_Y) = (H(X|Y), H(Y))$
- Deals with source coding of a sequence X when the sequence Y (known as side information) is available at the decoder
- Known as lossy compression with decoder side information
- Source values X are encoded without access to the side information Y
- The decoder has access to Y and obtains a reconstruction of the source values
- Some distortion D is acceptable
- The Wyner-Ziv rate-distortion function $R^{WZ}(D)$ is the achievable lower bound on the bitrate for a distortion D
- Mathematically, $R^{WZ}(D) \ge R_{X|Y}(D)$, where $R_{X|Y}(D)$ is the minimum rate necessary to encode X for an average distortion D when Y is also available at the encoder, i.e. when the statistical dependency between X and Y is used while encoding X
- For no distortion, i.e. D = 0, we get the same result as the Slepian-Wolf Theorem: $R^{WZ}(0) = R_{X|Y}(0) = H(X|Y)$
- The inequality becomes an equality for Gaussian memoryless sources and a mean-squared error distortion function: $R^{WZ}(D) = R_{X|Y}(D)$
- In 1996, Zamir proved that for general statistics and a mean-squared error distortion function, the rate loss is at most 0.5 bits/sample: $R^{WZ}(D) - R_{X|Y}(D) \le 0.5$
- Combining with the Wyner-Ziv Theorem, we have $0 \le R^{WZ}(D) - R_{X|Y}(D) \le 0.5$ bits/sample
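For the Gaussian case, where the Wyner-Ziv rate equals the conditional rate-distortion function, the rate has the standard closed form $R_{X|Y}(D) = \max\{0, \tfrac{1}{2}\log_2(\sigma_X^2 (1-\rho^2)/D)\}$ for jointly Gaussian X and Y with correlation coefficient $\rho$. The sketch below evaluates it next to the rate needed without any side information; the variance, correlation and distortion values are assumed purely for illustration.

```python
import math

def rate_without_side_info(var_x, d):
    """R(D) in bits/sample for a Gaussian source under mean-squared error."""
    return max(0.0, 0.5 * math.log2(var_x / d))

def rate_with_side_info(var_x, rho, d):
    """R_{X|Y}(D) in bits/sample for jointly Gaussian X, Y under mean-squared error.

    For Gaussian sources this is also the Wyner-Ziv rate R^{WZ}(D).
    """
    conditional_var = var_x * (1.0 - rho ** 2)
    return max(0.0, 0.5 * math.log2(conditional_var / d))

# Assumed example values: unit-variance source, strongly correlated side information.
var_x, rho, d = 1.0, 0.9, 0.01
r_plain = rate_without_side_info(var_x, d)
r_wz = rate_with_side_info(var_x, rho, d)
print(f"no side info: {r_plain:.3f} bits/sample, with side info: {r_wz:.3f} bits/sample")
```

The saving is $\tfrac{1}{2}\log_2\frac{1}{1-\rho^2}$ bits per sample at this distortion, so better side information (larger $|\rho|$) directly lowers the rate the encoder has to spend.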

- The term ‘distributed’ refers to the encoding operation mode, not to location
- Two or more dependent sources are coded independently, i.e. a separate, independent encoder is associated with each source
- Each encoder sends an independent bitstream: the signals are encoded without exploiting the correlation between them
- A single decoder jointly decodes all received bitstreams using the statistical dependencies between them

Pixel-domain Codec

Quantization & Dequantization
- The quantizer divides the signal space into cells
- A cell may consist of non-contiguous sub-cells mapped to the same quantizer index Q
- Practical implementations of the Lloyd algorithm for optimal vector quantizers either lack performance or are prohibitively complex
- Unfortunately, code cell contiguity precludes optimality of quantizers in general
- Introducing a rate measure that depends on both the quantization index and the side information divorces the dimensionality of the quantizer from the block length of the Slepian-Wolf coder, a fundamental requirement for practical system design
- At high rates and under certain other conditions, lattice quantizers are optimal for Wyner-Ziv coding
- In that case, disconnected quantization cells need not be mapped into the same index
- Asymptotically, there is no performance loss from not having access to the side information at the encoder

Slepian-Wolf Encoder & Decoder
- Unconventional video coding system
- Encodes individual frames independently, but decodes them conditionally
- Only intra-frame processing is required at the encoder
- Inter-frame processing happens only at the decoder
- Previously decoded frames are used as side information for decoding a Wyner-Ziv coded frame
- Performance is closer to conventional inter-frame coding (MPEG) than to conventional intra-frame coding (Motion-JPEG)
- Encoding may be in the pixel domain or the transform domain
- The Slepian-Wolf codec can be implemented using any of the following:
  - DISCUS (DIstributed Source Coding Using Syndromes)
  - Turbo codes, such as RCPT (Rate-Compatible Punctured Turbo) codes
  - LDPC (Low-Density Parity-Check) codes
  - IRA (Irregular Repeat-Accumulate) codes
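The syndrome idea behind schemes like DISCUS can be illustrated with a toy example. The sketch below is not the DISCUS construction itself: it is a minimal illustration that assumes a (7,4) Hamming code and side information Y that differs from X in at most one of 7 bits. The encoder sends only the 3-bit syndrome of X, and the decoder recovers X from that syndrome and Y.

```python
def syndrome(bits):
    """3-bit syndrome (as an integer 0..7) of a 7-bit word under the Hamming(7,4) code.

    Column i (1-based) of the parity-check matrix is the binary expansion of i,
    so the syndrome is the XOR of the positions that hold a 1.
    """
    s = 0
    for position, bit in enumerate(bits, start=1):
        if bit:
            s ^= position
    return s

def sw_encode(x):
    """Encoder: sees only X and transmits its syndrome (3 bits instead of 7)."""
    return syndrome(x)

def sw_decode(s, y):
    """Decoder: uses side information Y, assumed to differ from X in at most one bit."""
    diff = s ^ syndrome(y)  # equals the syndrome of X xor Y, i.e. the differing position
    x_hat = list(y)
    if diff:
        x_hat[diff - 1] ^= 1
    return x_hat

x = [1, 0, 1, 1, 0, 0, 1]
y = list(x)
y[3] ^= 1  # side information: X with one bit flipped
recovered = sw_decode(sw_encode(x), y)
print(recovered == x)
```

Note that the encoder never looks at Y; the correlation is exploited only at the decoder, which is the defining property of Slepian-Wolf coding.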

Pixel-domain Codec using RCPT

Pixel-domain Codec
- A subset of frames, regularly spaced in the video sequence, is selected as keyframes, K
- Keyframes are encoded and decoded using conventional intraframe coding with an 8x8 Discrete Cosine Transform (DCT)
- Frames between keyframes are called “Wyner-Ziv frames”
- Wyner-Ziv frames are intraframe-encoded but interframe-decoded
- For each Wyner-Ziv frame, S, each pixel value is uniformly quantized with $2^M$ intervals
- Subtractive dithering is applied to avoid contouring and improve the subjective quality of the reconstructed image
- A sufficiently large block of quantizer indices q is provided to the Slepian-Wolf encoder
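The quantization step can be sketched as follows. This is a minimal illustration, assuming 8-bit pixels, $2^M$ uniform intervals, and a dither sequence that encoder and decoder both regenerate from a shared seed; the seed mechanism and function names are assumptions, not details given in the slides. It shows the quantizer in isolation; in the codec, the indices would go through the Slepian-Wolf coder and the side-information-based reconstruction.

```python
import random

def quantize(pixels, m, seed=0):
    """Uniform quantization of 8-bit pixels into 2**m intervals with subtractive dither."""
    step = 256 / (2 ** m)
    rng = random.Random(seed)  # assumed: decoder can regenerate the same dither
    indices = []
    for s in pixels:
        dither = rng.uniform(-step / 2, step / 2)
        q = int((s + dither) // step)
        indices.append(min(max(q, 0), 2 ** m - 1))  # clip to the valid index range
    return indices

def dequantize(indices, m, seed=0):
    """Mid-point reconstruction with the same dither subtracted again."""
    step = 256 / (2 ** m)
    rng = random.Random(seed)
    values = []
    for q in indices:
        dither = rng.uniform(-step / 2, step / 2)
        values.append((q + 0.5) * step - dither)
    return values

pixels = [0, 17, 64, 128, 200, 255]
m = 4  # 16 intervals, step size 16
q = quantize(pixels, m)
rec = dequantize(q, m)
max_error = max(abs(a - b) for a, b in zip(pixels, rec))
print(q, max_error)
```

Away from the range limits the reconstruction error stays within half a step; the clipping at 0 and $2^M - 1$ can raise it to at most one step.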

Pixel-domain Codec using RCPT
- RCPT provides rate flexibility
- The rate adapts to the changing statistics between the side information and the frame to be encoded
- In this system, the rate of the RCPT code is chosen by the decoder and relayed to the encoder through feedback
- For each Wyner-Ziv frame, the decoder generates side information, $\hat{S}$, from previously decoded keyframes and possibly previously decoded Wyner-Ziv frames
- To exploit the side information, the decoder assumes a statistical model of the ‘correlation channel’
- A Laplacian distribution is assumed for the difference between individual pixel values of S and $\hat{S}$
- The decoder estimates the parameter of the Laplacian distribution by observing the statistics of previously decoded frames
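One way to fit the Laplacian correlation model is sketched below. It assumes the density $f(d) = \frac{\alpha}{2} e^{-\alpha |d|}$ for the difference $d = S - \hat{S}$ and uses the maximum-likelihood estimate $\alpha = 1 / \mathrm{mean}(|d|)$ computed on a previously decoded frame; the slides do not say which estimator is used, so this choice and the function names are assumptions.

```python
import math

def estimate_alpha(decoded_frame, side_info):
    """Maximum-likelihood Laplacian parameter from pixel-wise differences."""
    diffs = [abs(s - s_hat) for s, s_hat in zip(decoded_frame, side_info)]
    mean_abs = sum(diffs) / len(diffs)
    return 1.0 / mean_abs if mean_abs > 0 else float("inf")

def laplacian_pdf(d, alpha):
    """Density of the difference d under the assumed correlation-channel model."""
    return 0.5 * alpha * math.exp(-alpha * abs(d))

# Hypothetical pixel values from a previously decoded frame and its side information.
decoded = [10, 20, 30, 40]
side = [12, 19, 30, 45]
alpha = estimate_alpha(decoded, side)
print(alpha, laplacian_pdf(0, alpha))
```

A larger $\alpha$ means the side information is more reliable, so the turbo decoder can trust it more and will tend to need fewer parity bits.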

Pixel-domain Codec using RCPT
- The turbo decoder combines the side information and the received parity bits to recover the symbol stream
- If the decoder cannot reliably decode the original symbols, it requests additional parity bits from the encoder buffer through feedback
- This “request-and-decode” process is repeated until an acceptable probability of symbol reconstruction error is achieved
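The control flow of the request-and-decode process can be sketched as a loop. The turbo decoder itself is out of scope here: `toy_decode` is a hypothetical stand-in whose error estimate simply halves with every parity bit, and the chunk size and error threshold are assumed values.

```python
def request_and_decode(parity_buffer, side_info, try_decode, chunk=4, max_error=1e-3):
    """Request parity bits in chunks until decoding is judged reliable.

    try_decode(received_parity, side_info) must return (symbols, error_probability).
    Returns the decoded symbols and the number of parity bits that were requested.
    """
    received = []
    while True:
        symbols, error_prob = try_decode(received, side_info)
        if error_prob <= max_error:
            return symbols, len(received)
        if len(received) >= len(parity_buffer):
            raise RuntimeError("parity buffer exhausted before reliable decoding")
        # Feedback channel: ask the encoder buffer for the next chunk of parity bits.
        received.extend(parity_buffer[len(received):len(received) + chunk])

def toy_decode(received_parity, side_info):
    """Hypothetical decoder stand-in: the error estimate halves with each parity bit."""
    return list(side_info), 0.5 ** len(received_parity)

parity_buffer = [0, 1] * 16  # 32 stored parity bits (placeholder values)
symbols, bits_sent = request_and_decode(parity_buffer, [3, 1, 2], toy_decode)
print(symbols, bits_sent)
```

Only as many parity bits as the decoder actually needs are transmitted, which is how the rate follows the quality of the side information.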

Pixel-domain Codec using RCPT
- Using the side information, the decoder predicts the quantization bin q
- For this, the decoder needs to request at most M bits to establish which of the $2^M$ bins a pixel belongs to
- With the decoded indices $q'$ and the side information $\hat{S}$, the decoder calculates the MMSE reconstruction $S' = E[S \mid q', \hat{S}]$ of the original frame S
- If the side information is within the reconstructed bin, the reconstructed pixel takes a value close to the side information value
- Otherwise, $\hat{S}$ is outside the bin and the reconstruction function forces $S'$ to lie within the bin
- The magnitude of the reconstruction error is limited to a maximum value determined by the quantizer coarseness. This is perceptually desirable since it eliminates large errors, which might be annoying to the viewer
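The bounding behaviour of the reconstruction can be shown with a simplified stand-in. The sketch below is not the MMSE estimator: it simply keeps the side information value when it falls inside the decoded bin and clamps it to the nearest bin boundary otherwise. The bin layout (bin q covers [q·step, (q+1)·step]) and the function name are assumptions.

```python
def reconstruct(q, s_hat, step):
    """Clamp the side information value into the decoded quantization bin.

    Simplified stand-in for the MMSE reconstruction: the result equals s_hat
    when s_hat lies inside bin q and the nearest bin boundary otherwise.
    """
    low, high = q * step, (q + 1) * step
    return min(max(s_hat, low), high)

step = 16
print(reconstruct(3, 55, step))   # side information inside bin [48, 64]
print(reconstruct(3, 100, step))  # side information above the bin
print(reconstruct(3, 10, step))   # side information below the bin
```

Since both the true pixel and the reconstruction lie in the same bin (assuming the index was decoded correctly), the error can never exceed the bin width, however poor the side information is.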

Pixel-domain Codec using RCPT
- Compared to conventional motion-compensated coding, pixel-domain WZ coding is much less complex
- Motion estimation, prediction and DCT are not required for encoding WZ frames
- The Slepian-Wolf encoder requires only two feedback shift registers and an interleaver

Transform-domain Codec

Transform-domain Encoding using RCPT
- A block-wise DCT is applied to the WZ frame W in the encoder to generate the transformed signal X
- Transform coefficients are grouped together to form coefficient bands $X_k$, where k denotes the coefficient number
- Each transform coefficient band is then encoded independently
- For each $X_k$, the coefficients are quantized using a uniform scalar quantizer with $2^{M_k}$ levels
- The quantized symbols, $q_k$, are converted to fixed-length binary codewords
- Corresponding bit-planes are blocked together to form bit-plane vectors
- Each bit-plane vector is coded by the Slepian-Wolf encoder
- The Slepian-Wolf coder is implemented using RCPT
- RCPT, combined with feedback, provides the rate flexibility that is essential for adapting to changing statistics between the side information and the frame to be encoded
- Parity bits produced by the turbo encoder are stored in a buffer
- The buffer transmits a subset of these parity bits to the decoder on request
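The conversion of quantized symbols into bit-plane vectors, and the regrouping the decoder performs afterwards, can be sketched as below. The symbol values and function names are assumed for illustration; the DCT, quantizer and turbo coder are left out.

```python
def to_bitplanes(symbols, num_bits):
    """Split fixed-length codewords into bit-plane vectors, most significant first."""
    return [[(q >> b) & 1 for q in symbols] for b in range(num_bits - 1, -1, -1)]

def from_bitplanes(planes):
    """Regroup decoded bit-planes (most significant first) into quantized symbols."""
    symbols = [0] * len(planes[0])
    for plane in planes:
        symbols = [(q << 1) | bit for q, bit in zip(symbols, plane)]
    return symbols

# Hypothetical quantized symbols of one coefficient band, with 2**3 = 8 levels.
band = [5, 0, 7, 2]
planes = to_bitplanes(band, 3)
print(planes)
print(from_bitplanes(planes) == band)
```

Each inner list is one bit-plane vector, i.e. one input block for the Slepian-Wolf encoder; sending the most significant plane first lets the decoder refine its estimate plane by plane.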

Transform-domain Decoding using RCPT
- The decoder uses previously reconstructed frames to form the side information $\hat{W}$, an estimate of W
- A block-wise DCT of $\hat{W}$ is taken to generate $\hat{X}$
- Transform coefficients from $\hat{X}$ are grouped together to form coefficient bands $\hat{X}_k$ (the side information corresponding to $X_k$)
- To be able to use $\hat{X}_k$ at the turbo decoder and reconstruction block, a statistical dependency model is assumed between $X_k$ and $\hat{X}_k$
- Given a coefficient band, the turbo decoder successively decodes the bit-planes, starting from the most significant bit-plane
- The decoder uses the received subset of parity bits corresponding to that bit-plane, together with the side information, to decode the current bit-plane
- If the decoder cannot reliably decode the bits, it requests additional parity bits from the encoder buffer through feedback
- This “request-and-decode” process continues until an acceptable probability of reconstruction error is achieved
- Probabilities generated for the current bit-plane are used for decoding the bit-planes of lower significance
- By using the side information and successively decoding bit-planes, the decoder needs to request at most $M_k$ bits to decide which of the $2^{M_k}$ bins a transform coefficient belongs to

Transform-domain Decoding using RCPT
- When all bit-planes are decoded, the bits are regrouped and the quantized symbol stream is reconstructed as $q'_k$
- The reconstructed coefficient band is calculated as $X'_k = E[X_k \mid q'_k, \hat{X}_k]$
- Assuming $q'_k$ is error free, this reconstruction function bounds the magnitude of the reconstruction distortion to a maximum value that depends on the quantizer coarseness
- This property is desirable since it eliminates large positive or negative errors for a given transform coefficient
- Fewer errors are perceptible to the viewer, and the subjective quality of the reconstructed video improves
- Finally, the reconstructed WZ frame is generated by taking the IDCT of the reconstructed coefficient bands

Motion-Compensated Side Information
- Motion-compensated side information is generated at the decoder
- As a result, decoders are more complex than encoders
- Here we consider every odd frame to be a keyframe and every even frame to be a WZ frame
- Frames may or may not be decoded in their actual sequence (similar to conventional video coding techniques)

Motion-compensated Interpolation (MC-I)
- Side information for a WZ frame at time index t is generated by motion-compensated interpolation using the decoded keyframes at times $t-1$ and $t+1$
- Involves symmetrical bi-directional block matching, smoothness constraints on the estimated motion, and overlapped block motion compensation
- Since the next keyframe is needed for interpolation, frames are decoded out of order (similar to B frames in predictive video coding)
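The core of MC-I, symmetric bi-directional block matching, can be sketched in a few lines. This is a deliberately reduced version under stated assumptions: it uses a full search over a small window with a sum-of-absolute-differences cost, breaks ties in favour of the smaller motion vector as a crude substitute for the smoothness constraints, and omits overlapped block motion compensation. Block size, search range and all names are illustrative.

```python
def pixel(frame, y, x):
    """Read a pixel, clamping the coordinates to the frame border."""
    y = min(max(y, 0), len(frame) - 1)
    x = min(max(x, 0), len(frame[0]) - 1)
    return frame[y][x]

def interpolate(prev_key, next_key, block=4, search=2):
    """Side information for time t from the keyframes at t-1 and t+1.

    For each block, find the vector (dy, dx) such that the block at -(dy, dx)
    in the previous keyframe best matches the block at +(dy, dx) in the next
    keyframe, then average the two matched blocks.
    """
    height, width = len(prev_key), len(prev_key[0])
    side_info = [[0.0] * width for _ in range(height)]
    for by in range(0, height, block):
        for bx in range(0, width, block):
            rows = range(by, min(by + block, height))
            cols = range(bx, min(bx + block, width))
            best = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    sad = sum(abs(pixel(prev_key, i - dy, j - dx)
                                  - pixel(next_key, i + dy, j + dx))
                              for i in rows for j in cols)
                    key = (sad, abs(dy) + abs(dx))  # prefer small motion on ties
                    if best is None or key < best[0]:
                        best = (key, dy, dx)
            _, dy, dx = best
            for i in rows:
                for j in cols:
                    side_info[i][j] = 0.5 * (pixel(prev_key, i - dy, j - dx)
                                             + pixel(next_key, i + dy, j + dx))
    return side_info

# Toy frames: a bright 2x2 square moves two pixels to the right between keyframes.
prev_key = [[0] * 8 for _ in range(8)]
next_key = [[0] * 8 for _ in range(8)]
for r in (2, 3):
    for c in (1, 2):
        prev_key[r][c] = 100
    for c in (3, 4):
        next_key[r][c] = 100
side_info = interpolate(prev_key, next_key)
print(side_info[2])
```

In the toy sequence the interpolated frame places the square halfway between its two keyframe positions, which a plain average of the keyframes would not do.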

Motion-compensated Extrapolation (MC-E)
- To generate side information for the WZ frame at time index t, motion is estimated between the previously decoded WZ frame at time $t-2$ and the previously decoded keyframe at time $t-1$, using block matching and a smoothness constraint
- The estimated motion is extrapolated to time t, and the side information is generated by overlapped motion compensation using pixel values from the previous keyframe
- Since a previously decoded WZ frame is used for motion estimation, reconstruction errors from all previously decoded WZ frames can accumulate and degrade the reliability of the motion compensation
- Unlike MC-I, all frames can be decoded sequentially

Low-complexity side information
- Simplified interpolation or extrapolation schemes reduce decoder complexity at the expense of compression efficiency
1. Average Interpolation (Ave-I): side information for the WZ frame is generated by averaging the pixel values of the keyframes at times $t-1$ and $t+1$
2. Previous Frame Extrapolation (Prev-E): the previous keyframe is used directly as side information
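Both low-complexity schemes reduce to a few lines. The sketch below assumes frames stored as lists of rows of pixel values; the function names are illustrative.

```python
def average_interpolation(prev_key, next_key):
    """Ave-I: pixel-wise average of the keyframes at t-1 and t+1."""
    return [[(a + b) / 2 for a, b in zip(row_prev, row_next)]
            for row_prev, row_next in zip(prev_key, next_key)]

def previous_frame_extrapolation(prev_key):
    """Prev-E: the previous keyframe is used directly (copied) as side information."""
    return [row[:] for row in prev_key]

# Hypothetical 2x2 keyframes.
prev_key = [[10, 20], [30, 40]]
next_key = [[20, 20], [50, 0]]
ave_i = average_interpolation(prev_key, next_key)
prev_e = previous_frame_extrapolation(prev_key)
print(ave_i, prev_e)
```

Neither scheme estimates motion, so the side information degrades for moving content and the decoder will typically have to request more parity bits than with MC-I or MC-E.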

B. Girod, A. Aaron, S. Rane, D. Rebollo-Monedero, “Distributed Video Coding” (Invited Paper), Proc. IEEE, Special Issue on Advances in Video Coding and Delivery, 2005
Catarina Isabel Carvalheiro Brites, “Advances on Distributed Video Coding”, MSc. Thesis, Technical University of Lisbon, Institute of Superior Technology
A. Aaron, R. Zhang, B. Girod, “Wyner-Ziv Coding of Motion Video”, Asilomar Conference on Signals and Systems, Pacific Grove, CA, Nov. 2002
A. Aaron, E. Setton, B. Girod, “Towards practical Wyner-Ziv Coding of Video”, Proc. IEEE International Conference on Image Processing, Barcelona, Spain, Sept. 2003
A. Aaron, S. Rane, E. Setton, B. Girod, “Transform-domain Wyner-Ziv Codec for Video”, Proc. SPIE Visual Communications and Image Processing, San Jose, CA, Jan. 2004

Thank You
