Topic for lecture 2: video compression. The ultimate compression task? Color image (300 x 300 x 24 bit): –2.16 Mbit/image x 30 images/s = 64.8 Mbps. Motion picture: 90 min = 64.8 Mbps x 60 x 90 = 349.92 Gbit. 56.6K modem => raw download time (excl. sound and overhead) ~ 1717 hours or ~ 72 days!
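The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch using the slide's own figures; the constant and function names are chosen here for illustration.

```python
# Raw (uncompressed) video size, using the figures from the slide.
WIDTH, HEIGHT, BITS_PER_PIXEL = 300, 300, 24
FRAME_RATE = 30        # images per second
MOVIE_MINUTES = 90
MODEM_BPS = 56_600     # the "56.6K" modem from the slide


def raw_video_stats():
    """Return (bits per image, bits per second, total bits, download hours)."""
    bits_per_image = WIDTH * HEIGHT * BITS_PER_PIXEL     # 2.16 Mbit
    bits_per_second = bits_per_image * FRAME_RATE        # 64.8 Mbps
    total_bits = bits_per_second * 60 * MOVIE_MINUTES    # 349.92 Gbit
    download_hours = total_bits / MODEM_BPS / 3600       # sound and overhead ignored
    return bits_per_image, bits_per_second, total_bits, download_hours


if __name__ == "__main__":
    per_image, per_second, total, hours = raw_video_stats()
    print(per_image, per_second, total, round(hours), round(hours / 24))
```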

Agenda for lecture 2 What makes video compression possible? Implementations of motion compensation –Block matching The YCbCr color representation MPEG

Video compression A sequence of images that needs to be compressed for storage and/or transmission. Audio is ignored, since the image data is much larger than the audio data (images >> audio). Straightforward methods: –Motion JPEG –3D DCT

Temporal redundancy Less than 10% of the pixels change by more than 1% between frames. This is called temporal redundancy or interframe correlation. Temporal redundancy > spatial redundancy. Origin: slow camera and object movements

Motion compensated coding Second generation of temporal compression methods. More efficient (especially with rapid changes) but also more complex: –OK, since the cost of computing power is decreasing faster than the cost of bandwidth. Basic idea: the only difference between two images is the moving objects (draw). Estimate the motion and simply code this information. From the prediction and the initial frame we can encode/decode all other frames

Practical issues Due to noise, camera movements, light changes etc. both the object and the background change => –Calculate the prediction error (difference) and code this. It is very hard to track and describe a general object (contour and texture), so a block of pixels is used as the ’object’ instead. The estimated motion is represented as pure translation: no rotation and scaling –This is justified since we have high frame rates and ’slow’ changes –It is denoted the displacement vector or motion vector

Procedure for motion compensated coding Image sequence => image => blocks of pixels. Step 1: Motion analysis: –Estimate the motion vector of the current block, i.e. the position of the block in the previous image(s). Step 2: Prediction and differentiation –Predict what the block found in the previous image(s) will look like in the current image –Subtract the predicted block from the current block => difference. Step 3: Entropy encoding of the difference and motion vector => encoded difference and motion vector = compressed video. Step 3 we already know
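Step 2 can be sketched in a few lines, assuming the motion vector (dx, dy) from step 1 is already known and the displaced block lies inside the previous frame. Frames are plain lists of rows here, and the function names are hypothetical; entropy coding (step 3) is left out.

```python
def extract_block(frame, top, left, size):
    """Cut a size x size block out of a frame given as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def encode_block(current, previous, top, left, size, dx, dy):
    """Return the difference between the current block and its prediction.

    The prediction is the block of the previous frame displaced by (dx, dy),
    i.e. a 0th-order predictor with pure translation.
    """
    predicted = extract_block(previous, top + dy, left + dx, size)
    actual = extract_block(current, top, left, size)
    return [[a - p for a, p in zip(row_a, row_p)]
            for row_a, row_p in zip(actual, predicted)]


def decode_block(previous, top, left, size, dx, dy, difference):
    """Rebuild the current block from the previous frame, vector and difference."""
    predicted = extract_block(previous, top + dy, left + dx, size)
    return [[p + d for p, d in zip(row_p, row_d)]
            for row_p, row_d in zip(predicted, difference)]
```

Only the motion vector and the (mostly small) difference values need to be passed on to the entropy coder; the decoder recovers the block exactly as long as the difference itself is coded without loss.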

Motion analysis and prediction In general we seek the trajectory of a block so we can predict its current position, e.g. using weights. In practice this is too complicated and instead a 0th-order predictor is applied: –Predicted block(x,y,t) = block(a,b,t-1) –MPEG uses two 0th-order predictors. The only open issue is step 1: how do we find the block in the previous frame that best matches the block in the current frame? Three methods: –Block matching (by far the most applied method) –Pel-recursion (block = 1 pixel) –Optical flow (block = 1 pixel)

Block matching (1) Principle: all pixels in a block are assumed to have the same motion vector. Search window: –Maximum displacement determined from frame rate and context –Usually a square region. Block size p x q: usually p=q => square block. The smaller the block size => the better the prediction, but the more overhead (motion vectors). Usually block size = 16 x 16

Block matching (2) Overlapping blocks improve the reconstructed image quality but increase the bit rate –Usually non-overlapping blocks are applied. Block matching via a similarity measure between the current block u and a candidate block v, accumulated over all pixels in the block: –Sum of squared differences (SSD): S(u,v) = sum of (u-v)^2 –Mean absolute difference (MAD): S(u,v) = mean of |u-v|
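A minimal full-search block matcher might look as follows. It is a sketch, not the method of any particular codec: frames are lists of rows, the motion vector (dx, dy) points from the current block to its best match in the previous frame, candidates outside the frame are skipped, and all names are hypothetical.

```python
def ssd(block_a, block_b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((u - v) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for u, v in zip(row_a, row_b))


def mad(block_a, block_b):
    """Mean absolute difference between two equally sized blocks."""
    count = len(block_a) * len(block_a[0])
    return sum(abs(u - v)
               for row_a, row_b in zip(block_a, block_b)
               for u, v in zip(row_a, row_b)) / count


def get_block(frame, top, left, size):
    return [row[left:left + size] for row in frame[top:top + size]]


def full_search(current, previous, top, left, size, max_disp, cost=ssd):
    """Test every displacement within +/- max_disp; return (cost, dx, dy)."""
    target = get_block(current, top, left, size)
    height, width = len(previous), len(previous[0])
    best = None
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block would fall outside the frame
            c = cost(target, get_block(previous, y, x, size))
            if best is None or c < best[0]:
                best = (c, dx, dy)
    return best
```

With a search range of ±max_disp the full search evaluates up to (2·max_disp + 1)^2 candidate blocks per block, which is the processing load the faster search strategies try to avoid.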

Searching strategies Full search: –Finds the global minimum but requires heavy processing! Assume only one minimum in the search region => a less computationally demanding search strategy can be used. Accept a local minimum => –Larger difference but less processing. Searching strategies with one (local) minimum: –Coarse-fine three-step search –2D logarithmic search –Conjugate direction search –Etc.

Coarse-fine three-step search Step 1) Test 9 points within a fixed pattern Step 2+3) Centre the pattern around the best match and change the distance within the pattern
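The three steps can be sketched as below. The sketch assumes a 3 x 3 pattern whose spacing is halved after each step (4, 2, 1 pixels, giving a range of ±7), which is one common choice rather than something the slide specifies; `cost(dx, dy)` is a hypothetical callable that returns the matching error for a candidate displacement.

```python
def three_step_search(cost, start_step=4):
    """Coarse-to-fine search for the displacement with the lowest cost.

    cost: callable (dx, dy) -> matching error (e.g. SSD of the two blocks).
    start_step: spacing of the 3 x 3 pattern in the first step.
    """
    best_dx, best_dy = 0, 0
    step = start_step
    while step >= 1:
        # 9 test points: the current centre and its 8 neighbours at distance `step`
        candidates = [(best_dx + i * step, best_dy + j * step)
                      for j in (-1, 0, 1) for i in (-1, 0, 1)]
        best_dx, best_dy = min(candidates, key=lambda d: cost(*d))
        step //= 2  # refine: halve the spacing around the new centre
    return best_dx, best_dy
```

With the default spacing this tests at most 27 points (some repeated) instead of the 225 needed by a full search over ±7, at the risk of ending in a local minimum.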

YCbCr color representation

A camera captures color in RGB format (show). We would like a representation where the intensity and color are separated: –So we can transmit and decode both a color and a gray-scale signal –[R,G,B]: [50,50,50] is the same color as [100,100,100], only the intensity differs –HSI (hue-saturation-intensity) –HSI is complex to calculate so we seek a simpler representation. The YUV representation is a simple approximation: –Y = Luminance (intensity) = 0.299 R + 0.587 G + 0.114 B –The non-uniform weighting comes from the HVS (human visual system) –U = B – intensity = ”pure” blue color = 0.492 (B - Y) –V = R – intensity = ”pure” red color = 0.877 (R - Y) –Rough approximation but very simple to compute
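The three YUV formulas translate directly into code. The function name is chosen here for illustration; inputs are assumed to be ordinary 0-255 RGB values.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to YUV using the weights from the slide."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance (intensity)
    u = 0.492 * (b - y)                     # scaled blue-minus-intensity
    v = 0.877 * (r - y)                     # scaled red-minus-intensity
    return y, u, v


if __name__ == "__main__":
    print(rgb_to_yuv(50, 50, 50))     # gray: U and V are (numerically) zero
    print(rgb_to_yuv(100, 100, 100))  # same color, higher intensity
    print(rgb_to_yuv(0, 0, 255))      # pure blue: V becomes negative
```

Gray pixels give U = V = 0, which is why a gray-scale receiver can simply use Y and ignore the other two components.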

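YCbCr color representation (3) The HVS is more sensitive to intensity (Y) than to color (Cb and Cr), so more bits can be used to represent the intensity. Formats (average bits per pixel with 8-bit samples): –4:4:4 (24 bits): one Cb and one Cr sample for every Y sample –4:2:2 (16 bits): Cb and Cr at half the horizontal resolution –4:2:0 (12 bits): Cb and Cr at half the horizontal and half the vertical resolution. [Figure: positions of the Y samples and the Cb/Cr samples for the three formats]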
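The bits-per-pixel figures follow from counting samples in a 2 x 2 group of pixels. The sketch below assumes 8 bits per sample; the table and function name are illustrative.

```python
# (Y, Cb, Cr) samples stored for each 2 x 2 group of pixels
SAMPLES_PER_2X2 = {
    "4:4:4": (4, 4, 4),  # chroma at full resolution
    "4:2:2": (4, 2, 2),  # chroma halved horizontally
    "4:2:0": (4, 1, 1),  # chroma halved horizontally and vertically
}


def average_bits_per_pixel(fmt, bits_per_sample=8):
    """Average number of bits per pixel for a chroma subsampling format."""
    y, cb, cr = SAMPLES_PER_2X2[fmt]
    return (y + cb + cr) * bits_per_sample / 4  # 4 pixels in a 2 x 2 group


if __name__ == "__main__":
    for fmt in SAMPLES_PER_2X2:
        print(fmt, average_bits_per_pixel(fmt))
```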

MPEG MPEG = Moving Picture Experts Group. International standard for compression of video (image, sound, and system info.), driven by the growth of the digital media market (e.g. CD-ROM, DVD). Both transmission and storage. MPEG-1: 1991. MPEG-2: 1994 –MPEG-2 is MPEG-1 compatible, hence only MPEG-2 is used today. MPEG is NOT an algorithm but rather a framework with several algorithms and MANY user settings. –Fixed protocol, hence fixed decoders (the encoder is not specified!) –Asymmetrical codec: encoding vs. decoding effort ~ 100:1 (JPEG ~ 1:1). MPEG compression is lossy

MPEG-1 MPEG-2 is an ”add-on” to MPEG-1. Typical bit rate for MPEG-1 = 1.5 Mbps –Meaning that an MPEG-1 decoder can decode and show real-time video that has been compressed to 1.5 Mbps. MPEG: trade-off between video quality and bandwidth. Allows resolutions up to 4095 x 4095 at 60 Hz –Most used is the CPB (constrained parameters bitstream): fixed resolutions and frame rates => HW implementations. Max. resolution = 768 x 576 at 30 Hz. Max. bit rate = 1.856 Mbps

MPEG-1 compression rate BT.601 (digital TV signal): 704 x 576 x 24 bit x 25 Hz = 243 Mbps. Compression factor: 243 Mbps / 1.5 Mbps = 162 (JPEG = 10-20). YCbCr 4:2:0 format: 12 bits per pixel. Basic operation: down-scale to SIF (source input format) –Fixed resolution => HW solutions –352 x 288 (ignore lines and/or interpolate): 352 x 288 x 12 bit x 25 Hz = 30.4 Mbps => comp. factor = 20, but it can be higher or lower. In general: fewer input data => better image quality (for a fixed bit rate)
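The two compression factors can be reproduced as follows. The sketch assumes a SIF frame of 352 x 288 pixels, which is the size that yields the 30.4 Mbps figure, and a target rate of 1.5 Mbps; the names are illustrative.

```python
TARGET_BPS = 1.5e6  # typical MPEG-1 bit rate


def raw_bit_rate(width, height, bits_per_pixel, frame_rate):
    """Uncompressed bit rate in bits per second."""
    return width * height * bits_per_pixel * frame_rate


bt601_bps = raw_bit_rate(704, 576, 24, 25)  # full-resolution digital TV
sif_bps = raw_bit_rate(352, 288, 12, 25)    # down-scaled, 4:2:0 (assumed size)

if __name__ == "__main__":
    print(bt601_bps / 1e6, bt601_bps / TARGET_BPS)  # about 243 Mbps, factor about 162
    print(sif_bps / 1e6, sif_bps / TARGET_BPS)      # about 30.4 Mbps, factor about 20
```

Most of the factor 162 thus comes from down-scaling and chroma subsampling; the codec itself has to deliver roughly a factor 20.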

MPEG-1 principle (1) Full-motion-compensated DCT and difference coding. Frames: 1,2,3,4,5,6,7,8,9, … –1: intra coded (DCT, as in JPEG) –2,3,4,5,6,7,8,9, …: difference coding –The difference is DCT coded and quantized => lossy compression –Problems? Error propagation and no random access

MPEG-1 principle (2) I-picture: intra-coded –Similar to JPEG. P-picture: predictive coded via forward prediction. B-picture: predictive coded via: –forward, backward, or bi-directional prediction. Errors in I and P are limited to at most one GOP (group of pictures). Errors in B are limited to one picture. High N and M => good coding but error propagation. –Usually: 13<N<16 and 0<M<4 –Recommended: an I-picture every ½ sec. and whenever the scene changes. Coding order vs. visualisation order
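Coding order differs from visualisation order because a B-picture can only be decoded once the I- or P-picture that follows it in display order is available. A minimal sketch of the reordering, assuming each B-picture depends only on its two nearest I/P neighbours (the function name is hypothetical):

```python
def coding_order(display_types):
    """Return frame indices in coding order for a display-order type string.

    display_types: e.g. "IBBPBBPBBI", one letter per picture.
    """
    order = []
    pending_b = []  # B-pictures waiting for their following I/P picture
    for index, picture_type in enumerate(display_types):
        if picture_type == "B":
            pending_b.append(index)
        else:
            order.append(index)      # send the I/P picture first ...
            order.extend(pending_b)  # ... then the B-pictures that needed it
            pending_b = []
    order.extend(pending_b)  # trailing B-pictures with no later I/P picture
    return order


if __name__ == "__main__":
    print(coding_order("IBBPBBPBBI"))
```

For the display sequence I B B P B B P B B I this yields the coding sequence I P B B P B B I B B.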

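Entire sequence [Figure: structure of the entire sequence. Pictures have type I, P or B. In 4:2:0 format a macroblock (MB) covers 16 x 16 Y samples together with one 8 x 8 Cb block and one 8 x 8 Cr block, i.e. 6 blocks of 8 x 8.]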

Coding one Block (8x8) Similar to JPEG except for adaptive quantization –DCT, quantization, zig-zag scan, entropy coding –Adaptive quantization controls the quality/amount of data –Intra vs. inter coding: I-blocks: intra. P- and B-blocks: depending on DIFF, coded as 0, motion vectors, inter, or intra.
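Of the four stages, the zig-zag scan is the easiest to show in isolation. The sketch below generates the JPEG-style scan order for an n x n block and applies it to a block given as a list of rows; the function names are illustrative, and the DCT, quantization and entropy coding stages are not shown.

```python
def zigzag_indices(n=8):
    """Return (row, col) pairs of an n x n block in zig-zag scan order."""
    def key(position):
        row, col = position
        diagonal = row + col
        # odd diagonals run top-right to bottom-left, even ones the other way
        return (diagonal, row if diagonal % 2 else col)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def zigzag_scan(block):
    """Flatten a square block (list of rows) in zig-zag order."""
    return [block[r][c] for r, c in zigzag_indices(len(block))]
```

The scan visits low-frequency coefficients first, so the many zero-valued high-frequency coefficients left after quantization end up in long runs at the tail, which suits the entropy coder.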

Coding one Block (8x8) [Figure: block diagrams of encoding and decoding]

What to remember Video compression is done by removing the temporal redundancy Principle: (at block level) –Step 1: Motion analysis => motion vector –Step 2: Calculate the error/difference (subtraction) –Step 3: Entropy encoding of motion vector and difference Motion analysis: –Pel-recursion –Optical flow –Block matching (the currently applied method) Block matching –Block of pixels (16 x 16) –Similarity measure –Search region –Different search strategies to avoid the full search

What to remember Video compression is done by removing the temporal redundancy. Principle (at (macro)block level): –Step 1: Motion analysis (block matching) => motion vector –Step 2: Calculate the error/difference (subtraction) –Step 3: ’JPEG’-coding (DCT, quantization and entropy encoding). MPEG-1: –Bit rate ~ 1.5 Mbps –Asymmetrical codec ~ 100:1 (JPEG ~ 1:1) –Compression rate ~ 20 –Coding style: I B B P B B P B B I. Questions? Presentations: email me tbm@cvmt.dk The end

Pel-recursion (1) The block consists of only one pixel (= pel). Problem formulation: –Displaced frame difference function: –DFD(x,y,dx,dy) = i(x,y,t) – i(x-dx,y-dy,t-1) –Find the (dx,dy) which minimises DFD^2 => most similar pixel => best displacement vector. Solution: –Set the partial derivatives to 0 –Non-linear programming problem, solved with an iterative algorithm: the steepest descent method, Newton-Raphson’s method, or others
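For the steepest descent option, the iteration can be written out from the definition of DFD. This is a sketch: the step size ε and the iteration index k are symbols introduced here, and the constant factor 2 from differentiating DFD^2 is absorbed into ε. The update for d_y is analogous, with ∂i/∂y in place of ∂i/∂x.

```latex
\begin{aligned}
\mathrm{DFD}(x,y,d_x,d_y) &= i(x,y,t) - i(x-d_x,\,y-d_y,\,t-1) \\
\frac{\partial\,\mathrm{DFD}^2}{\partial d_x}
  &= 2\,\mathrm{DFD}\cdot\frac{\partial i}{\partial x}(x-d_x,\,y-d_y,\,t-1) \\
d_x^{(k+1)} &= d_x^{(k)} - \varepsilon\,\mathrm{DFD}\bigl(x,y,d_x^{(k)},d_y^{(k)}\bigr)\cdot
  \frac{\partial i}{\partial x}\bigl(x-d_x^{(k)},\,y-d_y^{(k)},\,t-1\bigr)
\end{aligned}
```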

Pel-recursion (2) Algorithm: Find the motion vector (dx,dy) for the first pixel The motion vectors are correlated => –Use ’old’ (dx,dy) as initial guess for the iterative algorithm => recursion

Optical flow The block consists of only one pixel Similar to Pel-recursive but calculated in a different manner

Comparing the 3 types of motion analysis The three: pel-recursion, optical flow and block matching. Optical flow and pel-recursion calculate one motion vector for each pixel => –More precise => predicted block and current block are more similar => smaller difference => more compact coding of the difference –More overhead as more motion vectors are to be coded –More complex to calculate –Pixel methods avoid the block artefacts of block matching. Block matching is (at present) more suitable –Used in all coding standards

Temporal methods Two methods which exploit both the spatial and temporal redundancies: –Frame replenishment –Motion compensation. Both utilise prediction => short summary

Frame replenishment (1) Exploit the temporal redundancy First generation of temporal compression method If: value changed significantly: | i(x,y,t) – i(x,y,t-1) | > TH Then: code value and position: i(x,y,t) x,y Else: code nothing => re-use i(x,y,t-1) Enhancements: –Send differences instead of values –Remove noise from the images prior to processing
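The if/then/else rule maps onto a short encoder and decoder. The sketch below uses gray-scale frames stored as lists of rows and sends the new value rather than the difference; the function names are hypothetical.

```python
def replenish(previous, current, threshold):
    """Return [(x, y, value)] for every pixel that changed by more than TH."""
    updates = []
    for y, (row_prev, row_cur) in enumerate(zip(previous, current)):
        for x, (old, new) in enumerate(zip(row_prev, row_cur)):
            if abs(new - old) > threshold:
                updates.append((x, y, new))  # code value and position
            # else: code nothing, the decoder re-uses i(x, y, t-1)
    return updates


def reconstruct(previous, updates):
    """Decoder side: copy the previous frame and apply the coded updates."""
    frame = [row[:] for row in previous]
    for x, y, value in updates:
        frame[y][x] = value
    return frame
```

Changes at or below the threshold are never transmitted, so small errors remain in the reconstructed frame; raising TH to hold a fixed bit rate makes these errors larger.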

Frame replenishment (2) A fixed bit rate of 1Mbps means that the decoder can only decode and play-back real-time video compressed to 1Mbps Many changes between two images => many pixels to be coded. To achieve the same bit rate => TH is higher => only large changes are coded => poorer reconstruction aka. the dirty window effect

2D logarithmic search Test 5 points within a fixed pattern Centre the pattern around the best match When best match is in the centre or on the border: reduce distance in pattern
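A sketch of this search, under some assumptions the slide does not spell out: the 5 points form a plus-shaped pattern, the distance is halved when it is reduced, points outside a ±max_disp window are skipped, and the search stops when the centre is best at distance 1. `cost(dx, dy)` is a hypothetical callable returning the matching error.

```python
def log_search_2d(cost, max_disp=7, start_step=4):
    """2D logarithmic search for the displacement with the lowest cost."""
    cx, cy = 0, 0
    step = start_step
    while True:
        # centre plus its 4 neighbours at distance `step`, inside the window
        pattern = [(cx, cy), (cx - step, cy), (cx + step, cy),
                   (cx, cy - step), (cx, cy + step)]
        pattern = [(x, y) for x, y in pattern
                   if abs(x) <= max_disp and abs(y) <= max_disp]
        bx, by = min(pattern, key=lambda d: cost(*d))
        moved = (bx, by) != (cx, cy)
        on_border = abs(bx) == max_disp or abs(by) == max_disp
        cx, cy = bx, by
        if not moved and step == 1:
            return cx, cy          # centre is best at the finest distance
        if (not moved or on_border) and step > 1:
            step //= 2             # reduce the distance in the pattern
```

Every move strictly lowers the cost and the distance can only shrink, so the loop terminates; like the other fast strategies it may stop in a local minimum.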

Conjugate direction search Step 1: Test 3 horizontal points next to each other. Step 2: Move to the minimum point. Continue steps 1 and 2 until a minimum is found. Then repeat the process in the vertical direction

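YCbCr color representation (2) The YUV representation can have negative values, so it is scaled and shifted to avoid this => the YCbCr representation. Cb and Cr are denoted the chrominances. YCbCr is the representation utilised in image/video compression.

\[
\begin{bmatrix} Y \\ U \\ V \end{bmatrix} =
\begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.147 & -0.289 & 0.436 \\ 0.615 & -0.515 & -0.100 \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\]

\[
\begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix} =
\begin{bmatrix} 0.257 & 0.504 & 0.098 \\ -0.148 & -0.291 & 0.439 \\ 0.439 & -0.368 & -0.071 \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix} +
\begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}
\]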
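The YCbCr matrix and offset can be applied per pixel as in the sketch below. It assumes 0-255 RGB input and an offset of 128 for both chrominances; results are left as floats (a real codec would round and clip), and the function name is illustrative.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one RGB pixel to YCbCr with the coefficients from the slide."""
    y = 0.257 * r + 0.504 * g + 0.098 * b + 16
    cb = -0.148 * r - 0.291 * g + 0.439 * b + 128
    cr = 0.439 * r - 0.368 * g - 0.071 * b + 128
    return y, cb, cr


if __name__ == "__main__":
    print(rgb_to_ycbcr(0, 0, 0))        # black
    print(rgb_to_ycbcr(255, 255, 255))  # white
    print(rgb_to_ycbcr(0, 0, 255))      # pure blue
```

Black maps to (16, 128, 128) and white to roughly (235, 128, 128), so no component goes negative for valid RGB input.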

Audio in MPEG-1 16-bit samples at: 16, 22.05, 24, 32, 44.1 and 48 kHz. Stereo at 44.1 kHz = 1.4 Mbps. Compression based on psycho-acoustic redundancy. Three methods: –Layer 1: Target rate = 384 Kbps –Layer 2: Target rate = 256 Kbps –Layer 3: Target rate = 128 Kbps. Layer 3 is the most advanced and often applied –It has a nickname, which?
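The 1.4 Mbps figure and the resulting compression factors can be checked as follows, reading the sampling rate as 44.1 kHz and taking the layer target rates from the slide. The names are illustrative.

```python
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2  # stereo

raw_audio_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS  # about 1.4 Mbps

LAYER_TARGET_BPS = {1: 384_000, 2: 256_000, 3: 128_000}


def audio_compression_factor(layer):
    """Ratio between the raw stereo rate and a layer's target rate."""
    return raw_audio_bps / LAYER_TARGET_BPS[layer]


if __name__ == "__main__":
    print(raw_audio_bps)
    for layer in (1, 2, 3):
        print(layer, round(audio_compression_factor(layer), 1))
```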

MPEG-2 Defined in 1994 Developed for DTV but has lots of other applications Based on MPEG-1 (backward compatible) Bit rates: 1.5Mbps – 60Mbps. Target: 2-15Mbps (best: 4) Lots of new features including: –Support for fields, support for 4:4:4 and 4:2:2 –Alternative zig-zag scan, better motion vectors –Scalability to allow any subset of a stream to be decoded and visualised, etc. MPEG-3: Purpose: HDTV –Merged with MPEG-2 => no MPEG-3 standard

MPEG-4 For both real video and synthetic video. Efficient coding at very low bit rates. Content based coding: code the objects –Shape, texture and sprite (background objects). Interactivity. [Figure: popular coding standards]
