ELE 488 Fall 2006 Image Processing and Transmission (11-28-06)
Digital Video
- Motion Pictures
- Broadcast Television
Two Images http://www.tomography.manchester.ac.uk/whattom.shtml
Motion Picture, Television, Digital Video
- Broadcast Television (analog)
  - why invent new technology? movie at home, mass market
  - influence of movies on development
  - Key steps: convert pictures to electric signal, send electric signal, convert electric signal to picture
  - Comparison with motion picture
- High Definition Television - analog, digital, compression
- Video telephone - analog predecessor
- Video conference - travel cost, people cost
- Cable (narrowcast), satellite, interactive, ...
NTSC (National Television System Committee)
- 525 lines
  - Two dots separated by less than 1/2000 of their distance from the eye are not resolved (they merge into one)
  - Assume viewing at a distance of 4 times the screen height
  - No need to have more than 500 lines
  - NTSC set 525 lines (475 active)
- Movies in 1940 had a 4:3 aspect ratio (width to height)
- 25 or more pictures per second to see continuous motion
- 50 or more pictures per second to avoid flicker
  - movies use 24 frames/sec, each shown twice
  - television uses 30 frames/sec with 2:1 interlace (60 fields/sec, alternating even and odd lines)
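The 500-line figure follows directly from the two numbers on this slide. A minimal sketch of the arithmetic, where the function and parameter names are illustrative assumptions and only the values 1/2000 and 4 come from the slide:

```python
def resolvable_lines(distance_in_heights=4.0, acuity_fraction=1.0 / 2000.0):
    """Number of separable lines per picture height.

    distance_in_heights: viewing distance in units of screen height (slide: 4)
    acuity_fraction: smallest separable spacing as a fraction of the
                     viewing distance (slide: 1/2000)
    """
    # Smallest separable spacing, measured in units of screen height.
    min_spacing = distance_in_heights * acuity_fraction
    return round(1.0 / min_spacing)


print(resolvable_lines())  # 500
```

Sitting twice as far away (8 screen heights) halves the useful line count under the same assumption.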
Bandwidth of Broadcast Television
- Without interlace (progressive scan), 60 frames/sec
  - 500 lines alternating black and white give 250 full cycles per picture height
  - each horizontal line has 250 x 4/3 ≈ 333 full cycles (taken as ~350)
  - 60 (frames/sec) x 500 (lines) x 350 ≈ 10 MHz (video ONLY)
- With 2:1 interlace, 5 MHz for video
- FCC assigns 6 MHz per broadcast channel
  - real usable bandwidth is less, MUCH less
  - actual resolvable lines per vertical height ~250
- Color insertion - must be compatible with B/W receivers
  - Change R-G-B to Y-Cb-Cr
  - Y is luminance (brightness), Cb and Cr are chrominances
  - B/W sets convert Y to a picture; color sets convert Y-Cb-Cr to R-G-B, then display
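The bandwidth estimate can be checked in a few lines. This is only a sketch of the slide's back-of-the-envelope argument: the function name is an assumption, and it uses the unrounded 250 x 4/3 cycles per line, whereas the slide rounds that figure up to about 350.

```python
def video_bandwidth_hz(frames_per_sec, lines, aspect=4.0 / 3.0):
    """Highest video frequency for a worst-case alternating black/white pattern."""
    # lines/2 full cycles vertically; scale by the aspect ratio for equal
    # resolution along each horizontal line.
    cycles_per_line = (lines / 2.0) * aspect
    return frames_per_sec * lines * cycles_per_line


progressive = video_bandwidth_hz(60, 500)  # 60 full frames per second
interlaced = video_bandwidth_hz(30, 500)   # 2:1 interlace: 30 full frames per second
print(progressive / 1e6, interlaced / 1e6)  # roughly 10 and 5 (MHz)
```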
Digital Video
- What drives digital video?
  - Information technology: electronics, communication, storage, new functionality, …
  - HDTV
- R-G-B component video
  - 640 x 480 (pixels) x 3 (colors) x 8 (bits/color) x 30 (frames/sec) = 221 Mb/sec
- Y-Cb-Cr with subsampled Cb and Cr
  - 640 x 480 (pixels) x 1.5 (colors) x 8 (bits/color) x 30 (frames/sec) = 110 Mb/sec
- Compression - MPEG (Moving Picture Experts Group)
  - MPEG-1: CD-ROM, 1.5 Mb/sec total, 1.2 Mb/sec for video, 352x240 (SIF), progressive scan, motion compensation
  - MPEG-2: extension of MPEG-1, interlace, HD
  - MPEG-4: object/region based
  - H.2xx
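The raw bit rates above are simple products. The sketch below reproduces them; the last line is an added illustration (not on the slide) of how much compression the 1.2 Mb/sec MPEG-1 video rate implies for 352x240 video with subsampled chrominance. The function name is an assumption.

```python
def raw_bitrate(width, height, samples_per_pixel, bits_per_sample, fps):
    """Uncompressed video bit rate in bits per second."""
    return width * height * samples_per_pixel * bits_per_sample * fps


rgb = raw_bitrate(640, 480, 3, 8, 30)        # 221,184,000 bits/sec
ycbcr = raw_bitrate(640, 480, 1.5, 8, 30)    # 110,592,000 bits/sec
mpeg1_raw = raw_bitrate(352, 240, 1.5, 8, 30)

print(rgb / 1e6, ycbcr / 1e6)
# Compression ratio needed to fit the MPEG-1 source into 1.2 Mb/sec.
print(round(mpeg1_raw / 1.2e6, 1))  # 25.3
```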
Video Coding
- Video consists of frames $I_n(i,j)$
- Code each frame as a still picture: motion JPEG
- Each frame is close to the previous frame
  - Code the difference $FD_n(i,j) = I_n(i,j) - I_{n-1}(i,j)$
  - Differential coding (DPCM, predictive coding): $I_{n-1}(i,j)$ is the predicted value of $I_n(i,j)$
  - Need to code the first frame
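A minimal sketch of frame differencing on toy 3x4 frames stored as nested lists. The helper names are assumptions, and no quantization or entropy coding is modeled; the point is only that the difference is mostly zero and that the decoder recovers the frame by adding it back to the previous frame.

```python
def frame_difference(cur, prev):
    """FD_n(i,j) = I_n(i,j) - I_{n-1}(i,j), computed pixel by pixel."""
    return [[c - p for c, p in zip(cur_row, prev_row)]
            for cur_row, prev_row in zip(cur, prev)]


def reconstruct(prev, diff):
    """Decoder side: I_n = I_{n-1} + FD_n."""
    return [[p + d for p, d in zip(prev_row, diff_row)]
            for prev_row, diff_row in zip(prev, diff)]


# A bright 2-pixel object moves one pixel to the right between frames.
prev = [[10, 10, 10, 10],
        [10, 50, 50, 10],
        [10, 10, 10, 10]]
cur = [[10, 10, 10, 10],
       [10, 10, 50, 50],
       [10, 10, 10, 10]]

fd = frame_difference(cur, prev)
print(fd[1])  # [0, -40, 0, 40]
```

Only the pixels at the edges of the moving object are nonzero, which is what makes the difference cheaper to code than the frame itself.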
Encoding Three Frame Types
- Differential encoding of video
- I: Intra frame, coded by itself
- P: Prediction frame, coded by referring to the previous I or P frame
- B: Bi-directional frame, coded by referring to both the preceding AND the following I or P frames
Coding of I-frame – same as still image
I Frames
- I frames are intra-coded in the same way as a JPEG still image
- Coded I frames can be decoded without reference to other frames of the video
- Sometimes called anchor frames
- A group of pictures (GOP) begins with an I-frame and ends before the next I-frame
  - A typical GOP length is 15 frames, with only 1 I-frame per GOP (the first frame)
- [Figure: Frame 31, I frame: JPEG]
Coding P Frames
- Each frame is close to the previous frame
- Code the frame difference (differential coding, DPCM)
  - current frame $I_n$, frame difference $I_n - I_{n-1}$
- Occlusion
  - parts of the current frame are hidden in the previous frame
  - need a future frame to "predict" them: $FD_n(i,j) = I_n(i,j) - I_{n+1}(i,j)$
Coding of P Frames
- Video consists of frames $I_n(i,j)$
- Code each frame as a still picture: motion JPEG
- Each frame is close to the previous frame
  - Code the difference $FD_n(i,j) = I_n(i,j) - I_{n-1}(i,j)$
  - Differential coding (predictive coding): $I_{n-1}(i,j)$ is the predicted value of $I_n(i,j)$
- Observe: most of the frame is unchanged, except for moving objects
- Motion Compensated Coding: MPEG
Motion Compensated Video Coding
- [Figure: previous frame, current frame]
- Observe: most of the picture remains unchanged, but some objects have moved
- So code the Displaced Frame Difference: Motion Compensated Coding
Displaced Frame Difference
P Frames
- Divide the P-frame into macro-blocks (MB), ~16x16
- P frames are coded using two methods:
  - block motion compensation + error coding
  - JPEG (intra-coded), without referring to previous frames
- P frames are also anchor frames
- [Figure: Frame 31, I frame: JPEG. Frame 34, P frame: motion compensated; macro-blocks and macro-block motion vectors are indicated]
Finding Motion Vectors
- [Figure: anchor frame, current frame]
- Match a block from the current frame with a displaced block in the reference frame using:
  (a) sum of squared differences (SSD), or
  (b) sum of absolute differences (SAD) (almost always used)
- The displacement giving the best match is the motion vector of the block
- Search methods:
  - Global search over the entire anchor frame
  - Restricted search over a local neighborhood
  - Fast search over a selected neighborhood
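The restricted search with the SAD criterion can be sketched as follows. This is an illustrative implementation, not the code of any particular encoder: the function names, the nested-list frame layout, and the sign convention (the motion vector points from the block's position in the current frame to its match in the reference frame) are all assumptions.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, r):
    """Exhaustive search over displacements in [-r, r] x [-r, r].
    Returns (best_sad, dx, dy)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            # Skip candidates that fall outside the reference frame.
            inside = (0 <= by + dy and by + dy + n <= height and
                      0 <= bx + dx and bx + dx + n <= width)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Toy data: the current frame is the reference frame shifted by (2, 1).
ref = [[x * x + 17 * y for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
print(full_search(cur, ref, bx=2, by=2, n=2, r=2))  # (0, 2, 1)
```

Replacing `abs(...)` with a squared difference gives the SSD criterion; SAD avoids the multiplications.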
Illustration: P-frame Macro-Blocks Frame 34 P-frame
MPEG: I and P frames (anchor frames)
Block Matching Motion Estimation
- [Figure: current frame and previous frame, divided into blocks of size MxN; the block for which the motion vector is to be determined is compared with the previous frame at one position, then at another position]
Motion Compensated Encoding of P Frame
- [Figure labels: current frame, Y, previous frame]
Coding of P frame
- Encoder contains decoder
  - prediction uses the reconstructed previous frame, the same frame the decoder will have
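Why the encoder contains a decoder can be seen on a toy example. The sketch below tracks a single pixel over six frames with a coarse quantizer (step 4, an assumed value). In the closed-loop case the residual is taken against what the decoder has actually reconstructed; in the open-loop case it is taken against the original previous frame. All names here are illustrative.

```python
import math


def quantize(value, step):
    """Uniform quantizer: round to the nearest multiple of step."""
    return step * math.floor(value / step + 0.5)


def decoder_output(frames, step, closed_loop):
    """Values the decoder reconstructs for a sequence of pixel values."""
    decoded = 0         # what the decoder currently holds
    original_prev = 0   # previous original value (used only in open loop)
    out = []
    for f in frames:
        reference = decoded if closed_loop else original_prev
        residual = quantize(f - reference, step)  # what gets transmitted
        decoded += residual                       # decoder adds residual
        original_prev = f
        out.append(decoded)
    return out


frames = [3, 6, 9, 12, 15, 18]
print(decoder_output(frames, 4, closed_loop=True))   # [4, 8, 8, 12, 16, 20]
print(decoder_output(frames, 4, closed_loop=False))  # [4, 8, 12, 16, 20, 24]
```

With the closed loop the error stays within half a quantizer step; without it the quantization errors accumulate from frame to frame.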
More Detail
Need for Bi-directional Encoding P
Bidirectional Encoding
Frame Transmit Order vs Viewing Order
- View order
- Decode order = transmit order
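Because a B frame needs the anchor frame that follows it, that anchor must be transmitted first. A minimal sketch of the reordering, assuming a typical I B B P B B P pattern (the pattern and the function name are illustrative, not taken from the slide):

```python
def transmit_order(view_order):
    """Reorder frames so each B frame follows both of its anchor frames.

    view_order: sequence of frame types ('I', 'P' or 'B') in display order.
    Returns a list of (display_index, frame_type) in transmit/decode order.
    """
    out = []
    pending_b = []
    for idx, ftype in enumerate(view_order):
        if ftype == "B":
            pending_b.append((idx, ftype))  # wait for the next anchor
        else:
            out.append((idx, ftype))        # send the anchor first
            out.extend(pending_b)           # then the B frames it closes
            pending_b = []
    # Trailing B frames would really wait for the next GOP's anchor.
    out.extend(pending_b)
    return out


order = transmit_order("IBBPBBP")
print([i for i, _ in order])  # [0, 3, 1, 2, 6, 4, 5]
```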
B-frames B-frames are coded in the same way as P-frames except that for each macro-block, search for the best matching block in both the preceding and succeeding anchor frames. Use the encoding that requires the fewest bits. Called bidirectional encoding.
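The per-macro-block choice can be sketched as below. Two assumptions are made for illustration: the best-matching blocks from the preceding and succeeding anchor frames have already been found by block matching, and SAD is used as a stand-in for "fewest bits", since counting bits would require the full transform and entropy coder.

```python
def block_sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y)
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


def choose_b_prediction(cur_block, preceding_match, succeeding_match):
    """Pick the anchor whose best match predicts the current block better."""
    costs = {
        "preceding": block_sad(cur_block, preceding_match),
        "succeeding": block_sad(cur_block, succeeding_match),
    }
    mode = min(costs, key=costs.get)
    return mode, costs[mode]


# The block is occluded in the preceding anchor but visible in the next one.
cur_block = [[5, 5], [5, 5]]
preceding_match = [[0, 0], [0, 0]]
succeeding_match = [[5, 5], [5, 6]]
print(choose_b_prediction(cur_block, preceding_match, succeeding_match))
# ('succeeding', 1)
```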
Complexity of Exhaustive Block-Matching
- Assumptions
  - Block size $N \times N$ and image size $S = M_1 \times M_2$
  - Search step size is 1 pixel ("integer-pixel accuracy")
  - Search range $\pm R$ pixels both horizontally and vertically
- Computational complexity
  - # candidate matching blocks = $(2R+1)^2$
  - # operations for computing MAD for one block ~ $O(N^2)$
  - # operations for MV estimation per block ~ $O((2R+1)^2 N^2)$
  - # blocks = $S / N^2$
  - Total # operations for entire frame ~ $O((2R+1)^2 S)$, i.e., overall computation load is independent of block size!
- E.g., $M_1 = M_2 = 512$, $N = 16$, $R = 16$, 30 fps => on the order of $8.56 \times 10^9$ operations per second!
  - Was difficult for real-time estimation, but possible with parallel hardware
  - smaller blocks will allow better exploitation of parallelism
- UMCP ENEE408G Slides (created by M.Wu & R.Liu © 2002)
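The operation count can be reproduced directly, counting one operation per pixel comparison. The function name is an assumption; the numbers are the ones in the example above.

```python
def ops_per_frame(m1, m2, n, r):
    """Pixel comparisons for exhaustive block matching over one frame."""
    candidates = (2 * r + 1) ** 2         # positions tried per block
    ops_per_block = candidates * n * n    # one comparison per pixel per position
    num_blocks = (m1 * m2) // (n * n)
    return ops_per_block * num_blocks


per_frame = ops_per_frame(512, 512, 16, 16)
print(per_frame * 30)  # 8564244480, i.e. about 8.56e9 per second at 30 fps

# The block size cancels out: 8x8 blocks give the same total.
print(ops_per_frame(512, 512, 8, 16) == per_frame)  # True
```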
Exhaustive Search: Pros and Cons
- Pros
  - Guaranteed optimality within the search range and motion model
- Cons
  - Motion vectors are integer valued
  - High computational complexity: on the order of [search-range-size * image-size] for 1-pixel step size
- How to improve accuracy?
  - Half pixel: significant improvement
  - Quarter pixel: some improvement
  - Requires interpolation
- How to improve speed?
  - Fast search: try to exclude unlikely candidates
- UMCP ENEE408G Slides (created by M.Wu & R.Liu © 2002)
Half pixel resolution in matching
- [Figure: pixel grid with labelled positions b, c, d, p, B]
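Half-pixel matching needs values between the integer pixel positions. One common choice, assumed here because the slide does not name the interpolation filter, is bilinear averaging of the nearest integer-position pixels. Coordinates are given in half-pixel units so they stay integers.

```python
def half_pel_value(frame, x2, y2):
    """Sample frame at (x2/2, y2/2) using bilinear averaging.

    x2, y2 are coordinates in half-pixel units: even values land on an
    original pixel, odd values land halfway between two pixels.
    """
    x0, y0 = x2 // 2, y2 // 2
    x1 = x0 + (x2 % 2)   # right neighbour, or the same pixel if x2 is even
    y1 = y0 + (y2 % 2)   # lower neighbour, or the same pixel if y2 is even
    return (frame[y0][x0] + frame[y0][x1] +
            frame[y1][x0] + frame[y1][x1]) / 4.0


frame = [[0, 4],
         [8, 12]]
print(half_pel_value(frame, 0, 0))  # 0.0  (original pixel)
print(half_pel_value(frame, 1, 0))  # 2.0  (between 0 and 4)
print(half_pel_value(frame, 1, 1))  # 6.0  (centre of all four)
```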
Fast Algorithm: 3-Step Search
- Search candidates at 9 positions
- Reduce step-size after each iteration
  - Start with step size approx. half of max. search range
- Total number of computations:
  - 3-step: 9 + 8 x 2 = 25
  - full search: $(2R+1)^2 = 169$ (R = 6)
- Question: what implicit assumptions about the frame images are needed for 3-step search to give a good approximation to the optimal match within the search range?
- [Figure: search positions over dx, dy; resulting motion vector {dx, dy} = {1, 6}] (Fig. from Ken Lam, HK Poly Univ. short course in summer 2001)
- UMCP ENEE408G Slides (created by M.Wu & R.Liu © 2002)
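A sketch of the 3-step search for a search range of R = 6. The step sizes (3, 2, 1) and the abstract `cost(dx, dy)` callback are assumptions made for illustration; in practice the cost would be the SAD of the block at that displacement. The toy cost below is unimodal with its minimum at (1, 6), which is the kind of well-behaved error surface the method relies on.

```python
def three_step_search(cost, steps=(3, 2, 1)):
    """Coarse-to-fine search around the current best position.

    cost(dx, dy) returns the matching error for a displacement.
    Returns (best_dx, best_dy, number_of_cost_evaluations).
    """
    cx, cy = 0, 0
    best = cost(0, 0)
    evaluations = 1
    for step in steps:
        best_here = (best, cx, cy)
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                if dx == 0 and dy == 0:
                    continue  # centre was evaluated in the previous round
                c = cost(cx + dx, cy + dy)
                evaluations += 1
                if c < best_here[0]:
                    best_here = (c, cx + dx, cy + dy)
        best, cx, cy = best_here  # re-centre on the best candidate
    return cx, cy, evaluations


def toy_cost(dx, dy):
    """Hypothetical error surface with a single minimum at (1, 6)."""
    return abs(dx - 1) + abs(dy - 6)


print(three_step_search(toy_cost))  # (1, 6, 25)
```

The 25 evaluations match the count 9 + 8 x 2: nine positions in the first step and eight new ones in each of the next two.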
Hierarchical Block Matching
- Problem with fast search at full resolution
  - Small mis-alignment may give large displacement error, esp. for texture and edge blocks
- Hierarchical (multi-resolution) block matching
  - Match at coarse resolution to narrow down the search range
  - Match at high resolution to refine the motion estimate
- [Figure: lowest resolution, lower resolution, original resolution] (From Wang's Preprint Fig. 6.19)
- UMCP ENEE408G Slides (created by M.Wu & R.Liu © 2002)
Pixel Decimation
- [Figure: a block in current frame; part of a block in reference frame]
- IEEE Trans. on Video Technology, April 1993, pp. 148-157.
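The idea of pixel decimation is to compute the matching error from only a subset of the pixels in each block. The sketch below uses a regular grid (every second pixel in each direction), which is an assumed pattern for illustration; the cited paper's sampling patterns may differ.

```python
def decimated_sad(block_a, block_b, stride=2):
    """SAD computed on every stride-th pixel in each direction.

    stride=1 is the ordinary SAD; stride=2 uses one quarter of the pixels.
    """
    total = 0
    for y in range(0, len(block_a), stride):
        for x in range(0, len(block_a[0]), stride):
            total += abs(block_a[y][x] - block_b[y][x])
    return total


block_a = [[0] * 4 for _ in range(4)]
block_b = [[1] * 4 for _ in range(4)]
print(decimated_sad(block_a, block_b, stride=1))  # 16 (all 16 pixels)
print(decimated_sad(block_a, block_b, stride=2))  # 4  (4 of 16 pixels)
```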
Pixel Decimation
Subsampled Motion Field
What else can you do with MPEG video?
- The MPEG encoder-decoder is asymmetric
  - Encoder is much more complex than the decoder
  - Determining motion vectors is a major task
  - Decoding is easy and fast
  - The encoding only has to be done once; the decoding will be done many times or at many locations
  - Symmetric application?
- Compression loses information, but compressed video has information not readily available in the original video