Interframe Coding
Heejune AHN
Embedded Communications Laboratory
Seoul National Univ. of Technology
Fall 2008
Last updated
Heejune AHN: Image and Video Compression, p. 2
Agenda
  Interframe Coding Concept
  Block Matching Algorithm
  Fast Block Matching Algorithms
  Block Matching Algorithm Variations
  Enhanced Motion Models
  Implementation Cases
Interframe Coding: Motivation
  Video has high temporal correlation between successive frames:
    Var[ X(t+1) - X(t) ] << Var[ X(t+1) ]
  Figure: two successive video frames and their DFD (Displaced Frame Difference)
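The variance inequality above can be checked numerically on any pair of frames. The following is a minimal sketch, assuming 8-bit luminance frames stored as flat arrays; the function names `variance_u8` and `variance_diff` are illustrative and not from the slides.

```c
#include <stddef.h>

/* Variance of n 8-bit samples, computed as E[x^2] - (E[x])^2. n must be > 0. */
double variance_u8(const unsigned char *x, size_t n)
{
    double sum = 0.0, sum_sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
        sum_sq += (double)x[i] * x[i];
    }
    double mean = sum / (double)n;
    return sum_sq / (double)n - mean * mean;
}

/* Variance of the pixel-wise frame difference cur[i] - prev[i]. n must be > 0. */
double variance_diff(const unsigned char *cur, const unsigned char *prev, size_t n)
{
    double sum = 0.0, sum_sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = (double)cur[i] - (double)prev[i];
        sum += d;
        sum_sq += d * d;
    }
    double mean = sum / (double)n;
    return sum_sq / (double)n - mean * mean;
}
```

For typical video, `variance_diff(cur, prev, n)` should come out much smaller than `variance_u8(cur, n)`, which is the motivation for coding the difference rather than the frame itself.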
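Motion Estimation and Compensation
  Motion estimation (ME)
    Find the motion parameters that best predict the current frame from the reference frames
  Motion compensation (MC)
    Encoder: subtract the predicted values from the current frame to form the DFD frame
    Decoder: add the predicted values to the DFD frame to reconstruct the frame
  Figure: encoder (reference frames and current frame into ME and MC; motion parameters and residual texture info are encoded; reconstruction feeds the reference frames) and decoder (reference frames into MC, then reconstruction)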
Performance Criteria
  Coding performance: the residual signal should have low energy (measured by its variance)
  Complexity: computational and implementation complexity
  Storage and delay: number of frames that must be held
  Side information: size and complexity of the motion parameters
  Error resilience: behavior when data is partially lost
  Some factors trade off: coding performance against complexity, storage, side information, and error resilience
2D Motion
  Figure: a stationary background and a moving object, shown in the previous frame and shifted in the current frame (time t; axes x, y)
  Prediction for the luminance signal S(x,y,t) within the moving object uses the "displacement vector" between the two object positions
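One common way to write this prediction, assuming the object moved by a displacement vector (d_x, d_y) during one frame interval \Delta t (these symbols are assumptions and do not appear on the slide):

```latex
\hat{S}(x, y, t) = S(x - d_x,\; y - d_y,\; t - \Delta t)
```

That is, each pixel inside the moving object is predicted from the pixel at the displaced position in the previous frame.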
Block Matching Algorithm
BMA (Block Matching Algorithm)
  Segment the frame into equal-size rectangular blocks
  One 2-D linear (translational) motion vector (mv_x, mv_y) per block
  Figure: real motion versus the block MV between X(t) and X(t+1)
Difference Measure
  MSE (Mean Squared Error)
  MAE (Mean Absolute Error) and SAE (Sum of Absolute Errors)
  CCF (Cross Correlation Function)
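The usual textbook definitions of these measures are sketched below, assuming an N x N current block C and a candidate (displaced) reference block R with pixels indexed by i, j; the symbols are assumptions, and the CCF is shown in its normalized form.

```latex
\begin{aligned}
\mathrm{MSE} &= \frac{1}{N^2}\sum_{i=0}^{N-1}\sum_{j=0}^{N-1}\left(C_{ij}-R_{ij}\right)^2 \\
\mathrm{MAE} &= \frac{1}{N^2}\sum_{i=0}^{N-1}\sum_{j=0}^{N-1}\left|C_{ij}-R_{ij}\right| \\
\mathrm{SAE} &= \sum_{i=0}^{N-1}\sum_{j=0}^{N-1}\left|C_{ij}-R_{ij}\right| = N^2\cdot\mathrm{MAE} \\
\mathrm{CCF} &= \frac{\sum_{i,j} C_{ij}R_{ij}}{\sqrt{\sum_{i,j} C_{ij}^2}\,\sqrt{\sum_{i,j} R_{ij}^2}}
\end{aligned}
```

MSE, MAE and SAE are minimized over the candidate positions, whereas the CCF is maximized.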
Full Search Algorithm
  "Full search" does not mean searching the whole frame, but every position within a limited search window
  Method: visit the positions in raster order or spiral order (Figure 6.6)
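Full Search Complexity
  (2w+1) x (2w+1) search points (for search window [-w, w])
  One N x N block-difference computation per point

    /* Assumes N, width and uchar are defined elsewhere; abs() is from <stdlib.h>. */
    /* SAE between the N x N block at g and the block at f displaced by (mvx, mvy). */
    int SAE(const uchar *f, const uchar *g, int mvx, int mvy) {
        int x, y, sae = 0;
        for (y = 0; y < N; y++)
            for (x = 0; x < N; x++)
                sae += abs(f[(y + mvy) * width + (x + mvx)] - g[y * width + x]);
        return sae;
    }

    /* Full search: pre and cur point at the co-located block in each frame. */
    mvx_min = mvy_min = 0;
    min = SAE(pre, cur, 0, 0);
    for (mvy = -w; mvy <= w; mvy++)
        for (mvx = -w; mvx <= w; mvx++) {
            sae = SAE(pre, cur, mvx, mvy);
            if (sae < min) { mvx_min = mvx; mvy_min = mvy; min = sae; }
        }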
Fast BMAs
Complexity Reduction Approaches
  Reduce the number of test points
    Monotonic variation assumption: the closer to the optimal point, the smaller the difference
    Change the test-point order (most likely positions first)
    Binary search rather than linear search
    Benefit from early stop of the block-difference calculation
  Reduce the computation at each point
    Use sub-sampled values
  Note the trade-off!
  Figure: search axis from -w through 0 to +w
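The "early stop" idea can be sketched as follows: once the partial SAE of a candidate already reaches the best SAE found so far, the candidate cannot win and the remaining rows can be skipped. This is a minimal sketch; the function name `sae_early_stop` and its parameters are hypothetical, and checking once per row (rather than per pixel) is a design assumption.

```c
#include <stdlib.h>

/* SAE of an n x n block with early termination.
   ref and cur point at the top-left pixel of the two blocks; stride is the
   frame width in pixels. Returns the full SAE, or a partial sum that is
   already >= best_so_far if the candidate was abandoned early. */
int sae_early_stop(const unsigned char *ref, const unsigned char *cur,
                   int stride, int n, int best_so_far)
{
    int sae = 0;
    for (int y = 0; y < n; y++) {
        for (int x = 0; x < n; x++)
            sae += abs(ref[y * stride + x] - cur[y * stride + x]);
        if (sae >= best_so_far)   /* cannot beat the current best: stop here */
            return sae;
    }
    return sae;
}
```

The saving grows when good candidates are tested first, which is why early stop is usually combined with a changed test-point order.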
TSS (Three-Step Search)
  Step 0: search the center (0,0); set n = w
  Step 1: n = floor(n / 2)
  Step 2: search the 8 points at distance n around the center, find the minimum, and move the center there
  Step 3: if n == 1, stop; otherwise go to Step 1
Properties
  Logarithmic/binary search (only 3 steps when w = 8)
  Search distance decreases: w/2 => w/4 => w/8 ... until 1
  Complexity: O(log2 w)
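The steps above can be sketched in C as follows. To keep the example short, the block difference is abstracted behind a cost callback; `cost_fn`, `mv_result`, `tss_search`, `toy_ctx` and `toy_cost` are all hypothetical names, and the quadratic toy cost merely stands in for a real SAE so that the search path is easy to follow.

```c
/* Hypothetical callback: returns the block difference for motion vector (mvx, mvy). */
typedef int (*cost_fn)(int mvx, int mvy, void *ctx);

typedef struct { int mvx, mvy, cost; } mv_result;

/* Three-step search over the window [-w, w]. */
mv_result tss_search(cost_fn cost, void *ctx, int w)
{
    mv_result best = { 0, 0, 0 };
    best.cost = cost(0, 0, ctx);                 /* Step 0: search the center */
    for (int n = w / 2; n >= 1; n /= 2) {        /* Steps 1 and 3: halve n until it reaches 1 */
        mv_result center = best;
        for (int dy = -n; dy <= n; dy += n)      /* Step 2: the 8 points around the center */
            for (int dx = -n; dx <= n; dx += n) {
                if (dx == 0 && dy == 0)
                    continue;
                int c = cost(center.mvx + dx, center.mvy + dy, ctx);
                if (c < best.cost) {
                    best.mvx = center.mvx + dx;
                    best.mvy = center.mvy + dy;
                    best.cost = c;
                }
            }
    }
    return best;
}

/* Toy cost for demonstration only: squared distance to a known true vector.
   It also counts how many positions were evaluated. */
typedef struct { int tx, ty, evals; } toy_ctx;

int toy_cost(int mvx, int mvy, void *ctx)
{
    toy_ctx *t = (toy_ctx *)ctx;
    int ex = mvx - t->tx, ey = mvy - t->ty;
    t->evals++;
    return ex * ex + ey * ey;
}
```

With `toy_ctx t = { 3, -2, 0 }` and `tss_search(toy_cost, &t, 8)`, the search moves from (0,0) to (4,-4), then (2,-2), then (3,-2). A real encoder would pass a callback that computes the SAE for the current block instead of the toy cost.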
2D Logarithmic Search
  Step 0: search the center (0,0)
  Step 1: search the 4 points at step size S around the center
  Step 2: find the minimum; if it is at the center, set S = S/2; otherwise move the center to the minimum location
  Step 3: if S == 1, go to Step 4; otherwise go to Step 1
  Step 4: search the 8 neighbors and choose the minimum
Properties
  Similar to TSS, but more accurate
  Complexity ~ O(log2 w), but the loop count is not fixed
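A minimal C sketch of these steps follows. The slide does not give the initial step size, so S = w/2 is an assumption here, as is skipping points outside [-w, w]; the names `cost_fn`, `mv_result`, `tdl_search`, `toy_ctx` and `toy_cost` are hypothetical, and positions may be re-evaluated because no cost cache is kept.

```c
/* Hypothetical callback: returns the block difference for motion vector (mvx, mvy). */
typedef int (*cost_fn)(int mvx, int mvy, void *ctx);

typedef struct { int mvx, mvy, cost; } mv_result;

static int in_window(int x, int y, int w)
{
    return x >= -w && x <= w && y >= -w && y <= w;
}

/* 2D logarithmic search over [-w, w]; initial step w/2 is an assumption. */
mv_result tdl_search(cost_fn cost, void *ctx, int w)
{
    static const int cross[4][2] = { { 0, -1 }, { -1, 0 }, { 1, 0 }, { 0, 1 } };
    mv_result best = { 0, 0, 0 };
    best.cost = cost(0, 0, ctx);                     /* Step 0 */
    int s = w / 2;
    while (s > 1) {                                  /* Step 3: loop until S == 1 */
        mv_result center = best;
        for (int i = 0; i < 4; i++) {                /* Step 1: 4 points at distance s */
            int x = center.mvx + cross[i][0] * s;
            int y = center.mvy + cross[i][1] * s;
            if (!in_window(x, y, w))
                continue;
            int c = cost(x, y, ctx);
            if (c < best.cost) { best.mvx = x; best.mvy = y; best.cost = c; }
        }
        if (best.mvx == center.mvx && best.mvy == center.mvy)
            s /= 2;                                  /* Step 2: minimum at center, halve S */
    }
    mv_result center = best;                         /* Step 4: 8 neighbors at distance 1 */
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++) {
            int x = center.mvx + dx, y = center.mvy + dy;
            if ((dx == 0 && dy == 0) || !in_window(x, y, w))
                continue;
            int c = cost(x, y, ctx);
            if (c < best.cost) { best.mvx = x; best.mvy = y; best.cost = c; }
        }
    return best;
}

/* Toy cost for demonstration only: squared distance to a known true vector. */
typedef struct { int tx, ty; } toy_ctx;

int toy_cost(int mvx, int mvy, void *ctx)
{
    toy_ctx *t = (toy_ctx *)ctx;
    int ex = mvx - t->tx, ey = mvy - t->ty;
    return ex * ex + ey * ey;
}
```

Because the center only moves when a strictly better point is found, the loop count depends on the error surface, which matches the "not fixed loop count" property.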
Examples
  TSS (Three Step Search)
  Logarithmic Search
  Cross Search
  One-at-a-time Search
  Nearest Neighbors Search
  From other sources:
    TSS (Three Step Search)
    TDL (Two-Dimensional Logarithmic search)
    CDS (Conjugate Direction Search)
    CSA (Cross Search Algorithm)
    OSA (Orthogonal Search Algorithm)
Fast BMA Performance: Complexity
  Maximum number of search points
  Algorithm   Formula          w=4   w=8   w=16
  FSM         (2w + 1)^2        81   289   1089
  TDL         2 + 7 log2 w      16    23     30
  TSS         1 + 8 log2 w      17    25     33
  MMEA        1 + 6 log2 w      13    19     25
  CDS         3 + 2w            11    19     35
  OSA         1 + 4 log2 w       9    13     17
  CSA         5 + 4 log2 w      13    17     21
Estimation Performance
  Columns: Algorithm; Split screen: entropy (bits/pel), standard deviation; Trevor white: entropy (bits/pel), standard deviation
  Rows: FSM, TDL, TSS, MMEA, CDS, OSA, CSA
Issues in Fast MC Algorithms
  Local minimum error
    Fast algorithms evaluate only a few of the positions
    In many cases the error surface is not "monotonic" with a single extremum
    The search can therefore end at a local minimum. See figure.
Hierarchical MC
  Reduced images: filtered and sub-sampled
    N levels, each with half the resolution of the level below
  Search the top level (N) fully
    Reduced search window range: w / 2^(N-1)
  Search each of the lower N-1 levels at only 9 positions (the center and its 8 neighbors)
Benefits of Hierarchical Search
  Escapes local minima
  Complexity reduction, e.g. search window w = 32:
    Full search: (2 x 32 + 1)^2 = 4225 search points
    HBMA with N = 4: (2 x 4 + 1)^2 + 3 x 9 = 108 search points
  Figure: sub-sampled signal versus original signal
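The search-point arithmetic can be written as a small helper. This is a sketch under the counting model used on the slides (full search at the top level with range w / 2^(N-1), then 9 positions at each of the N-1 lower levels); the function names are illustrative.

```c
/* Number of search points for a full search over [-w, w] in both directions. */
long full_search_points(int w)
{
    long side = 2L * w + 1;
    return side * side;
}

/* Number of search points for hierarchical block matching with `levels` levels:
   a full search at the top level over the reduced range w / 2^(levels-1),
   then 9 positions at each of the remaining levels. Requires levels >= 1. */
long hbma_points(int w, int levels)
{
    int top_w = w >> (levels - 1);
    return full_search_points(top_w) + 9L * (levels - 1);
}
```

For example, `full_search_points(32)` gives 4225 and `hbma_points(32, 4)` gives 108. The count ignores the cost of building the sub-sampled images and the fact that blocks at coarser levels contain fewer pixels.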
Variations of BMA: Multi-frame MC
Multiple Frame MC
  "Forward prediction" starts from H.261
  "Backward" and "bidirectional" prediction start from MPEG-1
  "Multiple reference" (each MB chooses its own reference picture) starts from H.264
  Figure: forward, backward, and bidirectional (average) prediction
Variations of BMA: Multi-frame MC
Multiple Frame Distance
  Search range = frame difference x window, since displacement = velocity x time
  e.g. w = 8: 64 points (1 frame difference), 256 points (2 frame difference)
  In practice, search only [-w, w] around (mvx1, mvy1) to find (mvx2, mvy2)
  Figure: frames t, t-1 and t-2 with window [-w, +w] at t-1 and [-2w, +2w] at t-2; vectors (mvx1, mvy1) and (mvx2, mvy2)
MV at Boundary
  Restriction on the MV range
    The referenced block must lie inside the reference picture
    In H.261, MPEG-1, MPEG-2, MPEG-4
  Unrestricted MV
    The reference picture is extrapolated (extended with the same boundary pixel value)
    In H.263 Annex D and H.264
  Figure: window [-w, +w] at the boundary of frame t-1, without and with extrapolation
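Extending the picture with the boundary pixel value does not require a larger buffer: clamping the coordinates on each access gives the same result. A minimal sketch, assuming an 8-bit reference frame stored row by row; the function names are hypothetical.

```c
/* Clamp v into the range [lo, hi]. */
static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Read a reference pixel for an unrestricted motion vector.
   Coordinates outside the picture return the nearest boundary pixel,
   which is equivalent to extrapolating the picture with its edge values. */
unsigned char ref_pixel_unrestricted(const unsigned char *ref,
                                     int width, int height, int x, int y)
{
    x = clamp_int(x, 0, width - 1);
    y = clamp_int(y, 0, height - 1);
    return ref[y * width + x];
}
```

Real codecs often pad the reference frame once instead, trading memory for the removal of the per-pixel clamp.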
Sub-pixel Motion Estimation
  Note
    Objects do not necessarily move by an integer number of pixels
    We have only integer-pixel samples
  Sub-pixel estimation
    Compute the fractional-pel values in the reference frame
    Normally by linear interpolation
    Half-pel / quarter-pel accuracy
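Half-pel values obtained by linear interpolation can be sketched as below. The coordinates are given in half-pel units; the function name and the round-half-up convention are assumptions, and individual standards define their own rounding rules.

```c
/* Sample the reference frame at a half-pel position.
   (x2, y2) are non-negative coordinates in half-pel units, so the sample
   position is (x2 / 2, y2 / 2). The caller must make sure that the right
   and lower neighbors exist whenever x2 or y2 is odd. */
unsigned char half_pel_sample(const unsigned char *ref, int width, int x2, int y2)
{
    int x = x2 >> 1, y = y2 >> 1;     /* integer-pel part */
    int fx = x2 & 1, fy = y2 & 1;     /* 1 if the position lies halfway to the next pixel */
    const unsigned char *p = ref + y * width + x;

    int a = p[0];                     /* top-left integer pixel */
    int b = p[fx];                    /* right neighbor, or a again when fx == 0 */
    int c = p[fy * width];            /* lower neighbor, or a again when fy == 0 */
    int d = p[fy * width + fx];       /* diagonal neighbor, or a duplicate */

    /* Average of the four (possibly duplicated) samples, rounded half up. */
    return (unsigned char)((a + b + c + d + 2) >> 2);
}
```

At an integer position all four samples coincide and the pixel is returned unchanged; at a horizontal or vertical half position the result is the rounded mean of two pixels, and at a diagonal half position the rounded mean of four. Quarter-pel values can be derived the same way from the half-pel grid, although standards such as H.264 use longer filters for the half-pel step.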
Enhanced Motion Models
More Motion Estimation Models
  Rigid 2D translation (BMA)
  + Transformation
  Global motion
  + Illumination variation
  + Zoom-in/out
  Object model
  + Overlapping of objects
  + 3D rotation
  + Non-rigid objects (deformation)
  Some come from the computer vision area
  At present most of these tools are too complex for application to video coding
  Some are included in MPEG-4 Part 2's object-oriented coding
Examples
  Region-based motion compensation
    How to obtain and describe the shape and motion
  Global motion (picture warping)
    Also called camera motion
  Mesh-based deformation
Implementation
  Video encoder and decoder complexity profiling
SW Optimization
  Algorithm-level optimization: independent of the CPU
  Data structure design (for most modern CPUs, RISC)
    Memory cache optimization: keep the current blocks in cache
    Loop unrolling (see Fig. 6.21): reduces pointer operations and jumps, which helps pipelining
  CPU-specific optimization
    SIMD (Single Instruction, Multiple Data): packed instructions (see Fig. 6.22), e.g. TI DSP, Intel MMX
    MIMD (Multiple Instruction, Multiple Data): parallel processing cores
    VLIW (Very Long Instruction Word) of TI DSP
    GPU
  DMA utilization
  Coprocessor utilization: DCT, ME, post/pre-processing
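As an illustration of loop unrolling applied to the block difference, the sketch below computes the SAE of one 16-pixel row with the inner loop unrolled by four. The function name and the unroll factor are assumptions for illustration; modern compilers often perform this transformation themselves at higher optimization levels.

```c
#include <stdlib.h>

/* SAE of one 16-pixel row, unrolled by 4: the loop counter is updated and
   tested 4 times instead of 16, so fewer jumps reach the pipeline. */
int row_sae_16(const unsigned char *a, const unsigned char *b)
{
    int sae = 0;
    for (int x = 0; x < 16; x += 4) {
        sae += abs(a[x]     - b[x]);
        sae += abs(a[x + 1] - b[x + 1]);
        sae += abs(a[x + 2] - b[x + 2]);
        sae += abs(a[x + 3] - b[x + 3]);
    }
    return sae;
}
```

The same four absolute differences are what a packed SIMD instruction would compute in a single operation, which is why unrolled code is a natural starting point for SIMD optimization.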
HW Optimization
  Criteria: performance, cycle count, gate count, data flow
  Example #1: Full search
    Parallelization: M function blocks give an M-fold speed-up
    Figure: search window memory (DRAM/SRAM) and current MB (SRAM) feeding the SAE units and a comparator
Example #2: Fast Search
  TSS and hierarchical search (have a fixed clock count)
  Pipeline the step blocks for speed-up
  Figure: search window memory (DRAM/SRAM) and current MB (SRAM) feeding Step 1 (+/-4), Step 2 (+/-2), Step 3 (+/-1), Step 4 (+/-1/2)
  Pipeline timing:
           Step 1    Step 2    Step 3    Step 4
    t = 1  block 1
    t = 2  block 2   block 1
    t = 3  block 3   block 2   block 1
    t = 4  block 4   block 3   block 2   block 1