Mobile Motion Tracking using Onboard Camera
Lam Man Kit (CEG), Wong Yuk Man (CEG)
Outline: Introduction, Methods, Results, Future Work, Q&A
Introduction: Camera phones can perform image-processing tasks on the device itself, so the camera can be used as an additional means of user input. Symbian OS makes programming on mobile phones possible.
Introduction: Real-world interaction with camera phones. User movement acts as an input source for the mobile phone: gesture input, a “virtual mouse”, and a new input method for interactive games.
Real-World Interaction
Motion Estimation: Motion estimation is the process of finding the motion vectors of the current frame with respect to one or more reference frames. Two common approaches are optical flow and block matching. The block-matching algorithm is an integral part of most motion-compensated video coding standards, e.g. MPEG-1, MPEG-2 and H.263.
Block Matching: Divide the previous frame into small rectangular blocks. For a selected block, find the best match in the current frame, then calculate the motion vector (MV) between the block in the previous frame and its counterpart in the current frame. Typical block size: 16×16 pixels. Search range W: typically 16 or 32 pixels. Similarity measures: Mean Absolute Error (MAE), Mean Square Error (MSE) and Sum of Absolute Differences (SAD); SAD is used in our project.
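A minimal sketch of the three similarity measures, written in Python purely for illustration (the project itself targets Symbian OS). It assumes grayscale frames stored as lists of rows, indexed `frame[row][col]`; the helper names `block`, `sad`, `mae` and `mse` are hypothetical.

```python
def block(frame, x, y, size):
    """Return the size x size block whose top-left corner is (x, y)."""
    return [row[x:x + size] for row in frame[y:y + size]]


def sad(a, b):
    """Sum of Absolute Differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def mae(a, b):
    """Mean Absolute Error: the SAD divided by the number of pixels."""
    return sad(a, b) / (len(a) * len(a[0]))


def mse(a, b):
    """Mean Square Error: the mean of the squared pixel differences."""
    total = sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return total / (len(a) * len(a[0]))


frame = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(block(frame, 1, 1, 2))               # [[5, 6], [8, 9]]
a, b = [[10, 20], [30, 40]], [[12, 18], [30, 45]]
print(sad(a, b), mae(a, b), mse(a, b))     # 9 2.25 8.25
```

SAD avoids both the division and the multiplications, which is one reason it suits a phone CPU.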
Block Matching: Search window (in the current frame): a rectangle with the same center as the block in the previous frame, extended by W pixels in both directions. Figure labels: block of (2BW+1) × (2BH+1) pixels, search window of (2W+1) × (2H+1); example with BW = 1, BH = 1, W = 4, H = 4.
Block-Matching Motion Estimation: Two kinds of methods are commonly used: fast search algorithms (2-D Logarithmic Search, Three-Step Search (TSS), Diamond Search) and the Exhaustive Search Algorithm (ESA).
Fast Algorithms: Fast motion estimation algorithms include 2-D Logarithmic Search, Three-Step Search (TSS) and Diamond Search. Assumption: the matching error increases monotonically as the search position moves away from the optimal motion vector.
Fast Algorithms - TSS: Three-Step Search (TSS). 1st step: search the central point and its 8 surrounding points at a distance of w/2 pixels, and find the best match. 2nd step: use the previous best match as the new center and repeat the 1st step with a distance of w/4 pixels. 3rd step: repeat the 1st step with a distance of w/8 pixels. Only 25 points are searched (9 + 8 + 8). Figure: center of the block and the search window, with the points examined in steps 1, 2 and 3.
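The three steps can be sketched as below. This is an illustrative Python version written against a generic `cost(dx, dy)` callback (a hypothetical design choice, so SAD or any other measure can be plugged in). It assumes w = 8, so the step distances are 4, 2 and 1, and it caches evaluated positions so the count matches the 25 points above. The toy error surface is convex, which is exactly the monotonicity assumption TSS relies on.

```python
def three_step_search(cost, w=8):
    """Three-Step Search over integer displacements (dx, dy) around (0, 0).

    cost(dx, dy) returns the matching error (e.g. SAD) of the candidate at that
    displacement. Returns (best_dx, best_dy, distinct_positions_evaluated).
    """
    cache = {}

    def evaluate(dx, dy):
        # Each position is evaluated once; later steps reuse the cached centre.
        if (dx, dy) not in cache:
            cache[(dx, dy)] = cost(dx, dy)
        return cache[(dx, dy)]

    cx, cy = 0, 0
    step = w // 2                      # distances w/2, w/4, w/8 (4, 2, 1 for w = 8)
    while step >= 1:
        best, best_cost = (cx, cy), evaluate(cx, cy)
        for oy in (-step, 0, step):    # the centre and its 8 surrounding points
            for ox in (-step, 0, step):
                c = evaluate(cx + ox, cy + oy)
                if c < best_cost:
                    best, best_cost = (cx + ox, cy + oy), c
        cx, cy = best                  # next step is centred on the best match
        step //= 2
    return cx, cy, len(cache)


print(three_step_search(lambda dx, dy: (dx - 3) ** 2 + (dy + 2) ** 2))  # (3, -2, 25)
```

On a real frame the callback would compute the SAD between the reference block and the candidate at (dx, dy), returning `float("inf")` for candidates outside the frame; on such non-convex surfaces the result may be only a local minimum.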
Fast Algorithms: Advantage: extremely fast. Disadvantages: all fast algorithms rely heavily on the matching error increasing monotonically around the location of the optimal motion vector. Only a limited number of positions inside the search window is examined (only 25 points for TSS), so they may find only a suboptimal solution and easily fall into a local minimum.
Full Search: All candidates within the search window are examined, i.e. (2w+1)^2 positions. Advantage: good accuracy; it always finds the best match. Disadvantage: a large amount of computation, (2w+1)^2 matches with a 16×16 MSE for each match, which is impractical for real-time applications. To avoid this complexity, the number of search positions must be reduced, which motivates fast block-matching algorithms.
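A direct sketch of exhaustive search using SAD, assuming the same hypothetical frame layout (`frame[row][col]`, block given by its top-left corner and size). The synthetic `pattern` texture and the displacement (+2, +1) exist only for this demo.

```python
def full_search(prev, cur, x, y, size, w):
    """Exhaustive block matching with SAD.

    Returns ((dx, dy), best_sad, positions_examined) for the size x size block
    at (x, y) in prev, searching displacements -w..w in cur.
    """
    ref = [row[x:x + size] for row in prev[y:y + size]]
    height, width = len(cur), len(cur[0])
    best, best_sad, examined = None, float("inf"), 0
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            cx, cy = x + dx, y + dy
            if cx < 0 or cy < 0 or cx + size > width or cy + size > height:
                continue  # the candidate block would leave the frame
            examined += 1
            s = sum(abs(ref[j][i] - cur[cy + j][cx + i])
                    for j in range(size) for i in range(size))
            if s < best_sad:
                best, best_sad = (dx, dy), s
    return best, best_sad, examined


def pattern(c, r):
    # Synthetic texture used only for this demo.
    return (37 * c + 11 * r * r + 5 * c * r) % 256


prev = [[pattern(c, r) for c in range(20)] for r in range(20)]
cur = [[pattern(c - 2, r - 1) for c in range(20)] for r in range(20)]  # content moved by (+2, +1)
print(full_search(prev, cur, 8, 8, 4, 4))  # ((2, 1), 0, 81)
```

With w = 4 and the block well inside the frame, all (2w+1)^2 = 81 positions are examined.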
Fast Exhaustive Block Matching Algorithms: Much faster, with no loss in matching quality. Idea: exclude many search positions while still finding the best match. Examples: SEA (Successive Elimination Algorithm) and PNSA (Progressive Norm Successive Algorithm).
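SEA Algorithm: The SAD of two blocks X and Y is defined as
$$\mathrm{SAD}(X,Y) = \sum_{i,j} \lvert X(i,j) - Y(i,j) \rvert .$$
By the Minkowski inequality,
$$\sum_{i,j} \lvert X(i,j) - Y(i,j) \rvert \;\ge\; \Bigl\lvert \sum_{i,j} X(i,j) - \sum_{i,j} Y(i,j) \Bigr\rvert .$$
Thus, denoting the block-sum difference by D,
$$D = \Bigl\lvert \sum_{i,j} X(i,j) - \sum_{i,j} Y(i,j) \Bigr\rvert \;\le\; \mathrm{SAD}(X,Y).$$
By calculating the (fast) block-sum difference D first, we can eliminate many candidate blocks, namely those with D ≥ the current minimum SAD, before doing the (slow) SAD computation. About 2 times faster than exhaustive search.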
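A sketch of SEA layered on the exhaustive search, under the same hypothetical frame layout. D is tested first and the SAD is computed only when D is below the current minimum; since D never exceeds the SAD, no better candidate can be skipped and the result is identical to full search. Block sums are computed naively here for clarity; the names `sea_search` and `pattern` are hypothetical.

```python
def sea_search(prev, cur, x, y, size, w):
    """Full search with successive elimination.

    Returns ((dx, dy), best_sad, number_of_sad_computations).
    """
    ref = [row[x:x + size] for row in prev[y:y + size]]
    ref_sum = sum(map(sum, ref))
    height, width = len(cur), len(cur[0])
    best, best_sad, sad_calls = None, float("inf"), 0
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            cx, cy = x + dx, y + dy
            if cx < 0 or cy < 0 or cx + size > width or cy + size > height:
                continue  # the candidate block would leave the frame
            cand_sum = sum(cur[cy + j][cx + i] for j in range(size) for i in range(size))
            if abs(cand_sum - ref_sum) >= best_sad:
                continue  # D >= current minimum, hence SAD >= minimum: eliminated
            sad_calls += 1  # only surviving candidates pay for the slow SAD
            s = sum(abs(ref[j][i] - cur[cy + j][cx + i])
                    for j in range(size) for i in range(size))
            if s < best_sad:
                best, best_sad = (dx, dy), s
    return best, best_sad, sad_calls


def pattern(c, r):
    # Synthetic texture used only for this demo.
    return (37 * c + 11 * r * r + 5 * c * r) % 256


prev = [[pattern(c, r) for c in range(20)] for r in range(20)]
cur = [[pattern(c - 2, r - 1) for c in range(20)] for r in range(20)]  # content moved by (+2, +1)
best, best_sad, calls = sea_search(prev, cur, 8, 8, 4, 4)
print(best, best_sad, calls)
```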
Exhaustive Block-Matching Algorithm: Figure: all (2W+1)^2 candidate blocks in the search range pass through a tree of pruning decisions (SEA, then PNSA, then SAD); each candidate that survives every level updates the smallest SAD. Probability of eliminating an invalid candidate block: SEA < PNSA < SAD. Computation load: SEA < PNSA < SAD.
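The pruning tree can be sketched as a cascade of progressively tighter lower bounds on SAD, cheapest first. The slides do not define PNSA's norm, so this sketch assumes a hypothetical row-partitioned norm, the sum over rows of |row-sum(X) - row-sum(Y)|, which by the triangle inequality lies between D and the SAD. All function names are illustrative.

```python
def sad(x_blk, y_blk):
    """Full SAD (the slowest, exact level)."""
    return sum(abs(p - q) for rx, ry in zip(x_blk, y_blk) for p, q in zip(rx, ry))


def sea_bound(x_blk, y_blk):
    """Level 1: block-sum difference D, the cheapest lower bound on SAD."""
    return abs(sum(map(sum, x_blk)) - sum(map(sum, y_blk)))


def row_norm_bound(x_blk, y_blk):
    """Level 2 (hypothetical PNSA stand-in): D <= this <= SAD."""
    return sum(abs(sum(rx) - sum(ry)) for rx, ry in zip(x_blk, y_blk))


def cascade(candidates, feature):
    """Return (best label, best SAD, how many candidates stopped at each level)."""
    stats = {"sea": 0, "norm": 0, "sad": 0}
    best_label, best_sad = None, float("inf")
    for label, blk in candidates:
        if sea_bound(blk, feature) >= best_sad:
            stats["sea"] += 1          # eliminated by the cheapest test
            continue
        if row_norm_bound(blk, feature) >= best_sad:
            stats["norm"] += 1         # survived SEA, eliminated by the tighter norm
            continue
        stats["sad"] += 1              # full SAD actually computed
        s = sad(blk, feature)
        if s < best_sad:
            best_label, best_sad = label, s   # update the smallest SAD
    return best_label, best_sad, stats


feature = [[1, 2], [3, 4]]
candidates = [("A", [[1, 2], [3, 5]]), ("B", [[9, 9], [9, 9]]),
              ("E", [[2, 3], [2, 3]]), ("C", [[2, 1], [4, 3]])]
print(cascade(candidates, feature))
# ('A', 1, {'sea': 1, 'norm': 1, 'sad': 2})
```

Candidate E has the same block sum as the feature block, so SEA alone cannot reject it, but its row sums differ and the second level does.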
Fast Calculation of Block Sum: Example: search window of 7×7 positions, block size 3×3. Goal: obtain the block sums and store them in a 7×7 2D array. Question: how can we obtain the block sums efficiently?
Fast Calculation of Block Sum: Consider the first strip of 3×9 pixels (3 rows, 9 columns). Sum the pixels in each column i and store the sum in norm[i], for i = 0, ..., 8. The 3×3 block sum at each position in the first row of the 7×7 array is then found by adding 3 adjacent norms, e.g. Block Sum[0][0] = norm[0] + norm[1] + norm[2].
Fast Calculation of Block Sum: Next row: each norm[i] adds the 1 pixel newly included below the strip and subtracts the 1 pixel that leaves the strip above it. The block sum at each position is again found by adding 3 adjacent norms, e.g. Block Sum[1][0] = norm[0] + norm[1] + norm[2] using the updated norms.
Fast Calculation of Block Sum: The last row is handled the same way, ending with Block Sum[6][0] and its row. Advantages: no pixel is added repeatedly, so the block sums are computed faster, which greatly improves the speed of SEA.
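The column-norm scheme above can be sketched as follows for a general block size `b`; the 3×3 example corresponds to a 9×9 pixel region giving a 7×7 array of block sums. The function name `block_sums` is hypothetical, and, as on the slides, each block sum is formed by adding `b` adjacent norms.

```python
def block_sums(window, b):
    """All b x b block sums of `window` (a list of rows), using running column norms."""
    rows, cols = len(window), len(window[0])
    # norm[i]: sum of column i over the current strip of b rows (first strip: rows 0..b-1)
    norm = [sum(window[r][i] for r in range(b)) for i in range(cols)]
    sums = []
    for top in range(rows - b + 1):
        if top > 0:
            # Move the strip down one row: add the new pixel below, subtract the pixel above.
            for i in range(cols):
                norm[i] += window[top + b - 1][i] - window[top - 1][i]
        # Each block sum adds b neighbouring norms.
        sums.append([sum(norm[left:left + b]) for left in range(cols - b + 1)])
    return sums


window = [[(r * 9 + c) % 7 for c in range(9)] for r in range(9)]
sums = block_sums(window, 3)
print(len(sums), len(sums[0]))   # 7 7
```

Each pixel enters a norm once and leaves it once, instead of being re-added for every one of the up to 9 block positions that cover it.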
Feature Selection: Which block should be chosen for tracking? A flat-colored block is not good, and neither is a block in a region with a repeated pattern. Why is the “mouth” a good candidate? How do we find a good feature block? Figure: example blocks, one labeled “Is that block good? No” and one labeled “It is a good block!”
Feature Selection: Goal: find a good reference block for tracking. Criteria: the candidate block should have a large SAD with its neighbors, and it should contain “complex” information. A large SAD with neighboring blocks prevents ambiguous detection and speeds up the search algorithm, because many candidate blocks are eliminated in the upper levels of the pruning tree. A complex block prevents a flat region from being chosen as the reference block and enhances the performance of PDE (Partial Distortion Elimination).
PDE (Partial Distortion Elimination): Simple, with small overhead. The comparison can stop halfway: if the SAD accumulated over the sub-blocks of X and Y is already larger than the previous minimum SAD, the candidate is rejected. This removes unnecessary computation efficiently when the feature block Y has high complexity: Y will then have a large SAD with most candidate blocks X, which increases the chance of a halfway stop. We implement a simple feature selection algorithm based on the above criteria. (X: candidate block; Y: feature block.)
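A sketch of SAD with PDE, assuming the sub-blocks are the block's rows (the slides do not fix the sub-block shape). It returns `None` when the comparison stops halfway; the function name `sad_pde` is hypothetical.

```python
def sad_pde(x_blk, y_blk, best_so_far):
    """SAD with Partial Distortion Elimination, accumulated row by row.

    Returns (sad, rows_used); sad is None if the comparison stopped halfway
    because the partial SAD already reached best_so_far.
    """
    partial = 0
    for rows_used, (rx, ry) in enumerate(zip(x_blk, y_blk), start=1):
        partial += sum(abs(p - q) for p, q in zip(rx, ry))
        if partial >= best_so_far:
            return None, rows_used     # cannot beat the current minimum: stop early
    return partial, len(x_blk)


feature = [[0, 0], [0, 0]]
candidate = [[5, 5], [1, 1]]
print(sad_pde(candidate, feature, 8))    # (None, 1): stopped after the first row
print(sad_pde(candidate, feature, 20))   # (12, 2): full SAD computed
```

Because the remaining rows can only add to the partial SAD, stopping once it reaches the current minimum never discards a better match.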
Feature Selection: Divide the current frame into small rectangular blocks. For each block, sum all the pixel values and denote the sum I_xy (the intensity of the block). Calculate the variance of each block, which represents the complexity of the block. Apply a Laplacian mask to each block: the Laplacian operator indicates how different the block is from its neighbors (flat background gives a small output; a block dissimilar to its neighbors gives a large output). Select the block with the largest I_xy and a large variance as the feature block. Figure: Laplacian mask.
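A rough sketch of the feature-selection step. The slides do not give the exact Laplacian mask or how I_xy and the variance are combined, so this sketch assumes a 4-neighbour Laplacian applied to the grid of block sums I_xy, ignores border blocks, and picks the interior block with the largest |Laplacian| among those whose variance is at least the mean variance. All of these choices, and the name `select_feature_block`, are hypothetical.

```python
def select_feature_block(frame, size):
    """Return (bx, by), the block-grid index of the chosen feature block."""
    nby, nbx = len(frame) // size, len(frame[0]) // size
    intensity = [[0] * nbx for _ in range(nby)]   # I_xy: sum of the block's pixels
    variance = [[0.0] * nbx for _ in range(nby)]  # complexity of the block
    for by in range(nby):
        for bx in range(nbx):
            pix = [frame[by * size + j][bx * size + i]
                   for j in range(size) for i in range(size)]
            mean = sum(pix) / len(pix)
            intensity[by][bx] = sum(pix)
            variance[by][bx] = sum((p - mean) ** 2 for p in pix) / len(pix)
    # Border blocks lack a full set of 4 neighbours, so only interior blocks are scored.
    interior = [(bx, by) for by in range(1, nby - 1) for bx in range(1, nbx - 1)]
    mean_var = sum(variance[by][bx] for bx, by in interior) / len(interior)
    best, best_score = None, -1
    for bx, by in interior:
        if variance[by][bx] < mean_var:
            continue  # low complexity: likely a flat region
        # 4-neighbour Laplacian on the block-sum grid: large when unlike its neighbours
        lap = (4 * intensity[by][bx] - intensity[by - 1][bx] - intensity[by + 1][bx]
               - intensity[by][bx - 1] - intensity[by][bx + 1])
        if abs(lap) > best_score:
            best, best_score = (bx, by), abs(lap)
    return best


frame = [[100] * 20 for _ in range(20)]
for r in range(4, 8):          # bright but flat block at (1, 1)
    for c in range(4, 8):
        frame[r][c] = 250
for r in range(8, 12):         # textured (checkerboard) block at (2, 2)
    for c in range(8, 12):
        frame[r][c] = 50 if (r + c) % 2 else 250
print(select_feature_block(frame, 4))   # (2, 2)
```

The bright flat block stands out from its neighbours but has zero variance, so the variance test rejects it, mirroring the “flat-colored block is not good” criterion.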
Spiral Scan: Conventional block-scanning method. When calculating the SAD of the candidate blocks, the top-left position is considered first, then the window is scanned row by row until the bottom-right position is reached, just like TV scanning order. Simple to implement.
Spiral Scan: Proposed block-scanning method. Observation: if the optimal position is reached earlier, the amount of computation is reduced, because a small minimum SAD found early lets the elimination tests reject more of the remaining candidates. Statistically, most movements are stationary or close to the center, i.e. most motion vectors are center-biased. Objective: search the motion vectors around the center of the search window first, which gives a higher chance of meeting the optimal position early, so the algorithm runs faster.
Spiral Scan: First find the SAD at the center of the search window, then at the positions n pixels away from the center, for n = 1, 2, ..., W. Figure: search window with the spiral scanning order.
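A sketch of the spiral scanning order: the center first, then each square ring at distance n = 1, ..., W, walked clockwise from its top-left corner. The starting corner and direction are assumptions; only the center-outward ring order matters for the speedup. The name `spiral_offsets` is hypothetical.

```python
def spiral_offsets(w):
    """Displacements in the search window ordered from the centre outwards."""
    order = [(0, 0)]                       # the centre is examined first
    for n in range(1, w + 1):              # then each ring n pixels from the centre
        x, y = -n, -n                      # start at the ring's top-left corner
        for dx, dy in ((1, 0), (0, 1), (-1, 0), (0, -1)):   # right, down, left, up
            for _ in range(2 * n):         # each side contributes 2n positions
                order.append((x, y))
                x += dx
                y += dy
    return order


print(spiral_offsets(1))
# [(0, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0)]
```

Ring n holds 8n positions, so the order covers all 1 + 4W(W+1) = (2W+1)^2 positions exactly once. Iterating candidates in this order instead of row-major order lets SEA and PDE start from a small minimum SAD when motion is center-biased.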
Spiral Scan: Proposed block-scanning method. Result: requires more memory space. If the fast block-sum calculation is used together with it, the whole 2D block-sum array must be stored, since the spiral order does not visit positions row by row. The speed of the algorithm is significantly improved, with about a 2-3× speedup in real-time motion tracking.
Adaptive Search Window: Conventional method: the center of the search window is the previous optimal position. Figure: search window, block and center of the search window.
Adaptive Search Window: Proposed method: the center of the search window is predicted from the previous optimal position and motion vector. Example: if the previous motion vector is (1, 0), i.e. one pixel to the right, the predicted center of the search window is the pixel immediately to the right of the previous optimal position. Figure: search window, block and center of the search window.
Adaptive Search Window: Motivation: to increase the speed of the fast full-search algorithm by searching the most probable optimal position first (this needs to work together with Spiral Scan), and to increase the accuracy. Why? How?
Conventional Search Window: We used a web camera to track the motion of an object and plotted its x-axis velocity against time. Because the search window has a limited size, if the object moves too fast the optimal position falls outside the search window, resulting in serious errors. Requirement: |velocity| < W pixels/s, assuming the algorithm is run once per second.
Adaptive Search Window: Based on the previous optimal position and motion vector, we estimate the next optimal position and use it as the center of the search window:
$$E = (1 - L)\,E' + L\,A$$
where E is the expected displacement, E' is the current expected displacement, L is the learning factor and A is the actual displacement.
Adaptive Search Window: With the adaptive search window, the relative velocity falls within the range [-20, 20], so the optimal position should fall inside the search window. Relative velocity = actual displacement - expected displacement; W: search range. Requirement: |acceleration (relative velocity)| < W pixels/s, assuming the algorithm is run once per second.
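A sketch of the adaptive window: the expected displacement is updated with E = (1 - L)E' + LA, the next window is centered at the previous optimal position plus the rounded E, and the match stays reachable only if the relative displacement A - E lies within the search range W on both axes. The function names and the default L = 0.5 are hypothetical.

```python
def update_expected(expected, actual, learning=0.5):
    """E = (1 - L) * E' + L * A, applied to the x and y displacements separately."""
    return tuple((1 - learning) * e + learning * a for e, a in zip(expected, actual))


def window_center(prev_optimal, expected):
    """Centre the next search window at the previous optimal position plus E."""
    return tuple(p + round(e) for p, e in zip(prev_optimal, expected))


def in_window(actual, expected, w):
    """True if the relative displacement A - E lies within the search range W on both axes."""
    return all(abs(a - e) <= w for a, e in zip(actual, expected))


e = (0.0, 0.0)
for a in [(4, -4), (4, -4)]:       # the object keeps moving by (4, -4) per update
    e = update_expected(e, a)
print(e)                            # (3.0, -3.0)
print(window_center((10, 10), e))   # (13, 7)
print(in_window((4, -4), e, 1))     # True
```

With L = 1 the rule reduces to E = A, which reproduces the earlier example: a previous motion vector of (1, 0) moves the window center one pixel to the right. Smaller L smooths out noisy displacements at the cost of reacting more slowly.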
Result: Table showing the time required to find the motion vector in different regions using different algorithms (each algorithm is run 5 times).

| Typical point | ESA | Spiral ESA SAD | SEA PPNM PDE SAD | SEA PPNM PDE SSD | Spiral SEA PPNM PDE SSD | Adaptive Spiral SEA PPNM PDE SSD |
|---|---|---|---|---|---|---|
| High gradient, high variance | 1252 ms | 270 ms | 671 ms | 331 ms | 40 ms | 30 ms |
| Low gradient, high variance | 1071 ms | 281 ms | 821 ms | 130 ms | 60 ms | 50 ms |
| Low gradient, low variance | 1342 ms | 381 ms | 1282 ms | 791 ms | 140 ms | 80 ms |

Optimal motion vector = (-5, 12), previous motion vector = (-2, 4); these affect the Spiral Scan and Adaptive Search Window algorithms.
Future Work: Implement the “Adaptive Spiral SEA PPNM PDE SSD” algorithm on a mobile phone. Make a simple application/game using motion tracking.
Q & A