Recognizing Action at a Distance Alexei A. Efros, Alexander C. Berg, Greg Mori, Jitendra Malik Computer Science Division, UC Berkeley Presented by Pundik Dmitry Computer Vision Seminar, IDC
Motivation Recognizing human actions in a motion sequence. The figures are in the medium field (about 30 pixels tall) and noisy; the view angle varies; the actions are non-periodic; the camera moves. The approach is not appearance-based.
Additional Applications Classification of actions; action synthesis (“Do as I do”, “Do as I say”); an action database (images, skeletons); figure correction.
Previous Work Large-scale objects: body-part recognition; periodic motion; stationary cameras with background subtraction; spatio-temporal gradients for video event recognition (high resolution, different motion classes).
Recognition Method Tracking a person (a simple normalized-correlation-based tracker, user-initialized); stabilizing the figure’s center across the sequence; computing spatio-temporal motion descriptors for each frame; measuring motion similarities between sequences.
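The slides describe the tracker only as "simple normalized correlation based"; a minimal sketch of such a template matcher (the function name and the brute-force search are assumptions, not the authors' code) could look like:

```python
import numpy as np

def track_ncc(frame, template):
    """Locate `template` in `frame` by normalized cross-correlation.

    A minimal sketch of a user-initialized correlation tracker:
    slide the template over every window of the frame and return the
    top-left corner (row, col) of the best-matching window.
    """
    th, tw = template.shape
    t = template - template.mean()
    tn = np.sqrt((t ** 2).sum()) + 1e-8
    best, best_pos = -np.inf, (0, 0)
    H, W = frame.shape
    for y in range(H - th + 1):
        for x in range(W - tw + 1):
            w = frame[y:y + th, x:x + tw]
            wz = w - w.mean()
            # Normalized correlation score in [-1, 1].
            score = (wz * t).sum() / (np.sqrt((wz ** 2).sum()) * tn + 1e-8)
            if score > best:
                best, best_pos = score, (y, x)
    return best_pos
```

In practice the search would be restricted to a window around the previous position, which is what makes the tracker usable on long sequences.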
Motion Descriptors Use actual pixel values (appearance)? Spatial gradients? Temporal gradients? Edges? We use pixel-wise optical flow: it encodes motion and is least affected by appearance… but it is noisy.
Optical Flow Overview Pixel-wise stabilization of the video sequence using the Lucas–Kanade registration method.
Optical Flow Overview – cont. For each pixel we have an intensity $I(x, y, t)$ and a velocity $(u, v)$ at each time point. Assuming small motion, a first-order Taylor approximation gives (1): $I(x + u\,\delta t,\; y + v\,\delta t,\; t + \delta t) \approx I(x, y, t) + I_x u\,\delta t + I_y v\,\delta t + I_t\,\delta t$.
Optical Flow Overview – cont. Assuming the intensity of the moving pixel remains the same (2): $I(x + u\,\delta t,\; y + v\,\delta t,\; t + \delta t) = I(x, y, t)$. Therefore: $I_x u + I_y v + I_t = 0$.
Optical Flow Overview – cont. For all the pixels in a small block, the constraints stack into a linear system $A \begin{pmatrix} u \\ v \end{pmatrix} = b$, where row $i$ of $A$ is $\big(I_x(p_i)\;\; I_y(p_i)\big)$ and $b_i = -I_t(p_i)$.
Optical Flow Overview – cont. Solving the system in the least-squares sense: $\begin{pmatrix} u \\ v \end{pmatrix} = (A^T A)^{-1} A^T b$.
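The least-squares solve above can be sketched directly in NumPy (a minimal single-block version; the function name is an assumption):

```python
import numpy as np

def lucas_kanade_block(prev, curr):
    """Estimate one (u, v) motion vector for a small image block.

    Implements the least-squares solution [u v]^T = (A^T A)^{-1} A^T b:
    `prev` and `curr` are 2-D float arrays of the same small size,
    one time step apart.
    """
    prev = prev.astype(np.float64)
    curr = curr.astype(np.float64)
    # Spatial gradients (axis 0 = rows = y, axis 1 = cols = x)
    # and the temporal gradient.
    Iy, Ix = np.gradient(prev)
    It = curr - prev
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)  # rows: [Ix(p), Iy(p)]
    b = -It.ravel()
    # lstsq handles rank-deficient blocks (e.g. no texture along one axis).
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
```

For a horizontal intensity ramp shifted one pixel to the right, this recovers $(u, v) \approx (1, 0)$.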
Optical Flow Overview – cont. We now have a motion vector for each block.
Back to Motion Descriptors The optical-flow results are noisy, so we blur them, obtaining smoothed flow fields $F_x$ and $F_y$.
From Optical Flow to Descriptors Splitting the motion vector field into positive and negative channels (half-wave rectification of $F_x$ and $F_y$); Gaussian blurring and normalizing of the resulting four channels.
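The channel construction above can be sketched as follows; the normalization constant `eps` is an assumed detail (the slide only says "normalizing"), and the helper names are mine:

```python
import numpy as np

def _gauss_kernel(sigma, radius=2):
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def _blur(img, k):
    # Separable Gaussian convolution: rows, then columns ('same' size).
    tmp = np.apply_along_axis(lambda v: np.convolve(v, k, mode='same'), 1, img)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode='same'), 0, tmp)

def motion_descriptor(Fx, Fy, sigma=1.0, eps=0.5):
    """Four-channel motion descriptor from a (noisy) optical-flow field.

    Half-wave rectifies Fx and Fy into positive/negative channels,
    Gaussian-blurs each, and normalizes by the overall magnitude.
    """
    channels = [np.maximum(Fx, 0), np.maximum(-Fx, 0),
                np.maximum(Fy, 0), np.maximum(-Fy, 0)]
    k = _gauss_kernel(sigma)
    blurred = [_blur(c, k) for c in channels]
    mag = np.sqrt(sum(b ** 2 for b in blurred)) + eps
    return [b / mag for b in blurred]
```

The blurring is what makes the descriptor tolerant to the jitter of per-pixel flow on 30-pixel figures.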
Comparing Descriptors In order to compare motions, we need to compare frames of two different sequences. The descriptors of all frames are compared using spatio-temporal cross-correlation, where $a_i^c$ is descriptor channel $c$ of frame $i$ in sequence A (and $b_j^c$ likewise for sequence B). This reduces to a frame-to-frame similarity.
Frame-to-frame Similarity We start from the inner term, the frame-to-frame similarity function $S(i, j) = \sum_{c} \sum_{x, y} a_i^c(x, y)\, b_j^c(x, y)$, where $i$ indexes frames in sequence A and $j$ indexes frames in sequence B.
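Because the frame-to-frame similarity is a sum of products over channels and pixels, the whole matrix is one matrix product of flattened descriptors (a sketch; array layout is an assumption):

```python
import numpy as np

def ff_similarity(A_desc, B_desc):
    """Frame-to-frame similarity matrix between two sequences.

    A_desc, B_desc: arrays of shape (num_frames, 4, H, W) holding the
    four-channel motion descriptors of each frame.  Entry (i, j) is the
    sum over channels and pixels of a_i^c(x, y) * b_j^c(x, y).
    """
    Af = A_desc.reshape(len(A_desc), -1)
    Bf = B_desc.reshape(len(B_desc), -1)
    return Af @ Bf.T
```

Expressing it as a matrix product matters in practice: comparing every frame pair across long sequences is otherwise the bottleneck of the method.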
Frame-to-frame Similarity The frame-to-frame similarity matrix compares every frame $a_i$ of sequence A against every frame $b_j$ of sequence B. [Figure: the matrix with frames $a_1 \dots a_4$ against $b_1 \dots b_4$.] Similar motions appear as diagonals; from this we build the motion-to-motion similarity matrix.
Motion-to-motion Similarity Similar motion patterns appear as diagonals, or slanted diagonals. In order to examine the diagonals, we convolve the FF-similarity matrix with a diagonal kernel. [Figures: a typical FF-similarity matrix for running, and the resulting MM-similarity matrix, with frame axes $i$ and $j$.]
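A minimal sketch of the diagonal convolution, using a plain identity kernel (the slides also mention slanted kernels for speed variation, omitted here; kernel length is an assumed parameter):

```python
import numpy as np

def mm_similarity(S, length=5):
    """Blur the FF-similarity matrix along the main diagonal direction.

    Convolves S with an identity (diagonal) kernel of the given length,
    so runs of high frame-to-frame similarity lying on a diagonal
    reinforce each other, while isolated high entries are averaged down.
    """
    K = np.eye(length)
    pad = length // 2
    Sp = np.pad(S, pad)
    out = np.zeros(S.shape, dtype=float)
    H, W = S.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(Sp[i:i + length, j:j + length] * K)
    return out
```

On an identity FF-matrix, entries on the diagonal come out strictly larger than their off-diagonal neighbours, which is exactly the effect the MM-matrix is after.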
Classifying Actions Each frame in a labeled learning sequence has an action label. Each row in the MM-similarity matrix represents a frame of the novel sequence. To classify: construct the MM-similarity matrix between the novel and the learning sequence, look at the row corresponding to the current frame, and assign a label to the current frame by majority vote over the best-matching learning frames. [Figure: a matrix row matching the current frame against segments with labels 1–4.]
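The voting step can be sketched as a k-nearest-frames majority vote (k is an assumed parameter, not stated on the slide):

```python
import numpy as np
from collections import Counter

def classify_frames(MM, train_labels, k=5):
    """Label each novel frame by majority vote over its k best matches.

    MM: motion-to-motion similarity matrix with rows = novel frames and
    columns = training frames; train_labels: one action label per column.
    """
    labels = []
    for row in MM:
        top = np.argsort(row)[::-1][:k]  # indices of the k most similar frames
        votes = Counter(train_labels[t] for t in top)
        labels.append(votes.most_common(1)[0][0])
    return labels
```

Because each novel frame is voted on independently, the output label track can still flicker at action boundaries; temporal smoothing of the labels is a natural extension.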
Classification Examples
Skeleton Transfer Hand-mark the 2D database with joint locations. Classify the frames of a novel sequence against the database and transfer the matched frame’s joint locations (skeleton) to them.
3D Motion Classification Render synthetic 2D images of a stick figure, then classify a 3D motion by matching against these 2D renderings. The 2D-to-3D mapping has many ambiguities.
Action Synthesis We can use the motion descriptors to generate new actions: collect a large database of actions of a specific person (Charlie Chaplin) and generate any action based on that database.
“Do As I Do” Synthesis We build a sequence S by picking frames from a given target sequence T according to a driver sequence D. S must: match the sequence D (in terms of motion descriptors) and appear smooth and natural. We will need: the MM-similarity matrix between D and T, and a similarity-in-appearance matrix (frame-to-frame normalized correlation) between all the frames in T.
“Do As I Do” Synthesis – cont. We maximize a cost function combining a match-to-driver term and a smoothness term: $C(S) = \sum_i W_{\mathrm{match}}(d_i, s_i) + \alpha \sum_i W_{\mathrm{smooth}}\big(\mathrm{succ}(s_i), s_{i+1}\big)$, where $\mathrm{succ}(s)$ is the frame following $s$ in T, $W_{\mathrm{match}}$ is the MM-similarity between D and T, and $W_{\mathrm{smooth}}$ is the appearance similarity between frames of T.
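A greedy sketch of this trade-off (an illustration only: the actual optimization, the function name, and the weight `alpha` are assumptions, not the authors' algorithm):

```python
import numpy as np

def do_as_i_do(W_match, W_smooth, alpha=0.5):
    """Pick one target frame per driver frame, greedily.

    W_match[i, j]: how well target frame j matches driver frame i
    (MM-similarity between driver D and target T).
    W_smooth[j, k]: appearance similarity between target frames j and k.
    At each step we favor frames that match the driver AND resemble the
    natural continuation of the previously chosen target frame.
    """
    n_driver, n_target = W_match.shape
    seq = [int(np.argmax(W_match[0]))]
    for i in range(1, n_driver):
        prev = seq[-1]
        succ = min(prev + 1, n_target - 1)  # frame following prev in T
        score = W_match[i] + alpha * W_smooth[succ]
        seq.append(int(np.argmax(score)))
    return seq
```

A greedy pass can get stuck in locally smooth but globally poor choices; optimizing the full cost over the whole sequence (e.g. by dynamic programming) avoids that.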
“Do As I Do” Example
“Do As I Say” Synthesis Generate a motion sequence by issuing commands for an action: classify the target sequence T with the descriptors, then use the same approach as in the “Do as I do” algorithm. Not a real-time application.
“Do As I Say” Example
Figure Correction Correct occlusions and background noise: find the k most similar frames in the same sequence; the per-pixel median image of these frames is the estimate for the current frame. Given enough data, the parts common to the retrieved frames will be the figure itself.
Disadvantages High computational complexity; sensitive to scale; unable to recognize motions performed at different speeds.
Video Examples That’s all folks…
Recognizing and Tracking Human Action (Preview) Josephine Sullivan and Stefan Carlsson Numerical Analysis and Computing Science, Royal Institute of Technology, Stockholm, Sweden Presented by Pundik Dmitry Computer Vision Seminar, IDC
Shape Correspondence Every point on the shape (contour) has a location and a tangent direction. Assuming correspondence and a smooth transformation between frames, each set of four points defines a unique complex, which can help us build correspondence between points.
Topological Type 1. Point order 2. Line direction order 3. Relative intersection of the lines and the points
Unique Correspondence By examining every set of four points on a shape, we can detect the unique correspondence.
More In The Paper… Frame distance function Key frame based action recognition Tracking by point transfer Body joint locations