Recognizing Action at a Distance Alexei A. Efros, Alexander C. Berg, Greg Mori, Jitendra Malik Computer Science Division, UC Berkeley Presented by Pundik Dmitry Computer Vision Seminar, IDC
Motivation Recognizing human actions in a motion sequence. The figures are in the medium field (about 30 pixels tall) and noisy; the view angle varies; the actions are non-periodic; the camera moves. The approach is not appearance-based.
Additional Applications Classification of actions; action synthesis (“Do as I do”, “Do as I say”); an action database (images, skeletons); figure correction.
Previous Work Large-scale objects: body-part recognition; periodic motion; stationary cameras with background subtraction; spatio-temporal gradients for video event recognition (high resolution, different motion classes).
Recognition Method Tracking a person (a simple normalized-correlation-based tracker, user-initialized); stabilizing the figure’s center across the sequence; computing spatio-temporal motion descriptors for each frame; measuring motion similarities between sequences.
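The slides describe the tracker only as "simple normalized correlation based"; a minimal sketch of such a template matcher (the function name and the brute-force search are assumptions, not the authors' code) could look like:

```python
import numpy as np

def track_ncc(frame, template):
    """Locate `template` in `frame` by normalized cross-correlation.

    A minimal sketch of a user-initialized correlation tracker:
    slide the template over every window of the frame and return the
    top-left corner (row, col) of the best-matching window.
    """
    th, tw = template.shape
    t = template - template.mean()
    tn = np.sqrt((t ** 2).sum()) + 1e-8
    best, best_pos = -np.inf, (0, 0)
    H, W = frame.shape
    for y in range(H - th + 1):
        for x in range(W - tw + 1):
            w = frame[y:y + th, x:x + tw]
            wz = w - w.mean()
            # Normalized correlation score in [-1, 1].
            score = (wz * t).sum() / (np.sqrt((wz ** 2).sum()) * tn + 1e-8)
            if score > best:
                best, best_pos = score, (y, x)
    return best_pos
```

In practice the search would be restricted to a window around the previous position, which is what makes the tracker usable on long sequences.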
Motion Descriptors Use actual pixel values (appearance)? Spatial gradients? Temporal gradients? Edges? We use pixel-wise optical flow: it encodes motion and is least affected by appearance… but it is noisy.
Optical Flow Overview Pixel-wise stabilization of the video sequence using the Lucas–Kanade registration method.
Optical Flow Overview – cont. For each pixel we have an intensity $I(x, y, t)$ and a velocity $(u, v)$ at each time point. Assuming small motion, a first-order Taylor approximation gives (1): $I(x + u\,\delta t,\; y + v\,\delta t,\; t + \delta t) \approx I(x, y, t) + I_x u\,\delta t + I_y v\,\delta t + I_t\,\delta t$.
Optical Flow Overview – cont. Assuming the intensity of the moving pixel remains the same (2): $I(x + u\,\delta t,\; y + v\,\delta t,\; t + \delta t) = I(x, y, t)$. Therefore: $I_x u + I_y v + I_t = 0$.
Optical Flow Overview – cont. For all the pixels in a small block, the constraints stack into a linear system $A \begin{pmatrix} u \\ v \end{pmatrix} = b$, where row $i$ of $A$ is $\big(I_x(p_i)\;\; I_y(p_i)\big)$ and $b_i = -I_t(p_i)$.
Optical Flow Overview – cont. Solving the system in the least-squares sense: $\begin{pmatrix} u \\ v \end{pmatrix} = (A^T A)^{-1} A^T b$.
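The least-squares solve above can be sketched directly in NumPy (a minimal single-block version; the function name is an assumption):

```python
import numpy as np

def lucas_kanade_block(prev, curr):
    """Estimate one (u, v) motion vector for a small image block.

    Implements the least-squares solution [u v]^T = (A^T A)^{-1} A^T b:
    `prev` and `curr` are 2-D float arrays of the same small size,
    one time step apart.
    """
    prev = prev.astype(np.float64)
    curr = curr.astype(np.float64)
    # Spatial gradients (axis 0 = rows = y, axis 1 = cols = x)
    # and the temporal gradient.
    Iy, Ix = np.gradient(prev)
    It = curr - prev
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)  # rows: [Ix(p), Iy(p)]
    b = -It.ravel()
    # lstsq handles rank-deficient blocks (e.g. no texture along one axis).
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
```

For a horizontal intensity ramp shifted one pixel to the right, this recovers $(u, v) \approx (1, 0)$.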
Optical Flow Overview – cont. We now have a motion vector for each block.
Back to Motion Descriptors The optical-flow results are noisy, so we blur them, obtaining smoothed flow fields $F_x$ and $F_y$.
From Optical Flow to Descriptors Splitting the motion vector field into positive and negative channels (half-wave rectification of $F_x$ and $F_y$); Gaussian blurring and normalizing of the resulting four channels.
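The channel construction above can be sketched as follows; the normalization constant `eps` is an assumed detail (the slide only says "normalizing"), and the helper names are mine:

```python
import numpy as np

def _gauss_kernel(sigma, radius=2):
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def _blur(img, k):
    # Separable Gaussian convolution: rows, then columns ('same' size).
    tmp = np.apply_along_axis(lambda v: np.convolve(v, k, mode='same'), 1, img)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode='same'), 0, tmp)

def motion_descriptor(Fx, Fy, sigma=1.0, eps=0.5):
    """Four-channel motion descriptor from a (noisy) optical-flow field.

    Half-wave rectifies Fx and Fy into positive/negative channels,
    Gaussian-blurs each, and normalizes by the overall magnitude.
    """
    channels = [np.maximum(Fx, 0), np.maximum(-Fx, 0),
                np.maximum(Fy, 0), np.maximum(-Fy, 0)]
    k = _gauss_kernel(sigma)
    blurred = [_blur(c, k) for c in channels]
    mag = np.sqrt(sum(b ** 2 for b in blurred)) + eps
    return [b / mag for b in blurred]
```

The blurring is what makes the descriptor tolerant to the jitter of per-pixel flow on 30-pixel figures.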
Comparing Descriptors In order to compare motions, we need to compare frames of two different sequences. The descriptors of all frames are compared using spatio-temporal cross-correlation, where $a_i^c$ is descriptor channel $c$ of frame $i$ in sequence A (and $b_j^c$ likewise for sequence B). This reduces to a frame-to-frame similarity.
Frame-to-frame Similarity We start from the inner term, the frame-to-frame similarity function $S(i, j) = \sum_{c} \sum_{x, y} a_i^c(x, y)\, b_j^c(x, y)$, where $i$ indexes frames in sequence A and $j$ indexes frames in sequence B.
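Because the frame-to-frame similarity is a sum of products over channels and pixels, the whole matrix is one matrix product of flattened descriptors (a sketch; array layout is an assumption):

```python
import numpy as np

def ff_similarity(A_desc, B_desc):
    """Frame-to-frame similarity matrix between two sequences.

    A_desc, B_desc: arrays of shape (num_frames, 4, H, W) holding the
    four-channel motion descriptors of each frame.  Entry (i, j) is the
    sum over channels and pixels of a_i^c(x, y) * b_j^c(x, y).
    """
    Af = A_desc.reshape(len(A_desc), -1)
    Bf = B_desc.reshape(len(B_desc), -1)
    return Af @ Bf.T
```

Expressing it as a matrix product matters in practice: comparing every frame pair across long sequences is otherwise the bottleneck of the method.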
Frame-to-frame Similarity The frame-to-frame similarity matrix compares every frame $a_i$ of sequence A against every frame $b_j$ of sequence B. [Figure: the matrix with frames $a_1 \dots a_4$ against $b_1 \dots b_4$.] Similar motions appear as diagonals; from this we build the motion-to-motion similarity matrix.
Motion-to-motion Similarity Similar motion patterns appear as diagonals, or slanted diagonals. In order to examine the diagonals, we convolve the FF-similarity matrix with a diagonal kernel. [Figures: a typical FF-similarity matrix for running, and the resulting MM-similarity matrix, with frame axes $i$ and $j$.]
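A minimal sketch of the diagonal convolution, using a plain identity kernel (the slides also mention slanted kernels for speed variation, omitted here; kernel length is an assumed parameter):

```python
import numpy as np

def mm_similarity(S, length=5):
    """Blur the FF-similarity matrix along the main diagonal direction.

    Convolves S with an identity (diagonal) kernel of the given length,
    so runs of high frame-to-frame similarity lying on a diagonal
    reinforce each other, while isolated high entries are averaged down.
    """
    K = np.eye(length)
    pad = length // 2
    Sp = np.pad(S, pad)
    out = np.zeros(S.shape, dtype=float)
    H, W = S.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(Sp[i:i + length, j:j + length] * K)
    return out
```

On an identity FF-matrix, entries on the diagonal come out strictly larger than their off-diagonal neighbours, which is exactly the effect the MM-matrix is after.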
Classifying Actions Each frame in a labeled learning sequence has an action label. Each row in the MM-similarity matrix represents a frame of the novel sequence. To classify: construct the MM-similarity matrix between the novel and the learning sequence, look at the row corresponding to the current frame, and assign a label to the current frame by majority vote over the best-matching learning frames. [Figure: a matrix row matching the current frame against segments with labels 1–4.]
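The voting step can be sketched as a k-nearest-frames majority vote (k is an assumed parameter, not stated on the slide):

```python
import numpy as np
from collections import Counter

def classify_frames(MM, train_labels, k=5):
    """Label each novel frame by majority vote over its k best matches.

    MM: motion-to-motion similarity matrix with rows = novel frames and
    columns = training frames; train_labels: one action label per column.
    """
    labels = []
    for row in MM:
        top = np.argsort(row)[::-1][:k]  # indices of the k most similar frames
        votes = Counter(train_labels[t] for t in top)
        labels.append(votes.most_common(1)[0][0])
    return labels
```

Because each novel frame is voted on independently, the output label track can still flicker at action boundaries; temporal smoothing of the labels is a natural extension.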
Classification Examples
Skeleton Transfer Hand-mark the 2D database with joint locations. Classify the frames of a novel sequence against the database and transfer the matched frame’s joint locations (skeleton) to them.
3D Motion Classification Render synthetic 2D images of a stick figure, then classify a 3D motion by matching against these 2D renderings. The 2D-to-3D mapping has many ambiguities.
Action Synthesis We can use the motion descriptors to generate new actions: collect a large database of actions of a specific person (Charlie Chaplin) and generate any action based on that database.
“Do As I Do” Synthesis We build a sequence S by picking frames from a given target sequence T according to a driver sequence D. S must: match the sequence D (in terms of motion descriptors) and appear smooth and natural. We will need: the MM-similarity matrix between D and T, and a similarity-in-appearance matrix (frame-to-frame normalized correlation) between all the frames in T.
“Do As I Do” Synthesis – cont. We maximize a cost function combining a match-to-driver term and a smoothness term: $C(S) = \sum_i W_{\mathrm{match}}(d_i, s_i) + \alpha \sum_i W_{\mathrm{smooth}}\big(\mathrm{succ}(s_i), s_{i+1}\big)$, where $\mathrm{succ}(s)$ is the frame following $s$ in T, $W_{\mathrm{match}}$ is the MM-similarity between D and T, and $W_{\mathrm{smooth}}$ is the appearance similarity between frames of T.
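A greedy sketch of this trade-off (an illustration only: the actual optimization, the function name, and the weight `alpha` are assumptions, not the authors' algorithm):

```python
import numpy as np

def do_as_i_do(W_match, W_smooth, alpha=0.5):
    """Pick one target frame per driver frame, greedily.

    W_match[i, j]: how well target frame j matches driver frame i
    (MM-similarity between driver D and target T).
    W_smooth[j, k]: appearance similarity between target frames j and k.
    At each step we favor frames that match the driver AND resemble the
    natural continuation of the previously chosen target frame.
    """
    n_driver, n_target = W_match.shape
    seq = [int(np.argmax(W_match[0]))]
    for i in range(1, n_driver):
        prev = seq[-1]
        succ = min(prev + 1, n_target - 1)  # frame following prev in T
        score = W_match[i] + alpha * W_smooth[succ]
        seq.append(int(np.argmax(score)))
    return seq
```

A greedy pass can get stuck in locally smooth but globally poor choices; optimizing the full cost over the whole sequence (e.g. by dynamic programming) avoids that.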
“Do As I Do” Example
“Do As I Say” Synthesis Generate a motion sequence by issuing commands for an action: classify the target sequence T with the descriptors, then use the same approach as in the “Do as I do” algorithm. Not a real-time application.
“Do As I Say” Example
Figure Correction Correct occlusions and background noise: find the k most similar frames in the same sequence; the per-pixel median image of these frames is the estimate for the current frame. Given enough data, the parts common to the retrieved frames will be the figure itself.
Disadvantages High computational complexity; sensitive to scale; unable to recognize motions performed at different speeds.
Video Examples That’s all folks…
Recognizing and Tracking Human Action (Preview) Josephine Sullivan and Stefan Carlsson Numerical Analysis and Computing Science, Royal Institute of Technology, Stockholm, Sweden Presented by Pundik Dmitry Computer Vision Seminar, IDC
Shape Correspondence Every point on the shape (contour) has a location and a tangent direction. Assuming correspondence and a smooth transformation between frames, each set of four points defines a unique complex, which can help us build correspondence between points.
Topological Type 1. Point order 2. Line direction order 3. Relative intersection of the lines and the points
Unique Correspondence By examining every set of four points on a shape, we can detect the unique correspondence.
More In The Paper… Frame distance function Key frame based action recognition Tracking by point transfer Body joint locations