Hand Signals Recognition from Video Using 3D Motion Capture Archive
Tai-Peng Tian, Stan Sclaroff
Computer Science Department, Boston University

1. Introduction

Motivation: Hand signals are commonly used for communication in noisy environments or when people are out of voice range. Examples include directing an airplane to the runway for takeoff, controlling traffic flow, and basketball referee signals.

Figure 1: Basketball referee hand signal.

Problem Definition: Given a sequence of tracked 2D feature locations, find the best-matching 3D motion capture sequence from an archive.

Assumptions: We focus on the recognition part of the algorithm, so we assume that the video sequence has been temporally segmented and that the desired 2D feature locations can be reliably tracked over the whole sequence. Within each sequence of 2D features, we further assume there is only one hand signal.

Contribution: No direct 3D structure estimation is needed. The most relevant prior work is Parameswaran and Chellappa [1]; we propose a simpler alternative to the 2D-3D motion matching problem that also offers viewpoint invariance.

Why a 3D motion capture archive? The representation is more complete than a 2D representation, since there is no need to sample the motion from multiple views.

Why DTW? The algorithm provides an optimal alignment between sequences, so we do not have to worry about variations in the speed of the motion.

2. Algorithm

Overview: 2D-to-3D sequence alignment using Dynamic Time Warping (DTW). The 2D features tracked in the video are matched by the 2D-3D matching algorithm against each 3D motion capture sequence from the archive.

Dissimilarity score: Given at least six pairs of 2D-3D correspondences in a frame, the projection matrix M can be estimated. Given M, the back-projection error of the 3D points is used as the dissimilarity score.

Equation 1: Dissimilarity measure, where P(.) applies the projection matrix.

2D vs. 3D alignment: Once we can compute the dissimilarity between a frame of 2D features and a frame of 3D features, the DTW algorithm proceeds as usual, finding the optimal alignment that minimizes the accumulated dissimilarity cost.

Equation 2: Recursive solution for the DTW alignment.

Figure 2: An example of a DTW matching between 2D feature locations in an image sequence and 3D feature locations in a motion capture sequence.

Classifier: Experiments are conducted using the nearest neighbor classifier: given a sequence of 2D features, the 3D motion capture sequence with the lowest alignment score is deemed the best match.
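To make the matching step concrete, the following is a minimal sketch of one plausible reading of the algorithm above, not the authors' implementation. The projection matrix is estimated per frame pair from at least six 2D-3D correspondences with the standard Direct Linear Transform; the back-projection error plays the role of Equation 1, and the textbook DTW recursion D[i,j] = d(i,j) + min(D[i-1,j], D[i,j-1], D[i-1,j-1]) plays the role of Equation 2. All function and variable names (estimate_projection, frame_dissimilarity, dtw_alignment_score) are hypothetical.

```python
# Illustrative sketch only. Assumes seq2d[i] is an (N, 2) array and
# seq3d[j] is an (N, 3) array whose rows correspond (known marker identity).
import numpy as np

def estimate_projection(X3d, x2d):
    """Estimate a 3x4 projection matrix M from >= 6 point correspondences
    using the Direct Linear Transform (DLT). X3d: (N, 3), x2d: (N, 2)."""
    assert len(X3d) >= 6, "at least six 2D-3D correspondences are required"
    A = []
    for (X, Y, Z), (u, v) in zip(X3d, x2d):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 4)  # right singular vector of smallest singular value

def frame_dissimilarity(X3d, x2d):
    """Back-projection error of the 3D points under the estimated M
    (the role played by Equation 1 in the poster)."""
    M = estimate_projection(X3d, x2d)
    Xh = np.hstack([X3d, np.ones((len(X3d), 1))])  # homogeneous coordinates
    proj = (M @ Xh.T).T
    proj = proj[:, :2] / proj[:, 2:3]               # perspective divide
    return np.sum((proj - x2d) ** 2)

def dtw_alignment_score(seq2d, seq3d):
    """Standard DTW recursion (the form Equation 2 refers to)."""
    n, m = len(seq2d), len(seq3d)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dissimilarity(seq3d[j - 1], seq2d[i - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```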
3. Experiments and Results

Data: 45 motion capture sequences of basketball referee gestures. 2D image features were synthesized from the 3D motion capture sequences using a frontal view and scaled to unit height. Approximately half of the data were used as prototypes in the archive and the other half for testing.

Description of Experiments: Three sets of experiments were conducted with different sets of features at different noise levels. The first experiment uses all 31 features shown in Fig. 1, with increasing noise. The second uses a set of more realistic feature points, indicated by the shaded points in Fig. 1. The last uses only the shaded points on the upper body in Fig. 1.

Significance of the Noise Parameter: In the synthesized images, the person is of unit height. If we are tracking a person 300 pixels tall, an error margin of 0.06 in normalized coordinates simulates a tracker that reports tracked points within a 36-pixel radius 95% of the time.

Table 1: Confusion matrix. Each row contains the outcome of classifying queries drawn from the same category; diagonal entries represent correct classifications.

Figure 4: Classifier performance with respect to increasing noise.

Future Work: Currently there are no temporal constraints on computing the projection matrix from frame to frame. Temporal consistency could be enforced during the matching process to improve robustness.

4. References

[1] V. Parameswaran and R. Chellappa. View invariants for human action recognition. In CVPR.
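Referring back to Section 3, the classifier is a nearest-neighbor rule over DTW alignment scores, and tracker error is simulated by perturbing the unit-height 2D features with Gaussian noise. The sketch below shows one plausible way to set this up; it builds on the hypothetical dtw_alignment_score function sketched after Section 2, and the names classify and add_tracker_noise are likewise hypothetical, not from the original work.

```python
# Illustrative sketch only: nearest-neighbor classification against the
# mocap archive, plus Gaussian perturbation of the synthetic 2D features.
import numpy as np

def classify(query2d, archive):
    """archive: list of (label, mocap_sequence) prototypes. The prototype
    with the lowest DTW alignment score to the query is the best match."""
    scores = [(dtw_alignment_score(query2d, seq3d), label)
              for label, seq3d in archive]
    best_score, best_label = min(scores)
    return best_label, best_score

def add_tracker_noise(seq2d, sigma, rng=None):
    """Perturb unit-height 2D features with isotropic Gaussian noise of
    standard deviation sigma (e.g. around 0.06 in normalized coordinates)
    to simulate tracker error."""
    rng = np.random.default_rng(0) if rng is None else rng
    return [frame + rng.normal(scale=sigma, size=frame.shape)
            for frame in seq2d]
```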