1 Human Gesture Recognition by Mohamed Bécha Kaâniche 11/02/2009

2 Outline  Introduction  State of the art  Proposed Method  Human Gesture Descriptor  Human Gesture Learning and Classification  Preliminary results  Conclusion

3 Introduction Human Gesture Recognition ?? Human Gesture ?? Gesture Recognition ??

4 Introduction (2) What is a Gesture?
- Any meaningful movement of the human body, used to convey information or to interact with the environment.
- [Pei 1984] identifies non-verbal signals; [Birdwhistell 1963] estimates facial expressions; [Krout 1935] identifies 5000 hand gestures.
- The meaning of a gesture differs widely from one culture to another.
- Gestures are synchronous with speech, gaze and facial expressions.
- According to [Hall 1973], 65% of communication is non-verbal (gesture, appearance, voice, chronemics, haptics).

5 Introduction (3) Gesture Dynamics
- Conscious gestures: emblems, illustrators, affect displays, regulators.
- Unconscious gestures: adaptors.
- Statics.

6 Introduction (4) What kind of gesture recognition?
- Identify, and possibly interpret, human gestures automatically.
- Use a set of sensors and electronic processing units.
- According to the type of sensors, we distinguish:
  - Pen-based gesture recognition
  - Multi-touch surface based gesture recognition
  - Tracker-based gesture recognition (instrumented gloves, Wii remote control, body suits)
  - Vision-based gesture recognition

7 Introduction (5) Vision-based gesture recognition?
- Advantages: passive, non-obtrusive and low-cost.
- Challenges:
  - Efficiency: real-time constraints.
  - Robustness: background/foreground changes.
  - Occlusion: change of the point of view, self-occlusion, etc.
- Categories: head/face gesture recognition, hand/arm gesture recognition, body gesture recognition.

8 State of the art A vision-based gesture recognition system: Sensor Processing → Feature Extraction → Gesture Classification → Recognized Gesture, with the classifier backed by a Gesture Database.

9 State of the art (2) Issues:
- Number of cameras: single or multiple cameras; stereo or multi-view?
- Speed and latency: fast enough, and with low enough latency, for interactive use.
- Structured environment: restrictions on background, lighting and motion speed.
- User requirements: clothes, body markers, glasses, beard, etc.
- Primary features: edges, regions, silhouettes, moments, histograms.
- 2D or 3D representation.
- Time representation: how the temporal aspect is modelled.

10 State of the art (3) Gesture Models
- 3D models: 3D textured volumetric models, 3D geometric models, 3D skeleton models.
- Appearance models: color based models, shape geometry based models, 2D deformable template models, motion based models.

11 State of the art (4) Techniques of gesture recognition
- For postures: linear classifiers (e.g. k-means), non-linear classifiers (e.g. neural networks).
- For gestures: hidden Markov models, dynamic time warping, time-delay neural networks, finite state machines, dynamic Bayesian networks, PNF networks.

12 State of the art (5) About motion-model-based approaches:
- Automata-based recognition is a very complex and difficult process, unreliable in a monocular environment and computationally expensive.
- They integrate the time aspect into the gesture model itself (a single, unique model).
- Techniques dedicated to posture recognition can be reused.
- Early methods: optical flow, motion history (MHI, 3D-MHM).
- [Calderara 2008] proposes a global descriptor: the action signature.
- [Liu 2008] proposes local descriptors: cuboids.

13 Proposed Method Hypotheses
- Monocular environment.
- Dedicated to isolated individuals (for implementation reasons).
- No restrictions on the environment or on the clothes of the targets.
- Distinguishable body parts: targets too far from the camera are not handled.
- Sensor processing algorithms are assumed to be provided: a segmentation algorithm, a people classifier and a people tracker (for implementation reasons).

14 Proposed Method (2) Type of gestures and actions to recognize

15 Proposed Method (3) Method Overview: Sensor Processing → Local Motion Descriptors Extraction → Gesture Classification → Recognized Gesture, with the classifier backed by a Gesture Codebook.

16 Proposed Method (4) Local Motion Descriptors Extraction: starting from the sensor processing output, Corners Extraction → 2D HoG Descriptors Computation → 2D HoG Descriptors Tracker → Local Motion Descriptors.

17 Proposed Method (5) Gesture Codebook Learning: training video sequences go through Sensor Processing and Local Motion Descriptors Extraction; the resulting descriptors, together with the sequence annotations, are clustered into code-words that form the Gesture Codebook.

18 Human Gesture Descriptor Steps for human gesture descriptor generation:
- Corners detection: find interest points where the motion can be easily tracked, and ensure a uniform distribution of features over the body.
- 2D HoG descriptors extraction: for each interest point, compute a 2D HoG descriptor.
- Local motion descriptors computation: track the 2D HoG descriptors to build local motion descriptors.
- Gesture descriptor computation: match the local motion descriptors with the learned code-words.

19 Human Gesture Descriptor (2) Corners detection:
Shi-Tomasi features: given an image $I$ and its gradients $I_x$ and $I_y$ along the x axis and the y axis respectively, the Harris matrix for an image pixel over a window $W$ of size (u,v) is:
$H = \sum_{(x,y) \in W} \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix}$
[Shi 1994] proves that $\min(\lambda_1, \lambda_2)$ is a better measure of corner strength than the measure proposed by the Harris detector, where $\lambda_1$ and $\lambda_2$ are the eigenvalues of the Harris matrix.
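The Shi-Tomasi measure above can be sketched in Python with NumPy (not code from the talk; the window half-width and the brute-force per-pixel loop are illustrative assumptions, kept simple rather than fast):

```python
import numpy as np

def shi_tomasi_response(image, win=1):
    """Min-eigenvalue corner strength at each interior pixel.

    win is the half-width of the summation window (assumed parameter);
    the response at a pixel is min(lambda1, lambda2) of its Harris matrix.
    """
    img = image.astype(float)
    # np.gradient returns derivatives along rows (y) then columns (x).
    Iy, Ix = np.gradient(img)
    H, W = img.shape
    R = np.zeros((H, W))
    for y in range(win, H - win):
        for x in range(win, W - win):
            ix = Ix[y - win:y + win + 1, x - win:x + win + 1]
            iy = Iy[y - win:y + win + 1, x - win:x + win + 1]
            a = (ix * ix).sum()
            b = (ix * iy).sum()
            c = (iy * iy).sum()
            # Smaller eigenvalue of the 2x2 matrix [[a, b], [b, c]].
            tr, det = a + c, a * c - b * b
            disc = max(tr * tr / 4.0 - det, 0.0) ** 0.5
            R[y, x] = tr / 2.0 - disc
    return R
```

On a synthetic bright square, the response peaks at the square's corners and vanishes along straight edges, which is exactly why the minimum eigenvalue is preferred for picking trackable points.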

20 Human Gesture Descriptor (3) Corners detection (cont'd): FAST features (Features from Accelerated Segment Test): a pixel is declared a corner when a contiguous arc of pixels on the circle of 16 pixels surrounding it are all brighter, or all darker, than the centre by a given threshold.

21 Human Gesture Descriptor (4) 2D HoG Descriptor: the descriptor block is a 3x3 grid of cells centred on the corner point, each cell being 5x5 or 7x7 pixels.

22 Human Gesture Descriptor (5) 2D HoG Descriptor (cont'd): For each pixel (x,y) in the descriptor block we compute the gradient magnitude and orientation:
$m(x,y) = \sqrt{I_x(x,y)^2 + I_y(x,y)^2}$ and $\theta(x,y) = \arctan\!\left(\frac{I_y(x,y)}{I_x(x,y)}\right)$
For each cell $c$ in the descriptor block we compute an orientation histogram $h_c = (h_c^1, \dots, h_c^K)$, where K is the number of orientation bins and:
$h_c^k = \sum_{(x,y) \in c,\; \theta(x,y) \in \mathrm{bin}_k} m(x,y)$

23 Human Gesture Descriptor (6) 2D HoG Descriptor (cont'd): The 2D HoG descriptor associated with the descriptor block is the normalised concatenation of the nine cell histograms:
$d = \frac{1}{\eta} (h_1, \dots, h_9)$
where $\eta$ is a normalisation coefficient chosen so that every component lies in [0..1]. The dimension of $d$ is 9 x K.
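The descriptor of the last three slides can be sketched as follows (illustrative Python, not the talk's implementation; the L1 normalisation is an assumption that keeps every component in [0, 1], and the bin/cell parameters follow the slides):

```python
import numpy as np

def hog_descriptor(patch, K=8, cell=5):
    """2D HoG over a 3x3 grid of (cell x cell) cells around a corner.

    patch must be (3*cell) x (3*cell) pixels, centred on the corner point.
    Returns a 9*K vector with components in [0, 1].
    """
    patch = patch.astype(float)
    gy, gx = np.gradient(patch)
    mag = np.hypot(gx, gy)                        # gradient magnitude m(x, y)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)   # orientation in [0, 2*pi)
    bins = np.minimum((ang / (2 * np.pi) * K).astype(int), K - 1)
    desc = np.zeros(9 * K)
    for cy in range(3):
        for cx in range(3):
            m = mag[cy * cell:(cy + 1) * cell, cx * cell:(cx + 1) * cell]
            b = bins[cy * cell:(cy + 1) * cell, cx * cell:(cx + 1) * cell]
            # Magnitude-weighted orientation histogram of this cell.
            hist = np.bincount(b.ravel(), weights=m.ravel(), minlength=K)
            i = (cy * 3 + cx) * K
            desc[i:i + K] = hist
    norm = desc.sum()                             # assumed L1 normalisation
    return desc / norm if norm > 0 else desc
```

With K = 8 the descriptor has 72 components, matching the 9 x K dimension stated above.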

24 Human Gesture Descriptor (7) Local Motion Descriptor: track each 2D HoG descriptor with the least-squares method using a Kalman filter:
1. Initialization (t=0, first frame): compute new 2D HoG descriptors; for each of them, store its position x_0 and initialize the error tolerance P_0 (a 2x2 covariance matrix).
2. Prediction (t>0): for each 2D HoG descriptor in the previous frame, use the Kalman filter to predict the descriptor's position, which is taken as the search centre.

25 Human Gesture Descriptor (8) Local Motion Descriptor (cont'd):
3. Correction (t>0): locate the 2D HoG descriptor in the current frame (in the neighbourhood of the predicted position) via its measured position (obtained by minimizing the squared error), and use the Kalman filter to correct the prediction, yielding the final position estimate.
Steps 2 and 3 are repeated as long as the tracking runs. For a 2D HoG descriptor tracked successfully during a temporal window, the local motion descriptor is the concatenation of all the descriptor values over this temporal window.

26 Human Gesture Descriptor (9) Local Motion Descriptor (cont'd): the Kalman filter cycle:
Time update (predict):
(1) project the position ahead: $\hat{x}_t^- = A \hat{x}_{t-1}$
(2) project the error covariance: $P_t^- = A P_{t-1} A^T + Q$
Measurement update (correct):
(1) compute the Kalman gain: $K_t = P_t^- H^T (H P_t^- H^T + R)^{-1}$
(2) update the estimate with measurement $z_t$: $\hat{x}_t = \hat{x}_t^- + K_t (z_t - H \hat{x}_t^-)$
(3) update the error covariance: $P_t = (I - K_t H) P_t^-$

27 Human Gesture Learning and Classification Gesture Learning: training video sequences go through Sensor Processing and Local Motion Descriptors Extraction; the resulting descriptors, together with the sequence annotations, are grouped by k-means clustering into code-words that form the Gesture Codebook.

28 Human Gesture Learning and Classification (2) Gesture Learning (cont'd):
k-means: classify the generated local motion descriptors (for all gestures) into k clusters. Let n be the number of generated local descriptors and m the number of gestures in the training set; k is derived from m through a parameter T (a strictly positive integer) which can be fixed empirically or learned with an Expectation-Maximization (EM) algorithm. k-means minimizes the total intra-cluster variance (the squared error function):
$V = \sum_{j=1}^{k} \sum_{x_i \in S_j} \lVert x_i - \mu_j \rVert^2$
where $\mu_j$ is the centroid of cluster $S_j$.
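The codebook clustering can be sketched with plain Lloyd-style k-means (illustrative Python, not the talk's code; the `init` argument, iteration cap and seed are assumptions added so the sketch is deterministic):

```python
import numpy as np

def kmeans(X, k, iters=50, init=None, seed=0):
    """Cluster the n local motion descriptors (rows of X) into k code-words.

    Returns (centers, labels); centers are the code-words, labels the
    cluster index of each descriptor. Lloyd iterations minimise the
    total intra-cluster variance.
    """
    rng = np.random.default_rng(seed)
    if init is not None:
        centers = np.array(init, dtype=float)
    else:
        centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assignment step: nearest centre under squared Euclidean distance.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # Update step: each centre moves to the mean of its cluster.
        new = np.array([X[labels == j].mean(0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels
```

On two well-separated blobs of descriptors, the two recovered centres land on the blob means and the labels split the blobs cleanly.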

29 Human Gesture Learning and Classification (3) Gesture Classification: the k-nearest-neighbours algorithm. Given a gesture codebook database {(code-word, gesture)} and an input set of code-words:
- For each code-word in the input, select its k nearest code-words in the database using the Euclidean distance.
- Each corresponding gesture receives one vote.
- Select the gesture that wins the vote.
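The voting rule above can be sketched as follows (illustrative Python; the function and argument names are mine, not from the talk):

```python
import numpy as np

def classify_gesture(codebook, gestures, query_words, k=3):
    """k-nearest-neighbour voting over the gesture codebook.

    codebook    -- array of stored code-words, one per row
    gestures    -- gesture label associated with each code-word
    query_words -- code-words extracted from the input sequence
    """
    votes = {}
    for w in query_words:
        # Euclidean distance from this input code-word to every stored one.
        d = np.linalg.norm(codebook - w, axis=1)
        for idx in np.argsort(d)[:k]:       # its k nearest code-words
            g = gestures[idx]
            votes[g] = votes.get(g, 0) + 1  # one vote per neighbour
    return max(votes, key=votes.get)        # the gesture that wins the vote
```

Every input code-word contributes k votes, so gestures supported by many local motion descriptors dominate the final decision.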

30 Preliminary Results Current progress: evaluation of local motion descriptor generation on training gestures from the KTH and IXMAS databases (e.g. walking, boxing).

31 Conclusion Contributions: local motion descriptors for gesture representation; tracking of a local texture-based descriptor. Future work: add likelihood information to the gesture learning process by using a Maximization of Mutual Information algorithm; evaluate an SVM classifier and compare its results with the k-nearest-neighbours algorithm.

32 Thank you for your attention !