Morphological Segmentation of Natural Gesture

Jacob Eisenstein, MAS 622 Final Project
[Figure: gesture movement phases labeled Stroke, Retract, Prepare, Hold]

Natural Gesture
Gesture supplements verbal communication: turn boundaries, reference resolution, visual imagery.
What are the lowest-level gesture units?
McNeill: “movement phases” (Stroke, Prepare, Hold, Retract).

Videos of people explaining things to each other
[Figure: movement phases (Prepare, Stroke, Hold) along a time axis]

Outline
Hand Tracking: “Guided” clustering, Kalman Filter
Gesture Recognition: Durational HMMs, Recurrent Neural Networks

Hand Tracking
Seems easy, but there are occlusions and shadows, and hands are not in every frame.
85% accuracy with color information alone.
How to do better?

Better Hand Tracking
Other features: position, edges.
But how to use these features?
Supervised training: P = set of positive examples, N = set of negative examples.
[Figure: example sets P and N]

“Guided” Training
Labeling is very expensive.
Approximate P and N by P’ and N’.
Initialize clusters at the centers of P’ and N’.
K-means cluster using all points.
[Figure: sets P and N with their approximations P’ and N’]
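The guided clustering steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes feature vectors are plain tuples of floats, that Euclidean distance is used, and that exactly two clusters (hand and non-hand) are wanted. The names `guided_kmeans` and `mean_point` are hypothetical.

```python
import math

def mean_point(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def guided_kmeans(points, approx_pos, approx_neg, n_iter=50):
    """Two-cluster k-means seeded at the centers of P' and N'.

    points     -- all feature vectors (labeled or not)
    approx_pos -- cheap approximate positive set P' (assumed input)
    approx_neg -- cheap approximate negative set N' (assumed input)
    Returns (labels, centers); label 0 is the cluster seeded from P'.
    """
    centers = [mean_point(approx_pos), mean_point(approx_neg)]
    labels = []
    for _ in range(n_iter):
        # Assign every point to the nearer of the two centers.
        labels = [
            0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
            for p in points
        ]
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = []
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            new_centers.append(mean_point(members) if members else centers[k])
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return labels, centers
```

Because the seeds come from P’ and N’, the cluster identities stay meaningful after convergence: cluster 0 can be read as the estimate of P and cluster 1 as the estimate of N.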

Hand Tracking Results
Error Rate = (FP + FN + 2*WrongPos) / ALL
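As a small worked example of the error-rate formula, assuming FP, FN and WrongPos are per-frame counts and ALL is the total number of frames evaluated (the slide does not spell this out), the computation might look like this. The function name is hypothetical.

```python
def tracking_error_rate(false_pos, false_neg, wrong_pos, total):
    """Error rate = (FP + FN + 2*WrongPos) / ALL, as on the slide.

    A wrong-position frame counts double. Assumption: all arguments
    are frame counts and `total` is the number of frames evaluated.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return (false_pos + false_neg + 2 * wrong_pos) / total
```

For instance, 3 false positives, 2 false negatives and 5 wrong-position frames out of 100 give (3 + 2 + 10) / 100 = 0.15.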

Kalman Filtering
State:
X(t) = X(t-1) + V(t-1)
V(t) = V(t-1) + W(t)
Observation:
Y(t) = X(t) + R(t)
Initialization:
Cov(W) = [.1 0; 0 .1]
Cov(R) = [1 0; 0 1]
Parameters re-estimated using EM.
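A minimal sketch of a filter for this constant-velocity model is given below. It makes several assumptions that the slide does not state: the diagonal covariances are read as meaning the two image axes are independent, so one axis is filtered at a time with process variance 0.1 and observation variance 1; the initial velocity is 0 and the initial state covariance is the identity; and the EM re-estimation of the parameters is left out. The function name `kalman_track_1d` is hypothetical.

```python
def kalman_track_1d(observations, q=0.1, r=1.0):
    """Filter one coordinate of the hand position.

    Model (per axis):  X(t) = X(t-1) + V(t-1)
                       V(t) = V(t-1) + W(t),  Var(W) = q
                       Y(t) = X(t) + R(t),    Var(R) = r
    Returns (positions, velocities), one estimate per observation.
    """
    x, v = observations[0], 0.0          # assumed initial state
    p00, p01, p11 = 1.0, 0.0, 1.0        # assumed initial covariance (symmetric 2x2)
    positions, velocities = [], []
    for y in observations:
        # Predict: propagate state and covariance through F = [[1, 1], [0, 1]].
        x_pred = x + v
        pp00 = p00 + 2.0 * p01 + p11
        pp01 = p01 + p11
        pp11 = p11 + q                   # process noise enters the velocity only
        # Update: only the position is observed (H = [1, 0]).
        innovation = y - x_pred
        s = pp00 + r                     # innovation variance
        k0, k1 = pp00 / s, pp01 / s      # Kalman gains for position, velocity
        x = x_pred + k0 * innovation
        v = v + k1 * innovation
        p00 = (1.0 - k0) * pp00
        p01 = (1.0 - k0) * pp01
        p11 = pp11 - k1 * pp01
        positions.append(x)
        velocities.append(v)
    return positions, velocities
```

Running it once on the x coordinates and once on the y coordinates of the tracked hand yields smoothed positions together with the velocity estimates.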

Kalman Filter Results
Reduces position accuracy.
Smooths velocity.
Improves overall performance by ~5%.

Movement Phase Recognition
Two sources of information:
Observable features: velocity, position
Temporal / sequential structure
Ideal for an HMM?

HMM Setup
We have data with states labeled.
Learn state transitions and outputs directly from data; no need for Baum-Welch estimation.
Find the best path using Viterbi.
Can use any probabilistic classifier for the output probabilities.
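The setup above can be sketched as follows: transition probabilities are estimated by counting over the labeled state sequences, and Viterbi decoding takes per-frame log output probabilities from whatever classifier is plugged in. This is an illustrative sketch, not the project's code. The add-one smoothing, the integer state encoding and the function names are assumptions.

```python
import math

def estimate_transitions(label_sequences, n_states, smoothing=1.0):
    """Count state-to-state transitions in labeled data and normalize each row.

    `smoothing` (an assumption, not from the slides) keeps every
    probability above zero so that its log is defined.
    """
    counts = [[smoothing] * n_states for _ in range(n_states)]
    for seq in label_sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    return [[c / sum(row) for c in row] for row in counts]

def viterbi(log_emissions, trans, initial):
    """Most likely state path.

    log_emissions[t][s] -- log P(observation at frame t | state s),
                           supplied by any probabilistic classifier
    trans[p][s]         -- transition probability from state p to state s
    initial[s]          -- initial state probability (must be > 0)
    """
    n_states = len(initial)
    score = [math.log(initial[s]) + log_emissions[0][s] for s in range(n_states)]
    back = []
    for t in range(1, len(log_emissions)):
        new_score, pointers = [], []
        for s in range(n_states):
            # Best predecessor for state s at frame t.
            best = max(range(n_states),
                       key=lambda p: score[p] + math.log(trans[p][s]))
            pointers.append(best)
            new_score.append(score[best] + math.log(trans[best][s])
                             + log_emissions[t][s])
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

Since the states are labeled, no iterative re-estimation is involved: a single counting pass gives the transition matrix, and the output model can be trained separately as an ordinary classifier.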

Initial Results
Accuracy = percent classified correctly, on a 5-class problem (including “no gesture”).
1-component mixture: 34.6%
3-component mixture: 33.3%
7-component mixture: 32.6%
Not very good!

Durational HMMs
HMMs assume an exponential decay model for state duration.
What about other models of state duration?
Rabiner explains parameter estimation for durational HMMs, but not Viterbi.

Viterbi for Gaussian Durational HMMs
[Figure: duration distributions Pi(d) and Pj(d)]
Leaving a state obeys a probability density function: P(d = t) = N(t, u, s)
Each self-transition obeys a cumulative probability function: P(d > t) = 1 - C(t, u, s)
Normalize for the cost you’ve already paid:
P(d = t | d > t-1) = N(t, u, s) / (1 - C(t-1, u, s))
P(d > t | d > t-1) = (1 - C(t, u, s)) / (1 - C(t-1, u, s))
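The two normalized quantities can be computed as in the sketch below, assuming u and s denote the mean and standard deviation of a state's duration, N is the Gaussian density and C its cumulative distribution. The function names and the small `floor` guard against division by zero are additions for illustration.

```python
import math

def normal_pdf(t, mean, std):
    """Gaussian density N(t, u, s)."""
    z = (t - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def normal_cdf(t, mean, std):
    """Gaussian cumulative distribution C(t, u, s)."""
    return 0.5 * (1.0 + math.erf((t - mean) / (std * math.sqrt(2.0))))

def duration_probs(t, mean, std, floor=1e-12):
    """Return (leave, stay) after t frames in a state.

    leave = N(t, u, s) / (1 - C(t-1, u, s))        -- P(d = t | d > t-1)
    stay  = (1 - C(t, u, s)) / (1 - C(t-1, u, s))  -- P(d > t | d > t-1)
    `floor` (an assumption) avoids dividing by zero far past the mean.
    """
    survived = max(1.0 - normal_cdf(t - 1, mean, std), floor)
    leave = normal_pdf(t, mean, std) / survived
    stay = (1.0 - normal_cdf(t, mean, std)) / survived
    return leave, stay
```

In a Viterbi pass these would replace the fixed self-transition and exit probabilities, with t being the number of frames the path has already spent in the current state. Because N is a density rather than a probability mass, `leave` and `stay` do not sum exactly to 1; a discrete variant could use C(t, u, s) - C(t-1, u, s) in place of N(t, u, s).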

Results for Durational Viterbi
Accuracy (%) by number of mixture components:
Components   Standard   Durational
1            34.6       35.5
3            33.3       36.7
7            31.6       38.0
Best durational is 3.4 percentage points better than best baseline.

Neural Networks
Feedforward network (13 x 50 x 5): 44.5%
Ignoring sequence and temporal information!
Maybe recurrent NNs can do even better?

Future Work
Hand Tracking: cluster to mixtures of Gaussians instead of single Gaussians.
Kalman Filtering: noise is not Gaussian. Particle filter?
Gesture Phase Recognition: recurrent neural networks, other discriminative methods.
