Published by Anna Smith. Modified over 9 years ago.
1
Activity Analysis of Sign Language Video
Generals exam
Neva Cherniavsky
2
Challenges:
–Limited network bandwidth
–Limited processing power on cell phones
MobileASL goal: ASL communication using video cell phones over current U.S. cell phone network
8
Activity Analysis and MobileASL
Use qualities unique to sign language
–Signing/Not signing/Finger spelling
–Information at beginning and ending of signs
Decrease cost of sending video
–Maximum bandwidth
–Total data sent and received
–Power consumption
–Processing cost
9
One Approach: Variable Frame Rate
10
Variable Frame Rate
Decrease frame rate during “listening”
Goal: reduce cost while maintaining or increasing intelligibility
–Maximum bandwidth?
–Total data sent and received?
–Power consumption?
–Processing cost?
YES NO YES
11
Demo
12
The story so far...
Showed variable frame rate can reduce cost (25% savings in bit rate)
Conducted user studies to determine intelligibility of variable frame rate videos
–Quality of each frame held constant (data transmitted decreased with decreased frame rate)
–Lowering frame rate did not affect intelligibility
–Freeze frame thought unnatural
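A minimal sketch of the variable frame rate idea, assuming each frame already carries a "signing" or "listening" label. The function name `select_frames` and the parameter `listening_keep_every` are hypothetical and do not come from the talk. Because the quality of each frame is held constant, the fraction of frames dropped is only a rough proxy for the data saved.

```python
def select_frames(labels, listening_keep_every=10):
    """Return the indices of the frames to transmit.

    labels: one string per frame, "signing" or "listening".
    listening_keep_every: hypothetical knob; while listening, send
    only one frame out of this many.
    """
    kept = []
    listening_run = 0  # consecutive listening frames seen so far
    for index, label in enumerate(labels):
        if label == "signing":
            kept.append(index)  # full frame rate while signing
            listening_run = 0
        else:
            if listening_run % listening_keep_every == 0:
                kept.append(index)  # sparse frames while listening
            listening_run += 1
    return kept


labels = ["signing"] * 20 + ["listening"] * 20
kept = select_frames(labels, listening_keep_every=10)
fraction_dropped = 1 - len(kept) / len(labels)
print(kept, fraction_dropped)
```

In this toy input all 20 signing frames are kept and only 2 of the 20 listening frames are sent.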
13
Outline
1. Introduction
2. Completed Activity Analysis Research
a. Feature extraction
b. Classification
3. Proposed Activity Analysis Research
4. Timeline to complete dissertation
14
Activity Analysis, big picture
Raw Data -> Feature Extraction -> Classification Engine -> Classification -> Modification
15
Activity Analysis, thus far
Feature Extraction -> Classification -> Signing, Listening
16
Features
H.264 information:
–Type of macroblock
–Motion vectors
17
Features cont.
Features:
–(x,y) motion vector face
–(x,y) motion vector left
–(x,y) motion vector right
–# of I blocks
18
Classification
Train via labeled examples
Training can be performed offline, testing must be real-time
Support vector machines
Hidden Markov models
19
Support vector machines
More accurately called support vector classifier
Separates training data into two classes so that they are maximally apart
20
Maximum margin hyperplane
Figure labels: Small Margin, Large Margin, Support vectors
21
What if it’s non-linear?
22
Implementation notes
May not be separable
–Use linear separation, but allow training errors
–Higher cost for errors = more accurate model on the training data, but it may not generalize
libsvm, publicly available library with a Matlab interface
–Exhaustive search on training data to choose best parameters
–Radial basis kernel function
As originally published, no temporal information
–Use “sliding window”, keep track of classification
–Majority vote gives result
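The sliding-window majority vote can be sketched as follows. This is an illustration under assumptions, not the code used in the talk: the window is taken over the current and previous frames only, ties keep the previous decision, and the name `smooth_labels` is hypothetical. The per-frame labels would come from the SVM.

```python
from collections import Counter, deque


def smooth_labels(frame_labels, window=3):
    """Replace each per-frame label by the majority label in a sliding window."""
    recent = deque(maxlen=window)  # the last `window` raw classifications
    smoothed = []
    for label in frame_labels:
        recent.append(label)
        (top, top_count), *rest = Counter(recent).most_common(2)
        # Assumption: on a tie, keep the previous smoothed decision.
        if rest and rest[0][1] == top_count and smoothed:
            top = smoothed[-1]
        smoothed.append(top)
    return smoothed


raw = list("SSLSSLLL")  # S = signing, L = listening
smoothed = smooth_labels(raw, window=3)
print("".join(smoothed))
```

The isolated "L" at the third frame is voted away, while the sustained run of "L" at the end is kept, one frame late.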
25
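SVM Classification Accuracy
Test video | SVM | SVM 3 frame | SVM 4 frame | SVM 5 frame
gina1 | 87.8% | 88.8% | 87.9% | 88.7%
gina2 | 85.2% | 87.4% | 90.3% | 88.3%
gina3 | 90.6% | 91.3% | 91.1% | 91.3%
gina4 | 86.6% | 87.1% | 87.6% (only three values survive in the source for this row)
Average | 87.6% | 88.7% | 89.2% | 89.0%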
26
Hidden Markov models
Markov model: finite state model, obeys Markov property
$\Pr[X_n = x \mid X_{n-1} = x_{n-1}, X_{n-2} = x_{n-2}, \ldots, X_1 = x_1] = \Pr[X_n = x \mid X_{n-1} = x_{n-1}]$
Current state depends only on previous state
Hidden Markov model: states are hidden, infer through observations
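To make the Markov property concrete, the probability of a whole state sequence factors into an initial probability times one transition probability per step. The sketch below uses two made-up states and made-up probabilities; none of the numbers come from the talk.

```python
def sequence_probability(states, initial, transition):
    """Probability of a state sequence under a first-order Markov chain."""
    probability = initial[states[0]]
    for previous, current in zip(states, states[1:]):
        # Markov property: only the previous state matters.
        probability *= transition[previous][current]
    return probability


# Hypothetical numbers, for illustration only.
initial = {"signing": 0.5, "listening": 0.5}
transition = {
    "signing": {"signing": 0.9, "listening": 0.1},
    "listening": {"signing": 0.2, "listening": 0.8},
}
p = sequence_probability(["signing", "signing", "listening"], initial, transition)
print(p)  # 0.5 * 0.9 * 0.1
```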
27
[Figure: state diagram labeled with probabilities]
28
Different models
[Figure: state diagrams for different models, labeled with probabilities]
29
Two ways to solve recognition
1. Given observation sequence O and a choice of models, maximize $\Pr(O \mid \lambda)$. Speech recognition: which word produced observation?
2. Given observation sequence and model, find the most likely state sequence. Has been used for continuous sign recognition.
31
Two ways to solve recognition
1. Given observation sequence O and model, what is $\Pr(O \mid \lambda)$? Speech recognition: which word produced observation?
2. Given observation sequence and model, find the most likely state sequence. Has been used for continuous sign recognition [Starner95].
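The first approach can be sketched with the forward algorithm, which computes Pr(O | λ) for each candidate model, after which the most likely model is chosen. The two models below use invented parameters and a hypothetical binary observation (0 = low motion, 1 = high motion); they are not the models trained in this work.

```python
def forward_likelihood(obs, initial, transition, emission):
    """Pr(obs | model) for a discrete HMM, via the forward algorithm."""
    n = len(initial)
    alpha = [initial[s] * emission[s][obs[0]] for s in range(n)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[r] * transition[r][s] for r in range(n)) * emission[s][symbol]
            for s in range(n)
        ]
    return sum(alpha)


def classify(obs, models):
    """Pick the model name that gives the observations the highest likelihood."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))


# Hypothetical models: (initial, transition, emission[state][symbol]).
models = {
    "signing": ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.2, 0.8], [0.3, 0.7]]),
    "listening": ([0.5, 0.5], [[0.8, 0.2], [0.3, 0.7]], [[0.9, 0.1], [0.7, 0.3]]),
}
best = classify([1, 1, 0, 1], models)
print(best)
```

A mostly high-motion sequence is assigned to the "signing" model here because that model emits symbol 1 with much higher probability.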
32
Implementation notes
Use htk, publicly available library written in C
Model signing/not signing as “words”
–Other possibility is to trace state sequence
–Each is a 3 state model, no backward transitions
Must include some temporal info, else degenerate (biased coin flip)
Use 3, 4, and 5 frame window
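The alternative of tracing the state sequence is usually done with the Viterbi algorithm. The sketch below runs it on a hypothetical 3 state model with no backward transitions (all transition entries below the diagonal are zero). It is written from scratch rather than with htk, and every probability in it is made up.

```python
def viterbi(obs, initial, transition, emission):
    """Most likely state sequence for a discrete HMM."""
    n = len(initial)
    score = [initial[s] * emission[s][obs[0]] for s in range(n)]
    backpointers = []
    for symbol in obs[1:]:
        previous = score
        pointers, score = [], []
        for s in range(n):
            # Best predecessor of state s at this time step.
            best_r = max(range(n), key=lambda r: previous[r] * transition[r][s])
            pointers.append(best_r)
            score.append(previous[best_r] * transition[best_r][s] * emission[s][symbol])
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical left-to-right model: a state can only stay or move forward.
initial = [1.0, 0.0, 0.0]
transition = [[0.6, 0.4, 0.0], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]]
emission = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
path = viterbi([0, 0, 1, 1, 2], initial, transition, emission)
print(path)
```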
34
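HMM Classification Accuracy
Test video | HMM 3 frame | HMM 4 frame | HMM 5 frame | Best SVM
gina1 | 87.3% | 88.4% | 88.8% (only three values survive in the source for this row)
gina2 | 85.4% | 86.0% | 86.8% | 90.3%
gina3 | 87.3% | 88.6% | 89.2% | 91.3%
gina4 | 82.6% | 82.5% | 81.4% | 87.6%
Average | 85.7% | 86.4% | 86.5% | 89.2%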
35
Outline
1. Motivation
2. Completed Activity Analysis Research
3. Proposed Activity Analysis Research
a. Recognize finger spelling
b. Recognize movement epenthesis
4. Timeline to complete dissertation
36
Activity Analysis, thus far
Feature Extraction -> Classification -> Signing, Listening
37
Activity Analysis, proposed
Feature Extraction -> Classification -> Signing, Listening, Finger spelling, Movement epenthesis
38
Proposed Research
Recognize new activity
–Finger spelling
–Movement epenthesis (= sign segmentation)
Questions
–Why is this valuable?
–Is it feasible?
–How will it be solved?
39
Why? Finger spelling
We believe that an increased frame rate will increase intelligibility
Will confirm optimal frame rate through user studies
40
Why? Movement epenthesis
Choose frames so that a low frame rate is more intelligible
Potentially first step in continuous sign language recognition engine
Irritation must not outweigh savings; verify through user studies
41
Is it feasible?
Previous (somewhat successful) work:
–Direct measure device
–Rules-based
  Change in motion trajectory, low motion [Sagawa00]
  Finger flexion [Liang98]
Previous very successful work (98.8%)
–Neural Network + direct measure device
–Frame classified as left boundary, right boundary, or interior [Fang01]
42
Is it feasible?
Previous (somewhat successful) work:
–Direct measure device
–Rules-based
  Change in motion trajectory, low motion [Sagawa00]
  Finger flexion [Liang98]
Previous very successful work (98.8%)
–Neural Network + direct measure device
–Frame classified as beginning of sign, end of sign, or interior [Fang01]
43
How?
Improved feature extraction
–Use the part of sign to inform extraction
–See what works from the sign recognition literature
Improved classification
44
Parts of sign
Handshape
–Most work in sign language recognition focused here
–Includes expensive techniques (time, power)
Movement
–We only use this right now!
–Often implicitly recognized in machine learning
Location
Palm orientation
Nonmanual signals (facial expression)
47
Add center of gravity to features
48
Parts of sign recognized by center of gravity
Handshape
Movement
Location
Palm orientation
Nonmanual signals (facial expression)
49
Accurate COG
Bayesian filters
–Very similar to hidden Markov models
–What state are we in, given the (noisy) observations?
–Find posterior pdf of state
–Kalman filter, particle filter
Viola and Jones [01] object detection
50
Bayesian filters
Update
–Kalman: assume linear system, minimize MSE; measure
–Particle: sum of weighted samples; measure, update weights
Predict
–Kalman: add in noise, guess state
–Particle: add in noise, guess particle location
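The predict/update cycle can be illustrated with a one-dimensional Kalman filter that smooths a single noisy coordinate, such as the x position of a center of gravity. This is a sketch under strong assumptions: a random-walk motion model, scalar state, and invented noise variances. The name `kalman_1d` and all parameters are hypothetical.

```python
def kalman_1d(measurements, process_var, measurement_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a random-walk (constant position) model."""
    x, p = x0, p0  # state estimate and its variance
    estimates = []
    for z in measurements:
        # Predict: the state is guessed to stay put, uncertainty grows by the process noise.
        p = p + process_var
        # Update: blend the prediction and the measurement by the Kalman gain.
        gain = p / (p + measurement_var)
        x = x + gain * (z - x)
        p = (1 - gain) * p
        estimates.append(x)
    return estimates


noisy_x = [10.2, 9.7, 10.4, 9.9, 10.1]  # made-up measurements
print(kalman_1d(noisy_x, process_var=0.01, measurement_var=0.25, x0=10.0))
```

A particle filter follows the same cycle but represents the posterior with weighted samples instead of a mean and a variance.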
51
How?
Improved feature extraction
Improved machine learning
–3 class SVM for finger spelling
–State sequence HMM
–AdaBoost [Freund97]
52
AdaBoost (adaptive boosting)
53
AdaBoost Algorithm
In each round t = 1 to T:
–Train a “weak learner” on weighted data
–$h_t$: features → {signing, listening}; the error $\epsilon_t$ is the sum of weights of misclassified examples
–$\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$
–Reweight based on error, normalize weights
Answer is $\mathrm{sign}\left(\sum_t \alpha_t h_t\right)$
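The rounds above can be sketched with decision stumps as the weak learners. The example assumes a single made-up numeric feature and encodes signing as +1 and listening as -1; the stump learner, the data, and the function names are all illustrative and not taken from the talk.

```python
import math


def stump_predict(x, threshold, polarity):
    """Weak learner: predict `polarity` at or above the threshold, else the opposite."""
    return polarity if x >= threshold else -polarity


def train_stump(xs, ys, weights):
    """Return (weighted error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(
                w for w, x, y in zip(weights, xs, ys)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    weights = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = train_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Reweight: misclassified examples gain weight, then normalize.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


xs = [0, 1, 2, 3, 4, 5]          # hypothetical 1-D feature
ys = [1, 1, -1, -1, 1, 1]        # +1 = signing, -1 = listening
ensemble = adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])
```

No single stump can separate this toy data, but the weighted vote of three stumps does.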
54
Outline
1. Motivation
2. Completed Research
3. Proposed Research
4. Timeline to complete dissertation
55
Timeline
October 2007 - March 2008: Recognize signing/listening/finger spelling
Deadline: Automatic Face and Gesture Recognition, March 28, 2008
1. Bayesian filters for better features.
2. Viola and Jones’s object detection.
3. Improve hidden Markov model.
4. Evaluate three class support vector machine.
5. Implement AdaBoost, cascade.
6. Experiment with combining these techniques.
56
Timeline, cont.
April 2008 - May 2008: Run user study to evaluate optimal frame rate for finger spelling.
Deadline: ASSETS 2008, May 25, 2008
June 2008 - December 2008: Apply techniques to the problem of sign segmentation.
1. Evaluate feature set and improve.
2. Conduct a user study to evaluate intelligibility of dropping frames during movement epenthesis.
3. Improve machine learning techniques; implement combination via decision trees.
Early 2009: Complete dissertation and defend.
57
Questions?