Activity Analysis of Sign Language Video
Generals exam
Neva Cherniavsky


Challenges:
  – Limited network bandwidth
  – Limited processing power on cell phones
FAQ
MobileASL goal: ASL communication using video cell phones over the current U.S. cell phone network

Activity Analysis and MobileASL
Use qualities unique to sign language
  – Signing/Not signing/Finger spelling
  – Information at beginning and ending of signs
Decrease cost of sending video
  – Maximum bandwidth
  – Total data sent and received
  – Power consumption
  – Processing cost

One Approach: Variable Frame Rate

Variable Frame Rate
Decrease frame rate during “listening”
Goal: reduce cost while maintaining or increasing intelligibility
  – Maximum bandwidth?
  – Total data sent and received?
  – Power consumption?
  – Processing cost?
YES NO YES
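The idea of lowering the frame rate during “listening” can be sketched as follows. This is a minimal illustration, not the MobileASL implementation: the function name `frames_to_send`, the label strings, and the `listening_keep_every` parameter are all hypothetical, and the rule "send every frame while signing, every k-th frame while listening" is an assumption.

```python
def frames_to_send(labels, listening_keep_every=10):
    """Return the indices of frames to transmit.

    labels: per-frame activity labels, "signing" or "listening" (hypothetical).
    Every signing frame is sent; during listening only one frame out of
    `listening_keep_every` is sent (assumed rule, for illustration only).
    """
    sent = []
    since_last = listening_keep_every  # so a clip that starts in listening sends its first frame
    for index, label in enumerate(labels):
        if label == "signing":
            sent.append(index)
            since_last = 0
        else:
            since_last += 1
            if since_last >= listening_keep_every:
                sent.append(index)
                since_last = 0
    return sent


labels = ["signing"] * 3 + ["listening"] * 6
kept = frames_to_send(labels, listening_keep_every=3)
print(kept)                      # indices of transmitted frames
print(len(kept) / len(labels))   # fraction of frames actually sent
```

If each frame is encoded at constant quality, the fraction of frames sent gives a rough proxy for total data transmitted.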

Demo

The story so far...
Showed variable frame rate can reduce cost (25% savings in bit rate)
Conducted user studies to determine intelligibility of variable frame rate videos
  – Quality of each frame held constant (data transmitted decreased with decreased frame rate)
  – Lowering frame rate did not affect intelligibility
  – Freeze frame thought unnatural

Outline
1. Introduction
2. Completed Activity Analysis Research
   a. Feature extraction
   b. Classification
3. Proposed Activity Analysis Research
4. Timeline to complete dissertation

Activity Analysis, big picture
Raw Data -> Feature Extraction -> Classification Engine -> Classification -> Modification

Activity Analysis, thus far
Feature Extraction -> Classification -> Signing, Listening

Features
H.264 information:
  – Type of macroblock
  – Motion vectors

Features cont.
Features:
  – (x,y) motion vector face
  – (x,y) motion vector left
  – (x,y) motion vector right
  – # of I blocks
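One way to assemble the seven-element feature vector listed above is sketched here. The `Macroblock` record, the region labels, and the choice to average motion vectors within each region are assumptions made for illustration; the slides do not say how the per-region vector is computed.

```python
from dataclasses import dataclass


@dataclass
class Macroblock:
    region: str      # "face", "left", "right", or "other" (hypothetical labels)
    mv_x: float      # motion vector x component
    mv_y: float      # motion vector y component
    is_intra: bool   # True for an I block (no motion vector used)


def frame_features(blocks):
    """Return [face_x, face_y, left_x, left_y, right_x, right_y, n_I_blocks]."""
    features = []
    for region in ("face", "left", "right"):
        vectors = [(b.mv_x, b.mv_y) for b in blocks
                   if b.region == region and not b.is_intra]
        count = len(vectors)
        # Assumed summary: mean motion vector of the region, zero if none.
        mean_x = sum(x for x, _ in vectors) / count if count else 0.0
        mean_y = sum(y for _, y in vectors) / count if count else 0.0
        features.extend([mean_x, mean_y])
    features.append(float(sum(1 for b in blocks if b.is_intra)))
    return features


blocks = [
    Macroblock("face", 2.0, 0.0, False),
    Macroblock("face", 4.0, 2.0, False),
    Macroblock("left", -1.0, 1.0, False),
    Macroblock("right", 0.0, 0.0, True),
]
print(frame_features(blocks))
```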

Classification
Train via labeled examples
Training can be performed offline, testing must be real-time
  – Support vector machines
  – Hidden Markov models

Support vector machines
More accurately called support vector classifier
Separates training data into two classes so that they are maximally apart

Maximum margin hyperplane
[Figure: small margin vs. large margin; support vectors]
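The maximum-margin idea is usually written as the following optimization problem. This is the standard textbook formulation for linearly separable data, not something taken from the slides; the symbols $w$, $b$, $x_i$, and $y_i \in \{-1, +1\}$ are assumed notation.

```latex
\min_{w,\,b} \ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,(w \cdot x_i + b) \ge 1, \qquad i = 1, \dots, n
```

Under this formulation the margin width is $2 / \lVert w \rVert$, and the support vectors are the training points for which the constraint holds with equality.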

What if it’s non-linear?

Implementation notes
May not be separable
  – Use linear separation, but allow training errors
  – Higher cost for errors = more accurate model on the training data, may not generalize
libsvm, publicly available Matlab library
  – Exhaustive search on training data to choose best parameters
  – Radial basis kernel function
As originally published, no temporal information
  – Use “sliding window”, keep track of classification
  – Majority vote gives result
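The sliding-window majority vote can be sketched as below. This is a minimal illustration under stated assumptions: the per-frame labels are taken as already produced by the classifier, the function name `smooth_labels` is hypothetical, and the window covers the current frame plus the frames before it (the slides do not say whether the window is causal or centered).

```python
from collections import Counter, deque


def smooth_labels(labels, window=3):
    """Replace each per-frame label with the majority label of the last `window` frames."""
    recent = deque(maxlen=window)   # keeps only the most recent classifications
    smoothed = []
    for label in labels:
        recent.append(label)
        # Majority vote over the window. Ties can only occur while the window
        # is still filling; they go to the label seen first (an arbitrary choice).
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed


raw = ["signing", "signing", "listening", "signing",
       "signing", "listening", "listening", "listening"]
print(smooth_labels(raw, window=3))
```

In this example the isolated "listening" at the third frame is voted away, and the switch to "listening" at the end is reported one frame late, which is the usual trade-off of this kind of smoothing.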

SVM Classification Accuracy
Test video   SVM     SVM 3 frame   SVM 4 frame   SVM 5 frame
gina1        87.8%   88.8%         87.9%         88.7%
gina2        85.2%   87.4%         90.3%         88.3%
gina3        90.6%   91.3%         91.1%         91.3%
gina4        86.6%   87.1%         87.6%         87.6%
Average      87.6%   88.7%         89.2%         89.0%

Hidden Markov models
Markov model: finite state model, obeys Markov property
$\Pr[X_n = x \mid X_{n-1} = x_{n-1}, X_{n-2} = x_{n-2}, \ldots, X_1 = x_1] = \Pr[X_n = x \mid X_{n-1} = x_{n-1}]$
  – Current state depends only on previous state
Hidden Markov model: states are hidden, infer through observations
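A consequence of the Markov property is that the probability of a whole state sequence factors into an initial probability times one transition probability per step. The sketch below shows this for a two-state chain; the state names and all probability values are made up for illustration.

```python
def sequence_probability(states, initial, transition):
    """Probability of a state sequence under a first-order Markov chain."""
    probability = initial[states[0]]
    # Each step depends only on the previous state (the Markov property).
    for previous, current in zip(states, states[1:]):
        probability *= transition[previous][current]
    return probability


# Hypothetical numbers, not estimated from any data.
initial = {"signing": 0.5, "listening": 0.5}
transition = {
    "signing": {"signing": 0.9, "listening": 0.1},
    "listening": {"signing": 0.2, "listening": 0.8},
}
print(sequence_probability(["signing", "signing", "listening"], initial, transition))
```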

Different models

Two ways to solve recognition
1. Given observation sequence O and a choice of models $\lambda$, maximize $\Pr(O \mid \lambda)$; for a single model, the question is what $\Pr(O \mid \lambda)$ is.
   Speech recognition: which word produced the observation?
2. Given observation sequence and model, find the most likely state sequence.
   Has been used for continuous sign recognition [Starner95].
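The two recognition problems correspond to two standard dynamic-programming routines: the forward algorithm computes the likelihood of the observations under a model, and the Viterbi algorithm finds the most likely state sequence. The sketch below uses a toy discrete HMM; the state names, the "high"/"low" motion symbols, and every probability are hypothetical and chosen only to make the example small.

```python
def forward(observations, states, start, trans, emit):
    """Likelihood of the observation sequence under the model (problem 1)."""
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    for symbol in observations[1:]:
        alpha = {s: emit[s][symbol] * sum(alpha[p] * trans[p][s] for p in states)
                 for s in states}
    return sum(alpha.values())


def viterbi(observations, states, start, trans, emit):
    """Most likely state sequence and its joint probability (problem 2)."""
    best = {s: (start[s] * emit[s][observations[0]], [s]) for s in states}
    for symbol in observations[1:]:
        updated = {}
        for s in states:
            # Best predecessor for state s at this step.
            prob, path = max(((best[p][0] * trans[p][s], best[p][1]) for p in states),
                             key=lambda item: item[0])
            updated[s] = (prob * emit[s][symbol], path + [s])
        best = updated
    return max(best.values(), key=lambda item: item[0])


states = ("signing", "listening")
start = {"signing": 0.5, "listening": 0.5}
trans = {"signing": {"signing": 0.9, "listening": 0.1},
         "listening": {"signing": 0.2, "listening": 0.8}}
emit = {"signing": {"high": 0.8, "low": 0.2},
        "listening": {"high": 0.1, "low": 0.9}}

observed = ["high", "high", "low"]
print(forward(observed, states, start, trans, emit))
print(viterbi(observed, states, start, trans, emit))
```

For the "choice of models" variant, one would call `forward` once per candidate model and keep the model with the largest likelihood.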

Implementation notes
Use htk, publicly available library written in C
Model signing/not signing as “words”
  – Other possibility is to trace state sequence
  – Each is a 3 state model, no backward transitions
Must include some temporal info, else degenerate (biased coin flip)
Use 3, 4, and 5 frame window
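A 3-state model with no backward transitions has an upper-triangular transition matrix. The sketch below builds one such matrix in plain Python; it is not htk's model format, the function name and the `stay` probability are hypothetical, and it additionally assumes no state skipping, which the slides do not specify.

```python
def left_to_right_transitions(n_states=3, stay=0.6):
    """Transition matrix where each state either repeats or moves to the next one."""
    matrix = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i == n_states - 1:
            matrix[i][i] = 1.0          # last state can only repeat
        else:
            matrix[i][i] = stay         # remain in the same state
            matrix[i][i + 1] = 1.0 - stay  # advance; never go backward
    return matrix


for row in left_to_right_transitions():
    print(row)
```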

HMM Classification Accuracy
Test video   HMM 3 frame   HMM 4 frame   HMM 5 frame   Best SVM
gina1        87.3%         88.4%         [missing]     88.8%
gina2        85.4%         86.0%         86.8%         90.3%
gina3        87.3%         88.6%         89.2%         91.3%
gina4        82.6%         82.5%         81.4%         87.6%
Average      85.7%         86.4%         86.5%         89.2%

Outline
1. Motivation
2. Completed Activity Analysis Research
3. Proposed Activity Analysis Research
   a. Recognize finger spelling
   b. Recognize movement epenthesis
4. Timeline to complete dissertation

Activity Analysis, thus far
Feature Extraction -> Classification -> Signing, Listening

Activity Analysis, proposed
Feature Extraction -> Classification -> Signing, Listening, Finger spelling, Movement epenthesis

Proposed Research
Recognize new activity
  – Finger spelling
  – Movement epenthesis (= sign segmentation)
Questions
  – Why is this valuable?
  – Is it feasible?
  – How will it be solved?

Why? Finger spelling
Believe that increased frame rate will increase intelligibility
Will confirm optimal frame rate through user studies

Why? Movement epenthesis
Choose frames so that low frame rate more intelligible
Potentially first step in continuous sign language recognition engine
Irritation must not outweigh savings; verify through user studies

Is it feasible?
Previous (somewhat successful) work:
  – Direct measure device
  – Rules-based
      Change in motion trajectory, low motion [Sagawa00]
      Finger flexion [Liang98]
Previous very successful work (98.8%)
  – Neural Network + direct measure device
  – Frame classified as beginning of sign (left boundary), end of sign (right boundary), or interior [Fang01]

How?
Improved feature extraction
  – Use the part of sign to inform extraction
  – See what works from the sign recognition literature
Improved classification

Parts of sign
Handshape
  – Most work in sign language recognition focused here
  – Includes expensive techniques (time, power)
Movement
  – We only use this right now!
  – Often implicitly recognized in machine learning
Location
Palm orientation
Nonmanual signals (facial expression)

Add center of gravity to features
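A center of gravity can be computed as the mean position of the pixels that belong to the region of interest. The sketch below assumes a binary mask is already available (for example from a hand or face detector, which is not shown); the function name and the mask format are hypothetical.

```python
def center_of_gravity(mask):
    """Mean (x, y) position of the non-zero cells in a 2-D mask, or None if empty."""
    count = 0
    sum_x = 0
    sum_y = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                count += 1
                sum_x += x
                sum_y += y
    if count == 0:
        return None
    return (sum_x / count, sum_y / count)


mask = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
]
print(center_of_gravity(mask))
```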

Parts of sign recognized by center of gravity
  – Handshape
  – Movement
  – Location
  – Palm orientation
  – Nonmanual signals (facial expression)

Accurate COG
Bayesian filters
  – Very similar to hidden Markov models
  – What state are we in, given the (noisy) observations?
  – Find posterior pdf of state
  – Kalman filter, particle filter
Viola and Jones [01] object detection

Bayesian filters
Update
  – Kalman: assume linear system, minimize MSE; measure
  – Particle: sum of weighted samples; measure, update weights
Predict
  – Kalman: add in noise, guess state
  – Particle: add in noise, guess particle location
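The predict/update cycle can be illustrated with a one-dimensional Kalman filter. This is a minimal sketch under simplifying assumptions that are not from the slides: the state is a single coordinate (say, the x position of a center of gravity), the motion model is "stay where you are", and the noise variances are made-up constants.

```python
def kalman_step(mean, variance, measurement, process_var, measurement_var):
    """One predict + update cycle of a scalar Kalman filter."""
    # Predict: the state guess is unchanged, but added noise grows the uncertainty.
    predicted_mean = mean
    predicted_var = variance + process_var

    # Update: blend prediction and measurement, weighted by their uncertainties.
    gain = predicted_var / (predicted_var + measurement_var)
    new_mean = predicted_mean + gain * (measurement - predicted_mean)
    new_var = (1.0 - gain) * predicted_var
    return new_mean, new_var


mean, variance = 0.0, 1.0
for z in [2.0, 2.2, 1.9]:          # hypothetical noisy measurements
    mean, variance = kalman_step(mean, variance, z,
                                 process_var=1.0, measurement_var=2.0)
    print(mean, variance)
```

A particle filter follows the same two phases but represents the posterior with weighted samples instead of a mean and variance.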

How?
Improved feature extraction
Improved machine learning
  – 3 class SVM for finger spelling
  – State sequence HMM
  – AdaBoost [Freund97]

AdaBoost (adaptive boosting)

AdaBoost Algorithm
In each round $t = 1$ to $T$:
  – Train a “weak learner” on weighted data
  – $h_t : \text{features} \to \{\text{signing}, \text{listening}\}$; error is the sum of weights of misclassified examples
  – $\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \text{error}}{\text{error}}\right)$
  – Reweight based on error, normalize weights
Answer is $\operatorname{sign}\left(\sum_t \alpha_t h_t\right)$
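The rounds above can be sketched in a few lines of Python. This is an illustration only, with several assumptions not found in the slides: signing is encoded as +1 and listening as -1, the feature is a single number, the weak learner is a threshold "stump", and the training data is a toy set.

```python
import math


def stump_predict(x, threshold, polarity):
    """Weak learner: +polarity at or above the threshold, -polarity below it."""
    return polarity if x >= threshold else -polarity


def train_stump(xs, ys, weights):
    """Pick the threshold and polarity with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(w for w, x, y in zip(weights, xs, ys)
                        if stump_predict(x, threshold, polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    weights = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = train_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1.0 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1.0 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Reweight: misclassified examples gain weight, then normalize.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all weak learners."""
    score = sum(alpha * stump_predict(x, threshold, polarity)
                for alpha, threshold, polarity in ensemble)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]            # toy 1-D feature values
ys = [1, 1, -1, -1, 1, 1]          # +1 = signing, -1 = listening (assumed encoding)
ensemble = adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])
```

No single threshold separates this toy set, so the example shows how a weighted combination of weak learners can fit data that each learner alone cannot.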

Outline
1. Motivation
2. Completed Research
3. Proposed Research
4. Timeline to complete dissertation

Timeline
October to March 2008: Recognize signing/listening/finger spelling
Deadline: Automatic Face and Gesture Recognition, March 28
1. Bayesian filters for better features.
2. Viola and Jones’s object detection.
3. Improve hidden Markov model.
4. Evaluate three class support vector machine.
5. Implement AdaBoost, cascade.
6. Experiment with combining these techniques.

Timeline, cont.
April to May 2008: Run user study to evaluate optimal frame rate for finger spelling.
Deadline: ASSETS 2008, May 25, 2008
June to December 2008: Apply techniques to the problem of sign segmentation.
1. Evaluate feature set and improve.
2. Conduct a user study to evaluate intelligibility of dropping frames during movement epenthesis.
3. Improve machine learning techniques; implement combination via decision trees.
Early 2009: Complete dissertation and defend.

Questions?