Fingerspelling Alphabet Recognition Using A Two-level Hidden Markov Model Shuang Lu, Joseph Picone and Seong G. Kong Institute for Signal and Information.


Fingerspelling Alphabet Recognition Using A Two-level Hidden Markov Model Shuang Lu, Joseph Picone and Seong G. Kong Institute for Signal and Information Processing Temple University Philadelphia, Pennsylvania, USA

IPCV 2013, July 22
Abstract
Advanced gaming interfaces have generated renewed interest in hand gesture recognition as an ideal interface for human-computer interaction. Signer-independent (SI) fingerspelling alphabet recognition is a very challenging task due to a number of factors, including the large number of similar gestures, hand orientation, and cluttered backgrounds. We propose a novel framework that uses a two-level hidden Markov model (HMM) to recognize each gesture as a sequence of sub-units and to perform integrated segmentation and recognition. We present results on signer-dependent (SD) and signer-independent (SI) tasks for the ASL Fingerspelling Dataset: error rates of 2.0% and 46.8%, respectively.

American Sign Language (ASL) Recognition
- Primary mode of communication for over 500,000 people in North America alone.
- In a typical communication, 10% to 15% of the words are signed by fingerspelling of alphabet signs.
- Similar to written English, the one-handed Latin alphabet in ASL consists of 26 hand gestures.
- The objective of our work is to classify 24 ASL alphabet signs from a static 2D image (we exclude “J” and “Z” because they are dynamic hand gestures).

ASL Still A Challenging Problem
- Similar shapes (e.g., “r” vs. “u”)
- Separation of hand from background:
  - Hand and background are similar in color
  - Hand and arm are similar in color
  - Background occurs within a hand shape
- Rotation, magnification, perspective, lighting, complex backgrounds, skin color, …
- Signer independent (SI) vs. signer dependent (SD)

Architecture: Two-Level Hidden Markov Model

Architecture: Histogram of Oriented Gradient (HOG)
Benefits:
- Illumination invariance due to the normalization of the gradient of the intensity values within a window.
- Emphasizes edges by the use of an intensity gradient calculation.
- Less sensitive to background details because the features use a distribution rather than a spatially-organized signal.
Steps:
- Compute the gradient intensity G(x, y) and orientation A(x, y) at each pixel.
- In every window, separate A(x, y) (from 0 to 2π) into 9 regions; sum all G(x, y) within the same region.
- Normalize the features inside each block.
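The HOG steps on this slide can be sketched in a few lines. The slide's own formulas did not survive in this transcript, so the unscaled central-difference gradient and the L2 normalization used below are assumptions, and the function names `hog_window` and `l2_normalize` are hypothetical. Only the 9 orientation regions over 0 to 2π and the summing of G(x, y) per region come from the slide.

```python
import math

def hog_window(pixels, n_bins=9):
    """Orientation histogram for one window of grayscale pixels (a list of rows).

    Assumption: gradients are unscaled central differences on interior pixels.
    """
    rows, cols = len(pixels), len(pixels[0])
    hist = [0.0] * n_bins
    bin_width = 2 * math.pi / n_bins
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = pixels[y][x + 1] - pixels[y][x - 1]
            gy = pixels[y + 1][x] - pixels[y - 1][x]
            magnitude = math.hypot(gx, gy)               # G(x, y)
            angle = math.atan2(gy, gx) % (2 * math.pi)   # A(x, y) in [0, 2*pi)
            # Guard against an angle that rounds up to exactly 2*pi.
            region = min(int(angle / bin_width), n_bins - 1)
            hist[region] += magnitude                    # sum G within the region
    return hist

def l2_normalize(features, eps=1e-6):
    """Assumed block normalization: divide by the L2 norm (eps avoids 0/0)."""
    norm = math.sqrt(sum(v * v for v in features) + eps * eps)
    return [v / norm for v in features]

# Toy 4x4 window containing a single vertical edge.
window = [[0, 0, 10, 10] for _ in range(4)]
histogram = hog_window(window)
normalized = l2_normalize(histogram)
```

For the toy window, every interior gradient points along the positive x axis, so all of the gradient intensity lands in the first orientation region.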

Architecture: Two-Levels of Hidden Markov Models
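As a rough illustration of how HMM decoding can segment and recognize at the same time, the sketch below runs Viterbi decoding over one left-to-right model. It is a generic textbook decoder, not the authors' system: the function name `viterbi_left_to_right`, the three-state background/gesture/background topology, the equal transition probabilities, and the toy frame scores are all assumptions made for illustration.

```python
import math

NEG_INF = float("-inf")

def viterbi_left_to_right(frame_scores, log_stay, log_next):
    """Best path through a left-to-right HMM.

    frame_scores[t][s] is the log-likelihood of frame t under state s.
    From state s the path may stay in s or advance to s + 1. The path
    must start in state 0 and end in the last state.
    Returns (best_log_score, state_path).
    """
    n_frames, n_states = len(frame_scores), len(frame_scores[0])
    score = [[NEG_INF] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    score[0][0] = frame_scores[0][0]
    for t in range(1, n_frames):
        for s in range(n_states):
            stay = score[t - 1][s] + log_stay
            move = score[t - 1][s - 1] + log_next if s > 0 else NEG_INF
            if move > stay:
                score[t][s], back[t][s] = move, s - 1
            else:
                score[t][s], back[t][s] = stay, s
            score[t][s] += frame_scores[t][s]
    # Trace back from the final state to recover the frame-to-state alignment.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return score[-1][-1], path

# Toy scores for states (background, gesture, background) over five frames.
BG_FRAME = [-1.0, -5.0, -1.0]    # frame that looks like background
HAND_FRAME = [-5.0, -1.0, -5.0]  # frame that looks like part of the hand
frames = [BG_FRAME, HAND_FRAME, HAND_FRAME, HAND_FRAME, BG_FRAME]
best_score, alignment = viterbi_left_to_right(frames, math.log(0.5), math.log(0.5))
```

The returned alignment assigns each frame to a state, so the boundaries between background and gesture fall out of the same search that scores the model.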

Experiments: ASL Fingerspelling Corpus
- 24 static gestures (excluding letters “J” and “Z”)
- 5 subsets from 4 subjects
- More than 500 images per sign per subject
- A total of 60,000 images
Variability in the data:
- Similar gestures
- Different image sizes
- Face occlusion
- Changes in illumination
- Variations in signers
- Sign rotation

Experiments: Parameter Tuning
Performance as a function of the frame/window size:
Frame (N)   Window (M)   % Overlap   Error (%)
5           20           75%         7.1%
5           30           83%         4.4%
...         ...          ...         5.1%
...         ...          ...         5.0%
...         ...          ...         8.0%
An overview of the optimal system parameters:
System Parameter                        Value
Frame Size (pixels)                     5
Window Size (pixels)                    30
No. HOG Bins                            9
No. Sub-gesture Segments                11
No. States Per Sub-gesture Model        21
No. States Long Background (LB)         11
No. States Short Background (SB)        1
No. Gaussian Mixtures (SG models)       16
No. Gaussian Mixtures (LB/SB models)    32
Performance as a function of the number of mixture components (columns: No. Mixtures, Error Rate (%)).
- Parameters were sequentially optimized and then jointly varied to test for optimality.
- Optimal settings are a function of the amount of data and magnification of the image.
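The overlap column appears to follow from the frame and window sizes. The slide does not state the formula, so the sketch below assumes that the analysis window advances by one frame width at a time, which reproduces the two fully legible rows (75% and 83%). The function name `overlap_percent` is hypothetical.

```python
def overlap_percent(frame_px, window_px):
    """Percentage of a window shared with the next one.

    Assumption: consecutive windows are shifted by one frame width.
    """
    return 100.0 * (window_px - frame_px) / window_px

# The two rows of the tuning table that are fully legible.
overlap_small_window = overlap_percent(5, 20)
overlap_best_window = overlap_percent(5, 30)
```

Under this assumption, widening the window while keeping the frame at 5 pixels raises the overlap, which matches the direction of change between the first two rows.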

Experiments: SD vs. SI Recognition
Error rates by system and task:
System                     SD      Shared   SI
Pugeault (Color Only)      N/A     27.0%    65.0%
Pugeault (Color + Depth)   N/A     25.0%    53.0%
HMM (Color Only)           2.0%    7.8%     46.8%
- Performance is relatively constant as a function of the cross-validation set.
- Greater variation as a function of the subject.
- SD performance is significantly better than SI performance.
- “Shared” is a closed-subject test where 50% of the data is used for training and the other 50% is used for testing.
- HMM performance doesn’t improve dramatically with depth.

Analysis: Confusion Matrix

Experiments: Error Analysis
- Gestures with a high confusion error rate.
- Images with significant variations in background and hand rotation.
- “SB” model is not reliably detecting background.
- Solution: transcribed data?

Summary and Future Directions
A two-level HMM-based ASL fingerspelling alphabet recognition system that trains gesture and background noise models automatically:
- Five essential parameters were tuned by cross-validation.
- Our best system configuration achieved a 2.0% error rate on an SD task and a 46.8% error rate on an SI task.
Currently developing new architectures that perform improved segmentation. Both supervised and unsupervised methods will be employed.
- We expect performance to be significantly better on the SI task.
All scripts, models, and data related to these experiments are available from our project web site:

Brief Bibliography of Related Research
[1] Lu, S., & Picone, J. (2013). Fingerspelling Gesture Recognition Using Two Level Hidden Markov Model. Proceedings of the International Conference on Image Processing, Computer Vision, and Pattern Recognition. Las Vegas, USA.
[2] Pugeault, N., & Bowden, R. (2011). Spelling It Out: Real-time ASL Fingerspelling Recognition. Proceedings of the IEEE International Conference on Computer Vision Workshops (pp. 1114–1119). (available at pellingDataset).
[3] Vieriu, R., Goras, B., & Goras, L. (2011). On HMM Static Hand Gesture Recognition. Proceedings of International Symposium on Signals, Circuits and Systems (pp. 1–4). Iasi, Romania.
[4] Kaaniche, M., & Bremond, F. (2009). Tracking HOG Descriptors for Gesture Recognition. Proceedings of the Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance (pp. 140–145). Genova, Italy.
[5] Wachs, J. P., Kölsch, M., Stern, H., & Edan, Y. (2011). Vision-based Hand-gesture Applications. Communications of the ACM, 54(2), 60–71.

Demonstration

Project Web Site

Biography
Shuang Lu received her BS in Electrical Engineering in 2008 from a Great University in, China in She received an MSEE from in, China in She is currently pursuing a PhD in Electrical Engineering in the Department of Electrical and Computer Engineering at Temple University, where she works as a teaching assistant responsible for entry-level digital and analog electronics laboratories. She has also worked as an intern at <list Temple Hospital and your current part-time job. Ms. Lu’s primary research interests are are are are. She has published X journal papers and Y peer-reviewed conference papers. She is a student member of the IEEE, as well as a member of HKN? and (list other affiliations and honors). Her hobbies include..