Sign Language Recognition With Unsupervised Feature Learning
Hi, my name is Justin, and my project applies unsupervised feature learning and segmentation to sign language recognition. RGB-D data was collected with a Microsoft Kinect for a total of 10 classes, or letters. I collected 1,200 images for each letter in slightly different positions and finger poses, giving a total dataset of 12,000 images (10 letters × 1,200 each) to train the autoencoder and softmax layers.

Each image was passed through a skin segmentation model, which used color segmentation techniques to extract all "skin-colored" parts of the image. This usually extracts just the face and hands, since smaller skin-colored objects in the background are filtered out in the process. Afterwards, the depth information from the Kinect is used to threshold the skin mask and remove the face from the segmentation. This is done with a closest-object-only criterion, which takes advantage of the fact that the hands are almost always in front of a person's face when signing, and drops skin pixels that are too far away.

Using these segmented hands, I then resize and crop them to fit the 32×32 input layer of the autoencoder. After the size-100 hidden layer generates an activation-map response, it is passed through a softmax model, which predicts the most likely letter for that activation map. In the example shown above, the prediction with the highest probability given the input is the letter "b".

Sample images in the collected dataset (RGB and depth)

CS231A Computer Vision – Justin Chen – (in collaboration with CS229)
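As a concrete illustration of the segmentation and preprocessing pipeline just described, here is a minimal Python/OpenCV sketch. The slides do not give the actual color space, thresholds, or depth margin, so the YCrCb skin bounds, the 150 mm depth margin, and the function name `segment_hand` are illustrative assumptions rather than the author's implementation.

```python
# Hypothetical sketch of the described pipeline: color-based skin mask,
# closest-object depth thresholding, then a 32x32 crop for the autoencoder.
# Thresholds and margins below are assumptions, not values from the slides.
import cv2
import numpy as np

def segment_hand(rgb, depth, depth_margin_mm=150, patch_size=32):
    """Return a 32x32 grayscale crop of the closest skin-colored region, or None."""
    # 1) Color segmentation: a common YCrCb skin-color range (assumed values).
    ycrcb = cv2.cvtColor(rgb, cv2.COLOR_BGR2YCrCb)
    skin = cv2.inRange(ycrcb,
                       np.array([0, 133, 77], np.uint8),
                       np.array([255, 173, 127], np.uint8))

    # Remove small skin-colored blobs in the background with a morphological open.
    kernel = np.ones((5, 5), np.uint8)
    skin = cv2.morphologyEx(skin, cv2.MORPH_OPEN, kernel)

    # 2) Depth thresholding (closest-object-only criterion): keep only skin
    #    pixels near the closest skin depth, dropping the face behind the hand.
    skin_depths = depth[(skin > 0) & (depth > 0)]
    if skin_depths.size == 0:
        return None
    closest = skin_depths.min()
    hand_mask = ((skin > 0) & (depth > 0) &
                 (depth < closest + depth_margin_mm)).astype(np.uint8) * 255

    # 3) Crop the hand's bounding box and resize to the 32x32 input layer.
    ys, xs = np.nonzero(hand_mask)
    if ys.size == 0:
        return None
    y0, y1, x0, x1 = ys.min(), ys.max(), xs.min(), xs.max()
    gray = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY)
    crop = cv2.bitwise_and(gray, gray, mask=hand_mask)[y0:y1 + 1, x0:x1 + 1]
    return cv2.resize(crop, (patch_size, patch_size)).astype(np.float32) / 255.0
```

In a live system, the 32×32 output of `segment_hand` would be flattened and fed to the autoencoder's input layer.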
Sign Language Recognition With Unsupervised Feature Learning
Autoencoder features and learning curve of the system

Here are some of the features extracted from the autoencoder layer using the training data from my dataset. To the right are some learning-curve results obtained by testing on different dataset sizes; as you can see, the classification error becomes progressively smaller as the dataset size increases.

To make this a more useful project, I turned the system into a live demo that runs on real-time video input from the Kinect; a screenshot of a sample run is shown above. Although it currently covers only 10 letters, the system achieves a classification accuracy of approximately 98% on a randomly selected test set.

Live demo results
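To make the final classification step concrete, here is a minimal sketch of how a segmented 32×32 crop could be mapped to a letter prediction, assuming a sigmoid autoencoder hidden layer of 100 units feeding a multinomial softmax. The weight names (W1, b1, W_soft, b_soft) and the placeholder letter list are hypothetical stand-ins for the trained parameters and the 10 signed letters, which the slides do not list.

```python
# Minimal sketch of the classification step: 1024-dim input -> 100-dim
# autoencoder activation map -> softmax over 10 letters. All parameters
# here are assumed stand-ins for the trained model.
import numpy as np

LETTERS = list("abcdefghij")  # placeholder for the 10 signed letters

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_letter(patch, W1, b1, W_soft, b_soft):
    """patch: 32x32 segmented hand image scaled to [0, 1]."""
    x = patch.reshape(-1)              # 1024-dim input vector
    hidden = sigmoid(W1 @ x + b1)      # 100-dim activation map
    scores = W_soft @ hidden + b_soft  # one score per letter
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()               # softmax probabilities
    return LETTERS[int(np.argmax(probs))], probs
```

`predict_letter` returns both the chosen letter and the full probability vector, matching the slide's example where "b" receives the highest probability.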