Hand Movement Recognition
By: Niv Tokman, Guy Levenbroun
Instructor: Ari Todtfeld
Goals and objectives
The project goal was to create a system that analyzes colored-glove movements and translates them into speech.
The project is based on the work done by Alex Gluzman and Lior Neuman in spring 2004, who created a system that detects the position of the colored glove.
System overview
The system is composed of the following units:
- Color video camera
- Colored glove
- DoubleTalk speech device
- PC + Rio frame grabber
System overview – basic operation
- The camera shoots the glove.
- The data is grabbed by the Rio frame grabber and placed in memory.
- Rio routines inform the program that a new frame is waiting, by calling OnDraw().
- The image is processed.
- The result is transferred to the DoubleTalk device via the computer's COM port.
System’s user interface
Movement recognition flow
- Movement is derived from a series of still frames; for each frame we have the colors' centroid locations and the recognized position.
- For each frame, we compare each color's location to the previous frame and extract the direction that color has moved. The possible directions are the eight prime directions, no movement, or missing color.
- The list of directions, together with the positions, composes a movement.
Extracting color locations (x, y, Color)
For every scanned pixel:
- Does the color match one of the preset colors (in HSV values)?
- Yes: update the sum of X values, the sum of Y values, and the number of pixels with that color.
- No: mark the pixel as black.
Then, for each color: if the number of pixels is below a threshold, mark the color as missing.
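The per-color bookkeeping described in this flow can be sketched as follows. This is our illustration, not the project's actual code; the struct and function names are invented for clarity:

```cpp
#include <cassert>

// Hypothetical per-color accumulator, as described on the slide:
// sum of X values, sum of Y values, and pixel count.
struct ColorStat {
    long sumX = 0, sumY = 0, count = 0;
    // A color with too few matched pixels is treated as missing.
    bool missing(long threshold) const { return count < threshold; }
    // Centroid (center of mass) of all pixels matched to this color.
    double cmX() const { return count ? double(sumX) / count : 0.0; }
    double cmY() const { return count ? double(sumY) / count : 0.0; }
};

// Called for every scanned pixel whose HSV value matched this color.
void accumulate(ColorStat& s, int x, int y) {
    s.sumX += x;
    s.sumY += y;
    ++s.count;
}
```

Dividing the coordinate sums by the pixel count at the end of the scan yields the centroid in a single pass over the frame, with no per-color pixel lists kept in memory.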
Calculating directions
- Each color's CM is compared to the CM in the previous frame.
- Δx and Δy are calculated.
- The direction is concluded.
Example: Δx = -10. Result: LEFT.
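Quantizing (Δx, Δy) into one of the eight prime directions or "no movement" could look like the sketch below. The threshold-per-axis rule and the screen-style y-axis (growing downward) are our assumptions, not taken from the project's code:

```cpp
#include <cassert>
#include <cstdlib>

enum Direction { NONE, LEFT, RIGHT, UP, DOWN,
                 UP_LEFT, UP_RIGHT, DOWN_LEFT, DOWN_RIGHT };

// Hypothetical quantizer: movement below the pixel threshold on an axis
// counts as no movement on that axis; otherwise the sign picks the side.
Direction quantize(int dx, int dy, int threshold) {
    int h = (std::abs(dx) < threshold) ? 0 : (dx < 0 ? -1 : 1);
    int v = (std::abs(dy) < threshold) ? 0 : (dy < 0 ? -1 : 1);
    if (h == 0 && v == 0) return NONE;
    if (v == 0) return h < 0 ? LEFT : RIGHT;
    if (h == 0) return v < 0 ? UP : DOWN;      // assuming y grows downward
    if (v < 0)  return h < 0 ? UP_LEFT : UP_RIGHT;
    return h < 0 ? DOWN_LEFT : DOWN_RIGHT;
}
```

With the slide's example (Δx = -10, a small Δy, threshold 4), the small vertical jitter is suppressed and only LEFT survives.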
Recognition flow
Below is a regular main-loop iteration:
1. Start: a new frame is ready in the Rio memory buffer.
2. Process the frame for the colors' centroid locations.
3. Identify the hand position.
4. Calculate each color's direction in reference to the previous frame.
5. 30 frames without movement? If yes, identify the movement.
Challenges of flow
- Low frame rate: scanning each frame is time/CPU consuming, and a low frame rate causes movement-information loss.
- Noise in frame sampling: camera shakes, lighting varies.
- Non-linear hand movement: the actual hand movement differs from the desired movement.
- Analyzing movement (the directions list): the frame rate influences the length of the list, and list comparison is time consuming (which in turn affects the frame rate).
Frame processing solution
- To reduce scan time, only every 4th pixel is scanned until a color is found. Then the scan is done on every pixel until the color disappears.
- This allows faster frame processing with still-accurate CM calculations.
- HSV transformation is done during the scan.
- Pixels are transformed to HSV to minimize lighting effects.
http://en.wikipedia.org/wiki/HSV_color_space
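The HSV transform mentioned here is the textbook RGB-to-HSV conversion; a minimal sketch (channel values in [0, 1], hue in degrees) is below. The project's actual implementation and value ranges are not shown on the slides, so this is only the standard formula:

```cpp
#include <cassert>
#include <algorithm>

// Textbook RGB -> HSV conversion (H in degrees [0, 360), S and V in [0, 1]).
// Matching glove colors on hue makes the test largely independent of
// brightness, which is why the slide says HSV reduces lighting effects.
void rgbToHsv(double r, double g, double b, double& h, double& s, double& v) {
    double mx = std::max({r, g, b});
    double mn = std::min({r, g, b});
    double d  = mx - mn;
    v = mx;
    s = (mx == 0.0) ? 0.0 : d / mx;
    if (d == 0.0)     h = 0.0;            // achromatic: hue undefined, use 0
    else if (mx == r) h = 60.0 * ((g - b) / d);
    else if (mx == g) h = 60.0 * ((b - r) / d + 2.0);
    else              h = 60.0 * ((r - g) / d + 4.0);
    if (h < 0.0) h += 360.0;
}
```

A preset glove color would then be matched as a tolerance band on H (and minimum bounds on S and V) rather than a box in RGB space.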
Frame processing solution (continued)
- Scanning a black line takes ¼ of the time, since we sample only every 4th pixel (most lines are black).
- When we come across a colored pixel we scan every pixel, so we won't miss any of them, to improve CM accuracy.
- Sometimes we might miss the edges of the color, since we don't start the dense scan at its first appearance.
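The sparse/dense line scan described on these two slides can be sketched like this (a simplified single-line version; `isColor` stands in for the real per-pixel HSV match):

```cpp
#include <cassert>
#include <vector>

// Step 4 pixels at a time over background, drop to single-pixel steps
// once a colored pixel is found, and go back to sparse stepping when
// the color ends. Returns the indices of colored pixels actually visited.
std::vector<int> scanLine(const std::vector<bool>& isColor) {
    std::vector<int> hits;
    int step = 4;
    for (int x = 0; x < (int)isColor.size(); x += step) {
        if (isColor[x]) {
            hits.push_back(x);
            step = 1;          // dense scan while inside a colored run
        } else {
            step = 4;          // sparse scan over background
        }
    }
    return hits;
}
```

Note how a colored run at pixels 6..10 is only picked up from pixel 8 onward: the leading edge falls between sparse samples, which is exactly the edge loss the slide mentions (and which the "jump back and rescan" idea on the last slide would fix).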
Noise filtering
- The first task is to distinguish between a real glove movement and noise (camera shakes, light changes).
- This is accomplished by defining a distance threshold (in pixels).
- The threshold can be set automatically at run time: 120 frames are collected while the glove is still, and the worst deviation between any two frames is set as the threshold.
Example: Δx = -10, Δy = 3, TH = 4. Result: LEFT.
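The automatic calibration can be sketched as below. The slide does not say which frame pairs are compared; comparing consecutive frames is our assumption here, and `Pt` is an invented helper type:

```cpp
#include <cassert>
#include <algorithm>
#include <cmath>
#include <vector>

struct Pt { double x, y; };

// With the glove held still, record one centroid per frame and take the
// worst per-axis jump between consecutive frames as the noise threshold.
// Any real movement smaller than this is indistinguishable from camera
// shake, so it is safe to discard.
double calibrateThreshold(const std::vector<Pt>& stillFrames) {
    double worst = 0.0;
    for (size_t i = 1; i < stillFrames.size(); ++i) {
        worst = std::max(worst, std::abs(stillFrames[i].x - stillFrames[i - 1].x));
        worst = std::max(worst, std::abs(stillFrames[i].y - stillFrames[i - 1].y));
    }
    return worst;
}
```

In the running example, a calibrated threshold of 4 pixels lets Δy = 3 be dropped as noise while Δx = -10 still registers as a LEFT movement.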
Non-linear movement solution
- Whenever a new CM is calculated, we try to guess its position by time derivation in the x and y axes.
- We then calculate the 'real' CM by combining the prediction with the measurement (Kalman filtering).
http://www.cs.unc.edu/~welch/kalman/
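The formula itself is not reproduced on this slide, so the sketch below uses a simplified fixed-gain (alpha) filter as a stand-in: predict the next CM from the last velocity estimate, then blend prediction and measurement. A full Kalman filter, as in the linked tutorial, computes this gain from the noise statistics instead of fixing it:

```cpp
#include <cassert>

// Simplified 1-D alpha filter per axis (run one instance for x, one for y).
// "pos" is the fused CM estimate, "vel" its per-frame time derivative.
struct AxisFilter {
    double pos = 0.0, vel = 0.0;
    double update(double measured, double alpha) {
        double predicted = pos + vel;                      // constant-velocity guess
        double fused = alpha * measured + (1.0 - alpha) * predicted;
        vel = fused - pos;                                 // new derivative estimate
        pos = fused;
        return fused;
    }
};
```

Smoothing the centroids this way keeps a jittery, non-linear hand path from flip-flopping between directions frame to frame.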
Movement recognition solution
- We collect the movement history until a series of 30 no-movement frames has accumulated.
- Next, identical adjacent movements are collapsed into one, so in the end the history contains only changes of direction. (The Mozilla gesture recognizer uses this model.)
- There is also a threshold for movements in the same basic direction, below which they are discarded as noise.
- To speed up the comparison process, we convert the collapsed list into a decimal number by enumerating the directions.
http://blog.monstuff.com/archives/000012.html
Movement recognition - example
Enumeration: Down = 1, Up = 2, Left = 3.
- First the list is collapsed: left, up, up, down → left, up, down.
- The new list is translated to a decimal number according to the enumeration: left, up, down → 321.
- The number is then compared against the movements in the database.
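The collapse-and-encode steps above can be sketched in a few lines. The enumeration values follow the slide; the function name and representation of history as a vector of codes are our illustration (with at most eight prime directions plus "no movement", each direction fits in one decimal digit):

```cpp
#include <cassert>
#include <vector>

enum { DOWN = 1, UP = 2, LEFT = 3 };   // enumeration as on the slide

// Collapse runs of identical directions, then pack the remaining codes
// into a single decimal number: comparing one integer against the
// database is far cheaper than comparing lists element by element.
long encodeMovement(const std::vector<int>& history) {
    std::vector<int> collapsed;
    for (int d : history)
        if (collapsed.empty() || collapsed.back() != d)
            collapsed.push_back(d);
    long code = 0;
    for (int d : collapsed)
        code = code * 10 + d;          // each direction becomes one digit
    return code;
}
```

A welcome side effect of collapsing runs first is that the encoding becomes largely frame-rate independent: a slow LEFT held for many frames encodes to the same number as a quick one.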
Summary
We have created a system that tracks and identifies glove movements, and translates them into speech according to a user-defined DB.
Main features:
- An intuitive graphical user interface that allows easy configuration and operation.
- Database support to store the user's movements and positions, with the ability to add, modify, and delete entries through the GUI.
- Calibration of the environment: camera height, noise reduction, etc.
Additional uses – UI demo
- This recognition engine can be used for other purposes.
- For example: turn the glove into an input device that can replace the mouse.
- Possible use: a catheterization room. Since the room must be sterile, regular I/O devices can't be used.
http://www.clarontech.com/
Example of use: UI
Added values – what we learned
- Movement tracking in a noisy environment.
- Challenges of real-time image processing.
- MFC programming skills.
- Windows IPC (Inter-Process Communication).
Ideas for future development
- Use a USB camera and the MS Speech SDK.
- Save the history of recognized movements to form more complex sentences.
- Dynamically set a different movement threshold for each color.
- During frame scanning, when encountering a color pixel, jump back and rescan so edge pixels aren't missed.
- Use the number of frames in a continuous direction.
- Use a neural network for better recognition.
http://www.codeproject.com/cpp/gestureapp.asp