1
Interaction Technology 2016
Faculty of Science, Information and Computing Sciences Interaction Technology 2016 Shape and Gesture Recognition Coert van Gemeren
2
Today Shape recognition Poses and gestures Pose recognition Optical flow and tracking Gesture recognition Assignments
3
Shape recognition
4
Shape recognition We know how to isolate shapes in an image
using cv::inRange on the pixels in the range We know how to find the contours and the convex hull of the different shapes using cv::findContours and cv::convexHull
5
Shape recognition Using a simple minmax function we can find the spatial extremes of the hull:
std::vector<std::vector<cv::Point>> contours;
cv::findContours(…);
std::vector<std::vector<cv::Point>> hull(contours.size());
for (int i = 0; i < (int) contours.size(); ++i) {
  cv::convexHull(cv::Mat(contours[i]), hull[i], false);
  auto minmax_x = std::minmax_element(
      hull[i].begin(), hull[i].end(),
      [](const cv::Point &p1, const cv::Point &p2) { return p1.x < p2.x; });
  // The iterators are only valid for hull[i], so use them inside the loop:
  cv::Point left(minmax_x.first->x, minmax_x.first->y);
  cv::Point right(minmax_x.second->x, minmax_x.second->y);
}
6
Poses and gestures More shape properties
We can use “image moments” to find typical characteristics
cv::moments returns a structure with three types of properties: spatial moments (m00, m01, …, m30, m03) central moments (mu…) central normalized moments (nu…)
Useful to calculate the mass center:
auto moment = cv::moments(contours[i]);
cv::Point center(cvRound(moment.m10 / moment.m00), cvRound(moment.m01 / moment.m00));
So what is a moment? m_ji = Σ_{x,y} array(x,y) · x^j · y^i
cv::moments calculates them up to the 3rd order
7
Poses and gestures Hu’s moments (Hu, 1962)
Hu’s moments are especially useful to compare shapes No need to compute them directly: use cv::matchShapes on 2 contours, it returns a scalar value Nice property: scale, rotation and reflection invariant!
8
Poses and gestures [Figure: Hu’s moments of “original” compared to shapes 1–5]
9
Questions?
10
Poses and gestures Humans can pose in many different ways
Poses are often expressive or meaningful
11
Poses and gestures What do these poses tell you?
12
Poses and gestures What do you think these people are trying to make you do?
13
Poses and gestures What is this facial expression telling you?
Without the context of the pose, a facial expression may be deceiving! More about faces and facial expressions next lecture!
14
Poses and gestures We can infer meaning or intention from a pose:
That makes them suitable as a means of interaction We can associate different commands with different poses
15
Poses and gestures Poses are static A single pose conveys the meaning
Gestures are dynamic movements of the body in time The movement, not the pose, conveys the meaning
16
Poses and gestures Gestures are, by definition, a means of interaction
Sign language is highly standardized Consists of static body poses and dynamic gestures Gestures made with the hand or the arm
17
Poses and gestures Finger gestures are commonly used on smartphones
Requires a touch sensor
18
Poses and gestures We focus on arm gestures:
Larger movement can be detected more robustly using computer vision Precise position (e.g. dragging) is less important
19
Pose recognition How to detect poses using computer vision?
We need to detect a distinct pose per “key”/command An analysis: Poses are more recognizable when the arms are involved Poses are more recognizable when the person faces the camera
20
Pose recognition Silhouettes already carry quite some information
Let’s start with those A silhouette is basically a binary cut-out of a person Could be obtained using background subtraction
21
Pose recognition Once we have found the blob, we need to check whether it is a person in a certain position: Look at the shape of the blob like we did with the hands? Look at the width/height ratio? Look at the pixels?
22
Pose recognition Command #1 Command #2 Command #3
If there is a limited number of poses that we would like to recognize: Each can be a “template” Command #1 Command #2 Command #3
23
Pose recognition A template describes how the pose looks and has an associated label (“key”) What it looks like can be anything: An image A height/width ratio Etc.
24
Pose recognition We can compare a “person blob” to all our templates
If a template “matches” our blob, it is recognized We don’t use contours this time, but the entire template surface Matching requires that we can measure differences: Easy for height/width ratios Slightly more difficult for images
25
Pose recognition [Figure: person blob, template, and their XOR]
Matching two binary images of similar size: Check each pixel (x,y) in template and person blob Count the total number of pixels that differ (as a percentage) Given that the images are binary matrices: Calculate the XOR of the two matrices cv::bitwise_xor(…) does the trick
26
Pose recognition Percentage of differing pixels is a measure of similarity: Lower percentage → more similar When the percentage is “low enough”, we can say that the pose is recognized Requires that we empirically set a threshold The usual issues: Threshold too high → we recognize poses that we shouldn’t Threshold too low → we don’t recognize poses we should
27
Pose recognition If two binary images do not have the same size:
You can resize the person blob to fit the template Scaling is handled automatically Height/width ratio can be off You can add a border to the smallest image Scaling is not handled Height/width ratio is preserved
28
Pose recognition OpenCV has a function cv::matchTemplate(…) that finds a (small) image in a (larger) image: Works with grayscale as well as color images “Slides” the template across the image
29
Pose recognition cv::matchTemplate(…) does not work when the template and the pose in the image are of different size: You could have several templates at several scales Select the best match
30
Pose recognition Remember the challenges of background subtraction:
What happens if other people move in the background? What happens if there are shadows? Etc. cv::matchTemplate(…) allows you to only look at part of the body Crop to the upper body (or a body part) Saves you the hassle with shadows
31
Pose recognition Think about how to create your templates:
Should be taken from the same viewpoint (camera height and angle) Should generalize to different people Should be intuitive!
32
Questions?
33
Optical flow Optical flow is the apparent motion of the pixels in an image It describes for each pixel how it moves from one frame to the next All these “flow vectors” are called the “flow field”
34
Optical flow Flow says something about image movement
Can be used to look at gestures Can be used to look at gross body movement
35
Optical flow For two subsequent images, we can calculate the flow field Flow vector (u,v) for each pixel (x,y) u describes horizontal displacement v describes vertical displacement Further processing: Take the average of u and v values → average displacement Also look at large motion vectors
36
Optical flow If we calculate flow over an entire image, we also include irrelevant movements: In the background Movement of the legs Etc. We could consider only movement in a person blob Use the silhouette after background subtraction Or skin colored regions
37
Optical flow If we extract skin color blobs (last lecture), we can “track” these: Find the movement from frame to frame Requires that we can “link” detections: find out which blob in frame x corresponds to which blob in frame x+1 Linking can be based on (previous) position, movement and size Two challenges: Dealing with overlapping blobs (hand over face, hands over each other) Dealing with merging/splitting or appearing/disappearing blobs
38
Optical flow Tracking requires that you make assumptions:
How fast can movement be? How quickly can sizes change? How many blobs (so persons) can there be? Can people have short sleeves?
39
Optical flow We can also track “patches” (small images) within a larger image: Much like template matching but then over time Algorithm available in OpenCV: cv::CamShift
40
Optical flow Requires a small “template” or “patch” (e.g. a hand):
Can be “initialized” from one frame Processing steps for initialization: Find skin color blobs Figure out which is a hand Create a patch Processing steps for tracking: Find the patch in subsequent frames Just apply CamShift
41
Optical flow OpenCV supports a number of optical flow algorithms:
cv::optflow::calcOpticalFlowSF cv::optflow::createOptFlow_DeepFlow cv::optflow::calcOpticalFlowSparseToDense cv::optflow::createOptFlow_Farneback And tracking based on patches: cv::CamShift And a tracker based on position/speed measurement: cv::KalmanFilter Be aware that some of these may require you to compile the OpenCV contrib modules yourself! For more info, see:
42
Questions?
43
Gesture Recognition Once we have found movement in an image, we need to “classify” it: Assign a label from a limited set of labels (e.g. pointing right) Determine which “key” was pressed Two options: Do not consider temporal aspect Consider temporal aspect
44
Gesture Recognition Without time:
Find a particular movement at a specific time instant Examples: Average horizontal movement in image exceeds threshold One hand moves with at least x pixels to the left, the other hand moves at least x pixels to the right Make sure you “mask” the classification: You don’t want to classify the same gesture 3 frames in a row
45
Gesture Recognition One remark about frame rates:
The amount of movement per frame depends on the frame rate of your application Lower number of frames per second (fps) → more movement per frame You might need to normalize for different frame rates
46
Gesture Recognition With time:
We have to consider movement over time, so over a number of frames Allows us to look at: More subtle movement that is performed over some time (e.g. movement for half a second) Movement that changes over time (e.g. drawing a circle)
47
Gesture Recognition Gestures typically follow several phases:
Preparation Stroke Retraction
48
Gesture Recognition The stroke is the “actual” gesture:
It looks more or less the same every time Movement during preparation and retraction is limited Also termed “hold” Preparation and retraction have to do with the previous pose
49
Gesture Recognition So if we search for gestures:
We know they start with limited (or no) movement (the hold) Followed by a more or less predefined movement Ending with another hold Two options: Search for the holds and consider everything in between to be the gesture Consider n subsequent frames (e.g. half a second) and consider those the gesture
50
Gesture Recognition Final challenge: classify the gesture
Typically, we have tracked a hand: Position (x,y) for each frame Possibly speed (x’,y’) for each frame
51
Gesture Recognition If we consider the last n frames, we can make a list of 2D positions: {(x0,y0), (x1,y1)… (xn,yn)} Starting position is often not important: we can “normalize” for it: {(0,0), (x1-x0,y1-y0)… (xn-x0,yn-y0)} We need to compare this list to a number of gesture templates: Each template is a “key” and is encoded as a list of 2D points
52
Gesture Recognition Comparing two sets of 2D points:
Easy if the sets have equal length Requires some tricks if this is not the case When the sets have equal length, we can pair-wise compare the points: (Euclidean) distance between points 0…n-1 of template (xt,yt) and frame (xf,yf): Use cv::norm(…) Sum is the total difference between gesture and template Lower distance → more similar Use a threshold to determine whether the gesture was made
53
Gesture Recognition When the sets are different in length:
We need to “align” them Several tricks: Time stretching/contraction (e.g. Dynamic Time Warping) Elastic matching A small demo:
54
Gesture Recognition Pose/gesture recognition recap:
You can recognize poses: Silhouettes → template matching You can look at motion: Calculate optical flow Do tracking (CamShift or track blobs) Instantaneous criteria (above threshold, certain direction) Over time → gesture recognition (using templates)
55
Questions?
56
Assignment The assignment is online: “Ambient Toilet Intelligence”
Your main tasks: Using a computer with a webcam Recognize a gesture or sign with a hand to make the system spray once or twice The system must recognize whether or not hands are washed. If not, let the LCD screen on the toilet freshener show an appropriate message Questions?
57
Next lecture Thursday March 8, 2015, 09:00-10:45 UNNIK-GROEN Computer Vision – Face Detection Practical sessions assignment 2: March 8, 11:00-12:45 BBG 106, 103, 109, 112 March 22, 11:00-12:45 BBG 106, 103, 109, 112