Interaction Technology 2016


Interaction Technology 2016
Faculty of Science, Information and Computing Sciences
Shape and Gesture Recognition
Coert van Gemeren

Today
- Shape recognition
- Poses and gestures
- Pose recognition
- Optical flow and tracking
- Gesture recognition
- Assignments

Shape recognition

Shape recognition
- We know how to isolate shapes in an image using cv::inRange on the pixels in a given color range
- We know how to find the contours and the convex hull of the different shapes using cv::findContours and cv::convexHull

Shape recognition
Using a simple minmax function we can find the spatial extremes of the hull:

    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(…);
    std::vector<std::vector<cv::Point>> hull(contours.size());
    for (int i = 0; i < (int) contours.size(); ++i) {
        cv::convexHull(cv::Mat(contours[i]), hull[i], false);
        auto minimax_x = std::minmax_element(
            hull[i].begin(), hull[i].end(),
            [](const cv::Point &p1, const cv::Point &p2) { return p1.x < p2.x; });
        // Leftmost and rightmost points of this hull (note: the iterators
        // returned by std::minmax_element are only valid inside the loop):
        cv::Point left(minimax_x.first->x, minimax_x.first->y);
        cv::Point right(minimax_x.second->x, minimax_x.second->y);
    }

Poses and gestures
More shape properties: we can use “image moments” to find typical characteristics.
cv::moments returns a structure with three types of properties:
- spatial moments (m00, m10, m01, …, m30, …, m03)
- central moments (mu20, mu11, …)
- central normalized moments (nu20, nu11, …)
Useful to calculate the mass center:

    auto moment = cv::moments(contours[i]);
    cv::Point center(cvRound(moment.m10 / moment.m00),
                     cvRound(moment.m01 / moment.m00));

So what is a moment?

    m_{ji} = \sum_{x,y} \left( \mathrm{array}(x,y) \cdot x^j \cdot y^i \right)

cv::moments calculates them up to the 3rd order.

Poses and gestures
Hu’s moments (Hu, 1962)
- Hu’s moments are especially useful to compare shapes
- No need to use them directly: use cv::matchShapes on 2 contours, which returns a scalar distance
- Nice property: scale, rotation and reflection invariant!

Poses and gestures
Hu’s moments on “original” compared to five test shapes:
1: 0.992691
2: 0.723344
3: 0.689404
4: 0.462559
5: 0.11041
(Lower values mean more similar shapes.)
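As an illustration of how such a comparison could be set up, here is a minimal sketch; the file names shape_a.png and shape_b.png are hypothetical, and CONTOURS_MATCH_I1 is one of the three Hu-moment distance variants OpenCV offers:

    #include <opencv2/opencv.hpp>
    #include <iostream>

    int main() {
        // Load two shapes as grayscale images (hypothetical file names).
        cv::Mat a = cv::imread("shape_a.png", cv::IMREAD_GRAYSCALE);
        cv::Mat b = cv::imread("shape_b.png", cv::IMREAD_GRAYSCALE);
        cv::threshold(a, a, 128, 255, cv::THRESH_BINARY);
        cv::threshold(b, b, 128, 255, cv::THRESH_BINARY);

        std::vector<std::vector<cv::Point>> ca, cb;
        cv::findContours(a, ca, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
        cv::findContours(b, cb, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

        // Hu-moment based distance between the first contour of each image.
        double d = cv::matchShapes(ca[0], cb[0], cv::CONTOURS_MATCH_I1, 0.0);
        std::cout << "Shape distance: " << d << std::endl;
        return 0;
    }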

Questions?

Poses and gestures
Humans can pose in many different ways.
Poses are often expressive or meaningful.

Poses and gestures
What do these poses tell you?

Poses and gestures
What do you think these people are trying to make you do?

Poses and gestures
What is this facial expression telling you?
Without the context of the pose, a facial expression may be deceiving!
More about faces and facial expressions next lecture!

Poses and gestures
We can infer meaning or intention from a pose:
- That makes poses suitable as a means of interaction
- We can associate different commands with different poses

Poses and gestures
Poses are static: a single pose conveys the meaning.
Gestures are dynamic movements of the body in time: the movement, not the pose, conveys the meaning.

Poses and gestures
Gestures are, by definition, a means of interaction.
Sign language is highly standardized:
- Consists of static body poses and dynamic gestures
- Gestures made with the hand or the arm

Poses and gestures
Finger gestures are commonly used on smartphones, but they require a touch sensor.

Poses and gestures
We focus on arm gestures:
- Larger movement → can be detected more robustly using computer vision
- Precise position (e.g. dragging) is less important

Pose recognition
How to detect poses using computer vision?
We need to detect a distinct pose per “key”/command.
An analysis:
- Poses are more recognizable when the arms are involved
- Poses are more recognizable when the person faces the camera

Pose recognition
Silhouettes already carry quite some information, so let’s start with those.
A silhouette is basically a binary cut-out of a person.
It can be obtained using background subtraction (see the sketch below).
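A minimal sketch of such a background-subtraction pipeline using OpenCV’s built-in MOG2 subtractor; the threshold and blur values are illustrative assumptions:

    #include <opencv2/opencv.hpp>

    int main() {
        cv::VideoCapture cap(0);                      // webcam
        auto subtractor = cv::createBackgroundSubtractorMOG2();
        cv::Mat frame, mask;
        while (cap.read(frame)) {
            subtractor->apply(frame, mask);           // foreground mask
            // MOG2 marks shadow pixels as 127; keep only certain foreground.
            cv::threshold(mask, mask, 200, 255, cv::THRESH_BINARY);
            cv::medianBlur(mask, mask, 5);            // remove speckle noise
            cv::imshow("silhouette", mask);
            if (cv::waitKey(30) == 27) break;         // Esc quits
        }
        return 0;
    }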

Pose recognition
Once we have found the blob, we need to check whether it is a person in a certain position:
- Look at the shape of the blob, like we did with the hands?
- Look at the width/height ratio?
- Look at the pixels?

Pose recognition
If there is a limited number of poses that we would like to recognize, each can be a “template”.
[Slide figure: three example pose templates, labeled Command #1, Command #2 and Command #3]

Pose recognition
A template describes what the pose looks like and has an associated label (“key”).
The description can be anything:
- An image
- A height/width ratio
- Etc.

Pose recognition
We can compare a “person blob” to all our templates; if a template “matches” our blob, the pose is recognized.
We don’t use contours this time, but the entire template surface.
Matching requires that we can measure differences:
- Easy for height/width ratios
- Slightly more difficult for images

Pose recognition
Matching two binary images of similar size:
- Check each pixel (x,y) in template and person blob
- Count the total number of pixels that differ (as a percentage)
Given that the images are binary matrices, we can calculate the XOR of the two matrices; cv::bitwise_xor(…) does the trick (see the sketch below).
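A minimal sketch, assuming blob and tmpl are same-size binary CV_8U images with values 0/255:

    #include <opencv2/opencv.hpp>

    // Fraction of pixels that differ between two same-size binary images.
    double binaryDifference(const cv::Mat &blob, const cv::Mat &tmpl) {
        cv::Mat diff;
        cv::bitwise_xor(blob, tmpl, diff);            // 255 where they disagree
        return (double) cv::countNonZero(diff) / (double) diff.total();
    }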

Pose recognition
The percentage of differing pixels is a measure of similarity:
- Lower percentage → more similar
When the percentage is “low enough”, we can say that the pose is recognized.
This requires that we empirically set a threshold, with the usual issues:
- Threshold too high → we recognize poses that we shouldn’t
- Threshold too low → we don’t recognize poses we should

Pose recognition
If two binary images do not have the same size (see the sketch below):
- You can resize the person blob to fit the template
  - Scaling is handled automatically
  - Height/width ratio can be off
- You can add a border to the smaller image
  - Scaling is not handled
  - Height/width ratio is preserved
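A sketch of both options in one hypothetical helper; it assumes the template is at least as large as the blob in both dimensions, and pads with black:

    #include <opencv2/opencv.hpp>

    cv::Mat fitToTemplate(const cv::Mat &blob, const cv::Mat &tmpl, bool keepRatio) {
        cv::Mat out;
        if (!keepRatio) {
            // Option 1: resize; the height/width ratio can be off.
            cv::resize(blob, out, tmpl.size());
        } else {
            // Option 2: pad with a black border; the ratio is preserved.
            cv::copyMakeBorder(blob, out,
                               0, tmpl.rows - blob.rows,   // top, bottom
                               0, tmpl.cols - blob.cols,   // left, right
                               cv::BORDER_CONSTANT, cv::Scalar(0));
        }
        return out;
    }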

Pose recognition
OpenCV has a function cv::matchTemplate(…) that finds a (small) image in a (larger) image:
- Works with grayscale as well as color images
- “Slides” the template across the image (see the sketch below)
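A minimal sketch of such a search; findBestMatch is a hypothetical helper, and image and tmpl are assumed to be already-loaded grayscale cv::Mats:

    #include <opencv2/opencv.hpp>

    // Returns the top-left corner of the best match of tmpl in image.
    cv::Point findBestMatch(const cv::Mat &image, const cv::Mat &tmpl) {
        cv::Mat result;
        cv::matchTemplate(image, tmpl, result, cv::TM_CCOEFF_NORMED);
        double minVal, maxVal;
        cv::Point minLoc, maxLoc;
        cv::minMaxLoc(result, &minVal, &maxVal, &minLoc, &maxLoc);
        return maxLoc;   // for TM_CCOEFF_NORMED, higher scores are better
    }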

Pose recognition
cv::matchTemplate(…) does not work when the template and the pose in the image are of different size:
- You could have several templates at several scales
- Select the best match

Pose recognition
Remember the challenges of background subtraction:
- What happens if other people move in the background?
- What happens if there are shadows?
- Etc.
cv::matchTemplate(…) allows you to only look at part of the body:
- Crop to the upper body (or a body part)
- Saves you the hassle with shadows

Pose recognition
Think about how to create your templates:
- They should be taken from the same viewpoint (camera height and angle)
- They should generalize to different people
- They should be intuitive!

Questions?

Optical flow
Optical flow is the apparent motion of the pixels in an image.
It describes for each pixel how it moves from one frame to the next.
All these “flow vectors” together are called the “flow field”.

Optical flow
Flow says something about image movement:
- Can be used to look at gestures
- Can be used to look at gross body movement

Optical flow
For two subsequent images, we can calculate the flow field: a flow vector (u,v) for each pixel (x,y).
- u describes horizontal displacement
- v describes vertical displacement
Further processing:
- Take the average of the u and v values → average displacement (see the sketch below)
- Also look at large motion vectors
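A sketch of computing the average displacement with Farnebäck dense optical flow; the algorithm parameters are the values used in the OpenCV examples (an assumption, not a requirement):

    #include <opencv2/opencv.hpp>

    // Average displacement (u,v) between two grayscale frames.
    cv::Point2f averageFlow(const cv::Mat &prevGray, const cv::Mat &nextGray) {
        cv::Mat flow;   // CV_32FC2: one (u,v) vector per pixel
        cv::calcOpticalFlowFarneback(prevGray, nextGray, flow,
                                     0.5, 3, 15, 3, 5, 1.2, 0);
        cv::Scalar mean = cv::mean(flow);             // per-channel means
        return cv::Point2f((float) mean[0], (float) mean[1]);
    }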

Optical flow
If we calculate flow over an entire image, we also include irrelevant movements:
- In the background
- Movement of the legs
- Etc.
We could consider only movement in a person blob:
- Use the silhouette after background subtraction
- Or skin colored regions

Optical flow
If we extract skin color blobs (last lecture), we can “track” these: find the movement from frame to frame.
This requires that we can “link” detections → find out which blob in frame x corresponds to which blob in frame x+1.
Linking can be based on (previous) position, movement and size.
Two challenges:
- Dealing with overlapping blobs (hand over face, hands over each other)
- Dealing with merging/splitting or appearing/disappearing blobs

Optical flow
Tracking requires that you make assumptions:
- How fast can movement be?
- How quickly can sizes change?
- How many blobs (so persons) can there be?
- Can people have short sleeves?

Optical flow
We can also track “patches” (small images) within a larger image:
- Much like template matching, but over time
- Algorithm available in OpenCV: cv::CamShift

Optical flow
CamShift requires a small “template” or “patch” (e.g. a hand), which can be “initialized” from one frame.
Processing steps for initialization:
- Find skin color blobs
- Figure out which is a hand
- Create a patch
Processing steps for tracking:
- Find the patch in subsequent frames
- Just apply CamShift (see the sketch below)
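A sketch of one CamShift tracking step, following the standard OpenCV camshiftdemo recipe; it assumes hist is a hue histogram computed once from the initial patch, and trackWindow holds the patch location found during initialization:

    #include <opencv2/opencv.hpp>

    cv::RotatedRect trackStep(const cv::Mat &frame, const cv::Mat &hist,
                              cv::Rect &trackWindow) {
        cv::Mat hsv, hue, backproj;
        cv::cvtColor(frame, hsv, cv::COLOR_BGR2HSV);
        int ch[] = {0, 0};
        hue.create(hsv.size(), CV_8U);
        cv::mixChannels(&hsv, 1, &hue, 1, ch, 1);     // extract the hue channel

        float hranges[] = {0, 180};
        const float *ranges = hranges;
        cv::calcBackProject(&hue, 1, 0, hist, backproj, &ranges);

        // CamShift updates trackWindow in place and returns the rotated box.
        return cv::CamShift(backproj, trackWindow,
                            cv::TermCriteria(cv::TermCriteria::EPS |
                                             cv::TermCriteria::COUNT, 10, 1));
    }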

Optical flow
OpenCV supports a number of optical flow algorithms:
- cv::optflow::calcOpticalFlowSF
- cv::optflow::createOptFlow_DeepFlow
- cv::optflow::calcOpticalFlowSparseToDense
- cv::optflow::createOptFlow_Farneback
And tracking based on patches:
- cv::CamShift
And a tracker based on position/speed measurement:
- cv::KalmanFilter
Be aware that some of these may require you to compile the OpenCV contrib modules yourself!
For more info, see: http://putuyuwono.wordpress.com/2015/04/23/building-and-installing-opencv-3-0-on-windows-7-64-bit/

Questions?

Gesture Recognition
Once we have found movement in an image, we need to “classify” it:
- Assign a label from a limited set of labels (e.g. pointing right)
- Determine which “key” was pressed
Two options:
- Do not consider the temporal aspect
- Consider the temporal aspect

Gesture Recognition
Without time: find a particular movement at a specific time instant.
Examples:
- Average horizontal movement in the image exceeds a threshold
- One hand moves at least x pixels to the left, the other hand moves at least x pixels to the right
Make sure you “mask” the classification: you don’t want to classify the same gesture 3 frames in a row (see the sketch below).
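A sketch of such an instantaneous check, reusing the averageFlow helper from the optical flow part; the threshold and the 15-frame masking period are arbitrary assumptions:

    // Inside the per-frame loop; prevGray/currGray are consecutive gray frames.
    static int cooldown = 0;
    const float kThreshold = 2.0f;                    // pixels/frame (assumed)
    cv::Point2f avg = averageFlow(prevGray, currGray);
    if (cooldown > 0) {
        --cooldown;                                   // mask recent detections
    } else if (avg.x > kThreshold) {
        std::cout << "gesture: swipe right" << std::endl;
        cooldown = 15;                                // ignore next 15 frames
    }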

Gesture Recognition
One remark about frame rates:
- The amount of movement per frame depends on the frame rate of your application
- Lower number of frames per second (fps) → more movement per frame
- You might need to normalize for different frame rates (see below)
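For example, per-frame displacement can be converted to per-second displacement (a sketch; avg comes from the sketch above and the fps value is assumed to be available from the capture device):

    double fps = cap.get(cv::CAP_PROP_FPS);           // frames per second
    // Per-frame motion scales inversely with fps; normalize to pixels/second:
    cv::Point2f perSecond = avg * (float) fps;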

Gesture Recognition
With time: we have to consider movement over time, so over a number of frames.
This allows us to look at:
- More subtle movement that is performed over some time (e.g. movement for half a second)
- Movement that changes over time (e.g. drawing a circle)

Gesture Recognition
Gestures typically follow several phases:
- Preparation
- Stroke
- Retraction

Gesture Recognition
The stroke is the “actual” gesture:
- It looks more or less the same every time
- Movement during preparation and retraction is limited; such a low-movement phase is also termed a “hold”
- Preparation and retraction have to do with the previous pose

Gesture Recognition
So if we search for gestures, we know they:
- Start with limited (or no) movement (the hold)
- Are followed by a more or less predefined movement
- End with another hold
Two options:
- Search for the holds and consider everything in between to be the gesture
- Consider n subsequent frames (e.g. half a second) and consider those the gesture

Gesture Recognition
Final challenge: classify the gesture.
Typically, we have tracked a hand:
- Position (x,y) for each frame
- Possibly speed (x’,y’) for each frame

Gesture Recognition
If we consider the last n frames, we can make a list of 2D positions:
    {(x_0,y_0), (x_1,y_1), …, (x_n,y_n)}
The starting position is often not important, so we can “normalize” for it:
    {(0,0), (x_1−x_0, y_1−y_0), …, (x_n−x_0, y_n−y_0)}
We need to compare this list to a number of gesture templates:
- Each template is a “key” and is encoded as a list of 2D points

Gesture Recognition
Comparing two sets of 2D points:
- Easy if the sets have equal length
- Requires some tricks if this is not the case
When the sets have equal length, we can compare the points pair-wise:
- (Euclidean) distance between points 0…n−1 of template (x_t,y_t) and frame (x_f,y_f): use cv::norm(…)
- The sum is the total difference between gesture and template
- Lower distance → more similar
- Use a threshold to determine whether the gesture was made (see the sketch below)
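A minimal sketch of this pairwise comparison, assuming both traces are already normalized and of equal length:

    #include <opencv2/opencv.hpp>

    // Total Euclidean distance between two equal-length gesture traces.
    double gestureDistance(const std::vector<cv::Point2f> &gesture,
                           const std::vector<cv::Point2f> &tmpl) {
        double sum = 0.0;
        for (size_t i = 0; i < gesture.size(); ++i)
            sum += cv::norm(gesture[i] - tmpl[i]);    // per-point distance
        return sum;                                   // lower → more similar
    }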

Gesture Recognition
When the sets differ in length, we need to “align” them. Several tricks:
- Time stretching/contraction (e.g. Dynamic Time Warping)
- Elastic matching
A small demo: http://depts.washington.edu/aimgroup/proj/dollar/

Gesture Recognition
Pose/gesture recognition recap:
You can recognize poses:
- Silhouettes → template matching
You can look at motion:
- Calculate optical flow
- Do tracking (CamShift or track blobs)
- Instantaneous → criteria (above threshold, certain direction)
- Over time → gesture recognition (using templates)

Questions?

Assignment
The assignment is online: “Ambient Toilet Intelligence”
http://www.cs.uu.nl/docs/vakken/b3it/assignment2.html
Your main tasks, using a computer with a webcam:
- Recognize a gesture or sign with a hand to make the system spray once or twice
- The system must recognize whether or not hands are washed; if not, let the LCD screen on the toilet freshener show an appropriate message
Questions?

Next lecture
Thursday March 8, 09:00–10:45, UNNIK-GROEN: Computer Vision – Face Detection
Practical sessions assignment 2:
- March 8, 11:00–12:45, BBG 106, 103, 109, 112
- March 22, 11:00–12:45, BBG 106, 103, 109, 112