Multimodal Interaction Dr. Mike Spann

Contents
Introduction
Lip feature extraction and tracking
Summary

Lip feature extraction and tracking
Lip feature tracking is an important step in combining audio and visual cues for speech recognition systems
Typically the lip boundaries (inner/outer/both) are tracked and the shape features passed to the speech recognition module
Previous approaches
Active contour models (snakes)
Energy function minimisation used to control the contour shape (curvature) and attract it to local greylevel (colour) gradients
Can be dependent on weighting parameters which need to be tuned

Lip feature extraction and tracking
Typically an energy function E is defined in terms of the parameterised snake v(s) = (x(s), y(s)), where s is the distance along the snake:

E = \int_0^1 \left[ \alpha\,|v'(s)|^2 + \beta\,|v''(s)|^2 - \gamma\,|\nabla I(v(s))|^2 \right] ds

The first two terms represent the snake's internal energy and control its tension and rigidity
The third term attracts the snake to object boundaries with high greylevel gradient
Often an additional term is added for a 'balloon' snake to either inflate or deflate the snake
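As an illustration, here is a minimal numpy sketch of the discrete form of this energy for a closed contour; the weights alpha, beta, gamma and the gradient-magnitude image grad_mag are assumed inputs rather than anything specified in the slides:

```python
import numpy as np

def snake_energy(v, grad_mag, alpha=0.1, beta=0.05, gamma=1.0):
    """Discrete snake energy for a closed contour v of shape (n, 2)."""
    # Finite differences approximate v'(s) and v''(s)
    d1 = np.roll(v, -1, axis=0) - v
    d2 = np.roll(v, -1, axis=0) - 2 * v + np.roll(v, 1, axis=0)
    internal = np.sum(alpha * (d1 ** 2).sum(axis=1) + beta * (d2 ** 2).sum(axis=1))
    # Sample |grad I| at the (rounded) contour points; high gradient lowers E
    xs = np.clip(v[:, 0].round().astype(int), 0, grad_mag.shape[1] - 1)
    ys = np.clip(v[:, 1].round().astype(int), 0, grad_mag.shape[0] - 1)
    external = -gamma * np.sum(grad_mag[ys, xs] ** 2)
    return internal + external
```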

Lip feature extraction and tracking

More recent approaches to lip localisation and tracking have been model-based
A statistical shape model of the inner and outer lip contours can be built from training data
Landmarks on the contour form pointsets:

x = (x_1, y_1, x_2, y_2, \ldots, x_n, y_n)^T

We need to align the pointsets and then build a statistical model using PCA

Lip feature extraction and tracking
Pointsets of lip feature landmarks must be normalised for translation, scale and rotation
We can use a simple iterative algorithm to align each pointset to the mean pointset
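One possible sketch of this iterative alignment (a form of generalised Procrustes analysis); the function names and the scale-fixing step are illustrative choices, not taken from the slides:

```python
import numpy as np

def align_to(x, ref):
    """Align pointset x (n, 2) to ref (n, 2) with a similarity transform."""
    xc, rc = x - x.mean(axis=0), ref - ref.mean(axis=0)
    u, sig, vt = np.linalg.svd(xc.T @ rc)
    rot = u @ vt                                # optimal rotation (up to reflection)
    scale = sig.sum() / (xc ** 2).sum()         # optimal scale
    return scale * xc @ rot + ref.mean(axis=0)

def align_pointsets(shapes, n_iter=10):
    """Iteratively align all shapes (m, n, 2) to their evolving mean."""
    mean = shapes[0]
    for _ in range(n_iter):
        shapes = np.stack([align_to(s, mean) for s in shapes])
        mean = shapes.mean(axis=0)
        mean = mean - mean.mean(axis=0)         # re-centre the mean...
        mean = mean / np.linalg.norm(mean)      # ...and fix its scale
    return shapes, mean
```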

Lip feature extraction and tracking
PCA is based on the mean and covariance of the pointset vectors computed across the training set:

\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i \qquad S = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T

We then compute our shape model by solving the eigenvector/eigenvalue equation:

S \Phi = \Phi \Lambda

where Λ is a diagonal matrix of eigenvalues
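A compact numpy sketch of building this PCA shape model; X stacks the aligned, flattened pointsets one per row, and the names are illustrative:

```python
import numpy as np

def build_shape_model(X):
    """X: (N, 2n) matrix of aligned pointset vectors, one shape per row."""
    x_bar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)              # (2n, 2n) covariance matrix
    eigvals, P = np.linalg.eigh(S)           # eigh since S is symmetric
    order = np.argsort(eigvals)[::-1]        # sort modes by decreasing variance
    return x_bar, P[:, order], eigvals[order]
```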

Lip feature extraction and tracking
We can represent each landmark pointset x by a corresponding shape vector b:

x = \bar{x} + P b \qquad b = P^T (x - \bar{x})

The set of b_i's across all of the pointsets in the database represents the i-th mode of variation of the original data
We can vary each b_i to get realistic versions of lip shapes
Typically the variation is limited in terms of the i-th eigenvalue λ_i:

-3\sqrt{\lambda_i} \le b_i \le 3\sqrt{\lambda_i}
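For example, plausible new lip shapes can be synthesised by varying one mode at a time within these limits; a sketch built on the model above:

```python
import numpy as np

def synthesise_shape(x_bar, P, eigvals, mode, k=2.0):
    """Vary a single shape mode by k standard deviations."""
    b = np.zeros(len(eigvals))
    b[mode] = k * np.sqrt(eigvals[mode])     # e.g. +/-2*sqrt(lambda_i)
    return x_bar + P @ b                     # x = x_bar + P b
```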

Lip feature extraction and tracking

An active shape model samples greylevels perpendicular to the lip contour, centred at the model points

Lip feature extraction and tracking
We sample the profiles perpendicular to the contour at each model point j
Training image i then gives us a vector of greylevels g_ij
We concatenate all these greylevel vectors to give a global profile vector h_i
We build a statistical model from these profile vectors so that the main modes of variation of the profiles about the model boundary can be computed
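A sketch of sampling a fixed-length greylevel profile along the boundary normal at one model point; nearest-pixel lookup is used for brevity, and the names and profile length are illustrative:

```python
import numpy as np

def sample_profile(image, point, normal, half_len=5):
    """Sample 2*half_len+1 greylevels along the unit normal at a model point."""
    offsets = np.arange(-half_len, half_len + 1)
    coords = point + offsets[:, None] * normal   # pixels along the normal
    xs = np.clip(coords[:, 0].round().astype(int), 0, image.shape[1] - 1)
    ys = np.clip(coords[:, 1].round().astype(int), 0, image.shape[0] - 1)
    return image[ys, xs].astype(float)

# Concatenating the profiles at all model points gives the global vector h_i:
# h = np.concatenate([sample_profile(img, p, n) for p, n in zip(points, normals)])
```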

Lip feature extraction and tracking
The weight vectors b_h can be used as parameters in a cost function to determine how well the actual profile fits the model

Lip feature extraction and tracking
The greylevels between profile vectors can be interpolated to visualise the greylevel models
Some smoothing using a median filter helps remove any artefacts of the interpolation
We can visualise several modes corresponding to the first few eigenvectors
The corresponding components of the weight vector b_h can be varied according to:

h = \bar{h} + P_h b_h

For example we can set b_{h,i} to \pm 2\sqrt{\lambda_i} for i = 1, 2, 3

Lip feature extraction and tracking
Mode 1: global illumination differences
Mode 2: lower/upper lip intensity difference
Mode 3: skin/lip contrast differences
Higher modes: illumination variations, visibility of teeth and tongue etc.

Lip feature extraction and tracking
In order to apply an ASM search algorithm, a coarse estimate of the region of interest containing the lips is found
It can be input interactively or computed automatically using a segmentation or edge-based feature extraction algorithm
It provides an estimate of the scale of the lips
It limits the search area

Lip feature extraction and tracking

In order to use the greylevel and shape models in a search algorithm, we use the greylevel model to find the best fit of the model greylevel profile to the profile currently measured in the image
The shape and pose parameters can then be updated
We need a cost function which describes the fit between the model greylevel profile and the profile measured in the image at the current model position
Several statistical approaches are possible
Maximising the probability assuming Gaussian distributions
Minimising the mean square error between the profiles

[Figure: the sample profile h measured at the current model position]

Lip feature extraction and tracking
We can define an error function E measuring the mismatch between the actual profile h measured at the current position estimate and our model profile h_m:

E = |h - h_m|^2

Substituting for h_m = \bar{h} + P_h b_h:

E = |h - \bar{h} - P_h b_h|^2

Typically h_m would comprise only the first few modes of variation
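A direct numpy translation of this error measure; h_bar and P_h come from the greylevel PCA model, and the optional t argument (an assumption, not in the slides) keeps only the first few modes:

```python
import numpy as np

def profile_error(h, h_bar, P_h, b_h, t=None):
    """Mismatch E = |h - h_bar - P_h b_h|^2, optionally truncated to t modes."""
    if t is not None:
        P_h, b_h = P_h[:, :t], b_h[:t]
    residual = h - h_bar - P_h @ b_h
    return residual @ residual
```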

Lip feature extraction and tracking
The model is initialised with the mean shape computed over the aligned shapes in the training set
Our goal is to minimise our energy function E in terms of the translation parameters t_x and t_y, a scale parameter s and a rotation angle θ, along with the profile parameter vector b_h
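The pose parameters act on the model points through a similarity transform T; the definition below follows the standard ASM formulation, since the slides' own expression for T was lost with the figures:

```python
import numpy as np

def pose_transform(points, tx, ty, s, theta):
    """T(x) = s * R(theta) * x + t, applied to an (n, 2) array of model points."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return s * points @ R.T + np.array([tx, ty])
```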

Lip feature extraction and tracking
Optimisation is carried out by perturbing individual parameters and evaluating their effect on the energy function E
Only a few (typically 10-20) shape modes are used in the search to ease the computational burden
Perturbations in b_i are limited to:

|b_i| \le 3\sqrt{\lambda_i}

For a given position of the model landmarks, the profile h is sampled and b_h computed according to:

b_h = P_h^T (h - \bar{h})

Lip feature extraction and tracking
We can devise an iterative algorithm to update the pose and shape parameters sequentially based on our error measure
The algorithm alternates between 'model space' and 'image space'
The object boundary in model space is defined by the shape parameters
We can use the greylevel or colour profile information to measure the error in image space
Conversion between the two spaces is done via the pose parameters

[Figure: model space (shape parameters b) and image space (profile parameters b_h), linked by the pose parameters]

Lip feature extraction and tracking
1. Initialise the shape parameters b to zero and the image points y
2. Generate the model point positions: x = \bar{x} + P b
3. Find the pose parameters t_x, t_y, s, θ which best fit the model points to the image points y
4. Project the model points into the image frame, x -> T(x); compute the image profile vector h and, at each projected model point, search normal to the model boundary to find the image points y' which minimise E, producing a new image profile vector h'

Lip feature extraction and tracking
5. Project the image points y' into the model coordinate frame by inverting the transformation T
6. Update the model parameters: b = P^T (T^{-1}(y') - \bar{x})
7. If not converged, set y -> y' and go to step 2
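Putting the steps together, a schematic of the search loop; initial_points, fit_pose, search_along_normals and inverse_pose_transform are hypothetical helpers standing in for the operations described above, and the convergence test is simplified:

```python
import numpy as np

def asm_search(image, model, n_iter=20):
    """Schematic ASM search; model bundles x_bar, P, eigvals and the profile model."""
    b = np.zeros(model.n_modes)                       # step 1: shape params to zero
    y = initial_points(image)                         # coarse lips ROI estimate
    for _ in range(n_iter):
        x = (model.x_bar + model.P @ b).reshape(-1, 2)        # step 2: model points
        pose = fit_pose(x, y)                                 # step 3: best-fit pose
        x_img = pose_transform(x, **pose)                     # step 4: into image frame
        y_new = search_along_normals(image, x_img, model)     #         minimise E
        y_model = inverse_pose_transform(y_new, **pose)       # step 5: back to model frame
        b = model.P.T @ (y_model.ravel() - model.x_bar)       # step 6: update b
        b = np.clip(b, -3 * np.sqrt(model.eigvals),           # keep shape plausible
                    3 * np.sqrt(model.eigvals))
        if np.allclose(y, y_new):                             # step 7: converged?
            break
        y = y_new
    return pose, b
```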

[Figure: the image boundary, a model point, and the nearest image point to the model point]

Lip feature extraction and tracking
It is easier to track the outer lips than the inner ones
They have a more constant greylevel profile and are easier to model, for example with an active shape model
But they are less appropriate for lip gesture recognition and speech recognition algorithms
Often a full appearance model rather than just a shape model gives better speech recognition performance
For example, the appearance of the teeth and tongue gives clues to particular types of vocal sound

Lip feature extraction and tracking
Results of off-centre initialisation of the ASM using local greylevel profiles after 5, 10 and 20 iterations

Lip feature extraction and tracking Results using ASM search with local greylevel profiles

Lip feature extraction and tracking

Demo ts/lip_tracking/index.html

Summary
We have seen how a shape model, together with a model describing the greylevel or colour variation local to the shape model landmark positions, can be used to find the lip contour location in face images
We have described an iterative model-based search algorithm for lip contour location
We have shown lip tracking results based on this algorithm