July 2010 eNTERFACE. Application: surveillance in trains. Video and audio processing; sound localization, pattern recognition.

Lip reading and facial expression recognition: automatic recognition of facial expressions and lip reading using vector flow; a model-based approach.

What makes visual speech recognition so hard?
 Visemes: several phonemes map to the same visual appearance
 Smaller word separability
 There is less speech information in the video signal than in the audio signal
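
The many-to-one phoneme-to-viseme mapping is what shrinks word separability. A minimal sketch (the grouping below is illustrative; viseme inventories vary between studies and languages):

```python
# Illustrative phoneme-to-viseme grouping: phonemes that differ acoustically
# can share one visual class, so visually distinct words collapse together.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "t": "alveolar", "d": "alveolar", "s": "alveolar", "z": "alveolar",
    "k": "velar", "g": "velar",
}

def viseme_string(phonemes):
    """Map a phoneme sequence to its viseme sequence."""
    return [PHONEME_TO_VISEME.get(p, "other") for p in phonemes]

# "bat", "pat" and "mat" collapse to the same viseme sequence:
words = {w: viseme_string(list(w)) for w in ("bat", "pat", "mat")}
```

Under this grouping a lip reader cannot separate "bat", "pat" and "mat" at all, which is the separability problem the slide refers to.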

Lip-reading by humans
 People recognize speech better when the signal is both auditory and visual
 The difference in recognition rates grows with the level of noise in the environment

Inspiration: in Stanley Kubrick's 1968 film 2001: A Space Odyssey, the computer reads the conversation of two astronauts from their lip movements. Thirty years later, automated lip reading became a significant part of research on speech recognition systems.

Databases of different quality and resolution

Recording a new speech corpus (AV speech corpus). Talk outline: Visemes | Corpus | Tracking | Features | Applications | Problem | ASR | VSR | Training | Analysis | Conclusion | Recommendations

New speech corpus
 Dutch
 Recorded at high speed: 100 fps
 Front and profile views included
 70 people: 49 male, 21 female (students, professors, secretaries, friends)
 Utterances: sentences, digits, spelling, conversation starters/endings, open questions
 Speaking styles: normal, fast, whispering

ISFER Workbench examples (continued)

Active contours (snakes)
 Internal and external energies: the internal energy forces the contour to shrink, while a locally defined external energy forces the contour to stop at the edge of the mouth
 Computationally cheap
 Sensitive to the initial placement of the contour
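
The shrink-and-stop behaviour can be sketched in a few lines. This is a simplified gradient step, not the project's implementation; with the external force set to zero, only the internal energy acts and the contour contracts:

```python
import math

def snake_step(points, external_force, alpha=0.4, beta=1.0):
    """One gradient step of a simplified active contour.

    Each point moves toward the midpoint of its neighbours (internal
    energy: the contour shrinks and smooths) plus an external force
    sampled at the point (in practice derived from image edges, which
    holds the contour at the mouth boundary)."""
    n = len(points)
    new_points = []
    for i, (x, y) in enumerate(points):
        px, py = points[(i - 1) % n]
        nx, ny = points[(i + 1) % n]
        # internal term: pull toward the midpoint of the two neighbours
        ix, iy = (px + nx) / 2 - x, (py + ny) / 2 - y
        ex, ey = external_force(x, y)
        new_points.append((x + alpha * ix + beta * ex,
                           y + alpha * iy + beta * ey))
    return new_points

# With no external force the contour shrinks toward its centroid:
circle = [(math.cos(2 * math.pi * k / 12) * 10,
           math.sin(2 * math.pi * k / 12) * 10) for k in range(12)]
shrunk = circle
for _ in range(50):
    shrunk = snake_step(shrunk, lambda x, y: (0.0, 0.0))
```

The sensitivity to initialisation is visible here too: if the starting contour does not enclose the mouth, the internal energy simply collapses it elsewhere.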

Template matching
 Internal and external energies: the internal energy forces the template to maintain its geometry, while a globally defined external energy forces appropriate placement on the image
 Better results than with snakes
 Integrating the energy functions at each step can be very time-consuming
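
The slides describe deformable templates driven by energy functions; as a simpler illustration of the matching idea, here is a rigid, brute-force sum-of-absolute-differences search (a sketch, not the project's method):

```python
def match_template(image, template):
    """Exhaustive template match: return (row, col) of the placement
    minimising the sum of absolute differences (SAD).

    `image` and `template` are lists of lists of grey values. The whole
    template is placed as a rigid unit, so its geometry is preserved by
    construction; a deformable template additionally lets the shape bend
    under an internal-energy penalty."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best, best_pos = None, None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            sad = sum(abs(image[r + i][c + j] - template[i][j])
                      for i in range(th) for j in range(tw))
            if best is None or sad < best:
                best, best_pos = sad, (r, c)
    return best_pos

# A bright 2x2 blob hidden at row 1, column 2 of a dark image:
img = [[0, 0, 0, 0, 0],
       [0, 0, 9, 9, 0],
       [0, 0, 9, 9, 0],
       [0, 0, 0, 0, 0]]
tpl = [[9, 9],
       [9, 9]]
```

The exhaustive scan also shows why evaluating the energies at every candidate placement gets expensive as image and template grow.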

Model
 Goal: lip reading
 Needed: an accurate description of the visible parts of the articulatory system
 An accurate description of the mouth shape: measurements of the distance from the lips to the center of the mouth, and of the thickness of the visible part of the lips

Data processing (continued)
 Filtered image: intensity distribution, center of the mouth
 Image in polar coordinates
 Conditional distribution
 Mean and variance functions
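
Resampling the mouth region into polar coordinates around the detected center can be sketched as follows (nearest-neighbour sampling; the number of angles and radii are illustrative parameters):

```python
import math

def radial_mean(image, center, n_angles=8, n_radii=5):
    """Resample a grey image around `center` into polar coordinates and
    return, per angle, the mean intensity along the ray.

    The per-angle mean (and, analogously, variance) of such radial
    profiles is one way to summarise lip shape around the mouth center."""
    cy, cx = center
    h, w = len(image), len(image[0])
    means = []
    for a in range(n_angles):
        theta = 2 * math.pi * a / n_angles
        samples = []
        for r in range(1, n_radii + 1):
            y = int(round(cy + r * math.sin(theta)))
            x = int(round(cx + r * math.cos(theta)))
            if 0 <= y < h and 0 <= x < w:  # ignore rays leaving the image
                samples.append(image[y][x])
        means.append(sum(samples) / len(samples) if samples else 0.0)
    return means
```

On a uniform image every ray sees the same intensity, so all per-angle means coincide; on a real mouth image the profile varies with the lip boundary.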

Data visualization: single-frame data vector

Results of experiments: feed-forward backpropagation network. Test sentence: "Vanmiddag komt de pianostemmer langs om mijn vleugel te stemmen" (Dutch: "This afternoon the piano tuner is coming by to tune my grand piano").
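
For reference, the forward pass of such a feed-forward network is a few lines; the weights below are arbitrary illustrations, not the trained values from these experiments:

```python
import math

def forward(x, weights_hidden, weights_out):
    """Forward pass of a tiny fully connected network with a sigmoid
    hidden layer, as used by feed-forward/backprop classifiers.

    In a lip-reading setup `x` would be one frame's feature vector and
    the outputs would be class scores; training adjusts the weight
    matrices by backpropagation."""
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    hidden = [sigmoid(sum(wi * xi for wi, xi in zip(row, x)))
              for row in weights_hidden]
    return [sigmoid(sum(wi * hi for wi, hi in zip(row, hidden)))
            for row in weights_out]

# Illustrative weights: 2 inputs -> 2 hidden units -> 1 output score.
score = forward([1.0, -1.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, -1.0]])
```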

Tracking the face – optical flow
 Captures the apparent motion between subsequent images as a grid of motion vectors
 Advantages: no lip model required; good at capturing motion
 Disadvantage: slow
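
Dense optical-flow methods are more involved, but the "motion vector per grid cell" idea can be illustrated with simple block matching (a sketch, not the project's algorithm):

```python
def block_motion(prev, cur, top, left, size=2, search=2):
    """Estimate the motion vector of one grid block by exhaustive search:
    find the displacement (dy, dx) within +/-search pixels that minimises
    the sum of absolute differences between the block in `prev` and the
    displaced block in `cur`. The slowness noted above comes from doing
    this (or an equivalent optimisation) for every cell of every frame."""
    h, w = len(cur), len(cur[0])
    best, best_v = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= top + dy and top + dy + size <= h
                    and 0 <= left + dx and left + dx + size <= w):
                continue  # displaced block would leave the image
            sad = sum(abs(prev[top + i][left + j]
                          - cur[top + dy + i][left + dx + j])
                      for i in range(size) for j in range(size))
            if best is None or sad < best:
                best, best_v = sad, (dy, dx)
    return best_v

# A 2x2 bright block moves one pixel to the right between frames:
frame1 = [[0] * 6 for _ in range(6)]
frame2 = [[0] * 6 for _ in range(6)]
for i in (2, 3):
    for j in (2, 3):
        frame1[i][j] = 9
        frame2[i][j + 1] = 9
```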

Tracking the face – lip geometry estimation
 Applies color filters and captures the lip contours in polar coordinates
 Advantages: no lip model required; more or less person-independent
 Disadvantage: not robust to external factors
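
A common color filter for lips uses the pseudo-hue r/(r+g), since lips are redder than the surrounding skin. A sketch (the threshold and RGB values are illustrative; needing to retune them per lighting condition is exactly the robustness problem noted above):

```python
def lip_mask(pixels, threshold=0.6):
    """Binary lip mask from the pseudo-hue r / (r + g).

    `pixels` is a list of rows of (r, g, b) tuples; a pixel is marked 1
    when its pseudo-hue exceeds the (illustrative) threshold."""
    mask = []
    for row in pixels:
        mask.append([1 if r / (r + g + 1e-9) > threshold else 0
                     for (r, g, b) in row])
    return mask

# Illustrative skin and lip colors: lips have a higher red/green ratio.
skin = (180, 130, 110)
lip = (170, 80, 90)
frame = [[skin, lip, lip, skin]]
```

Because the filter uses a ratio rather than absolute intensity, it is fairly person-independent, but shadows and colored lighting still shift the pseudo-hue.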

Tracking the face – Active Appearance Models
 Point tracking according to a statistical lip model
 Disadvantage: requires annotated training images
 Advantages: robust against external factors; fast
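
Building the statistical lip model starts from the annotated landmark sets, which are aligned before PCA extracts the shape modes. A minimal sketch of the normalisation step (illustrative, not the project's exact pipeline; full Procrustes alignment also removes rotation):

```python
import math

def normalize_shape(points):
    """Normalise one annotated lip shape: translate its centroid to the
    origin and scale to unit RMS radius, so that shapes from different
    images become comparable before the statistical model is built."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    scale = math.sqrt(sum(x * x + y * y for x, y in centered) / n)
    return [(x / scale, y / scale) for x, y in centered]
```

After alignment, PCA on the stacked point coordinates yields the mean shape plus a small number of variation modes that the tracker fits to each frame.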

Active Appearance Models: design of the lip model

AAM model point coordinates

Feature extraction: features plotted for "F" over time (frames)

5-state HMM
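
The likelihood an HMM assigns to a feature sequence is computed with the forward algorithm. The 5-state left-to-right topology below is typical for word/viseme models, but the probabilities are illustrative, not trained values:

```python
def forward_prob(init, trans, emit, observations):
    """Likelihood of a discrete observation sequence under an HMM,
    computed with the forward algorithm: alpha[s] holds the probability
    of the observations so far ending in state s."""
    n = len(init)
    alpha = [init[s] * emit[s][observations[0]] for s in range(n)]
    for obs in observations[1:]:
        alpha = [sum(alpha[sp] * trans[sp][s] for sp in range(n))
                 * emit[s][obs] for s in range(n)]
    return sum(alpha)

# Illustrative 5-state left-to-right model over 2 observation symbols:
# each state either stays (0.6) or advances to the next state (0.4).
init = [1.0, 0.0, 0.0, 0.0, 0.0]
trans = [[0.6, 0.4, 0.0, 0.0, 0.0],
         [0.0, 0.6, 0.4, 0.0, 0.0],
         [0.0, 0.0, 0.6, 0.4, 0.0],
         [0.0, 0.0, 0.0, 0.6, 0.4],
         [0.0, 0.0, 0.0, 0.0, 1.0]]
emit = [[0.7, 0.3], [0.3, 0.7], [0.7, 0.3], [0.3, 0.7], [0.7, 0.3]]

likelihood = forward_prob(init, trans, emit, [0, 0, 1])
```

Recognition then amounts to evaluating the same sequence under each word's HMM and picking the model with the highest likelihood.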

Automatic bimodal human emotion recognition: automatic recognition of facial expressions using an Active Appearance Model; a model-based approach.

Face localization

User-interface prototype: an iCat that helps users with daily tasks.

M.A.E.L.I.A., our digital cat (H.C.I. Group)

Requirements, in other words… (speech bubbles over a day: 7:00, 8:00, 11:00, 14:00, 16:00)
 "Are you out of your mind? I am sleeping!!!"
 "Get a life! I am still sleeping!"
 "I am so bored! I wish I had a companion!"
 "I feel so lonely!!! I am very sad and depressed."
 "Finally I have a friend! I am so happy and I even managed to pick up the bone!"
 "Wow!!! AIBO! Bring me my newspaper!!! AIBO! Let's play!!! Follow me"

Multimodal communication (spoken words and thoughts from the example dialogue)
 "Uh, … I have no time to do anything with you."
 "Hello, do you like to chat with me?"
 "Uh, what a nerd."
 "I want a date."
 "She looks nice."

Multi-modal interaction

Would you like to join me for dinner?

Chat session
 A cup of tea?
 Mmh, njeh, I don't like tea.
 What's wrong with tea?
 Tea makes me sick.
 That's nonsense!!
 And my sister doesn't like you too!
 She is very disappointed!!
 Hihi, I was joking!!!
 Oh, that's funny!!!

Chat session, annotated with speaker and emoticon
 (f) A cup of tea? :-)
 (m) Mmh, njeh, I don't like tea. (:-(
 (f) What's wrong with tea? :-o
 (m) Tea makes me sick. %-\
 (f) That's nonsense!! :-||
 (f) My sister doesn't like you too! :-||
 (f) She is very disappointed!! :-(
 (m) Hihi, I was joking!!! ;-)
 (f) Oh, that's funny!!! :-]
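
Each annotated line pairs a speaker tag with a trailing emoticon that can drive the avatar's expression. A sketch of parsing such a line (the expression labels are hypothetical, not the project's category names):

```python
# Illustrative emoticon-to-expression table (labels are hypothetical).
# Longer emoticons come first so that "(:-(" is matched before ":-(".
EMOTICON_EXPRESSION = {
    ":-)": "happy", "(:-(": "sad", ":-o": "surprised",
    "%-\\": "disgusted", ":-||": "angry", ":-(": "sad",
    ";-)": "wink", ":-]": "content",
}

def parse_annotated_line(line):
    """Split a line like '(m) Tea makes me sick. %-\\' into
    (speaker, text, expression) by matching a known trailing emoticon."""
    if line.startswith("("):
        speaker, body = line[1], line[3:].strip()
    else:
        speaker, body = "?", line.strip()
    for emo, expr in EMOTICON_EXPRESSION.items():
        if body.endswith(emo):
            return speaker, body[: -len(emo)].strip(), expr
    return speaker, body, "neutral"
```

The returned expression label would then select the facial animation shown on the slides that follow.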

A cup of tea? :-)

Mmh, njeh, I don't like tea. (:-(

What's wrong with tea? :-o

Tea makes me sick. %-\

That's nonsense!! :-||

My sister doesn't like you too! :-||

She is very disappointed!! :-(

Hihi, I was joking!!! ;-)

Oh, that's funny!!! :-]