Österreichisches Forschungsinstitut für Artificial Intelligence. Featuring the GEMEP Corpus: Experiences and Future Plans. Hannes Pirker, OFAI, Vienna.



Overview
Features from the audio channel: segments and pitch contours
Features from the video channel: faces, silhouettes and hands
Discussion: what we do have & what we need

© ÖFAI, Wien
Features from Audio Channel
Phonetic segmentation into phonemes and syllables
Pitch extraction

Speech Analysis: Phonetic Segmentation
Phonetic segmentation works quite well (forced alignment with HTK)
Bootstrapping cycle: manual labelling of training data, training of HMMs, automatic alignment, manual correction of alignment results, re-training of HMMs
Processed data: Type 1 sentences ("Ne kal ibam sud molen")
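The bootstrapping cycle on this slide can be sketched as a small loop. This is a hedged illustration, not OFAI's actual pipeline: `train_hmms`, `align` and `manually_correct` are hypothetical stand-ins for the HTK tools (HERest, HVite) and the human labelling/correction passes.

```python
# Sketch of the forced-alignment bootstrapping cycle described above.
# All three helpers are hypothetical placeholders, not real HTK calls.

def train_hmms(labelled):
    """Stand-in for HMM training (e.g. HTK's HERest)."""
    return {"model_trained_on": len(labelled)}

def align(model, utterances):
    """Stand-in for automatic forced alignment (e.g. HTK's HVite)."""
    return [(u, "auto-labels") for u in utterances]

def manually_correct(alignments):
    """Stand-in for the human correction pass."""
    return [(u, "corrected-labels") for u, _ in alignments]

def bootstrap(seed_labels, utterances, rounds=2):
    labelled = list(seed_labels)
    model = train_hmms(labelled)           # 1. train on manual labels
    for _ in range(rounds):
        auto = align(model, utterances)    # 2. automatic alignment
        labelled = manually_correct(auto)  # 3. manual correction
        model = train_hmms(labelled)       # 4. re-train the HMMs
    return model

model = bootstrap([("u0", "hand-labels")], ["u1", "u2", "u3"])
```

Each round replaces the training material with the corrected alignments of the full utterance set, so the model is re-trained on progressively more data.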

Speech Analysis: Pitch Extraction
Testing different pitch-extraction methods from SFS (Mark Huckvale, UCL)
Promising results with fine-tuning of parameters for contour smoothing & high-pitch correction
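The two post-processing steps named above can be illustrated in a few lines. This is a minimal sketch, not the SFS implementation: median smoothing of an F0 contour, plus a simple "high-pitch" correction that halves implausibly high values (octave-doubling errors); the 300 Hz ceiling is an assumed, speaker-dependent parameter.

```python
# Hedged sketch of F0 contour smoothing and high-pitch (octave-error)
# correction; toy parameters, not the SFS defaults.

def median_smooth(f0, width=3):
    """Median-filter an F0 contour (values in Hz)."""
    half = width // 2
    out = []
    for i in range(len(f0)):
        window = f0[max(0, i - half):i + half + 1]
        out.append(sorted(window)[len(window) // 2])
    return out

def fix_octave_errors(f0, ceiling=300.0):
    """Halve F0 values above an assumed speaker-dependent ceiling."""
    return [v / 2 if v > ceiling else v for v in f0]

contour = [110.0, 112.0, 460.0, 115.0, 118.0]   # one spurious doubled frame
cleaned = median_smooth(fix_octave_errors(contour))
```

The spurious 460 Hz frame is first halved to 230 Hz and then smoothed away by the median filter.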

Sample 06joi112

Features from Video Channel
Face detection
Silhouettes & bounding boxes
Hand tracking

Face Detection Using OpenCV
Feature-based face detection with a pre-trained cascaded classifier works (almost) out of the box
Very good results under close-to-optimal conditions
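The pre-trained cascade mentioned above is built from Haar-like features evaluated via an integral image. As a self-contained sketch of that standard Viola–Jones building block (not OFAI's pipeline, and deliberately without the OpenCV dependency), the core idea looks like this:

```python
# Integral image + a two-rectangle Haar-like feature: the core primitives
# behind cascaded face detectors. Toy 2x4 "image" for illustration.

def integral_image(img):
    """Cumulative sums so any rectangle sums in O(1). img: list of rows."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle (x, y, w, h), via 4 lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_feature(ii, x, y, w, h):
    """Left-half minus right-half two-rectangle Haar-like feature."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

img = [[1, 1, 5, 5],
       [1, 1, 5, 5]]
ii = integral_image(img)
```

A cascade thresholds thousands of such features at increasing cost, rejecting non-face windows early.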

Face Detection Using OpenCV
Feature-based face detection with a pre-trained cascaded classifier
Clip: 06joi112

Silhouettes & Bounding Boxes
Combining silhouettes & bounding boxes in frontal and side view as a simple & robust estimator of
the dynamics of movements
the amount of expansion

Silhouettes & Bounding Boxes
Combining silhouettes & bounding boxes in frontal and side view as a simple & robust estimator
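The two-view idea above can be made concrete: the frontal silhouette's bounding box gives width and height, the side view's gives depth, and their product is the 3D bounding-box volume used as an expansion measure. A minimal sketch, assuming binary silhouette masks (1 = foreground):

```python
# Hedged sketch: axis-aligned bounding box of a binary silhouette mask,
# and the 3D bounding-box volume from a frontal and a side view.

def bounding_box(mask):
    """Bounding box (x, y, w, h) of a binary mask (list of rows)."""
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) if any(row)]
    return min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1

def bbox_volume(frontal_mask, side_mask):
    """Width x height from the frontal view, depth from the side view."""
    _, _, w, h = bounding_box(frontal_mask)
    _, _, d, _ = bounding_box(side_mask)    # side-view width = body depth
    return w * h * d

frontal = [[0, 1, 1, 0],      # toy silhouettes
           [0, 1, 1, 1]]
side    = [[1, 1, 0],
           [0, 1, 0]]
volume = bbox_volume(frontal, side)
```

Tracking this volume per frame yields the per-emotion dynamics and expansion curves shown on the following slides.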

1st Results: 3D Bounding-Box Volume per Emotion – Actor 01

1st Results: 3D Bounding-Box Volume per Emotion – Actor 07

3D Bounding-Box Volume: JOI vs. TRI / Actor 01 vs. Actor 06

Temporal Dynamics: Plotting Bounding Box & Speech Timing
Clip: 06joi112

Hand Tracking
Find skin areas in the 1st frame
Find the hand & use the center of the area as the hand position
Interactively accept or correct
Perform automatic tracking (using the mean-shift algorithm)
Interactively classify the quality of tracking: the GOOD (70%), the BAD (15%), the UGLY (15%)
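The mean-shift step named above can be sketched in a few lines: the tracker repeatedly moves a fixed-size window to the centroid of the skin-probability mass inside it until it stops moving. This is a hedged illustration; the probability map below is toy data, not the output of the real colour model.

```python
# Minimal mean-shift iteration over a 2D skin-probability map.

def window_centroid(prob, cx, cy, half):
    """Weighted centroid of `prob` inside a square window around (cx, cy)."""
    m = sx = sy = 0.0
    for y in range(max(0, cy - half), min(len(prob), cy + half + 1)):
        for x in range(max(0, cx - half), min(len(prob[0]), cx + half + 1)):
            m += prob[y][x]
            sx += x * prob[y][x]
            sy += y * prob[y][x]
    return (round(sx / m), round(sy / m)) if m else (cx, cy)

def mean_shift(prob, cx, cy, half=1, max_iter=20):
    """Shift the window to its local centroid until convergence."""
    for _ in range(max_iter):
        nx, ny = window_centroid(prob, cx, cy, half)
        if (nx, ny) == (cx, cy):            # converged
            break
        cx, cy = nx, ny
    return cx, cy

prob = [[0, 0, 0, 0],        # toy skin-probability map; mass bottom-right
        [0, 0, 1, 2],
        [0, 0, 2, 4]]
hand = mean_shift(prob, 1, 1)
```

Starting away from the hand, the window climbs towards the densest skin region, which is why tracking fails ("the UGLY") when another skin area, such as the face, drifts into the window.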

Different Actors – Different Results
Some actors are difficult to track, others are easier

Discussion
What have we gained? What do we need?

Investigations 1: Interaction
Influence of affect on gesture AND speech
The activation dimension should be reflected in speech and body movement in a parallel way, i.e. look at speed, effort etc.
Data already usable
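One way to test the parallel-reflection hypothesis above is to correlate a per-clip speech activation measure with a per-clip movement measure. A hedged sketch with made-up numbers (the variable names and values are illustrative, not corpus results):

```python
# Pearson correlation between toy per-clip speech-energy and
# movement-speed measures; a high r would support the hypothesis
# that activation drives both channels in parallel.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

speech_energy  = [0.2, 0.5, 0.9, 0.4, 0.7]   # one toy value per clip
movement_speed = [1.0, 2.1, 3.8, 1.9, 3.0]

r = pearson(speech_energy, movement_speed)
```

In practice the speech measure could come from the segmented audio (e.g. speech rate or energy) and the movement measure from the bounding-box dynamics already extracted.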

Investigations 2: Timing
Temporal aspects: look at the relative timing of speech and non-verbal signals, e.g. the location of strokes in relation to accented syllables
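Given the phonetic segmentation and the gesture annotations, the stroke-to-accent relation above reduces to computing, for each stroke onset, the signed offset to the nearest accented syllable. A minimal sketch with toy timestamps (all in seconds; negative offsets mean the stroke precedes the accent):

```python
# Offset of each gesture stroke to its nearest pitch-accent time.

def nearest_offsets(strokes, accents):
    """Signed stroke-minus-accent offset with the smallest magnitude."""
    return [min((s - a for a in accents), key=abs) for s in strokes]

stroke_onsets = [0.80, 2.10, 3.55]   # toy values, not corpus data
accent_times  = [1.00, 2.00, 3.60]

offsets = nearest_offsets(stroke_onsets, accent_times)
```

A histogram of such offsets over the corpus would show whether strokes systematically precede accented syllables, as the gesture literature predicts.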

Timing: Traditional Anchor Points in Speech
Borders of: syllables (+++), words (+++), (prosodic & syntactic) phrases (+ -), utterances (+?), turns (--)
Location of: pauses (+), (pitch) accents (+ --)

Temporal Alignment: Traditional Anchor Points in Gestures
Prepare, stroke, hold, retract
(Graphics by A. Marshall)

Temporal Alignment: Traditional Anchor Points in Gestures
Phases are difficult to obtain, but general information on dynamics is available
(Graphics by A. Marshall)

Investigations 3: Bodily Expression of Emotion
Relevant features (cf. Wallbott 1998):
Upper body: away from camera, collapsed
Shoulders: up, backward, forward
Head: down, back, turned or bent
Arms: lateral, stretched out frontally/sideways, crossed

Investigations 3: Bodily Expression of Emotion
Relevant features (cf. Wallbott 1998):
Hand form: fist(s), opening/closing
Movement qualities: activity, expansiveness, dynamics/energy/power

Investigations 3: Bodily Expression of Emotion
Our data currently focusses on hand location; upper body and posture are not directly assessed. Movement qualities are partially accessible.

What we DO have by now

What we might WANT to have

Possible Representations
H-ANIM/MPEG-4-style joint angles

Possible Representations
H-ANIM/MPEG-4-style joint angles
Gesticon/MURML/HamNoSys-style: wrist position in relation to the body, encoding of the stroke phase, etc.

Possible Representations
Less ambitious: symmetric/asymmetric, extended/collapsed, static/dynamic, …
In any case: we need to relate pixel numbers to anthropomorphic measures!
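The pixel-to-anthropomorphic conversion called for above amounts to a per-actor calibration. A hedged sketch: use a known body measure of the actor (here an assumed body height in metres) and its extent in pixels to turn hand displacements into metres; the calibration values are toy numbers, not GEMEP measurements.

```python
# Per-actor scale calibration: pixels -> metres via a known body measure.

def metres_per_pixel(body_height_m, body_height_px):
    """Scale factor from one known anthropomorphic measure."""
    return body_height_m / body_height_px

def px_to_m(pixels, scale):
    """Convert a pixel distance (same camera, same depth) to metres."""
    return pixels * scale

scale = metres_per_pixel(1.75, 350)      # assumed actor height / pixel height
hand_displacement_m = px_to_m(70, scale)
```

With such a scale, bounding-box volumes and hand trajectories become comparable across actors and camera setups, which the symbolic representations above require.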

Possible Representations
Also consider (manually supported) classification into prototypical classes

Summary
Numerous possibilities for improvements, e.g. in hand tracking (but is it necessary?)
Rather concentrate on representations in order to ensure the data really is useful and re-usable (e.g. for ECAs)

Summary
Time to bundle expertise and distribute efforts within

Thanks for your attention


Sample 06col112

Sample 06peu111
