Multi-modal expression of Swedish prominence
Björn Granström
Centre for Speech Technology (Centrum för talteknologi), Department of Speech, Music and Hearing, KTH, Stockholm, Sweden

Historical background
– Prosody for speech synthesis at KTH, together with Rolf Carlson
– The Lund intonation model – Gösta Bruce et al.

Several joint projects
Profs – Prosodic phrasing in Swedish: Gösta Bruce, Björn Granström and more
First reference: G. Bruce and B. Granström. Modelling Swedish intonation in a text-to-speech system. STL-QPSR, 30(1):17–21 (on the KTH web)

Potentially ambiguous sentences, varying in phrase boundary location

Entering Count Piper's humble residence

Several joint projects, cont.
Prosodiag – Prosodic Segmentation and Structuring of Dialogue (HSFR + NUTEK), 1993–1996
Gösta Bruce, Björn Granström, Kjell Gustafson, David House, Paul Touati
Project description: The object of study is the prosody of dialogue in a language technology framework. The primary goal of the project is to increase our understanding of how prosodic aspects of speech are exploited interactively in dialogue and, on the basis of this increased knowledge, to be able to create a more powerful prosody model.
Later reference: Gösta Bruce, Johan Frid, Björn Granström, Kjell Gustafson, Merle Horne, and David House. Prosodic segmentation and structuring of dialogue. TMH-QPSR, 37(3):1–6
More than 20 joint publications – and then?

Much in the context of the annual phonetics meetings – next:

Project meetings in inspiring surroundings

…probing many different cultures

Is prosody more than sound?
– Our bias: communication is multi-modal
– Traditionally, prosodic functions are signaled by “gestures”, perceived by “eye and ear”
– This concerns both body and face gestures
– Preliminary hypothesis: F0 ~ eyebrow height, e.g. Cavé et al. (1996)
– Easy to put to a test with multimodal speech synthesis

Eyebrow vs intonation
“Jag heter Axel, inte Axell” (translation: “My name is Axel, not Axell”). In Sweden, Axel is a first name, as opposed to Axell, which is a family name.
Stimulus conditions (a sketch of condition 2 follows below):
1. No eyebrow motion
2. Eyebrow motion controlled by the fundamental frequency of the voice
3. Eyebrow motion at focal accents
4. Eyebrow motion at the first focal accent
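
Below is a minimal Python sketch of what condition 2 could look like in practice: an eyebrow-raise control value derived from the voice's F0 contour. This is not the actual KTH talking-head implementation; the function name, the assumed speaker range and the linear normalisation are illustrative assumptions.

```python
# Minimal sketch (not the actual KTH implementation): derive an eyebrow-raise
# control value from an F0 contour, as in stimulus condition 2 above.
# The speaker range and linear normalisation are illustrative assumptions.

def f0_to_eyebrow(f0_hz, f0_min=80.0, f0_max=300.0, max_raise=1.0):
    """Map one F0 sample (Hz) to an eyebrow-raise value in [0, max_raise].

    Unvoiced frames (f0_hz <= 0) leave the eyebrows in neutral position.
    """
    if f0_hz <= 0:          # unvoiced frame
        return 0.0
    # Clamp F0 to the assumed speaker range, then normalise linearly.
    f0 = min(max(f0_hz, f0_min), f0_max)
    return max_raise * (f0 - f0_min) / (f0_max - f0_min)

# Example: a short F0 contour sampled every 10 ms (0 = unvoiced).
contour = [0, 110, 150, 210, 180, 120, 0]
eyebrow_track = [f0_to_eyebrow(f0) for f0 in contour]
print(eyebrow_track)   # rises and falls with the pitch contour
```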

Goals and research context
– How are visual expressions used to convey and strengthen prosodic functions?
– Understand interactions between visual expressions, dialog functions and speech acoustics
– Context: animated talking agent – realistic communicative behavior using multimodal speech synthesis

Visual prosodic functions
– Prominence: stress, focus
– Phrasing
– Utterance type: question, statement
– Dialogue functions: back channeling, turn-taking
– Attitudes
– Emotions

Visual prosody, cont.
– What is underlying?
– How tight is the AV connection?
– What are the important visual gestures?
– More optional than acoustic prosodic parameters?
– Individual and cultural variation
– Reinforcing or qualifying acoustics?

Formal experiment
– Prominence due to eyebrow rise
– Five content words: “När pappa fiskar stör piper Putte” (“When dad is fishing for sturgeon, Putte is whimpering”)

Example of stimuli
Task: “Which word is most prominent?” (identical acoustics – varied location of eyebrow movement)
– No eyebrow movement (neutral)
– Eyebrow movement

Prominence increase due to eyebrow movement

Feedback experiment
– Mini dialogues (two turns)
– Travel agent application
– Both visual and acoustic feedback cues
– Affirmative cues – agent understands/accepts the request
– Negative cues – agent is unsure about the request (seeks confirmation)
– Six cues hypothesised
Granström, House & Swerts (2002)

Pos/Neg feedback experiment (Granström, House & Swerts 2002)

Recording of communicative interactions Automatic tracking of reflective spots in 3D (Qualisys)

Interactions: emotion and articulation (resynthesis) (from AV speech database – EU/PF_STAR project)

Measurement points for lip coarticulation analysis: lateral distance, vertical distance, left mouth corner

The expressive mouth
– All vowels (sentences): encouraging, happy, angry, sad, neutral
– “Left mouth corner” (Svanfeldt et al. 2003)

Prompted read speech database
– Expressive modes: confirming, questioning, certain, uncertain, happy, (angry)
– 39 short, content-neutral sentences with three possible focal accent positions each, e.g. “Båten seglade förbi” (The boat sailed by), “Dom flyttade möblerna” (They moved the furniture)
– Nonsense words (VCV, VCCV, CVC)
– Digits

Mean eyebrow positions for one speaker

Nose marker traces with automatic (blue) and two human (red) annotated head nods (adapted from Cerrato & Svanfeldt 2006)
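
As a rough illustration of what automatic head-nod annotation from a nose-marker trace might involve, here is a minimal Python sketch that flags dips in the vertical marker position. It is not the method used by Cerrato & Svanfeldt (2006); the thresholds, sampling rate and function names are illustrative assumptions.

```python
# Minimal sketch (not the method of Cerrato & Svanfeldt 2006): flag candidate
# head nods in a vertical nose-marker trace as dips that exceed an amplitude
# threshold relative to a crude baseline and last long enough.

def detect_nods(y_mm, fs=60.0, min_dip_mm=5.0, min_dur_s=0.15):
    """Return (start_s, end_s) intervals where the marker drops below baseline."""
    baseline = sum(y_mm) / len(y_mm)        # crude baseline: mean marker height
    min_len = int(min_dur_s * fs)
    nods, start = [], None
    for i, y in enumerate(y_mm):
        if y < baseline - min_dip_mm:
            if start is None:
                start = i                   # dip begins
        else:
            if start is not None and i - start >= min_len:
                nods.append((start / fs, i / fs))
            start = None
    if start is not None and len(y_mm) - start >= min_len:
        nods.append((start / fs, len(y_mm) / fs))
    return nods

# Example: a 1-second trace (60 samples) with one clear dip around 0.4–0.6 s.
trace = [100.0] * 24 + [92.0] * 14 + [100.0] * 22
print(detect_nods(trace))
```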

Examples from the database
– Confirming
– Happy
– Focal accent on: “Båten seglade förbi”

Exploitation of visual parameters
Visual cues exploited at focal accent:
– Mouth cues: happy, encouraging
– Eyebrow cues: happy, questioning
– Vertical head nods: confirming

Analysis in terms of FAP and FMQ
– MPEG-4 Facial Animation Parameters (FAPs): a subset of 31 of the 68 FAPs defined in the MPEG-4 standard, including only the ones that we were able to calculate directly from our measured point data
– Focal Motion Quotient (FMQ): the standard deviation of a FAP parameter taken over a word in focal position, divided by the average standard deviation of the same FAP in the same word in non-focal position (a sketch follows below)
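
The FMQ definition above translates directly into a few lines of code. The following Python sketch assumes the FAP trajectories have already been segmented per word; the data layout and the function and variable names are illustrative.

```python
# A minimal sketch of the Focal Motion Quotient (FMQ) as defined above:
# the standard deviation of a FAP trajectory over a word in focal position,
# divided by the average standard deviation of the same FAP over the same
# word in non-focal position. Data layout and names are illustrative.

from statistics import pstdev

def fmq(focal_fap, nonfocal_faps):
    """focal_fap: FAP samples over the word in focal position.
    nonfocal_faps: list of FAP sample sequences for the same word,
    taken from readings where it is not in focus."""
    focal_sd = pstdev(focal_fap)
    nonfocal_sd = sum(pstdev(seq) for seq in nonfocal_faps) / len(nonfocal_faps)
    return focal_sd / nonfocal_sd

# Example with made-up eyebrow-raise FAP values (FMQ > 1 means more movement
# when the word is in focus):
focal = [0.0, 0.4, 0.9, 0.7, 0.2]
nonfocal = [[0.0, 0.1, 0.2, 0.1, 0.0], [0.1, 0.0, 0.1, 0.2, 0.1]]
print(round(fmq(focal, nonfocal), 2))
```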

The focal motion quotient, FMQ, averaged across all sentences, for all measured MPEG-4 FAPs, for several expressive modes (FAP groups: articulation, smile, brows, head)

The effect of focus on the variation of several groups of MPEG-4 FAP parameters, for different expressive modes (FMQ, Focal Motion Quotient)

The effect of focal accent on selected parameter variations in Certain and Uncertain readings (FMQ, Focal Motion Quotient)

What's next?
– Better recordings
– Detailed analysis of the eye region: “gaze and wrinkles”
– Use in applications, e.g. spoken dialogue systems
– And more audible prosody…

New cooperative project
SIMULEKT – Simulering av svenskans prosodiska dialekttyper (Simulating intonational varieties of Swedish) – VR (the Swedish Research Council)
And finally…

Congratulations! Well done Gösta!