Forschungszentrum Telekommunikation Wien [Telecommunications Research Center Vienna]
Interfaces between Speech and Non-Speech Audio Technology
Michael Pucher (FTW Vienna, ICSI Berkeley)

© ftw
Contents
- Text-to-Speech Synthesis (TTS)
- Automatic Speech Recognition (ASR, STT)
- Dialog Systems
- Multimodal Mobile Applications
- Resources

Auditory representations (diagram)
- Linguistic: lexical semantics and syntax; structural prosodic elements; pragmatics and discourse
- Paralinguistic: affective states and attitudes; speaker characteristics
- Non-linguistic: sound signals, music; perspectival, spatial cues
TTS, ASR, and dialog systems each cover parts of this space.

TTS Examples
- 16 kHz natural voice
- 16 kHz unit selection synthesis (server-based)
- 8 kHz diphone-based synthesis with lexicon (embedded or distributed)
- 8 kHz diphone-based synthesis without lexicon (embedded)
- Application-specific lexicon: "Gerald R. Ford" -> tSE-r6ld a:R fo:rd (SAMPA)
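An application-specific pronunciation like the SAMPA string above could also be supplied inline via the SSML 1.0 `phoneme` element. This is an illustrative fragment, not taken from the slides; the spec only mandates support for the "ipa" alphabet value, so "x-sampa" is a vendor-specific assumption here.

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <!-- Override the default pronunciation with the SAMPA string from the slide -->
  <phoneme alphabet="x-sampa" ph="tSE-r6ld a:R fo:rd">Gerald R. Ford</phoneme>
</speak>
```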

TTS Evaluation (figure omitted in transcript)

TTS and Non-Speech Audio

TTS                | Feature                 | Status                     | Combination with non-speech audio
Comprehensible TTS | Low word error rate     | Solved (diphone-based TTS) | TTS provides lexical information; add structural prosodic elements
Natural TTS        | Single-style prosody    | Solved (unit selection)    | TTS provides structural prosodic elements; add affective states and attitudes
Expressive TTS     | Various prosodic styles | Not solved                 | Add pragmatic information, dialog turns

Limited Expressiveness of Speech 1
- Limited expressiveness of expressive TTS = limited expressiveness of speech
- Speech is limited in expressiveness precisely because of its unlimited expressiveness:
  - Because everything is expressible in language, messages can become less useful for certain purposes (too complex)
  - Simpler, less expressive codes (sounds, icons) may be used in context and lead to shorter messages
- Disadvantages of speech:
  - Seriality
  - Non-universality

Types of ASR and Applications
- Isolated word recognition
- Large-vocabulary speech recognition
- Conversational speech recognition
- Speech recognition in noisy environments
Example applications: car navigation, meeting transcription, command & control, broadcast news transcription.
Systems can be speaker-dependent or speaker-independent.
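The simplest of these, isolated word recognition, can be sketched as template matching with dynamic time warping (DTW). This is a toy illustration, not how the systems on the slides work internally: real recognizers use MFCC frame sequences and HMMs, while the "feature sequences" below are made-up 1-D contours.

```python
# Toy isolated word recognition by DTW template matching.
# Utterances are hypothetical 1-D feature sequences, not real audio.

def dtw_distance(a, b):
    """Classic DTW: cost of the best monotonic alignment of a and b."""
    INF = float("inf")
    n, m = len(a), len(b)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def recognize(utterance, templates):
    """Return the template word with the smallest DTW distance."""
    return min(templates, key=lambda w: dtw_distance(utterance, templates[w]))

templates = {
    "forward": [1, 2, 3, 4, 4],   # hypothetical feature contours
    "exit":    [4, 3, 1, 0, 0],
}
print(recognize([1, 1, 2, 3, 4, 4, 4], templates))  # -> forward
```

DTW absorbs differences in speaking rate (the warped alignment), which is why it was the standard approach for small-vocabulary, speaker-dependent recognition before HMMs.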

Other Related Technologies
- Speech
  - Speaker verification
- NLP
  - Dialog act detection
  - Topic detection

Music Information Retrieval (MIR)
- Query by Humming (Fraunhofer)
  - Non-speech sound as an input pattern to search for other non-speech sounds
- Performer style identification
- Melody and rhythm extraction
- Music similarity
- Genre classification
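A classic way to realize query by humming, used in early MIR systems, is to reduce melodies to a Parsons code (U = up, D = down, R = repeat) and match contour strings by edit distance. The sketch below assumes that simplification; the tune names and pitch data are invented for illustration.

```python
# Toy query-by-humming: melody contours as Parsons code, matched by
# Levenshtein distance. Collection entries are hypothetical.

def parsons(pitches):
    """Contour string of a pitch sequence, e.g. [60, 62, 62, 59] -> 'URD'."""
    out = []
    for prev, cur in zip(pitches, pitches[1:]):
        out.append("U" if cur > prev else "D" if cur < prev else "R")
    return "".join(out)

def edit_distance(s, t):
    """Standard Levenshtein distance between two contour strings."""
    d = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        prev, d[0] = d[0], i
        for j, ct in enumerate(t, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (cs != ct))
    return d[len(t)]

def query_by_humming(hummed_pitches, collection):
    """Return the collection entry whose contour best matches the query."""
    q = parsons(hummed_pitches)
    return min(collection, key=lambda name: edit_distance(q, collection[name]))

collection = {                      # hypothetical contour database
    "tune_a": "UUDUUD",
    "tune_b": "DDRDDR",
}
print(query_by_humming([60, 62, 64, 61, 63, 65, 62], collection))  # -> tune_a
```

The contour abstraction makes the match key-invariant: a query hummed in the wrong key still produces the same U/D/R string.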

Dialog Systems - ASR
- Three types of recognition in state-of-the-art dialog systems:
  - Isolated word ("move", "forward", "backward", "exit", "quit")
  - Recognition grammar
  - Statistical language model (SLM) + grammar for more robustness ("um ah to san francisco from new york")
    1. Apply SLM
    2. Apply grammar to the results of the SLM
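The two-step SLM-plus-grammar idea can be sketched as follows: the SLM first yields a word hypothesis (possibly with fillers like "um", "ah"), and a small grammar is then applied to that hypothesis to extract the slots. This is an illustrative toy; the city list and the regex "grammar" are made up, and a real system would use an SRGS or similar grammar over the recognizer's n-best list.

```python
import re

# Toy slot-filling grammar applied to an SLM word hypothesis.
CITIES = "san francisco|new york|boston"
GRAMMAR = re.compile(rf"to (?P<dest>{CITIES}).*?from (?P<orig>{CITIES})")

def apply_grammar(slm_hypothesis):
    """Apply the recognition grammar to the SLM result; None if no parse."""
    m = GRAMMAR.search(slm_hypothesis)
    return {"destination": m["dest"], "origin": m["orig"]} if m else None

# SLM output for the slide's example utterance:
print(apply_grammar("um ah to san francisco from new york"))
# -> {'destination': 'san francisco', 'origin': 'new york'}
```

The robustness comes from the search: fillers and disfluencies before or between the grammar fragments are simply skipped rather than causing a parse failure.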

Dialog Systems - TTS and Audio
- Loquendo TTS Mixer
  - Play and mix TTS and audio files
  - Fade-in, fade-out
  - Pause and resume
  - Record
(Paolo Massimino, Loquendo S.p.A.: "From Marked Text to Mixed Speech and Sound")

Dialog Management 1
- Uses of non-speech audio:
  - Replace prompts
  - Indicate dialog turns and dialog states
  - Indicate menu structure (3D audio)
  - Create the "listen & feel" of the application
  - Signal system response time
- Open questions: barge-in, streaming, and standardization

Dialog Management 2
- A good and a bad example:
  - Using only speech
  - Audio enhancement for transitions
  - Audio enhancement for states
(Bob Cooper, Avaya Corporation: "A Case Study on the Planned and Actual Use of Auditory Feedback and Audio Cues in the Realization of a Personal Virtual Assistant")

Dialog Management 3
- VoiceXML Version 2.0
  - W3C (World Wide Web Consortium) standard for voice dialog design
  - Form-filling paradigm similar to web forms
- Speech Synthesis Markup Language (SSML) Version 1.0
  - Example: "good morning", "Any female voice here.", "A female child voice here." (voice selection by gender and age)
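The SSML example on this slide lost its markup in transcription; a plausible reconstruction using the SSML 1.0 `voice` element (gender and age are attributes defined in the spec) might look like this:

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  good morning
  <voice gender="female">Any female voice here.</voice>
  <voice gender="female" age="6">A female child voice here.</voice>
</speak>
```

For the VoiceXML form-filling paradigm, a minimal illustrative fragment (the form and field names are made up) is:

```xml
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="booking">
    <field name="destination">
      <prompt>Where do you want to fly to?</prompt>
    </field>
  </form>
</vxml>
```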

Limited Expressiveness of Speech 2
- Human-machine voice dialog is limited in expressiveness compared to a natural dialog
  - Natural dialog is probably multimodal
  - Role of non-speech sound in human communication

The Importance of Multimodality for Mobile Applications
- Multimodal communication is perceived as natural
- Disadvantages of unimodal interfaces for mobile devices:
  - Small displays
  - No comfortable alphanumeric keyboards
  - Visual access to the display is not always possible
- These disadvantages cannot be overcome by increasing processor and memory capabilities

Multimodal Dialog Management
- Speech Application Language Tags (SALT)
- Possible combination with non-speech audio at all states and transitions
- Similar to (unimodal) dialog systems
(Minhua Ma and Paul Mc Kevitt, University of Ulster: "Lexical Semantics and Auditory Display in Virtual Storytelling")

Asymmetric Multimodality
- For multiparty applications:
  - Users select preferred modalities (e.g. speech, visual, music?)
  - The system is able to translate content from one modality to another
- MONA - Mobile Multimodal Next Generation Applications
  - Multiuser quiz application

Input  | Output (preference = speech) | Output (preference = visual)
Speech | Speech                       | Speech-to-Text -> Text
Text   | Text-to-Speech               | Text
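The translation table can be sketched as a small dispatch function: content entering in one modality is converted to each receiver's preferred output modality. The `speech_to_text` and `text_to_speech` functions below are hypothetical stand-ins for real ASR and TTS engines, using a string prefix in place of actual audio.

```python
# Toy modality translation following the asymmetric-multimodality table.
# "audio:" prefixes stand in for real audio data (illustrative only).

def speech_to_text(speech):
    return speech[len("audio:"):]             # toy ASR

def text_to_speech(text):
    return "audio:" + text                    # toy TTS

def deliver(content, input_modality, output_preference):
    """Translate content between modalities per the table above."""
    if input_modality == output_preference:
        return content                        # no translation needed
    if input_modality == "speech" and output_preference == "visual":
        return speech_to_text(content)        # Speech -> Speech-to-Text -> Text
    if input_modality == "text" and output_preference == "speech":
        return text_to_speech(content)        # Text -> Text-to-Speech
    return content                            # text is already visual

print(deliver("audio:hello", "speech", "visual"))   # -> hello
print(deliver("hello", "text", "speech"))           # -> audio:hello
```

In a multiuser quiz like MONA, each answer would pass through `deliver` once per participant, so users on voice-only phones and users on visual clients see the same content in different forms.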

Resources
- TTS
  - Festival 2.0, to build unit selection voices
  - Festival Lite, for embedded TTS
  - FreeTTS, a Java speech synthesizer
  - The MBROLA project, many synthetic voices available
- ASR
  - Sphinx
  - HTK
- Multimodal Systems
  - SALT implementations

Thank you for your attention
Contact: