MMP - M204 Information Design/Cross Media Publishing - Spoken Language Interfaces - Dr. Ingrid Kirschning (UDLA)

4. Speech Synthesis
- Introduction to speech synthesis
- TTS
- Unit selection
- Animated characters that speak

Introduction to Speech Synthesis
- The term speech synthesis refers to the technologies that enable computers or other electronic systems to output simulated human speech.
- The two key quality criteria are intelligibility and naturalness.
- Naturalness is usually judged relative to the situation in which the speech is used.

TTS
- Text-to-Speech
- Translates text into speech, using phonetic rules to transcribe the text and then speak it.
- Requires information about:
  - Abbreviations (Dr., Nr., etc., ...)
  - Specific readings of numbers and symbols ($, #, ...)
  - Reading of time formats (1:45, 13:45, ...)
  - Pronunciation of each letter in every context ("cat", "tar", "Jane", ...)
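
The kind of information listed above is typically applied in a text normalization step before any phonetic transcription. The following is only a minimal sketch of that idea: the lookup tables, the function names and the chosen readings ("one forty five") are assumptions made for illustration, not part of any particular TTS system.

```python
import re

# Hypothetical lookup tables; a real TTS front end would use far larger ones.
ABBREVIATIONS = {"Dr.": "Doctor", "Nr.": "Number", "etc.": "et cetera"}
SYMBOLS = {"$": "dollars", "#": "number"}
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..59 (enough for clock times)."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def expand_time(match: re.Match) -> str:
    """Turn a matched clock time such as '1:45' into 'one forty five'."""
    hours, minutes = int(match.group(1)), int(match.group(2))
    if minutes == 0:
        return f"{number_to_words(hours)} o'clock"
    if minutes < 10:
        return f"{number_to_words(hours)} oh {number_to_words(minutes)}"
    return f"{number_to_words(hours)} {number_to_words(minutes)}"


def normalize(text: str) -> str:
    """Rewrite abbreviations, clock times and symbols as speakable words."""
    for abbreviation, spoken in ABBREVIATIONS.items():
        text = text.replace(abbreviation, spoken)
    # Only valid 24-hour times (0:00 to 23:59) are matched.
    text = re.sub(r"\b([01]?\d|2[0-3]):([0-5]\d)\b", expand_time, text)
    # Naive symbol reading: word order (e.g. for currency) is not handled.
    for symbol, spoken in SYMBOLS.items():
        text = text.replace(symbol, f" {spoken} ")
    return " ".join(text.split())
```

For example, `normalize("Dr. Smith arrives at 1:45")` yields "Doctor Smith arrives at one forty five". Letter-to-sound rules for words such as "cat", "tar" and "Jane" would come after this step and are not covered here.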

Types of Speech Synthesis
- Concatenative: based on human speech samples
  - Diphones
  - Words
  - Variable-length units
- Formant synthesis: simulates human speech electronically using phonological rules
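
To make the concatenative idea concrete, here is a small sketch under stated assumptions: a word is given as a list of phoneme symbols, each diphone spans the transition between two neighbouring phonemes, and a hypothetical `inventory` dictionary maps each diphone to its recorded samples. The names and the "_" silence marker are illustrative only.

```python
from typing import Dict, List, Sequence


def to_diphones(phonemes: Sequence[str]) -> List[str]:
    """Split a phoneme sequence into diphones, padding with silence ('_')."""
    padded = ["_", *phonemes, "_"]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


def concatenate(diphones: Sequence[str],
                inventory: Dict[str, List[float]]) -> List[float]:
    """Join the recorded samples of each diphone into one waveform."""
    samples: List[float] = []
    for unit in diphones:
        samples.extend(inventory[unit])  # raises KeyError for a missing unit
    return samples
```

For the phonemes `["k", "ae", "t"]` this produces the units `_-k`, `k-ae`, `ae-t` and `t-_`. A real system would also smooth the joins between units, which this sketch leaves out.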

Animated Characters that Speak
- Baldi & Ms Gurney
- Tongue models
- Visemes
  - Expressions: fear, anger, happiness, ...
  - Visible speech: vowels, consonants
- Real-time animation by moving from one target position to the next
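
Moving from one target position to the next can be sketched as interpolation between viseme targets. The example below assumes that each target is a list of face-control values (for instance jaw opening and lip rounding) and uses plain linear interpolation. Both are assumptions for illustration; the slides do not say how the actual characters blend targets.

```python
from typing import List, Sequence


def interpolate(start: Sequence[float], end: Sequence[float],
                t: float) -> List[float]:
    """Blend two target positions; t=0 gives start, t=1 gives end."""
    return [s + (e - s) * t for s, e in zip(start, end)]


def animate(targets: Sequence[Sequence[float]],
            steps_per_segment: int) -> List[List[float]]:
    """Generate frames that move through each target in turn."""
    frames: List[List[float]] = []
    for start, end in zip(targets, targets[1:]):
        for step in range(steps_per_segment):
            frames.append(interpolate(start, end, step / steps_per_segment))
    if targets:
        frames.append(list(targets[-1]))  # finish exactly on the last target
    return frames
```

With one control value, targets `[[0.0], [1.0], [0.0]]` and two steps per segment, the frames are 0.0, 0.5, 1.0, 0.5, 0.0.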

Studying Expressions (Viseme A)

3D Model (neutral face)

3D Model (angry)

5. Natural Language
- Introduction to Natural Language Understanding (NLU)
- Examples of structured queries and natural language
- Parsing
- Natural Language Generation (NLG)
- Interface Design Challenges by Candace Kamm, AT&T Labs

Introduction to Natural Language Understanding (NLU)

Natural Language (NL)
- This technology has found its way into current applications, where it is used for both input and output.
- Because of the multi-disciplinary community involved in speech user interface development, the term natural language has several connotations.
- Linguistic: it is the language spoken and written by a given culture (English, German, ...).
- It can be discussed in terms of phonology, grammar, semantics and pragmatics.
- From the standpoint of interaction, we focus on pragmatics: we study how people use language and how this can be used to control computers. We also want a computer to respond consistently under different circumstances.

Natural Language
- It is common to use special (artificial) languages to solve specific problems, for example in mathematics, music, chemistry and computing.
- Command languages are specialized for controlling a computer. Examples:
  - MS-DOS commands: dir, del, ...
  - UNIX console commands: ls, rm, ...
- In natural language a user would say: "list all the files in this directory" or "delete the file ..."

Specialized Usage of NL
- The goal is continuous natural language interaction with machines; for now, however, it can only be applied to more specialized uses.
- Natural language database queries
  - Text database queries are increasingly popular and commercially available.
  - They analyze natural language requests grammatically and apply additional linguistic analysis and business rules to create a request and return results.

Examples of Queries
- A structured query language may be faster than natural language queries, but it requires the user to be trained to use it.
- Altavista is a search engine. In Altavista you can type in a question in natural language: what italian restaurants are in seattle?
- Yahoo is a directory. In Yahoo you would type: italian and restaurant and seattle
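
The difference between the two query styles can be shown with a small conversion from a natural language question to a structured keyword query. This is only a sketch: the stop-word list is an assumption, and it says nothing about how Altavista or Yahoo actually processed queries.

```python
import re

# Hypothetical stop-word list; real systems use much longer ones.
STOP_WORDS = {"what", "which", "where", "are", "is", "in", "the", "a", "of"}


def to_boolean_query(question: str) -> str:
    """Drop function words and join the remaining keywords with 'and'."""
    words = re.findall(r"[a-z]+", question.lower())
    keywords = [word for word in words if word not in STOP_WORDS]
    return " and ".join(keywords)
```

Applied to the question from the slide, this returns "italian and restaurants and seattle". Reducing "restaurants" to "restaurant", as in the structured example, would need an extra singular/plural step.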

Parsing
- This is a technology that analyzes a text.
- It works similarly to word-spotting.
- It scans the text and matches each word against a grammar to find the important keywords.
- Advanced systems will search for synonyms, correct mistakes between singular and plural usage, etc.
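
A keyword-spotting parser of this kind might look roughly as follows. The keyword set, the synonym table and the very naive plural handling are all assumptions chosen to match the file-command examples used earlier in the course; a real parser would rely on a proper grammar and lexicon.

```python
import re
from typing import List

# Hypothetical vocabulary for a small file-management domain.
KEYWORDS = {"list", "delete", "file", "directory"}
SYNONYMS = {"show": "list", "display": "list",
            "remove": "delete", "erase": "delete",
            "folder": "directory"}


def singular(word: str) -> str:
    """Very naive plural stripping: 'files' -> 'file', 'directories' -> 'directory'."""
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def spot_keywords(text: str) -> List[str]:
    """Return the known keywords found in the text, in order of appearance."""
    found = []
    for word in re.findall(r"[a-z]+", text.lower()):
        word = singular(word)
        word = SYNONYMS.get(word, word)  # map synonyms onto the canonical keyword
        if word in KEYWORDS:
            found.append(word)
    return found
```

For "Please show all the files in this directory" the result is `["list", "file", "directory"]`, which an application could then map to a command such as `ls` or `dir`.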

Natural Language Generation
- Natural language is also used to generate understandable output for speech synthesis and for text output.
- Examples: medical and legal reporting, generation of weather reports.
- However, although it seems like the natural way to interact, it has not been proven to be optimal. Which is faster and more accurate: typed NL, spoken NL or GUIs?
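
One simple way to generate such output is to fill a sentence template from structured data. The sketch below does this for a weather report; the template wording, the function names and the choice of Celsius are assumptions, and template filling is only one of several possible NLG techniques.

```python
def degrees(value: int) -> str:
    """Return the value with a correctly pluralized unit word."""
    return f"{value} degree" + ("" if abs(value) == 1 else "s")


def weather_report(city: str, condition: str, high_c: int, low_c: int) -> str:
    """Fill a fixed sentence template with the forecast data."""
    return (f"The forecast for {city}: {condition}, "
            f"with a high of {degrees(high_c)} Celsius "
            f"and a low of {degrees(low_c)} Celsius.")
```

The resulting sentence can be shown as text or passed on to a speech synthesizer.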

- Nowadays it appears that in some situations spoken or written NL with a reduced vocabulary is better, but it requires users to know the limitations.
- A multi-modal approach is likely to be the best solution in many settings: the user can decide the kind of interaction (mouse, speech or typing), and the modes complement each other.

Interface Design Challenges by Candace Kamm, AT&T Labs
- People don't wait their turn in an interaction.
- People use different words to reach the same goal ("yes", "ok", "sure", "yep", ...).
- The interfaces need to handle the limitations of the technology, work around them, or warn the user of what could happen.
- Newer technologies pose new challenges (wireless telephones, etc.).
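
The point about different words for the same goal can be illustrated by mapping many surface forms onto one meaning. In this sketch the affirmative words come from the slide, while the negative words and the function name are assumptions. Utterances that mix both kinds of word are not handled properly.

```python
import re
from typing import Optional

AFFIRMATIVE = {"yes", "ok", "sure", "yep"}  # examples from the slide
NEGATIVE = {"no", "nope", "nah"}            # assumed for illustration


def interpret_confirmation(utterance: str) -> Optional[bool]:
    """Return True for agreement, False for refusal, None if unclear."""
    words = re.findall(r"[a-z]+", utterance.lower())
    if any(word in AFFIRMATIVE for word in words):
        return True
    if any(word in NEGATIVE for word in words):
        return False
    return None  # the dialogue should re-prompt the user
```

A `None` result is the case where the interface has to fall back on error recovery, for instance by asking the question again with the accepted answers spelled out.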

- The problem is to teach people which words the system knows and which domain it can handle.
- Collecting data on the exact utterances people are going to say is very important.
- The systems require very robust error correction and error recovery capabilities.
- If you want an application to work today, you have to focus it as narrowly as possible and try to guide the user through it.

Speech Portals
- Introduction
- Voice eXtensible Markup Language (VXML)
- VUI - Voice User Interface
- Web-GUI vs. VUI

Speech Portals
- Developed to provide information (banks, companies, events, ...).
- They can be interactive or not.
- Interactive portals can be programmed with various technologies, one of which is VXML.

VXML
- Voice eXtensible Markup Language (VXML) is an XML-based language for developing speech interfaces.
- Users access a VXML application by dialing its phone number.
- A VXML gateway accesses the internet to retrieve the web page associated with that number and interprets it.
- The gateway manages the interaction using ASR and speech synthesis, and is the link between the telephony service and the internet.
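
To give an impression of what the gateway retrieves, the sketch below holds a small VoiceXML 2.0 document in a Python string and inspects it with the standard library XML parser. The dialogue itself (an account-number prompt) and the helper names are invented for illustration. A real gateway would run the dialogue with ASR and TTS rather than just list its parts.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

# Hypothetical page a gateway might fetch for a dialed number.
DOCUMENT = """<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="balance">
    <field name="account" type="digits">
      <prompt>Please say or key in your account number.</prompt>
      <noinput>I did not hear anything. <reprompt/></noinput>
      <nomatch>I did not understand. <reprompt/></nomatch>
      <filled>
        <prompt>You entered <value expr="account"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>"""


def field_names(root: ET.Element) -> list:
    """Names of the input fields the dialogue will try to fill."""
    return [field.get("name") for field in root.iter(f"{{{VXML_NS}}}field")]


def prompt_texts(root: ET.Element) -> list:
    """Text of every prompt, with whitespace collapsed."""
    return [" ".join("".join(prompt.itertext()).split())
            for prompt in root.iter(f"{{{VXML_NS}}}prompt")]


root = ET.fromstring(DOCUMENT)
```

Here `field_names(root)` returns `["account"]`. The `<noinput>` and `<nomatch>` handlers show how a VoiceXML page can declare its own error recovery for silence and unrecognized input.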

VXML Network Interactions
[Diagram labels: VXML Gateway, Web Server, Telephony (PSTN or ISDN), Internet, Device, Transport]

The Voice Portal Components
[Diagram labels: Voice Portal, Audio Resource, Telephony Resource, ASR Resource, TTS Resource, TCP/IP Resource, VoiceXML Browser/Interpreter]

VXML Enabling Network
[Diagram labels: Voice Portal, Web Servers, Internet, Telephony Network, TCP/IP]

VUI - Voice User Interface
- A user interface is the part of a system that the user interacts with.
- At a basic level a VUI should:
  - Provide users with mental models of how the application works and what tasks it supports
  - Collect user input in the form of speech or DTMF tones (telephone keypad)
  - Deliver system output to the telephone receiver
  - Support users in task completion
  - Support recovery from user or system errors
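
Several of these requirements can be seen together in a small menu loop that accepts either a spoken word or a DTMF digit and re-prompts on errors. This is a sketch only: the menu, the prompt wording and the operator fallback are assumptions, and user input is simulated with a list of strings instead of a telephone line.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical menu: DTMF digit -> task name (the task names double as spoken words).
MENU = {"1": "balance", "2": "transfer"}


def collect_choice(user_inputs: Iterable[str],
                   max_attempts: int = 3) -> Tuple[Optional[str], List[str]]:
    """Return the chosen task (or None) and the prompts that were played."""
    prompts: List[str] = []
    inputs = iter(user_inputs)
    for attempt in range(max_attempts):
        if attempt == 0:
            prompts.append("Say balance or transfer, or press 1 or 2.")
        else:
            prompts.append("Sorry, I did not understand. "
                           "Say balance or transfer, or press 1 or 2.")
        raw = next(inputs, "").strip().lower()  # empty string models silence
        if raw in MENU:                          # DTMF digit
            return MENU[raw], prompts
        if raw in MENU.values():                 # spoken keyword
            return raw, prompts
    prompts.append("Transferring you to an operator.")
    return None, prompts
```

For the simulated inputs `["um", "2"]` the function plays two prompts and returns the task "transfer". With no usable input it gives up after three attempts and hands over to an operator.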

VUI User Characteristics
- Voice applications should be targeted at well-defined user groups, but some common characteristics of VUI users help to understand how they differ from PC GUI users:
  - Limited PC/Internet experience
  - Mobile environment
  - Single I/O mode

Web-GUI vs. VUI: Web
- Back button
- Home button
- Home link on the web page
- Screen layout with colors, theme graphics
- Pop-up windows to indicate errors and recovery
- Help link
- Links to other web pages
- Form input, selection lists, radio buttons
- In-progress/status indicator

Web-GUI vs. VUI: VUI
- Voice commands to return to the previous step
- Voice commands to take the user to a known starting point (main menu)
- Recorded announcements and TTS voices, speaking style, gender and tone branding
- Tones, TTS, or recorded messages to indicate errors and recovery
- Help messages
- System functions programmed to prompt input
- VoiceXML forms associated with prompts
- Audio hourglass: tone, sound, music or message that indicates what the system is doing

7. Spoken Dialogue Systems
- Introduction
- Basic components
- CU Communicator
- Galaxy Architecture

Basic Components
[Diagram labels: Speech Synthesis, Parser, Dialogue Management, Natural Language Generation, Speech Recognition, Application, Database]

CU Communicator (CSLR, University of Colorado)
- Financed by DARPA (beginning April '99)
- GALAXY architecture
- Interface via telephone line
- Reservation of flights, hotels and car rental
- Robust parsing and event-driven dialogue management

Galaxy Architecture
[Diagram labels: HUB, NLG, Speech Recognition, Speech Synthesis, NLP, Dialogue Management, Audio, Reliability Server, Database]
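
In a hub-based architecture like this one, the component servers do not call each other directly. Instead, a central hub passes a message frame from server to server. The sketch below imitates that routing with stub components for a flight query. Every server body, the flight data and the fixed route are assumptions for illustration and do not reproduce the actual Galaxy hub or its scripting.

```python
from typing import Callable, Dict, List

Frame = Dict[str, object]
FLIGHTS = {"boston": ["UA 12 at 9:00"]}  # hypothetical database content


def speech_recognition(frame: Frame) -> Frame:
    frame["text"] = str(frame["audio"]).lower()  # stub: audio is already text
    return frame


def nlp(frame: Frame) -> Frame:
    words = str(frame["text"]).split()
    # Take the word after "to" as the destination, if there is one.
    frame["destination"] = words[words.index("to") + 1] if "to" in words[:-1] else None
    return frame


def dialogue_management(frame: Frame) -> Frame:
    frame["action"] = "lookup" if frame["destination"] else "ask_destination"
    return frame


def database(frame: Frame) -> Frame:
    if frame["action"] == "lookup":
        frame["results"] = FLIGHTS.get(str(frame["destination"]), [])
    return frame


def nlg(frame: Frame) -> Frame:
    results = frame.get("results")
    if frame["action"] == "ask_destination":
        frame["reply"] = "Where would you like to fly to?"
    elif results:
        city = str(frame["destination"]).title()
        frame["reply"] = f"I found {len(results)} flight(s) to {city}: {', '.join(results)}."
    else:
        frame["reply"] = "Sorry, I found no flights."
    return frame


def speech_synthesis(frame: Frame) -> Frame:
    frame["speech"] = f"[spoken] {frame['reply']}"  # stub: no real audio
    return frame


class Hub:
    """Central router: servers register by name and only talk via the hub."""

    def __init__(self) -> None:
        self.servers: Dict[str, Callable[[Frame], Frame]] = {}
        self.log: List[str] = []

    def register(self, name: str, handler: Callable[[Frame], Frame]) -> None:
        self.servers[name] = handler

    def run(self, route: List[str], frame: Frame) -> Frame:
        for name in route:
            frame = self.servers[name](dict(frame))  # pass a copy of the frame
            self.log.append(name)
        return frame


ROUTE = ["speech_recognition", "nlp", "dialogue_management",
         "database", "nlg", "speech_synthesis"]
hub = Hub()
for server in (speech_recognition, nlp, dialogue_management,
               database, nlg, speech_synthesis):
    hub.register(server.__name__, server)
```

Calling `hub.run(ROUTE, {"audio": "A flight to Boston"})` sends the frame through all six servers in turn. One benefit of this design is that a component can be replaced or moved without changing the others, since each one only knows the hub.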

The End
"By becoming a user you will be able to understand the tasks and appreciate the constraints involved"