Introduction and overview

Outline
  A short history of the field
  Speech synthesis (TTS)
  Automatic speech recognition (ASR)
  Dialog system architectures
  Voice on the Web (perhaps show the Siri video)
  Voice on the Web and W3C Standards
  Relation to linguistic theory
  A brief look at the course plan
  What this course is not about
  What this course could mean to you
  Introduction to lab assignments and platforms
  Designing and developing spoken dialog systems
  Present project
  Give home assignment 1: Call flow design and evaluation
  Present Lab assignment 1

A short history of the field
  1966: Joseph Weizenbaum, ELIZA
  Sundial
  ATIS
  Verbmobil
  AIML NLP system
  VXML ("Voice XML"): a dialog markup language (primarily for telephony), developed initially by AT&T, then administered by an industry consortium, and finally published as a W3C specification
  Voxeo, Tropo

Speech synthesis: text → speech

Speech recognition: speech → text (or some semantic representation)

Dialog management
  Finite-state based dialog management
  Frame-based (form-based) dialog management
  Information-state based dialog management
  Plan-based dialog management
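
As an illustration of the first approach on this slide, finite-state based dialog management, here is a minimal sketch in Python. The call flow, the state names, the prompts and the city names are all hypothetical and are not taken from the course material; a real system would sit behind a speech recognizer rather than read typed strings.

```python
from dataclasses import dataclass, field

# Hypothetical call flow (an assumption for illustration): each state has a
# prompt and a table that maps a recognised answer to the next state.
STATES = {
    "ask_service": {
        "prompt": "Weather or timetable?",
        "next": {"weather": "ask_city", "timetable": "ask_city"},
    },
    "ask_city": {
        "prompt": "Which city?",
        "next": {"gothenburg": "done", "stockholm": "done"},
    },
    "done": {"prompt": "Thank you, goodbye.", "next": {}},
}


@dataclass
class FiniteStateDialog:
    state: str = "ask_service"
    history: list = field(default_factory=list)

    def prompt(self) -> str:
        """Return the system prompt for the current state."""
        return STATES[self.state]["prompt"]

    def step(self, user_input: str) -> str:
        """Advance on a recognised answer; otherwise stay and re-prompt."""
        answer = user_input.strip().lower()
        transitions = STATES[self.state]["next"]
        if answer in transitions:
            self.history.append((self.state, answer))
            self.state = transitions[answer]
        return self.prompt()


def run(inputs):
    """Feed a list of user inputs through a fresh dialog; collect the prompts."""
    dialog = FiniteStateDialog()
    prompts = [dialog.prompt()]
    for text in inputs:
        prompts.append(dialog.step(text))
    return dialog, prompts
```

The system keeps the initiative throughout: an unrecognised answer simply repeats the current prompt. This is what makes such designs easy to draw as a call flow diagram and also what makes them rigid.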

Spoken dialogue system

Why voice
  Wireless devices have small screens and limited input capabilities.
  A telephone keypad can give users only a limited number of choices.
  Speech technology is improving.
  The exchange of information between a person and a computer is becoming more like a real conversation.
  Users want hands-free or eyes-free use.
  From a business viewpoint, voice applications open up a host of new revenue opportunities.
  There are many more telephones than computers with the potential to access the Internet.

Traditional Interactive Voice Response (IVR)

Speech versus Touch Tone

Applications
  Information-providing systems: weather reports, stock quotes, timetables
  Transaction-based systems: calendar functions, shopping, financial transactions, travel reservations

Architecture 1

Architecture 2

Components
  Natural language understanding: proper name identification, part-of-speech tagging, parser
  Dialog manager
  Output generator: natural language generator, gesture generator, layout engine
  Input recognizer/decoder: automatic speech recognizer, gesture recognizer, handwriting recognizer
  Output renderer: text-to-speech engine, talking head, robot or avatar
  Multi-modal fusion

Types of systems
  By modality: text-based, spoken dialog system, graphical user interface, multi-modal
  By device: telephone-based systems, PDA systems, in-car systems, robot systems, desktop/laptop systems (native, in-browser systems, in-virtual machine), in-virtual environment, robots
  By style: command-based, menu-driven, natural language, speech graffiti
  By initiative: system initiative, user initiative, mixed initiative
  By application: information service, command-and-control, entertainment, education/tutorial, edutainment, reminder systems, companion systems, healthcare, eldercare, assistive/access systems

Mobile voice apps Voice on the Web http://www.youtube.com/watch?v=OURZpqh-35A&eurl=&feature=player_embedded

Relation to other fields
  Phonetics, phonology, syntax, semantics, pragmatics
  Spoken language understanding, psycholinguistics, human communication, discourse analysis
  Human-computer interaction, computational linguistics, NL-parsing, NL-generation, language modeling
  Multi-modal fusion, multi-modal fission
  Psychology, cognitive science, affective dialog, user modeling, embodied communication

A brief look at the course plan

What this course is not about
  Sophisticated dialog management
  Multi-modal systems
  Non-spoken dialog systems

What this course could mean to you
  Will prepare you for writing a thesis in the area of dialog systems (if you so choose)
  Will prepare you for work in the industry
  A link to the LinkedIn page

Is this something for a linguist?

Roles in the process
  Dialog designer
  VoiceXML programmer
  Voice talent
  Grammar writer
  TTS specialist
  Speech recognition specialist
  Quality assurance specialist
  Server specialist
  Manager

Who are the big players in the area?
  Google: http://googleblog.blogspot.com/2010/12/can-we-talk-better-speech-technology.html
  Microsoft: http://gigaom.com/2010/12/06/microsoft-claims-its-place-in-a-voice-enabled-world/
  Apple: http://www.dailyfinance.com/story/company-news/apples-siri-purchase-heats-up-the-race-toward-a-voice-activated/19458344/
  IBM: http://www.ibm.com/news/in/en/2010/08/20/a896686u56875f96.html
  Nuance: http://gigaom.com/2011/01/19/nuance-releases-mobile-sdk-to-speechify-apps/
  Voxeo
  AT&T

What's Driving Speech as a Mobile Platform?
  The Emergence of Speech as a Mobile Platform
  Market Trends
  Speech-Enabled Mobile Apps Gaining Acceptance
  Voice Control in a Mission-Critical Environment
  Search Engine for Audio-Visual Content
  Instantaneous Language Translation
  IBM's Spoken Web
  What's Driving Speech as a Mobile Platform?
  Mobile Devices and Peripherals
  Cloud Computing
  Open Technologies
  Mashups and the Programmable Web
  Legislation
  Closing the (Mobile) Digital Divide
  An Overview of Emerging SaaP Applications
  Current Speech-Equipped Devices are Merely the Tip of the Iceberg
  SaaP Enables New Application Interaction
  Spoken Alerts
  Mobile Reminders
  Synthesized Speech Email and Text Messages
  Speech-to-Text for Voicemail
  SaaP Enables Voice User Interfaces
  Speech Recognition: The Foundation of Speech-Enabled Apps
  Constrained vs. Natural Language Processing
  Automated vs. Hybrid Speech Recognition
  Applications for Speech Recognition
  Speaker Authentication
  Email and Text Messages Composition
  Launch and Control Mobile Apps
  Special Case: Voice Activation

Call flow and call flow diagrams

Evaluating speech and dialog technology

W3C Speech Standards
Torbjörn Lager

The big picture
  HTML, web browsers, web servers

The place of speech technology
  "… speech technology itself has a very long way to go. … the most important thing may turn out to be not the speech technology itself, but the way in which speech technology connects to all the other technologies."
  Tim Berners-Lee

The big picture
  HTML, HTML browser
  VoiceXML, VoiceXML browser (ASR, TTS)
  Web servers

The What and Why of Standards
  Software standards include terminology, languages and protocols specified by committees of experts for widespread use in the software industry. They have both advantages and disadvantages.
  Advantages: developers can create applications in standard languages that are portable across a variety of platforms; products from different vendors can interact with each other; a community of experts evolves around the standard and is available to develop products and services based on it.
  Disadvantages: some developers feel that standards may inhibit creativity and stall the introduction of superior technology.
  In the area of speech, however, vendors are enthusiastic about standards and frequently complain that they are not developed fast enough. Emerging speech-technology standards could give a boost to an industry hampered by proprietary software and hardware.

World Wide Web Consortium http://www.w3.org/

W3C Speech Standards
  Speech Recognition Grammar Specification (SRGS): what the user can say
  Semantic Interpretation for Speech Recognition (SISR): what the user means
  Speech Synthesis Markup Language (SSML): what the user hears
  Pronunciation Lexicon Specification (PLS): how words are pronounced
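
To make the split between "what the user can say" (SRGS) and "what the user means" (SISR) concrete, the sketch below embeds a small SRGS grammar in XML form and reads it with Python's standard library. The grammar content (the rule name `city`, the two city names and the codes `STO` and `GOT`) is invented for illustration. The Python code only lists phrases and their attached semantic tags; it does not perform speech recognition or evaluate the SISR script.

```python
import xml.etree.ElementTree as ET

# Namespace of the SRGS XML form.
SRGS_NS = "http://www.w3.org/2001/06/grammar"

# Hypothetical grammar: <item> text is what the user can say (SRGS),
# the <tag> content is the meaning assigned to it (SISR).
GRAMMAR = """<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" root="city" tag-format="semantics/1.0">
  <rule id="city">
    <one-of>
      <item>Stockholm <tag>out = "STO";</tag></item>
      <item>Gothenburg <tag>out = "GOT";</tag></item>
    </one-of>
  </rule>
</grammar>
"""


def phrases_and_meanings(grammar_xml: str) -> dict:
    """Map each phrase in the grammar to the text of its semantic tag."""
    root = ET.fromstring(grammar_xml)
    result = {}
    for item in root.iter(f"{{{SRGS_NS}}}item"):
        phrase = (item.text or "").strip()
        tag = item.find(f"{{{SRGS_NS}}}tag")
        result[phrase] = tag.text.strip() if tag is not None else None
    return result
```

Calling `phrases_and_meanings(GRAMMAR)` returns one entry per `<item>`, pairing the spoken phrase with the script that a recognizer supporting SISR would run to produce the result.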

Intro to XML
  Standard for storage and transportation of data
  Maintained by W3C (w3.org/TR/REC-xml)
  Elements and tags
  Well-formedness
  Validity
  DTD
  Editor (TextMate + XMLMate)
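
The difference between well-formedness and validity can be tried out directly. The sketch below checks well-formedness only, using the parser in Python's standard library. The helper name `is_well_formed` and the sample documents are made up for illustration. Validity against a DTD is a separate step that this parser does not perform.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the string parses as a single well-formed XML document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Properly nested elements: well-formed.
GOOD = "<prompt><emphasis>hello</emphasis></prompt>"
# Closing tags in the wrong order: not well-formed.
BAD_NESTING = "<prompt><emphasis>hello</prompt></emphasis>"
# Two root elements: not well-formed.
TWO_ROOTS = "<a></a><b></b>"
```

A document can be well-formed without being valid: `GOOD` parses, but whether `<emphasis>` is allowed inside `<prompt>` depends on the DTD or schema of the language in question.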

Speech synthesis

Speech synthesis: text (with lang, voice, persona) → speech
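
The inputs named on this slide (text, language, voice) correspond to markup in SSML. The following sketch builds a minimal SSML document with Python's standard library. The function name `build_ssml` and its default values are assumptions for illustration. The "persona" on the slide is only approximated here by the `gender` attribute of `<voice>`. Whether a given synthesizer honours the request depends on the voices it has installed.

```python
import xml.etree.ElementTree as ET

# Namespace of SSML 1.0 and the qualified name of the xml:lang attribute.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Serialize SSML elements without a prefix (as the default namespace).
ET.register_namespace("", SSML_NS)


def build_ssml(text: str, lang: str = "en-US", gender: str = "female") -> str:
    """Wrap plain text in <speak> and <voice> elements and return the XML."""
    speak = ET.Element(f"{{{SSML_NS}}}speak", {"version": "1.0", XML_LANG: lang})
    voice = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"gender": gender})
    voice.text = text
    return ET.tostring(speak, encoding="unicode")
```

For example, `build_ssml("Welcome to the course.", lang="en-GB", gender="male")` yields a `<speak>` document that can be handed to an SSML-capable text-to-speech engine.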

A peek inside the black box http://www.explainthatstuff.com/how-speech-synthesis-works.html