Introduction and overview


Presentation on theme: "Introduction and overview"— Presentation transcript:

1 Introduction and overview

2 Outline
A short history of the field
Speech synthesis (TTS)
Automatic speech recognition (ASR)
Dialog system architectures
Voice on the Web (perhaps show the Siri video)
Voice on the Web and W3C Standards
Relation to linguistic theory
A brief look at the course plan
What this course is not about
What this course could mean to you
Introduction to lab assignments and platforms
Designing and developing spoken dialog systems
Present project
Give home assignment 1: Call flow design and evaluation
Present lab assignment 1

3 A short history of the field
1966, Joseph Weizenbaum, Eliza
Sundial
ATIS
Verbmobil
AIML NLP system
VXML ("Voice XML"), a dialog markup language (primarily for telephony), developed initially by AT&T, then administered by an industry consortium, and finally a W3C specification
Voxeo, Tropo

4 Speech synthesis: text → speech

5 Speech recognition: speech → text (or some semantic representation)

6 Dialog management
Finite-state based dialog management
Frame based (form-based) dialog management
Information-state based dialog management
Plan based dialog management
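
To illustrate the first of the approaches listed above, finite-state dialog management can be sketched as a table of states, each with a system prompt and a set of transitions keyed on the user's reply. The state names, prompts, the "*" wildcard convention, and the `run_dialog` helper are all hypothetical and not taken from the course material; recognized user input is simulated as plain strings.

```python
# Minimal sketch of a finite-state dialog manager (illustrative assumptions only).
# Each state has a system prompt and a mapping from user replies to next states.
# The "*" key is a made-up convention meaning "accept any reply".
STATES = {
    "ask_service": {
        "prompt": "Weather or timetable?",
        "next": {"weather": "ask_city", "timetable": "ask_city"},
    },
    "ask_city": {
        "prompt": "Which city?",
        "next": {"*": "done"},
    },
    "done": {"prompt": "Thank you, goodbye.", "next": {}},
}


def run_dialog(user_turns, start="ask_service"):
    """Walk the state machine over a list of already recognized user replies.

    Returns the system prompts that were produced, in order.
    """
    state = start
    prompts = [STATES[state]["prompt"]]
    for reply in user_turns:
        if state == "done":
            break
        transitions = STATES[state]["next"]
        # Unknown input keeps the system in the same state, so it re-prompts.
        state = transitions.get(reply.lower(), transitions.get("*", state))
        prompts.append(STATES[state]["prompt"])
    return prompts
```

Calling `run_dialog(["weather", "Gothenburg"])` walks through all three states, while an unexpected reply such as "pizza" repeats the first prompt. The rigidity of this scheme is what the frame-based and information-state approaches try to relax.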

7 Spoken dialogue system

8 Why voice
Wireless devices have small screens and limited input capabilities.
Telephone keypad can give users only a limited number of choices.
Speech technology is improving.
The exchange of information between a person and a computer is becoming more like a real conversation.
Users want hands-free or eyes-free use.
From a business viewpoint, voice applications open up a host of new revenue opportunities.
There exist many more telephones than computers with the potential to access the Internet.

9 Traditional Interactive Voice Response (IVR)

10 Speech versus Touch Tone

11 Applications
Information providing systems: weather reports, stock quotes, timetables
Transaction-based systems: calendar functions, shopping, financial transactions, travel reservations

12 Architecture 1

13 Architecture 2

14 Components
Natural language understanding: proper name identification, part of speech tagging, parser
Dialog manager
Output generator: natural language generator, gesture generator, layout engine
Input recognizer/decoder: automatic speech recognizer, gesture recognizer, handwriting recognizer
Output renderer: text-to-speech engine, talking head, robot or avatar
Multi-modal fusion
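
As a rough illustration of how the components above fit together, one dialog turn can be pictured as a chain from recognizer to renderer. Every function below is a hypothetical stand-in, not a real ASR, NLU, or TTS API: the "audio" is already text, understanding is keyword spotting, and rendering just returns the string that would be spoken.

```python
# Hypothetical single-turn pipeline; every function is a toy stand-in.

def recognize(audio_transcript):
    # Stand-in for the input recognizer (ASR): the "audio" is already text.
    return audio_transcript.lower().strip()


def understand(text):
    # Stand-in for natural language understanding: keyword spotting only.
    if "weather" in text:
        return {"intent": "get_weather"}
    return {"intent": "unknown"}


def manage(meaning):
    # Stand-in for the dialog manager: choose the next system action.
    if meaning["intent"] == "get_weather":
        return {"action": "ask_slot", "slot": "city"}
    return {"action": "reprompt"}


def generate(action):
    # Stand-in for the natural language generator.
    if action["action"] == "ask_slot":
        return f"For which {action['slot']}?"
    return "Sorry, I did not understand."


def render(text):
    # Stand-in for the output renderer (TTS): return the text to be spoken.
    return text


def handle_turn(audio_transcript):
    """Run one user turn through recognizer, NLU, manager, generator, renderer."""
    return render(generate(manage(understand(recognize(audio_transcript)))))
```

Keeping each stage behind its own function mirrors the component list: any one stage could be swapped for a real engine without touching the others.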

15 Types of systems
by modality: text-based, spoken dialog system, graphical user interface, multi-modal
by device: telephone-based systems, PDA systems, in-car systems, robot systems, desktop/laptop systems (native, in-browser systems, in-virtual machine), in-virtual environment, robots
by style: command-based, menu-driven, natural language, speech graffiti
by initiative: system initiative, user initiative, mixed initiative
by application: information service, command-and-control, entertainment, education/tutorial, edutainment, reminder systems, companion systems, healthcare, eldercare, assistive/access systems

16 Mobile voice apps Voice on the Web

17 Relation to other fields
Phonetics, Phonology, Syntax, Semantics, Pragmatics
spoken language understanding, psycholinguistics, human communication, discourse analysis, human-computer interaction, computational linguistics, NL-parsing, NL-generation, language modeling, multi-modal fusion, multi-modal fission, psychology, cognitive science, affective dialog, user modeling, embodied communication

18

19 A brief look at the course plan

20 What this course is not about
Sophisticated dialog management
Multi-modal systems
Non-spoken dialog systems

21 What this course could mean to you
Will prepare you for writing a thesis in the area of dialog systems (if you so choose)
Will prepare you for work in the industry
A link to the LinkedIn page

22 Is this something for a linguist?

23 Roles in the process
Dialog designer
VoiceXML programmer
Voice talent
Grammar writer
TTS specialist
Speech recognition specialist
Quality assurance specialist
Server specialist
Manager

24 Who are the big players in the area?
Google, Microsoft, Apple, IBM, Nuance, Voxeo, AT&T

25 What's Driving Speech as a Mobile Platform?
The Emergence of Speech as a Mobile Platform
Market Trends
Speech-Enabled Mobile Apps Gaining Acceptance
Voice Control in a Mission-Critical Environment
Search Engine for Audio-Visual Content
Instantaneous Language Translation
IBM's Spoken Web
What's Driving Speech as a Mobile Platform?
Mobile Devices and Peripherals
Cloud Computing
Open Technologies
Mashups and the Programmable Web
Legislation
Closing the (Mobile) Digital Divide
An Overview of Emerging SaaP Applications
Current Speech-Equipped Devices are Merely the Tip of the Iceberg
SaaP Enables New Application Interaction
Spoken Alerts
Mobile Reminders
Synthesized Speech and Text Messages
Speech-to-Text for Voic
SaaP Enables Voice User Interfaces
Speech Recognition: The Foundation of Speech-Enabled Apps
Constrained vs. Natural Language Processing
Automated vs. Hybrid Speech Recognition
Applications for Speech Recognition
Speaker Authentication and Text Messages Composition
Launch and Control Mobile Apps
Special Case: Voice Activation

26 Call flow and call flow diagrams

27 Evaluating speech and dialog technology

28 W3C Speech Standards Torbjörn Lager

29 The big picture: HTML, web browsers, web servers

30 The place of speech technology
… speech technology itself has a very long way to go. … the most important thing may turn out to be not the speech technology itself, but the way in which speech technology connects to all the other technologies.
Tim Berners-Lee

31 The big picture
HTML
HTML-browser
VoiceXML
Web servers
VoiceXML-browser (ASR, TTS)

32 The What and Why of Standards
Software standards include terminology, languages and protocols specified by committees of experts for widespread use in the software industry. Software standards have both advantages and disadvantages.
Advantages:
developers can create applications using the standard languages that are portable across a variety of platforms;
products from different vendors are able to interact with each other;
a community of experts evolves around the standard and is available to develop products and services based on the standard.
Disadvantages:
some developers feel that standards may inhibit creativity and stall the introduction of superior technology.
However, in the area of speech, vendors are enthusiastic about standards and frequently complain that standards are not developed fast enough. Emerging speech-technology standards could give a boost to an industry hampered by proprietary software and hardware.

33 World Wide Web Consortium

34 W3C Speech Standards
Speech Recognition Grammar Specification (SRGS) – What the user can say
Semantic Interpretation for Speech Recognition (SISR) – What the user means
Speech Synthesis Markup Language (SSML) – What the user hears
Pronunciation Lexicon Specification (PLS) – How words are pronounced
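
To make "what the user hears" a little more concrete, the sketch below builds a small SSML document with Python's standard `xml.etree.ElementTree` module. The `build_ssml` helper and its parameters are hypothetical; the `speak`, `s`, and `break` elements and the namespace URI come from the SSML specification, but whether a given TTS engine honors them is an assumption to check against that engine's documentation.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"   # SSML namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace of xml:lang


def build_ssml(text, lang="en-US", pause_ms=300):
    """Return an SSML string: one sentence followed by a pause.

    Hypothetical helper for illustration; not part of any TTS library.
    """
    # Serialize SSML elements in the default namespace (no prefix).
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    sentence = ET.SubElement(speak, f"{{{SSML_NS}}}s")
    sentence.text = text
    # <break time="..."/> asks the synthesizer to pause after the sentence.
    ET.SubElement(speak, f"{{{SSML_NS}}}break", {"time": f"{pause_ms}ms"})
    return ET.tostring(speak, encoding="unicode")
```

The resulting string is what would be handed to a synthesizer; SRGS grammars and PLS lexicons are XML documents of the same general kind and could be built the same way.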

35 Intro to XML
Standard for storage and transportation of data
Maintained by W3C (w3.org/TR/REC-xml)
Elements and tags
Well-formedness
Validity
DTD
Editor (Textmate + XMLmate)
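
Well-formedness can be checked mechanically: a document is well-formed if an XML parser accepts it. The sketch below uses Python's standard `xml.etree.ElementTree` parser; the `is_well_formed` helper name is hypothetical. Note that this parser does not validate against a DTD, so the check covers well-formedness only, not validity.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise.

    Checks well-formedness only (proper nesting, a single root element,
    quoted attributes); it does not check validity against a DTD.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True
```

For example, `<prompt>Hello <emphasis>world</emphasis></prompt>` passes, while the same text with the two closing tags swapped fails because the elements overlap instead of nesting.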

36 Speech synthesis

37 Speech synthesis: text, lang, voice, persona → speech

38 A peek inside the black box

