Creating User Interfaces

Presentation transcript:

Creating User Interfaces: Directed Speech. XML. VoiceXML. Classwork/Homework: Sign up to be a Voxeo developer. Do tutorials.

Speech recognition. Encompasses variety and range of activities. Totally open-ended to content and audience. May claim more than really exists. Restricted to small[er] set of phrases. Phrases within longer sections of speech. Restricted to require training OR system learns. Dictation systems learn your voice.

Speech recognition User speaks. System 'understands', at least enough to perform some action. Related to (but not the same as) Natural language understanding Voice print identification Record information to be re-played to human in compressed form for later interaction Speech synthesis (other direction): words to speech ?

Natural language understanding. Skip speech altogether, but type in statements or phrases in normal language. What is normal? We tend not to speak that grammatically. Many 'natural language systems' actually use keywords. History: Moon rocks example. Combine speech to natural language …

Continuous versus discrete Speaker speaks 'naturally' versus Speaker separates words

Examples. Dictation: no understanding as such, produce words/sentences in a program. (Telephone) Help desk / Information: generally restricted or directed speech, choosing from alternatives (may or may not be given); advances the process. [Restricted] commands: actually carrying out operations. Factory example: start and stop. Car: radio, heat/AC. Phone: call specific number.

Training. Dictation application: user takes time to read specific text to train the system. Note: some systems also adapt with use. If & when the user corrects the results, the system may do better next time. Phone lookup: user records names. No 'understanding', just record for matching.

Audience & content Some systems may allow adapting to audiences, for example, male versus female Some systems have restrictions on types of content Historical note: IBM system in 1980s & 1990s was restricted to male, American-born speakers (no speech impediments) and legal text.

Speech recognition concepts. Air pressure → diaphragm in phone → electrical signal → (Fourier Transform) wave pattern, matched against sets of canonical patterns (native speaker of English, perhaps male/female & young/old alternatives) generated for the specified grammar (using a segmentation, i.e. dividing up of the parts). Note: interplay of grammar and statistics distinguishes different approaches.

Fourier Transform (Fast Fourier Transform -- FFT) Takes data representing a signal And produces numbers representing the combination of sine and cosine waves that make up the signal
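To make the idea concrete, here is a minimal Python sketch. It uses a naive discrete Fourier transform rather than the FFT itself: the FFT is a faster way to get the same numbers, and the naive form is easier to read. The test signal, the window size `N`, and the helper names `dft` and `dominant_bin` are assumptions made up for this illustration.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform, O(n^2). An FFT gives the same result faster."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def dominant_bin(samples):
    """Return the frequency (in cycles per window) with the largest magnitude."""
    spectrum = dft(samples)
    half = len(samples) // 2  # for real input the upper half mirrors the lower half
    magnitudes = [abs(c) for c in spectrum[:half]]
    return max(range(half), key=lambda k: magnitudes[k])

# Made-up signal: a strong sine at 3 cycles per window plus a weaker cosine at 10.
N = 64
signal = [
    math.sin(2 * math.pi * 3 * t / N) + 0.5 * math.cos(2 * math.pi * 10 * t / N)
    for t in range(N)
]

print(dominant_bin(signal))  # strongest component: 3 cycles per window
```

The transform recovers the two waves that were mixed together: the magnitudes are large only at bins 3 and 10, with bin 3 the larger one.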

Speech recognition. Works on the product of the FFT. Uses (in most cases): Segmentation: attempt to break up into pieces, perhaps syllables or words. Grammar: definition of what is to be expected. Probabilities: if the first part matched X, then there is a greater probability that the next would match Y.
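The role of probabilities can be shown with a toy Python sketch. All numbers and words below are invented for illustration, and `best_next_word` is a hypothetical helper, not part of any recognizer. The idea is that the score from matching the sound alone is combined with how likely each word is to follow the previous one.

```python
# Hypothetical scores for how well the second chunk of audio matches each word.
acoustic = {"home": 0.40, "hone": 0.45, "comb": 0.15}

# Hypothetical probabilities that each word follows the already-matched word "call".
follows_call = {"home": 0.60, "hone": 0.01, "comb": 0.02}

def best_next_word(acoustic_scores, transition_probs):
    """Pick the word with the highest product of sound match and word-sequence probability."""
    combined = {
        word: acoustic_scores[word] * transition_probs.get(word, 0.0)
        for word in acoustic_scores
    }
    return max(combined, key=combined.get)

print(max(acoustic, key=acoustic.get))         # sound alone picks "hone"
print(best_next_word(acoustic, follows_call))  # with context, "home" wins
```

On sound alone the system would pick "hone", but knowing the first word was "call" makes "home" the better guess.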

Current State of the Art. General, no-restrictions speech reco, good enough to act on the speech? Always about to happen? Dictation / substitute for keyboard+ exists and satisfies many. Is this the most important application for most users? May not be the killer app, but may be good for motivating research. Extra credit posting: prepare a brief report on [a] current product or application. Can be one you use yourself.

Speech synthesis aka TTS (text to speech) Application determines that the computer needs to say certain words lexical units (syllables of words) phonemes pre-recorded (wav) files of phonemes

Speech synthesis. This is again a segmentation process: need to divide up the words and then put them together so the speech sounds 'natural'. A particular phoneme may [need to] sound different in different contexts. Also need to deal with abbreviations & local accents. Place names (important in travel & weather applications): special case, detect and use a wav file for each name. Older methods were all synthesized; there is a similar distinction in music between all synthesized and samples. Phonetic method of reading: sounding out.
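One small piece of this, dealing with abbreviations, can be sketched in Python. The abbreviation table and the function name `expand_abbreviations` are assumptions for illustration only. A real TTS system needs context to decide, for example, whether "St." means "Street" or "Saint".

```python
# Hypothetical lookup table; real systems use much larger, context-sensitive rules.
ABBREVIATIONS = {"Dr.": "Doctor", "Ave.": "Avenue", "approx.": "approximately"}

def expand_abbreviations(text, table=ABBREVIATIONS):
    """Replace each whitespace-separated token found in the table with its spoken form."""
    return " ".join(table.get(word, word) for word in text.split())

print(expand_abbreviations("Dr. Lee lives on Park Ave."))
# Doctor Lee lives on Park Avenue
```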

Speech synthesis is essentially ‘the computer’ reading ‘out loud’. Easy to do most things More and more difficult to do complete job Different languages may be easier than English. People who are not monolingual please comment!

Restricted / directed speech applications. The language is VoiceXML. We will use evolution.voxeo.com to create directed speech applications. Free facility: put in a URL pointing to a VoiceXML document. Supplies phone numbers to call in to test. You need to register. Note: previously used Tellme studios, but they stopped offering the service.

XML. Generalization of HTML. XML documents have markup: an opening tag indicating the type of element (possibly with attributes), content, and a closing tag. Document must be well-formed: elements nested in other elements, quotation marks around attribute values. Developers decide on element types, so we need to obey the rules of VoiceXML. Each element type can only have certain child elements.

Notes on VoiceXML. There are field and filled elements! You can start with text-to-speech, keep it as backup, and, when appropriate and possible, make wav recordings. You can open the file directly or in Voxeo to check for well-formed XML, but this doesn't check for legal VoiceXML. You can include JavaScript in the file or as an external script. Can put in pauses and other tricks to improve SR and TTS.
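A well-formedness check can also be done locally. The sketch below uses Python's standard `xml.etree.ElementTree` parser. The helper name `is_well_formed` and the three sample strings are made up for illustration. Like the checks mentioned above, this only tests for well-formed XML, not for legal VoiceXML.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as XML. Says nothing about VoiceXML rules."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

GOOD = '<vxml version="2.1"><form><block><prompt>Hello World.</prompt></block></form></vxml>'
# Missing </prompt>: elements are not properly nested.
BAD_NESTING = '<vxml version="2.1"><form><block><prompt>Hello World.</block></form></vxml>'
# Attribute value without quotation marks.
BAD_QUOTES = '<vxml version=2.1><form></form></vxml>'

print(is_well_formed(GOOD))         # True
print(is_well_formed(BAD_NESTING))  # False
print(is_well_formed(BAD_QUOTES))   # False
```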

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1">
  <meta name="Jeanine" content="jeanine.meyer@purchase.edu"/>
  <meta name="speak_exceptions" content="true"/>
  <form>
    <block>
      <prompt> Hello World. This is my first Voxeo application. </prompt>
    </block>
  </form>
</vxml>

My modification of the SouthPark example: outline
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <meta name="jeanine.meyer" content="jeanine.meyer@purchase.edu"/>
  <form id="MainMenu">
    <field name="DowntonCharacter"> … </field>
    <filled namelist="DowntonCharacter"> … </filled>
  </form>
</vxml>

<field name="DowntonCharacter">
  <prompt> Please say your favorite Downton Abbey character's name. </prompt>
  <!-- Define the grammar. -->
  <grammar xml:lang="en-US" root="myrule">
    <rule id="myrule">
      <one-of>
        <item> Carson </item>
        <item> Mrs. Hughes </item>
        …
        <item> Mary </item>
        <item> Cora </item>
      </one-of>
    </rule>
  </grammar>

  <!-- The user was silent, restart the field. -->
  <noinput> I did not hear anything. Please try again. <reprompt/> </noinput>
  <!-- The user said something that was not defined in our grammar. -->
  <nomatch> I did not recognize that character. Please try again. </nomatch>
</field>

<filled namelist="DowntonCharacter">
  <if cond="DowntonCharacter == 'Carson'">
    <prompt> Carson grew less likeable as the seasons went on. </prompt>
  <elseif cond="DowntonCharacter == 'Mrs. Hughes'"/>
    <prompt> Mrs. Hughes is wise, so why did she marry Carson? </prompt>
  …
  <else/>
    A match has occurred, but we have no specific response prepared. Perhaps you liked Mary or Cora.
  </if>
  <goto next="#MainMenu"/>
</filled>
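The flow of the field and filled elements can be traced in ordinary code. The Python sketch below only imitates the decision logic on typed text. It is not how a VoiceXML platform performs recognition. The function name `respond` is hypothetical, and the grammar set contains only the four items visible on the slide, since the elided items are unknown.

```python
# Only the grammar items that appear on the slide; the elided ones are left out.
GRAMMAR = {"Carson", "Mrs. Hughes", "Mary", "Cora"}

def respond(utterance):
    """Imitate noinput, nomatch, and the if/elseif/else inside filled."""
    if utterance is None or not utterance.strip():
        # Corresponds to <noinput>
        return "I did not hear anything. Please try again."
    name = utterance.strip()
    if name not in GRAMMAR:
        # Corresponds to <nomatch>
        return "I did not recognize that character. Please try again."
    # Corresponds to <filled>
    if name == "Carson":
        return "Carson grew less likeable as the seasons went on."
    if name == "Mrs. Hughes":
        return "Mrs. Hughes is wise, so why did she marry Carson?"
    return ("A match has occurred, but we have no specific response prepared. "
            "Perhaps you liked Mary or Cora.")

for said in ["", "Bates", "Carson", "Mary"]:
    print(repr(said), "->", respond(said))
```

Note that "Mary" is in the grammar, so it counts as a match, but it falls through to the else branch because filled has no case for it.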

Notes. The list in the field has names not referenced in the filled element, such as Mary and Cora. If it doesn't work AND you have checked that it is well-formed, and especially after you start to use other elements, check the element documentation to confirm that you are putting elements within allowed elements. Consider using the file manager to upload to their storage (www). This may give more reliable results.
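Checking that elements sit within allowed elements can be partly automated. The Python sketch below is a rough illustration only. The `ALLOWED_CHILDREN` table is a deliberately partial assumption covering just the elements used in these slides, and it is not the VoiceXML specification. The element documentation remains the authority, and anything missing from the table is reported even if it is actually legal.

```python
import xml.etree.ElementTree as ET

# Partial, illustrative table (an assumption): only elements seen in these slides.
ALLOWED_CHILDREN = {
    "vxml": {"meta", "form"},
    "form": {"block", "field", "filled"},
    "block": {"prompt"},
    "field": {"prompt", "grammar", "noinput", "nomatch", "filled"},
}

def local_name(tag):
    """Strip a '{namespace}' prefix that ElementTree adds when xmlns is present."""
    return tag.rsplit("}", 1)[-1]

def find_nesting_problems(xml_text):
    """Return (parent, child) pairs where the child is not listed for that parent."""
    problems = []
    root = ET.fromstring(xml_text)
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(local_name(parent.tag))
        if allowed is None:
            continue  # parent not in the partial table, so nothing to compare against
        for child in parent:
            name = local_name(child.tag)
            if name not in allowed:
                problems.append((local_name(parent.tag), name))
    return problems

OK_DOC = '<vxml version="2.1"><form><block><prompt>Hello</prompt></block></form></vxml>'
# Well-formed XML, but the prompt is placed directly inside form.
ODD_DOC = '<vxml version="2.1"><form><prompt>Hello</prompt></form></vxml>'

print(find_nesting_problems(OK_DOC))   # []
print(find_nesting_problems(ODD_DOC))  # [('form', 'prompt')]
```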

Screen shot from Voxeo

Screen shot: phone numbers

My examples Family greeting: built in audio files, use of calculations in VoiceXML to determine number of cranes to be done Rock paper scissors: JavaScript code to determine random move for computer, VoiceXML variables, break for timing, count and timeout with prompt ?
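The rock paper scissors example uses JavaScript inside the VoiceXML document. That code is not shown here, so the Python sketch below only illustrates the same two steps: picking a random move for the computer and deciding the result. The names `computer_move` and `outcome` are hypothetical.

```python
import random

MOVES = ("rock", "paper", "scissors")
# Each key beats the move it maps to.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def computer_move(rng=random):
    """Choose the computer's move uniformly at random."""
    return rng.choice(MOVES)

def outcome(player, computer):
    """Return 'tie', 'player wins', or 'computer wins'."""
    if player == computer:
        return "tie"
    return "player wins" if BEATS[player] == computer else "computer wins"

move = computer_move()
print("Computer chose", move, "->", outcome("rock", move))
```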

Homework (over break) Sign up to be Voxeo developer. Start VoiceXML tutorials: http://help.voxeo.com/go/help/xml.vxml.tutorials.overview Do your own hello, world application. Do a second application, involving some speech recognition. Do more? Check out http://help.voxeo.com/go/help/xml.vxml.elements.overview Start planning your VoiceXML project.