Creating User Interfaces
[Continue presentations as needed]
Speech recognition. Speech synthesis.
Homework: Report on current products. Register on Tellme Studio. Study VoiceXML.

Speech recognition
User speaks. System 'understands', at least enough to perform some action.
Related to (but not the same as)
  – Natural language understanding
  – Voice print identification
  – Record information to be replayed to a human in compressed form for later interaction
  – Speech synthesis (other direction): words to speech
  – ?

Natural language understanding
Skip speech altogether, but type in statements or phrases in normal language
  – What is normal? We tend not to speak that grammatically
  – Many 'natural language systems' actually use keywords
Historical: Moon rocks example
Combine speech with natural language …

Continuous versus discrete
Continuous: speaker speaks 'naturally'
versus
Discrete: speaker separates words

Examples
Dictation: no understanding as such, produce words/sentences in a program
(Telephone) Help desk / Information: generally restricted or directed speech, choosing from alternatives (may or may not be given). Advances the process
[Restricted] commands: actually carrying out operations
  – Factory example: start and stop
  – Car: radio, heat/AC
  – Phone: call specific number

Training
Dictation application: user takes time to read a specific text to train the system
  – Note: some systems also adapt with use. If and when the user corrects the results, the system may do better next time.
Phone lookup: user records names. No 'understanding', just recordings used for matching.

Audience & content
Some systems may allow adapting to audiences, for example, male versus female
Some systems have restrictions on types of content
  – Historical note: IBM system in 1980s & 1990s was restricted to male, American-born speakers (no speech impediments) and legal text.

Speech recognition concepts
Air pressure → diaphragm in phone → electrical signal → (Fourier Transform) → wave pattern matched against sets of canonical patterns (native speaker of English, perhaps male/female & young/old alternatives) generated for the specified grammar (using a segmentation, that is, a dividing up of the parts)
Note: interplay of grammar and statistics distinguishes different approaches

Fourier Transform (Discrete Fourier Transform, usually computed with the Fast Fourier Transform, FFT)
Takes data representing a signal
And produces numbers representing the combination of sine and cosine waves that make up the signal
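
To make the transform concrete, here is a minimal sketch in Python. It uses a naive discrete Fourier transform rather than the faster FFT algorithm (both give the same numbers), and the test signal, its length of 64 samples, and the chosen frequencies are all made up for illustration.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform: O(n^2), fine for a short demo."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def magnitudes(samples):
    """Strength of each frequency bin in the first half of the spectrum."""
    spectrum = dft(samples)
    return [abs(c) for c in spectrum[: len(samples) // 2]]


# Hypothetical test signal: 64 samples holding a sine wave with 3 cycles
# plus a weaker cosine wave with 10 cycles.
N = 64
signal = [
    math.sin(2 * math.pi * 3 * t / N) + 0.5 * math.cos(2 * math.pi * 10 * t / N)
    for t in range(N)
]

mags = magnitudes(signal)
# The two strongest bins should be the two frequencies that were mixed in.
strongest = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:2]
print(sorted(strongest))
```

The recognizer never sees the raw samples in this picture; it works with numbers like `mags`, which say how much of each frequency is present.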

Speech recognition
Works on the product of the FFT
Uses (in most cases)
  – Segmentation: attempt to break up into pieces, perhaps syllables or words
  – Grammar: definition of what is to be expected
  – Probabilities: if the first part matched X, then there is a greater probability that the next would match Y
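
The interplay of the three ingredients can be sketched with a toy example. This is not how any particular product works: the segments, match scores, grammar, and probabilities below are invented, and a real system would search far more cleverly than trying every combination.

```python
from itertools import product

# Hypothetical acoustic match scores for two segments of an utterance.
# Each segment lists candidate words with how well the sound matched (0..1).
segments = [
    {"call": 0.6, "tall": 0.4},
    {"home": 0.5, "foam": 0.5},
]

# Grammar: the only words the application expects at all.
grammar = {"call", "home", "office", "tall"}

# Hypothetical probabilities that one word follows another.
bigram = {
    ("call", "home"): 0.7,
    ("call", "office"): 0.3,
    ("tall", "home"): 0.1,
}


def best_sequence(segments, grammar, bigram, floor=0.01):
    """Score every in-grammar word sequence; return the most probable one."""
    options = [
        [(w, s) for w, s in seg.items() if w in grammar] for seg in segments
    ]
    best, best_score = None, 0.0
    for combo in product(*options):
        words = [w for w, _ in combo]
        score = 1.0
        for _, acoustic in combo:
            score *= acoustic                      # how well the sound matched
        for prev, nxt in zip(words, words[1:]):
            score *= bigram.get((prev, nxt), floor)  # how likely the word order is
        if score > best_score:
            best, best_score = words, score
    return best


print(best_sequence(segments, grammar, bigram))
```

Here "foam" sounds as good as "home" but is thrown out by the grammar, and "call home" beats "tall home" because that word pair is more probable.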

Current State of the Art
General, no restrictions, speech reco, good enough to act on the speech? Always about to happen?
Dictation / substitute for keyboard+ exists and satisfies many
  – Is this the most important application for most users?
  – May not be the killer app, but may be good for motivating research
Homework: prepare brief report on [a] current product or application. Can be one you use yourself.

Speech synthesis
aka TTS (text to speech)
Application determines that the computer needs to say certain words
lexical units (syllables of words) → phonemes → pre-recorded (wav) files of phonemes

Speech synthesis
This is again a segmentation process: need to divide up the words and then put together so speech sounds 'natural'.
  – particular phoneme may [need to] sound different in different context.
  – also need to deal with abbreviations & local accents
  – Place names (important in travel & weather applications)
Special case: detect and use wav file for each name.
Older methods were all synthesized
  – similar distinction between all synthesized and samples of music
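
A rough sketch of that pipeline in Python follows. Every table in it is hypothetical toy data (the abbreviations, the phoneme spellings, the file paths, the place name), and it ignores the point that a phoneme may need to sound different in different contexts.

```python
import re

# All tables below are hypothetical toy data, not a real lexicon.
ABBREVIATIONS = {"st.": "street", "dr.": "doctor"}
PLACE_NAME_WAVS = {"poughkeepsie": "places/poughkeepsie.wav"}
PHONEMES = {
    "main": ["m", "ey", "n"],
    "street": ["s", "t", "r", "iy", "t"],
    "in": ["ih", "n"],
}


def plan_speech(text):
    """Return the list of wav files to play for a piece of text."""
    playlist = []
    for token in text.lower().split():
        word = ABBREVIATIONS.get(token, token)   # expand abbreviations first
        word = re.sub(r"[^a-z]", "", word)       # drop leftover punctuation
        if not word:
            continue
        if word in PLACE_NAME_WAVS:              # special case: whole-name recording
            playlist.append(PLACE_NAME_WAVS[word])
        elif word in PHONEMES:                   # otherwise join phoneme samples
            playlist.extend(f"phonemes/{p}.wav" for p in PHONEMES[word])
        else:
            playlist.append(f"<unknown:{word}>")  # a real system needs a fallback rule
    return playlist


print(plan_speech("Main St. in Poughkeepsie"))
```

Even this toy shows where the hard parts are: "St." could be "street" or "saint", and only context can decide.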

Speech synthesis
is essentially ‘the computer’ reading ‘out loud’.
Easy to do most things
More and more difficult to do the complete job
Different languages may be easier than English. People who are not monolingual, please comment!

Restricted / directed speech applications
We will use the Tellme Studio engine to create directed speech applications. These make use of
  – Grammars
  – Options to use numbers (buttons)
  – Recorded (.wav) sounds
  – Text to speech

studio.tellme.com
Company that provides ‘engine’ for applications
Provides developing environment
  – We are doing the Tellme version of VoiceXML, but it appears to be standard.
Register as a developer:
  – Provide your own id; you are assigned a PIN
  – Put VoiceXML in ScratchPad place (no audio files)
VXML (8965)
  – SAY id and then PIN or can give phone number.
Tellme runs either the program in ScratchPad OR the program at the Application URL, for projects with multiple files
To look at someone else's project, you change your Application URL
  – called pointing your account to a new source.

XML
Generalization of HTML
XML documents have markup.
  – Opening tag indicating the type of element, possibly with attributes; then content; then closing tag.
Document must be well-formed.
Developers decide on element types.
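
One way to see what "well-formed" means is to hand a few documents to an XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the element and attribute names are made up, which is exactly what XML lets a developer do.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Element and attribute names here are invented by the "developer".
good = '<course code="hypothetical101"><topic>speech</topic></course>'
bad_nesting = "<course><topic>speech</course></topic>"   # tags closed in wrong order
bad_quotes = "<course code=hypothetical101></course>"    # attribute value not quoted

for sample in (good, bad_nesting, bad_quotes):
    print(is_well_formed(sample))
```

HTML browsers tend to forgive both kinds of mistake; an XML parser rejects the whole document.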

VoiceXML
XML document (VXML header)
  – This means proper nesting of elements, quotation marks on attributes
VoiceXML has tags for flow-of-control and calculations.
  – Also can use <script> for JavaScript
Grammars come in different varieties. We will use the Tellme way.
  – Grammars are included in CDATA sections to prevent XML interpretation.
  – Many grammars constructed for you. <field type="boolean"> … </field> will listen for yes or no. <field type="currency"> … </field> will listen for currency.
  – … for list
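
The reason for wrapping grammars in CDATA can be shown with a small Python sketch. The grammar text here is a meaningless placeholder, not Tellme grammar syntax; the only point is what the XML parser does with characters such as < and &.

```python
import xml.etree.ElementTree as ET

# The grammar text is a made-up placeholder; only the CDATA wrapping matters here.
with_cdata = "<grammar><![CDATA[ a < b & c ]]></grammar>"
without_cdata = "<grammar> a < b & c </grammar>"

# Inside CDATA the parser passes the characters through untouched.
print(repr(ET.fromstring(with_cdata).text))

# Without CDATA the same characters are read as broken markup.
try:
    ET.fromstring(without_cdata)
    parsed_without = True
except ET.ParseError:
    parsed_without = False
print(parsed_without)
```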

Very brief overview
A <vxml> document contains <form> and/or <menu> elements.
  – <form> can contain <block>, <field>
  – <block> can contain <prompt> or do its own audio
  – <field> can contain <prompt>, <grammar>, <filled>, etc.
  – NOTE: certain types of <field> elements use built-in grammars, for example, boolean
  – Can have a <filled> child node that indicates what to do if there is a match
  – <menu> is a compressed way to use a simple grammar

Very brief, cont.
Logic can be done using a <script> element that contains a variant of JavaScript and/or VoiceXML logic elements, including
  – <if>
  – <elseif>, <else>
  – other
These may be part of a <filled> element

Audio
Tellme Studio provides a way to record [your] speech as a wav file to upload to a website. Sends it to your address
You upload your VoiceXML file plus any wav files (and anything else)
<audio src="mygreeting.wav">Welcome to my site</audio>
If Tellme can't find the mygreeting.wav file, it uses its Text to Speech on the string "Welcome to my site".
Note: you also can use a full URL as the src.
You put the URL for the VoiceXML file into your Tellme Studio account, called pointing to the URL.
TEST

VoiceXML basics, continued
<form> element can contain
  – <block> elements, which can contain <prompt>, <audio>, other
  – <field>, which can contain <grammar> (if not one of the built-in grammars)
… tags can be at different levels (for example, document, block, or higher levels)
… tags
<script> elements for JavaScript (which can also appear in expressions)

VoiceXML basics: typical case
a form element with a field
  – <prompt>, made up of <audio>, with reference to recorded wav file and backup text
  – <grammar>, if NOT using built-in grammars designated by type attribute of field. This is a CDATA section.
  – <filled> with (follow-on) code using field
  – handlers for nomatch, noinput cases

Caution
A form contains various elements, including a field.
If a field has a grammar and the grammar is satisfied, control goes to a filled tag
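
The flow of control around a field can be imitated in a few lines of Python. This is only a toy model under simple assumptions (a grammar is just a set of phrases, and the caller gets three tries); the real VoiceXML interpreter does considerably more.

```python
def run_field(grammar, utterances, max_tries=3):
    """Toy model of one field.

    grammar     -- set of phrases the field accepts
    utterances  -- what the caller says on each try; None means silence
    Returns (log, value): the events that occurred and the recognized value.
    """
    log = []
    for said in utterances[:max_tries]:
        if said is None:
            log.append("noinput")          # caller said nothing: prompt again
        elif said.lower() in grammar:
            log.append("filled")           # grammar satisfied: run the filled code
            return log, said.lower()
        else:
            log.append("nomatch")          # speech heard but not in the grammar
    return log, None


yes_no = {"yes", "no"}
print(run_field(yes_no, [None, "maybe", "Yes"]))
print(run_field(yes_no, ["perhaps"]))
```

Only the "filled" event carries a value forward, which is why the code that uses the caller's answer belongs in the filled part of the field.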

obligatory… Hello, world
recorded using Tellme Studio
backup using TTS, just in case src file is missing
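
The markup of the slide did not survive in this transcript, so what follows is a sketch of what such a document might look like, not the original. It is held in a Python string so that it can be checked with a parser. The file name hello.wav is hypothetical, and the `what_to_play` helper is an invented stand-in for what a voice platform does when the src file is or is not available.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # standard VoiceXML namespace

# hello.wav is a hypothetical file name; the text inside <audio> is the TTS backup.
HELLO = f"""<?xml version="1.0"?>
<vxml version="2.1" xmlns="{VXML_NS}">
  <form>
    <block>
      <audio src="hello.wav">Hello, world</audio>
    </block>
  </form>
</vxml>"""


def what_to_play(document, available_files):
    """List what a platform would play: the wav if present, else the backup text."""
    root = ET.fromstring(document)
    plan = []
    for audio in root.iter(f"{{{VXML_NS}}}audio"):
        src = audio.get("src")
        if src in available_files:
            plan.append(("wav", src))
        else:
            plan.append(("tts", (audio.text or "").strip()))
    return plan


print(what_to_play(HELLO, {"hello.wav"}))   # recording found
print(what_to_play(HELLO, set()))           # recording missing: fall back to TTS
```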

example
Asks for number of credits and calculates when you/caller can register
uses built-in grammar for number
No error recovery. You need to do better than this in your project.
Unfortunate situation: there is an element type filled and an element type field.
The < symbols are represented using &lt;

<vxml version="2.1" xmlns="
Hello there.
How many credits have you earned?
<![CDATA[ NATURAL_NUMBER_THRU_999 ]]>
Sorry. I didn't get that.

You can register on the third day
You can register on the second day
You can register on the first day
You can register on the fourth day
Good bye.
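
Because the tags of this example were lost, here is a hedged reconstruction of its general shape, kept in a Python string so it can be parsed. Only the prompts come from the slides. The thresholds 30, 60, and 90, the field name credits, and the use of the standard type="number" field in place of the Tellme grammar are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NS = "http://www.w3.org/2001/vxml"  # standard VoiceXML namespace

# Hypothetical reconstruction: the thresholds 30/60/90 and the field name
# "credits" are invented; only the prompt wording comes from the slides.
CREDITS_VXML = f"""<vxml version="2.1" xmlns="{NS}">
  <form>
    <field name="credits" type="number">
      <prompt>Hello there. How many credits have you earned?</prompt>
      <nomatch>Sorry. I didn't get that. <reprompt/></nomatch>
      <filled>
        <if cond="credits &gt;= 30 &amp;&amp; credits &lt; 60">
          <prompt>You can register on the third day</prompt>
        <elseif cond="credits &gt;= 60 &amp;&amp; credits &lt; 90"/>
          <prompt>You can register on the second day</prompt>
        <elseif cond="credits &gt;= 90"/>
          <prompt>You can register on the first day</prompt>
        <else/>
          <prompt>You can register on the fourth day</prompt>
        </if>
        <prompt>Good bye.</prompt>
      </filled>
    </field>
  </form>
</vxml>"""


def registration_day(credits):
    """Same branching as the cond attributes above, in plain Python."""
    if 30 <= credits < 60:
        return "third"
    elif 60 <= credits < 90:
        return "second"
    elif credits >= 90:
        return "first"
    return "fourth"


root = ET.fromstring(CREDITS_VXML)
first_if = root.find(f".//{{{NS}}}if")
print(first_if.get("cond"))     # the parser turns &lt; and &amp; back into < and &
print(registration_day(45))
```

The comparison has to be written with &lt; inside the XML, but the expression the interpreter evaluates contains an ordinary < sign.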

Homework
Do research / think about your own experiences and come prepared to report on a speech recognition / speech synthesis application
Start learning VoiceXML