Published by Henry McGee. Modified over 9 years ago.
1
Creating User Interfaces [Continue presentations as needed] Speech recognition. Speech synthesis Homework: Report on current products. Register on Tellme Studies. Study VoiceXML
2
Speech recognition User speaks. System 'understands', at least enough to perform some action. Related to (but not the same as) –Natural language understanding –Voice print identification –Record information to be re-played to human in compressed form for later interaction –Speech synthesis (other direction): words to speech –?
3
Natural language understanding Skip speech altogether, but type in statements or phrases in normal language –What is normal? We tend not to speak that grammatically –Many 'natural language systems' actually use keywords History: Moon rocks example Combine speech to natural language …
4
Continuous versus discrete Speaker speaks 'naturally' versus Speaker separates words
5
Examples Dictation: no understanding as such, produce words/sentences in a program (Telephone) Help desk / Information: generally restricted or directed speech, choosing from alternatives (may or may not be given). Advances the process [Restricted] commands: actually carrying out operations –Factory example: start and stop –Car: radio, heat/AC –Phone: call specific number
6
Training Dictation application: user takes time to read specific text to train the system –Note: some systems also adapt with use. If & when user corrects the results, system may do better next time. Phone lookup: user records names. No 'understanding', just record for matching.
7
Audience & content Some systems may allow adapting to audiences, for example, male versus female Some systems have restrictions on types of content –Historical note: IBM system in 1980s & 1990s was restricted to male, American-born speakers (no speech impediments) and legal text.
8
Speech recognition concepts Air pressure -> diaphragm in phone -> electrical signal -> (Fourier Transform) -> wave pattern -> matched against sets of canonical patterns (native speaker of English, perhaps male/female & young/old alternatives) generated for the specified grammar (using a segmentation = dividing up of the parts) Note: interplay of grammar and statistics distinguishes different approaches
9
Fourier Transform (Discrete Fourier Transform, usually computed with the Fast Fourier Transform -- FFT -- algorithm) Takes data representing a signal And produces numbers representing the combination of sine and cosine waves that make up the signal
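A minimal sketch of the idea, assuming a short list of evenly spaced samples: the naive discrete Fourier transform below produces one complex number per frequency bin, and the magnitude of each number says how strongly that sine/cosine frequency is present. The test signal and the helper name `dominant_bin` are made up for illustration; a real recognizer would use an FFT library rather than this O(n^2) loop.

```python
import cmath
import math


def dft(samples):
    """Naive O(n^2) discrete Fourier transform of a list of samples."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def dominant_bin(samples):
    """Index (below n/2) of the frequency bin with the largest magnitude."""
    spectrum = dft(samples)
    half = len(samples) // 2
    return max(range(half), key=lambda k: abs(spectrum[k]))


# Hypothetical test signal: 3 full cycles of a sine wave across 32 samples.
signal = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
print(dominant_bin(signal))
```

The strongest bin corresponds to the number of cycles in the sampled window, which is the kind of "wave pattern" the recognizer then tries to match.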
10
Speech recognition Works on the product of the FFT Uses (in most cases) –Segmentation: attempt to break up into pieces, perhaps syllables or words –Grammar: definition of what is to be expected –Probabilities: if first part matched X, then greater probability that the next would match Y
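The "probabilities" point can be illustrated with a toy bigram model: count which word follows which in some example phrases, then use those counts to prefer the likelier of two candidates. This is only a sketch under stated assumptions; the training phrases and function names are hypothetical, and real recognizers combine such statistics with acoustic scores.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probability(counts, prev, candidate):
    """Estimate P(candidate | prev) from the raw counts (no smoothing)."""
    total = sum(counts[prev].values())
    if total == 0:
        return 0.0
    return counts[prev][candidate] / total


def pick_candidate(counts, prev, candidates):
    """Choose the candidate most likely to follow the previous word."""
    return max(candidates, key=lambda w: next_word_probability(counts, prev, w))


# Hypothetical command phrases, invented for this example.
phrases = ["call home", "call office", "call home now", "play radio"]
model = train_bigrams(phrases)
print(pick_candidate(model, "call", ["home", "radio"]))
```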
11
Current State of the Art General, no restrictions, speech reco, good enough to act on the speech? always about to happen? dictation / substitute for keyboard+ exists and satisfies many –Is this most important application for most users? –May not be killer app, but may be good for motivating research Homework: prepare brief report on [a] current product or application. Can be one you use yourself.
12
Speech synthesis aka TTS (text to speech) Application determines that the computer needs to say certain words -> lexical units (syllables of words) -> phonemes -> pre-recorded (wav) files of phonemes
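The words-to-phonemes-to-wav-files chain can be sketched as a pair of table lookups. Everything concrete here is an assumption made for illustration: the two-word lexicon, the phoneme symbols, and the `<phoneme>.wav` naming scheme are hypothetical and not taken from any particular TTS engine.

```python
# Hypothetical pronunciation lexicon: word -> list of phoneme symbols.
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}


def words_to_phonemes(text):
    """Look up each word's phonemes; raises KeyError for unknown words."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(LEXICON[word])
    return phonemes


def phonemes_to_wav_files(phonemes):
    """Map each phoneme to the (assumed) name of its pre-recorded wav file."""
    return [f"{p}.wav" for p in phonemes]


print(phonemes_to_wav_files(words_to_phonemes("Hello world")))
```

Playing the files back to back is the easy part; making the joins sound natural in context is where the difficulty lies.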
13
Speech synthesis This is again a segmentation process: need to divide up the words and then put together so speech sounds 'natural'. –particular phoneme may [need to] sound different in different context. –also need to deal with abbreviations & local accents –Place names (important in travel & weather applications) Special case: detect and use wav file for each name. Older methods were all synthesized –similar distinction between all synthesized and samples of music
14
Speech synthesis is essentially ‘the computer’ reading ‘out loud’. Easy to do most things More and more difficult to do complete job Different languages may be easier than English. People who are not monolingual please comment!
15
Restricted / directed speech applications We will use the tellme studio engine to create directed speech applications. These make use of –Grammars –Options to use numbers (buttons) –Recorded (.wav) sounds –Text to speech
16
studio.tellme.com Company that provides ‘engine’ for applications Provides development environment –We are doing the Tellme version of VoiceXML, but it appears to be standard. Register as a developer: –Provide your own id; assigned a PIN –Put VoiceXML in ScratchPad place (no audio files) 1-800-555-VXML (8965) –SAY id and then PIN or can give phone number. Tellme runs either program in ScratchPad OR program at Application URL for projects with multiple files To look at someone else's project, you change your Application URL –called pointing your account to a new source.
17
XML Generalization of HTML XML documents have markup. –Tag indicating type of element and, possibly with attributes, content, tag closer. Document must be well-formed. Developers decide on element types.
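Well-formedness can be checked mechanically with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the `greeting` and `to` element names are invented for the example, which fits the point that developers decide on element types.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Properly nested elements, quoted attribute value.
print(is_well_formed('<greeting lang="en"><to>World</to></greeting>'))
# Closing tags in the wrong order.
print(is_well_formed('<greeting><to>World</greeting></to>'))
# Attribute value without quotation marks.
print(is_well_formed('<greeting lang=en></greeting>'))
```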
18
VoiceXML XML document (VXML header) –This means proper nesting of elements, quotation marks on attributes VoiceXML has tags for flow-of-control and calculations. –Also can use <script> for JavaScript Grammars come in different varieties. We will use the Tellme way. –Grammars are included in CDATA sections to prevent XML interpretation. –Many grammars constructed for you. <field type="boolean"> … </field> will listen for yes or no. <field type="currency"> … </field> will listen for currency. – for list
19
Very brief overview document contains and/or menu elements. – can contain, can contain or do its own audio can contain,,, etc. –NOTE: certain types of elements use built-in grammars, for example, boolean –Can have a child node that indicates what to do if there is a match – is a compressed way use a simple grammar
20
Very brief, cont. Logic can be done using a element that contains a variant of JavaScript and/or vxml logic elements, including – –, –other These may be part of a element
21
Audio Tellme studio provides way to record [your] speech as a wav file to upload to a website. Sends it to your email address You upload your VoiceXML file plus any wav files (and anything else) <audio src="mygreeting.wav">Welcome to my site</audio> If Tellme can't find the mygreeting.wav file, it uses its Text to Speech on the string "Welcome to my site". Note: you also can use a full URL: http://.... You put in the URL for the voicexml file into your Tellme studio account, called pointing to the URL. TEST
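The fallback behavior described here can be sketched in a few lines. This is an illustration of the rule, not of how the Tellme engine is implemented: the function names and the set of "available files" are assumptions, and the markup is built with Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


def audio_element(src, fallback_text):
    """Build an audio element with a wav source and backup text for TTS."""
    element = ET.Element("audio", {"src": src})
    element.text = fallback_text
    return ET.tostring(element, encoding="unicode")


def what_caller_hears(src, fallback_text, available_files):
    """Play the recording if it can be found, otherwise speak the backup text."""
    if src in available_files:
        return ("play", src)
    return ("tts", fallback_text)


print(audio_element("mygreeting.wav", "Welcome to my site"))
# Hypothetical server contents: the greeting file was never uploaded.
print(what_caller_hears("mygreeting.wav", "Welcome to my site", {"other.wav"}))
```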
22
VoiceXML basics, continued element can contain – elements, which can contain,, other – which can contain (if not one of built-in grammars) tags can be at different levels (for example, document, block, or higher levels) tags elements for JavaScript (which can also appear in expressions)
23
VoiceXML basics: typical case a form element –, made up of, with reference to recorded wav file and backup text, if NOT using built-in grammars designated by type attribute of field. This is a CDATA section. with (follow-on) code using field for nomatch, noinput cases
24
Caution A form contains various elements, including a field. If a field has a grammar and the grammar is satisfied, control goes to a filled tag
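The control flow around a field can be simulated to make the three outcomes concrete: silence raises a noinput case, an utterance outside the grammar raises a nomatch case, and an utterance that satisfies the grammar sends control to the filled code. This is a simplified sketch; the retry limit and function names are assumptions, and a real VoiceXML interpreter handles many more details.

```python
def visit_field(utterance, grammar):
    """Classify one attempt at a field as 'noinput', 'nomatch', or 'filled'."""
    if utterance is None or not utterance.strip():
        return "noinput"
    if utterance.strip().lower() in grammar:
        return "filled"
    return "nomatch"


def run_field(utterances, grammar, max_attempts=3):
    """Re-prompt until the grammar is satisfied or attempts run out."""
    log = []
    for utterance in utterances[:max_attempts]:
        outcome = visit_field(utterance, grammar)
        log.append(outcome)
        if outcome == "filled":
            break
    return log


# Caller stays silent, then says something unexpected, then answers.
print(run_field([None, "maybe", "yes", "no"], {"yes", "no"}))
```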
25
obligatory… Hello, world recorded using tellme studio backup using TTS, just in case src file missing
26
example Asks for number of credits and calculates when you/caller can register uses built-in grammar for number No error recovery. You need to do better than this in your project. Unfortunate situation: there is an element type filled and an element type field. The < symbols are represented using &lt;
27
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml"> Hello there. How many credits have you earned? <![CDATA[ NATURAL_NUMBER_THRU_999 ]]> Sorry. I didn't get that.
28
You can register on the third day You can register on the second day You can register on the first day You can register on the fourth day Good bye.
29
Homework Do research / think about your own experiences and come prepared to report on a speech recognition / speech synthesis application Start learning VoiceXML