The Speech Speech casey chesnut brains-N-brawn.com Madison.NET April 2007
PowerPoint Page Up Page Down
brains-N-brawn.com Pervasive Computing –Tablet PC (MVP 03) –Compact Framework (MVP 04) –Advanced Web Services (MVP 05) –Media Center (MVP 06) –Speech –Location Based Services –Artificial Intelligence –3D
Outline Speech Overview Vista Speech Recognition SAPI 5.3 / System.Speech Speech Server 2007
Outline : Speech Overview Voice User Interface How does it work? –Synthesis (TTS) –Recognition (SR)
Overview Speech is just another presentation system –Synthesis = Output to user –Recognition = User input Voice User Interface (VUI)
VUI Modes Applications –Multi-modal –Voice-only
VUI Tips Don't replicate the touch-tone-based menu system Restrict options on the main (opening) menu to 4 or fewer Make sure your opening greeting is short Don't design the app solely for the new user Focus on task completion above all What can I say? 006/02/08/ aspx
Speech Synthesis Text to Speech –Dynamic –Prompt database
How Synthesis Works Text parsing –Sentences, numbers, symbols, pauses Natural language processing –Part of speech, tense Phonemes are looked up or sounded out Diphones are appended together Post process audio to add emphasis Play speech audio
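The text-parsing step above can be sketched in a few lines. This is a toy illustration in Python, not how any real TTS engine (or SAPI) does it: the function names `split_sentences`, `number_to_words` and `normalize`, and the small symbol table, are assumptions made up for this sketch. It only shows the idea of splitting sentences and expanding numbers and symbols into speakable words before any phoneme lookup happens.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
# Hypothetical symbol table; a real engine has far richer rules.
SYMBOLS = {"%": "percent", "&": "and", "+": "plus", "@": "at"}


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    return ONES[hundreds] + " hundred" + (" " + number_to_words(rest) if rest else "")


def split_sentences(text: str) -> list:
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def normalize(sentence: str) -> list:
    """Turn one sentence into a list of speakable lowercase words."""
    words = []
    for tok in re.findall(r"\d+|[A-Za-z']+|[%&+@]", sentence):
        if tok.isdigit():
            n = int(tok)
            # Long numbers are read digit by digit in this sketch.
            words.append(number_to_words(n) if n < 1000
                         else " ".join(ONES[int(d)] for d in tok))
        elif tok in SYMBOLS:
            words.append(SYMBOLS[tok])
        else:
            words.append(tok.lower())
    return words


if __name__ == "__main__":
    for sentence in split_sentences("Speech is 42% fun. It is 2007!"):
        print(normalize(sentence))
```

Reading "2007" digit by digit is a deliberate simplification; deciding whether a number is a year, a quantity or a phone number is the kind of work the natural language processing step does.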
How Synthesis Works Demo –/xnaSynth app Article (codebase from /ttSpeech)
Speech Recognition Speech to Text –Dictation –Command and Control
How Recognition Works Audio signal is processed Look for signals which might be speech Phonemes are found in audio signals Phonemes are mapped to a dictionary of words –Dictation or grammar-based Apply natural language processing
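The "phonemes are mapped to a dictionary of words" step, and the difference a grammar makes, can be illustrated with a toy lookup. This is a sketch under heavy assumptions: the pronunciation table, the phoneme symbols, the command list and the names `decode` and `recognize` are all invented for illustration, and real recognizers work with probabilities rather than exact matches.

```python
# Hypothetical pronunciation dictionary: word -> phoneme sequence.
PRONUNCIATIONS = {
    "open": ("OW", "P", "AH", "N"),
    "close": ("K", "L", "OW", "Z"),
    "file": ("F", "AY", "L"),
    "window": ("W", "IH", "N", "D", "OW"),
}

# Hypothetical command-and-control grammar: the only phrases accepted.
COMMANDS = {("open", "file"), ("close", "window")}


def decode(phonemes, lexicon=PRONUNCIATIONS):
    """Segment a phoneme sequence into dictionary words.

    Returns a list of words, or None if no segmentation exists.
    """
    phonemes = tuple(phonemes)
    if not phonemes:
        return []
    for word, pron in lexicon.items():
        if phonemes[:len(pron)] == pron:
            rest = decode(phonemes[len(pron):], lexicon)
            if rest is not None:
                return [word] + rest
    return None


def recognize(phonemes, grammar=COMMANDS):
    """Grammar-based recognition: accept only phrases listed in the grammar."""
    words = decode(phonemes)
    if words is not None and tuple(words) in grammar:
        return " ".join(words)
    return None


if __name__ == "__main__":
    print(recognize(["OW", "P", "AH", "N", "F", "AY", "L"]))
    print(recognize(["OW", "P", "AH", "N", "W", "IH", "N", "D", "OW"]))
```

The second call decodes to valid dictionary words but is rejected because the phrase is not in the grammar. Dictation drops that restriction, which is one reason it is harder to get right than command and control.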
How Recognition Works Demo –/wavReader app Article (codebase from /noReco)
Outline : Vista Speech Recognizer Built into Vista’s shell Microphone bar Language support Can be trained to improve accuracy Command-and-control, also Dictation Automagic application support Horrible Office integration UAC problems
Demo Say what you see Show numbers Correct Spell it Mouse grid /vista-speech-recognition-screencast/
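The mouse grid idea is simple enough to sketch: the screen is divided into a 3x3 grid, the user says a cell number, and that cell is divided again until the pointer is close enough. The Python below is only an illustration of that subdivision, assuming cells are numbered 1 to 9 row by row; the names `subdivide` and `mouse_grid` are hypothetical and this is not how Vista implements it.

```python
def subdivide(region, cell):
    """Return the sub-rectangle for a spoken cell number.

    region is (left, top, width, height); cell is 1..9, numbered
    left to right, top to bottom.
    """
    if not 1 <= cell <= 9:
        raise ValueError("cell must be between 1 and 9")
    left, top, width, height = region
    row, col = divmod(cell - 1, 3)
    w, h = width / 3, height / 3
    return (left + col * w, top + row * h, w, h)


def mouse_grid(screen, spoken_cells):
    """Apply each spoken cell in turn and return the final click point."""
    region = screen
    for cell in spoken_cells:
        region = subdivide(region, cell)
    left, top, w, h = region
    return (left + w / 2, top + h / 2)


if __name__ == "__main__":
    # "Mouse grid", "nine", "one" on a 900x900 screen.
    print(mouse_grid((0, 0, 900, 900), [9, 1]))
```

Each spoken number shrinks the target area by a factor of nine, so a few numbers are enough to reach a small target on a typical display.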
High Risk Demo
Hack 65.stm /micBarExtend – tap and talk
Narrator Vista’s screen reader
Outline : SAPI 5.3 / System.Speech Desktop applications –SAPI 5.3 –System.Speech
SAPI 5.3 COM based Native applications Managed apps which need more control
System.Speech Part of .NET 3.0 WPF Managed wrapper built on SAPI 5.3 Simple API Standards support (SSML, SRGS) Language support Vista Speech Recognition integration Does not work in XBAP
System.Speech.Synthesis SpeechSynthesizer SSML PromptBuilder Voices
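To make the SSML bullet concrete, here is a minimal sketch of what an SSML document looks like. It is built with Python's standard `xml.etree.ElementTree` purely for illustration; it does not use System.Speech or PromptBuilder, and the helper name `build_ssml` and its parameters are assumptions of this sketch. The `speak`, `emphasis` and `break` elements and the namespace URI come from the W3C SSML 1.0 specification.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_ssml(text_before, emphasized, pause_ms, text_after, lang="en-US"):
    """Build a small SSML document: text, an emphasized word, a pause, more text."""
    # Serialize SSML elements without a prefix (default namespace).
    ET.register_namespace("", SSML_NS)
    speak = ET.Element("{%s}speak" % SSML_NS, {"version": "1.0", XML_LANG: lang})
    speak.text = text_before
    emphasis = ET.SubElement(speak, "{%s}emphasis" % SSML_NS, {"level": "strong"})
    emphasis.text = emphasized
    pause = ET.SubElement(speak, "{%s}break" % SSML_NS, {"time": "%dms" % pause_ms})
    pause.tail = text_after
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Welcome to ", "Madison", 500, " and thanks for coming."))
```

Markup like this is what lets the synthesizer add emphasis and pauses that plain text cannot express.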
System.Speech.Synthesis Demo –/speechSamples - /speechSynth
System.Speech.Recognition SpeechRecognizer / SpeechRecognitionEngine SRGS GrammarBuilder Advanced users –Deep-link functionality –Mixed initiative
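For readers who have not seen SRGS, the sketch below generates a small command grammar in the W3C SRGS 1.0 XML form. It uses Python's standard library only to show the shape of the document; it is not the GrammarBuilder API, and the helper name `build_srgs` and the sample phrases are assumptions of this sketch. The `grammar`, `rule`, `one-of` and `item` elements are from the SRGS specification.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_srgs(rule_id, phrases, lang="en-US"):
    """Build an SRGS grammar whose root rule accepts any one of the phrases."""
    # Serialize SRGS elements without a prefix (default namespace).
    ET.register_namespace("", SRGS_NS)
    grammar = ET.Element("{%s}grammar" % SRGS_NS, {
        "version": "1.0",
        XML_LANG: lang,
        "root": rule_id,   # rule the recognizer starts from
        "mode": "voice",
    })
    rule = ET.SubElement(grammar, "{%s}rule" % SRGS_NS,
                         {"id": rule_id, "scope": "public"})
    one_of = ET.SubElement(rule, "{%s}one-of" % SRGS_NS)
    for phrase in phrases:
        ET.SubElement(one_of, "{%s}item" % SRGS_NS).text = phrase
    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    print(build_srgs("command", ["page up", "page down"]))
```

A grammar this small is typical of command and control: the recognizer only has to choose between a handful of phrases, which keeps accuracy high.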
System.Speech.Recognition Demo –/speechSamples - /speechReco
System.Speech Demo –/micBarExtend –/mceSapiMcpl Article (not updated for Vista yet)
What about Mobile Devices OEMs can add VoiceCommand –VoiceCommand is not accessible to developers WindowsMobile has the SAPI API, but no engines PlatformBuilder is supposed to have engines There are 3rd party engines for purchase
Outline : Speech Server 2007
Speech Server 2007 Telephony Applications Outgoing calls Speaker Independent
Speech Server 2007 VOIP Language support VoiceXML / SALT Workflow development model Reports Still in beta
Speech Server 2007 Speech Synthesis –Inline –PromptBuilder –SSML –Prompt databases Speech Recognition –Inline –Dynamic Grammar –SRGS –Conversational Grammar Builder –DTMF
VoiceXML Declarative language Article – – –
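"Declarative" is easier to see with an example. The sketch below embeds a small VoiceXML 2.0 style document in a Python string and pulls out each field's prompt, just to show that the dialog is described as data (forms, fields, prompts, grammars) rather than as code. The sample dialog and the helper name `field_prompts` are assumptions of this sketch, and the document is illustrative rather than something tested against Speech Server.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

# A hypothetical one-question dialog: ask for a size, then confirm it.
VXML = """\
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="order">
    <field name="size">
      <prompt>What size would you like?</prompt>
      <grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
               root="size" mode="voice">
        <rule id="size">
          <one-of><item>small</item><item>large</item></one-of>
        </rule>
      </grammar>
      <filled>
        <prompt>You chose <value expr="size"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
"""


def field_prompts(vxml_text):
    """Map each field name to the text of its first direct prompt."""
    root = ET.fromstring(vxml_text)
    prompts = {}
    for field in root.iter("{%s}field" % VXML_NS):
        prompt = field.find("{%s}prompt" % VXML_NS)
        prompts[field.get("name")] = prompt.text if prompt is not None else None
    return prompts


if __name__ == "__main__":
    print(field_prompts(VXML))
```

The platform's interpreter, not the application, decides how to play the prompt, run the grammar and fill the field.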
SALT Yet another declarative language Multimodal support has been dropped Article – – – –
Speech Workflow Speech Sequence Workflow designer Speech activities –Statement –QuestionAnswer Debugging tools
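The two activities named above can be thought of as "say something" and "ask, listen, and re-ask until the answer fits the grammar". The Python below is a conceptual emulation of that pattern only. It is not the Speech Server API; the function names, the `say` and `listen` callbacks, the retry count and the apology text are all assumptions of this sketch.

```python
def statement(prompt, say):
    """Statement: play a prompt and expect no answer."""
    say(prompt)


def question_answer(prompt, grammar, listen, say, max_attempts=3):
    """QuestionAnswer: ask, listen, and retry until the reply is in the grammar.

    Returns the recognized answer, or None after max_attempts failures.
    """
    for _ in range(max_attempts):
        say(prompt)
        heard = listen().strip().lower()
        if heard in grammar:
            return heard
        say("Sorry, I did not understand.")
    return None


if __name__ == "__main__":
    # Simulated caller who misspeaks once, then answers correctly.
    replies = iter(["purple", "Large"])
    statement("Welcome to the pizza line.", print)
    size = question_answer("What size?", {"small", "large"},
                           lambda: next(replies), print)
    statement("You chose %s." % size, print)
```

In a workflow designer these steps would be boxes in a sequence; the retry loop is the part a QuestionAnswer-style activity saves you from writing by hand.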
Speech Workflow Demo –/speechTextAdv –/speakerVerify –/mobileRecord Article – brawn.com/speechTextAdv/ – brawn.com/speakerVerify/
Where Accessibility Telephony Telematics Home automation Mobile Devices / Tablets Gaming Warehouses …
Possible Future Telematics Service Pack for Office Support Exchange Server 2007 Speech Server 2007 release Rumors that WindowsMobile will get a public API Dictation has room to improve Hope that System.Speech will ultimately work in XBAP
Questions