1 Speech in .NET
Sphinx CMU
November 2002
2 Presenter
casey chesnut
brains-N-brawn.com
– Web Services
– Mobile / Wireless
– Speech
3 Audience
Java / C++ / VB / C# ?
VoiceXml ?
SALT / Speech.NET ?
4 Outline
MS Technologies
VoiceXml
– Demo
Speech.NET
– Demo
Future
Questions (throughout)
~25 slides
5 MS Technologies
Tools
Devices
– Phone
– Desktop PC
– Pocket PC
– Tablet PC
6 Tools
MS Agents
SAPI / Speech SDK 5.1 (.NET wrappable)
Office
AutoPC ???
ASP.NET (VoiceXml)
(beta) Speech.NET / IE Speech Add-In
… SALT Telephony gateway (early 2003)
… Pocket IE Speech Add-In (mid 2003)
7 Devices
Phone – billions of devices; people are comfortable speaking into them
Desktop PC – large market; speech input is slower and uncomfortable
Pocket PC – small market; opportunities for speech (device limitations)
Tablet PC – new market; speech friendly (slate models don’t have keyboards)
8 Phone
ASP.NET w/ VoiceXml 2.0
– Production quality now
– Multiple vendor support
Speech.NET VoiceOnly
– Currently no way to deploy and test over a phone
– Speech.NET Beta 2 has telephony simulation
– MS target market for Speech.NET
9 Desktop PC
Web
– Speech.NET MultiModal Beta 2 IE Speech Add-In
– Embedded control w/ SAPI
– MS Agents
Fat client
– SAPI
– MS Agents
10 Pocket PC
Web
– SALT Pocket IE Speech Add-Ins (mid 2003)
Fat client
– 3rd parties only
– MS Reader does not support TTS
11 Tablet PC - TODAY!
Web
– … same as desktop PC
– Beta 2 has added support for Tablet PC
– Virtual keyboard has speech control
Fat client
– … same as desktop PC
– Virtual keyboard has speech control
– MS Reader should be able to support TTS
– Digital Ink is currently more compelling to MS
12 VoiceXml
XML-based language
– Declarative – XML tags, grammars
– Procedural – JavaScript
Telephony Gateway is the client
– Event driven – Bargein, Goodbye
– Object oriented – Properties
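The mix described on this slide (declarative tags and grammars, script-style conditions, event handlers) can be sketched with a small VoiceXML 2.0 document. This is an illustrative sketch only: the form, the field name `drink`, the phrases, and the `order.aspx` URL are hypothetical and do not come from the talk. The Python wrapper just parses the document with the standard library and lists what the dialog declares.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

# Hypothetical dialog: field name, phrases and submit URL are made up.
VXML_DOC = """<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <field name="drink">
      <!-- Declarative: a prompt plus an inline grammar -->
      <prompt>Would you like coffee or tea?</prompt>
      <grammar type="application/srgs+xml" root="drink" version="1.0">
        <rule id="drink">
          <one-of>
            <item>coffee</item>
            <item>tea</item>
          </one-of>
        </rule>
      </grammar>
      <!-- Event driven: handlers run when the gateway raises an event -->
      <noinput>I did not hear you. <reprompt/></noinput>
      <nomatch>Please say coffee or tea. <reprompt/></nomatch>
      <!-- Procedural: the cond attribute is a script expression -->
      <filled>
        <if cond="drink == 'tea'">
          <prompt>Tea it is.</prompt>
        </if>
        <submit next="order.aspx" namelist="drink"/>
      </filled>
    </field>
  </form>
</vxml>
"""


def q(tag):
    """Qualify a tag name with the VoiceXML namespace."""
    return f"{{{VXML_NS}}}{tag}"


def summarize(doc):
    """Map each field name to its grammar phrases and event handlers."""
    root = ET.fromstring(doc)
    handler_tags = (q("noinput"), q("nomatch"), q("filled"))
    summary = {}
    for field in root.iter(q("field")):
        phrases = [item.text.strip() for item in field.iter(q("item"))]
        handlers = [c.tag.split("}", 1)[1] for c in field if c.tag in handler_tags]
        summary[field.get("name")] = {"phrases": phrases, "handlers": handlers}
    return summary


if __name__ == "__main__":
    print(summarize(VXML_DOC))
```

The document itself is what a telephony gateway would fetch and interpret; the Python code only checks that it is well formed and shows its structure.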
13 Usage
Input
– Speech Recognition (Command and Control)
– DTMF
– Voice recording and posting to a server
Output
– Text-To-Speech
– Prerecorded audio files
Telephony control
– Hang-up, Transfers, …
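As a rough illustration, each usage listed on this slide corresponds to a VoiceXML 2.0 element. The call flow below is a hypothetical voicemail menu: the file names, the `tel:` number, and `save.aspx` are placeholders, not anything shown in the talk. The Python code only verifies that the document parses and that it uses one element per usage.

```python
import xml.etree.ElementTree as ET

# Slide item -> VoiceXML 2.0 element that typically covers it.
USAGE_TO_ELEMENT = {
    "speech or DTMF input": "field",
    "voice recording": "record",
    "text-to-speech": "prompt",
    "prerecorded audio": "audio",
    "call transfer": "transfer",
    "hang-up": "disconnect",
}

# Hypothetical call flow; URLs and the phone number are placeholders.
CALL_FLOW = """<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="voicemail">
    <block>
      <!-- Output: prerecorded audio, with TTS text as the fallback -->
      <audio src="welcome.wav">Welcome.</audio>
    </block>
    <field name="choice">
      <prompt>Press 1 to leave a message, or 2 for an operator.</prompt>
      <grammar mode="dtmf" type="application/srgs+xml" root="digit" version="1.0">
        <rule id="digit">
          <one-of><item>1</item><item>2</item></one-of>
        </rule>
      </grammar>
    </field>
    <record name="msg" beep="true" maxtime="30s" cond="choice == '1'">
      <prompt>Record your message after the beep.</prompt>
    </record>
    <transfer name="operator" dest="tel:+15555550100" bridge="true"
              cond="choice == '2'"/>
    <block cond="choice == '1'">
      <!-- Posting the recording to a server -->
      <submit next="save.aspx" method="post"
              enctype="multipart/form-data" namelist="msg"/>
    </block>
    <block>
      <disconnect/>
    </block>
  </form>
</vxml>
"""


def elements_used(doc):
    """Return the set of element names (without namespace) in the document."""
    root = ET.fromstring(doc)
    return {el.tag.split("}", 1)[1] for el in root.iter()}


if __name__ == "__main__":
    used = elements_used(CALL_FLOW)
    for usage, element in USAGE_TO_ELEMENT.items():
        print(f"{usage:22} <{element}>  present: {element in used}")
```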
14 Architecture
15 VoiceXml DEMO
– /vxml (VS.NET)
– Mobile ADK (menu1.aspx)
– BeVocal
16 VoiceXml - SALT
VoiceXml : ??? :: SALT : Speech.NET
– Nuance has some WYSIWYG
SALT is considered lightweight compared to VoiceXml
SALT was submitted to W3C August 2002
VoiceXml is v2.0 in W3C
– Mandatory W3C grammar spec
Beta 2 Speech.NET has moved to W3C SRGS
VoiceXml has complementary specs (ccXml)
VoiceXml is moving to MultiModal as well
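For readers who have not seen the W3C grammar format mentioned above, here is a small SRGS grammar in its XML form, wrapped in Python that lists the phrases it accepts. The grammar (rule names `command`, `action`, `target` and their words) is a made-up example. The expander is a minimal sketch that assumes only `rule`, `item`, `one-of`, local `ruleref`, and `repeat="0-1"`; it ignores weights, tags, wider repeat ranges, and external references.

```python
import itertools
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"

# Hypothetical command-and-control grammar in SRGS XML form.
GRAMMAR = """<?xml version="1.0"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" mode="voice" root="command">
  <rule id="command" scope="public">
    <item repeat="0-1">please</item>
    <ruleref uri="#action"/>
    <ruleref uri="#target"/>
  </rule>
  <rule id="action">
    <one-of><item>open</item><item>close</item></one-of>
  </rule>
  <rule id="target">
    <one-of><item>mail</item><item>calendar</item></one-of>
  </rule>
</grammar>
"""


def local(el):
    """Element name without its namespace."""
    return el.tag.split("}", 1)[1]


def expand(el, rules):
    """Return every word sequence (as a tuple) that `el` can match."""
    kind = local(el)
    if kind == "ruleref":
        return expand(rules[el.get("uri").lstrip("#")], rules)
    if kind == "one-of":
        return [seq for item in el for seq in expand(item, rules)]
    # A rule or item is a sequence of literal text and child expansions.
    parts = []
    if el.text and el.text.strip():
        parts.append([tuple(el.text.split())])
    for child in el:
        parts.append(expand(child, rules))
        if child.tail and child.tail.strip():
            parts.append([tuple(child.tail.split())])
    seqs = [sum(combo, ()) for combo in itertools.product(*parts)]
    if kind == "item" and el.get("repeat") == "0-1":
        seqs = [()] + seqs  # the optional item may also be skipped
    return seqs


def sentences(grammar_xml):
    """List the phrases accepted by the grammar's root rule."""
    root = ET.fromstring(grammar_xml)
    rules = {r.get("id"): r for r in root.findall(f"{{{SRGS_NS}}}rule")}
    return [" ".join(seq) for seq in expand(rules[root.get("root")], rules)]


if __name__ == "__main__":
    for phrase in sentences(GRAMMAR):
        print(phrase)
```

Because both VoiceXml 2.0 and Speech.NET Beta 2 accept this format, a grammar written once like this could in principle be shared between the two.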
17 VoiceXml - SALT
VoiceXml = AT&T, Motorola, TellMe, (IBM)
SALT = MS, SpeechWorks, Intel, (BeVocal)
VoiceXml has multiple vendor support with venture capital from before the bubble burst
Most vendors will support both specs
VoiceXml has ~15,000 developers
SALT has potentially millions
18 SALT
I have not read the new spec
I remember mentally mapping it to VoiceXml when reading an early spec
Why
– Common spec for MultiModal operation
– Multiple modes of interaction with the same syntax
– Speech enabling existing sites
Why not VoiceXml
– MultiModal retrofit harder than redo
19 Speech.NET
MS implementation of SALT (VoiceWebSolutions + DreamWeaver MX)
Some Beta 1 Speech.NET apps still work, because SALT has not changed much, but Speech.NET Beta 2 controls have
VoiceXml is not as portable between vendors as it should be; the Speech.NET controls could help mitigate this for SALT
– i.e. a layer of abstraction for the voice browser wars
20 Architecture
21 Code
Creating static grammars and prompts
Very little server-side code
– Only dynamic grammars / prompts
– Server-side code mods to better support speech
Mainly setting properties on Speech controls and tying them to client-side JavaScript
Tie JavaScript to mouse-click events to avoid redundant code
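The click-to-talk pattern from the last bullet can be sketched at the SALT markup level. This is raw SALT embedded in a page, not the Speech.NET server controls, whose exact markup is not shown in these slides. The element ids, `city.grxml`, and the `//city` binding path are hypothetical, and the namespace URI follows the SALT 1.0 specification examples, so treat it as an assumption to verify. The Python code parses the page and checks that each click handler starts a `listen` whose `bind` targets a real element.

```python
import re
import xml.etree.ElementTree as ET

SALT_NS = "http://www.saltforum.org/2002/SALT"  # assumed, per SALT 1.0 examples

# Hypothetical multimodal page: ids, grammar file and XPath are made up.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:salt="http://www.saltforum.org/2002/SALT">
  <body>
    <!-- The existing mouse-click event also starts recognition -->
    <input id="txtCity" type="text" onclick="listenCity.Start()" />
    <salt:listen id="listenCity">
      <salt:grammar src="city.grxml" />
      <!-- Copy the recognized value into the same text box -->
      <salt:bind targetelement="txtCity" value="//city" />
    </salt:listen>
  </body>
</html>
"""


def wiring(page):
    """Return (clicked id, listen id, bind target id) for each valid link."""
    root = ET.fromstring(page)
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    listens = {el.get("id"): el for el in root.iter(f"{{{SALT_NS}}}listen")}
    links = []
    for el in root.iter():
        match = re.match(r"(\w+)\.Start\(\)", el.get("onclick", ""))
        if not match or match.group(1) not in listens:
            continue
        listen = listens[match.group(1)]
        for bind in listen.iter(f"{{{SALT_NS}}}bind"):
            target = bind.get("targetelement")
            if target in ids:
                links.append((el.get("id"), match.group(1), target))
    return links


if __name__ == "__main__":
    print(wiring(PAGE))
```

Reusing the click event means the same field works for mouse, keyboard, and speech, which is the redundancy the slide is trying to avoid.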
22 Impression
Separate app layers to reduce complexity
– Voice UI will be less functional; design is key
Learning low-level SALT might be easier than high-level Speech.NET controls
Application controls change this in Beta 2
Speech.NET has a great debugger (now server side too), grammar, and prompt tools
Speech Control Editor was needed for dev
IE Audio meter was needed for MultiModal
MultiModal has some time to grow
23 Speech.NET DEMO
– Speech.NET Beta 2 (VS.NET)
– /noHands (VoiceOnly web app)
24 Industry
Wrote 1st VoiceXml article a year ago
– Received 1st proposal request last month
– 1 other proposal request since then
Wrote 1st Speech.NET article 5 months ago
– Request for an article from MSDN magazine
25 Voice Recognition
PSTN is less secure than Internet!
– More accessible and easier to automate a hack
Traditionally spoken password OR DTMF pin, also #
Clients always confuse it with speech recognition
Not a part of VoiceXml or SALT specs
– Telephony gateways have proprietary implementations
Not useful for identifying somebody
Useful for confirming somebody is who they say they are
Voice prints have to change when the device changes
26 Future (MS Speech)
SALT Telephony gateways
Speech.NET (VoiceOnly then MultiModal)
Pocket IE Speech Add-In
.NET Fat-client Speech APIs
– Desktop / Tablet / PPC
MS or 3rd party VS.NET VoiceXml controls
Possibility for Speech.NET controls to render both SALT and VoiceXml
27 Future
Lots of W3C Voice specs …
VoiceXml MultiModal browser
Auto (hands-free, navigation, radio)
3G (bridge voice and wireless web)
– offload Speech processing
– VOIP or PSTN
– Pocket PC Phone Edition / SmartPhones
IBM recently announced a chip for Speech on mobile devices
28 Questions