Speech in .NET. Sphinx CMU, November 2002.

1 Speech in .NET
Sphinx CMU
November 2002

2 Presenter
casey chesnut
– Web Services
– Mobile / Wireless
– Speech

3 Audience
Java / C++ / VB / C# ?
VoiceXml ?
SALT / Speech.NET ?

4 Outline
MS Technologies
VoiceXml
– Demo
Speech.NET
– Demo
Future
Questions (throughout)
~25 slides

5 MS Technologies
Tools
Devices
– Phone
– Desktop PC
– Pocket PC
– Tablet PC

6 Tools
MS Agents
SAPI / Speech SDK 5.1 (.NET wrappable)
Office
AutoPC ???
ASP.NET (VoiceXml)
(beta) Speech.NET / IE Speech Add-In
… SALT Telephony gateway (early 2003)
… Pocket IE Speech Add-In (mid 2003)

7 Devices
Phone
– billions of devices, people are comfortable speaking to
Desktop PC
– large market, speech input is slower and uncomfortable
Pocket PC
– small market, opportunities for speech (device limitations)
Tablet PC
– new market, speech friendly (slate models don’t have keyboards)

8 Phone
ASP.NET w/ VoiceXml 2.0
– Production quality now
– Multiple vendor support
Speech.NET VoiceOnly
– Currently no way to deploy and test over a phone
– Speech.NET Beta 2 has telephony simulation
– MS target market for Speech.NET

9 Desktop PC
Web
– Speech.NET MultiModal Beta 2 IE Speech Add-In
– Embedded control w/SAPI
– MS Agents
Fat
– SAPI
– MS Agents

10 Pocket PC
Web
– SALT Pocket IE Speech Add-Ins (mid 2003)
Fat
– 3rd parties only
– MS Reader does not support TTS

11 Tablet PC - TODAY!
Web
– … same as desktop PC
– Beta 2 has added support for Tablet PC
– Virtual keyboard has speech control
Fat
– … same as desktop PC
– Virtual keyboard has speech control
– MS Reader should be able to support TTS
– Digital Ink is currently more compelling to MS

12 VoiceXml
XML-based language
– Declarative: XML tags, grammars
– Procedural: Javascript
Telephony Gateway is the client
– Event driven: Bargein, Goodbye
– Object oriented: Properties
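The traits listed on this slide can be seen in one small dialog. The sketch below is an illustration, not material from the talk: the form, field name and word list are made up, and the VoiceXml 2.0 document is held in a Python string only so the standard library can confirm it is well-formed and list its parts. No telephony gateway is involved.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

# Hypothetical dialog: the form, field and word list are made up for illustration.
DOCUMENT = """<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Property: lets the caller interrupt (barge in on) prompts. -->
  <property name="bargein" value="true"/>
  <form id="order">
    <field name="drink">
      <prompt>Would you like coffee or tea?</prompt>
      <!-- Declarative: an inline grammar in the W3C XML grammar format. -->
      <grammar type="application/srgs+xml" version="1.0" root="drink"
               mode="voice" xml:lang="en-US">
        <rule id="drink">
          <one-of>
            <item>coffee</item>
            <item>tea</item>
          </one-of>
        </rule>
      </grammar>
      <!-- Event driven: handlers for events thrown by the gateway. -->
      <noinput>Sorry, I did not hear you. <reprompt/></noinput>
      <nomatch>Please say coffee or tea. <reprompt/></nomatch>
      <filled>
        <!-- Procedural: cond holds a JavaScript (ECMAScript) expression. -->
        <if cond="drink == 'tea'">
          <prompt>One tea coming up.</prompt>
        <else/>
          <prompt>One coffee coming up.</prompt>
        </if>
      </filled>
    </field>
  </form>
</vxml>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize(vxml_text):
    """Parse the document and report its grammar words and event handlers."""
    root = ET.fromstring(vxml_text.encode("utf-8"))
    field = root.find("{%s}form/{%s}field" % (VXML_NS, VXML_NS))
    words = [item.text for item in field.iter("{%s}item" % VXML_NS)]
    handlers = [local_name(child.tag) for child in field
                if local_name(child.tag) in ("noinput", "nomatch", "filled")]
    return {"field": field.get("name"), "words": words, "handlers": handlers}


if __name__ == "__main__":
    print(summarize(DOCUMENT))
```

The Python part only inspects the markup. In a real deployment the gateway would fetch this document over HTTP and run the dialog itself.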

13 Usage
Input
– Speech Recognition (Command and Control)
– DTMF
– Voice recording and posting to a server
Output
– Text-To-Speech
– Prerecorded audio files
Telephony control
– Hang-up, Transfers, …
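Each usage item above corresponds to a VoiceXml 2.0 element. The following is a rough sketch under stated assumptions: the call flow, audio file name, phone number and submit URL are placeholders invented for the example, and the Python wrapper only checks that the markup parses and that each element is present.

```python
import xml.etree.ElementTree as ET

# Hypothetical call flow: the file name, phone number and submit URL are
# placeholders, not taken from the talk.
DOCUMENT = """<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Input by speech or DTMF: dtmf="true" maps the choices to keys 1, 2, 3. -->
  <menu id="main" dtmf="true">
    <prompt>
      <!-- Output: prerecorded audio, with text-to-speech as the fallback. -->
      <audio src="welcome.wav">Welcome.</audio>
      For sales, say sales or press 1.
      To leave a message, say message or press 2.
      To finish, say goodbye or press 3.
    </prompt>
    <choice next="#sales">sales</choice>
    <choice next="#message">message</choice>
    <choice next="#bye">goodbye</choice>
  </menu>
  <form id="sales">
    <!-- Telephony control: blind transfer to another number. -->
    <transfer name="call" dest="tel:+15555550100" bridge="false"/>
  </form>
  <form id="message">
    <!-- Input: record the caller, then post the recording to a server. -->
    <record name="msg" beep="true" maxtime="30s">
      <prompt>Leave your message after the beep.</prompt>
    </record>
    <block>
      <submit next="http://example.com/save" method="post"
              enctype="multipart/form-data" namelist="msg"/>
    </block>
  </form>
  <form id="bye">
    <block>
      <prompt>Goodbye.</prompt>
      <!-- Telephony control: hang up. -->
      <disconnect/>
    </block>
  </form>
</vxml>
"""

# Slide item -> VoiceXml element that covers it in this sketch.
USAGE = {
    "speech / DTMF input": "menu",
    "voice recording": "record",
    "posting to a server": "submit",
    "text-to-speech": "prompt",
    "prerecorded audio": "audio",
    "transfer": "transfer",
    "hang-up": "disconnect",
}


def elements_used(vxml_text):
    """Return the set of element names (without namespace) in the document."""
    root = ET.fromstring(vxml_text.encode("utf-8"))
    return {el.tag.rsplit("}", 1)[-1] for el in root.iter()}


def coverage(vxml_text):
    """Report, per slide item, whether its element appears in the document."""
    used = elements_used(vxml_text)
    return {item: element in used for item, element in USAGE.items()}


if __name__ == "__main__":
    for item, present in coverage(DOCUMENT).items():
        print(item, "->", USAGE[item], "present" if present else "missing")
```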

14 Architecture

15 VoiceXml DEMO
– /vxml (VS.NET)
– Mobile ADK (menu1.aspx)
– BeVocal

16 VoiceXml - SALT
VoiceXml : ??? :: SALT : Speech.NET
– Nuance has some WYSIWYG
SALT is considered lightweight compared to VoiceXml
SALT was submitted to W3C August 2002
VoiceXml is v2.0 in W3C
– Mandatory W3C grammar spec
Beta 2 Speech.NET has moved to W3C SRGS
VoiceXml has complementary specs (ccXml)
VoiceXml is moving to MultiModal as well

17 VoiceXml - SALT
VoiceXml = AT&T, Motorola, TellMe, (IBM)
SALT = MS, SpeechWorks, Intel, (BeVocal)
VoiceXml has multiple vendor support with venture capital from before the burst
Most vendors will support both specs
VoiceXml has ~ 15,000 developers
SALT has potentially millions

18 SALT
I have not read the new spec
Remember doing an in-head mapping to VoiceXml when reading an early spec
Why
– Common spec for MultiModal operation
– Multiple modes of interaction with the same syntax
– Speech enabling existing sites
Why not VoiceXml
– MultiModal retrofit harder than redo
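"Speech enabling existing sites" can be pictured as adding a few SALT tags to an ordinary HTML form. The sketch below is an assumption-laden illustration: the page, ids, grammar file name and form action are hypothetical, and the element and attribute names (prompt, listen, grammar, bind, targetelement, Start) and the namespace URI follow the SALT 1.0 spec as I recall it, so check them against the spec before relying on them. Python is used only to parse the page and show which visual field each recognition result is bound to.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SALT_NS = "http://www.saltforum.org/2002/SALT"  # as recalled from SALT 1.0

# Hypothetical page: ids, the grammar file and the form action are made up.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:salt="http://www.saltforum.org/2002/SALT">
  <body>
    <form id="travel" action="search.aspx">
      <!-- Existing visual field; a click starts recognition (MultiModal). -->
      <input type="text" name="txtCity" id="txtCity"
             onclick="recoCity.Start()"/>
    </form>
    <!-- Spoken output. -->
    <salt:prompt id="askCity">Which city are you flying to?</salt:prompt>
    <!-- Spoken input: the result is copied into the text box above. -->
    <salt:listen id="recoCity">
      <salt:grammar src="city.grxml"/>
      <salt:bind targetelement="txtCity" value="//city"/>
    </salt:listen>
  </body>
</html>
"""


def speech_bindings(page_text):
    """Map each bound HTML element to the listen object that fills it."""
    root = ET.fromstring(page_text)
    bindings = {}
    for listen in root.iter("{%s}listen" % SALT_NS):
        for bind in listen.iter("{%s}bind" % SALT_NS):
            bindings[bind.get("targetelement")] = listen.get("id")
    return bindings


def click_triggers(page_text):
    """Map each input that has an onclick handler to that handler's script."""
    root = ET.fromstring(page_text)
    return {el.get("id"): el.get("onclick")
            for el in root.iter("{%s}input" % XHTML_NS)
            if el.get("onclick")}


if __name__ == "__main__":
    print(speech_bindings(PAGE))
    print(click_triggers(PAGE))
```

The same listen object could be started from a prompt's completion handler in a VoiceOnly page, which is one reading of "multiple modes of interaction with the same syntax".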

19 Speech.NET
MS implementation of SALT (VoiceWebSolutions + DreamWeaver MX)
Some Beta 1 Speech.NET apps still work, because SALT has not changed much, but Speech.NET Beta 2 controls have
VoiceXml is not as portable between vendors as it should be; the Speech.NET controls could help mitigate this for SALT
– i.e. layer of abstraction for voice browser wars

20 Architecture

21 Code
Creating static grammars and prompts
Very little server-side code
– Only dynamic grammars / prompts
– Server-side code mods to better support speech
Mainly setting properties on Speech controls and tying to client-side javascript
Tie javascript to mouse-click events to avoid redundant code

22 Impression
Separate app layers to reduce complexity
– Voice UI will be less functional, design is key
Learning low level SALT might be easier than high level Speech.NET controls
– Application controls change this in Beta 2
Speech.NET has a great debugger (now server side too), grammar, and prompt tools
Speech Control Editor was needed for dev
IE Audio meter was needed for MultiModal
MultiModal has some time to grow

23 Speech.NET DEMO
– Speech.NET Beta 2 (VS.NET)
– /noHands (VoiceOnly web app)

24 Industry
Wrote 1st VoiceXml article a year ago
– Received 1st proposal request last month
– 1 other proposal request since then
Wrote 1st Speech.NET article 5 months ago
– Request for an article from MSDN magazine

25 Voice Recognition
PSTN is less secure than Internet!
– More accessible and easier to automate a hack
Traditionally spoken password OR DTMF pin, also #
Clients always confuse with speech recognition
Not a part of VoiceXml or SALT specs
– Telephony gateways have proprietary implementations
Not useful for identifying somebody
Useful for confirming somebody is who they say they are
Prints have to change when device changes
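The difference between confirming a claimed identity and identifying an unknown caller can be shown with a toy model. This is only a sketch: the voice prints are invented three-number vectors, cosine similarity stands in for whatever proprietary scoring a gateway uses, and the 0.9 threshold is arbitrary.

```python
import math

# Made-up enrolled voice prints; real prints are vendor specific.
ENROLLED = {
    "alice": [1.0, 0.0, 0.0],
    "bob": [0.0, 1.0, 0.0],
}


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(claimed_name, sample, threshold=0.9):
    """1:1 check: does the sample match the print of the claimed identity?"""
    enrolled = ENROLLED.get(claimed_name)
    if enrolled is None:
        return False
    return cosine(enrolled, sample) >= threshold


def identify(sample):
    """1:N search: return the closest enrolled name and its score.

    It always names somebody, even when the caller is not enrolled,
    which is one reason identification is the weaker use.
    """
    return max(((name, cosine(enrolled, sample))
                for name, enrolled in ENROLLED.items()),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    caller = [0.9, 0.1, 0.0]    # sounds like alice
    stranger = [0.5, 0.5, 0.7]  # not enrolled
    print(verify("alice", caller), verify("bob", caller))
    print(identify(stranger))
```

In this toy, a print enrolled over one device would score lower over another, which matches the note above that prints have to change when the device changes.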

26 Future (MS Speech)
SALT Telephony gateways
Speech.NET (VoiceOnly then MultiModal)
Pocket IE Speech Add-In
.NET Fat-client Speech APIs
– Desktop / Tablet / PPC
MS or 3rd party VS.NET VoiceXml controls
Possibility for Speech.NET controls to render both SALT and VoiceXml

27 Future
Lots of W3C Voice specs …
VoiceXml MultiModal browser
Auto (hands-free, navigation, radio)
3G (bridge voice and wireless web)
– offload Speech processing
– VOIP or PSTN
– Pocket PC Phone Edition / SmartPhones
IBM recently announced chip for Speech on mobile devices

28 Questions

