Integrating VoiceXML with SIP services

Presentation transcript:

Integrating VoiceXML with SIP services
Kundan Singh, Ajay Nambi and Henning Schulzrinne
Columbia University
{kns10,an2029,hgs}@cs.columbia.edu

I am going to talk about some of the work in our lab on integrating VoiceXML, as used for interactive voice response systems, with IP telephony services based on the Session Initiation Protocol (SIP).

What is VoiceXML?

A language for specifying voice dialogs in interactive voice response systems
Information retrieval: news, sports, traffic, stock quotes, voice-mail
E-business: customer service, banking, stock trading
Notification service

VoiceXML is a language for specifying the interactive dialog between a telephone user and back-end applications such as tele-banking, customer support or voice-mail access.

SIP/VoiceXML @ Columbia University, Sep 2002

Traditional IVR

Receives incoming PSTN call
Responds back with prompts
Accepts user input (DTMF or speech)
Takes action based on user input
(Usually the service logic is programmed for the specific application, say a weather report.)

[Diagram: an end user on the PSTN calls 1-212-8545224 and hears "Welcome to voice mail. Press 3 to listen to new messages..." The IVR platform combines voice and telephony functions (ASR, TTS, DTMF) with application-specific service logic.]

Abbreviations: IVR, interactive voice response; ASR, automated speech recognition; TTS, text to speech; DTMF, dual-tone multi-frequency (touch tone); PSTN, public switched telephone network.

Traditionally, interactive voice response (IVR) has been provided in the telephone network (PSTN) by dedicated boxes that perform both the voice and telephony functions, such as speech recognition, text-to-speech and DTMF detection, and the service logic, such as tele-banking or customer support. In a basic call, the system plays prompts asking the user for more information and receives the user's input as touch-tone (DTMF) digits or spoken audio. The problem is that the application-specific logic is programmed into the box and is hard to change.

Decomposition

[Diagram: the IVR platform (voice and telephony functions: ASR, TTS, DTMF; application-specific service logic) is split into a voice gateway, which handles the voice and telephony functions for the PSTN end user, and a web server, which holds the service logic and also serves the Internet end user.]

One way to simplify this is to divide the IVR functions in two: a generic voice gateway handles the voice and telephony functions, and the application logic is programmed using existing web programming models such as the Common Gateway Interface (CGI) or servlets.

[Diagram: the PSTN end user reaches a voice gateway (voice and telephony functions, VoiceXML browser), which fetches VXML from the web server; the Internet end user fetches HTML from the same web server. The web server runs the service logic (CGI, servlet, JSP) and draws on scripts, a database (DB), and multimedia content (audio, grammars).]

The obvious advantage of this distributed architecture is that the application service logic can serve both the telephone user and the web user. It generates HTML pages, which are accessed by web browsers such as Internet Explorer, and VoiceXML pages, which are interpreted by the telephone browser, or VoiceXML browser, sitting on the voice gateway. The VoiceXML browser interprets the pages and conducts an interactive dialog with the telephone user, much as a web browser displays output to a PC user and takes input from HTML forms, buttons, check-boxes and so on.

Another advantage is that we can use existing web development tools, such as the various Perl and Tcl CGI libraries. For a voice application, these service logic scripts also need to access multimedia content.
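
To make the "one service logic, two kinds of browser" idea concrete, here is a minimal Python sketch. It is not taken from our test-bed: the mailbox table, the function names and the "voice"/"web" client labels are all assumptions made for illustration. The same lookup feeds either a small VoiceXML page or a small HTML page.

```python
from xml.sax.saxutils import escape

# Hypothetical stand-in for the database behind the web server.
NEW_MESSAGES = {"alice": 3, "bob": 0}


def new_message_summary(user: str) -> str:
    """Shared service logic: one sentence describing the user's mailbox."""
    count = NEW_MESSAGES.get(user, 0)
    noun = "message" if count == 1 else "messages"
    return f"You have {count} new {noun}."


def render(user: str, client: str) -> str:
    """Wrap the same sentence in VoiceXML for phones or HTML for the web."""
    text = escape(new_message_summary(user))
    if client == "voice":
        # The VoiceXML browser on the voice gateway speaks the block content.
        return f'<vxml version="1.0"><form><block>{text}</block></form></vxml>'
    if client == "web":
        # An ordinary web browser displays the paragraph.
        return f"<html><body><p>{text}</p></body></html>"
    raise ValueError(f"unknown client type: {client!r}")
```

Calling render("alice", "voice") and render("alice", "web") yields two pages that differ only in markup, which is the point of keeping the service logic on the web server.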

HTML vs VoiceXML

HTML (left):
  <form action="url">
    Enter your Id: <input name="id"> <input type="submit">
  </form>

VoiceXML (right):
  <form>
    <field name="id"> <prompt> Your ID, please. </prompt> </field>
    <block> <submit next="url"/> </block>
  </form>

Areas covered by the language: telephony, speech synthesis or audio output, user input and grammar, program flow, variables and properties, error handling, ...

This shows an example VoiceXML page on the right compared with an HTML page fragment on the left. The HTML code on the left, when interpreted by a web browser, produces an edit box with prompt text asking for an ID. When the user enters an ID and clicks the submit button, the ID is stored in the variable "id", which is then passed to the "url". Further action or output is generated by the script that the "url" points to, using input variables such as "id".

Similarly, when a VoiceXML browser interprets the page on the right, it uses text-to-speech to prompt the telephone user for an ID. When the user enters the ID using touch-tone, the digits are stored in the variable "id", which is passed to the "url" for further action. The VoiceXML language specification describes various tags and attributes for telephony functions such as call transfer, speech synthesis, user input, and so on. The specification is being developed in the W3C (World Wide Web Consortium).
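
As a rough illustration of how a browser might walk such a form, the following Python sketch parses the VoiceXML fragment with the standard xml.etree.ElementTree module. It is a simplification, not the sipvxml code: the function name walk_form and the read_input callback (standing in for text-to-speech output plus DTMF collection) are assumptions, and only the field, prompt, block and submit tags are handled.

```python
import xml.etree.ElementTree as ET

VXML_FORM = """
<form>
  <field name="id">
    <prompt>Your ID, please.</prompt>
  </field>
  <block>
    <submit next="url"/>
  </block>
</form>
"""


def walk_form(form_xml, read_input):
    """Visit form items in document order.

    read_input(prompt_text) is a hypothetical callback that would play the
    prompt and return the digits the caller entered.
    Returns (submit_target, variables).
    """
    form = ET.fromstring(form_xml.strip())
    variables = {}
    for item in form:
        if item.tag == "field":
            prompt = item.findtext("prompt", default="").strip()
            # Store the caller's input under the field's name, e.g. "id".
            variables[item.get("name")] = read_input(prompt)
        elif item.tag == "block":
            submit = item.find("submit")
            if submit is not None:
                # The collected variables would be sent to this URL.
                return submit.get("next"), variables
    return None, variables
```

With a callback that always returns "4683", walk_form(VXML_FORM, ...) returns the target "url" together with {"id": "4683"}, mirroring how the HTML form submits its "id" input.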

Further decomposition

[Diagram: PSTN end user, Internet end user, voice gateway (voice and telephony functions), VoiceXML browser, web server with service logic (CGI, servlet, JSP).]

It is possible to decompose the voice gateway functions further into two: a generic PSTN/IP gateway and a VoiceXML browser. The PSTN/IP gateway just performs interworking between a PSTN call and a Session Initiation Protocol (SIP) based IP telephony call.

Our implementation of a SipVxml browser (part of our CINEMA [1] test-bed)

[Diagram: a PSTN end user reaches SipVxml through a SIP/PSTN gateway; Internet telephony users reach it from a SIP softphone or a SIP hardware phone. SipVxml talks to a web server (HTTPd) and a media server (RTSPd).]

[1] CINEMA: Columbia InterNet Extensible Multimedia Architecture

The VoiceXML browser can then be moved completely into the IP network, so that it can be reached from a PSTN phone as well as from an IP phone. In the diagram, sipvxml represents our implementation of a SIP/VoiceXML browser. It can be used for a variety of applications, such as email access by phone, voice-mail by phone, directory services (i.e., name-to-phone mapping for a department), web browsing by phone, and so on. These applications are implemented as back-end CGI scripts or Java servlets running on the web server. Media files such as recorded voice mails can be fetched in real time from a media server such as a Real-Time Streaming Protocol (RTSP) server. (Basically, if the URL scheme is rtsp://, then RTSP is used.)

The browser accepts a SIP call, then fetches the VoiceXML pages over HTTP from the web server. It parses the XML page and interprets the various VoiceXML tags. It performs text-to-speech conversion to generate prompts as needed. It can receive user input as DTMF: we have implemented both RFC 2833 style RTP packets carrying DTMF digits (in which case the detection happens at the gateway or at the caller) and some simple DTMF detection in the server itself. It does not currently support speech recognition. The DTMF input is parsed according to the specified grammar; e.g., a ZIP code grammar may accept five digits, whereas a telephone number grammar may accept all digits up to the pound key (#). If the service logic requires fetching a media (audio) file and playing it back to the telephone user, the browser can do so.
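
The DTMF handling described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the sipvxml source: the function names and the two regular-expression "grammars" are invented for the example. The payload layout follows RFC 2833 (one byte of event code, one byte holding the end bit, a reserved bit and a 6-bit volume, then a 16-bit duration).

```python
import re
import struct

# RFC 2833 event codes 0-15 correspond to these DTMF keys.
DTMF_KEYS = "0123456789*#ABCD"


def decode_rfc2833(payload: bytes):
    """Decode one 4-byte telephone-event payload into (key, end, duration)."""
    event, flags_volume, duration = struct.unpack("!BBH", payload[:4])
    if event >= len(DTMF_KEYS):
        raise ValueError(f"not a DTMF event: {event}")
    end = bool(flags_volume & 0x80)  # E bit marks the end of the key press
    return DTMF_KEYS[event], end, duration


# Hypothetical grammars: a ZIP code is five digits, a telephone number is
# any run of digits terminated by the pound key.
GRAMMARS = {
    "zip": re.compile(r"\d{5}"),
    "phone": re.compile(r"\d+#"),
}


def collect(grammar: str, keys):
    """Feed keys one at a time; return the input once the grammar matches."""
    pattern = GRAMMARS[grammar]
    buffer = ""
    for key in keys:
        buffer += key
        if pattern.fullmatch(buffer):
            return buffer.rstrip("#")  # drop the terminating pound key
    return None  # ran out of keys before the grammar was satisfied
```

For example, collect("zip", ...) stops after the fifth digit even if the caller keeps pressing keys, while collect("phone", ...) waits for "#".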

Conferencing

[Diagram: call flow between the caller, SipVxml and the conference server.]

1. INVITE sipvxml
2. Call accepted
3. Enter your four digit PIN
4. Entered 4-6-8-3
5. Authenticate user, 4683=>Alice
6. Enter the conference identifier
7. Entered 2-3-#
8. Permission to join, 23=>meet
9. REFER meet@conference
10. Terminate the old call
11. INVITE meet@conference

Call transfer vs bridged mode

One application of SipVxml in our IP telephony test-bed lets a telephone user get authenticated and join a conference. When the caller makes a call to the browser, the call is accepted and the browser prompts for the caller's PIN (personal identification number). The PIN is mapped to the user identifier, and the browser then prompts for the conference identifier or offers this user a list of conferences to choose from. The conference identifier is checked to see whether the user is allowed to join that conference, and then the browser transfers the call to the conference server with the appropriate conference URL and, if needed, authorization credentials.

VoiceXML supports both a blind call transfer like this one and a bridged mode, in which the browser itself joins the conference server and connects the voice path without signaling this to the caller's phone. In bridged mode the browser remains in the call signaling path for the duration of the call, and may also remain in the voice path to accept further in-call DTMF commands. However, the additional state limits the scalability of the browser.
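
The decision behind steps 5 to 9 can be written down as a small Python sketch. The lookup tables and the function name are hypothetical (the real service logic lives in scripts on the web server); only the example values 4683, 23, Alice and meet@conference come from the slide.

```python
# Hypothetical tables standing in for the service logic's database.
PIN_TO_USER = {"4683": "alice"}
CONFERENCE_IDS = {"23": "meet"}
ALLOWED_USERS = {"meet": {"alice"}}
CONFERENCE_HOST = "conference"  # host part as written on the slide


def conference_transfer_target(pin: str, conference_id: str):
    """Return the SIP URI to REFER the caller to, or None if refused."""
    user = PIN_TO_USER.get(pin)  # step 5: authenticate user
    if user is None:
        return None
    room = CONFERENCE_IDS.get(conference_id)  # step 8: resolve identifier
    if room is None or user not in ALLOWED_USERS.get(room, set()):
        return None  # unknown conference or no permission to join
    return f"sip:{room}@{CONFERENCE_HOST}"  # step 9: target of the REFER
```

In the blind-transfer case the browser would place the returned URI in the REFER request and then drop out of the call; in bridged mode it would instead send the INVITE to that URI itself.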

Ease & Flexibility

The ease and flexibility of SipVXML enable us to build custom telephony applications to suit our needs, e.g., a volume check application.

[Diagram: caller, SipVxml, conference server.]

1. INVITE sipvxml
2. Menu: 1. Vol Check, 2. Mic Check
3. User enters 2
4. User speaks out a voice sample
5. Voice sample is analyzed
6. SipVXML: Vol level too high/low/...
7. User adjusts the vol level
8. User now joins conference

Another interesting application of sipvxml is balancing the volume level of conference participants. We observed that, because of the different implementations of IP phones and PC software clients, the volume of the various participants was quite unbalanced: some participants were very loud, whereas others could hardly be heard. Ideally the conference server itself should balance the audio volume levels, but that puts additional load on the server. Instead, participants can call this browser and speak into it for some time, and it tells them whether the volume is too high, too low or OK.
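
One plausible way to analyze the voice sample is to compute its RMS level and compare it against two thresholds. The Python sketch below assumes 16-bit little-endian mono PCM and uses made-up thresholds of -30 and -10 dB relative to full scale; the talk does not say which measure or thresholds our application uses.

```python
import math
import struct


def rms_dbfs(pcm: bytes) -> float:
    """RMS level of 16-bit little-endian mono PCM, in dB re full scale."""
    count = len(pcm) // 2
    if count == 0:
        return float("-inf")
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    rms = math.sqrt(sum(s * s for s in samples) / count)
    if rms == 0:
        return float("-inf")  # pure silence
    return 20 * math.log10(rms / 32768.0)


def classify(level_db: float, low: float = -30.0, high: float = -10.0) -> str:
    """Map a level to the verdict the browser would speak to the caller."""
    if level_db < low:
        return "too low"
    if level_db > high:
        return "too high"
    return "ok"
```

The verdict string would then be turned into a prompt, so the caller can adjust the microphone or phone volume and try again before joining the conference.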

More usage in the CINEMA test-bed

Unified messaging access
Email by phone
Event notification and scheduling
Audio volume level for conference
Advanced conference control

As mentioned earlier, we have used it in a variety of applications in our test-bed, including voice-mail access, email access by phone, and audio volume control. We are currently developing applications that schedule notifications to a telephone, such as a wake-up call or a meeting reminder, and applications for advanced conference control, such as floor control by the conference administrator.

Conclusions

VoiceXML is simple and exciting
Sipvxml is useful for IP telephony and regular telephony
Numerous easy to develop applications

In conclusion, VoiceXML is a simple, easy-to-use tool that can be used for many interesting telephony services. The SIP interface allows it to serve both regular telephone users and IP telephony users. We have demonstrated this in our test-bed using a software implementation of a SIP/VoiceXML browser. More information can be found at these links.

Some more information for questions: We have implemented VoiceXML 1.0. There is a 2.0 version, which we are currently working on. We have implemented only a minimal subset of VoiceXML tags, as needed for our applications. The software runs on Linux. With minor changes it could run on other Unix platforms as well, but we have not done that yet. Most of the libraries it uses, such as SIP, RTSP and RTP, run on Unix (FreeBSD, Solaris, Linux) and Windows (XP, 2000, NT). For text-to-speech we initially used IBM ViaVoice for Linux; later we switched to Flite, which is freely available. The software needs to be licensed from SIPquest. The licenses are currently suspended and will be available again soon, sometime in the summer. Anything else?

http://www.cs.columbia.edu/IRT/cinema/doc/sipvxml.html
http://www.cs.columbia.edu/IRT/cinema
http://www.w3.org/Voice/