1
Integrating VoiceXML with SIP services
Kundan Singh, Ajay Nambi and Henning Schulzrinne
Columbia University

I am going to talk about some of the work in our lab on integrating VoiceXML, which is used for interactive voice response systems, with IP telephony services based on the Session Initiation Protocol (SIP).
2
SIP/VoiceXML @ Columbia University
What is VoiceXML?
A language for specifying voice dialogs in interactive voice response systems
Information retrieval: news, sports, traffic, stock quotes, voice mail
E-business: customer service, banking, stock trading
Notification service

VoiceXML is a language for specifying the interactive dialog between a telephone user and back-end applications such as tele-banking, customer support, or voice mail access.

Columbia University, Sep 2002
3
Traditional IVR
Receives incoming PSTN[5] call
Responds back with prompts
Accepts user input (DTMF or speech)
Takes action based on user input (usually the service logic is programmed for the specific application, say a weather report)

Diagram: end user, PSTN, and an IVR[1] platform that combines voice and telephony functions (ASR[2], TTS[3], DTMF[4]) with application-specific service logic. Example prompt: "Welcome to voice mail. Press 3 to listen to new messages..."

Traditionally, interactive voice response (IVR) has been provided in the telephone network (PSTN) by dedicated boxes that perform both voice telephony functions, such as speech recognition, text-to-speech and DTMF detection, and service logic, such as tele-banking or customer support. In the basic call interaction, the system plays prompts that ask the user for more information and receives input from the user as touch-tone (DTMF) or spoken audio. The problem is that the application-specific logic is programmed into the box and is hard to change.

[1] Interactive voice response
[2] Automated speech recognition
[3] Text to speech
[4] Dual tone multi-frequency (touch tone)
[5] Public switched telephone network
4
Decomposition
Before: IVR platform with voice and telephony functions (ASR, TTS, DTMF) and service logic (application specific)
After: PSTN end user, voice gateway (voice and telephony functions), web server (service logic), Internet end user

One way to simplify this is to divide the IVR functions into two: a generic voice gateway handles the voice and telephony functions, and the application logic is programmed using existing web service programming models such as the Common Gateway Interface (CGI) or servlets.
5
Diagram: a PSTN end user reaches the voice gateway (voice and telephony functions, VoiceXML browser), which fetches VXML from the web server; an Internet end user fetches HTML from the same web server. The web server holds the service logic (CGI, servlet, JSP) and uses a DB, multimedia, audio/grammar files, and scripts.

The obvious advantage of this distributed architecture is that the application service logic can serve both the telephone user and the web user. It generates HTML pages, which are accessed by web browsers such as Internet Explorer, and VoiceXML pages, which are interpreted by the telephone browser (the VoiceXML browser) sitting on the voice gateway. The VoiceXML browser interprets the pages and conducts an interactive dialog with the telephone user, similar to how a web browser displays output to a PC user and takes input from HTML forms, buttons, check-boxes and so on. Another advantage is that we can use existing web development tools such as the various Perl and Tcl CGI libraries. For a voice application, these service logic scripts also need to access multimedia content.
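As an illustration of one piece of service logic serving both kinds of client, here is a minimal Python sketch. It is not the code used in the test-bed: the function names, the `client` parameter, and the `/login` URL are hypothetical, and a real script would decide which rendering to send based on the request it receives.

```python
from html import escape
from xml.sax.saxutils import quoteattr


def render_html(prompt: str, field: str, submit_url: str) -> str:
    """Render the question as an HTML form for a web browser."""
    return (
        f"<form action={quoteattr(submit_url)}>"
        f"{escape(prompt)} <input name={quoteattr(field)}>"
        '<input type="submit"></form>'
    )


def render_vxml(prompt: str, field: str, submit_url: str) -> str:
    """Render the same question as a VoiceXML form for a voice browser."""
    return (
        '<vxml version="1.0"><form>'
        f"<field name={quoteattr(field)}>"
        f"<prompt>{escape(prompt)}</prompt></field>"
        f"<block><submit next={quoteattr(submit_url)}/></block>"
        "</form></vxml>"
    )


def render(prompt: str, field: str, submit_url: str, client: str) -> str:
    """Choose the rendering; 'voice' and 'web' are hypothetical client labels."""
    if client == "voice":
        return render_vxml(prompt, field, submit_url)
    return render_html(prompt, field, submit_url)


if __name__ == "__main__":
    print(render("Your ID, please.", "id", "/login", client="voice"))
    print(render("Your ID, please.", "id", "/login", client="web"))
```

The point of the sketch is only that the prompt, the field name, and the submit URL are defined once and rendered twice.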
6
HTML vs VoiceXML

HTML (left):

    <form action="url">
      Enter your Id:
      <input name='id'>
      <input type='submit'>
    </form>

VoiceXML (right):

    <form>
      <field name='id'>
        <prompt> Your ID, please. </prompt>
      </field>
      <block>
        <submit next="url"/>
      </block>
    </form>

VoiceXML tag categories: telephony, speech synthesis or audio output, user input and grammar, program flow, variables and properties, error handling, …

This shows an example VoiceXML page on the right compared with an HTML page fragment on the left. The HTML code on the left, when interpreted by a web browser, generates an edit box with a text prompt asking for an ID. When the user enters the ID and clicks the submit button, the ID is stored in the variable "id", which is then passed to the "url". Further action or output is generated by the scripts that the "url" points to, using input variables such as "id".

Similarly, on the right, when a VoiceXML browser interprets the page, it uses text-to-speech to prompt the telephone user for an ID. When the user enters the ID using touch-tone, the digits are stored in the variable "id", which is passed to the "url" for further action. The VoiceXML language specification describes various tags and attributes for telephony functions such as call transfer, speech synthesis, user input, and so on. The specification is being developed in the W3C (World Wide Web Consortium).
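To make the interpretation of the VoiceXML form concrete, here is a minimal Python sketch of how a browser might walk it: speak the prompt, collect the input into the field variable, then submit the variables to the target. It is an illustration only, not the SipVxml implementation; `interpret_form`, `speak`, and `collect` are hypothetical names, and it handles just the three tags used in the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# The VoiceXML form from the slide, with straight quotes.
VXML_FORM = """
<form>
  <field name="id">
    <prompt> Your ID, please. </prompt>
  </field>
  <block>
    <submit next="url"/>
  </block>
</form>
"""


def interpret_form(vxml_text, speak, collect):
    """Walk the form items in document order.

    speak(text) stands in for text-to-speech output;
    collect() stands in for DTMF input and returns the entered digits.
    Returns the URL to fetch next, or None if the form never submits.
    """
    form = ET.fromstring(vxml_text)
    variables = {}
    for item in form:
        if item.tag == "field":
            prompt = item.find("prompt")
            if prompt is not None and prompt.text:
                speak(prompt.text.strip())
            # Store the caller's input under the field's name.
            variables[item.get("name")] = collect()
        elif item.tag == "block":
            submit = item.find("submit")
            if submit is not None:
                # Pass the collected variables to the target as a query string.
                return submit.get("next") + "?" + urlencode(variables)
    return None


if __name__ == "__main__":
    print(interpret_form(VXML_FORM, print, lambda: "1234"))
```

Here the submit is sketched as an HTTP GET with a query string; the slide does not say which method is used.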
7
Further decomposition
Diagram: PSTN end user, Internet end user, voice gateway (voice and telephony functions), VoiceXML browser, web server (service logic: CGI, servlet, JSP).

It is possible to further decompose the voice gateway functions into two: a generic PSTN/IP gateway and a VoiceXML browser. The PSTN/IP gateway only performs interworking between a PSTN call and a Session Initiation Protocol (SIP) based IP telephony call.
8
Our implementation of a SipVxml browser (part of our CINEMA[1] test-bed)
Diagram: a PSTN end user reaches the Internet through a SIP/PSTN gateway; Internet telephony users have a SIP softphone or a SIP hardware phone; all of them call SipVxml, which uses a web server (HTTPd) and a media server (RTSPd).

The VoiceXML browser can then be moved completely into the IP network, so that it can be accessed from a PSTN phone as well as from an IP phone. In the diagram, sipvxml represents our implementation of a SIP/VoiceXML browser. It can be used for a variety of example applications such as email access by phone, voice mail by phone, directory services (i.e., name-to-phone mapping for a department), web browsing by phone, and so on. These applications are implemented as back-end CGI scripts or Java servlets running on the web server. Media files such as recorded voice mails can be accessed in real time from a media server such as a Real-Time Streaming Protocol (RTSP) server. (Basically, if the URL is rtspd:// then RTSP is used.)

The browser accepts a SIP call and then fetches the VoiceXML pages over HTTP from the web server. It parses the XML page and interprets the VoiceXML tags. It performs text-to-speech conversion to generate prompts as needed. It can receive user input as DTMF. We support RFC 2833 style RTP packets carrying DTMF digits (in which case the detection happens at the gateway or at the caller's end) as well as simple DTMF detection in the server itself. It does not currently support speech recognition. The DTMF input is parsed according to the specified grammar; for example, a ZIP code grammar may accept 5 digits, whereas a telephone number grammar may accept all digits up to the pound key (#). If the service logic requires fetching a media (audio) file and playing it back to the telephone user, the browser can do so.

[1] CINEMA - Columbia InterNet Extensible Multimedia Architecture
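The DTMF handling described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the SipVxml code. The 4-byte payload layout (event, end bit, reserved bit, volume, duration) and the event codes 0 to 15 follow RFC 2833. The function names, the `(rtp_timestamp, payload)` input shape, and the two-parameter grammar (`max_digits`, `terminator`) are hypothetical simplifications.

```python
import struct

# RFC 2833 event codes 0-15 map to these DTMF keys.
DTMF_KEYS = "0123456789*#ABCD"


def decode_telephone_event(payload: bytes):
    """Decode an RFC 2833 payload into (key, end_bit, duration).

    Byte 0 is the event code, byte 1 holds E (1 bit), R (1 bit) and
    volume (6 bits), bytes 2-3 hold the duration in timestamp units.
    Returns None for events that are not DTMF keys.
    """
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    if event >= len(DTMF_KEYS):
        return None
    return DTMF_KEYS[event], bool(flags & 0x80), duration


def keys_from_events(packets):
    """Turn (rtp_timestamp, payload) pairs into a list of pressed keys.

    A key is reported once, when its end bit is first seen; retransmitted
    end packets share the RTP timestamp and are ignored.
    """
    keys = []
    last_timestamp = None
    for timestamp, payload in packets:
        decoded = decode_telephone_event(payload)
        if decoded is None:
            continue
        key, end, _duration = decoded
        if end and timestamp != last_timestamp:
            keys.append(key)
            last_timestamp = timestamp
    return keys


def collect(keys, max_digits=None, terminator=None):
    """Apply a simple DTMF grammar to a key sequence.

    Stops after max_digits digits or at the terminator key, whichever is
    configured. Returns None if the input ends before the grammar is met.
    """
    digits = ""
    for key in keys:
        if terminator is not None and key == terminator:
            return digits
        if key.isdigit():
            digits += key
        if max_digits is not None and len(digits) == max_digits:
            return digits
    return None


if __name__ == "__main__":
    zip_code = collect(list("10027"), max_digits=5)        # ZIP code grammar
    phone = collect(list("2125551234#"), terminator="#")   # digits up to '#'
    print(zip_code, phone)
```

Keeping the grammar separate from the packet decoding means the same `collect` step works whether the digits come from RFC 2833 packets or from in-band detection in the server.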
9
Conferencing
Participants: Caller, SipVxml, Conference server
1. INVITE sipvxml
2. Call accepted
3. Enter your four digit PIN
4. Entered
5. Authenticate user, 4683=>Alice
6. Enter the conference identifier
7. Entered 2-3-#
8. Permission to join, 23=>meet
9. REFER
10. Terminate the old call
11. INVITE (caller to conference server)
Call transfer vs bridged mode

One application of SipVxml in our IP telephony test-bed allows a telephone user to be authenticated and join a conference. When the caller calls the browser, the call is accepted and the browser prompts for the caller's PIN (personal identification number). The PIN is mapped to the user identifier, and the browser then prompts for the conference identifier or offers a list of conferences for this user to choose from. The conference identifier is checked to see whether the user is allowed to join this conference, and the browser then transfers the call to the conference server with the appropriate conference URL and, if needed, authorization credentials.

VoiceXML supports both a blind call transfer like this one and a bridged mode, in which the browser itself joins the conference server and connects the voice path without signaling this to the caller's phone. In bridged mode the browser remains in the call state path for the duration of the call, and may also remain in the voice path to accept further in-call DTMF commands. However, the additional state limits the scalability of the browser.
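The authorization steps of this call flow (PIN to user, conference identifier to conference, then the transfer target) can be sketched in Python as below. Only the two mappings 4683 => Alice and 23 => meet come from the slide. The membership table, the domain `conference.example.com`, and the function name `refer_target` are hypothetical, and a real deployment would look these up in the back-end scripts rather than in literals.

```python
# From the slide: PIN 4683 identifies Alice, conference id 23 is "meet".
PIN_TO_USER = {"4683": "Alice"}
CONFERENCE_IDS = {"23": "meet"}

# Hypothetical data: who may join which conference, and where it is hosted.
CONFERENCE_MEMBERS = {"meet": {"Alice"}}
CONFERENCE_DOMAIN = "conference.example.com"


def refer_target(pin: str, conference_digits: str):
    """Return the SIP URI to place in the REFER, or None if access is denied.

    pin and conference_digits are the DTMF digits the caller entered,
    without the trailing '#'.
    """
    user = PIN_TO_USER.get(pin)
    if user is None:
        return None  # unknown PIN: authentication fails
    conference = CONFERENCE_IDS.get(conference_digits)
    if conference is None:
        return None  # no such conference
    if user not in CONFERENCE_MEMBERS.get(conference, set()):
        return None  # user is not allowed to join this conference
    return f"sip:{conference}@{CONFERENCE_DOMAIN}"


if __name__ == "__main__":
    print(refer_target("4683", "23"))
```

With a blind transfer, the browser's work ends once this URI has been sent in the REFER and the old call is terminated, which is why it keeps no state for the rest of the conference.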
10
Ease & Flexibility
The ease and flexibility of SipVxml enable us to build custom telephony applications to suit our needs, e.g., a volume check application.
Participants: Caller, SipVxml, Conference server
1. INVITE sipvxml
2. Menu: 1. Vol Check 2. Mic Check
3. User enters 2
4. User speaks out a voice sample
5. Voice sample is analyzed
6. SipVxml: Vol level too high/low/…
7. User adjusts the vol level.
8. User now joins conference.

Another interesting application of SipVxml is balancing the volume levels of conference participants. We observed that, because of the different implementations of IP phones and PC software clients, the volume of the various participants was quite unbalanced. Some participants were very loud, whereas others could hardly be heard. Ideally the conference server itself should balance the audio volume levels, but that puts additional load on the server. Instead, participants can call this browser and speak into it for some time, and it tells them whether the volume is too high, too low, or OK.
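The slide does not say how the voice sample is analyzed. One simple possibility, shown here purely as an assumption, is to compute the RMS level of the recorded audio and compare it against two thresholds. The sketch assumes 16-bit signed little-endian mono PCM; the threshold values and function names are hypothetical and would need tuning against real phones.

```python
import math
import struct


def rms_level(pcm: bytes) -> float:
    """RMS of 16-bit signed little-endian mono PCM, as a fraction of full scale."""
    count = len(pcm) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    mean_square = sum(s * s for s in samples) / count
    return math.sqrt(mean_square) / 32768.0


def classify_volume(pcm: bytes, low: float = 0.05, high: float = 0.5) -> str:
    """Map the RMS level to the verdict played back to the caller.

    low and high are hypothetical thresholds, not measured values.
    """
    level = rms_level(pcm)
    if level < low:
        return "too low"
    if level > high:
        return "too high"
    return "ok"


if __name__ == "__main__":
    # A synthetic 440 Hz tone at 30% of full scale, 8 kHz sampling, 0.1 s.
    tone = b"".join(
        struct.pack("<h", int(0.3 * 32767 * math.sin(2 * math.pi * 440 * n / 8000)))
        for n in range(800)
    )
    print(classify_volume(tone))
```

Running the check in the browser rather than in the conference server keeps the per-call analysis off the server, which matches the motivation given above.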
11
More usage in the CINEMA test-bed
Unified messaging access by phone
Event notification and scheduling
Audio volume level for conference
Advanced conference control

As mentioned earlier, we have used it in a variety of applications in our test-bed, including voice mail access, email access by phone, and audio volume control. We are currently developing applications to schedule notifications to a telephone, such as a wake-up call or a meeting reminder, and advanced conference control, such as floor control by the conference administrator.
12
Conclusions
VoiceXML is simple and exciting
Sipvxml is useful for IP telephony and regular telephony
Numerous easy to develop applications

In conclusion, VoiceXML is a simple, easy-to-use tool that can be used for many interesting telephony services. The SIP interface allows it to be used by regular telephone users as well as IP telephony users. We have demonstrated this in our test-bed using a software implementation of a SIP/VoiceXML browser. More information can be found at these links.

Some more information for questions:
We have implemented VoiceXML 1.0. There is a 2.0 version, which we are currently working on.
We have implemented only the minimum subset of VoiceXML tags needed for our applications.
The software runs on Linux. With minor changes it could run on other Unix platforms as well, but we have not done that yet. Most of the libraries it uses, such as SIP, RTSP and RTP, run on Unix (FreeBSD, Solaris, Linux) and Windows (XP, 2000, NT).
For text-to-speech we initially used IBM ViaVoice for Linux. Later we changed to Flite, which is freely available.
The software needs to be licensed from SIPquest. Licensing is currently suspended and should be available again sometime soon, in the summer.
Anything else?