Reliable, Scalable and Interoperable Internet Telephony PhD thesis presentation by Kundan Singh Advisor: Henning Schulzrinne Computer Science Department, Columbia University, New York June 21, 2006
2 My research background/timeline Motorola H.323 client gateway SIP-H.323 translator SIP-RTSP voice mail SIP conferencing Libsip++ (SIP library) P2P VoIP using SIP SIP Failover/load sharing Enterprise VoIP infrastructure Interactive voice response CINEMA user interface Multimedia collaboration Mobile NAT Reliability & scalability VoIP infrastructure Conference evaluation
3 Outline of the presentation
Introduction
  What is the problem? Why important?
  My contributions
Server redundancy
  Load sharing and failover in SIP telephony
  Comparison of thread models for SIP server
Peer-to-peer (P2P)
  SIP servers using external P2P network
  Additionally, P2P maintenance using SIP
Enterprise IP telephony
  Multi-platform collaboration using SIP
  Scalable centralized conferencing architecture
  Interworking between SIP/SDP and H.323
Conclusion
36 slides
4 Telephone reliability & scalability (PSTN: Public Switched Telephone Network)
Network elements: “bearer” network, signaling network (SS7), telephone switch (SSP), signaling router (STP), database (SCP) for freephone, calling card, …
Local telephone switch (class 5 switch): 10,000 customers, 20,000 calls/hour
Regional telephone switch (class 4 switch): 100,000 customers, 150,000 calls/hour
Signaling router (STP): 1 million customers, 1.5 million calls/hour
Database (SCP): 10 million customers, 2 million lookups/hour
Switches strive for % availability. Lucent’s 5E-XC supports 4 million BHCA.
5 Internet telephony (SIP: Session Initiation Protocol)
[Figure labels: yahoo.com, example.com, DB, DNS, REGISTER, INVITE]
Can a SIP server provide carrier-grade reliability and scalability using commodity hardware?
6 What are the problems?
Can a SIP server provide carrier-grade reliability and scalability using commodity hardware? What affects server performance?
How can we build a server-less, self-organizing peer-to-peer VoIP network? Can this be done in a standards-compliant way?
Can communication be extended to multi-platform collaboration using existing protocols? How well does multi-party conferencing scale? How do we interoperate between SIP and H.323?
7 My contributions Server redundancy Implemented failover using database replication Two-stage architecture for SIP load sharing Comparison of thread models for SIP server Peer-to-peer (P2P) SIP servers using external P2P network Additionally, P2P maintenance using SIP Enterprise IP telephony Multi-platform collaboration using SIP Scalable centralized conferencing architecture Interworking between SIP/SDP and H.323 New architecture, New algorithm or approach, Implementation, Evaluation
9 Server redundancy. The problem: failure or overload
[Figure labels: REGISTER, INVITE]
10 High availability: implementation and analysis
Implementation: using MySQL replication
System reliability: individual component reliability
Call setup latency: DNS TTL, time-to-repair, retry timeout
User unavailability: retry timeout, replication interval, register refresh interval. Most registers are refreshes.
[Figure labels: proxies P1, P2; databases D1, D2 (master/slave, slave/master); DNS; caller; callee; timers TR, Tr, Tc, Td; A]
11 Scalability. Load sharing: redundant proxies and databases
REGISTER: write to D1 & D2
INVITE: read from D1 or D2
Database write/synchronization traffic becomes the bottleneck
[Figure labels: proxies P1, P2, P3; databases D1, D2; REGISTER, INVITE]
12 Scalability. Load sharing: divide the user space
Proxy and database on the same host
Stateless proxy can become overloaded
Use many
[Figure labels: P1, P2, P3; D1, D2, D3; user-space ranges a-h, i-q, r-z]
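Dividing the user space can be sketched as a lookup from the first letter of the user name to a host. This is a minimal illustration only: the pairing of D1, D2, D3 with the ranges a-h, i-q, r-z and the function name `host_for_user` are assumptions, since the slide shows the ranges but not the routing rule.

```python
# Hypothetical mapping: the slide lists the ranges a-h, i-q, r-z next to
# D1, D2, D3; pairing them in that order is an assumption.
PARTITIONS = [("a", "h", "D1"), ("i", "q", "D2"), ("r", "z", "D3")]


def host_for_user(user: str) -> str:
    """Return the host responsible for a user, based on the first letter."""
    name = user.lower()
    if name.startswith("sip:"):
        name = name[4:]  # accept "sip:alice@example.com" as well as "alice"
    first = name[:1]
    for low, high, host in PARTITIONS:
        if low <= first <= high:
            return host
    raise ValueError(f"no partition covers user {user!r}")


if __name__ == "__main__":
    for u in ("alice", "sip:kundan@example.com", "Zoe"):
        print(u, "->", host_for_user(u))
```

A fixed alphabetical split is easy to read but can be uneven in practice; a hash of the user name would spread load more evenly at the cost of readability.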
13 Scalability. Comparison of the two designs
Redundant proxies and databases (P1, P2, P3 with D1, D2): low scalability, high reliability
Divided user space (a-h, i-q, r-z): high scalability, low reliability
14 Scalability (and reliability): two-stage architecture
I stage: s1, s2, s3, with backup ex. II stage: master/slave pairs a1, a2 and b1, b2.
example.com
  _sip._udp SRV 0 40 s1.example.com
            SRV 0 40 s2.example.com
            SRV 0 20 s3.example.com
            SRV 1 0 ex.backup.com
a.example.com
  _sip._udp SRV 0 0 a1.example.com
            SRV 1 0 a2.example.com
b.example.com
  _sip._udp SRV 0 0 b1.example.com
            SRV 1 0 b2.example.com
Capacity = f(#stateless, #groups)
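How a client might use these SRV records for load sharing and failover can be sketched as follows. The sketch takes the lowest priority value first and picks within a priority in proportion to weight, which is a simplification in the spirit of DNS SRV selection rather than an exact implementation of it. The names `Srv`, `try_order` and `pick_server`, and the `is_up` callback, are hypothetical.

```python
import random
from typing import Callable, List, NamedTuple, Optional


class Srv(NamedTuple):
    priority: int
    weight: int
    target: str


# First-stage records as listed on the slide.
FIRST_STAGE = [
    Srv(0, 40, "s1.example.com"),
    Srv(0, 40, "s2.example.com"),
    Srv(0, 20, "s3.example.com"),
    Srv(1, 0, "ex.backup.com"),
]


def try_order(records: List[Srv], rng: random.Random) -> List[str]:
    """Order targets: lower priority value first, weighted-random within it."""
    order = []
    for priority in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == priority]
        while group:
            weights = [r.weight for r in group]
            if sum(weights) > 0:
                pick = rng.choices(group, weights=weights, k=1)[0]
            else:
                pick = rng.choice(group)  # all weights zero: uniform choice
            group.remove(pick)
            order.append(pick.target)
    return order


def pick_server(records: List[Srv], is_up: Callable[[str], bool],
                rng: random.Random) -> Optional[str]:
    """Failover: return the first reachable target, or None if all are down."""
    for target in try_order(records, rng):
        if is_up(target):
            return target
    return None


if __name__ == "__main__":
    rng = random.Random()
    print(try_order(FIRST_STAGE, rng))
    print(pick_server(FIRST_STAGE, lambda t: t != "s1.example.com", rng))
```

With the weights 40/40/20, s3 would be tried first in roughly one fifth of the calls, and ex.backup.com is reached only when s1, s2 and s3 all fail.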
15 Load sharing. Performance result: calls/second
Using 3+3 servers gives carrier-grade performance (10 million busy hour call attempts)
Registration test: supports 10 million subscribers
On commodity hardware: 3 GHz Pentium 4, 1 GB memory
Test with UDP, stateless, no DNS and no mempool
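To relate busy hour call attempts (BHCA) to the calls/second unit used in the measurement, a small conversion helps. It assumes the call attempts are spread evenly over the busy hour; the function names are illustrative.

```python
SECONDS_PER_HOUR = 3600


def bhca_to_calls_per_second(bhca: float) -> float:
    """Average call rate if the attempts are spread evenly over one hour."""
    return bhca / SECONDS_PER_HOUR


def calls_per_second_to_bhca(cps: float) -> float:
    """Inverse conversion, under the same even-spread assumption."""
    return cps * SECONDS_PER_HOUR


if __name__ == "__main__":
    # 10 million BHCA is the carrier-grade target on this slide;
    # 4 million BHCA is the Lucent 5E-XC figure from slide 4.
    for bhca in (10_000_000, 4_000_000):
        print(f"{bhca} BHCA = {bhca_to_calls_per_second(bhca):.1f} calls/s")
```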
16 Server performance: What happens inside a server? What thread/event models are possible?
[Figure labels, SIP server: recvfrom, parse, request, response, match transaction, stateless proxy, stateful, REGISTER (update DB), other (lookup DB), found, redirect/reject (build response), proxy (modify request, DNS), modify response, sendto. Legend: (blocking) I/O, critical section (lock), critical section (r/w lock). Web server for comparison: accept, recv, file read, send.]
1. Pure event-based (one thread)
2. Thread-per-msg or transaction
3. Pool-thread per msg (sipd)
4. Two stage thread pool
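A two-stage thread pool can be sketched with two queues: one pool parses raw messages, a second pool does the work that touches shared state. This is an illustrative toy, not the sipd design. The split of work between the stages, the text message format ("METHOD user") and the class name `TwoStageServer` are all assumptions.

```python
import queue
import threading


class TwoStageServer:
    """Toy two-stage pipeline: stage 1 parses, stage 2 updates or looks up."""

    def __init__(self, parsers: int = 2, workers: int = 4) -> None:
        self.raw = queue.Queue()        # unparsed messages (stage 1 input)
        self.parsed = queue.Queue()     # parsed messages (stage 2 input)
        self.responses = queue.Queue()  # (method, user, status) tuples
        self.contacts = {}              # stands in for the location database
        self.lock = threading.Lock()    # critical section around the database
        for target, count in ((self._stage1, parsers), (self._stage2, workers)):
            for _ in range(count):
                threading.Thread(target=target, daemon=True).start()

    def _stage1(self) -> None:
        while True:
            text = self.raw.get()
            method, user = text.split(maxsplit=1)
            self.parsed.put((method, user))
            self.raw.task_done()

    def _stage2(self) -> None:
        while True:
            method, user = self.parsed.get()
            with self.lock:
                if method == "REGISTER":
                    self.contacts[user] = True
                    status = 200
                else:
                    status = 200 if user in self.contacts else 404
            self.responses.put((method, user, status))
            self.parsed.task_done()

    def submit(self, text: str) -> None:
        self.raw.put(text)

    def drain(self) -> None:
        """Block until everything submitted so far has passed both stages."""
        self.raw.join()
        self.parsed.join()

    def collect(self) -> list:
        out = []
        while not self.responses.empty():
            out.append(self.responses.get())
        return out


if __name__ == "__main__":
    server = TwoStageServer()
    server.submit("REGISTER alice")
    server.drain()
    server.submit("INVITE alice")
    server.submit("INVITE bob")
    server.drain()
    print(sorted(server.collect()))
```

A single mutex is used here for brevity; the figure legend distinguishes a plain lock from a reader/writer lock, which would let lookups proceed in parallel.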
17 Server performance: results of my measurement
Event-based performs 30% better than the existing thread-pool architecture on a single CPU
Two-stage thread-pool architecture gives better performance on multi-CPU: 60% more on 4xPentium
Both Pentium and Sparc took 2 MHz of CPU cycles per call/s on a single CPU
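The "2 MHz per call/s" figure can be read as a linear cost model. As a back-of-the-envelope sketch, assuming the CPU is the only bottleneck and the cost stays linear up to full load (neither is claimed on the slide), the call-rate ceiling of a single CPU is its clock rate divided by that cost. The function name is illustrative.

```python
def max_call_rate(cpu_mhz: float, mhz_per_call_rate: float = 2.0) -> float:
    """Upper bound on calls/s for one CPU under a linear CPU-cost model.

    mhz_per_call_rate defaults to the measured 2 MHz per call/s; treating it
    as constant up to 100% utilisation is an assumption.
    """
    if mhz_per_call_rate <= 0:
        raise ValueError("cost per call/s must be positive")
    return cpu_mhz / mhz_per_call_rate


if __name__ == "__main__":
    # A 3 GHz CPU (the commodity hardware mentioned on slide 15) is used
    # here only as an example input.
    print(max_call_rate(3000.0), "calls/s at most")
```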
18 Problem with servers
Server-based:
  Cost: maintenance, configuration
  Central points of failure, catastrophic failures
  Controlled infrastructure (e.g., DNS)
Peer-to-peer:
  Robust: no central dependency
  Self-organizing, no configuration
  Inherent scalability
[Figure labels: clients (C) and server (S); peers (P)]
19 We built: P2P-SIP
Unlike the proprietary Skype architecture:
  Robust and efficient lookup using DHT
  Interoperability: DHT algorithm uses SIP communication
  Hybrid architecture: lookup in SIP+P2P
  Inter-domain P2P-SIP
Unlike file-sharing applications:
  Data storage, caching, delay, reliability
  Data model and service model
Disadvantages:
  Lookup delay and security
20 How to combine SIP + P2P?
SIP-using-P2P: replace the SIP location service by a P2P protocol
P2P-over-SIP: additionally, implement P2P using SIP messaging
[Figure labels, SIP-using-P2P: P2P network, Alice, INSERT, FIND, INVITE. P2P-over-SIP: P2P-SIP overlay, Alice, REGISTER, INVITE alice.]
Comparison (SIP-using-P2P / P2P SIP proxies / P2P-over-SIP):
  Maintenance: P2P / P2P / SIP
  Lookup: P2P / SIP / SIP
21 SIP-using-P2P: logical operations
Contact management: put (user id, signed contact)
Key storage: user certificates and private configurations
Presence: put (subscribee id, signed encrypted subscriber id); composition needs service model
Offline message: put (recipient, signed encrypted message)
NAT and firewall traversal: STUN and TURN server discovery needs service model
XML-based data format
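The contact-management operation, put (user id, signed contact), can be illustrated with a toy key-value store. Everything here is a stand-in: `ToyDHT` is an in-memory dictionary rather than a real DHT, and an HMAC with a shared secret replaces the certificate-based signatures mentioned in the talk, purely so the sketch runs with the standard library.

```python
import hashlib
import hmac


class ToyDHT:
    """In-memory stand-in for a DHT offering put/get on hashed keys."""

    def __init__(self) -> None:
        self.store = {}

    def put(self, key: bytes, value) -> None:
        self.store.setdefault(key, []).append(value)

    def get(self, key: bytes) -> list:
        return list(self.store.get(key, []))


def key_for(user_id: str) -> bytes:
    """Hash the user id into the key space (SHA-1 chosen for illustration)."""
    return hashlib.sha1(user_id.encode()).digest()


def sign(secret: bytes, contact: str) -> str:
    # Assumption: HMAC stands in for a real public-key signature.
    return hmac.new(secret, contact.encode(), hashlib.sha256).hexdigest()


def publish_contact(dht: ToyDHT, user_id: str, contact: str, secret: bytes) -> None:
    """put (user id, signed contact)"""
    dht.put(key_for(user_id), (contact, sign(secret, contact)))


def lookup_contacts(dht: ToyDHT, user_id: str, secret: bytes) -> list:
    """Return only the contacts whose signature verifies."""
    return [c for c, sig in dht.get(key_for(user_id))
            if hmac.compare_digest(sig, sign(secret, c))]


if __name__ == "__main__":
    dht, secret = ToyDHT(), b"alice-secret"
    publish_contact(dht, "alice@example.com", "sip:alice@192.0.2.10", secret)
    print(lookup_contacts(dht, "alice@example.com", secret))
```

Because anyone can write to a shared DHT, the reader has to verify the signature before using a contact; unsigned or forged entries are simply ignored.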
22 SIP-using-P2P Implementation in SIPc with the help of Xiaotao Wu OpenDHT Using Data model Identity protection Certificate-based SIP id == P2P for Calls, IM, presence, offline message and name lookup
23 P2P-over-SIP: architecture and implementation
DHT (Chord) algorithm using SIP messages, with query and update semantics of REGISTER
Has SIP registrar, proxy, user agent
Other: discovery, NAT traversal, failover
Adaptor: allows existing SIP devices to become P2P
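The Chord lookup underlying this design can be sketched on a small identifier ring. The sketch computes finger tables from a global view of the ring, which a real node would not have, and counts hops; in P2P-over-SIP each hop would presumably correspond to a SIP REGISTER query, but that mapping is not modelled here. The 8-bit identifier space and all names are illustrative assumptions.

```python
import bisect

M = 8            # identifier bits (illustrative; real deployments use more)
SPACE = 2 ** M


def in_half_open(x: int, a: int, b: int) -> bool:
    """True if x lies in the ring interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # interval wraps around zero (or covers the ring)


class ChordRing:
    def __init__(self, node_ids) -> None:
        self.nodes = sorted(set(n % SPACE for n in node_ids))

    def successor(self, key: int) -> int:
        """First node at or after key on the ring (the node owning key)."""
        i = bisect.bisect_left(self.nodes, key % SPACE)
        return self.nodes[i % len(self.nodes)]

    def fingers(self, node: int) -> list:
        """Finger i points at successor(node + 2^i)."""
        return [self.successor(node + 2 ** i) for i in range(M)]

    def lookup(self, start: int, key: int):
        """Route from start towards key; return (owner, hops)."""
        key %= SPACE
        node, hops = start, 0
        while not in_half_open(key, node, self.successor(node + 1)):
            for finger in reversed(self.fingers(node)):
                # closest preceding finger: strictly between node and key
                if finger != key and in_half_open(finger, node, key):
                    node = finger
                    hops += 1
                    break
            else:
                break  # defensive; finger 0 always qualifies inside the loop
        return self.successor(node + 1), hops


if __name__ == "__main__":
    ring = ChordRing([5, 20, 60, 100, 130, 180, 220, 250])
    print(ring.lookup(5, 200))
```

Each hop jumps to the farthest finger that does not overshoot the key, so the remaining distance at least halves and the hop count stays within the number of identifier bits.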
24 P2P-over-SIP analysis: scalability
Computed message load as a function of refresh rate (keep-alive, finger table, user registration), call arrival rate, churn (join, leave, failure), and scale (number of peer nodes and users)
Number of nodes = f(individual node capacity)
Measured performance: 800 register/s. Assuming a conservative capacity of 10 register/s, and an aggressive refresh and call rate of 1/min, this gives more than 16 million peers (super nodes) in the network.
25 P2P-over-SIP analysis: availability and call setup latency
To increase user availability:
  Fast failure detection: increase keep-alive rate
  Reduce unavailability: frequent registration refresh
  Replicate: user and node registrations
Call setup latency:
  Same as DHT lookup latency: O(log(N))
  Calls to known locations (“buddies”) are direct
  Chord: nodes => 6 hops
  At most a few seconds
[Figure labels: user availability and retransmission timers]
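The O(log(N)) lookup latency can be made concrete with the commonly quoted Chord estimate of about (1/2)·log2(N) hops on average. The sketch below is a rough estimate only: the per-hop delay is a made-up input, and the node counts are examples rather than the figure elided on the slide.

```python
import math


def expected_hops(num_nodes: int) -> float:
    """Average Chord lookup path length, roughly (1/2) * log2(N)."""
    if num_nodes < 1:
        raise ValueError("need at least one node")
    return 0.5 * math.log2(num_nodes)


def expected_lookup_delay(num_nodes: int, per_hop_seconds: float) -> float:
    """Lookup delay if every hop costs the same (hypothetical) delay."""
    return expected_hops(num_nodes) * per_hop_seconds


if __name__ == "__main__":
    for n in (1024, 1_000_000):
        hops = expected_hops(n)
        delay = expected_lookup_delay(n, per_hop_seconds=0.2)
        print(f"N={n}: about {hops:.1f} hops, about {delay:.1f} s at 0.2 s/hop")
```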
26 SIP-using-P2P vs P2P-over-SIP
Not SIP-specific, hence no implementation overhead for non-VoIP but P2P applications
Low transport and transaction overhead
No P2P security burden on SIP
No dependency on single DHT implementation
Reuse SIP naming, routing, security, NAT/firewall traversal
Easily reuse existing SIP components without change: voicemail, conference
Single DHT implementation
Readily supports service model
27 Server-based vs peer-to-peer
Reliability, failover latency
  Server-based: DNS-based. Depends on client retry timeout, DB replication latency, registration refresh interval.
  Peer-to-peer: DHT self-organization and periodic registration refresh. Depends on client timeout, registration refresh interval.
Scalability, number of users
  Server-based: depends on number of servers in the two stages.
  Peer-to-peer: depends on refresh rate, join/leave rate, uptime.
Call setup latency
  Server-based: one or two steps.
  Peer-to-peer: O(log(N)) steps.
Security
  Server-based: TLS, digest authentication, S/MIME.
  Peer-to-peer: additionally needs a reputation system, working around spy nodes.
Maintenance, configuration
  Server-based: administrator: DNS, database, middle-box.
  Peer-to-peer: automatic: one-time bootstrap node addresses.
PSTN interoperability
  Server-based: gateways, TRIP, ENUM.
  Peer-to-peer: interact with server-based infrastructure or co-locate peer node with the gateway.
29 Internet telephony infrastructure. CINEMA: Columbia InterNet Extensible Multimedia Architecture
CINEMA servers: sipd (proxy, redirect, registrar server); rtspd (media server); sipum (unified messaging); sipconf (conference server); siph323 (SIP-H.323 translator); VXML (vxml, cgi); SQL database; web server (web-based configuration)
[Figure labels, surrounding elements: internal telephone (Extn: 7040), 713x, SIP/PSTN gateway, department PBX, telephone switch, local/long distance PSTN, NetMeeting (H.323), Quicktime and RTSP clients (RTSP), SIP]
My work: built many components in a complete system for enterprise IP telephony and multimedia collaboration
30 My other work: communication to collaboration
Comprehensive, multi-platform collaboration using SIP
Unified messaging: the gaps among different media (audio, video, text), devices (PC, phone) and means of communication (email, SIP, IM) disappear for messaging
  Novel SIP/RTSP-based voicemail and answering machine
  SIP interface to VoiceXML browser
Centralized conferencing: audio mixing, video forwarding, IM, shared web browsing, screen sharing, web-based configuration and control, floor control
  Performance evaluation; cascaded server architecture
SIP-H.323 translation
31 Conference server: performance evaluation of audio mixer
On commodity PC:
  About 480 participants in a single conference with one active speaker
  About 80 four-party conferences, with one speaker each
Both Pentium and Sparc took 6 MHz per participant
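What a centralized audio mixer does per frame can be sketched as "mix-minus": each participant receives the sum of everyone else's samples, clipped to the 16-bit range. This is a generic illustration of audio mixing, not the sipconf implementation; the frame layout (lists of 16-bit PCM samples keyed by participant name) is an assumption.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def clip(sample: int) -> int:
    """Clamp a mixed sample to the signed 16-bit PCM range."""
    return max(INT16_MIN, min(INT16_MAX, sample))


def mix_minus(frames: dict) -> dict:
    """For each participant, mix all other participants' frames.

    frames maps a participant name to a list of PCM samples; all lists
    must have the same length (one frame per participant).
    """
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    if any(len(f) != length for f in frames.values()):
        raise ValueError("all frames must have the same length")
    # Sum once, then subtract each participant's own contribution.
    total = [sum(f[i] for f in frames.values()) for i in range(length)]
    return {who: [clip(total[i] - f[i]) for i in range(length)]
            for who, f in frames.items()}


if __name__ == "__main__":
    out = mix_minus({"a": [100, 200], "b": [1, 2], "c": [0, 0]})
    print(out)
```

Summing once and subtracting each participant's own samples keeps the work linear in the number of participants, which fits a per-participant CPU cost.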
32 Conference server: cascaded architecture
N·(N-1) participants: higher delay
N²/4 participants: lower delay
I measured the CPU usage for two cascaded servers: supports about 1000 participants
SIP REFER message is used to create cascading
34 Revisiting the problems
Can a SIP server provide carrier-grade reliability and scalability using commodity hardware? What affects server performance?
  Developed a two-stage scalable and reliable SIP server architecture: linear scaling. Use event-based.
How can we build a server-less, self-organizing peer-to-peer VoIP network? Can this be done in a standards-compliant way?
  Developed P2P-SIP architecture: SIP-using-P2P and P2P-over-SIP.
Can communication be extended to multi-platform collaboration using existing protocols? How well does multi-party conferencing scale? How do we interoperate between SIP and H.323?
  Multi-platform collaboration using existing protocols and tools, unified messaging, centralized conferencing (cascaded), SIP-H.323 interworking.
35 Conclusions
Impact:
  Commercialized by SIPquest (now FirstHand) and sold to many customers.
  CINEMA was deployed in our department for a brief period of time.
  Used in various other projects at IRT: NG911, firewall controller, presence scalability, TCP/TLS measurements, …
  P2P-SIP is a “hot” topic in industry and IETF now: client desktop, hardware phone as well as server vendors are pursuing this.
  SIP-H.323 requirements eventually became an RFC.
  Plan to open source SIPc for large-scale deployment experience of P2P-SIP.
  Started working on P2P-based self-organizing servers for 3GPP at Bell Labs.
“So what” (implications):
  Replacing PSTN: better features, quality and performance at lower cost and maintenance; zero-cost VoIP using P2P-SIP.
  Distributed, multi-provider, component architecture for telephony and collaboration.