Reliable and Scalable Internet Telephony Kundan Singh and Henning Schulzrinne Internet Real Time Lab – Internal Talk Sept 24, 2004
2 Telephone reliability (PSTN: Public Switched Telephone Network)
  “Bearer” network: telephone switches (SSP)
  Signaling network (SS7): signaling routers (STP); databases (SCP) for freephone, calling card, …
  Capacity per element
    Local telephone switch (class 5 switch): 10,000 customers, 20,000 calls/hour
    Regional telephone switch (class 4 switch): 100,000 customers, 150,000 calls/hour
    Signaling router (STP): 1 million customers, 1.5 million calls/hour
    Database (SCP): 10 million customers, 2 million lookups/hour
3 Internet telephony (SIP: Session Initiation Protocol)
  Figure labels: yahoo.com, example.com, DB, DNS, REGISTER, INVITE
4 SIP network architecture
  Scalability requirement depends on role
  Figure labels: carrier network, ISP, cybercafe, IP network, PSTN, GW, MG, SIP/PSTN, SIP/MGC, PBX, IP phones, PSTN phones, T1 PRI/BRI
5 Reliability and scalability for call routing, registration, conferencing, voicemail
  Requirements
    Reliable: Mean Time Between Failures (MTBF), Mean Time To Recover (MTTR)
    Scalable: registration rate, call rate, #requests/s
  Proposed solutions
    Server redundancy
      Apply existing web-redundancy designs
      Evaluate quantitatively (future work)
    Peer-to-peer
      Novel P2P-SIP architecture
      Evaluate quantitatively (future work)
6 Server redundancy
  The problem: failure or overload
  Figure labels: REGISTER, INVITE
7 Server redundancy
  Replicate registration or search on call
  Figure labels: REGISTER, INVITE
8 Server redundancy: known techniques
  Client-based
    Cisco phones: primary and backup proxy
  DNS NAPTR, SRV
  IP address takeover
  Database redundancy
  ...
9 High availability: failover in CINEMA
  Figure: two hosts, phone.cs.columbia.edu and sip2.cs.columbia.edu, each running a proxy (P1, P2), a database (D1 master/slave, D2 slave/master) and web scripts, with replication between the databases
  Client REGISTER configuration: proxy1 = phone.cs, backup = sip2.cs
  DNS: _sip._udp SRV phone.cs.columbia.edu, SRV sip2.cs.columbia.edu
10 High availability: time to recover
  Client re-sends INVITE to P2
    Immediately on ICMP error
    Or after 10 s otherwise
  sipd has in-memory cache
  Refresh registration well before expiry
  Registrations are additive
  Measurement of recovery time
  Optimal #servers
11 Scalability: load sharing with redundant proxies and databases
  REGISTER: write to D1 and D2
  INVITE: read from D1 or D2
  Database write/synchronization traffic becomes the bottleneck
  Figure labels: P1, P2, P3; D1, D2; REGISTER, INVITE
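The load-sharing rule on this slide (write every REGISTER to all databases, read an INVITE lookup from any one) can be sketched roughly as follows. The class and method names are hypothetical, and in-memory dictionaries stand in for the real databases; this is an illustration, not the CINEMA implementation.

```python
import random


class ReplicatedRegistrar:
    """Hypothetical sketch: REGISTER writes to every DB, INVITE reads from one."""

    def __init__(self, num_databases):
        # One dict per database replica (stand-in for D1, D2, ...).
        self.databases = [dict() for _ in range(num_databases)]
        self.db_writes = 0  # total individual database writes performed

    def register(self, user, contact):
        # A REGISTER must be written to all replicas.
        for db in self.databases:
            db[user] = contact
            self.db_writes += 1

    def lookup(self, user, rng=random):
        # An INVITE lookup can be served by any single replica.
        return rng.choice(self.databases).get(user)
```

With D replicas, each REGISTER costs D writes, which is why write/synchronization traffic becomes the bottleneck as D grows.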
12 Scalability: load sharing by dividing the user space
  Proxy and database on the same host
  Stateless proxy can become overloaded
  Hashing: static vs dynamic
  Figure labels: P1, P2, P3; D1, D2, D3; user ranges a-h, i-q, r-z
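A minimal sketch of how a front-end proxy might divide the user space. The static variant follows the a-h / i-q / r-z ranges from the figure; assigning those ranges to D1, D2 and D3 in that order is an assumption, as are the function names. The hash-based variant is one possible alternative, not something the slide specifies.

```python
import hashlib

# Assumed assignment of the figure's ranges to databases.
RANGES = [("a", "h", "D1"), ("i", "q", "D2"), ("r", "z", "D3")]


def static_partition(user):
    """Pick a database from the first letter of the user name."""
    first = user[0].lower()
    for low, high, database in RANGES:
        if low <= first <= high:
            return database
    raise ValueError("no range covers user %r" % user)


def hashed_partition(user, databases=("D1", "D2", "D3")):
    """Alternative: spread users by hashing the name (illustrative only)."""
    digest = hashlib.sha1(user.lower().encode("utf-8")).digest()
    return databases[int.from_bytes(digest, "big") % len(databases)]
```

A static table is simple but can load servers unevenly if names cluster; hashing evens out the load at the cost of moving users when the number of databases changes.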
13 Scalability: comparison of the two designs
  Total time per DB
    Redundant databases (every write goes to all DBs): ((tr/D) + 1) TN = (A/D) + B
    Divided user space: ((tr + 1)/D) TN = (A/D) + (B/D)
    where A = trTN (read time) and B = TN (write time)
  D = number of database servers
  N = number of writes (REGISTER)
  r = #reads/#writes = (INV+REG)/REG ~ 2
  T = write latency
  t = read latency/write latency
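As a worked example of the two formulas, the sketch below evaluates the total time per DB for both designs. The function and parameter names are made up for illustration, and the numbers in the usage note are arbitrary, not measurements.

```python
def time_replicated(num_dbs, num_writes, reads_per_write,
                    write_latency, relative_read_latency):
    """((t*r/D) + 1) * T * N: reads are shared, every write hits all DBs."""
    t, r, D = relative_read_latency, reads_per_write, num_dbs
    return (t * r / D + 1) * write_latency * num_writes


def time_partitioned(num_dbs, num_writes, reads_per_write,
                     write_latency, relative_read_latency):
    """((t*r + 1)/D) * T * N: both reads and writes are shared."""
    t, r, D = relative_read_latency, reads_per_write, num_dbs
    return ((t * r + 1) / D) * write_latency * num_writes
```

For example, with D = 2, N = 1000, r = 2, T = 10 ms and t = 0.5 (hypothetical values), the replicated design needs 15,000 ms per DB and the divided design 10,000 ms. Only the divided design keeps improving on the write term as D grows.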
14 Reliability and scalability: two-stage architecture for CINEMA
  Figure labels: s1, s2, s3; master/slave pairs a1/a2 and b1/b2
  DNS records
    example.com
      _sip._udp SRV 0 0 s1.example.com
                SRV 0 0 s2.example.com
                SRV 0 0 s3.example.com
                SRV 1 0 ex.backup.com
    a.example.com
      _sip._udp SRV 0 0 a1.example.com
                SRV 1 0 a2.example.com
    b.example.com
      _sip._udp SRV 0 0 b1.example.com
                SRV 1 0 b2.example.com
  Request-rate = f(#stateless, #groups)
  Bottleneck: CPU, memory, bandwidth?
  Failover latency: ?
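The SRV records above encode both load sharing (equal priority) and failover (higher priority number used only as backup). A rough sketch of how a client could turn such records into a try-order follows. The function name is hypothetical, and the weight handling is simplified: because all weights on the slide are 0, targets of equal priority are simply shuffled rather than selected by the full weighted procedure of RFC 2782.

```python
import random


def failover_order(records, rng=random):
    """Order SRV targets: lowest priority value first, shuffled within a priority.

    records: iterable of (priority, weight, target) tuples.
    Simplification: weights are ignored (all are 0 in the slide's example).
    """
    by_priority = {}
    for priority, _weight, target in records:
        by_priority.setdefault(priority, []).append(target)
    ordered = []
    for priority in sorted(by_priority):
        group = list(by_priority[priority])
        rng.shuffle(group)  # spread load across equal-priority servers
        ordered.extend(group)
    return ordered


EXAMPLE_COM = [
    (0, 0, "s1.example.com"),
    (0, 0, "s2.example.com"),
    (0, 0, "s3.example.com"),
    (1, 0, "ex.backup.com"),
]
```

Calling `failover_order(EXAMPLE_COM)` returns s1, s2 and s3 in some random order followed by ex.backup.com, so the backup is contacted only after all first-stage servers have failed.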
15 Server-based vs peer-to-peer
  Server-based
    Cost: maintenance, configuration
    Central points of failure
    Controlled infrastructure (e.g., DNS)
  Peer-to-peer
    Robust: no central dependency
    Self-organizing, no configuration
    Scalability?
  Figure labels: C, S (server-based); P (peer-to-peer)
16 Related work: Skype
  From the KaZaA community
  Host cache of some super nodes
  Bootstrap IP addresses
  Auto-detect NAT/firewall settings
  STUN and TURN
  Protocol among super nodes: ??
  Allows searching a user (e.g., kun*)
  History of known buddies
  All communication is encrypted
  Promote to super node
    Based on availability, capacity
  Conferencing
17 We propose: P2P-SIP
  Unlike server-based SIP architecture
  Unlike proprietary Skype architecture
    Robust and efficient lookup using DHT
    Interoperability
      DHT algorithm uses SIP communication
    Hybrid architecture
      Lookup in SIP+P2P
  Unlike file-sharing applications
    Data storage, caching, delay, reliability
  Disadvantages
    Lookup delay and security
18 P2P-SIP background: DHT (Chord)
  Identifier circle
  Keys assigned to successor
  Evenly distributed keys and nodes
  Finger table: log N entries
    The i-th finger points to the first node that succeeds n by at least 2^(i-1)
  Stabilization for join/leave
  Figure: finger table for node 8 (columns: Key, Node)
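The finger-table rule can be made concrete with a short sketch. The ring size (6 bits) and the set of node identifiers are assumptions chosen for illustration, picked so that node 8 matches the node named in the figure; the function names are hypothetical.

```python
from bisect import bisect_left


def successor(nodes, key, m):
    """First node whose identifier is >= key on a 2^m identifier circle."""
    ring = sorted(nodes)
    index = bisect_left(ring, key % (2 ** m))
    return ring[index % len(ring)]  # wrap around past the largest node


def finger_table(n, nodes, m):
    """List of (key, node) pairs: finger i covers key n + 2^(i-1), i = 1..m."""
    table = []
    for i in range(1, m + 1):
        key = (n + 2 ** (i - 1)) % (2 ** m)
        table.append((key, successor(nodes, key, m)))
    return table


# Assumed example ring with 6-bit identifiers.
NODES = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
```

`finger_table(8, NODES, 6)` yields the keys 9, 10, 12, 16, 24 and 40, mapped to nodes 14, 14, 14, 21, 32 and 42. Each lookup hop at least halves the remaining distance on the circle, which is where the O(log N) lookup cost comes from.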
19 P2P-SIP design alternatives
  Use DHT in server farm
  Use DHT for all clients; but some are resource limited
  Use DHT among super-nodes
    1. Hierarchy
    2. Dynamically adapt
  Figure labels: servers, clients; node identifiers 65a1fc, d13da3, d4213f, d462ba, d467c4, d471f1; Route(d46a1c)
20 P2P-SIP node architecture: registrar, proxy, user agent
  DHT communication using SIP REGISTER
    Known node:
    Unknown node:
    User:
  Components: user interface (buddy list, etc.), SIP, ICE, RTP/RTCP, codecs, audio devices, DHT (Chord), discover, user location
  Figure labels: on startup; multicast REG; peer found / detect NAT; REG; REG, INVITE, MESSAGE; signup, find buddies; join, find, leave; on reset; signout, transfer; IM, call
21 P2P-SIP node startup
  SIP
    REGISTER with SIP registrar
  DHT
    Discover peers: multicast REGISTER
    Join DHT using node-key=Hash(ip)
    REGISTER with DHT using user-
  Dialing out
    Call, instant message, etc.: INVITE, MESSAGE
    Last seen, SIP NAPTR/SRV, DHT
  Figure labels: columbia.edu, sipd, DB, detect peers, REGISTER, REGISTER alice=42, REGISTER bob=12
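A rough sketch of how node and user keys could be derived and matched. The slide only says Hash(ip); using SHA-1 with a 160-bit identifier space (as in Chord) is an assumption, and so are the function names and the addresses in the usage note. Small keys such as alice=42 on the slide are illustrative and will not come out of a real hash.

```python
import hashlib


def chord_key(value, m=160):
    """Map a string (IP address or user identifier) to an m-bit key."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


def responsible_node(user, node_ips, m=160):
    """Return the IP of the node whose key is the successor of the user's key."""
    user_key = chord_key(user, m)
    nodes = sorted((chord_key(ip, m), ip) for ip in node_ips)
    for node_key, ip in nodes:
        if node_key >= user_key:
            return ip
    return nodes[0][1]  # wrap around the identifier circle
```

For instance, `responsible_node("alice@example.com", ["192.0.2.1", "192.0.2.2", "192.0.2.3"])` picks the peer that would store alice's registration in this scheme.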
22 P2P-SIP node leaves
  Graceful leave
    Un-REGISTER
    Transfer registrations
  Failure
    Attached nodes detect and re-REGISTER
    New REGISTER goes to new super-nodes
    Super-nodes adjust DHT accordingly
  Figure labels: DHT, REGISTER key=42, OPTIONS, 42, REGISTER
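The difference between a graceful leave and a failure can be illustrated with a toy ring. This is a simplified model with hypothetical names and a 6-bit identifier space, not the sippeer code: registrations live in per-node dictionaries, and there are no SIP messages or timers.

```python
from bisect import bisect_left


class ToyRing:
    """Toy DHT: each node key owns the registrations whose key it succeeds."""

    def __init__(self, node_keys, m=6):
        self.m = m
        self.store = {key: {} for key in node_keys}

    def responsible(self, key):
        ring = sorted(self.store)
        index = bisect_left(ring, key % (2 ** self.m))
        return ring[index % len(ring)]

    def register(self, user_key, contact):
        self.store[self.responsible(user_key)][user_key] = contact

    def lookup(self, user_key):
        return self.store[self.responsible(user_key)].get(user_key)

    def graceful_leave(self, node_key):
        # Leaving node hands its registrations to the new responsible nodes.
        records = self.store.pop(node_key)
        for user_key, contact in records.items():
            self.register(user_key, contact)

    def fail(self, node_key):
        # Crash: registrations are lost until clients re-REGISTER.
        del self.store[node_key]
```

After `graceful_leave`, lookups keep working because the records were transferred. After `fail`, a lookup returns nothing until the attached user re-registers, which is why the refresh interval bounds the recovery time.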
23 P2P-SIP implementation
  sippeer: C++, Unix (Linux), Chord
  Nodes join and form the DHT
  Node failure is detected and the DHT is updated
  Registrations are transferred on node shutdown
  Co-located sipc can use the sippeer service
24 P2P-SIP evaluation
  #super-nodes needed depends on
    Registration refresh rate, replication
    Join/leave rate, uptime
    Call arrival rate
    CPU, memory, bandwidth limits
  Other metrics
    Call setup latency
    Recovery time after super-node failure
25 P2P-SIP advanced services and open issues
  Offline messages
    INVITE or MESSAGE fails => responsible node stores voicemail, instant message
  Conferencing
    Mixer, full mesh, multicast
  Open issues
    P2P reputation system
    Motivation to become super node
    Security (SPAM, DOS, spy, …)
    ...
26 Server-based vs peer-to-peer
  Reliability, failover latency
    Server-based: DNS-based. Depends on client retry timeout, DB replication latency, registration refresh interval.
    Peer-to-peer: DHT self-organization and periodic registration refresh. Depends on client timeout, registration refresh interval.
  Scalability, number of users
    Server-based: Depends on number of servers in the two stages.
    Peer-to-peer: Depends on refresh rate, join/leave rate, uptime.
  Call setup latency
    Server-based: One or two steps.
    Peer-to-peer: O(log(N)) steps.
  Security
    Server-based: TLS, digest authentication, S/MIME.
    Peer-to-peer: Additionally needs a reputation system, working around spy nodes.
  Maintenance, configuration
    Server-based: Administrator: DNS, database, middle-box.
    Peer-to-peer: Automatic: one-time bootstrap node addresses.
  PSTN interoperability
    Server-based: Gateways, TRIP, ENUM.
    Peer-to-peer: Interact with server-based infrastructure or co-locate peer node with the gateway.
27 Summary
  Motivation
    PSTN is reliable and scalable
    Can IP telephony do better?
  Server-based
    DNS, stateless, DB replication, two stage
  Peer-to-peer
    SIP, DHT, soft state, self organizing
28 Beyond proxy/registrar
  CINEMA: Columbia InterNet Extensible Multimedia Architecture
  CINEMA servers
    sipd: proxy, redirect, registrar server
    sipconf: conference server
    sipum: unified messaging
    siph323: SIP-H.323 translator
    rtspd: media server
  Other figure labels: SIP/PSTN gateway, department PBX, telephone switch, local/long distance PSTN, internal telephone (Extn: 7040), 713x, web server, web-based configuration, SQL database, NetMeeting (H.323), Quicktime RTSP clients (RTSP), SIP, VXML, vxml, cgi
29 Communication to collaboration
  Synchronous (tightly coupled): video conference, IM, screen sharing, …
  Asynchronous (loosely coupled): file sharing, message board, …
  Messaging and notifications
  Personalized view: per-user calendar, access control, address book
  Goal: provide personalized access, alternate between synchronous and asynchronous communication, and access from different devices and clients.