Peer-to-peer systems for autonomic VoIP and web hotspot handling


1 Peer-to-peer systems for autonomic VoIP and web hotspot handling
Kundan Singh, Weibin Zhao and Henning Schulzrinne
Internet Real Time Laboratory, Computer Science Dept., Columbia University, New York

2 P2P for autonomic computing
Autonomic at the application layer: robust against partial network faults; resources grow as the user population grows; self-configuring.
Traditional P2P systems do file storage; the motivation is often legal rather than technical efficiency; usually unstructured, optimized for Zipf-like popularity.
Other P2P applications:
- Skype demonstrates usefulness for VoIP: identifier lookup, NAT traversal, media traversal
- OpenDHT (and similar) as an emerging common infrastructure?
- non-DHT systems with smaller scope → web hotspot rescue
- network management (see our IRTF slides)
IBM Delhi (Jan. 2006)

3 Aside: middle services instead of middleware
Common and successful network services:
- identifier lookup: ARP, DNS
- network storage: proprietary (Yahoo, .mac, …)
- storage + computation: CDNs
Emerging network services:
- peer-to-peer identifier lookup
- network storage
- network computation ("utility"), maybe programmable; already found as web hosting and grid computing
IBM Delhi (Jan. 2006)

4 What is P2P? Share the resources of individual peers
CPU, disk, bandwidth, information, …
[Figure, taxonomy: computer systems are centralized (mainframes, workstations) or distributed; distributed systems are client-server (flat: DNS, mount, RPC, HTTP; or hierarchical) or peer-to-peer (pure: Gnutella, Chord; or hybrid: Napster, Groove, Kazaa). Legend: C = client, S = server, P = peer.]
Application categories: file sharing (Napster, Gnutella, Kazaa, Freenet, Overnet), communication and collaboration (Magi, Groove, Skype), and distributed computing.
Speaker notes: another dimension is the infrastructure component, which defines the routing mechanism (Chord, CAN, etc.). Groove is a proprietary P2P system for office collaboration; it is unclear how users are detected, perhaps they are statically added by peers; every peer in the group sees the same user space, synchronized across all peers, and all communications are signed. Magi's goal was to be standards-based (HTTP, XML, WebDAV), providing document sharing and communication on the Internet; it uses dynamic DNS to register Magi nodes/instances, and all communications are signed.
IBM Delhi (Jan. 2006)

5 Distributed Hash Table (DHT)
Types of search:
- central index (Napster)
- distributed index with flooding (Gnutella)
- distributed index with hashing (Chord, Bamboo, …)
Basic operations: find(key), insert(key, value), delete(key), but no search(*).
Properties by organization (reconstructed from the slide's table; cells marked "-" were not recoverable):

  Organization                        Search time/messages   Join/leave messages
  Every peer has the complete table   O(1)                   -
  Chord                               O(log N)               O((log N)^2)
  Every peer holds one key/value      O(N)                   -

IBM Delhi (Jan. 2006)
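The three operations above are the entire interface. A minimal single-process sketch (Python; all names are illustrative, and a dict stands in for a table that is really spread across peers) makes the contrast with keyword search concrete:

```python
import hashlib

def dht_key(name: str, bits: int = 32) -> int:
    """Hash an application-level name onto the identifier space."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)

class ToyDHT:
    def __init__(self):
        self.table = {}                      # key -> values; really distributed

    def insert(self, key: int, value):
        self.table.setdefault(key, []).append(value)

    def find(self, key: int):
        return self.table.get(key, [])

    def delete(self, key: int):
        self.table.pop(key, None)

# Exact-key operations only; note there is no search("alice*"):
d = ToyDHT()
d.insert(dht_key("alice@example.com"), "alice@192.0.2.10")
print(d.find(dht_key("alice@example.com")))
```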

6 CAN (Content Addressable Network)
Divide a d-dimensional coordinate space into zones; each node is responsible for all the keys in its zone, and each key maps to one point in the space.
[Figure: a 2-D unit square, (0,0) to (1,1), partitioned into zones owned by nodes A, B, C, D, E.]
IBM Delhi (Jan. 2006)

7 CAN
[Figure: node Z joins a 2-D CAN; node X locates the point (x,y) = (0.3, 0.1) and the zone containing it splits.]
Join: node Z picks a point (x,y), asks an existing node X to route to it, and splits the zone that contains it; when joining, use the neighbor that is least loaded.
Fault tolerance: know your neighbors' neighbors; when a node fails, one of its neighbors takes over its zone. If adjacent nodes fail at the same time, use flooding to discover the topology.
Node removal: use heartbeats to recover the structure and repair routing (background zone reassignment); use an expanding-ring search to rebuild neighbor information before repairing simultaneous failures.
State = 2d; search cost = d × N^(1/d).
IBM Delhi (Jan. 2006)
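A small sketch of CAN's key-to-point mapping (Python; the helper names are mine, not from the CAN paper): each key hashes to a point in the unit d-cube, and the node whose zone contains the point owns the key.

```python
import hashlib

def can_point(key: str, d: int = 2):
    """Map a key to a point in [0,1)^d using d independent hashes."""
    point = []
    for i in range(d):
        h = hashlib.sha1(f"{i}:{key}".encode()).digest()
        point.append(int.from_bytes(h[:4], "big") / 2**32)
    return tuple(point)

def owns(zone, point):
    """zone = ((lo0, hi0), (lo1, hi1), ...); True if point falls inside."""
    return all(lo <= x < hi for (lo, hi), x in zip(zone, point))

# A zone like the one Z splits off in the slide (coordinates illustrative):
zone_z = ((0.25, 0.5), (0.0, 0.25))
print(owns(zone_z, can_point("some-key")))
```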

8 Chord
Identifier circle; keys are assigned to their successor; keys and nodes are evenly distributed.
[Figure: identifier circle with nodes 1, 8, 14, 21, 32, 38, 42, 47, 54, 58; keys 10, 24, 30, 38 map to their successor nodes.]
Lookup is based on a skip-list-like structure: search in O(log N), state O(log N).
IBM Delhi (Jan. 2006)

9 Chord
Finger table: log N entries; the ith finger points to the first node that succeeds n by at least 2^(i-1). Stabilization after join/leave.
Finger table of node 8:
  8+1  = 9   → 14
  8+2  = 10  → 14
  8+4  = 12  → 14
  8+8  = 16  → 21
  8+16 = 24  → 32
  8+32 = 40  → 42
Lookup: consult the finger table for the furthest node that precedes the key.
In a system with N nodes and K keys, with high probability: each node receives at most K/N keys; each node maintains information about O(log N) other nodes; lookups are resolved with O(log N) hops.
No network locality; replicas need explicit consistency.
Optimizations: weight neighbor nodes by RTT (when routing, choose the neighbor that is closer to the destination with the lowest RTT from me) to reduce path latency; run multiple virtual nodes per physical node. What if a node leaves?
[Figure: the same identifier circle as the previous slide.]
IBM Delhi (Jan. 2006)
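The finger table above can be reproduced with a few lines of simulation (Python; the node set is read off the slide's circle, everything else is a toy, not a distributed implementation):

```python
M = 6                                     # 6-bit ring: 2^6 = 64 identifiers
NODES = sorted([1, 8, 14, 21, 32, 38, 42, 47, 54, 58])

def successor(ident):
    """First node clockwise from ident on the ring."""
    for n in NODES:
        if n >= ident:
            return n
    return NODES[0]                       # wrap around past the top

def finger_table(n):
    """ith finger: first node succeeding n by at least 2^(i-1)."""
    return [successor((n + 2**(i - 1)) % 2**M) for i in range(1, M + 1)]

print(finger_table(8))                    # -> [14, 14, 14, 21, 32, 42]
```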

10 Tapestry
IDs use base B = 2^b. Route to the node numerically closest to the given key. The routing table has O(B) entries per level, one level per digit of the node ID. Similar to CIDR, but suffix-based.
Example: routing to key **4, then *64, then 364; each hop fixes one more suffix digit (levels N = 0, 1, 2).
[Figure: mesh of nodes 427, 763, 364, 123, 324, 365, 135, 564; routing-table columns of the form ??0 … ??6 at level 0, ?04 … ?64 at level 1, 064 … 664 at level 2.]
IBM Delhi (Jan. 2006)

11 Pastry
Prefix-based: route to a node whose ID shares a prefix with the key at least one digit longer than this node's shared prefix. Each node keeps a neighbor set, a leaf set and a routing table.
Routing: if the key falls in the leaf-set range |L|, deliver to the numerically closest node ID. Otherwise find a node ID in the routing table whose shared prefix with the key is longer than this node's; failing that, use a node in |L| with the same prefix length but numerically closer to the key.
Fault tolerance: the leaf set guards against concurrent failures, so delivery is guaranteed unless more than |L|/2 nodes with adjacent IDs fail. A neighbor set |M| of physically close nodes is also maintained.
[Figure: Route(d46a1c) from node 65a1fc via d13da3, d4213f and d462ba; nodes d467c4 and d471f1 are numerically nearby.]
IBM Delhi (Jan. 2006)
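A sketch of the per-hop rule (Python; the routing-table contents are illustrative, chosen to reproduce the slide's route): each hop must extend the shared prefix with the key by at least one digit.

```python
def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(node_id: str, key: str, routing_table: dict) -> str:
    """routing_table maps (prefix length, next digit) -> node ID."""
    p = shared_prefix_len(node_id, key)
    return routing_table.get((p, key[p]), node_id)   # else fall back to leaf set

# Reproduces the slide's hops: d13da3 -> d4213f -> d462ba -> d46a1c
table = {(1, "4"): "d4213f", (2, "6"): "d462ba", (3, "a"): "d46a1c"}
print(next_hop("d13da3", "d46a1c", table))           # -> d4213f
```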

12 Other schemes
Distributed trie, Viceroy, Kademlia, SkipGraph, Symphony, …
Distributed trie: no leave procedure, only initial join, insert and lookup. Each trie node has 2^m tables of l entries; each entry holds a peer address a and a timestamp t. A leaf entry means peer a held the value at time t; an entry in the ith table means a held that child node at t. If a peer holds a node, it must hold all its ancestors. Attack-resistant, since no fixed node is responsible for a key; good for frequently accessed keys; with stale views it degenerates toward broadcast.
Viceroy: based on a butterfly network; binary search tree at each node; each node sits at one of log N levels.
Kademlia: XOR distance, which is symmetric; an interval of nodes instead of a single node per level; lookup paths converge to the same nodes, so caching is possible.
SkipNet/SkipGraph: exploits locality, since it sorts by keys rather than numeric IDs; bad when node failure probability is high, since searches may fail on node failures.
Symphony: similar to Chord, but uses a range of nodes and randomization.
IBM Delhi (Jan. 2006)

13 DHT comparison
(Reconstructed from the slide's table; cells marked "-" were not recoverable.)

  Property     Unstructured           CAN           Chord          Tapestry/Pastry   Viceroy
  Routing      O(N) or no guarantee   d × N^(1/d)   O(log N)       O(log_B N)        O(log N)
  State        constant               2d            O(log N)       B × log_B N       constant
  Join/leave   -                      2d            O((log N)^2)   -                 -

Reliability and fault resilience:
- Unstructured: data at multiple locations; retry on failure; finding popular content is efficient.
- CAN: multiple peers for each data item; retry on failure; multiple paths to the destination.
- Chord: replicate data on consecutive peers; retry on failure.
- Tapestry/Pastry: replicate data on multiple peers; keep multiple paths to each peer.
- Viceroy: routing load is evenly distributed among participating lookup servers.
IBM Delhi (Jan. 2006)

14 Server-based vs peer-to-peer
- Reliability, failover latency. Server-based: DNS-based; depends on client retry timeout, DB replication latency, registration refresh interval. P2P: DHT self-organization and periodic registration refresh; depends on client timeout and refresh interval.
- Scalability, number of users. Server-based: depends on the number of servers in the two stages. P2P: depends on refresh rate, join/leave rate, uptime.
- Call setup latency. Server-based: one or two steps. P2P: O(log N) steps.
- Security. Server-based: TLS, digest authentication, S/MIME. P2P: additionally needs a reputation system and defenses against spy nodes.
- Maintenance, configuration. Server-based: an administrator manages DNS, database, middle-boxes. P2P: automatic, given one-time bootstrap node addresses.
- PSTN interoperability. Server-based: gateways, TRIP, ENUM. P2P: interact with server-based infrastructure, or co-locate a peer node with the gateway.
IBM Delhi (Jan. 2006)

15 The basic SIP service
HTTP retrieves a resource identified by a URI. SIP, by contrast, translates an address-of-record (AOR) SIP URI to one or more contacts (hosts or other AORs): a single user maps to multiple hosts, e.g., home, office, mobile, secretary, which can be treated as equal or tried sequentially.
Thus SIP is (also) a binding protocol, similar in spirit to mobile IP, except at the application layer and without some of the related issues.
This function is performed by the SIP proxy for the AOR's domain and delegated logically to a location server; it is this function that is being replaced by P2P approaches.
IBM Delhi (Jan. 2006)
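A toy sketch of this binding service (Python; URIs and q-values are illustrative): REGISTER adds a contact for an AOR, and lookup returns the ordered contact list.

```python
location = {}                                # AOR -> list of (q, contact)

def register(aor, contact, q=1.0):
    """Bind a contact to an address-of-record with preference q."""
    location.setdefault(aor, []).append((q, contact))
    location[aor].sort(reverse=True)         # highest q-value tried first

def lookup(aor):
    return [c for _, c in location.get(aor, [])]

register("sip:alice@example.com", "sip:alice@office.example.com", q=1.0)
register("sip:alice@example.com", "sip:alice@mobile.example.com", q=0.5)
print(lookup("sip:alice@example.com"))       # office first, then mobile
```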

16 What is SIP? Why P2P-SIP?
[Figure, client-server SIP: (1) Alice's host REGISTERs with the columbia.edu server; (2) Bob's host sends INVITE; (3) the server answers with Contact: Alice's host.]
Problem with client-server: maintenance, configuration, controlled infrastructure.
[Figure, peer-to-peer: (1) Alice REGISTERs into the overlay; (2) INVITE alice is routed through the peers; (3) the call reaches Alice. No central server, but more lookup latency.]
Replace the central server with a P2P network. What we gain: reliability, scalability. What we lose: bounds on INVITE latency, and the search feature.
IBM Delhi (Jan. 2006)

17 How to combine SIP + P2P?
Two options:
- SIP-using-P2P: replace the SIP location service by a P2P protocol (P2P lookup, SIP proxies).
- P2P-over-SIP: additionally, implement the P2P maintenance itself using SIP messaging.
SIP-using-P2P: reuses an optimized, well-defined external P2P network; defines a P2P location-service interface for SIP; extends to other signaling protocols (H.323); does not overload SIP or REGISTER; lookup is separate from call signaling. (Message flow: REGISTER → INSERT, INVITE alice → FIND.)
P2P-over-SIP: no change in SIP semantics; no dependence on an external P2P network; reuses existing features such as forking (for voice mail); built-in NAT/media relays; but additional message overhead due to SIP.
IBM Delhi (Jan. 2006)
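In SIP-using-P2P the mapping is direct: REGISTER becomes an overlay insert (put) and the lookup before INVITE becomes a find (get). A minimal sketch, assuming a toy in-memory stand-in for an OpenDHT-style service (the real service's API differs):

```python
import hashlib

class Dht:                                   # placeholder for the overlay
    def __init__(self):
        self.data = {}
    def put(self, key, value, ttl):
        self.data.setdefault(key, []).append(value)
    def get(self, key):
        return self.data.get(key, [])

def key_for(aor: str) -> bytes:
    return hashlib.sha1(aor.encode()).digest()

def on_register(dht, aor, contact, ttl=3600):
    dht.put(key_for(aor), contact, ttl)      # SIP REGISTER becomes a put

def on_invite(dht, aor):
    return dht.get(key_for(aor))             # proxy lookup becomes a get

dht = Dht()
on_register(dht, "sip:alice@example.com", "sip:alice@192.0.2.10")
print(on_invite(dht, "sip:alice@example.com"))
```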

18 Design alternatives
[Figure: a Chord ring formed among servers with clients hanging off it, and a Pastry-style overlay routing Route(d46a1c); both illustrate where the DHT can sit.]
- Use a DHT within a server farm
- Use a DHT across all clients, though some are resource-limited
- Use a DHT among super-nodes
- Hierarchy
- Dynamically adapt
IBM Delhi (Jan. 2006)

19 Deployment scenarios
There are three components in a client-server SIP architecture: user agents, proxies and databases. A P2P network can be formed at any of these levels, with trade-offs in ease of deployment, ease of integration with existing SIP clients and proxies, and reusability with other protocols and applications. Different scenarios have different trust models:
- P2P clients: plug and play; may use adaptors; untrusted peers.
- P2P proxies: zero-configuration server farm; trusted servers and user identities.
- P2P database: global, e.g., OpenDHT; clients or proxies can use it; trusted deployed peers.
Interoperate among these!
IBM Delhi (Jan. 2006)

20 Hybrid architecture
To honor administrative boundaries and allow incremental deployment, multiple P2P networks must interoperate with each other and with the client-server SIP architecture. Two options:
- Cross-register user registrations from one P2P network to the other. This scales poorly with many networks, since every network must store all registrations from the others.
- Locate the destination user in the other network during call setup. The global lookup of the domain can be done via DNS, or via a P2P-SIP hierarchy (e.g., a global P2P-SIP network for domain lookups); lookup latency is still O(log N).
IBM Delhi (Jan. 2006)

21 What else can be P2P?
- Rendezvous/signaling (SIP)
- Configuration storage: user profiles, encrypted passwords/keys
- Media storage (e.g., voice mail): P2P storage with redundancy/replication and message-waiting notifications
- Identity assertion (?): P2P node and user identity verification instead of relying on central CAs; this in turn helps detect and identify malicious nodes
- PSTN gateway (?): locate the best-cost PSTN gateway for a particular number via the P2P network
- NAT/media relay: find the best media relay, at call setup and during the call (if the old relay node leaves)
Trust models differ for the different components!
IBM Delhi (Jan. 2006)

22 What is our P2P-SIP?
Unlike server-based SIP architecture and the proprietary Skype architecture: robust and efficient lookup using a DHT; interoperability (the DHT algorithm uses SIP communication); hybrid architecture; lookup in SIP+P2P.
Unlike file-sharing applications: data storage, caching, delay and reliability matter.
Disadvantages: lookup delay and security.
IBM Delhi (Jan. 2006)

23 Implementation: SIPpeer
Platform: Unix (Linux), C++.
Modes:
- Chord: uses SIP for P2P maintenance (join, leave, failure, lookup; ordinary node vs super-node; node naming via URIs; authentication; REGISTER as the maintenance message).
- OpenDHT: uses external P2P data storage based on the Bamboo DHT running on PlanetLab nodes (connects to one or more OpenDHT servers, refreshing dynamically if a server leaves; signs and verifies contacts).
Scenarios: P2P client; P2P proxies; adaptor for existing phones (Cisco, X-Lite, Windows Messenger, SIPc); server farm.
IBM Delhi (Jan. 2006)

24 P2P-SIP: identifier lookup
P2P serves as the SIP location server: address-of-record → contacts (for instance, an AOR such as alice@example.com mapping to contacts on her current hosts). Entries are multi-valued, (key, value1), (key, value2), with limited TTL.
Variant: point to a SIP proxy server instead, operated either by a supernode or by a traditional server. This allows registration of non-P2P SIP domains and makes it easier to provide call-routing services (e.g., CPL).
IBM Delhi (Jan. 2006)
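A sketch of the multi-valued, TTL-limited store described above (Python; names illustrative): expired contacts are dropped on lookup, so the binding of a phone that crashes disappears once it stops refreshing.

```python
import time

store = {}                                  # key -> {contact: absolute expiry}

def put(key, contact, ttl):
    store.setdefault(key, {})[contact] = time.time() + ttl

def get(key):
    now = time.time()
    live = {c: t for c, t in store.get(key, {}).items() if t > now}
    store[key] = live                       # garbage-collect expired entries
    return list(live)

put("alice@example.com", "alice@host1.example.com", ttl=3600)
put("alice@example.com", "alice@host2.example.com", ttl=3600)
print(get("alice@example.com"))             # both contacts, until TTL expiry
```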

25 Background: DHT (Chord)
A recap of slides 8 and 9. Identifier circle; keys assigned to their successors; evenly distributed keys and nodes. Finger table of log N entries, where the ith finger points to the first node that succeeds n by at least 2^(i-1); stabilization on join/leave; lookup walks the finger table to the furthest node preceding the key. In a system with N nodes and K keys, with high probability, each node receives at most K/N keys, maintains information about O(log N) other nodes, and resolves lookups in O(log N) hops. No network locality; replicas need explicit consistency. Optimizations: weight neighbors by RTT and prefer low-RTT neighbors that make progress toward the destination; run multiple virtual nodes per physical node. What if a node leaves?
IBM Delhi (Jan. 2006)

26 Implementation: SIPpeer
[Figure, example architecture: a user interface (buddy list, etc.) sits on top of a user-location module backed by the DHT (Chord). On startup: discover peers (multicast REGISTER), detect NAT (ICE), join; signup and find-buddies map to Find; IM and call use REGISTER, INVITE and MESSAGE; on reset: sign out, transfer, leave. SIP and RTP/RTCP sit below, with audio devices and codecs. The same structure serves both SIP-using-P2P and P2P-over-SIP.]
This is just an example architecture.
IBM Delhi (Jan. 2006)

27 P2P vs. server-based SIP Prediction: P2P for smaller & quick setup scenarios Server-based for corporate and carrier Need federated system multiple p2p systems, identified by DNS domain name with gateway nodes 2000 requests/second ≈7 million registered users IBM Delhi (Jan. 2006)

28 Open issues
- Presence and IM: where to store presence information; needs access authorization.
- Performance: how many supernodes are needed? (Skype: ~1000)
- Reliability: P2P nodes generally replicate data; if a proxy or presence agent sits at a leaf, its data needs replication too.
- Security: Sybil attacks (blackholing supernodes); identifier protection (protect the first registrant against identity theft); anonymity, encryption; protecting voicemails on storage nodes.
- Optimization: locality, proximity, media routing.
- Deployment: SIP-P2P vs P2P-SIP; intranets; ISP servers.
- Motivation: why should I run as a super-node?
IBM Delhi (Jan. 2006)

29 Comparison of P2P and server-based systems
- Scaling: server-based scales with server count; P2P scales with user count, but is limited by supernode count.
- Efficiency: servers are most efficient; DHT maintenance costs O((log N)^2).
- Security: trust the server provider (binary); trust most supernodes (probabilistic).
- Reliability: server redundancy, but catastrophic failure is possible; supernodes are individually unreliable, but catastrophic failure is unlikely.
IBM Delhi (Jan. 2006)

30 Using P2P for binding updates
Proxies do more than just plain identifier translation: translation may depend on who’s asking, time of day, … e.g., based on script output hide full range of contacts from caller sequential and parallel forking disconnected services: e.g., forward to voic if no answer Using a DHT as a location service  use only plain translation run services on end systems run proxy services on supernode(s) and use proxy as contact  need replication for reliability Skype approach IBM Delhi (Jan. 2006)

31 Reliability and scalability: two-stage architecture for CINEMA
First stage: stateless proxies s1, s2, s3. Second stage: master/slave server pairs per domain (a1/a2 for a.example.com, b1/b2 for b.example.com). One group can become the backup for another group. Example DNS SRV records:

  a.example.com  _sip._udp  SRV 0 0  a1.example.com
                            SRV 1 0  a2.example.com
  b.example.com  _sip._udp  SRV 0 0  b1.example.com
                            SRV 1 0  b2.example.com
  example.com    _sip._udp  SRV 0 40 s1.example.com
                            SRV 0 40 s2.example.com
                            SRV 0 20 s3.example.com
                            SRV 1 0  ex.backup.com

Request rate = f(#stateless, #groups). Bottleneck: CPU, memory, bandwidth? Failover latency: ?
IBM Delhi (Jan. 2006)
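A sketch of how a client consumes these SRV records (Python; selection follows RFC 2782: lowest priority first, weighted random choice within a priority; the record data is taken from the slide):

```python
import random

srv = [  # (priority, weight, target) for example.com
    (0, 40, "s1.example.com"),
    (0, 40, "s2.example.com"),
    (0, 20, "s3.example.com"),
    (1, 0,  "ex.backup.com"),
]

def pick(records):
    """Choose a target: lowest priority, then weight-proportional."""
    best = min(p for p, _, _ in records)
    candidates = [(w, t) for p, w, t in records if p == best]
    total = sum(w for w, _ in candidates) or 1
    r = random.uniform(0, total)
    for w, target in candidates:
        r -= w
        if r <= 0:
            return target
    return candidates[-1][1]

# s1/s2 are each chosen ~40% of the time, s3 ~20%; ex.backup.com is used
# only after all priority-0 servers fail (retry with them removed).
print(pick(srv))
```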

32 SIP P2P summary
See http://www.p2psip.org.
Advantages: out-of-the-box experience; robust (catastrophic failure unlikely); inherently scalable (more capacity with more nodes).
Status: IETF involvement; Columbia SIPpeer.
Security issues: trust, reputation (malicious nodes, Sybil attacks); SPAM, DDoS; privacy, anonymity (?).
Other issues: lookup latency, proximity; P2P-SIP vs SIP-using-P2P; why should I run as a super-node?
IBM Delhi (Jan. 2006)

33 DotSlash: An Automated Web Hotspot Rescue System
Weibin Zhao and Henning Schulzrinne

34 The problem: web hotspots
Also known as flash crowds or the Slashdot effect: short-term dramatic load spikes at web servers.
Existing mechanisms are not sufficient:
- Over-provisioning: inefficient for rare events, and difficult because the peak load is hard to predict.
- CDNs: expensive for the small web sites that experience the Slashdot effect.
IBM Delhi (Jan. 2006)

35 The challenges
- Automate hotspot handling: eliminate human intervention to react quickly; improve availability during the critical period ("15 minutes of fame").
- Allocate resources dynamically: static configuration is insufficient for unexpected dramatic load spikes.
- Address different bottlenecks: access network, web server, application server and database server.
IBM Delhi (Jan. 2006)

36 Our approach: DotSlash
An automated web hotspot rescue system that builds an adaptive distributed web server system on the fly.
Advantages:
- Fully self-configuring: no manual configuration; service discovery, adaptive control, dynamic virtual hosting.
- Scalable, easy to use.
- Works for static and LAMP applications; handles network, CPU and database-server bottlenecks.
- Transparent to clients (cf. CoralCache).
IBM Delhi (Jan. 2006)

37 DotSlash overview
Rescue model: a mutual-aid community using spare capacity; potential usage by web hosting companies.
DotSlash components: workload monitoring, rescue-server discovery, load migration (request redirection), dynamic virtual hosting, and adaptive rescue and overload control.
IBM Delhi (Jan. 2006)

38 Handling load spikes
Request redirection: DNS round-robin reduces the arrival rate; HTTP redirects increase the service rate.
Handle different bottlenecks:

  Technique                        Bottleneck addressed
  Cache static content             network, web server
  Replicate scripts dynamically    application server
  Cache query results on demand    database server

IBM Delhi (Jan. 2006)

39 Rescue example: cache static content
[Figure: client1 contacts the origin server and receives an HTTP redirect to a rescue server, which serves the content as a reverse proxy for the origin. client2 queries the DNS server, which uses DNS round-robin to send it to a rescue server directly.]
IBM Delhi (Jan. 2006)
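A sketch of the origin server's decision in the HTTP-redirect path (Python; illustrative, not DotSlash's actual module): with probability p, the client is sent a 302 to a rescue server.

```python
import random

rescue_servers = ["http://rescue1.dot-slash.net", "http://rescue2.dot-slash.net"]
redirect_probability = 0.6        # set by the adaptive controller (slides 44-45)

def serve_locally(path: str) -> str:
    """Stub for the origin's normal request handling."""
    return f"<contents of {path}>"

def handle(request_path: str):
    """Return (status, payload): redirect to a rescue server or serve here."""
    if rescue_servers and random.random() < redirect_probability:
        target = random.choice(rescue_servers)
        return 302, {"Location": target + request_path}
    return 200, serve_locally(request_path)

print(handle("/index.html"))
```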

40 Rescue example (2): replicate scripts dynamically
[Figure: a client request is redirected to the rescue server; the rescue server's Apache/PHP replicates the missing script from the origin server on demand, executes it against the origin's MySQL database server, and returns the response.]
IBM Delhi (Jan. 2006)

41 Rescue example (3): cache query results on demand
[Figure: both the origin server and the rescue server run a data driver with a query-result cache in front of the origin's database server; the rescue server answers repeated queries from its cache instead of hitting the database.]
IBM Delhi (Jan. 2006)

42 Allocate rescue server
Server states:
- Normal state: the initial state.
- SOS state (origin server, getting help from others): entered by allocating a rescue server; exited by releasing all rescues.
- Rescue state (rescue server, providing help to others): entered by accepting an SOS request; exited by shutting down all rescues.
IBM Delhi (Jan. 2006)

43 Handling load spikes: load migration
DNS round-robin reduces the arrival rate; HTTP redirects increase the service rate; together they increase throughput.
Benefits:
- Reduce origin-server network load by caching static content at rescue servers.
- Reduce origin web-server CPU load by replicating scripts dynamically to rescue servers.
IBM Delhi (Jan. 2006)

44 Adaptive overload control
Objective: keep CPU and network load in the desired load region.
Origin server: allocate/release rescue servers; adjust the redirect probability.
Rescue server: accept SOS requests; shut down rescues; adjust the allowed redirect rate.
IBM Delhi (Jan. 2006)

45 Self-configuring
- Rescue-server discovery via SLP and DNS SRV.
- Dynamic virtual hosting: serve the content of a new site on the fly, using "pre-positioned" Apache virtual hosts.
- Workload monitoring: network and CPU, taking headers and responses into account.
- Adaptive rescue control: the precise load-handling capacity of rescue servers is unknown, particularly for active content. Establish a desired load region (typically ~70%), periodically measure, and adjust the redirect probability, conveying it via the protocol (see the sketch below).
IBM Delhi (Jan. 2006)
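A sketch of that measure-and-adjust loop (Python; the controller gain and function names are assumptions, only the ~70% target comes from the slide):

```python
TARGET = 0.70        # desired load region from the slide

def adjust_redirect_probability(p: float, measured_util: float) -> float:
    """Proportional controller: redirect more traffic when over target."""
    error = measured_util - TARGET
    p += 0.5 * error                  # assumed gain
    return min(1.0, max(0.0, p))      # clamp to a valid probability

p = 0.0
for util in [0.95, 0.85, 0.72, 0.69]:   # successive measurement periods
    p = adjust_redirect_probability(p, util)
    print(f"utilization {util:.2f} -> redirect probability {p:.3f}")
```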

46 Implementation
Based on LAMP (Linux, Apache, MySQL, PHP): an Apache module (mod_dots), the DotSlash daemon (dotsd), and the DotSlash rescue protocol (DSRP). Dynamic DNS using BIND with dot-slash.net; service discovery using enhanced SLP (mSLP).
[Figure: within a server, Apache/mod_dots shares memory (SHM) with dotsd; dotsd speaks DSRP to other dotsd instances and uses SLP (mSLP) and DNS (BIND); HTTP clients talk to Apache.]
IBM Delhi (Jan. 2006)

47 Handling file inclusions
The problem: a replicated script may include files that are located at the origin server. Assumption: included files live under DocumentRoot.
Approaches:
- Rewrite inclusion statements: requires parsing scripts; heavyweight.
- Customized error handler: catch inclusion errors; lightweight (see the sketch below).
IBM Delhi (Jan. 2006)
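A sketch of the error-handler idea, transposed to Python for brevity (DotSlash does the equivalent inside PHP; ORIGIN and DOCROOT are hypothetical): when a replicated script includes a file missing on the rescue server, catch the error, fetch the file from the origin under DocumentRoot, cache it, and retry.

```python
import os
import urllib.request

ORIGIN = "http://origin.example.com"       # assumed origin base URL
DOCROOT = "/var/www/rescue"                # rescue server's DocumentRoot

def include_file(rel_path: str) -> str:
    local = os.path.join(DOCROOT, rel_path)
    try:
        with open(local) as f:             # normal case: file already local
            return f.read()
    except FileNotFoundError:              # the "inclusion error"
        data = urllib.request.urlopen(f"{ORIGIN}/{rel_path}").read()
        os.makedirs(os.path.dirname(local), exist_ok=True)
        with open(local, "wb") as f:
            f.write(data)                  # cache for subsequent includes
        return data.decode()
```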

48 Evaluation
Workload generation: httperf for static content; RUBBoS (a bulletin-board benchmark) for dynamic content.
Testbed: a LAN cluster and WAN (PlanetLab) nodes; Linux Red Hat 9.0, Apache, MySQL, PHP 4.3.6.
Metrics: maximum supported request rate and data rate.
IBM Delhi (Jan. 2006)

49 Results in LANs
[Graphs: request rate, redirect rate and rescue rate; data rate.]
IBM Delhi (Jan. 2006)

50 Handling worst-case workload
Settling time: 24 seconds. Timeouts: 921 of 113,565 requests. IBM Delhi (Jan. 2006)

51 Results for dynamic content
Configuration: a high-capacity (HC) origin server and a high-capacity database server, plus nine low-capacity (LC) rescue servers.
No rescue: R = 118 requests/s; CPU: origin 100%, DB 45%.
With rescue (9 rescue servers): R = 245 requests/s; CPU: origin 55%, DB 100%.
245/118 > 2: throughput more than doubles.
IBM Delhi (Jan. 2006)

52 Caching TTL and Hit Ratio (Read-Only)
IBM Delhi (Jan. 2006)

53 CPU Utilization (Read-Only)
[Graphs comparing three configurations: with rescue, no cache; READ4: with rescue, co-located cache; READ5: with rescue, shared cache.] IBM Delhi (Jan. 2006)

54 Request Rate (Read-Only)
[Graphs comparing three configurations: with rescue, no cache; READ4: with rescue, co-located cache; READ5: with rescue, shared cache.] IBM Delhi (Jan. 2006)

55 CPU Utilization (Submission)
[Graphs comparing three configurations: with rescue, no cache; SUB5: with rescue, cache without invalidation; SUB6: with rescue, cache with invalidation.] IBM Delhi (Jan. 2006)

56 Request Rate (Submission)
[Graphs comparing three configurations: with rescue, no cache; SUB5: with rescue, cache without invalidation; SUB6: with rescue, cache with invalidation.] IBM Delhi (Jan. 2006)

57 Performance
Static content (httperf): 10-fold improvement; relieves the network and web-server bottlenecks.
Dynamic content (RUBBoS): completely removes the web/application-server bottleneck and relieves the database-server bottleneck. Overall improvement: 10 times for the read-only mix, 5 times for the submission mix.
IBM Delhi (Jan. 2006)

58 Conclusion
DotSlash prototype: applicable to both static and dynamic content; promising performance improvement; released as open-source software.
Ongoing work: address security issues in deployment; extensions to SIP servers? Web services?
For further information: DotSlash framework (WCW 2004); dynamic script replication (Global Internet 2005); on-demand query-result caching (Columbia TR, under submission).
IBM Delhi (Jan. 2006)

