Peer-to-Peer Communication Systems Protocols and Systems, Reliability, Energy Efficiency and Measurements Salman Abdul Baset Department of Computer Science Columbia University
2 Background and Motivation
3 IP-based communication systems Basic services –establish voice, video, IM sessions –voicemail Advanced services –conferencing, telepresence –voicemail to text Client-server, Peer-to-Peer
4 Client-server IP communication system Utopian Internet: no NATs or firewalls [Figure: user agents send REGISTER (ip addr) to a SIP registrar / proxy / presence server; (1) signaling, (2) media (voice, video, IM); IP-PSTN gateway]
5 Client-server IP communication system [Figure: user agent behind a NAT / firewall; SIP registrar / proxy / presence server; media server; packet header fields (Src-IP, Dst-IP) as the NAT maps the private address (Pr-IP) to the public address (Pub-IP)]
6 Client-server IP communication system [Figure: user agents behind NATs / firewalls; SIP registrar / proxy / presence server; media server; IP-PSTN gateway; (1) signaling, (2) media (voice, video, IM) over UDP or TCP] What is centralized? –directory service –call signaling –media session and conferencing –presence –PSTN connectivity Why is this a problem? Scaling for millions of users –servers –b/w –management overhead Peer-to-Peer: distribute these functions to user agents How many calls need media relaying? –30% –in practice: all
7 Peer-to-peer communication system node = user agent –nodes form an overlay –share responsibilities for message routing, signaling, media relaying –super nodes, ordinary nodes [Figure: overlay of nodes A to E, some behind NATs / firewalls, plus a P2P / PSTN gateway. One call: lookup 'network address of node B?', then (3) signaling, (4) media. Another call: lookup 'network address of node E?', then (2) signaling, (3) media (TCP) through a media relay (or relay)]
8 Challenges Designing, building, and analyzing p2p communication systems: #1 Protocol and system design #2 Reliability #3 Session quality #4 Energy efficiency #5 Measurement [Figure: relayed call in the overlay: lookup 'network address of node E?', (2) signaling, (3) media (TCP) through a media relay; nodes A to E; NAT / firewall]
9 Why not just use Skype? Skype works, but Closed and proprietary solution Requires Internet access –cannot be used in ad hoc environments Skype network failure for 2-5 days –August 2007
10 Motivation Peer-to-peer communication systems –Why not client-server? server, b/w, maintenance overhead –Why not just use Skype? proprietary solution
11 Outline: Peer-to-Peer Communication Systems –Background & Motivation –Protocol and System Design: How to design protocols and systems for p2p communication? –Reliability: What is the reliability of a p2p comm. system? –Energy Efficiency: Are p2p VoIP systems more energy efficient than c/s? –Measurement: What are the measurement techniques to understand p2p comm. systems as a black box? –Other work
12 Outline: Peer-to-Peer Communication Systems –Background & Motivation –Protocol and System Design: How to design protocols and systems for p2p communication? –Reliability
13 Protocol and System Design Goal: design open, standardized, and interoperable protocols for building p2p communication systems in ad hoc, office, and Internet environments High-level Requirements –Scalability: NATs and firewalls, churn, heterogeneous capabilities, overlay routing –Security: trusted and insecure environments –Resources and Services: heterogeneity, discovery, addressing –Interoperability –Reuse existing protocols where possible Can we meet these requirements?
14 Yes, we can! How? (1) Identify common aspects of existing p2p protocols and potential deployments and incorporate them in the protocol. (2) Support pluggable overlay routing: one overlay protocol may not be suitable for all environments. (3) Make the protocol extensible for future-proofing (flexibility vs. complexity tradeoff).
15 Protocol and System Design Common aspects (methods for implementing the common aspects): Connectivity –NAT traversal –bootstrap Resilience –recovery from node churn Request routing –recursive vs. iterative –parallel vs. sequential Heterogeneity of nodes –mobile, desktop –super node vs. ordinary node Security –identity –message confidentiality Data model –addressing, storage, integrity Message reliability –hop-by-hop, e2e Non-common aspects (overlay protocol implements specific methods): Next-hop determination –depends on the overlay protocol –Chord, Kademlia, Gia
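The split between common and overlay-specific aspects can be sketched as one shared forwarding routine with a pluggable distance metric. This is a minimal Python sketch, not P2PP's actual design: the function names, the 8-bit identifier space, and the greedy rule are assumptions for illustration. The rule simplifies both Chord (which resolves keys to a successor using finger tables) and Kademlia (which uses k-buckets).

```python
from typing import Callable, Iterable, Optional

ID_BITS = 8                 # tiny identifier space, chosen only for illustration
ID_SPACE = 1 << ID_BITS

Metric = Callable[[int, int], int]


def ring_distance(a: int, b: int) -> int:
    """Chord-style metric: clockwise distance from a to b on the ring."""
    return (b - a) % ID_SPACE


def xor_distance(a: int, b: int) -> int:
    """Kademlia-style metric: bitwise XOR of the two identifiers."""
    return a ^ b


def next_hop(self_id: int, key: int, neighbors: Iterable[int],
             distance: Metric) -> Optional[int]:
    """Common routing step: forward to the neighbor closest to the key.

    Returns None when no neighbor is strictly closer than this node,
    meaning this node handles the request itself (a simplification).
    """
    best = min(neighbors, key=lambda n: distance(n, key), default=None)
    if best is None or distance(best, key) >= distance(self_id, key):
        return None
    return best


if __name__ == "__main__":
    table = [10, 60, 200]   # hypothetical routing-table entries
    print(next_hop(5, 70, table, ring_distance))
    print(next_hop(5, 70, table, xor_distance))
```

The same routing table gives different answers under the two metrics, which is why next-hop determination is left to the overlay plug-in while the request handling around it stays common.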
16 Peer-to-Peer Protocol (P2PP) Now part of RELOAD protocol being standardized in the IETF (IETF P2PSIP working group documents) Not a new DHT! Geared towards IP telephony but applicable to streaming, VoD etc. A request / response binary protocol –Common methods: Join, Leave, Publish, Lookup etc. –Overlay-specific methods: FindPeer, ExchangeTable Pluggable overlay routing (Chord, Kademlia etc.) Application-level API Security –enrollment server, shared-secret, X.509 certificates –TLS, DTLS for message confidentiality [Figure: protocol stack of a node: API; SIP, P2PP, ICE; TLS / SSL]
17 Peer-to-Peer Protocol (P2PP) Node heterogeneity –peers (super nodes) and clients (ordinary nodes) –use of peers as relays NAT traversal built-in –a node exchanges its host, NAT, and relay IP addresses in requests and responses –then uses ICE (Interactive Connectivity Establishment) for NAT traversal Message reliability –hop-by-hop, e2e Data model –key / value pairs –value: single, array, dictionary –data integrity Monitoring and diagnostics gathering
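To illustrate how the three address types a node exchanges could be ranked before connectivity checks, here is a minimal Python sketch that applies the candidate priority formula and the recommended type preferences from the ICE specification (RFC 5245). The `Candidate` class, its field names, and the example addresses are hypothetical. Mapping the slide's "NAT" address to ICE's server-reflexive type is an assumption, and P2PP's actual message encoding is not shown.

```python
from dataclasses import dataclass
from typing import List

# Recommended type preferences from the ICE specification (RFC 5245).
TYPE_PREFERENCE = {"host": 126, "srflx": 100, "relay": 0}


@dataclass(frozen=True)
class Candidate:
    kind: str                 # "host", "srflx" (address seen outside the NAT), or "relay"
    ip: str
    port: int
    local_pref: int = 65535   # single-interface host: maximum local preference
    component: int = 1        # component ID, 1..256

    def priority(self) -> int:
        """ICE priority: 2^24 * type pref + 2^8 * local pref + (256 - component)."""
        return ((1 << 24) * TYPE_PREFERENCE[self.kind]
                + (1 << 8) * self.local_pref
                + (256 - self.component))


def order_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Highest-priority candidates first: direct paths are tried before relays."""
    return sorted(candidates, key=lambda c: c.priority(), reverse=True)


if __name__ == "__main__":
    # Example addresses are made up (private and documentation ranges).
    offered = [
        Candidate("relay", "198.51.100.7", 40000),
        Candidate("host", "10.0.0.5", 5060),
        Candidate("srflx", "192.0.2.10", 61234),
    ]
    for c in order_candidates(offered):
        print(c.kind, c.ip, c.port, c.priority())
```

Ranking relays last reflects the cost of relaying media through a peer: a relayed path is used only when the direct and NAT-mapped addresses fail their checks.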
18 OpenVoIP architecture Proof-of-concept system based on P2PP (SIGCOMM 2008 demo) [Figure: peers and clients, some behind NATs, in Overlay1 and Overlay2; bootstrap / authentication server; monitoring server / Google Maps; protocol stack of a peer: SIP, P2PP, ICE over Transport]
19 OpenVoIP: key facts 1000-node network on ~500 PlanetLab machines Overlays: Kademlia, Bamboo, Chord Platforms: Windows XP / Vista, Linux Integrated with Google, Flash-based maps Integrated with open source SIP phone OpenWengo (QuteCom) Code used and modified by Ericsson Labs, Nokia Labs, Telecom Italia, and many universities
20 OpenVoIP: geo+logical interface
21 Outline: Peer-to-Peer Communication Systems –Background & Motivation –Protocol and System Design –Reliability: What is the reliability of a p2p comm. system?
22 Reliability of P2P Comm. Systems (IPTCOMM’2010) Goal: to reason about the reliability of p2p comm. systems Reliability = proportion of completed calls –understand reasons for call failure –devise techniques to reduce call failures Reasons for call failure –(1) distributed search fails to find online callee –(2) distributed search fails to find a suitable relay –(3) relay fails during voice/video session Focus: understand and improve reliability for relayed calls [Figure: relayed call in the overlay: lookup 'network address of node E?', (2) signaling, (3) media (TCP) through a media relay; nodes A to E; NAT / firewall]
23 Understanding reliability of relayed calls For a desired reliability (e.g., 99.9%), what is the minimum number of relays k per call? Model –when the i-th relay fails, the call is switched to the (i+1)-st relay, which is instantly selected from the global pool of all relays –R_i: residual lifetime of the i-th relay candidate (i.i.d.) –D: call duration k depends on the relationship b/w node lifetime and call duration [Figure: timeline of relays 1, 2, ..., k-1, k with residual lifetimes R_1, ..., R_{k-1}, R_k covering the call duration D]
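Under this model, a call with k relays completes when R_1 + ... + R_k >= D. A minimal Python sketch of the resulting calculation follows, under two added assumptions that are not from the slides: residual lifetimes are exponential, and D is a fixed value rather than a random variable. In that case the number of relay failures during the call is Poisson with mean D / (mean lifetime), and the call succeeds if at most k-1 failures occur. Function names are illustrative.

```python
import math


def success_probability(k: int, mean_lifetime: float, call_duration: float) -> float:
    """P(R_1 + ... + R_k >= D) for i.i.d. exponential R_i and a fixed D.

    Equivalent to P(N <= k - 1), where N ~ Poisson(D / mean_lifetime)
    counts the relay failures that occur during the call.
    """
    x = call_duration / mean_lifetime
    return math.exp(-x) * sum(x ** n / math.factorial(n) for n in range(k))


def min_relays(target: float, mean_lifetime: float, call_duration: float,
               k_max: int = 100) -> int:
    """Smallest k whose success probability reaches the target reliability."""
    for k in range(1, k_max + 1):
        if success_probability(k, mean_lifetime, call_duration) >= target:
            return k
    raise ValueError("target reliability not reachable within k_max relays")


if __name__ == "__main__":
    # Illustrative inputs: 12-hour mean relay lifetime, 1-hour call.
    print(min_relays(0.999, mean_lifetime=12.0, call_duration=1.0))
```

With a 12-hour mean relay lifetime and a 1-hour call, `min_relays(0.999, 12.0, 1.0)` returns 3 under these assumptions. Heavier-tailed lifetime distributions or random call durations would need a different calculation or a simulation.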
24 Understanding reliability of relayed calls [Figure: two plots of the min # of relays k, one for exponential node lifetimes and one for Skype node lifetimes (approximated as Pareto); axis labels: mean node lifetime, mean call duration] Skype: node lifetime 12 hours (mean), 4 hours (median); min # of relays k = 3 (mean call holding time = one hour) 95% of Skype relay calls last less than 60 mins For 95% of Skype call durations, a minimum of 3 relays maintains a 99.9% success rate What if the system does not have enough relays?
25 Outline: Peer-to-Peer Communication Systems –Background & Motivation –Protocol and System Design –Reliability –Energy Efficiency: Are p2p VoIP systems more energy efficient than c/s?
26 Are p2p comm. systems more energy efficient than c/s? (SIGCOMM Green Networking workshop 2010) Two reasons to expect so –data center overheads (e.g., cooling): power usage effectiveness (PUE), the ratio of total data center power draw to IT power draw –idle power consumption But really? We tried to answer this question [Figure: c/s system (user agents, SIP registrar / proxy / presence server, media server, NAT / firewall; INVITE, media (voice, video, IM) over UDP or TCP) next to the p2p overlay with a relayed call]
27 Are p2p comm. systems more energy efficient than c/s? Issues in comparison –compare under the same load –centralized vs. distributed aspects: do not compare components that are centralized in both p2p and c/s –PSTN replacement: Skype vs. Vonage –endpoint energy consumption: negligible at 5W per device, but there are millions of them … –workload characteristics are impacted by NATs and firewalls Our approach –gather peak data from VoIP providers –build energy models for c/s and p2p VoIP systems –perform measurements C/S VoIP provider –100 K users, mostly business –15 calls per second (CPS) –~5K calls in system –NAT keep-alive traffic –all calls are relayed
28 Energy Models for C/S and P2P N users or nodes C/S model –C/S power consumption = servers × watts/server × redundancy factor × PUE –1 million users: servers (~50% utilization) draw 2 kW P2P model –S super nodes –p_s: power consumption by super node functions, p_s = 0.266 W P2P is more energy efficient when: # of super nodes (S) × power consumption of a super node (p_s) < C/S power consumption Estimating super node population –one per relayed call P2P may not be more energy efficient than c/s for VoIP
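The comparison on this slide can be written as a small calculation. The Python sketch below encodes the two models. The function and parameter names are illustrative, and the only numbers taken from the slide are the 2 kW c/s figure for 1 million users and the 0.266 W per super node.

```python
def cs_power_watts(servers: int, watts_per_server: float,
                   redundancy_factor: float, pue: float) -> float:
    """C/S model: servers x watts/server x redundancy factor x PUE."""
    return servers * watts_per_server * redundancy_factor * pue


def p2p_power_watts(super_nodes: int, p_s: float) -> float:
    """P2P model: number of super nodes S times per-super-node power p_s."""
    return super_nodes * p_s


def break_even_super_nodes(cs_watts: float, p_s: float) -> float:
    """Super node count at which P2P draws exactly as much power as C/S."""
    return cs_watts / p_s


def p2p_is_more_efficient(super_nodes: int, p_s: float, cs_watts: float) -> bool:
    """The slide's condition: S x p_s < C/S power consumption."""
    return p2p_power_watts(super_nodes, p_s) < cs_watts


if __name__ == "__main__":
    CS_WATTS = 2000.0   # from the slide: servers for 1 million users
    P_S = 0.266         # from the slide: watts per super node
    print(round(break_even_super_nodes(CS_WATTS, P_S)))
```

With these figures the break-even point is roughly 7,500 super nodes. Since the super node population is estimated at one per relayed call, p2p comes out ahead only while fewer than that many calls are being relayed at once.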
29 Energy Efficiency of VoIP Systems Endpoints dominate energy consumption in c/s systems replacing PSTN –1 million users: servers 2 kW, endpoints 5000 kW (at 5W each); servers are ~0.04% of the total (voice) NATs are responsible for energy inefficiency of c/s systems –problems will not go away with IPv6: firewalls VoIP and PSTN? –trying to figure it out
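The 0.04% figure follows from simple arithmetic, sketched below in Python with illustrative names. The inputs (1 million users, 5 W per endpoint, 2 kW of servers) are the ones on the slide.

```python
def server_share(users: int, watts_per_endpoint: float, server_kw: float) -> float:
    """Fraction of total power (servers + endpoints) that is drawn by servers."""
    endpoint_kw = users * watts_per_endpoint / 1000.0   # 1,000,000 x 5 W = 5000 kW
    return server_kw / (server_kw + endpoint_kw)


share = server_share(1_000_000, 5.0, 2.0)
print(f"{share:.2%}")   # prints 0.04%
```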
30 Outline: Peer-to-Peer Communication Systems –Background & Motivation –Protocol and System Design –Reliability –Energy Efficiency –Measurement: What are the measurement techniques to understand p2p comm. systems as a black box?
31 Measurement: Skype (INFOCOM’06) Super node, ordinary node, login server Actively guards against reverse engineering –LD_PRELOAD –forcing Skype to use a modified shared library Voice and video calls –relaying –over TCP Ports: no default listening port –opens port 80 (HTTP) and 443 (TLS) Contact list –stored centrally, initially distributed Video conferencing –using central servers
32 Is Skype free-riding on universities' bandwidth? (GI’08) Two Skype clients at Columbia University forced to use a relay 6,000 relay calls Median latency: ~95 ms 46% of calls through relays with a .edu suffix 8% of calls through Columbia Skype users Is it deliberate? –probably not –relay selection biased towards high-capacity nodes, which happen to be in universities [Figure: NAT, our lab]
33 Outline: Peer-to-Peer Communication Systems –Background & Motivation –Protocol and System Design –Reliability –Energy Efficiency –Measurement –Other work
34 Other work Research –TCP feasibility for real-time traffic (SIGMETRICS) –Can software routers scale? (PRESTO) Hacking and building –vazool.com
35 Directions for future research A holistic framework for reliability, performance, and energy tradeoffs in data centers –virtualization, consolidation –nano data centers? Preventing data lock-in for social networks and cloud-based services –enabling seamless data migration across different cloud providers –holy grail: ‘one click’ data migration
36 Conclusions: Peer-to-Peer Communication Systems –Background & Motivation –Protocol and System Design: open P2PSIP protocol, OpenVoIP –Reliability: 3 relays are sufficient to achieve 99.9% call reliability –Energy Efficiency: p2p may not be more energy efficient than c/s; endpoints dominate –Measurement: Skype is free-riding on universities' bandwidth –Other work