Efficient P2P backup through buffering at the edge
S. Defrance, A.-M. Kermarrec (INRIA), E. Le Merrer, N. Le Scouarnec, G. Straub, A. van Kempen
10/24/2015

Peer-to-peer backup systems
- Exploit users' resources: each user provides storage space
- "Pure" P2P backup systems are severely limited by:
  - Low availability
  - Asymmetric bandwidth (low uplink speed)
  - Asynchrony
- Time To Backup (TTB) and Time To Restore (TTR) may be very high
- Practical deployment is limited
[Figure: timelines for Peer 1 and Peer 2 over 24 hours (0 h, 12 h, 24 h)]

CDN-assisted architecture
- Architecture proposed at P2P 2010
- Server = reliable component
- Performance approaches that of client-server systems (in terms of Time To Backup and Time To Restore)
- However:
  - A centralized part remains
  - Not fully convenient for users

What we propose
- Gateways are turned into stable buffering layers
- Take into account the low-level structure of the network (i.e., the presence of gateways in home networks)
- Use gateways to distribute the centralized part of the hybrid scheme
- Mask the asynchrony between peers
[Figure labels: Home network (LAN), LAN]

Why are gateways good candidates?
- Already present in users' homes
- Storage-capable (for buffering)
- Highly available
- At the frontier between a fast LAN and a slow WAN
[Figure label: Home network]

Gateways are highly available
- We periodically pinged a random set of static IP addresses of a French ISP*
  - 25,000 gateways
  - For 7.5 months
- Average gateway availability: 86%
- A large part is very stable
- A few have power-off habits (on a daily or holiday basis)
* The trace is available at:
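As a rough illustration of how an availability figure such as the 86% above can be derived from periodic pings, the sketch below computes per-gateway availability as the fraction of answered probes and then averages across gateways. The function names and the toy ping logs are hypothetical; the format of the actual trace is not described in these slides.

```python
from statistics import mean

def availability(pings):
    """Fraction of probes answered by one gateway (True = reply received)."""
    if not pings:
        raise ValueError("no probes recorded for this gateway")
    return sum(pings) / len(pings)

def average_availability(trace):
    """Mean of the per-gateway availabilities over the whole trace."""
    return mean(availability(p) for p in trace.values())

# Toy data (hypothetical): three gateways, four probes each.
toy_trace = {
    "gw-a": [True, True, True, True],     # always on
    "gw-b": [True, False, True, True],    # one missed probe
    "gw-c": [False, False, True, True],   # powered off half of the time
}

avg = average_availability(toy_trace)  # (1.0 + 0.75 + 0.5) / 3
```

Averaging per gateway first, rather than pooling all probes, keeps a gateway that was probed more often from weighing more in the result.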

How does it work?
- Prepare (LAN speed)
- Backup (WAN speed)
- Offload (LAN speed)
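To make the speed gap between the phases concrete, the sketch below estimates how long one transfer takes at LAN speed versus WAN speed. The 1 GB archive size and the 66 kB/s mean uplink are the values used in the evaluation of this talk; the 10 MB/s LAN rate is an assumed value for illustration only, and erasure-coding redundancy is ignored.

```python
def transfer_hours(size_bytes, rate_bytes_per_s):
    """Time in hours to move size_bytes at a constant rate."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes / rate_bytes_per_s / 3600

ARCHIVE_BYTES = 1_000_000_000   # 1 GB archive (decimal units)
WAN_UPLINK = 66_000             # 66 kB/s mean uplink
LAN_RATE = 10_000_000           # assumed 10 MB/s LAN rate (hypothetical)

# Time the user's machine is involved when handing data to the gateway.
lan_hours = transfer_hours(ARCHIVE_BYTES, LAN_RATE)
# Time the raw upload over the WAN takes, which the gateway can absorb.
wan_hours = transfer_hours(ARCHIVE_BYTES, WAN_UPLINK)
```

Under these assumptions the LAN transfer takes about 100 seconds, while the raw uplink transfer alone takes a little over 4 hours.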

How do we evaluate?
Trace-based simulation using public traces
- To model peer behavior:
  - Skype: 28 days, 1269 peers, mean availability = 0.5
  - Jabber: 28 days, 465 peers, mean availability = 0.27
- To model gateway behavior: our gateway trace
- To model uplink bandwidth: trace from a study of residential broadband networks (mean uplink = 66 kB/s)
- We randomly assign one gateway and one uplink speed to each peer of each trace
Scenario:
- Size of archive: 1 GB
- Data creation: Poisson process (3 backups/month/user on average)
- Erasure code
- 50 simulations per curve
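The Poisson data-creation model can be sketched as follows: the gaps between successive backups are drawn from an exponential distribution whose rate matches 3 backups per month. This is a minimal illustration, not the simulator used for the evaluation; the function name, the 30-day month, and the seed are assumptions.

```python
import random

def backup_request_times(rate_per_month, horizon_days, rng, days_per_month=30.0):
    """Sample backup request times (in days) from a Poisson process."""
    rate_per_day = rate_per_month / days_per_month
    times, t = [], 0.0
    while True:
        # Exponential inter-arrival times define a Poisson process.
        t += rng.expovariate(rate_per_day)
        if t >= horizon_days:
            return times
        times.append(t)

rng = random.Random(42)  # fixed seed so the run is repeatable
requests = backup_request_times(3, 28, rng)  # one user over a 28-day trace
```

With a rate of 3 per 30 days, a 28-day trace yields 2.8 requests per user on average, so individual users may create anywhere from zero to several archives.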

What do we evaluate?
We compare:
- Pure P2P (P2P)
- CDN-Assisted (CDNA)
- Gateway-Assisted (GWA)
We evaluate:
- Time To Backup (TTB, hours): time between the backup request and the time when the last block has been completely uploaded
- Time To Restore (TTR, hours): time between the restore request and the time when enough data has been downloaded to reconstruct the file
- Mean and max data buffered (MB)
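The two definitions can be expressed directly in code. In the sketch below, TTB is the delay until the last block finishes uploading, and TTR is the delay until enough blocks have arrived to reconstruct the file; "enough" is assumed here to mean any k blocks, as with a typical erasure code. The function names, the parameter k, and the timestamps are hypothetical.

```python
def time_to_backup(request_time, upload_done_times):
    """TTB: from the backup request until the last block is fully uploaded."""
    if not upload_done_times:
        raise ValueError("no blocks were uploaded")
    return max(upload_done_times) - request_time

def time_to_restore(request_time, download_done_times, k):
    """TTR: from the restore request until k blocks have been downloaded.

    Assumes any k blocks suffice to rebuild the file (erasure-code property).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(download_done_times) < k:
        raise ValueError("not enough blocks to reconstruct the file")
    # The k-th earliest completion is when reconstruction becomes possible.
    return sorted(download_done_times)[k - 1] - request_time

# Hypothetical timestamps, in hours.
ttb = time_to_backup(10.0, [12.0, 15.5, 11.0, 30.0])
ttr = time_to_restore(50.0, [51.0, 58.0, 52.5, 53.0], k=3)
```

The asymmetry is visible in the toy numbers: the backup must wait for the slowest block, whereas the restore can ignore the slowest ones once k blocks are in.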

TTB & TTR (Skype trace)
- Time To Backup (stored safely at a remote place)
  - 90th percentile of completed backups (GWA, CDNA, P2P): 30 h, 60 h, 140 h
- Time To Restore (retrieve an archive locally)
  - 90th percentile of completed restores (GWA, CDNA, P2P): 3 h, 40 h
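For readers unfamiliar with the metric, a 90th percentile of completion times can be computed as sketched below, using the nearest-rank definition: the smallest observed time by which at least 90% of the operations have completed. The choice of nearest-rank and the sample values are assumptions for illustration; they are not data from the evaluation.

```python
def percentile_nearest_rank(values, p):
    """Smallest value such that at least p percent of the samples are <= it."""
    if not values:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    # Integer ceiling of p * n / 100 avoids floating-point rounding surprises.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]

# Hypothetical completion times in hours for ten backups.
sample_ttb = [5, 8, 12, 20, 22, 25, 28, 29, 30, 140]
p90 = percentile_nearest_rank(sample_ttb, 90)
```

A percentile is less sensitive than the mean to a single very slow backup such as the 140-hour sample in this toy list.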

Scaling (Skype trace)
- Better scaling with archive size: this enables users to back up larger amounts of data

Dimensioning (Skype trace)
- Low storage needs
  - 1 GB archives: 2.5 GB needed (99%)
  - Realistic for current gateways
- Average usage remains low
  - Less than 1 MB here
  - Data is really offloaded to peers
  - Gateways are effectively used as buffers
[Figure: average storage on gateways (MB); annotation: stopping backups]

Conclusion
- Realistic architecture for P2P backup systems
- Evaluation using trace-based simulation
- TTB and TTR are greatly reduced (the network connection can be used more efficiently)
- More convenient for users: backup tasks can be offloaded quickly (at LAN speed) from the user's machine to the gateway
- Fully decentralized
- Trace of gateway availability

Thank you!