Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations? Samer Al-Kiswany – University of British Columbia

Presentation transcript:

Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?
Samer Al-Kiswany – University of British Columbia
Joint work with:
Matei Ripeanu – University of British Columbia
Adriana Iamnitchi – University of South Florida
Sudharshan Vazhkudai – Oak Ridge National Laboratory

2 Introduction
- Data-intensive science: large-scale simulations and new scientific instruments generate huge volumes of data (petabytes).
- User communities: large, geographically dispersed.
Requirement: efficient data dissemination tools.
Samer Al-Kiswany, EuroPar '07

3 Introduction - Example

4 Question
What data dissemination strategies perform best in today's Grid deployments?
Data dissemination solutions: IP-Multicast, Bullet, BitTorrent, SPIDER, OMNI, ALMI, Logistical-Multicast, Narada, Scribe, Grido, FastReplica … and many others.

5 Roadmap
Question: What data dissemination strategies perform best in today's Grid deployments?
- Workload characteristics
- Deployment platform characteristics
- Data dissemination proposed solutions
- Evaluation
- Recommendations

6 Workload and Deployment Platform
Data-intensive scientific collaboration characteristics:
- Scale of data: massive data collections (terabytes).
- Data usage: uniform popularity distributions and co-usage.
Deployment platform characteristics:
- Resource availability: low churn rate, high node availability, well-provisioned networks.
- Collaborative environments: no freeriding, so less effort is needed to enforce fair resource sharing.

7 Roadmap
Question: What data dissemination strategies perform best in today's Grid deployments?
- Workload characteristics
- Deployment platform characteristics
- Data dissemination proposed solutions
- Evaluation
- Recommendations

8 Classification of Approaches
Technique: Protocol
- Tree-based techniques: ALM and SPIDER
- Swarming: Bullet and BitTorrent
- Techniques employing intermediate storage capabilities: Logistical Multicasting
Base cases:
- IP-Multicast.
- Parallel transfers: separate data channels from the source to each destination.

9 Separate Transfer from the Source to Every Destination
Drawbacks:
- Overwhelms the source; does not scale.
- Generates high duplicate traffic on the links around the source.
- Does not exploit all available transport capacity.
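To make the scaling drawback concrete, here is a minimal back-of-the-envelope sketch in Python. It is not taken from the talk: the function name, its parameters, and the assumption that the source's uplink is the only bottleneck are all illustrative.

```python
def parallel_transfer_time(file_mb, source_uplink_mbps, n_destinations):
    """Seconds for the source to send one full copy to every destination.

    Assumption (hypothetical model): all copies share the source's uplink,
    and no other link in the network is a bottleneck.
    """
    if n_destinations <= 0:
        raise ValueError("n_destinations must be positive")
    # Every destination needs its own copy, so the source sends the file N times.
    total_megabits = file_mb * 8 * n_destinations
    return total_megabits / source_uplink_mbps


# A 1000 MB file over a 1000 Mbit/s uplink: time grows linearly with destinations.
one = parallel_transfer_time(1000, 1000, 1)
ten = parallel_transfer_time(1000, 1000, 10)
```

Under these assumptions the completion time grows linearly with the number of destinations, which is the sense in which the approach "does not scale".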

10 IP Multicasting

11 IP Multicast
Drawbacks:
- Limited deployment.
- Vulnerable to node failures.
- Does not exploit all available transport capacity.
- Throughput is limited by the bottleneck link.

12 Tree-Based Techniques: Application Level Multicast (ALM)
[Figure: ALM tree rooted at the source]

13 Tree-Based Techniques: Application Level Multicast (ALM)
[Figure: ALM tree rooted at the source]
Drawbacks:
- Vulnerable to node failures.
- Does not exploit all possible routes in the network.

14 Swarming Techniques: BitTorrent and Bullet
[Figure: complete file divided into blocks 1, 2, 3, 4]

15 Swarming Techniques: BitTorrent and Bullet
[Figure: complete file divided into blocks 1, 2, 3, 4]

16 Swarming Techniques: BitTorrent and Bullet
[Figure: complete file divided into blocks 1, 2, 3, 4]
Drawbacks:
- Generates high duplicate traffic.
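The idea behind swarming (peers that already hold some blocks forward them to others) can be illustrated with a toy round-based model in Python. This is a sketch under strong assumptions and is not BitTorrent's or Bullet's actual protocol: the function `swarm_rounds`, the one-upload and one-download-per-round limits, and the deterministic peer and block selection are all hypothetical simplifications.

```python
def swarm_rounds(n_peers, n_blocks, peers_upload=True):
    """Rounds until every peer holds all blocks (toy model, not a real protocol).

    Assumptions: node 0 is the source; in each round a node uploads at most
    one block and a peer downloads at most one block; a node can only forward
    blocks it already held when the round started.
    """
    all_blocks = set(range(n_blocks))
    have = [set(all_blocks)] + [set() for _ in range(n_peers)]
    rounds = 0
    while any(h != all_blocks for h in have[1:]):
        rounds += 1
        snapshot = [set(h) for h in have]   # holdings at the start of the round
        receiving = set()                   # peers that already got a block this round
        uploaders = range(len(have)) if peers_upload else [0]
        for u in uploaders:
            for d in range(1, len(have)):
                if d == u or d in receiving:
                    continue
                missing = snapshot[u] - have[d]
                if missing:
                    have[d].add(min(missing))  # deterministic choice: lowest block id
                    receiving.add(d)
                    break
    return rounds


source_only = swarm_rounds(4, 4, peers_upload=False)  # only the source uploads
swarming = swarm_rounds(4, 4, peers_upload=True)      # peers forward blocks too
```

In this toy model, letting peers forward blocks finishes in fewer rounds than source-only delivery because peers' upload capacity is used as well. The model does not capture the duplicate-traffic drawback, which depends on how overlay connections map onto physical links.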

17 Logistical Multicasting

18 Roadmap
Question: What data dissemination strategies perform best in today's Grid deployments?
- Workload characteristics
- Deployment platform characteristics
- Data dissemination proposed solutions
- Evaluation
- Recommendations
Evaluation approaches:
- Analytical modeling
- Implementation
- Simulation

19 Methodology
Simulator design:
- Block-level simulation.
- Simulates physical-layer link contention.
Inputs:
- Real topologies of three deployed Grid testbeds: LCG, GridPP, EGEE.
- Generated topologies:  100 (using BRITE)

20 Methodology
Success criteria: Metrics
- Dissemination time: Transfer time
- Overhead: MB x hop
- Load balancing: Volume of in/out data
- Fairness: Link stress
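The slides do not spell out how these metrics are computed, so the following Python sketch rests on assumptions: overhead is taken as megabytes sent multiplied by the number of physical links crossed, and link stress as the number of transfers crossing each link (a common definition in the overlay-multicast literature). The function names and the sample topology are hypothetical.

```python
from collections import Counter


def overhead_mb_hops(transfers):
    """Sum of (megabytes sent) x (number of links traversed) over all transfers."""
    return sum(mb * len(path) for mb, path in transfers)


def link_stress(transfers):
    """Number of transfers that cross each physical link."""
    stress = Counter()
    for _mb, path in transfers:
        for link in path:
            stress[link] += 1
    return stress


# Hypothetical example: source S sends the same 100 MB to A and to B,
# and both paths share the physical link S -> R1.
transfers = [
    (100, [("S", "R1"), ("R1", "A")]),
    (100, [("S", "R1"), ("R1", "B")]),
]
total = overhead_mb_hops(transfers)   # 100*2 + 100*2 MB x hop
stress = link_stress(transfers)       # link S -> R1 carries both copies
```

In this example the shared link S -> R1 has stress 2 because it carries the same data twice, which is the kind of duplicate traffic the fairness metric is meant to expose.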

21 Transfer Time
[Figure: number of destinations that have completed the file transfer, for the original EGEE topology]

22 Transfer Time – With Reduced Core-Link Bandwidth
[Figure: number of destinations that have completed the file transfer, for the EGEE topology with core bandwidth reduced to 1/8 of the original]
Conclusions:
- On well-provisioned topologies, even naïve algorithms perform well.
- On constrained topologies, application-level techniques perform uniformly well: they are among the first to finish the transfer and show good intermediate progress.

23 Protocol Overhead – Metric Definition
[Figure: traffic on links labeled as useful or duplicate]

24 Protocol Overhead
[Figure: overhead of each protocol on the EGEE topology]
Conclusion: Application-level techniques generate significant overhead, up to 4 times more than IP-layer solutions.
Reasons:
- Dissemination decisions are based on application-level metrics.
- Node location in the topology is ignored.

25 Fairness
[Figure: link stress distribution for the EGEE topology. For BitTorrent and Bullet the plot presents maximum link stress.]
Conclusion: Application-level solutions have a considerable impact on competing traffic.

26 Summary
Motivating question: What data dissemination strategies perform best in today's Grid deployments?
In this project, we simulated representative solutions, taking into account the characteristics of the workload and of the deployed platforms.
Our results provide guidelines for selecting a data dissemination technique, depending on the:
- Target environment.
- Overall system workload characteristics.
- Success criteria.

27 Thank you