Beyond Music Sharing: An Evaluation of Peer-to-Peer Data Dissemination Techniques in Large Scientific Collaborations Thesis defense: Samer Al-Kiswany.

2 Introduction
- Data-intensive science: large-scale simulations and new scientific instruments generate huge volumes of data (petabytes).
- User communities: large and geographically dispersed.
Requirement: efficient data dissemination tools.
Samer Al-Kiswany /26

3 Introduction - Example

4 Question
What data dissemination strategies perform best in today's Grid deployments?
Data dissemination solutions: IP-Multicast, Bullet, BitTorrent, SPIDER, OMNI, ALMI, Logistical-Multicast, Narada, Scribe, Grido, FastReplica, and many others.

5 Roadmap
Question: What data dissemination strategies perform best in today's Grid deployments?
- Workload characteristics
- Deployment platform characteristics
- Proposed data dissemination solutions
- Evaluation
- Recommendations

6 Workload and Deployment Platform
Data-intensive scientific collaboration characteristics:
- Scale of data: massive data collections (terabytes).
- Data usage: uniform popularity distributions and co-usage.
- Near-real-time processing.
Deployment platform characteristics:
- Resource availability: low churn rate, high node availability, well-provisioned networks.
- Collaborative environments: no free-riding, so less effort is needed to enforce fair resource sharing.

8 Classification of Approaches
Technique                                              | Protocols
Tree-based techniques                                  | ALM and SPIDER
Swarming                                               | Bullet and BitTorrent
Techniques employing intermediate storage capabilities | Logistical Multicasting
Base cases:
- IP-Multicast.
- Parallel transfers: separate data channels from the source to each destination.

9 Separate Transfers from the Source to Every Destination
Drawbacks:
- Overwhelms the source, so it does not scale.
- Generates heavy duplicate traffic on the links around the source.
- Does not exploit all available transport capacity.
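A back-of-the-envelope model makes the scaling drawback concrete. The sketch below is a hypothetical, idealized model and not the thesis simulator. It assumes the source uplink is the only bottleneck and that all parallel transfers share it equally. The function and parameter names are illustrative.

```python
def separate_transfers(file_mb: float, n_dest: int, source_uplink_mb_s: float):
    """Idealized cost of sending one full copy of a file to each destination.

    Hypothetical model: the source uplink is the only bottleneck and is
    shared equally by all n_dest parallel transfers.
    Returns (completion_time_s, mb_sent_over_source_uplink).
    """
    # Every destination receives its own copy, so the source link carries n_dest copies.
    mb_on_uplink = file_mb * n_dest
    # With equal sharing, all transfers finish together once the uplink has drained.
    completion_time_s = mb_on_uplink / source_uplink_mb_s
    return completion_time_s, mb_on_uplink


# 1000 MB to 10 destinations over a 100 MB/s uplink
print(separate_transfers(1000, 10, 100))  # (100.0, 10000)
```

Doubling the number of destinations doubles both the completion time and the traffic on the source link. This is the "does not scale" drawback listed above.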

10 IP Multicasting

11 IP Multicast
Drawbacks:
- Limited deployment.
- Vulnerable to node failures.
- Does not exploit all available transport capacity.
- Throughput limited by the bottleneck link.
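The bottleneck-link drawback can be illustrated on a small tree. The sketch assumes a single-rate, reliable multicast session in which the sender must pace itself to the slowest link in the tree. The tree, bandwidths, and helper names are hypothetical. For each receiver, the code compares what its own path could sustain with the rate the shared session actually achieves.

```python
from typing import Dict, Iterable, Tuple

Link = Tuple[str, str]  # (parent, child) edge of the multicast tree


def single_rate_throughput(links: Dict[Link, float]) -> float:
    """A single-rate session cannot send faster than its slowest tree link."""
    return min(links.values())


def path_bottlenecks(links: Dict[Link, float], receivers: Iterable[str]) -> Dict[str, float]:
    """Bottleneck bandwidth on each receiver's own path back to the root."""
    parent_of = {child: parent for (parent, child) in links}
    result = {}
    for r in receivers:
        node, best = r, float("inf")
        while node in parent_of:  # walk up until we reach the source
            parent = parent_of[node]
            best = min(best, links[(parent, node)])
            node = parent
        result[r] = best
    return result


links = {("S", "R1"): 100.0, ("R1", "A"): 100.0, ("R1", "B"): 10.0}  # MB/s
print(single_rate_throughput(links))        # 10.0
print(path_bottlenecks(links, ["A", "B"]))  # {'A': 100.0, 'B': 10.0}
```

In this example, receiver A's path could sustain 100 MB/s, but the session is held to 10 MB/s by the link to B. That gap is the unused transport capacity mentioned above.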

12 Tree-Based Techniques: Application-Level Multicast (ALM)
[Figure: source and ALM tree]

13 Tree-Based Techniques: Application-Level Multicast (ALM)
[Figure: source and ALM tree]
Drawbacks:
- Vulnerable to node failures.
- Does not exploit all possible routes in the network.
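In an ALM tree, the failure of an interior node cuts off its entire subtree. The sketch below demonstrates this on a hypothetical fixed k-ary layout. Real ALM protocols choose parents using network measurements, so `build_kary_tree` only stands in for whatever tree such a protocol would build.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_kary_tree(nodes: List[str], k: int) -> Dict[str, List[str]]:
    """Hypothetical ALM tree: nodes[0] is the source and node i hangs under nodes[(i - 1) // k].

    Real ALM protocols choose parents using measured latency or bandwidth;
    this fixed layout only serves to illustrate the failure behaviour.
    """
    children: Dict[str, List[str]] = defaultdict(list)
    for i in range(1, len(nodes)):
        children[nodes[(i - 1) // k]].append(nodes[i])
    return children


def disconnected_by_failure(children: Dict[str, List[str]], failed: str) -> Set[str]:
    """Nodes that lose their data feed when `failed` crashes: its whole subtree."""
    lost: Set[str] = set()
    stack = list(children.get(failed, []))
    while stack:
        node = stack.pop()
        lost.add(node)
        stack.extend(children.get(node, []))
    return lost


tree = build_kary_tree(["S", "a", "b", "c", "d", "e", "f"], k=2)
print(sorted(disconnected_by_failure(tree, "a")))  # ['c', 'd']
```

The failure of a leaf disconnects nobody. The failure of a node near the root disconnects most of the tree, unless the protocol repairs it.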

14 Swarming Techniques: BitTorrent and Bullet
[Figure: file divided into blocks 1-4; complete file]

15 Swarming Techniques: BitTorrent and Bullet
[Figure: file divided into blocks 1-4; complete file]

16 Swarming Techniques: BitTorrent and Bullet
[Figure: complete file]
Drawbacks:
- Generates high duplicate traffic.
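The block-exchange idea in the swarming figures can be sketched as a toy round-based simulation. This is a hypothetical model and not the BitTorrent or Bullet protocol. Each peer uploads at most one block per round, and an incomplete peer fetches its missing block that is rarest across the swarm, loosely inspired by BitTorrent's rarest-first policy. The sketch shows peers serving each other instead of relying only on the source. It does not model the physical topology, so it cannot reproduce the duplicate-traffic drawback.

```python
import random
from typing import List, Set


def simulate_swarm(n_peers: int, n_blocks: int, seed: int = 0) -> int:
    """Toy round-based swarm; a hypothetical model, not BitTorrent's or Bullet's protocol.

    Peer 0 is the source and starts with every block. In each round every peer
    uploads at most one block and every incomplete peer downloads at most one
    missing block, preferring the block that is rarest in the swarm.
    Returns the number of rounds until every peer holds the complete file.
    """
    rng = random.Random(seed)
    have: List[Set[int]] = [set(range(n_blocks))] + [set() for _ in range(n_peers - 1)]
    rounds = 0
    while any(len(h) < n_blocks for h in have):
        rounds += 1
        # Replica count of each block at the start of the round.
        counts = [sum(b in h for h in have) for b in range(n_blocks)]
        busy = set()     # peers that already used this round's upload slot
        deliveries = []  # (receiver, block) pairs applied at the end of the round
        needy = [p for p in range(n_peers) if len(have[p]) < n_blocks]
        rng.shuffle(needy)
        for p in needy:
            offers = [(counts[b], b, u)
                      for u in range(n_peers) if u != p and u not in busy
                      for b in have[u] - have[p]]
            if offers:
                _, block, uploader = min(offers)  # rarest block first
                busy.add(uploader)
                deliveries.append((p, block))
        for p, block in deliveries:
            have[p].add(block)
    return rounds


print(simulate_swarm(n_peers=2, n_blocks=3))  # 3: the source alone serves one block per round
```

With more peers, blocks that have already been received are re-uploaded by other peers in later rounds. Per-round delivery capacity therefore grows beyond the source's single upload slot.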

17 Logistical Multicasting

18 Roadmap
Question: What data dissemination strategies perform best in today's Grid deployments?
- Workload characteristics
- Deployment platform characteristics
- Proposed data dissemination solutions
- Evaluation
- Recommendations
Evaluation approaches:
- Analytical modeling
- Deployment-based
- Simulation

19 Methodology
Simulator design:
- Block-level simulation.
- Simulates physical-layer link contention.
Inputs:
- Real topologies of three deployed Grid testbeds: LCG, GridPP, and EGEE.
- Generated topologies: 100 (using BRITE).
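The sketch below shows what block-level simulation with physical-link contention means in practice. It is a hypothetical toy, not the thesis simulator. Each physical link carries a fixed number of blocks per tick, and flows whose routes share a link compete for that capacity in round-robin order. The flow and link names, capacities, and function name are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def simulate_link_contention(flows: Dict[str, Tuple[List[str], int]],
                             capacity: Dict[str, int]) -> Dict[str, int]:
    """Toy block-level simulator with physical-link contention (hypothetical model).

    flows:    flow id -> (physical link ids on its route, number of blocks to send)
    capacity: link id -> blocks the link can carry per tick (assumed >= 1)
    Returns flow id -> tick at which the flow's last block was delivered.
    """
    remaining = {f: blocks for f, (_, blocks) in flows.items()}
    done = {f: 0 for f, n in remaining.items() if n == 0}
    tick = 0
    while len(done) < len(flows):
        tick += 1
        left = dict(capacity)  # residual per-link capacity for this tick
        progress = True
        while progress:  # round-robin passes: each active flow tries one block per pass
            progress = False
            for f, (route, _) in flows.items():
                if f in done:
                    continue
                if all(left[link] > 0 for link in route):
                    for link in route:
                        left[link] -= 1
                    remaining[f] -= 1
                    progress = True
                    if remaining[f] == 0:
                        done[f] = tick
    return done


capacity = {"a": 10, "b": 10, "core": 2, "c": 2}
flows = {"A": (["a", "core"], 4), "B": (["b", "core"], 4), "C": (["c"], 4)}
print(simulate_link_contention(flows, capacity))  # {'C': 2, 'A': 4, 'B': 4}
```

Flow A alone would finish in 2 ticks. Sharing the "core" link with B doubles its completion time, which is the kind of effect a simulator that ignored link contention would miss.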

20 Methodology
Success criterion  | Metric
Dissemination time | Transfer time
Overhead           | MB x hop
Load balancing     | Volume of in/out data
Fairness           | Link stress
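The metrics in the table can be computed from a log of block transfers. The sketch below rests on stated assumptions, not on definitions from the slides. "MB x hop" is read as each transfer's size multiplied by the number of physical links it crosses. Link stress is read as the number of copies of the same block a physical link carries, which is the usual definition in the application-level multicast literature. The log format and function name are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# One logged block transfer: (block id, size in MB, sender, receiver, physical links crossed)
Transfer = Tuple[int, float, str, str, List[str]]


def dissemination_metrics(transfers: List[Transfer]):
    """Compute overhead (MB x hop), per-node MB in/out, and maximum link stress."""
    overhead_mb_hop = 0.0
    mb_in: Counter = Counter()
    mb_out: Counter = Counter()
    copies: Counter = Counter()  # (link, block) -> copies of that block carried by the link
    for block, size_mb, sender, receiver, links in transfers:
        overhead_mb_hop += size_mb * len(links)
        mb_out[sender] += size_mb
        mb_in[receiver] += size_mb
        for link in links:
            copies[(link, block)] += 1
    max_stress = max(copies.values(), default=0)
    return overhead_mb_hop, dict(mb_in), dict(mb_out), max_stress


# Separate transfers of one 1 MB block from S to A and to B; both cross link l1.
log = [(0, 1.0, "S", "A", ["l1", "l2"]), (0, 1.0, "S", "B", ["l1", "l3"])]
print(dissemination_metrics(log))  # (4.0, {'A': 1.0, 'B': 1.0}, {'S': 2.0}, 2)
```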

21 Transfer Time
[Figure: number of destinations that have completed the file transfer, original EGEE topology]
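The transfer-time plots show a cumulative count: how many destinations have the complete file at each point in time. The sketch below derives such a curve from per-destination completion times. The function name and the sample times are made up for illustration and are not results from the study.

```python
import bisect
from typing import List, Sequence


def completed_over_time(completion_times: Sequence[float],
                        sample_times: Sequence[float]) -> List[int]:
    """For each sample time t, count destinations that finished at or before t."""
    finished = sorted(completion_times)
    return [bisect.bisect_right(finished, t) for t in sample_times]


print(completed_over_time([30, 10, 20, 20], [0, 10, 20, 25, 30]))  # [0, 1, 3, 3, 4]
```

A curve that rises early indicates good intermediate progress, even if the last destination finishes at the same time as under another technique.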

22 Transfer Time with Reduced Core-Link Bandwidth
[Figure: number of destinations that have completed the file transfer, EGEE topology with core-link bandwidth reduced to 1/8 of the original]
Conclusions:
- On well-provisioned topologies, even naïve algorithms perform well.
- On constrained topologies, application-level techniques perform uniformly well: they are among the first to finish the transfer and show good intermediate progress.

23 Summary
Motivating question: What data dissemination strategies perform best in today's Grid deployments?
In this project, we:
- Simulated representative solutions, taking into account the characteristics of the workload and of the deployed platforms.
Our results provide guidelines for selecting a data dissemination technique depending on:
- The target environment.
- The overall system workload characteristics.
- The success criteria.

24 Research Publications
This work resulted in two refereed publications and one journal submission:
- Beyond Music Sharing: An Evaluation of Peer-to-Peer Data Dissemination Techniques in Large Scientific Collaborations, S. Al-Kiswany, M. Ripeanu, A. Iamnitchi, and S. Vazhkudai. Submitted to the Journal of Grid Computing.
- Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?, S. Al-Kiswany, M. Ripeanu, A. Iamnitchi, and S. Vazhkudai. EuroPar 2007, France (acceptance rate: 26%).
- A Simulation Study of Data Distribution Strategies for Large-scale Scientific Data Collaborations, S. Al-Kiswany and M. Ripeanu. IEEE CCECE 2007.

25 Other Research Work
I am involved in two other research projects:
- Scavenged storage system:
  - stdchk: A Checkpoint Storage System for Desktop Grid Computing
  - A High-Performance GridFTP Server at Desktop Cost
- StoreGPU: exploiting the GPU for computationally intensive storage system operations.

26 Thank you