P2P technologies, PlanetLab, and their relevance to Grid work Matei Ripeanu The University of Chicago.

Slides:



Advertisements
Similar presentations
PlanetLab Architecture Larry Peterson Princeton University.
Advertisements

Clayton Sullivan PEER-TO-PEER NETWORKS. INTRODUCTION What is a Peer-To-Peer Network A Peer Application Overlay Network Network Architecture and System.
High Performance Computing Course Notes Grid Computing.
Cloud Computing to Satisfy Peak Capacity Needs Case Study.
1 On Death, Taxes, & the Convergence of Peer-to-Peer & Grid Computing Adriana Iamnitchi Duke University “Our Constitution is in actual operation; everything.
An Overview of Peer-to-Peer Networking CPSC 441 (with thanks to Sami Rollins, UCSB)
P2P technologies, PlanetLab, and their relevance to Grid work Matei Ripeanu The University of Chicago.
Peer-to-Peer Networking By: Peter Diggs Ken Arrant.
P2P Network is good or bad? Sang-Hyun Park. P2P Network is good or bad? - Definition of P2P - History of P2P - Economic Impact - Benefits of P2P - Legal.
Matei Ripeanu EECE 411: Design of Distributed Software Applications (or Distributed Systems 101) Matei Ripeanu
A. Frank 1 Internet Resources Discovery (IRD) Peer-to-Peer (P2P) Technology (1) Thanks to Carmit Valit and Olga Gamayunov.
Milos Kobliha Alejandro Cimadevilla Luis de Alba Parallel Computing Seminar GROUP 12.
CSPP 54001: Large-Scale Networked Systems Week 5: P2P Technologies and Applications Matei Ripeanu.
Middleware for P2P architecture Jikai Yin, Shuai Zhang, Ziwen Zhang.
New Challenges in Cloud Datacenter Monitoring and Management
Cloud computing Tahani aljehani.
Introduction to Peer-to-Peer Networks. What is a P2P network Uses the vast resource of the machines at the edge of the Internet to build a network that.
Introduction Widespread unstructured P2P network
Gordon Kass CEO & President 919/ x26 Porivo Technologies Inc. Measuring end-to-end web performance.
Cloud computing is the use of computing resources (hardware and software) that are delivered as a service over the Internet. Cloud is the metaphor for.
1 Reading Report 4 Yin Chen 26 Feb 2004 Reference: Peer-to-Peer Architecture Case Study: Gnutella Network, Matei Ruoeanu, In Int. Conf. on Peer-to-Peer.
©Ian Sommerville 2006Software Engineering, 8th edition. Chapter 12 Slide 1 Distributed Systems Architectures.
1 Telematica di Base Applicazioni P2P. 2 The Peer-to-Peer System Architecture  peer-to-peer is a network architecture where computer resources and services.
14 Publishing a Web Site Section 14.1 Identify the technical needs of a Web server Evaluate Web hosts Compare and contrast internal and external Web hosting.
Distributed Systems Concepts and Design Chapter 10: Peer-to-Peer Systems Bruce Hammer, Steve Wallis, Raymond Ho.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved DISTRIBUTED.
Peer-to-Peer Networking. Presentation Introduction Characteristics and Challenges of Peer-to-Peer Peer-to-Peer Applications Classification of Peer-to-Peer.
Version 4.0. Objectives Describe how networks impact our daily lives. Describe the role of data networking in the human network. Identify the key components.
DISTRIBUTED COMPUTING
McGraw-Hill/Irwin © The McGraw-Hill Companies, All Rights Reserved BUSINESS PLUG-IN B17 Organizational Architecture Trends.
The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Dataset Caitlin Minteer & Kelly Clynes.
CIS : Federated Distributed Systems Adriana Iamnitchi (Anda)
Peer-to-Pee Computing HP Technical Report Chin-Yi Tsai.
1 Introduction to Middleware. 2 Outline What is middleware? Purpose and origin Why use it? What Middleware does? Technical details Middleware services.
Objectives Functionalities and services Architecture and software technologies Potential Applications –Link to research problems.
Computing Infrastructure for Large Ecommerce Systems -- based on material written by Jacob Lindeman.
The Anatomy of the Grid Mahdi Hamzeh Fall 2005 Class Presentation for the Parallel Processing Course. All figures and data are copyrights of their respective.
1 4/23/2007 Introduction to Grid computing Sunil Avutu Graduate Student Dept.of Computer Science.
Copyright © 2002 Intel Corporation. Intel Labs Towards Balanced Computing Weaving Peer-to-Peer Technologies into the Fabric of Computing over the Net Presented.
DISTRIBUTED COMPUTING Introduction Dr. Yingwu Zhu.
Peer-to-Peer Network Tzu-Wei Kuo. Outline What is Peer-to-Peer(P2P)? P2P Architecture Applications Advantages and Weaknesses Security Controversy.
Chapter 5 McGraw-Hill/Irwin Copyright © 2011 by The McGraw-Hill Companies, Inc. All rights reserved.
Ames Research CenterDivision 1 Information Power Grid (IPG) Overview Anthony Lisotta Computer Sciences Corporation NASA Ames May 2,
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
1 Testbeds Breakout Tom Anderson Jeff Chase Doug Comer Brett Fleisch Frans Kaashoek Jay Lepreau Hank Levy Larry Peterson Mothy Roscoe Mehul Shah Ion Stoica.
GRID Overview Internet2 Member Meeting Spring 2003 Sandra Redman Information Technology and Systems Center and Information Technology Research Center National.
Enabling the Future Service-Oriented Internet (EFSOI 2008) Supporting end-to-end resource virtualization for Web 2.0 applications using Service Oriented.
Ruth Pordes November 2004TeraGrid GIG Site Review1 TeraGrid and Open Science Grid Ruth Pordes, Fermilab representing the Open Science.
Scalable Grid system– VDHA_Grid: an e-Science Grid with virtual and dynamic hierarchical architecture Huang Lican College of Computer.
Introduction to Grids By: Fetahi Z. Wuhib [CSD2004-Team19]
Internet2 AdvCollab Apps 1 Access Grid Vision To create virtual spaces where distributed people can work together. Challenges:
7. Grid Computing Systems and Resource Management
Globus and PlanetLab Resource Management Solutions Compared M. Ripeanu, M. Bowman, J. Chase, I. Foster, M. Milenkovic Presented by Dionysis Logothetis.
GraDS MacroGrid Carl Kesselman USC/Information Sciences Institute.
GRID ANATOMY Advanced Computing Concepts – Dr. Emmanuel Pilli.
Bruce Hammer, Steve Wallis, Raymond Ho
3/12/2013Computer Engg, IIT(BHU)1 CLOUD COMPUTING-1.
Content Delivery Networks: Status and Trends Speaker: Shao-Fen Chou Advisor: Dr. Ho-Ting Wu 5/8/
P2P Search COP P2P Search Techniques Centralized P2P systems  e.g. Napster, Decentralized & unstructured P2P systems  e.g. Gnutella.
An Architectural Approach to Managing Data in Transit Micah Beck Director & Associate Professor Logistical Computing and Internetworking Lab Computer Science.
09/13/04 CDA 6506 Network Architecture and Client/Server Computing Peer-to-Peer Computing and Content Distribution Networks by Zornitza Genova Prodanoff.
Chapter 1 Characterization of Distributed Systems
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING CLOUD COMPUTING
Grid Computing.
CHAPTER 3 Architectures for Distributed Systems
Peer to Peer Networking and Application
Introduction to Cloud Computing
GGF15 – Grids and Network Virtualization
The Globus Toolkit™: Information Services
Microsoft Virtual Academy
Presentation transcript:

P2P technologies, PlanetLab, and their relevance to Grid work Matei Ripeanu The University of Chicago

Why don’t we build a huge supercomputer? Top500 supercomputer list over time: Zipf distribution: Perf(rank) ≈ rank -k

Impact Trend: it is increasingly interesting to aggregate the capabilities of the machines in the tail of this distribution. A virtual machine that aggregates the last 10 in Top500 would rank 32 nd in ’95 but 14 th in ‘03 Both Grid and P2P computing are results of this trend: Grids: focus on assembling (a relatively small number of) resources to enable controlled, secure resource sharing P2P focus: scale, deployability. Challenge: design services that offer the best of both worlds complex, secure services, that deliver controlled QoS; are scalable and can be easily deployed.

Roadmap P2P Definitions Impact Applications Mechanisms PlanetLab Why is this interesting for you? Compare resource management in PlanetLab and Globus Wrap-Up

P2P Definition(s) Def 1: “A class of applications that take advantage of resources (e.g., storage, cycles, content) available at the edge of the Internet.” Edges often turned off, without permanent IP addresses, etc. Def 2: “A class of decentralized, self-organizing distributed systems, in which all or most communication is symmetric.” Lots of other definitions that fit in between

P2P Impact (1) Widespread adoption leading to KaZaA – 170 millions downloads (3.5M/week) one of the most popular applications ever! (almost) zero-cost data distribution … is forcing companies to change their business models … might impact copyright laws FastTrack4,114,120 iMesh1,375,199 eDonkey569,097 DirectConnect136,552 Blubster97,128 Gnutella92,678 Cvernet91,750 Source: 2/4/2003

P2P Impact (2) Killer application for broadband to consumers P2P generated traffic may be the single largest contributor to Internet traffic today Internet2 Internet2 traffic statistics Source:

P2P Impact (3) A huge pool of underutilized resources lays around, users are willing to donate these resources $1.5M / year in additional power which can be put to work efficiently (at least for some types of applications) TotalLast 24 hours Users4,236,0902,365 Results received764M1.13M Total CPU Time1.3M years1.3K years Floating point operations51.4 TFLOPS Source: website, Oct. 2003

Roadmap P2P Definitions Impact Applications Mechanisms PlanetLab Why is this interesting for you? Compare resource management in PlanetLab and Globus Wrap-Up

Applications: Number crunching Examples: Entropia, UnitedDevices, DistributedScience, many others Approach suitable for a particular class of problems. Massive parallelism Low bandwidth/computation ratio Error tolerance Problems: Centralized. How to extend the model to problems that are not massively parallel? Relevant to Grid space: Valuable resources at the network edge Shows ability to operate in an environment with limited trust in participating resources

Applications: File sharing The ‘killer’ application to date Too many to list them all: Napster, FastTrack (KaZaA, iMesh), Gnutella (LimeWire, Morpheus, BearShare), Relevant to Grid space: Building a reliable, high-performance (?) data-delivery service using a large set of unreliable components.

Applications: Content Streaming Streaming: the user ‘plays’ the data as as it arrives source Oh, I am exhausted! Client/server approach P2P approach Possible solution: The first few users get the stream from the server New users get the stream from the server or from users who are already receiving the stream Relevant to Grid space: offload part of the server load to consumers to improve scalability

Applications: Performance benchmarking Problem: Evaluate the performance of your Web site form end-user perspective Multiple views on your site performance Generate Internet statistics Connectivity statistics Routing errors, routing maps Relevance to Grid space: Grid clients are heterogeneous, geographically dispersed How do I benchmark my services for this set of consumers?

End-to-end Performance Benchmarking Back-end Infrastructure Firewall Network Landscape Backbone ISP Consumer User 3 rd party content Web server App server Backbone Regional Network Enterprise Provider ISP Major Provider Local ISP T1 Corporate User Corporate Network Component Testing Datacenter Monitoring End-to-end Web Performance Testing Database Slide source: Last-mile “Blind Spot”

Many other P2P applications … Backup storage (HiveNet, OceanStore) Collaborative environments (Groove Networks) Instant messaging (Yahoo, AOL) Web serving communities (uServ) Spam filtering Anonymous Censorship-resistant publishing systems (Ethernity, Freenet)

Mechanisms To obtain a resilient system: integrate multiple components with uncorrelated failure curves, and use data and service replication. To improve delivered QoS: move service delivery closer to the user, and integrate multiple providers with uncorrelated demand curves (reduces over-provisioning for peak loads) To generate meaningful statistics, to detect anomalies: Provide views from multiple vantage points To provide anonymity: use large number of nodes, “hide in the crowd”, and make search costly (or impossible)

Roadmap P2P Definitions Impact Applications Mechanisms PlanetLab Why is this interesting for you? Compare resource management in PlanetLab and Globus Wrap-Up

PlanetLab Testbed to experiment with your P2P/Grid applications. >400 nodes, >150 sites, Funding: Intel, HP, Princeton University, View presented to users: a distributed set of VMs Allocation unit: a slice = a set of virtual machines (VM), one VM at each node. 452 nodes 162 sites 450 research projects OS Slice K OS Slice K OS Slice K OS Slice K

PlanetLab usage examples Stress-test Globus’ Replica Location Service GSLab: a playground for experimenting with grid-services – users have N machines already configured with Globus one mouse-click away ‘Better-than-Internet’ services: multipath TCP (mTCP) Resilient Overlays Multicast Overlays

Why should you find PlanetLab interesting? 1.Open, large-scale testbed to test your P2P applications or your Grid services 2.Solves a similar problem to Grids/Globus: building virtual organizations (or resource federations) Grids: testbeds (deployments of hardware and software) to solve computational problems. PlanetLab: testbed to play with CS applications Main problem for both: enable resource sharing across multiple administrative domains

But … Roadmap :  Attempt to trace differences back to starting assumptions on: user communities, applications, resources.  Look at mechanisms to build VO.

Assumptions: User communities PlanetLab: users are CS scientists that experiment with and deploy infrastructure services. Globus: users come from a more diverse pool of scientists that are interested to run efficiently their (end- user) applications. Implication: functionality offered OS User applications Globus PL Services PlanetLab.

Assumptions: Application characteristics Different view on geographical resource distribution: PlanetLab services: « distribution is a goal » leverage multiple vantage points for network measurements, exploit uncorrelated failure rates in large sets of components, Grid applications: « distribution is a fact of life » generally resource distribution is a fact of life: a result of how the VO was assembled (due to administrative constraints). Implication: mechanism design for resource allocation

Assumptions: Resources PlanetLab mission as testbed for a new class of networked services allows for little HW/SW heterogeneity. Globus is supported for large set architectures; for sites with multiple security requirements Implications: complexity, development speed

Assumptions: Resource ownership Goal: individual sites retain control over their resources PlanetLab limits the autonomy of individual sites in a number of ways: VO admins: Root access, Remote power button Sites: Limited choice of OS, security infrastructure Globus imposes fewer limits on site autonomy Requires fewer privileges (also can run in user space, ) PlanetLab emphasizes global coordination over local autonomy to a greater degree than Globus Implications: ease to manage and evolve the testbed PlanetLab Globus Individual site autonomy Control at VO level

Building Virtual Organizations Individual node/site functionality Mechanisms at the aggregate level Security infrastructure Delegation mechanisms Resource allocation and scheduling Resource discovery, monitoring, and selection.

Delegation mechanisms: Identity Delegation Globus Implementation based on delegated X.509 proxies PlanetLab None Broker/scheduler usage scenario: User A submits a job to S1 which, in turn, submits to S2. S2 needs to do accounting or make authorization decisions based on the identity that originated the job (A).

Delegation mechanisms: Delegating the Right to Resources PlanetLab Individual nodes managers hand out capabilities: akin to time- limited reservations Capabilities can be traded Extra layer to: provide secure transfer, prevent double spending, offer external representation Globus/GGF WS-Agreements protocols: To represent ‘contracts’ between providers and consumers. Local enforcement mechanism is not specified The two efforts are complementary! Broker/scheduler usage scenario: User A acquires capabilities from various brokers then submits the job.

Global resource allocation and scheduling PlanetLab Globus Users Application Managers Brokers / Agents Node Managers Nodes Resource usage delegation Sends capabilities (leases) Identity delegation Sends job descriptions

Wrap-up Scale & volatility Functionality & infrastructure Grids P2P Convergent Environment: Large, Dynamic, Self-Configuring Grids  Large scale: o Weak trust assumptions o Ease of integration  Intermittent resource participation  Little technical support  Diversity in shared resources  Build infrastructures to support diverse applications

What can Grids learn from P2P? Scalability Light-weight implementations Aggregate desktops and other small resources Intermittent operation, highly dynamic connectivity Autonomy

More information Links to papers, tech-reports, slides: Thank you.