CS 34701: Large-Scale Networked Systems
Professor: Ian Foster
TA: Adriana Iamnitchi

Course Goals
• Primary
  – Gain a deep understanding of the fundamental issues that affect the design of large-scale networked systems
  – Map the primary contemporary research themes
  – Gain experience in network research
• Secondary
  – By studying a set of outstanding papers, build knowledge of how to present research
  – Learn how to read papers and evaluate ideas

How the Class Works
• Research papers
  – Before each class, we all read and evaluate two research papers
  – During each class, we discuss those papers
• Project
  – One-page project description by the 2nd week
  – Five-page project summary by the 5th week
  – 10-20 page final paper by the 9th week
  – Project presentations: 9th and 10th weeks

Paper Review & Discussion
• Everyone reads two papers per class and submits an evaluation (see below)
• We discuss (not present) papers in class
  – A team of 2-3 people leads each discussion
  – The leading team submits a discussion plan before class, submits a "master critique," and summarizes the discussion at the beginning of the following class
• Look over the schedule between now and Friday, when we will allocate discussants

Evaluations
• You must submit your paper evaluations by 6 pm on the day before class
• Answer a set of standard questions:
  1. State the main contribution of the paper.
  2. Critique the main contribution.
  3. What are the three strongest and/or most interesting ideas in the paper?
  4. What are the three most striking weaknesses in the paper?
  5. What three questions would you ask the authors?
  6. Detail an interesting extension to the work that is not mentioned in the future-work section.
  7. (Optional) Comments on the paper that you'd like to see discussed in class.

What I'll Assume You Know
• Basic Internet architecture
  – IP, TCP, DNS, HTTP
• Basic principles of distributed computing
  – Asynchrony (you cannot distinguish between communication failures and latency)
  – Partial knowledge of global state (you cannot know everything correctly)
  – Failures happen: in very large systems, even rare failures happen often
• If there are things that don't make sense, ask!

Large-Scale Networked Systems
• Internet-connected networks with a large number of components, spanning multiple DNS domains (usually over a WAN)
• Designed to solve specific problems:
  – Content distribution
  – Cycle sharing
  – File sharing
  – Sensor data fusion
  – Distributed data analysis
  – …

Example: Gnutella
• Peer-to-peer file sharing system
  – File sharing: the goal is to enable publication of, and access to, files
  – P2P: no central servers; all clients also act as servers and are (more or less) equivalent
• Issues
  – Scaling to very large numbers of nodes
  – Properties: bootstrapping, reliability, cost, anonymity, security, freeloading, …

Gnutella Protocol Overview
• A P2P file-sharing application on top of an overlay network:
  – Nodes maintain open TCP connections.
  – Messages are either broadcast (flooded) or back-propagated.
• Protocol messages:

  Function        Broadcast (flooding)   Back-propagated   Node to node
  Membership      PING                   PONG
  Query           QUERY                  QUERY HIT
  File download                                            GET, PUSH
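
To make the message table concrete, here is a minimal sketch of the fixed message header that PING, PONG, QUERY, QUERY HIT and PUSH share. The layout (16-byte descriptor ID, 1-byte payload type, TTL, hops, 4-byte little-endian payload length) and the type codes follow our reading of the Gnutella v0.4 protocol specification and should be checked against it; the class and function names (`Header`, `new_query_header`, `forwarded`) are hypothetical. The TTL/hops update in `forwarded` is what bounds a flood.

```python
import struct
import uuid
from dataclasses import dataclass

# Payload type codes, as we read the Gnutella v0.4 specification.
PING, PONG, PUSH, QUERY, QUERY_HIT = 0x00, 0x01, 0x40, 0x80, 0x81

# 16-byte descriptor ID, 1-byte payload type, 1-byte TTL, 1-byte hops,
# 4-byte little-endian payload length: 23 bytes, no padding.
HEADER = struct.Struct("<16sBBBI")


@dataclass
class Header:
    descriptor_id: bytes
    payload_type: int
    ttl: int
    hops: int
    payload_length: int

    def pack(self) -> bytes:
        return HEADER.pack(self.descriptor_id, self.payload_type,
                           self.ttl, self.hops, self.payload_length)

    @classmethod
    def unpack(cls, data: bytes) -> "Header":
        return cls(*HEADER.unpack(data[:HEADER.size]))

    def forwarded(self) -> "Header":
        """Copy to send onward: TTL decremented, hops incremented."""
        return Header(self.descriptor_id, self.payload_type,
                      self.ttl - 1, self.hops + 1, self.payload_length)


def new_query_header(payload_length: int, ttl: int = 7) -> Header:
    # A random 16-byte ID lets peers recognise and drop duplicate copies.
    return Header(uuid.uuid4().bytes, QUERY, ttl, 0, payload_length)
```

Because each hop moves one unit from TTL to hops, TTL + hops stays equal to the initial TTL (7 by default here, matching the predominant value reported later in these slides).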

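Gnutella search mechanism
[Figure: overlay diagram in which node 2 searches for file A and receives replies labeled A:5 and A:7]
Steps:
1. Node 2 initiates a search for file A.
2. It sends the query message to all of its neighbors.
3. Neighbors forward the message.
4. Nodes that have file A initiate a reply message.
5. The query reply message is back-propagated along the path the query took.
6. Node 2 gets the replies.
7. Node 2 downloads the file.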
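
The seven steps above can be simulated in a few lines. This is a sketch under simplifying assumptions: the overlay is a plain adjacency dict, all links have equal latency (so breadth-first order stands in for "first copy to arrive"), and each node remembers only the neighbor it first heard the query from, which is the reverse path a QUERY HIT follows. The function name `flood_search` and the example topology are hypothetical, not taken from the slides' figure.

```python
from collections import deque


def flood_search(graph, holders, source, ttl=7):
    """Simulate a TTL-limited, Gnutella-style query flood.

    graph:   dict mapping node -> list of neighbor nodes (the overlay)
    holders: set of nodes that store the requested file
    Returns a dict mapping each responding node to the path its reply
    takes back to the source (the reverse of the query's path).
    """
    parent = {source: None}            # neighbor we first heard the query from
    frontier = deque([(source, ttl)])  # (node, hops the query may still travel)
    hits = {}
    while frontier:
        node, remaining = frontier.popleft()
        if node in holders and node != source:
            # Back-propagate: follow parent pointers to the source.
            path, hop = [], node
            while hop is not None:
                path.append(hop)
                hop = parent[hop]
            hits[node] = path
        if remaining == 0:
            continue                   # TTL exhausted: do not forward
        for nb in graph[node]:
            if nb not in parent:       # duplicate copies of the query are dropped
                parent[nb] = node
                frontier.append((nb, remaining - 1))
    return hits


# Hypothetical 7-node overlay in which nodes 5 and 7 hold file A.
overlay = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 5], 4: [2, 6],
           5: [3, 7], 6: [4], 7: [5]}
print(flood_search(overlay, {5, 7}, source=2))
```

With the default TTL both holders answer (node 5 via 5→3→2, node 7 via 7→5→3→2); lowering the TTL to 2 cuts node 7 out, which is the coverage-versus-traffic trade-off that flooding imposes.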

Tools for network exploration
• Eavesdropper: a modified node inserted into the network to log traffic.
• Crawler: connects to all active nodes and uses the membership protocol to discover the graph topology.
  – Parallel crawling.
• Graph analysis tools:
  – High-volume offline computations.
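
A crawler of the kind described above is essentially a breadth-first traversal in which "visiting" a node means asking it for its neighbors. The sketch below assumes a hypothetical callback `fetch_neighbors(node)` that, in a real crawler, would open a connection, send a PING and collect the PONGs, returning `None` when the node cannot be contacted. Each BFS level is probed in parallel with a thread pool, one simple way to realize "parallel crawling".

```python
from concurrent.futures import ThreadPoolExecutor


def crawl(seeds, fetch_neighbors, workers=8):
    """Discover an overlay's topology starting from a few seed nodes.

    fetch_neighbors(node) is a hypothetical callback returning an iterable
    of neighbor addresses, or None if the node is unreachable.
    Returns (edges, unreachable); each edge is a frozenset of two nodes.
    """
    seen = set(seeds)
    frontier = list(seeds)
    edges, unreachable = set(), set()

    def probe(node):
        nbrs = fetch_neighbors(node)
        return node, (None if nbrs is None else list(nbrs))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            next_frontier = []
            # Probe the whole current level concurrently.
            for node, nbrs in pool.map(probe, frontier):
                if nbrs is None:
                    unreachable.add(node)
                    continue
                for nb in nbrs:
                    edges.add(frozenset((node, nb)))
                    if nb not in seen:
                        seen.add(nb)
                        next_frontier.append(nb)
            frontier = next_frontier
    return edges, unreachable
```

Recording edges as unordered pairs means a link reported by both endpoints is counted once; nodes that never answer still appear as edge endpoints if a neighbor reported them, which is worth keeping in mind when estimating network size.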

Network growth  High user interest:  Users tolerate high latency, low quality results.  Better resources:  DSL and cable modem nodes grew from 24% to 41% over 6 months.  Open architecture / open-source environment:  Competing implementations,  Lower overhead network traffic, improved resource utilization, better structure,  Recently, two-level structure.

Growth invariants
1. Graph connectivity: 3.4 links per node on average.
2. Path-length distribution: node-to-node distances keep a similar distribution.
   – The average node-to-node distance grew 25% while the network grew 50-fold over 6 months.
   – Random graph theory would predict an increase of about 75%.
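
The random-graph comparison can be made explicit with the standard approximation that the mean distance in a random graph with N nodes and mean degree ⟨k⟩ is about ln N / ln ⟨k⟩. The sketch below is an illustration under that assumption, not the analysis used for the slide: with the degree held fixed, the predicted relative growth works out to ln(50)/ln(N0), so it depends on the starting network size N0, which the slide does not state. The function names are hypothetical.

```python
import math


def random_graph_path_length(n, mean_degree):
    """Approximate mean node-to-node distance in a random graph: ln N / ln <k>."""
    return math.log(n) / math.log(mean_degree)


def predicted_growth(n_before, growth_factor, mean_degree=3.4):
    """Relative increase in mean distance when N grows by growth_factor
    while the mean degree stays fixed (3.4 links per node, as on the slide)."""
    before = random_graph_path_length(n_before, mean_degree)
    after = random_graph_path_length(n_before * growth_factor, mean_degree)
    return after / before - 1


for n0 in (1_000, 10_000):
    print(n0, round(predicted_growth(n0, 50), 3))
```

Note that the mean degree cancels out of `predicted_growth`; what matters is how 50-fold growth compares with the initial size, and the observed 25% increase is what the slide contrasts with the random-graph expectation.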

Is Gnutella a power-law network?
[Figure: November 2000 snapshot]
Power-law networks: the number of nodes N with exactly L links is proportional to $L^{-k}$, i.e. $N \sim L^{-k}$.
Examples:
• The Internet
• In/out links to/from HTML pages
• Citation networks
• US power grid
• Social networks
Implication: high tolerance to random node failures, but low reliability when facing an "intelligent" adversary.
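
One quick way to check the relation $N \sim L^{-k}$ on crawled data is to build the degree histogram and fit a straight line to log N versus log L; the negated slope estimates k. This is a minimal sketch: least squares on a log-log plot is a crude estimator (maximum-likelihood fits are generally preferred), it needs at least two distinct degrees, and the function names are hypothetical.

```python
import math
from collections import Counter


def degree_histogram(edges):
    """Map each degree L to the number of nodes N with exactly L links."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


def fit_power_law_exponent(hist):
    """Least-squares slope of log N against log L; returns k in N ~ L^-k."""
    pairs = sorted(hist.items())
    xs = [math.log(L) for L, _ in pairs]
    ys = [math.log(n) for _, n in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope
```

A bimodal distribution, such as the one reported for the later snapshot, shows up as a clear departure from a single straight line on the log-log plot, so the fitted k alone should not be trusted without looking at the residuals.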

Is Gnutella a power-law network?  Later, larger networks display a bimodal distribution.  Implications:  High tolerance to random node failures preserved  Increased reliability when facing an attack. May 2001

Traffic analysis   6-8 kbps per link over any connection.  Traffic structure changed over time.

Total generated traffic: 1 Gbps (or 330 TB/month)!
  – Note that this estimate excludes actual file transfers
  – Q: Does it matter?
  – Compare to the 15,000 TB/month estimated for the US Internet backbone (Dec. 2000)
Reasoning:
• QUERY and PING messages are flooded; they form more than 90% of generated traffic
• Predominant TTL = 7
• More than 95% of nodes are less than 7 hops away
• Measured traffic at each link: about 6-8 kbps
• Network with 50k nodes and 170k links
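
The reasoning above reduces to a back-of-envelope product: since floods with TTL = 7 reach almost every node, nearly every link carries the flooded traffic, so total traffic is roughly links × per-link rate. The sketch below assumes decimal units, a 30-day month, and that the measured 6-8 kbps is counted once per link; the function names are hypothetical.

```python
def total_traffic_gbps(links, kbps_per_link):
    """Aggregate overlay traffic: every link carries roughly the same load."""
    return links * kbps_per_link / 1e6          # kbps -> Gbps


def monthly_terabytes(gbps, days=30):
    """Convert a sustained rate in Gbps to terabytes transferred per month."""
    bytes_per_second = gbps * 1e9 / 8
    return bytes_per_second * days * 24 * 3600 / 1e12


print(total_traffic_gbps(170_000, 6), total_traffic_gbps(170_000, 8))
print(monthly_terabytes(1.0))
```

With 170k links this gives about 1.0-1.4 Gbps, and 1 Gbps sustained for 30 days is about 324 TB, consistent with the slide's rounded 330 TB/month, or roughly 2% of the quoted 15,000 TB/month backbone estimate.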

Topology mismatch
The overlay network topology doesn't match the underlying Internet infrastructure topology!
• 40% of all nodes are in the 10 largest Autonomous Systems (ASes).
• Only 2-4% of all TCP connections link nodes within the same AS.
• Largely "random wiring."
• An entropy experiment gives similar results.
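
One way to read "largely random wiring" is to compare the observed fraction of intra-AS links with what uniformly random wiring would produce. If each link joined two nodes drawn at random (with replacement, a large-network approximation), the expected intra-AS fraction is the sum over ASes of the squared share of nodes in that AS. This is an illustrative sketch, not the slides' method; the function names and the node-to-AS mapping `as_of` are hypothetical.

```python
from collections import Counter


def intra_as_fraction(edges, as_of):
    """Fraction of overlay links whose two endpoints sit in the same AS."""
    same = sum(1 for u, v in edges if as_of[u] == as_of[v])
    return same / len(edges)


def random_wiring_expectation(as_of):
    """Expected intra-AS link fraction if endpoints were chosen uniformly at
    random: the sum over ASes of (share of nodes in that AS) squared."""
    counts = Counter(as_of.values())
    total = sum(counts.values())
    return sum((c / total) ** 2 for c in counts.values())
```

For instance, if the 40% of nodes in the 10 largest ASes were split evenly among them, those ASes alone would contribute 10 × 0.04² = 1.6% intra-AS links under random wiring, the same order as the observed 2-4%; a topology-aware overlay would be expected to score far higher.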

Course Topics
• Internet Architecture and Design Principles
• Flat Pricing vs. Prioritized Traffic
• Internet Measurements
• Availability in the Wide Area
• Patterns in Real Networks
• Modeling the Internet Topology
• Internet Services: DNS
• Web Caching, Content Distribution Networks
• Overlay Networks
• Peer-to-Peer Systems
• Computational Grids
• Security Issues
• Sensor Nets
• Wireless Networks
• XML, SOAP, and Web Services

Course Topics
• Internet Design Principles
  – How do I deliver Internet services: end-to-end or within the network?
• Flat Pricing vs. Prioritized Traffic
  – How do I determine which traffic to pass over the Internet?
• Internet Measurements
  – What does the Internet really look like?

Course Topics
• Availability in the Wide Area
  – How reliable is the Internet?
• Patterns in Real Networks
  – What does Internet traffic look like?
• Modeling the Internet Topology
  – How can I construct realistic models of Internet structure?

Course Topics
• Internet Services: DNS
  – How well does DNS work?
• Web Caching, Content Distribution Networks
  – How do we optimize Web content management?
• Overlay Networks
  – Improving routing performance

Course Topics
• Peer-to-Peer Systems
  – Gnutella, etc.
• Computational Grids
  – Globus, etc.
• Security Issues
  – Authorization, etc.

Course Topics
• Sensor Nets
  – How do I structure and program networks of lightweight devices?
• Wireless Networks
  – How do I route in ad hoc networks?
• XML, SOAP, and Web Services
  – What are Web services, anyway?
[Figures: Disaster Response; Circulatory Net]

Projects
• Literature surveys, real implementations, analytical evaluations
• Can be performed individually or in a team of two
• Your project ideas are appreciated (to be discussed before the proposal due date)
• The primary goal is to do something interesting and to do it well

Example Project
• Gnutella network analysis
  – Develop a "crawler" that traverses the network and collects membership and connectivity information
  – Analyze the structure
  – Characterize the structure
• See, e.g.:
  – M. Ripeanu, I. Foster, A. Iamnitchi, "Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design," IEEE Internet Computing Journal, vol. 6(1), 2002

Project Ideas
• /cs347_projects.htm
• Gnutella network measurements
  – Topology discovery for 500K nodes
  – Structural analysis with 500K nodes
  – Study the impact of overlay networks
  – Etc.

Project Ideas
• Overlay networks: build unstructured or semi-structured self-organizing overlays that optimize different cost functions:
  – Topology-aware: map onto the physical infrastructure
  – Usage-aware: map onto usage patterns
• Analysis of Sloan Digital Sky Survey logs to explore access patterns
  – Which files are accessed, and how often?
  – What community usage patterns emerge?
  – How can we exploit these in content distribution networks?

Project Ideas
• Compare current file-location solutions (CAN, Chord, Gnutella, Napster, etc.) qualitatively and analytically in the context of scientific file-sharing collaborations.
  – Evaluate sharing patterns based on real usage traces from a scientific collaboration
  – Use these patterns to evaluate benefits and drawbacks, and propose better alternatives
• Expand an existing simulator to evaluate request-forwarding techniques for resource location in grid environments

For More Information
• Contact me (Ian Foster), or set up a meeting
• Contact Anda, our TA (Adriana Iamnitchi)
• Monitor the class web page

Next 2 Classes
• Friday:
  – Discuss:
    > J. Saltzer, D. Reed, and D. Clark, "End-to-end Arguments in System Design," ACM Transactions on Computer Systems, Vol. 2, No. 4, 1984.
    > D. Clark and M. Blumenthal, "Rethinking the design of the Internet: The end to end arguments vs. the brave new world," Workshop on Policy Implications of End-to-End, December 1, 2001.
  – Leading group: Ian + 2 volunteers (who?)
• Wednesday:
  – Leading group: Anda volunteers (who?)