CIS6930.5: Federated Distributed Systems
Adriana Iamnitchi (Anda)
Contact Info
Office: ENB 334
Office hours: Wednesdays, 10:45 – 1:00, and by appointment
Course page:
Examples of Distributed Systems
• ATT web
• Gnutella network
• The Internet
• A sensor network
Definition (a version)
• A distributed system is a collection of autonomous, programmable, failure-prone entities that communicate through an unreliable communication medium.
  – Entity = a process on a device (PC, PDA, mote)
  – Communication medium = wired or wireless network
• "Federated": spanning multiple institutional or network (DNS) domains
Outline
• Case studies: SETI, Napster, Gnutella
• Administrivia
CIS6930.5: Federated Distributed Systems (Fall 2005)
Operations
[Figure: system operations diagram. Components shown: data recorder, DLT tapes, splitters, WU storage, data server, screensavers, result queue, acct. queue, science DB, user DB, master DB, garbage collector, tape archive/delete, tape backup, redundancy checking, RFI elimination, repeat detection, web site, CGI program, web page generator.]
How does it work?
• Fixed-rate data processing task
• Low bandwidth/computation ratio
• Independent parallelism
• Error tolerance
Master-worker architecture
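The master-worker pattern named on this slide can be sketched in a few lines. This is only an illustration, not the project's actual code: `analyze`, `worker`, and `run_master` are hypothetical names, threads stand in for volunteer machines, and summing samples stands in for the real signal analysis.

```python
import queue
import threading


def analyze(work_unit):
    """Stand-in for the real analysis (assumption): just sum the samples."""
    return sum(work_unit)


def worker(tasks, results):
    """Pull work units until none are left; workers never talk to each other."""
    while True:
        try:
            unit_id, unit = tasks.get_nowait()
        except queue.Empty:
            return
        results.put((unit_id, analyze(unit)))


def run_master(data, unit_size, n_workers):
    """Split data into fixed-size work units, hand them out, collect results."""
    tasks, results = queue.Queue(), queue.Queue()
    n_units = 0
    for start in range(0, len(data), unit_size):
        tasks.put((n_units, data[start:start + unit_size]))
        n_units += 1
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Results arrive in any order; the master reorders them by unit id.
    collected = dict(results.get() for _ in range(n_units))
    return [collected[i] for i in range(n_units)]
```

Each work unit is processed independently, which is what "independent parallelism" refers to. The single `run_master` function is also the single point of control that the later summary slide lists as a weakness.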
History and Statistics
• Conceived 1995, launched April 1999
• "scientific experiment that uses Internet-connected computers in the Search for Extraterrestrial Intelligence (SETI). You can participate by running a free program that downloads and analyzes radio telescope data."
• No ET signals yet, but other results

Statistics (as of Wed Feb 23 07:04:51)
                              Total                  Last 24 hours
Users                         5,361,313              4,391
Results received              1,779 million          5 million
Total CPU time                2.2 million years      years
Average CPU time/work unit    10 hr 58 min 14.0 sec  6 hr 19 min 30.1 sec
Public-resource computing
• Uses idle computing cycles over the Internet
• Other systems:
  – Original: GIMPS,
  – Commercial: United Devices, Entropia, Porivo, Popular Power
  – Academic, open-source
    > Cosm,
None of them has the popularity of SETI!
• ET
• How to get and retain users (from David Anderson, the leader of the project)
  – Graphics are important (but monitors do burn in)
  – Teams: users recruit other users
  – Keep users informed:
    > Science news
    > System management news
    > Periodic project e-mails
  – Reward users:
    > PDF certificates
    > Milestone pages and e-mails
    > Leader boards (overall, country, …)
Millions and millions of computers! (Problems)
• Server scalability
• Dealing with excess CPU time
• Cheating
• Bad behavior:
  – Team recruitment by spam
  – Sale of accounts on eBay
• Malfunctions
• Network bandwidth costs money
Summary
• Master-worker design
  – Centralized solution
    > Master = central point of control
    > Single point of failure
    > Performance bottleneck
• Incentives for participation
  – Sometimes these also become incentives for cheating
• Massive ("embarrassing") parallelism
• Low bandwidth/computation ratio
Users do donate real resources: $1.5M/year in consumed power
• More information:
Where is file A? The File Location Problem (Napster and Gnutella)
Napster: How It Works
• Client-server: use a central server to locate files
• Download files directly from peers
Napster: step by step
1. File list is uploaded: users upload their file lists to the server.
2. Request and results: user requests a search at the server.
3. Pings: user pings hosts that apparently have the data, looking for the best transfer rate.
4. Retrieval: user retrieves the file.
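The four Napster steps can be sketched as a small in-memory model. This is a hedged illustration, not the Napster protocol itself: `IndexServer`, `Peer`, and `fetch` are hypothetical names, and a fixed `rate_kbps` number stands in for a real ping measurement.

```python
class IndexServer:
    """Central directory: maps file names to the peers that announced them."""

    def __init__(self):
        self.index = {}

    def upload_file_list(self, peer_name, file_names):
        # Step 1: a peer announces the files it shares.
        for name in file_names:
            self.index.setdefault(name, set()).add(peer_name)

    def search(self, file_name):
        # Step 2: the server answers with peers that claim to have the file.
        return sorted(self.index.get(file_name, ()))


class Peer:
    def __init__(self, name, files, rate_kbps):
        self.name = name
        self.files = files          # file name -> content
        self.rate_kbps = rate_kbps  # assumed fixed rate, stands in for a ping

    def ping(self):
        return self.rate_kbps

    def send_file(self, file_name):
        return self.files[file_name]


def fetch(server, peers, file_name):
    """Steps 2-4: search centrally, ping candidates, download from the best."""
    hosts = server.search(file_name)
    if not hosts:
        return None
    best = max(hosts, key=lambda h: peers[h].ping())  # step 3
    return best, peers[best].send_file(file_name)     # step 4, peer to peer
```

Only the lookup goes through `IndexServer`; the file itself moves directly between peers. The index is therefore both the system's control point and its single logical point of failure.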
Napster: History
• Program for sharing files over the Internet
• History:
  – 5/99: Shawn Fanning (freshman, Northeastern U.) founds Napster Online music service
  – 12/99: first lawsuit
  – 3/00: 25% of UWisc traffic is Napster
  – 2000: est. 60M users
  – 2/01: US Circuit Court of Appeals: Napster knew users were violating copyright laws
  – 7/01: # simultaneous online users: Napster 160K, Gnutella 40K, Morpheus 300K
Napster: Summary
• Centralized server:
  – Client-server architecture
  – Single logical point of failure
  – Potential for congestion (bottleneck)
  – Napster "in control" (freedom is an illusion)
• No security:
  – Passwords in plain text
  – No authentication
  – No anonymity
Outline
• Public-resource computing
  – Case study: SETI
• Peer-to-peer systems
  – Case study 1: Napster
  – Case study 2: Gnutella
• Discuss:
  – Characteristics
  – Impact
  – Architecture
  – Killer application
Gnutella: Search for Files with No Central Server
Where is file A? Ideas?
Gnutella: Search
Flooding
[Figure: a query ("Where is file A?") is flooded from peer to peer; a peer that has the file sends back a reply ("I have file A.").]
Gnutella: History and Statistics
• Gnutella history:
  – 3/14/00: release by AOL, almost immediately withdrawn
  – too late: 1,859,340 users on Gnutella on August 25, 2am
  – many iterations to fix poor initial design
• High impact:
  – Versions implemented
  – Different designs
  – Lots of research papers/ideas

Users per network (06/24/'05)
Network         Users
eDonkey2K       4,123,688
FastTrack       2,521,887
Gnutella        1,516,762
Overnet         1,146,880
DirectConnect   294,255
MP2P            251,137
What would you ask about Gnutella?
• …
• …
• …
• …
Gnutella: Heterogeneity
All Peers Equal? (1)
[Figure: peers with different connections: 56 kbps modem, 10 Mbps LAN, 1.5 Mbps DSL.]
Gnutella: Free Riding
All Peers Equal? (2)
More than 25% of Gnutella clients share no files; 75% share 100 files or fewer
Conclusion: Gnutella has a high percentage of free riders
• If only a few individuals contribute to the public good, these few peers effectively act as centralized servers.
Adar and Huberman (Aug '00)
Flooding in Gnutella: Loop Prevention
[Figure: a peer that has already seen a request recognizes the duplicate ("Seen request already").]
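Flooding with loop prevention can be sketched as a simulation. This is a simplified model under stated assumptions, not the Gnutella wire protocol. The function name `flood_search`, the adjacency-list `graph`, and the `ttl` hop limit are all introduced here for illustration. A single global `seen` set stands in for each peer remembering which request IDs it has already handled.

```python
from collections import deque


def flood_search(graph, holders, origin, ttl):
    """Flood a query from `origin`; return (hits, messages, duplicates).

    graph:   dict mapping each peer to a list of its neighbors
    holders: set of peers that have the requested file
    ttl:     maximum number of hops a query may travel (assumption)
    """
    seen = {origin}                       # peers that already saw the request
    frontier = deque([(origin, None, ttl)])
    hits, messages, duplicates = [], 0, 0
    while frontier:
        peer, sender, hops_left = frontier.popleft()
        if peer in holders:
            hits.append(peer)             # this peer would send a reply
        if hops_left == 0:
            continue                      # hop limit reached, stop forwarding
        for neighbor in graph[peer]:
            if neighbor == sender:
                continue                  # do not echo back to the sender
            messages += 1
            if neighbor in seen:
                duplicates += 1           # "seen request already": dropped
                continue
            seen.add(neighbor)
            frontier.append((neighbor, peer, hops_left - 1))
    return hits, messages, duplicates
```

On a small graph with a cycle, for example `{"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}` with the file at `D`, a flood from `A` with `ttl=2` finds `D` using 5 messages, 2 of which are duplicates dropped by the `seen` check. Without that check the query would circulate around the A-B-C cycle until the hop limit ran out.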
Gnutella Topology Mismatch
Gnutella Summary
• Search by flooding
• Self-configuring
• Phenomena:
  – Not all peers equal
  – Free riding
• Problems:
  – Topology mismatch
  – Duplicates due to flooding
• Good source for technical info/open questions:
Problems in Distributed Systems
• …
• Communication
  – Routing [IP, BGP]
  – Multicast [IP multicast, SRM, RMTP]
• Post and retrieve [Usenet]
• Search [Gnutella, Kazaa, etc., Google]
• Storage [Databases]
• Coordination
• …
Challenges
• …
• Failures
• Scale
• Asynchrony
• Security
• Deployment
• Adoption
• …
Challenges (2)
• …
• Learn from usage
  – Example 1: The Internet
  – Example 2: Napster
• Conflicting requirements:
  – Light but adaptable?
  – Light but data-consistent? (think transactions)
  – … (other examples?)
• … (other examples?)
Course Organization/Syllabus/etc.
Administrivia: Grading
• Reviewing: 30%
• Discussion leading: 15%
• Project: 55%
  – Aim high!
  – Have fun!
Administrivia: Paper Reviewing (1)
• Goals:
  – Think about what you read
  – Get used to writing paper reviews
• Reviews due by midnight before class
Follow the form when relevant.
• State the main contribution of the paper
• Critique the main contribution.
  – Rate the significance of the paper on a scale of 5 (breakthrough), 4 (significant contribution), 3 (modest contribution), 2 (incremental contribution), 1 (no contribution or negative contribution). Explain your rating in a sentence or two.
Administrivia: Paper Reviewing (2)
Rate how convincing the methodology is.
• Do the claims and conclusions follow from the experiments?
• Are the assumptions realistic?
• Are the experiments well designed?
• Are there different experiments that would be more convincing?
• Are there other alternatives the authors should have considered?
• (And, of course, is the paper free of methodological errors?)
Administrivia: Paper Reviewing (3)
• What is the most important limitation of the approach?
• What are the three strongest and/or most interesting ideas in the paper?
• What are the three most striking weaknesses in the paper?
• Name three questions that you would like to ask the authors.
• Detail an interesting extension to the work not mentioned in the future work section.
• Optional comments on the paper that you'd like to see discussed in class.
Paper Reviewing (final)
• Be professional in your writing
• Keep an eye on the writing style:
  – Clarity
  – Beware of traps: learn to use them in writing and detect them in reading
  – Detect (and stay away from) trivial claims. E.g., first sentence in the Introduction: "The tremendous/unprecedented/phenomenal growth/scale/ubiquity of the Internet…"
Administrivia: Discussion leading
• Come prepared!
  – Prepare discussion outline
  – Prepare questions:
    > "What if"s
    > Unclear things
    > …
  – Similar ideas in different contexts
  – Initiate short brainstorming sessions
• Leaders do NOT need to submit paper reviews
• Main goals:
  – Keep discussion flowing
  – Keep discussion relevant
  – Engage everybody (I'll keep an eye on this, too)
Administrivia: Projects
• Combine with your research if relevant to the class
• Get approval from all instructors if your final projects overlap:
  – Don't sell the same piece of work twice
  – You can get more than twice as many results with less than twice as much work
• Aim high!
  – Put in one extra month and get a publication out of it
  – It is doable
• Try ideas that you postponed out of fear: it's just a class, not your PhD.
Administrivia: Project deadlines (tentative)
• Sept. 15: 1-page project proposal
• Oct. 11: 3-page literature survey
  – Know relevant work in your problem area
  – If implementation project, list tools, similar projects
• Nov. 11: 5-page midterm project report due
  – Have a clear image of what's possible/doable
  – Report preliminary results
• Last class(es): in-class project presentation
  – Demo, if appropriate
• Dec. 16:
  – 10-page write-up
Next Class (Wed, August 31)
• Read the 4 chapters from the Grid book
• Send brief summaries (lists of ideas/problems discussed, etc.)
  – Do not follow the reviewing form
  – Be brief and efficient!
  – Be BRIEF and EFFICIENT!
• In-class discussion + some project ideas
• Need discussion leader to team up with me for the class next week:
  – The structure of networks (pick 2):
    1. Small-world file sharing communities, Iamnitchi, Ripeanu, Foster. Infocom
    2. On Power-Law Relationships of the Internet Topology, Faloutsos, Faloutsos, and Faloutsos, SIGCOMM
    3. Mapping the Gnutella network, M. Ripeanu et al, IEEE Computing Journal 2002.
Questions?