1 Peer-To-Peer Data Management Hector Garcia-Molina ICDE Conference, February 28, 2002.


1 Peer-To-Peer Data Management Hector Garcia-Molina ICDE Conference, February 28, 2002

2 What is P2P?
napster, gnutella, morpheus, kazaa, bearshare, seti@home, folding@home, ebay, limewire, icq, fiorana, mojo nation, jxta, united devices, open cola, uddi, process tree, can, chord, ocean store, farsite, pastry, tapestry, ?, grove, netmeeting, freenet, popular power, aim, jabber

3 Napster
[Figure labels: central index; join, query, answer, get file, ...]
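The join / query / answer / get file flow named on this slide can be sketched as a toy central index. This is only an illustration, not Napster's actual protocol: the class name `CentralIndex`, the peer names, and the file names are all made up for the example.

```python
from collections import defaultdict


class CentralIndex:
    """Toy central index (hypothetical): maps file names to the peers holding them."""

    def __init__(self):
        self._holders = defaultdict(set)

    def join(self, peer, files):
        # "join": a peer registers the files it is willing to share.
        for name in files:
            self._holders[name].add(peer)

    def query(self, name):
        # "query" / "answer": the index replies with the peers that hold the file.
        # The file itself ("get file") is then fetched directly from one of them.
        return sorted(self._holders.get(name, ()))


index = CentralIndex()
index.join("peerA", ["song.mp3", "talk.pdf"])  # made-up peers and files
index.join("peerB", ["song.mp3"])
answer = index.query("song.mp3")
print(answer)
```

Only the lookup goes through the central site; the transfer itself is peer to peer.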

4 Gnutella
[Figure label: query]
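Gnutella-style search has no central index; a query is flooded from neighbor to neighbor. A minimal sketch, assuming a hop limit (TTL) and a small made-up overlay graph; the function name `flood_query` and the sample data are hypothetical.

```python
from collections import deque


def flood_query(graph, content, start, keyword, ttl):
    """Breadth-first flood. Returns (peers holding the keyword, messages sent)."""
    seen = {start}
    frontier = deque([(start, ttl)])
    hits, messages = [], 0
    while frontier:
        node, remaining = frontier.popleft()
        if keyword in content.get(node, ()):
            hits.append(node)
        if remaining == 0:
            continue  # hop limit reached: do not forward any further
        for neighbor in graph[node]:
            messages += 1  # every forwarded copy costs a message, duplicates included
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, remaining - 1))
    return hits, messages


# Made-up overlay: a-b, a-c, b-d, c-d, d-e
graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"],
         "d": ["b", "c", "e"], "e": ["d"]}
content = {"d": {"x"}, "e": {"x"}}
print(flood_query(graph, content, "a", "x", ttl=2))
print(flood_query(graph, content, "a", "x", ttl=3))
```

Raising the TTL reaches more peers but the message count grows with it, which is the cost that the later slides on improving flooding try to reduce.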

5 Morpheus
[Figure labels: ..., super peer]

6 Seti@Home
[Figure labels: satellite dish, ..., raw data chunk, analyzed data, central site]

7 Lockss
[Figure labels: library A, library B, library C, library E, library D; D1, D2, D3]

8 PeerCast
[Figure: two panels labeled "before" and "after", each with a "Stanford source"]

9 What is a P2P System?
- Multiple sites (at edge)
- Distributed resources
- Sites are autonomous (different owners)
- Sites are both clients and servers
- Sites have equal functionality
[Figure label: P2P Purity]

10 P2P is BAD IDEA!!
- Distribution is expensive!
- Specialized functionality is good!

11 Example: Distributed Data Management
- Distribution is expensive
- If you must distribute:
  - build centralized directory, index
    - use backups for reliability
  - for replicated data, use primary copy
    - use backups for reliability

12 Computational Efficiency is NOT Main Goal
- Main driving force in a P2P system:
  - exploiting existing (often free) resources
  - sharing costs among many
  - legal protection
  - autonomy
  - anonymity

13 Should We Do P2P Research?
- Should we help people break the law?
- Analogy: Should we develop pillows, knives, hammers, drugs, bath tubs, cars, airplanes,... ??

14 Should We Do P2P Research?
- YES: P2P not exclusively for breaking law
  - Remember the VCR
- YES: P2P can liberate us from culture “plantation owners” (Lessig)

15 Is “Free Culture” Feasible?
- Example: Legal texts
- Can we afford it?
[Figure labels: economic activity, rules of the game, today]

16 Should DB community work on P2P?
- YES

17 P2P Challenges
- Easier to list NON-Research-Topics:
  - Color schemes for P2P Nodes
  - Impact of P2P on Moroccan 15th Century Literature

18 P2P Challenges
- Search
- Resource Management
- Security & Privacy

19 Search Taxonomy
[Figure: axes "search" (lookup, content queries) and "scope of index" (single site, regional, global); plotted labels: freenet, gnutella, napster, morpheus, can, routing, replicated, SP, partial]

20 Index Implementation Taxonomy
[Figure: axes "nature of index" (centralized, distributed, P2P) and "index location correlated with content location" (yes, no); plotted labels: freenet, gnutella, napster, morpheus, can, routing, replicated, SP, partial]

21 Content Addressable Network (CAN)
[Figure labels: 1, 2; Nodes, Data]
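A minimal sketch of the lookup idea behind a CAN, assuming a 2-D unit square split into rectangular zones with one node per zone: a key is hashed to a point, and the node whose zone contains that point is responsible for the key. The zone layout, the use of SHA-256, and all names here are illustrative assumptions; routing between neighboring zones is not shown.

```python
import hashlib


def key_to_point(key):
    """Hash a key to a point in the unit square (hash choice is an assumption)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    x = int.from_bytes(digest[:8], "big") / 2**64
    y = int.from_bytes(digest[8:16], "big") / 2**64
    return x, y


def owner(point, zones):
    """Return the node whose zone (x0, y0, x1, y1) contains the point."""
    x, y = point
    for node, (x0, y0, x1, y1) in zones.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return node
    raise KeyError(point)


# Hypothetical layout: two nodes, each owning half of the square.
zones = {
    "node1": (0.0, 0.0, 0.5, 1.0),
    "node2": (0.5, 0.0, 1.0, 1.0),
}
print(owner((0.25, 0.5), zones))
print(owner(key_to_point("song.mp3"), zones))
```

Because the index entry for a key lives wherever the hash sends it, index location is not correlated with where the content itself is stored.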

22 Can We Improve Flooding?
[Figure: axes "nature of index" (centralized, distributed, P2P) and "index location correlated with content location" (yes, no); plotted labels: freenet, gnutella, napster, morpheus, can, routing, replicated, SP, partial]

23 Directed BFS in Gnutella
- Heuristics for Selecting Direction
  - >RES: Returned most results
  - <TIME: Shortest satisfaction time
  - <HOPS: Min hops for results
  - >MSG: Sent us most messages (all types)
  - <QLEN: Shortest queue
  - <LAT: Shortest latency
  - >DEG: Highest degree
[Figure labels: query, ?, ...]
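The heuristics above each rank neighbors by one observed statistic. A minimal sketch, assuming the querying peer keeps per-neighbor statistics and forwards the query only to the single best-ranked neighbor; the field names, the sample numbers, and the function `pick_neighbor` are hypothetical.

```python
# Heuristic name -> (statistic to look at, whether larger or smaller is better).
HEURISTICS = {
    ">RES": ("results", max),
    "<TIME": ("time_to_satisfaction", min),
    "<HOPS": ("hops", min),
    ">MSG": ("messages", max),
    "<QLEN": ("queue_length", min),
    "<LAT": ("latency_ms", min),
    ">DEG": ("degree", max),
}


def pick_neighbor(stats, heuristic):
    """Return the neighbor that the given heuristic ranks best."""
    field, choose = HEURISTICS[heuristic]
    return choose(stats, key=lambda neighbor: stats[neighbor][field])


# Made-up statistics for two neighbors.
stats = {
    "n1": dict(results=12, time_to_satisfaction=3.0, hops=4, messages=90,
               queue_length=7, latency_ms=40, degree=3),
    "n2": dict(results=5, time_to_satisfaction=1.5, hops=2, messages=200,
               queue_length=2, latency_ms=15, degree=8),
}
for name in (">RES", "<LAT", ">DEG"):
    print(name, pick_neighbor(stats, name))
```

Different heuristics can pick different neighbors from the same statistics, which is why they need to be compared experimentally.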

24 How Does One Evaluate?
- Live Gnutella?
- Use real Gnutella as “laboratory”

25 Time to Satisfaction for Directed BFS

26 Routing Index
[Figure: nodes A, B, C, D, each with a routing-index table whose columns are labeled AI and DB and whose rows are labeled by neighbor; query Q(DB)]

27 Types of Routing Indexes
- Compound
- Hop Count
- Exponential Decay
- Strategies for Cycles
  - Ignore (for Hop-Count, exponential)
  - Avoid Update Cycles
  - Detect Update Cycles and Recover
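A rough sketch of how the index types above might rank neighbors for a query, assuming each entry records how many matching documents are reachable through a neighbor at each hop distance. The scoring functions, the decay factor, and all numbers are illustrative assumptions, not the definitions used in the talk.

```python
def compound_score(index, neighbor, topic):
    """Compound view: total documents reachable via the neighbor, ignoring distance."""
    return sum(index[neighbor][topic])


def decayed_score(index, neighbor, topic, decay=4.0):
    """Exponential-decay view: documents further away count for less.

    Counts are listed per hop (nearest first) and divided by decay**position.
    """
    return sum(count / decay**i for i, count in enumerate(index[neighbor][topic]))


def best_neighbor(index, topic, score):
    return max(index, key=lambda neighbor: score(index, neighbor, topic))


# Hypothetical per-hop document counts for topic "DB".
index = {
    "B": {"DB": [10, 80]},  # few documents nearby, many two hops away
    "C": {"DB": [40, 10]},  # many documents nearby
}
print(best_neighbor(index, "DB", compound_score))
print(best_neighbor(index, "DB", decayed_score))
```

Here the compound score prefers B (90 documents in total), while the decayed score prefers C because its documents are closer.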

28 Effect of Index Compression

29 Effect of Network Topology

30 Resource Management
- Resource:
  - storage (lockss)
  - CPU processing (seti@home)
  - bandwidth (PeerCast)
- Issues:
  - fairness
  - load balancing

31 Example: Data Trading
[Figure labels: site 1, site 2, site 3; A1, B1, C1; A2, B2, C2; B1, A1; trade; B2, A2]

32 Example: Data Trading
[Figure labels: site 1, site 2, site 3; A1, B1, C1; A2, B2, C2; B1, A1; trade; C1, A2; C2, B2]

33 Data Trading
- Order of trades impacts reliability
- Issues:
  - Swaps vs. Deeds
  - Fixed price vs. bids
  - Preference to sites with a lot of space? reliable sites? “desperate” sites?
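One reading of a swap is that two sites each agree to store a copy of the other's collection, and the trade only goes through if both have room. A minimal sketch under that assumption; the `Site` class, the capacities, and the collection sizes are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    capacity: int
    own: dict                                   # collection name -> size
    hosted: dict = field(default_factory=dict)  # copies stored for other sites

    def free_space(self):
        return self.capacity - sum(self.own.values()) - sum(self.hosted.values())


def swap(a, coll_a, b, coll_b):
    """Each site stores a copy of the other's collection, or nothing changes."""
    size_a, size_b = a.own[coll_a], b.own[coll_b]
    if b.free_space() < size_a or a.free_space() < size_b:
        return False  # at least one side lacks space: no trade
    b.hosted[coll_a] = size_a
    a.hosted[coll_b] = size_b
    return True


site1 = Site("site 1", capacity=10, own={"A": 4})
site2 = Site("site 2", capacity=10, own={"B": 3})
site3 = Site("site 3", capacity=5, own={"C": 4})

first = swap(site1, "A", site2, "B")   # both sites have room
second = swap(site1, "A", site3, "C")  # site 3 has only 1 unit free
print(first, second, site1.free_space(), site2.free_space())
```

Because each successful trade consumes free space, an early trade can block a later one, which is one way the order of trades can affect reliability.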

34 Effect of Bid Policies
[Figure labels: "bid more (ask more in return) when I have more free space"; "bid more (ask more in return) when I have less free space"]

35 Effect of One Maverick Site
[Figure label: always bids high]

36 Security & Privacy
- Issues:
  - Anonymity
  - Reputation
  - Accountability
  - Information Preservation
  - Information Quality
  - Trust
  - Denial of service attacks

37 Information Preservation
- Example Policy: make 3 copies of documents
[Figure labels: A1, make copies]
What can go wrong?

38 What Can Go Wrong?
- “Bad” sites make copies
- “Bad” site alters copy
- “Bad” site publishes fake
- “Bad” site makes many copies of other docs
- ...
[Figure labels: A1, make copies, A'1, A1]
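One of the failures above, a site altering its copy, can in principle be noticed by comparing the copies with each other. A minimal sketch, assuming copies are compared by SHA-256 digest and the majority digest is trusted; this is an illustrative technique, not one described in the slides, and the site names and data are made up.

```python
import hashlib
from collections import Counter


def audit_copies(copies):
    """Return the sites whose copy disagrees with the majority digest."""
    digests = {site: hashlib.sha256(data).hexdigest()
               for site, data in copies.items()}
    majority, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(site for site, digest in digests.items() if digest != majority)


# Three copies of document A1; the copy at site 3 has been altered.
copies = {
    "site 1": b"A1 original text",
    "site 2": b"A1 original text",
    "site 3": b"A1 altered text",
}
print(audit_copies(copies))
```

A majority check like this fails if bad sites hold most of the copies, which is the concern raised by the bullet about a bad site making many copies.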

39 Conclusion
- P2P systems popular today
- P2P systems vulnerable and inefficient
- Many challenges ahead
  - Search
  - Resource Management
  - Security and Privacy

40 For Additional Information
- Google: “Stanford Peers”
- http://www-db.stanford.edu/peers/

