Peer-To-Peer Data Management

1 Peer-To-Peer Data Management
Hector Garcia-Molina
ICDE Conference, February 28, 2002

2 What is P2P?
pastry, can, jxta, fiorano, napster, freenet, united devices, open cola, aim, ocean store, netmeeting, gnutella, farsite, icq, morpheus, ebay, limewire, bearshare, uddi, groove, jabber, popular power, kazaa, tapestry, process tree, mojo nation, chord

3 Napster
Figure labels: central index; join, query, answer, get file
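
The join, query, answer, and get-file steps labeled on this slide can be sketched as a toy central index. This is a minimal illustration under assumptions, not Napster's actual protocol: the class `CentralIndex`, its method names, and the peer and file names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


class CentralIndex:
    """Hypothetical central index: maps each file name to the peers sharing it."""

    def __init__(self) -> None:
        self._peers_by_file: Dict[str, Set[str]] = defaultdict(set)

    def join(self, peer: str, files: List[str]) -> None:
        """A peer joins by registering the files it is willing to share."""
        for name in files:
            self._peers_by_file[name].add(peer)

    def query(self, name: str) -> List[str]:
        """Answer a query with the peers that registered this file."""
        return sorted(self._peers_by_file.get(name, set()))


index = CentralIndex()
index.join("peer_a", ["song.mp3", "talk.ppt"])
index.join("peer_b", ["song.mp3"])

# The answer lists peers only; the file transfer ("get file") then happens
# directly between peers and never passes through the central index.
answer = index.query("song.mp3")
missing = index.query("unknown.txt")
```

Only the index is centralized in this sketch; the content itself stays at the peers.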

4 Gnutella query

5 Morpheus
Figure label: super peer

6 Figure labels: satellite dish, raw data chunk, analyzed data, central site

7 Lockss
Figure labels: library A, library B, library C, library D, library E; D1, D2, D3

8 PeerCast
Figure labels: before, after; Stanford source

9 What is a P2P System?
Multiple sites (at edge)
Distributed resources
Sites are autonomous (different owners)
Sites are both clients and servers
Sites have equal functionality
P2P Purity

10 P2P is BAD IDEA!!
Distribution is expensive!
Specialized functionality is good!

11 Example: Distributed Data Management
Distribution is expensive
If you must distribute:
  build centralized directory, index
  use backups for reliability
  for replicated data, use primary copy

12 Computational Efficiency is NOT Main Goal
Main driving force in a P2P system:
  exploiting existing (often free) resources
  sharing costs among many
  legal protection
  autonomy
  anonymity

13 Should We Do P2P Research?
Should we help people break the law?
Analogy: Should we develop pillows, knives, hammers, drugs, bath tubs, cars, airplanes, ... ??

14 Should We Do P2P Research?
YES: P2P not exclusively for breaking law
Remember the VCR
YES: P2P can liberate us from culture “plantation owners” (Lessig)

15 Is “Free Culture” Feasible?
Example: Legal texts Can we afford it? economic activity rules of the game today

16 Should DB community work on P2P?
YES

17 P2P Challenges
Easier to list NON-Research-Topics:
  Color schemes for P2P Nodes
  Impact of P2P on Moroccan 15th Century Literature

18 P2P Challenges
Search
Resource Management
Security & Privacy

19 Search Taxonomy
Figure axis labels: lookup, content queries, search; scope of index (single site, regional, global)
Figure entries: freenet, can, partial, replicated SP, gnutella, morpheus, napster, routing

20 Index Implementation Taxonomy
Figure axis labels: index location correlated with content location (yes, partial, no); P2P nature of index (centralized, distributed)
Figure entries: routing, replicated SP, freenet, gnutella, morpheus, napster, can

21 Content Addressable Network (CAN)
Nodes 1 Data 2

22 Can We Improve Flooding?
Figure axis labels: index location correlated with content location (yes, partial, no); P2P nature of index (centralized, distributed)
Figure entries: routing, replicated SP, freenet, gnutella, morpheus, napster, can

23 Directed BFS in Gnutella
Heuristics for Selecting Direction
  >RES: Returned most results
  <TIME: Shortest satisfaction time
  <HOPS: Min hops for results
  >MSG: Sent us most messages (all types)
  <QLEN: Shortest queue
  <LAT: Shortest latency
  >DEG: Highest degree
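
The heuristics listed on this slide can be read as ranking rules over per-neighbor statistics: instead of flooding a query to every neighbor, a node forwards it only to the neighbors that rank best. The sketch below assumes each node already tracks such statistics; the `NeighborStats` fields, the `select_neighbors` helper, and all sample values are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class NeighborStats:
    """Hypothetical per-neighbor statistics a node might collect."""
    name: str
    results: int              # results returned for past queries
    satisfaction_time: float  # time until past queries were satisfied
    hops: float               # hops travelled by past results
    messages: int             # messages of all types received from it
    queue_length: int         # current message queue length
    latency: float            # measured link latency
    degree: int               # number of neighbors it reports


# Heuristic name -> (statistic to rank by, True if larger is better).
HEURISTICS: Dict[str, Tuple[Callable[[NeighborStats], float], bool]] = {
    ">RES": (lambda s: s.results, True),
    "<TIME": (lambda s: s.satisfaction_time, False),
    "<HOPS": (lambda s: s.hops, False),
    ">MSG": (lambda s: s.messages, True),
    "<QLEN": (lambda s: s.queue_length, False),
    "<LAT": (lambda s: s.latency, False),
    ">DEG": (lambda s: s.degree, True),
}


def select_neighbors(neighbors: List[NeighborStats], heuristic: str,
                     k: int = 1) -> List[NeighborStats]:
    """Return the k neighbors that rank best under the named heuristic."""
    key, larger_is_better = HEURISTICS[heuristic]
    return sorted(neighbors, key=key, reverse=larger_is_better)[:k]


# Made-up statistics for three neighbors.
neighbors = [
    NeighborStats("n1", 40, 2.5, 3.0, 900, 12, 0.20, 4),
    NeighborStats("n2", 15, 1.0, 2.0, 300, 3, 0.05, 9),
    NeighborStats("n3", 25, 4.0, 5.0, 1200, 7, 0.50, 6),
]

by_results = [n.name for n in select_neighbors(neighbors, ">RES")]
by_latency = [n.name for n in select_neighbors(neighbors, "<LAT", k=2)]
by_messages = [n.name for n in select_neighbors(neighbors, ">MSG")]
```

Different heuristics pick different directions for the same node, which is why they need to be compared experimentally.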

24 How Does One Evaluate? Live Gnutella?
Use real Gnutella as “laboratory”

25 Time to Satisfaction for Directed BFS

26 Routing Index
Figure labels: nodes A, B, C, D; query Q(DB); routing index columns AI, DB

27 Types of Routing Indexes
Compound
Hop Count
Exponential Decay
Strategies for Cycles
  Ignore (for Hop-Count, exponential)
  Avoid Update Cycles
  Detect Update Cycles and Recover
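
The slide names the index types without defining them. As a rough sketch of how a hop-count index could be scored with exponential decay, the code below assumes that a node stores, for each neighbor, the number of matching documents reachable at each hop, and that documents found at hop j are discounted by `fanout ** j`. The function names, the discount rule as written here, and every number are assumptions for illustration, not the definitions used in the talk.

```python
from typing import Dict, List


def decayed_goodness(docs_per_hop: List[int], fanout: float) -> float:
    """Sum the documents reachable through a neighbor, discounting
    those found at hop j (0-based) by fanout ** j."""
    return sum(count / fanout ** j for j, count in enumerate(docs_per_hop))


def rank_neighbors(hop_count_index: Dict[str, List[int]],
                   fanout: float) -> List[str]:
    """Order neighbors from most to least promising for forwarding a query."""
    return sorted(
        hop_count_index,
        key=lambda name: decayed_goodness(hop_count_index[name], fanout),
        reverse=True,
    )


# Made-up hop-count index at one node: documents on the query topic
# reachable through each neighbor at hop 0 and hop 1.
hop_count_index = {"B": [10, 60], "C": [30, 20]}

score_b = decayed_goodness(hop_count_index["B"], fanout=3.0)  # 10 + 60/3
ranking = rank_neighbors(hop_count_index, fanout=3.0)
```

With these made-up counts, an index that stores only totals would prefer B (70 documents against 50), while the decayed estimate prefers C because its documents are closer.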

28 Effect of Index Compression

29 Effect of Network Topology

30 Resource Management
Resource:
  storage (lockss)
  CPU processing
  bandwidth (PeerCast)
Issues:
  fairness
  load balancing

31 Example: Data Trading
Figure labels: site 1, site 2; A1, B1, C1, A2, B2, C2; trade

32 Example: Data Trading
Figure labels: site 1; A1, B1, C1, A2, B2, C2; trade

33 Data Trading
Order of trades impacts reliability
Issues:
  Swaps vs. Deeds
  Fixed price vs. bids
  Preference to sites with a lot of space? reliable sites? “desperate” sites?

34 Effect of Bid Policies
bid more (ask more in return) when I have less free space
bid more (ask more in return) when I have more free space
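
The two bid policies compared on this slide can be sketched as follows. The slide gives no formulas, so the linear scaling, the function names, and the parameters `free`, `capacity`, and `request` are assumptions made only for illustration. A bid here is the amount of space a site asks for in return for storing `request` units of another site's data.

```python
def bid_when_scarce(free: float, capacity: float, request: float) -> float:
    """Ask for more in return as local free space shrinks."""
    used_fraction = 1.0 - free / capacity
    return request * (1.0 + used_fraction)


def bid_when_plentiful(free: float, capacity: float, request: float) -> float:
    """Ask for more in return as local free space grows."""
    free_fraction = free / capacity
    return request * (1.0 + free_fraction)


# A nearly full site (20 of 100 units free) and a nearly empty one
# (80 of 100 units free), each asked to store 10 units.
full_site_scarce = bid_when_scarce(20, 100, 10)
empty_site_scarce = bid_when_scarce(80, 100, 10)
full_site_plentiful = bid_when_plentiful(20, 100, 10)
empty_site_plentiful = bid_when_plentiful(80, 100, 10)
```

Under the first policy the nearly full site bids higher; under the second the nearly empty site does.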

35 Effect of One Maverick Site
always bids high

36 Security & Privacy
Issues:
  Anonymity
  Reputation
  Accountability
  Information Preservation
  Information Quality
  Trust
  Denial of service attacks

37 Information Preservation
Example Policy: make 3 copies of documents
What can go wrong?
Figure labels: A1; make copies

38 What Can Go Wrong?
“Bad” sites make copies
“Bad” site alters copy
“Bad” site publishes fake
“Bad” site makes many copies of other docs
Figure labels: A1, A’1; make copies

39 Conclusion
P2P systems popular today
P2P systems vulnerable and inefficient
Many challenges ahead
  Search
  Resource Management
  Security and Privacy

40 For Additional Information
Google: “Stanford Peers”

