1 ICS 214B: Transaction Processing and Distributed Data Management Lecture 18: Data Management in Peer-to-Peer Systems Professor Chen Li Based on slides.


1 ICS 214B: Transaction Processing and Distributed Data Management
Lecture 18: Data Management in Peer-to-Peer Systems
Professor Chen Li
Based on slides developed by Beverley Yang and Hector Garcia-Molina

2 What is P2P?
napster gnutella morpheus kazaa bearshare seti@home folding@home ebay limewire icq fiorana mojo nation jxta united devices open cola uddi process tree can chord ocean store farsite pastry tapestry ? grove netmeeting freenet popular power aim jabber

3 Napster
(diagram labels: central server, join, query, response, get file, ...)

4 Gnutella
(diagram label: query)

5 PeerCast
(diagram: two panels labeled "before:" and "after:", each with a UCI source)

6 What is Peer-to-Peer?
Definition: Nodes of equal roles exchanging information and services directly
Is this a new idea?
–IP routing (1970’s)
–Mariposa (1980’s)
–Distributed Databases!
What are people really thinking?

7 Implicit Definition of P2P
Scale: millions (billions?) of peers
Nature of peers: PC’s
Application: lightweight semantics (e.g., file-sharing)

8 P2P vs. Distributed DBMS
Traditional DDBMS Issues:
–Transactions
–Network Partitions
–Distributed Query Optimization
–Interoperation of heterogeneous data sources
–Reliability/failure of nodes
Complex features do not scale

9 P2P vs. Distributed DBMS
Example application: file-sharing
Simple data model and query language
–No complex query optimization
–Easy interoperation
No guarantee on quality of results
–Individual site availability unimportant
Local updates
–No transactions
–Network partitions OK
Simple
Amenable to large-scale network of PCs

10 Potential Benefits
Efficiency: harnessing unused resources
Self-organizing
Effectively sharing cost of ownership
Robustness and availability through replication
Anonymity/legal protection

11 Challenges
No authority to enforce behavior
Cooperation
Unreliability of individual peers
Efficiency of distributed operations (absolute resources)

12 Research Areas
Resource Management
Security
Efficient Search

13 Resource Management
Resource:
–Storage/information
–CPU processing
–Bandwidth
Issues:
–Fairness
–Load balancing

14 Example: Data Trading
(diagram: site 1, site 2, site 3; items A1, B1, C1, A2, B2, C2; trade labels: B1, A1, B2, A2)

15 Example: Data Trading
(diagram: site 1, site 2, site 3; items A1, B1, C1, A2, B2, C2; trade labels: B1, A1, C1, A2, C2, B2)

16 Data Trading
Order of trades impacts availability
Issues:
–Swaps vs. Deeds
–Fixed price vs. bids
–Preference to sites with a lot of space? Reliable sites? “Desperate” sites?

17 Security
Issues:
–Reputation
–Trust
–Accountability
–Information Preservation
–Information Quality
–Denial of service attacks
Problem: Detecting and punishing bad behavior

18 Information Preservation
Example Policy: make 3 copies of documents
(diagram: document A1, labeled "make copies")
What can go wrong?

19 What Can Go Wrong?
“Bad” site deletes copies
“Bad” site alters copy
“Bad” site publishes fake
“Bad” site makes many copies at other sites
...
(diagram: document A1 labeled "make copies", with an altered copy A’1 alongside A1)

20 Reputation Systems
Peers evaluate each other
Good reviews -> Good reputation
Bad reviews -> Bad reputation
No reviews -> ?
Problems
–Trustworthiness of reviews
–Permanence of identity

21 Efficiency of Search
Problem: finding needle in haystack
Efficiency measured in terms of absolute resources consumed

22 Architecture
Hybrid
–Centralized index, P2P file storage and transfer
Super-peer
–A “pure” network of “hybrid” clusters
Pure
–Functionality completely distributed

23 Goal
Develop search techniques for “loose” systems that are
–Efficient
–Simple (easy to implement, no hidden costs)
–Realistically and thoroughly evaluated

24 Current Techniques: Gnutella
Breadth-First Search (BFS)
(diagram legend: source, forward query, processed query, found result, forward response)
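The BFS flooding on this slide can be sketched roughly as follows. This is a minimal illustration, not the Gnutella implementation: the function name `flood_bfs`, the toy `graph`, the `ttl` hop limit, and the `has_result` predicate are all assumptions introduced for the example. It counts every forwarded copy of the query, including redundant ones that arrive at a node that has already seen it.

```python
from collections import deque

def flood_bfs(graph, source, ttl, has_result):
    """Flood a query up to `ttl` hops from `source`; return (hits, messages sent)."""
    seen = {source}
    frontier = deque([(source, None, 0)])  # (node, node it heard the query from, depth)
    hits, messages = [], 0
    while frontier:
        node, parent, depth = frontier.popleft()
        if node != source and has_result(node):
            hits.append(node)
        if depth == ttl:
            continue  # hop limit reached, do not forward further
        for nbr in graph[node]:
            if nbr == parent:
                continue  # never send the query back to its sender
            messages += 1  # redundant copies still cost bandwidth
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, node, depth + 1))
    return hits, messages

# Hypothetical five-peer overlay; peers D and E hold matching files.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
hits, messages = flood_bfs(graph, "A", ttl=2, has_result=lambda n: n in {"D", "E"})
print(hits, messages)
```

Raising `ttl` from 2 to 3 in this toy overlay reaches one more peer but also sends more messages, which is the cost growth the later slides try to control.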

25 Metrics
Cost (aggregate)
–Bandwidth
–Processing Power
Quality of Results
–Number of results
–Satisfaction (true if # results >= X, false otherwise)
–Time to satisfaction

26 Iterative Deepening
Interested in satisfaction, not # of results
BFS returns “too many” results, which is expensive
Iterative Deepening: common technique to reduce the cost of BFS
–Intuition: A search at a small depth is much cheaper than at a larger depth
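A minimal sketch of the iterative deepening idea, under simplifying assumptions: the source tries a sequence of increasing depths (here called `policy`, a hypothetical name) and stops as soon as the query is satisfied. For brevity this sketch restarts the search at each depth; it does not model the waiting time W or any mechanism for avoiding re-processing at nodes already visited.

```python
from collections import deque

def results_within(graph, source, depth_limit, has_result):
    """Return the nodes holding a result within `depth_limit` hops of `source`."""
    seen = {source}
    frontier = deque([(source, 0)])
    hits = []
    while frontier:
        node, depth = frontier.popleft()
        if node != source and has_result(node):
            hits.append(node)
        if depth < depth_limit:
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    frontier.append((nbr, depth + 1))
    return hits

def iterative_deepening(graph, source, policy, needed, has_result):
    """Try each depth in `policy` in order; stop once `needed` results are found."""
    hits = []
    for depth in policy:
        hits = results_within(graph, source, depth, has_result)
        if len(hits) >= needed:
            return depth, hits  # satisfied: no deeper (more expensive) search needed
    return None, hits  # unsatisfied even at the largest depth

# Hypothetical chain of peers A - B - C - D; peers C and D hold matching files.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
holders = {"C", "D"}
depth, hits = iterative_deepening(graph, "A", [1, 2, 3], 1, lambda n: n in holders)
print(depth, hits)
```

When only one result is needed, the search stops at depth 2 and peer D is never contacted.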

27 Iterative Deepening
(diagram legend: source, forward query, processed query, found result, forward response)

28 Directed BFS
Sends query to a subset of neighbors
Maintains statistics on neighbors
–E.g., ping latency, history of number of results
Chooses subset intelligently (via heuristics), to maximize quality of results
–E.g., neighbors with shortest message queue, since long message queue implies neighbor is saturated/dead

29 Directed BFS
(diagram legend: source, forward query, processed query, found result, forward response)

30 Directed BFS: Heuristics
RAND: (Random)
RES: Returned greatest # results in past
TIME: Had shortest avg. time to satisfaction in past
HOPS: Had smallest avg. # hops for response messages in past
MSG: Sent our client greatest # of messages
QLEN: Shortest message queue
DEG: Highest degree
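The heuristics above all amount to ranking neighbors by one locally kept statistic and forwarding the query only to the top few. A rough sketch for three of them (RES, HOPS, QLEN) follows; the `NeighborStats` record, its field names, the `choose_neighbors` function, and the sample numbers are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

@dataclass
class NeighborStats:
    name: str
    past_results: int   # used by RES: results returned in the past
    avg_hops: float     # used by HOPS: avg. hops of past response messages
    queue_length: int   # used by QLEN: current message queue length

# Each heuristic maps a neighbor to a sort key; smaller keys are preferred.
HEURISTICS = {
    "RES": lambda s: -s.past_results,  # greatest # results first
    "HOPS": lambda s: s.avg_hops,      # smallest avg. # hops first
    "QLEN": lambda s: s.queue_length,  # shortest message queue first
}

def choose_neighbors(stats, heuristic, k):
    """Return the names of the k neighbors ranked best by the given heuristic."""
    ranked = sorted(stats, key=HEURISTICS[heuristic])
    return [s.name for s in ranked[:k]]

# Made-up statistics for three neighbors.
stats = [
    NeighborStats("n1", past_results=40, avg_hops=3.0, queue_length=9),
    NeighborStats("n2", past_results=5, avg_hops=1.5, queue_length=2),
    NeighborStats("n3", past_results=12, avg_hops=2.0, queue_length=0),
]
print(choose_neighbors(stats, "RES", 1))
print(choose_neighbors(stats, "QLEN", 2))
```

Different heuristics can pick different neighbors from the same statistics, which is why the slides compare them against each other.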

31 Local Indices
Each node maintains index over other nodes’ collections
–r is the radius of the index
–Index covers all nodes within r hops away
Can process query at fewer nodes, but get just as many results back
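A small sketch of how a local index of radius r could be built. The names `nodes_within` and `build_local_index`, the toy overlay, and the file titles are assumptions for illustration; how peers actually exchange and maintain index entries is not covered by this slide and is not modeled here.

```python
from collections import deque

def nodes_within(graph, start, radius):
    """Return every node at most `radius` hops from `start` (including `start`)."""
    dist = {start: 0}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        if dist[node] == radius:
            continue  # do not expand past the index radius
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                frontier.append(nbr)
    return set(dist)

def build_local_index(graph, collections, node, radius):
    """Map each title to the set of peers within `radius` hops of `node` that hold it."""
    index = {}
    for peer in nodes_within(graph, node, radius):
        for title in collections[peer]:
            index.setdefault(title, set()).add(peer)
    return index

# Hypothetical chain of peers A - B - C - D and their file collections.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
collections = {"A": [], "B": ["x"], "C": ["x", "y"], "D": ["z"]}
index = build_local_index(graph, collections, "B", radius=1)
print(sorted(index))
```

With r=1, peer B can answer queries on behalf of A and C as well as itself, so a query processed at B does not also need to be processed at those neighbors.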

32 Local Indices (r=1)
(diagram legend: source, forward query, processed query, found result, forward response; node label text not recoverable)

33 Evaluation
Goal: realistic evaluation of techniques
Cannot directly evaluate techniques in a real environment
Simulation of large-scale distributed systems is hard
Use Gnutella as a “laboratory” for gathering data
Use analysis driven by query “traces” to project cost

34 Passive Observation
1. Statistics
–Size of collection
–% redundant messages
2. Sample queries (Q_rep)
(diagram labels: Gnutella Network, Pong, Query)

35 Gathering Data
(diagram labels: Query (Q_rep); Ping; # hops traveled; IP address; Timestamp; Individual result records; # hops traveled; IP address)

36 Gathering Data
For each query Q:
L(Q): Length of query string
M(Q,n): # response messages from n hops away
R(Q,n): # results from n hops away
S(Q,n,Z): True if >= Z results received from n hops away
T(Q,Z,W,P): Time to satisfaction
N(Q,n): # nodes n hops away
C(Q,n): # redundant edges n hops away

37 Example: Trace-driven Cost Projection
(diagram legend: source, forward query, processed query, found result, forward response)

38 Example: Calculating Message Size
Use the Gnutella protocol, trace data
e.g., Query message consists of:
–Gnutella header (22 B)
–Options field (2 B)
–Query string (L(Q))
–TCP/IP and Ethernet headers (58 B)
Total size of Query message for query Q: 82 + L(Q) bytes
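The arithmetic on this slide can be written out directly. The constant and function names below are made up for the example, and it assumes L(Q) is the byte length of the query string; the field sizes are the ones given on the slide.

```python
# Field sizes in bytes, as listed on the slide.
GNUTELLA_HEADER = 22
OPTIONS_FIELD = 2
TCPIP_ETHERNET_HEADERS = 58

def query_message_size(query_string: str) -> int:
    """Total size in bytes of a Query message: 82 + L(Q)."""
    fixed = GNUTELLA_HEADER + OPTIONS_FIELD + TCPIP_ETHERNET_HEADERS  # 82 bytes
    return fixed + len(query_string.encode("utf-8"))  # assumption: L(Q) counted in bytes

print(query_message_size("free jazz"))  # 82 + 9 = 91
```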

39 Calculating Cost
We know the sizes of each type of message
We know # messages sent, for each type of message, for query Q
Put together: aggregate bandwidth for Q
Similar process to compute aggregate processing power
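Put together, the aggregate bandwidth for a query is a sum over message types of (number of messages) times (message size). A minimal sketch follows; the function name, the message counts, and the response size are invented for illustration and are not taken from the trace data.

```python
def aggregate_bandwidth(message_counts, message_sizes):
    """Sum count * size over every message type sent for one query."""
    return sum(count * message_sizes[kind] for kind, count in message_counts.items())

# Hypothetical numbers for a single query Q with a 9-byte query string.
message_sizes = {"query": 82 + 9, "response": 200}  # bytes; the response size is made up
message_counts = {"query": 1000, "response": 30}    # made-up counts for illustration
total = aggregate_bandwidth(message_counts, message_sizes)
print(total)  # 1000 * 91 + 30 * 200 = 97000 bytes
```

The same shape of calculation applies to aggregate processing power, with per-message processing cost in place of message size.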

40 Overall Comparison
(chart: compares BFS (B), Iterative Deepening with d=5, W=6 (I), Directed BFS with >RES (D), and Local Indices with r=1 (L); axis labels: Time to Satisfy, Prob. of Satisfying, # results, Bandwidth Cost; plotted values not recoverable)

41 Summary: Efficient Search
What we’ve done:
–Proposed techniques to improve performance (kept simple)
–Evaluated techniques using extensive real data
–Improved performance, with tradeoffs
Open issues:
–More efficient!
–Make intelligent use of topology, replication
–Take advantage of heterogeneity (e.g., super-peers)

