1
Open Problems in Data-Sharing Peer-to-Peer Systems
Neil Daswani, Hector Garcia-Molina, Beverly Yang
2
Peer-To-Peer Systems
- Autonomous, large-scale, decentralized systems
- A large pool of resources
  - Files, compute cycles
- Open performance and security challenges
3
Research problems
- Search
  - Efficiency
  - Expressiveness
  - Quality of Service
- Security
  - Availability
  - Authenticity
  - Anonymity
  - Access Control
4
Search Mechanism
- Submit queries and receive results
  - Keywords, SQL statements
- Defines the behavior of peers
  - Topology: how peers are connected to each other
  - Data placement: how data is distributed across the peers
  - Message routing: how messages are propagated
5
System Requirements
- Expressiveness
  - Query language should provide detailed description
  - Key lookups not expressive enough
- Comprehensiveness
  - Single result not sufficient for some systems
  - All results required in some cases
- Autonomy
  - Nodes should control their organization
6
Goals of Search Mechanism
- Maximize efficiency
  - Light overhead, higher throughput
- Maximize Quality of Service
  - Number of results
  - Response time
- Robustness
  - Stability in presence of failures
7
Expressiveness (1/2)
- Key lookup
- Keyword queries
  - Partial search
  - Efficient for certain types of file, e.g. music
- Ranked keyword
  - Rank the results of keyword queries
  - Global statistics required
  - Collection and maintenance challenging
  - “top k” results
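To illustrate why ranked keyword search needs global statistics, here is a minimal sketch, not the method of any particular system. It assumes a TF-IDF style score and a single process that can see every peer's documents; the names `document_frequencies` and `top_k` are hypothetical. In a real P2P system the document-frequency table would have to be collected and kept fresh across peers, which is the hard part the slide points to.

```python
import heapq
import math
from collections import Counter


def document_frequencies(peers):
    """Count, across all peers, how many documents contain each term.

    `peers` maps a peer name to a dict of {document name: text}.
    This is the "global statistic" that ranking depends on.
    """
    df = Counter()
    total = 0
    for docs in peers.values():
        for text in docs.values():
            df.update(set(text.lower().split()))
            total += 1
    return df, total


def top_k(peers, query, k):
    """Return the k best (score, peer, document) triples for a keyword query."""
    df, total = document_frequencies(peers)
    terms = query.lower().split()
    scored = []
    for peer, docs in peers.items():
        for name, text in docs.items():
            tf = Counter(text.lower().split())
            # Term frequency weighted by inverse document frequency.
            score = sum(tf[t] * math.log(total / df[t]) for t in terms if df[t])
            if score > 0:
                scored.append((score, peer, name))
    return heapq.nlargest(k, scored)
```

Each peer could compute the term frequencies locally, but the `total / df[t]` factor depends on every other peer's collection, so a stale or partial view changes the ranking.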
8
Expressiveness (2/2)
- Aggregates
  - SUM, COUNT, MAX and MEDIAN
  - E.g. COUNT nodes belonging to forth.gr domain
- SQL
  - The most difficult query language to support
  - Performance “hotspots” (PIER system)
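One way to see why some aggregates are easier than others: SUM, COUNT and MAX can be computed from small per-node summaries that are merged along the way, while MEDIAN cannot be recovered from such summaries alone. The sketch below assumes each node holds a list of numbers; `local_summary` and `merge` are hypothetical helper names.

```python
def local_summary(values):
    """Summary a single node can compute from its own data."""
    return {
        "count": len(values),
        "sum": sum(values),
        "max": max(values, default=None),
    }


def merge(a, b):
    """Combine two summaries without looking at the underlying values."""
    maxes = [m for m in (a["max"], b["max"]) if m is not None]
    return {
        "count": a["count"] + b["count"],
        "sum": a["sum"] + b["sum"],
        "max": max(maxes, default=None),
    }
```

Because `merge` is associative, summaries can be combined in any order as results flow back through the network. A median has no such fixed-size summary, so it needs either the full data or an approximation.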
9
Autonomy / Efficiency / Robustness
- Correlation between autonomy and efficiency
- Locate data with bounded cost (Chord)
- Small sets of nodes guaranteed to hold the answer
- Increased chance of finding results on random node
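The link between reduced autonomy and bounded lookup cost can be sketched with consistent hashing, the placement rule that Chord builds on. This is a simplified illustration, not Chord itself: it keeps the whole ring in one sorted list and omits the finger tables that give Chord its logarithmic routing. The class name `Ring`, the 16-bit identifier space and the node names are assumptions made for the example.

```python
import bisect
import hashlib

RING_BITS = 16  # small identifier space, chosen only for illustration


def ring_id(name):
    """Map a node name or key to a position on the identifier ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


class Ring:
    def __init__(self, node_names):
        # Nodes sorted by their position on the ring.
        self.nodes = sorted((ring_id(n), n) for n in node_names)
        self.ids = [i for i, _ in self.nodes]

    def successor(self, key):
        """Return the node responsible for `key`: the first node at or
        after the key's position, wrapping around the ring."""
        idx = bisect.bisect_left(self.ids, ring_id(key))
        return self.nodes[idx % len(self.nodes)][1]
```

The node does not choose which keys it stores; the hash function does. That loss of autonomy is what lets any peer compute where a key must live instead of searching for it.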
10
Tuning the autonomy / efficiency tradeoff
- Varying needs
  - E.g. sensitive files should remain on the intranet
  - Different systems for different purposes not always desirable
- SkipNet
  - Specify a range of peers on which a document can be stored
  - Single peer range: high autonomy
  - All peers range: traditional P2P system
11
Autonomy and Robustness
- Viceroy network construction
  - Low level of autonomy
  - Reduced cost of maintaining structure => increased robustness and efficiency
- Distributed hash tables
  - Logarithmic maintenance cost
- Super-peer redundancy
  - Stricter topology => decreased autonomy => greater robustness
12
Quality of Service
- Number of results
  - Tradeoff between number of results and cost
- BFS technique
  - Send messages to “productive” nodes
  - Depends on ad-hoc topology
- Concept-clustering
  - Communicate according to “interest”
- “Satisfaction”
  - True when a threshold of results found
  - Important to partial-search systems
  - Cost can be drastically reduced
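The effect of a satisfaction threshold on cost can be sketched with a breadth-first search that stops as soon as enough results are found. This is a centralized simulation under stated assumptions: the overlay is an adjacency dict, matching is a substring test, and a global view decides when to stop, which a real decentralized system would only approximate. The function name `search` and its parameters are hypothetical.

```python
from collections import deque


def search(graph, files, start, keyword, threshold, max_hops):
    """Breadth-first query propagation with early termination.

    Returns (results, messages), where messages counts query forwards.
    """
    results = []
    messages = 0
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        results.extend((node, f) for f in files.get(node, []) if keyword in f)
        if len(results) >= threshold:
            break  # the query is "satisfied"; stop spending messages
        if hops == max_hops:
            continue  # hop limit reached on this branch
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                messages += 1
                queue.append((neighbor, hops + 1))
    return results, messages
```

Running the same query with a lower threshold returns fewer results but sends fewer messages, which is the tradeoff between number of results and cost named on the slide.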
13
Security
- Availability
  - Bandwidth, CPU and file availability
- File Authenticity
  - Which responses are authentic?
- Anonymity
  - How can we hide our identity?
- Access Control
  - Restrict accessibility
14
Availability
- Nodes should always be up
- DoS attacks
  - Flooding a node with messages
  - Malicious super-nodes in Gnutella
    - Claims that the victim has all files requested
- Attack CPU availability
  - Sending complex queries
- Attack file storage
  - Submit bogus documents
- Attack quality-of-service
  - Serve a file slowly
  - Send a different file
15
Countermeasures
- Careful design of P2P protocols
  - Gnutella is loosely constrained
  - Back-door communication channels are prohibited
- Techniques for detecting failures
  - High message overhead, complexity
  - Assume pairwise connectivity
- Allocate storage proportionally to what a node contributes
- Hash trees to ensure a node is sending the correct data and at a reasonable rate
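The hash-tree countermeasure can be sketched as a Merkle tree: a downloader who knows the root hash can check each block as it arrives, so a node sending a different file is caught after one block instead of after the whole transfer. The sketch below covers only the correctness check, not the transfer-rate check. It assumes SHA-256, a non-empty block list, and duplication of the last hash on odd levels; the function names are hypothetical.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).digest()


def _parents(level):
    """Hash adjacent pairs to build the next level up."""
    return [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(blocks):
    """Root hash of a non-empty list of byte blocks."""
    level = [_h(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate last hash on odd levels
        level = _parents(level)
    return level[0]


def merkle_proof(blocks, index):
    """Sibling hashes needed to verify blocks[index] against the root."""
    level = [_h(b) for b in blocks]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = _parents(level)
        index //= 2
    return proof


def verify_block(block, proof, root):
    """Recompute the path from one block up to the root."""
    digest = _h(block)
    for sibling, sibling_is_left in proof:
        digest = _h(sibling + digest) if sibling_is_left else _h(digest + sibling)
    return digest == root
```

The proof for one block has one hash per tree level, so checking a block costs a logarithmic number of hashes in the number of blocks.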
16
Security
- Availability
  - Bandwidth, CPU and file availability
- File Authenticity
  - Which responses are authentic?
- Anonymity
  - How can we hide our identity?
- Access Control
  - Restrict accessibility
17
File Authenticity
- Different from file integrity
  - CRC, hashing, MACs, digital signatures
- Given a query, the authentic response has to be distinguished
- What does “authentic” mean?
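The difference between integrity and authenticity can be sketched with Python's standard `hashlib` and `hmac` modules. The helper names, the shared key and the sample file contents are assumptions for illustration. Both files below pass their own integrity checks, yet nothing in those checks says which one is the authentic answer to a query.

```python
import hashlib
import hmac


def digest(data):
    """Plain hash: detects accidental or malicious modification."""
    return hashlib.sha256(data).hexdigest()


def tag(key, data):
    """MAC: detects modification by anyone who lacks the shared key."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_tag(key, data, expected):
    """Constant-time comparison of the recomputed tag with the expected one."""
    return hmac.compare_digest(tag(key, data), expected)


# Two different responses to the same query, each internally consistent.
original = b"text of the original document"
forgery = b"text of a look-alike document"
key = b"hypothetical shared key"

checks = [
    digest(original) == digest(original),
    digest(forgery) == digest(forgery),
    verify_tag(key, original, tag(key, original)),
    verify_tag(key, forgery, tag(key, forgery)),
]
```

Every entry in `checks` is true, which is why the slide treats authenticity as a separate question: some external notion (age, experts, votes, reputation) has to say which file is the right one.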
18
Definition of “authentic”
- Oldest document
  - The oldest submission is considered authentic
  - Timestamping systems
- Expert-based
  - Authoritative nodes keep track of signatures
  - Susceptible to failures
  - Offline digital signature schemes
- Voting-based
  - Votes of many experts
  - Experts may be humans
  - Spoofing of votes, nodes and files
- Reputation-based
  - Weight votes, some experts more trustworthy
  - Maintenance, update and propagation of weights
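The step from voting-based to reputation-based authenticity can be sketched as a weighted tally. This is a minimal illustration, assuming each expert votes for one candidate (for example a file hash) and that reputation weights are already known; how those weights are maintained and propagated is the open problem and is not modeled here. The function name and the sample experts are hypothetical.

```python
from collections import defaultdict


def authentic_candidate(votes, reputation, default_weight=0.0):
    """Return the candidate with the highest reputation-weighted vote.

    `votes` maps expert -> candidate; `reputation` maps expert -> weight.
    Experts with no known reputation get `default_weight`.
    Ties go to the candidate that was voted for first.
    """
    totals = defaultdict(float)
    for expert, candidate in votes.items():
        totals[candidate] += reputation.get(expert, default_weight)
    if not totals:
        return None
    return max(totals, key=totals.get)
```

With equal weights this reduces to plain majority voting. Giving unknown experts a default weight of zero means that freshly spoofed identities cannot outvote established ones.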
19
Security
- Availability
  - Bandwidth, CPU and file availability
- File Authenticity
  - Which responses are authentic?
- Anonymity
  - How can we hide our identity?
- Access Control
  - Restrict accessibility
20
Anonymity (1/2)
- Illegal trade of files vs. censorship resistance, freedom of speech, privacy protection
- Types of anonymity
  - Author: which users created which documents
  - Server: which nodes store a given document
  - Reader: which users access which documents
  - Document: which documents are stored at a given node
- Anonymity vs. efficiency
- Free Haven provides server anonymity, Freenet provides author anonymity
21
Anonymity (2/2)
- Achieve server anonymity through intermediate nodes
  - Forwarding proxies
  - Servers identified by nicknames
- Degradation of anonymity protocols under attacks
  - Problem of collusion
- Free Haven and Crowds use forwarding proxies
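How forwarding proxies hide an endpoint can be sketched with a random forwarding rule in the style of Crowds: the initiator always hands the request to a randomly chosen member, and each member then flips a biased coin to decide whether to forward again or deliver. This is a simplified simulation, not the full protocol; the function name, its parameters and the member names are assumptions, and `p_forward` must be below 1 for the path to end.

```python
import random


def forwarding_path(members, initiator, p_forward, rng):
    """Simulate the chain of proxies a single request passes through.

    The last element of the returned path is the member that finally
    contacts the destination on the initiator's behalf.
    """
    path = [initiator, rng.choice(members)]
    while rng.random() < p_forward:
        # The current proxy cannot tell whether its predecessor
        # originated the request or merely forwarded it.
        path.append(rng.choice(members))
    return path


members = ["m1", "m2", "m3", "m4"]
path = forwarding_path(members, "m1", 0.75, random.Random(7))
```

A higher `p_forward` gives longer paths and more cover at the price of latency, which is the anonymity versus efficiency tension. Colluding members that sit on the same path can compare notes, which is the degradation under attack noted above.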
22
Security
- Availability
  - Bandwidth, CPU and file availability
- File Authenticity
  - Which responses are authentic?
- Anonymity
  - How can we hide our identity?
- Access Control
  - Restrict accessibility
23
Access Control
- Restrict accessibility to documents
- P2P systems cannot enforce copyright laws
  - Violation of copyright laws by users
  - Lawsuits against companies that build P2P systems
- Limited utilization vs. free distribution