Security and Trust in P2P Systems

What Is Trust?
- When thinking about security in a system, various entities need to “trust” others to varying degrees
- So, what is trust?
  - Trust is a bet about the future contingent actions of others

Trust and Security
- Direct validation
  - I need to know whether I can “trust” another entity within this system
  - Authentication
- Indirect validation
  - Should I trust “Alice” because my friend Bob trusts her?
  - Recommendation
  - Reputation

Trust and Security
- The “perfect” P2P system
  - A system with a perfectly flat hierarchy, in which each entity allows other entities to use its local resources
- How can we provide security without a centralized entity?

Malicious Nodes
- A malicious node might give erroneous responses to a request
  - Application level: returning false data
  - Network level: returning false routes
- Malicious nodes may work together, acting in concert, to attack the remainder of the nodes

Issues
- Identification
- Routing table risk
  - Victim data
  - Victim peer
- Content verification
- Punishment

Identification
- Identity
  - It may be undesirable to know the identity of other entities
    - Privacy
    - Anonymity
  - However, if you wish to trust entity A, you need to be able to identify it

Identification
- Public key infrastructures (PKI)
  - Must be run by somebody!
  - For a PKI to work in this sort of situation, you need a trusted third party
- Recommendation systems
  - Chains of trust
  - Transitive trust

Identification
- When trust must be transitive, it creates brittleness
- In most P2P systems, transitive trust is a key component
- How to measure “reputation”?
  - Roles
  - Time related

Secure Routing in P2P Systems
- The secure routing primitive ensures that when a non-faulty node sends a message to a key k, the message reaches all non-faulty members of the set of replica roots R_k with very high probability
- Secure routing guarantees that replicas are initially placed on legitimate replica roots, and that a lookup message reaches a replica if one exists

Three Problems
- Securely assigning nodeIds to nodes
  - Ensure that attackers cannot choose the value of nodeIds
- Securely maintaining the routing tables
  - Ensure that the fraction of faulty nodes that appear in the routing tables of correct nodes does not exceed the fraction of faulty nodes in the entire overlay
- Securely forwarding messages
  - Ensure that at least one copy of a message sent to a key reaches each correct replica root for the key with high probability

Secure nodeId Assignment
- A node might choose its identifier maliciously
  - Allocate itself a collection of nodeIds closer to a target document’s key than any existing node in the system (victim item)
    - This lets it censor a specific document
  - Choose nodeIds to maximize its chances of appearing in a victim node’s routing table (victim peer)

Secure nodeId Assignment
- Centralized authority
  - The server is consulted only when new nodes join and is otherwise uninvolved in the actions of the P2P system
- Sybil attacks
  - A coalition of nodes might try to obtain a large number of nodeIds
  - Even if those nodeIds are random, a large enough collection of them would still give the attackers disproportionate control over the network
- Moderate the rate at which nodeIds are given out
  - By charging money?
  - By requiring a small puzzle to be solved?

Robust Routing Primitives
- If an attacker controls a fraction f of the nodes in the P2P network, we would expect each entry in every routing table to point to a malicious node with probability f
- If a desired route takes h hops
  - The probability that the route is free of malicious nodes is (1 - f)^h
- How about Chord with 2^m nodes?

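The (1 - f)^h estimate can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the Chord hop count uses the commonly cited average of about m/2 hops for a ring of 2^m nodes, which is an assumption not stated on the slide.

```python
def clean_route_probability(f: float, hops: float) -> float:
    """Probability that every hop of a route avoids malicious nodes,
    assuming each routing-table entry is malicious independently
    with probability f."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a fraction between 0 and 1")
    return (1.0 - f) ** hops


def chord_average_hops(m: int) -> float:
    """Assumed average lookup length in a Chord ring of 2^m nodes."""
    return m / 2


if __name__ == "__main__":
    # Example: 10% malicious nodes in a ring of 2^10 nodes.
    print(clean_route_probability(0.1, chord_average_hops(10)))
```

Under these assumptions, a 10% attacker already spoils roughly four out of ten lookups in a 1024-node ring.
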
Robust Routing Primitives
- Attempt multiple, redundant routes from the source to the destination
  - Costly
  - How to determine “not found”?

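The cost of redundancy can be estimated with a small calculation. This is a sketch under a strong assumption that the slide does not make: the redundant routes fail independently of one another. All function names are hypothetical.

```python
def clean_route_probability(f: float, hops: float) -> float:
    """Probability that a single h-hop route avoids malicious nodes."""
    return (1.0 - f) ** hops


def any_route_clean(f: float, hops: float, routes: int) -> float:
    """Probability that at least one of `routes` independent routes is clean."""
    p = clean_route_probability(f, hops)
    return 1.0 - (1.0 - p) ** routes


def routes_needed(f: float, hops: float, target: float, limit: int = 1000) -> int:
    """Smallest number of redundant routes that reaches the target probability."""
    for r in range(1, limit + 1):
        if any_route_clean(f, hops, r) >= target:
            return r
    raise ValueError("target not reachable within the route limit")


if __name__ == "__main__":
    print(routes_needed(0.1, 5, 0.99))
```

With f = 0.1 and 5 hops, this model needs 6 redundant routes to reach a 99% chance of at least one clean route, which illustrates why the approach is called costly.
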
Content Verification
- An adversary may spoof the results
- Verification can be done if we have verification codes
- Solve it with Google’s PageRank technology?
  - Pages that are linked from “popular” pages are themselves more popular
  - How to add such a notion of popularity to a P2P system?

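One possible reading of "verification codes" is a cryptographic digest of the content obtained from a source the requester already trusts. The sketch below assumes that reading; SHA-256 and the function names are illustrative choices, not something the slides specify.

```python
import hashlib
import hmac


def verification_code(content: bytes) -> str:
    """Digest published by a trusted source (assumed to be SHA-256 here)."""
    return hashlib.sha256(content).hexdigest()


def verify_content(content: bytes, expected_code: str) -> bool:
    """Check data returned by an untrusted peer against the trusted code."""
    return hmac.compare_digest(verification_code(content), expected_code)


if __name__ == "__main__":
    code = verification_code(b"original document")
    print(verify_content(b"original document", code))   # genuine reply
    print(verify_content(b"spoofed document", code))    # spoofed reply
```

This only moves the trust problem: the requester still has to obtain the code from somewhere it trusts.
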
Punishment
- Remove malicious nodes when they are detected
- How to detect malicious nodes?
- Can there be an entity with a global view that can punish the misbehaving nodes?

Sybil Attack
- Named after “Sybil” (1973) by Flora Rheta Schreiber
- An attacker creates multiple identities to control a large portion of the network

Identity Validation
- John R. Douceur, The Sybil Attack, in Proceedings of the 1st International Workshop on Peer-to-Peer Systems (IPTPS), 2002
- How does an entity know that two identities come from different entities?
- Four lemmas “prove” that Sybil attacks are always possible without a centralized authority
  - Direct validation (Lemmas 1 & 2)
  - Indirect validation (Lemmas 3 & 4)

Lemma 1
- Because entities are heterogeneous in terms of capabilities, a malicious entity can create several “minimal” identities
- This gives a lower bound on the number of identities

Lemma 2
- Each correct entity must simultaneously validate all the identities it is presented; otherwise, a faulty entity can counterfeit an unbounded number of identities
- Simultaneous identity verification is not practical

Lemma 3
- If a certain number of identities must vouch for a new identity before it is accepted, then a set of compromised identities can create any number of new fake identities
- A sufficiently large set of faulty entities can counterfeit an unbounded number of identities

Lemma 4
- All entities in the system must perform their identity validations concurrently; otherwise, a faulty entity can counterfeit a constant number of multiple identities
- Again, simultaneous validation is difficult in real-world networks

Overview
- Conclusion
  - Networks require a centralized authority to validate network identities
  - Without one, Sybil attacks are always a possibility

Mission
- If a Sybil attack is hard to avoid, can we limit it?
- Idea
  - Moderate the rate at which nodeIds are given out
    - By charging money?
    - By requiring a small puzzle to be solved?

Admission Control System (ACS)
- Properties
  - Security
    - Provide resiliency against Sybil attacks
  - Efficiency
    - Should be simple and should not impose much overhead on participating nodes
  - Fairness
    - Nodes should do an equal amount of work to join the network
  - Response to attack
    - Make the attack more difficult while not affecting legitimate nodes
  - Scalability

Decentralized, Multi-Puzzle Scheme
- It is important that the upper-layer nodes are both static and trustworthy
- A joining node A must gain admission from a sequence of nodes, starting with leaf node B and ending with root X
- At each stage, A is required to solve a puzzle presented by the node at that stage (B at the first stage)

Join Protocol
- Get token
  - When A wishes to join the network, it must first discover a leaf node B
  - A gains admission from B by solving B’s puzzle
  - After solving the puzzle, A is given a token, which it uses to prove to B’s parent that it was admitted by B
  - At each stage, A is given a token to be used as proof of the previous puzzle solution
  - When A reaches the root, X issues a token in the final format
    - A’s signature

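The leaf-to-root token chain can be sketched as follows. This is a minimal illustration, not the protocol's actual puzzle or token format: the hash-based puzzle, the HMAC tokens, the assumption that a parent can recompute its child's tokens, and every name (`AcsNode`, `join`, and so on) are hypothetical.

```python
import hashlib
import hmac
import os


def check_puzzle(challenge: bytes, nonce: int, bits: int) -> bool:
    """True if SHA-256(challenge || nonce) starts with `bits` zero bits."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits) == 0


def solve_puzzle(challenge: bytes, bits: int) -> int:
    """Brute-force search performed by the joining node A."""
    nonce = 0
    while not check_puzzle(challenge, nonce, bits):
        nonce += 1
    return nonce


class AcsNode:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.key = os.urandom(16)  # hypothetical per-node token key
        self.pending = {}          # joiner id -> outstanding challenge

    def issue_token(self, joiner_id: bytes) -> bytes:
        return hmac.new(self.key, joiner_id, hashlib.sha256).digest()

    def challenge(self, joiner_id, child=None, child_token=None) -> bytes:
        # Every node above the leaf first checks the previous stage's token.
        # Assumption: a parent is able to recompute its child's tokens.
        if child is not None:
            ok = child_token is not None and hmac.compare_digest(
                child.issue_token(joiner_id), child_token)
            if not ok:
                raise PermissionError("invalid token from previous stage")
        self.pending[joiner_id] = os.urandom(16)
        return self.pending[joiner_id]

    def grant(self, joiner_id, nonce: int, bits: int) -> bytes:
        challenge = self.pending.pop(joiner_id)
        if not check_puzzle(challenge, nonce, bits):
            raise PermissionError("wrong puzzle solution")
        return self.issue_token(joiner_id)


def join(leaf: AcsNode, joiner_id: bytes, bits: int = 8) -> bytes:
    """Walk from the leaf to the root, solving one puzzle per stage."""
    node, child, token = leaf, None, None
    while node is not None:
        challenge = node.challenge(joiner_id, child, token)
        nonce = solve_puzzle(challenge, bits)       # work done by A
        token = node.grant(joiner_id, nonce, bits)  # proof for the next stage
        child, node = node, node.parent
    return token  # token issued by the root


if __name__ == "__main__":
    root = AcsNode("X")
    leaf = AcsNode("B", parent=AcsNode("P", parent=root))
    print(join(leaf, b"A").hex())
```

The total work for A grows with the depth of the tree, while each ACS node only verifies one solution and one token per join.
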
Connect to the Network
- A must prove to its prospective neighbors that it has been admitted by the root node X
  - Signature verification is costly
- Each neighboring node requires A to solve one more puzzle
  - These challenges protect the neighbors from a DoS attack

Node Upgrade
- A must prove its stability before inclusion in the ACS
- Initially, A joins the ACS as a leaf node and is evaluated by its parent node
- To maintain a balanced tree
  - A node upgrades nodes only when its number of children has reached the degree of the tree
- When the tree is sufficiently deep to support the join load and achieve the proper security guarantees, no more nodes are added to the ACS

Node Departure
- The node is not a member of the ACS
- The node is a member of the ACS
  - It leaves gracefully
    - The oldest child is chosen to replace the departing node
  - It leaves due to a failure
    - Its children must rejoin the network by
      - Contacting their grandparent
      - Or finding another node in the ACS

Security
- The ACS is designed to limit Sybil attacks, not to prevent them!
- The attacker is a member of the ACS
  - Easily detected by the attacker’s parent, which observes the rate of token requests
- The attacker is not a member of the ACS
  - It tries to control a significant fraction of the nodes
  - The attack is limited by ensuring that only a small number of tokens are released during a period of time

How About Patient Attackers?
- If an attacker is patient enough, it can obtain the number of IDs required to launch a massive attack
- Cut-off window
  - Define a token expiration time, W
  - How to determine the value of W?
    - Limit the number of good users that must execute the rejoin process to a small percentage

Startup
- The basic protocol provides minimal protection during the startup process, when the network has a small number of nodes
  - An attacker can obtain a large percentage of the nodes in a shorter time
- For example, if the network has 36 nodes, an attacker needs to obtain 4 nodes to control 10% of all the nodes (4 of 40)
  - If we assume that it takes 5 minutes to get an ID, the 10% target can be reached in about 20 minutes

Startup (Method 1)
- Make the puzzles at the starting phase very difficult, and then decrease the difficulty linearly as nodes join
- For example, if the initial puzzle takes an average of two hours to solve, then after one node joins the puzzle difficulty is reduced to 1 hour and 50 minutes
- Network initialization time will be high!

Startup (Method 2)
- Define a startup window that affects the joining process for a finite time
- Puzzle difficulty in this scheme decays over time
  - As opposed to Method 1, which reduces the puzzle difficulty as the number of nodes grows
- For example
  - Nodes joining the network at its inception are given puzzles that take two hours to solve
  - Nodes that join five minutes after inception are given puzzles that take 1 hour and 55 minutes to solve
  - This continues until we reach the puzzle difficulty targeted for the normal join process

Startup (Method 2)
- The number of node IDs an attacker may obtain during the startup window depends on
  - The arrival rate of the nodes
  - How much more powerful the attacker is than the average user
- Network initialization time is much shorter than with Method 1

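The two startup schedules can be compared with a small sketch. The step sizes are read off the slide examples (10 minutes less per joined node for Method 1, 5 minutes less per 5 minutes elapsed for Method 2). The linear shape of the Method 2 decay, the 5-minute steady-state target, and the function names are assumptions made for illustration.

```python
def method1_difficulty(nodes_joined: int, initial: float = 120.0,
                       step: float = 10.0, target: float = 5.0) -> float:
    """Puzzle time in minutes, decreasing with the number of joined nodes."""
    return max(target, initial - step * nodes_joined)


def method2_difficulty(elapsed_minutes: float, initial: float = 120.0,
                       decay_per_minute: float = 1.0,
                       target: float = 5.0) -> float:
    """Puzzle time in minutes, decreasing with time since network inception."""
    return max(target, initial - decay_per_minute * elapsed_minutes)


if __name__ == "__main__":
    print(method1_difficulty(1))   # after one node has joined
    print(method2_difficulty(5))   # five minutes after inception
```

Method 1 stays hard until enough nodes have paid the high cost, whereas Method 2 reaches the normal difficulty after a fixed time regardless of how many nodes have joined.
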
Analysis
- Models
  - Legitimate nodes arrive according to a Poisson process with arrival rate λ_g
  - Node lifetime is exponentially distributed with mean µ_g
  - Assume an attacker is equal in computational power to the average user
  - l: joining difficulty (measured in maximum time)

Analysis
- Puzzles and fairness
  - The time to solve a puzzle is uniformly distributed
  - A single puzzle of difficulty l takes l / 2 on average
  - Alternatively, use n puzzles, each of difficulty l / n
- Example
  - Goal: a join that takes at most 5 minutes, with a standard deviation of at most 30 seconds
  - Solution: 9 puzzles, each taking at most 33.3 seconds

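The 9-puzzle example can be reproduced from the variance of a uniform distribution. The sketch assumes each puzzle's solving time is uniform on [0, l/n] and that the puzzles are independent, so the total has standard deviation l / sqrt(12 n). The independence assumption and the function names are not from the slides.

```python
import math


def total_time_stats(l: float, n: int):
    """Return (max time per puzzle, mean total time, std of total time)."""
    per_puzzle_max = l / n
    mean_total = n * per_puzzle_max / 2
    std_total = math.sqrt(n * per_puzzle_max ** 2 / 12)
    return per_puzzle_max, mean_total, std_total


def puzzles_needed(l: float, max_std: float) -> int:
    """Smallest n for which the total solving time has std <= max_std."""
    return math.ceil(l ** 2 / (12 * max_std ** 2))


if __name__ == "__main__":
    # l = 5 minutes = 300 seconds, standard deviation of at most 30 seconds.
    print(puzzles_needed(300.0, 30.0))
    print(total_time_stats(300.0, 9))
```

Under these assumptions a single 300-second puzzle has a standard deviation of about 87 seconds, while splitting it into 9 puzzles brings it just under 30 seconds, which is what makes the join cost fairer across nodes.
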
Analysis
- Steady state
  - The number of nodes in the network: N = λ_g * µ_g
  - To control a fraction f of the nodes, an attacker must obtain (f / (1 - f)) * N IDs
  - Assume there are n attackers
    - The arrival rate of attacker nodes is λ_a = n / l
  - The time to launch a successful attack: T = (f / (1 - f)) * N / λ_a = f * N * l / ((1 - f) * n)

Analysis
- Example
  - If λ_g = 1 node/sec and µ_g = 2.3 hours, the steady-state number of nodes is N = 8280
  - To control 10% of the total nodes in the network, the attacker must obtain 920 IDs
  - If the joining process takes 5 minutes on average, a successful attack would take about 76 hours, which is more than 3 days

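The numbers in this example follow from the steady-state relations N = λ_g * µ_g and (f / (1 - f)) * N. The sketch below recomputes them, treating the 5-minute join time as the time per ID for a single attacker. The function names are hypothetical.

```python
def steady_state_nodes(arrival_rate_per_sec: float, mean_lifetime_sec: float) -> float:
    """N = arrival rate * mean lifetime."""
    return arrival_rate_per_sec * mean_lifetime_sec


def ids_needed(f: float, n_nodes: float) -> float:
    """IDs an attacker must hold to make up a fraction f of the network."""
    return f / (1.0 - f) * n_nodes


def attack_time_hours(f: float, n_nodes: float, join_time_min: float,
                      attackers: int = 1) -> float:
    """Time to collect the required IDs at one ID per join time per attacker."""
    return ids_needed(f, n_nodes) * join_time_min / attackers / 60.0


if __name__ == "__main__":
    n = steady_state_nodes(1.0, 2.3 * 3600)
    print(n, ids_needed(0.1, n), attack_time_hours(0.1, n, 5.0))
```

Doubling the number of cooperating attackers halves the attack time in this model, so the join difficulty has to be chosen with the expected attacker strength in mind.
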
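Analysis
- Cut-off window (legitimate nodes)
  - P: the percentage of legitimate nodes that will be required to reacquire fresh tokens
  - With exponentially distributed lifetimes of mean µ_g, this is the share of nodes that outlive the window: P = e^(-W / µ_g)
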
Analysis
- Example
  - If µ_g = 2.3 hours and W = 4 hours, the percentage of legitimate nodes that will be cut off from the network and asked to rejoin is 17.5%

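The 17.5% figure is consistent with exponentially distributed lifetimes: the share of nodes that outlive a window W is e^(-W / µ_g). The sketch below assumes that formula, which is inferred from the lifetime model and the example rather than quoted from the slides. The function names are hypothetical.

```python
import math


def rejoin_fraction(window_hours: float, mean_lifetime_hours: float) -> float:
    """Fraction of legitimate nodes still alive when their token expires."""
    return math.exp(-window_hours / mean_lifetime_hours)


def window_for_fraction(target_fraction: float, mean_lifetime_hours: float) -> float:
    """Window W that keeps the rejoin fraction at the given target."""
    return -mean_lifetime_hours * math.log(target_fraction)


if __name__ == "__main__":
    print(rejoin_fraction(4.0, 2.3))
    print(window_for_fraction(0.05, 2.3))
```

A longer window inconveniences fewer legitimate nodes but gives a patient attacker more time to accumulate IDs.
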
Analysis
- Cut-off window (attackers)
  - The combined number of nodes that n attackers can accumulate is n * W / l
- Example
  - If the average join time is 5 minutes and W = 4 hours, the maximum number of nodes a single attacker can accumulate is 48

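The bound n * W / l is simple enough to compute directly. The sketch also relates it to the 8280-node steady state from the earlier example, which is an illustrative combination not made on the slide. The function names are hypothetical.

```python
def max_attacker_ids(attackers: int, window_min: float, join_time_min: float) -> float:
    """Upper bound on valid IDs held at once: n * W / l."""
    return attackers * window_min / join_time_min


def attacker_fraction(attacker_ids: float, legitimate_nodes: float) -> float:
    """Share of the network the attacker IDs represent."""
    return attacker_ids / (attacker_ids + legitimate_nodes)


if __name__ == "__main__":
    ids = max_attacker_ids(1, 4 * 60, 5)
    print(ids, attacker_fraction(ids, 8280))
```

With a 4-hour window, a single attacker stays well below 1% of an 8280-node network in this model, however long it keeps solving puzzles.
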
Conclusions and Discussions
- What we learned
  - Topologies
    - Centralized P2P systems
      - Search cost is bounded
      - Single point of failure
    - Decentralized P2P systems
      - Unstructured P2P systems
        - Flexible
        - Unbounded search
      - Structured P2P systems
        - Scalability, bounded search
        - Only support keyword queries
      - Super-peer architecture

Conclusions and Discussions
- Search
  - Constraint of hash
  - Dimension reduction and document retrieval
    - Absolute angle
    - Rolling index
  - Locality-preserving hashing
    - iDistance
- Applications
  - BT
    - For efficient downloading
    - Tit for tat
  - Skype
    - Super-peer architecture
- Security
  - ACS

Conclusions and Discussions
- Better topologies?
  - Robustness
  - Scalability
  - Flexibility
  - Bounded search
  - Fairness
  - Etc.

Conclusions and Discussions
- Support general queries?
  - The constraint of hash
  - Similarity search
  - Range query
  - Content-based retrieval
- Trust without a third party?
  - nodeId assignment
  - Routing table management
  - Content management
    - How to decide the score?