Security Considerations for Structured P2P. Peng Wang, 6/04/2003
Outline 1. p2p Networks for file sharing, Napster & Gnutella 2. Chord, Pastry, Tapestry & CAN 3. Security considerations 4. References
P2P Networks for File Sharing. P2P file sharing involves two phases: 1. Locate the peer storing the requested file. 2. Download the requested file from that peer. Architectures: Centralized – Napster; Unstructured decentralized – Gnutella; Structured decentralized – Chord, etc.
Napster. A central server stores an “index table” of (file name, IP address) pairs. The index table contains the location info of all files available within the Napster user community. To retrieve a file, an initiator queries this central server with the name of the desired file and obtains the IP address of the supplier storing it. The file is then downloaded directly from that supplier.
Napster, cont'd (figure: peers and the central napster.com server). Join: upload a file list to the server. Query: ask the centralized server for a file. Download: fetch the file directly from a peer.
Napster, cont'd. Napster uses a p2p communication model for the actual file transfer, but the process of locating a file is still centralized. To decentralize this process: every node stores its own list -- Unstructured; or break the index table into small pieces, with each peer storing one piece -- Structured.
Unstructured: Gnutella (figure: P = a node looking for a file, O = offerer of the file). P sends a Query to its neighbors, which forward it; an offerer with a match returns a QueryHit, and P downloads the file directly from O.
Structured p2p (figure: retrieving the (K, V) pair for a key from a distributed index). How do we find the peer that stores the piece of the index containing the location info of the desired file?
Chord, Pastry, Tapestry & CAN The lookup protocol maps a desired key (hash value of file name) to the IP address of the node responsible for that key. A storage protocol layered on top of the lookup protocol then takes care of storing, replicating, caching, retrieving, and authenticating the files.
Chord. Each node and each key has an m-bit ID; the ID space has 2^m values, which Chord arranges on a circle. IDs are assigned with the SHA-1 hash function: key ID = SHA-1(file name), node ID = SHA-1(IP address). successor(k): the first node whose ID is equal to or follows key ID k on the circle. A file is represented by its (k, IP address) pair, which is stored at successor(k).
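A minimal sketch of the ID assignment and successor rule, assuming a toy m = 8 ring and made-up node and file names (Chord itself uses m = 160 with SHA-1):

```python
import hashlib

M = 8                       # toy number of ID bits; Chord itself uses m = 160
ID_SPACE = 2 ** M

def chord_id(name):
    """Map a name (file name or node's IP address) to an m-bit ID via SHA-1."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % ID_SPACE

def successor(key_id, node_ids):
    """First node whose ID is equal to or follows key_id on the circle."""
    ring = sorted(node_ids)
    for nid in ring:
        if nid >= key_id:
            return nid
    return ring[0]          # wrap around the circle

nodes = [chord_id(ip) for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.3"]]
k = chord_id("song.mp3")
print(f"key {k} is stored at node {successor(k, nodes)}")
```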
An example ID circle (figure): m = 3, so the ID space is 0..7. Three nodes with IDs 0, 1, 3 and three keys with IDs 1, 2, 6. successor(1) = 1, successor(2) = 3, successor(6) = 0.
Finger tables. To route messages, each node n maintains a finger table with m entries, storing nodeIDs and IP addresses. The i-th entry at node n contains the identity of the first node s that succeeds n by at least 2^(i-1) on the ID circle: s = successor(n + 2^(i-1)), 1 ≤ i ≤ m. The distances covered by the fingers increase exponentially. E.g., the finger table of node 1 (m = 3): start 2 = 1 + 2^0, finger = successor(2) = 3; start 3 = 1 + 2^1, finger = successor(3) = 3; start 5 = 1 + 2^2, finger = successor(5) = 0.
finger table example
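A sketch of finger-table construction for the m = 3 example above; the helper names are illustrative and the whole ring is assumed to be known, which a real node is not:

```python
# Finger-table construction on the toy m = 3 ring with nodes 0, 1, 3.
M = 3
ID_SPACE = 2 ** M

def successor(key_id, node_ids):
    """First node whose ID is equal to or follows key_id on the circle."""
    ring = sorted(node_ids)
    for nid in ring:
        if nid >= key_id:
            return nid
    return ring[0]

def finger_table(n, node_ids, m=M):
    """finger[i] = successor(n + 2^(i-1)) for i = 1..m (stored 0-indexed)."""
    return [successor((n + 2 ** (i - 1)) % ID_SPACE, node_ids)
            for i in range(1, m + 1)]

nodes = [0, 1, 3]
print(finger_table(1, nodes))   # [3, 3, 0], matching the slide's example
```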
Routing. Given k, to find the (k, IP address) pair: find the immediate predecessor p of k, with help from other nodes (each hop forwards to the finger closest to, but still preceding, k). p's successor is the successor of k, and that node holds the (k, IP address) pair. E.g., node 1 wants key 6: using its finger table it finds that node 3 precedes 6 and sends the query to node 3; node 3 sees that it is the immediate predecessor of key 6 and forwards to its successor, node 0; node 0 returns (6, IP) to node 1.
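A sketch of the lookup walk on the same toy ring, assuming global knowledge of all finger tables for brevity; the real protocol performs each hop by sending a message to the next node:

```python
# Chord lookup: hop to the closest preceding finger of the key until we reach
# the key's immediate predecessor; its successor holds the (key, IP) pair.
M = 3
ID_SPACE = 2 ** M
NODES = sorted([0, 1, 3])

def successor(key_id):
    return next((n for n in NODES if n >= key_id), NODES[0])

def in_interval(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def fingers(n):
    return [successor((n + 2 ** i) % ID_SPACE) for i in range(M)]

def lookup(start, key):
    """Return (node responsible for key, nodes visited along the way)."""
    n, path = start, [start]
    nxt = successor((n + 1) % ID_SPACE)
    while not in_interval(key, n, nxt):          # n is not yet key's predecessor
        # closest preceding finger: the finger farthest along the circle
        # that still lies strictly between n and key
        preceding = [f for f in fingers(n) if f != key and in_interval(f, n, key)]
        n = max(preceding, key=lambda f: (f - n) % ID_SPACE) if preceding else nxt
        path.append(n)
        nxt = successor((n + 1) % ID_SPACE)
    return nxt, path

print(lookup(1, 6))   # (0, [1, 3]): routed via node 3; node 0 holds (6, IP)
```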
Join. A new node n first finds the IP address of some node n' already in the system (via external mechanisms). It initializes its fingers by asking n' to look up n + 2^(i-1) for 1 ≤ i ≤ m. It then updates the fingers of existing nodes (each node also maintains a predecessor pointer): n becomes the i-th finger of a node p if p precedes n by at least 2^(i-1) and the current i-th finger of p succeeds n. Finally, the keys n is now responsible for are transferred from n's successor to n.
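A sketch of the finger-initialization step of join; it assumes a lookup function shaped like the one in the routing sketch, and the name init_fingers is illustrative:

```python
# The new node n asks an existing node n_prime to look up each finger target
# n + 2^(i-1). `lookup(start, key)` is assumed to return (responsible_node, path).
def init_fingers(n, n_prime, m, id_space, lookup):
    """Build the new node n's finger table via lookups started at n_prime."""
    table = []
    for i in range(1, m + 1):
        target = (n + 2 ** (i - 1)) % id_space
        responsible, _path = lookup(n_prime, target)
        table.append(responsible)
    return table

# e.g. node 6 joining the toy ring {0, 1, 3} via node 1:
# init_fingers(6, n_prime=1, m=3, id_space=8, lookup=lookup) -> [0, 0, 3]
```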
Finger tables after node 6 joins (figure)
Leave. Leaving is the reverse of joining: update the fingers of other nodes, and transfer the departing node's keys to its successor.
Failures and replication. Key step: maintaining correct successor pointers. Each Chord node keeps a successor list of its r immediate successors, so after a failure it still knows its first live successor; correct successors guarantee correct lookups. The (key, IP address) pairs held by the failed node are lost, so the pair databases are replicated, and the supplier (file holder) sends periodic refresh messages to repair stale or lost pairs.
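A sketch of successor-list failover; is_alive stands in for the ping/timeout check a real node would use, and the node IDs are illustrative:

```python
def first_live_successor(successor_list, is_alive):
    """Return the first reachable node in the successor list, or None."""
    for node in successor_list:
        if is_alive(node):
            return node
    return None

# Toy usage: node 1's successor list on an 8-ID ring, with node 3 down.
alive = {0: True, 3: False, 6: True}
print(first_live_successor([3, 6, 0], lambda n: alive[n]))   # 6
```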
Pastry. Each node and each key has an m-bit ID, so the ID space has 2^m values; e.g. m = 128, with MD5 used to assign IDs. Pastry arranges the ID space on a circle and routes a message to the node whose nodeID is numerically closest to the given key. Prefix routing: forward the message to a node whose nodeID shares a longer prefix with the key than the present node's does; if no such node is known, forward to a node that is numerically closer to the key.
Pastry node state. IDs are treated as sequences of digits in base 2^b, e.g. b = 2. The leaf set contains the numerically closest nodes and is similar to Chord's successor list. The routing table has m/b rows with 2^b entries each; the entries in row i refer to nodes whose nodeIDs share the present node's nodeID in the first i digits but whose (i+1)-th digit equals the column number.
Routing. When a message with key D arrives: if D falls in the range of the leaf set, forward it to the numerically closest leaf-set node; otherwise forward it to a routing-table node whose nodeID shares a longer prefix with D; otherwise forward it to any known node that is numerically closer to D. (figure: example of routing a message with a given key toward the numerically closest node)
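A sketch of prefix routing with b = 2; the node IDs are illustrative, and picking the numerically closest node among those sharing a longer prefix is a simplification of Pastry's leaf-set and routing-table selection:

```python
B = 2                       # bits per digit, so digits are base 4
DIGITS = 8                  # ID length in digits (m = 16 in this sketch)
BASE = 2 ** B

def to_digits(x):
    """ID as a list of base-2^b digits, most significant first."""
    return [(x >> (B * (DIGITS - 1 - i))) & (BASE - 1) for i in range(DIGITS)]

def shared_prefix_len(a, b):
    da, db = to_digits(a), to_digits(b)
    n = 0
    while n < DIGITS and da[n] == db[n]:
        n += 1
    return n

def next_hop(current, key, known_nodes):
    """Prefer a node sharing a longer prefix with key; else a numerically closer one."""
    here = shared_prefix_len(current, key)
    better_prefix = [n for n in known_nodes if shared_prefix_len(n, key) > here]
    if better_prefix:
        return min(better_prefix, key=lambda n: abs(n - key))
    closer = [n for n in known_nodes if abs(n - key) < abs(current - key)]
    return min(closer, key=lambda n: abs(n - key)) if closer else current

nodes = [0x1234, 0x1300, 0x2FFF, 0x1203]
print(hex(next_hop(0x2FFF, 0x1230, nodes)))   # 0x1234: longer shared prefix, numerically closest
```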
Join & failure. Join: bootstrap via a known node, initialize the new node's state tables, and inform the other nodes that appear in them. Failure: lazily repair the leaf set and the routing table.
Similarity (figure): Pastry with b = 2 routing on the same circular ID space, 0 to 2^m - 1, as Chord.
Tapestry & CAN. Tapestry is very similar to Pastry. CAN maps nodes and keys into a d-dimensional Cartesian coordinate space and routes greedily through neighboring zones toward the point a key hashes to.
Design improvements. Goals: reduce routing latency and increase system robustness, at the cost of more per-node state and system complexity. E.g. multiple hash functions: use k different hash functions, so a single file is mapped onto k points in the ID space and the corresponding (hash value, IP) pairs are stored at k distinct nodes. This gives k replicas and allows parallel routing, but generates more query traffic.
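A sketch of the multiple-hash-function idea; salting a single SHA-1 with a replica index stands in for k genuinely different hash functions:

```python
import hashlib

M = 160                      # Chord-sized ID space for this sketch

def replica_keys(file_name, k):
    """k IDs for one file, one per 'hash function'."""
    keys = []
    for i in range(k):
        digest = hashlib.sha1(f"{i}:{file_name}".encode()).digest()
        keys.append(int.from_bytes(digest, "big") % (2 ** M))
    return keys

# Each key is handed to the lookup protocol independently; queries can be
# issued for all k keys in parallel and the first answer wins.
print(replica_keys("song.mp3", 3))
```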
Security Considerations: assumptions about attackers; nodeIDs; message forwarding; DoS; rapid joins and leaves; forged routing table updates; inconsistent behavior.
Assumptions about attackers. Attackers are participants who do not follow the protocol correctly: they may provide false information, forge source IP addresses, modify or drop messages passing through them, and conspire with each other; but they cannot overhear or modify direct communication between other nodes.
Attacks related to nodeIDs: the join attack. Where a node sits on the ID circle depends on its nodeID. If an attacker can choose its nodeID, she can control a victim node's access to the p2p network, control other nodes' access to a victim file, and possibly partition the network.
Illustration (figure; legend: victim node, victim file, attacker nodes). Attackers placing themselves at V + 2^(m-1), V + 2^(m-2), V + 2^(m-3), V + 2^(m-4), ... occupy all of victim node V's fingers and so control V's access to the network; attackers positioned just after a victim file F's ID control access to that file. Does the same apply to Pastry?
Secure nodeID generation. Goal: prevent attackers from choosing their joining point at will. Certified nodeIDs use a central, trusted authority (CA): the CA chooses a nodeID randomly from the ID space and signs a nodeID certificate. E.g. the CA chooses a public key randomly for the joining node, and the nodeID is the hash of that public key PK. The CA's own public key can be installed as part of the p2p software.
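A sketch of certified nodeID issuance; the HMAC with a CA-held secret stands in for a real public-key signature (which other nodes would verify with the CA's installed public key), and the certificate fields are illustrative:

```python
import hashlib, hmac, os

CA_SECRET = os.urandom(32)          # held only by the CA in this sketch

def issue_node_id_certificate():
    """CA picks a random 'public key', derives the nodeID, and signs both."""
    node_public_key = os.urandom(32)            # stand-in for a real keypair
    node_id = hashlib.sha1(node_public_key).hexdigest()
    signature = hmac.new(CA_SECRET, node_id.encode() + node_public_key,
                         hashlib.sha256).hexdigest()
    return {"node_id": node_id, "public_key": node_public_key.hex(),
            "signature": signature}

def verify_certificate(cert):
    """Check the binding nodeID = SHA-1(PK) and the CA's signature."""
    pk = bytes.fromhex(cert["public_key"])
    if hashlib.sha1(pk).hexdigest() != cert["node_id"]:
        return False
    expected = hmac.new(CA_SECRET, cert["node_id"].encode() + pk,
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, cert["signature"])

cert = issue_node_id_certificate()
print(verify_certificate(cert))     # True
```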
Sybil attacks. Even if attackers cannot choose nodeIDs or joining points, they may be able to obtain a large number of legitimate nodeIDs easily and so control a large fraction of the nodes. Possible defenses: crypto puzzles? A trusted authority (CA) that guarantees uniqueness by binding each nodeID to a real-world identity, or that slows attackers down with an entrance fee, etc. Can a fully decentralized nodeID assignment scheme solve this problem?
Attacks on message routing. Assume secure nodeID assignment; the attacker is the destination or sits on the routing path. On average a Chord message passes through (log2 N)/2 nodes before reaching its destination. With N = 1,000,000 nodes, (log2 N)/2 ≈ 10; if 10% of the nodes are controlled by attackers, then P(meeting a corrupted node) = 1 - 0.9^10 ≈ 65%. A corrupted node on the path can drop or modify the message, or send a wrong answer back. Countermeasures: detection by checking the responder's ID and signature? Non-deterministic routing that changes the route? Multiple hash functions, i.e. replicas that the attacker cannot all control.
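The slide's arithmetic, worked out; the 10% corrupted fraction is an assumption chosen to match the 65% figure:

```python
import math

N = 1_000_000
f = 0.10                                   # assumed fraction of corrupted nodes
hops = math.log2(N) / 2                    # average Chord path length, ~10
p_hit = 1 - (1 - f) ** hops                # P(at least one corrupted node on the path)
print(f"hops ~ {hops:.1f}, P(meet a corrupted node) ~ {p_hit:.0%}")   # ~65%
```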
DoS. An attacker generates a huge number of query messages so that the victim node cannot serve other nodes. Incoming allocation strategy: assuming secure nodeID assignment, each node keeps a list of the senders of its incoming messages and applies a processor-scheduling strategy over them, e.g. Round-Robin (RR) scheduling, after checking the senders' IDs. Rapid joins and leaves are trivial to prevent with secure nodeID assignment.
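A sketch of a round-robin incoming allocation strategy; the sender IDs and request strings are illustrative values:

```python
from collections import deque

class RoundRobinScheduler:
    """Serve verified senders in round-robin order, one request per turn."""
    def __init__(self):
        self.queues = {}                    # sender ID -> deque of pending requests
        self.order = deque()                # round-robin order of senders

    def enqueue(self, sender_id, request):
        if sender_id not in self.queues:
            self.queues[sender_id] = deque()
            self.order.append(sender_id)
        self.queues[sender_id].append(request)

    def next_request(self):
        for _ in range(len(self.order)):
            sender_id = self.order[0]
            self.order.rotate(-1)           # move this sender to the back
            if self.queues[sender_id]:
                return sender_id, self.queues[sender_id].popleft()
        return None                         # nothing pending

sched = RoundRobinScheduler()
for i in range(1000):                       # attacker floods 1000 queries
    sched.enqueue("attacker", f"query-{i}")
sched.enqueue("honest-node", "lookup key 6")
print(sched.next_request())                 # attacker served once...
print(sched.next_request())                 # ...then the honest node gets its turn
```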
Other attacks. Forged routing table updates? Inconsistent behavior? In any case, secure nodeID generation and multiple hash functions are needed.
References
1. Napster.
2. The Gnutella Protocol Specification v0.4, www9.limewire.com/developer/gnutella_protocol_0.4.pdf.
3. E. Sit and R. Morris. Security Considerations for Peer-to-Peer Distributed Hash Tables. March 2002.
4. John R. Douceur. The Sybil Attack. March 2002.
5. Neil Daswani and Hector Garcia-Molina. Query-Flood DoS Attacks in Gnutella. November 2002.
6. A. Rowstron and P. Druschel. Pastry: Scalable, Distributed Object Location and Routing for Large-Scale Peer-to-Peer Systems. IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), Heidelberg, Germany, November 2001.
7. M. Castro, P. Druschel, A. Ganesh, A. Rowstron, and D. S. Wallach. Security for Structured Peer-to-Peer Overlay Networks. Proceedings of the Fifth Symposium on Operating Systems Design and Implementation (OSDI'02), Boston, MA, December 2002.
8. Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications. ACM SIGCOMM 2001, San Diego, CA, August 2001.
9. Dan S. Wallach. A Survey of Peer-to-Peer Security Issues. International Symposium on Software Security (Tokyo, Japan), November 2002.
10. Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, and Scott Shenker. A Scalable Content-Addressable Network. ACM SIGCOMM 2001.
11. Ben Y. Zhao, John Kubiatowicz, and Anthony D. Joseph. Tapestry: An Infrastructure for Fault-Tolerant Wide-Area Location and Routing. UC Berkeley.