Freenet: A Distributed Anonymous Information Storage and Retrieval System
Ian Clarke, Oskar Sandberg, Brandon Wiley and Theodore Hong
Freenet
P2P network for anonymous publishing and retrieval of data
  – Decentralized
  – Nodes collaborate in storage and routing
  – Data-centric routing
  – Adapts to demand
  – Addresses privacy and availability concerns
Motivation
Problem: querying the network
  – Source: requestor
  – Destination: provider
It’s a distributed search problem
  – Approximating global knowledge with local knowledge
  – Other systems: Chord, Tapestry, Pastry
Privacy and availability: protect authorship, prevent denial attacks
Goals of Freenet
Anonymity for producers and consumers
Deniability for information storers
Resistance to denial attacks
Efficient storing and routing
Does NOT provide
  – Permanent file storage
  – Load balancing
  – Anonymity for general network usage
Architecture
Each node: local data store + routing table
Files are requested through location-independent keys
Routing is a chain of proxy requests; each forwarding decision is local
Graph structure actively evolves over time
Request fields:
  1. Key
  2. Hops to live
  3. ID
  4. Depth
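The four request fields listed on this slide can be sketched as a small record. This is only an illustration: the field names, the 64-bit random ID, and the `forward` helper are assumptions made here, not Freenet's actual wire format.

```python
from dataclasses import dataclass, replace
import secrets


@dataclass(frozen=True)
class Request:
    key: bytes          # location-independent file key
    hops_to_live: int   # remaining forwarding budget
    request_id: int     # lets a node recognise a request it has already seen
    depth: int          # hops travelled so far


def new_request(key: bytes, hops_to_live: int = 10, depth: int = 0) -> Request:
    """Create a request with a fresh random ID (ID width is an assumption)."""
    return Request(key, hops_to_live, secrets.randbits(64), depth)


def forward(req: Request) -> Request:
    """Return the copy of `req` that would be handed to the next node."""
    return replace(req, hops_to_live=req.hops_to_live - 1, depth=req.depth + 1)
```

Keeping the record immutable makes it explicit that each hop passes on a modified copy while the ID stays fixed along the chain.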
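Key-Based Searching
Keyword-signed key (KSK), derived from a short descriptive string ‘D’
  – ‘D’ deterministically generates a key pair (Pb, Pr)
  – KSK = SHA(Pb)
  – Pr signs the file (signature)
  – File is encrypted with ‘D’ as the key: E(FILE, D)
Easy for retrieval: only need ‘D’
Minimal protection against tampering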
Keys and Searching…
Problems with KSK: flat namespace (collisions), key squatting, dictionary attacks
Signed Subspace Key (SSK)
  – Randomly generated key pair identifies the namespace
  – SSK = SHA(SHA(‘D’) XOR SHA(Pb))
  – (-) The owner must advertise the subspace: Pb + ‘D’
  – (+) Owner can construct a hierarchical space of arbitrary depth using indirect files
  – (+) Greatly reduces collisions
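A minimal sketch of the SSK computation, assuming SHA-1 as the hash and treating the public key as an opaque byte string. The outer hash over the XOR follows the description in the Freenet paper; dropping it gives the plain XOR form. The function names are hypothetical.

```python
import hashlib


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def ssk_file_key(description: str, public_key: bytes) -> bytes:
    """Hash the description and the namespace public key separately,
    XOR the two digests, then hash the result again."""
    mixed = xor_bytes(sha1(description.encode("utf-8")), sha1(public_key))
    return sha1(mixed)
```

Because the namespace public key is mixed in, the same descriptive string yields different file keys in different subspaces, which is why collisions become unlikely.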
Keys and Searching…
Problems with SSK: updating, versioning
Content Hash Keys (CHK)
  – Key = hash of the file contents
  – File is encrypted with a random encryption key
  – Publish CHK + decryption key
  – CHK + SSK gives easily updateable files
Two-step process: publish the file, then publish a pointer
  – The pointer leads to the newest version
  – Older versions stay accessible through their CHKs
  – CHKs can also be used for splitting files
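The two-step update scheme can be sketched with an in-memory dictionary standing in for the network. Encryption is left out, SHA-1 is assumed as the hash, and `store`, `publish_file`, `publish_pointer` and `fetch` are hypothetical names used only for this illustration.

```python
import hashlib

# Stand-in for the network: maps a key (CHK or SSK) to stored bytes.
store: dict[bytes, bytes] = {}


def content_hash_key(data: bytes) -> bytes:
    """CHK: the hash of the file contents."""
    return hashlib.sha1(data).digest()


def publish_file(data: bytes) -> bytes:
    """Step 1: insert the file under its CHK and return that CHK."""
    chk = content_hash_key(data)
    store[chk] = data
    return chk


def publish_pointer(ssk: bytes, chk: bytes) -> None:
    """Step 2: insert an indirect file under the SSK that points at the CHK."""
    store[ssk] = chk


def fetch(ssk: bytes) -> bytes:
    """Follow the pointer stored under the SSK to the current version."""
    return store[store[ssk]]
```

Publishing a new version means inserting it under its own CHK and re-pointing the SSK; the earlier version remains reachable by anyone who still holds its CHK.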
Retrieving Files
How do you locate the keys?
  – Hypertext spider
  – Indirect files, published under the KSK of search words
  – Published bookmarks
File retrieval
  – Request is forwarded to the node in the routing table (RT) whose key is the closest lexicographic match for the binary key
  – Routing follows steepest-ascent hill climbing: if the first choice fails, backtrack and try the second choice
Still Retrieving…
Timers and hops-to-live curtail request threads
Files are cached all along the retrieval path
Self-reinforcing cycle: nodes develop expertise in particular keys
[Figure: example request sequence among nodes a to f]
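The retrieval behaviour described on these two slides (closest-key forwarding, backtracking on failure, caching along the return path) can be sketched as a toy simulation. Several simplifications are assumptions of this sketch: keys are small integers compared by absolute difference rather than lexicographically, hops-to-live limits chain depth only, and the four-node network is made up.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    store: dict = field(default_factory=dict)    # key -> data
    routes: dict = field(default_factory=dict)   # key -> neighbour name


def fetch(nodes, start, key, hops_to_live):
    """Depth-first search that always tries the closest routing key first.

    Returns (data, source, path) on success, or None on failure.
    """
    visited = set()  # plays the role of per-node request-ID bookkeeping

    def visit(name, htl):
        if htl <= 0 or name in visited:
            return None
        visited.add(name)
        node = nodes[name]
        if key in node.store:
            return node.store[key], name, [name]
        # First choice is the closest key; on failure fall back to the next.
        for k in sorted(node.routes, key=lambda k: abs(k - key)):
            found = visit(node.routes[k], htl - 1)
            if found is not None:
                data, source, path = found
                node.store[key] = data      # cache along the return path
                node.routes[key] = source   # remember who held this key
                return data, source, [name] + path
        return None

    return visit(start, hops_to_live)


nodes = {n: Node(n) for n in "abce"}
nodes["a"].routes = {8: "b"}
nodes["b"].routes = {11: "c", 30: "e"}   # c looks closer but is a dead end
nodes["e"].store = {10: "file"}

first = fetch(nodes, "a", 10, hops_to_live=5)
second = fetch(nodes, "a", 10, hops_to_live=5)
```

In this toy run the first request goes from a to b, tries c, backtracks, and succeeds at e; the second request is answered from a's own cache, which is the self-reinforcing effect the slide refers to.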
Ring Topology
1000 nodes in a ring topology
Datastore = 50 items
RT = 250 items
Keys associated with links are hashes of the destination IPs
Self-Reinforced Routing
Snapshots using 300 requests with hops-to-live = 500
As the network converges, the median path length drops to 6: “six degrees of separation”
Retrieval Discussion
No controlled replication, no persistence
No correlation between keys and content
  – (+) Documents related to a subject are scattered: geographical fault resilience
  – (-) No spatial locality: search latencies can suffer
Indexes must be built by other means
Publishing
Similar to retrieval, but a two-step process
  – Detect collisions: ‘all clear’ if no collision
  – Publish to the node in the RT with the closest key match
Are the collision-detection and publish paths the same?
  – If not, a collision can still occur during the publish step
Inserts allow new nodes to advertise themselves
(+) Key squatting is not effective
Data Management
Finite data stores: nodes resort to LRU eviction
Routing table entries linger after data eviction
Outdated (or unpopular) documents disappear automatically
Bipartite eviction: short-term policy
  – New files replace the most recent files
  – Prevents established files from being evicted by attacks
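A minimal sketch of a finite datastore with LRU eviction, built on `collections.OrderedDict`. The class and method names are hypothetical, and the bipartite short-term policy from the slide is not modelled here.

```python
from collections import OrderedDict


class DataStore:
    """Fixed-capacity store that evicts the least recently used item."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # a hit makes the item "recent" again
        return self.items[key]

    def put(self, key, data):
        self.items[key] = data
        self.items.move_to_end(key)
        while len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used
```

Under this policy a file that nobody requests drifts to the front of the queue and is eventually dropped, which is how unpopular documents disappear without any explicit delete operation.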
Network Growth
New nodes have to know one or more existing nodes
Problem: how to decide consistently which key the new node specializes in?
  – Needs to be a consensus decision, otherwise denial attacks are possible
Advertisement: IP + H(random seed s0)
  – Commitment: H(H(H(s0) ^ H(s1)) ^ H(s2)) …
  – Key for the new node = XOR of all seeds
Each node adds an RT entry for the new node
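The commitment chain and the final key assignment can be sketched as follows, using the formulas exactly as written on this slide. SHA-1 as H and 20-byte seeds are assumptions; the function names are hypothetical.

```python
import hashlib
import secrets
from functools import reduce


def H(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def commitment_chain(seeds):
    """c0 = H(s0); c_i = H(c_{i-1} ^ H(s_i)) for each later seed."""
    commitment = H(seeds[0])
    chain = [commitment]
    for seed in seeds[1:]:
        commitment = H(xor_bytes(commitment, H(seed)))
        chain.append(commitment)
    return chain


def verify_chain(seeds, chain) -> bool:
    """After all seeds are revealed, anyone can recompute the commitments."""
    return commitment_chain(seeds) == chain


def assigned_key(seeds) -> bytes:
    """Key for the new node: XOR of all revealed seeds."""
    return reduce(xor_bytes, seeds)


seeds = [secrets.token_bytes(20) for _ in range(3)]  # s0 from the new node
chain = commitment_chain(seeds)
```

Since every participant commits before any seed is revealed, no single node can steer the XOR of the seeds toward a key of its choosing.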
Network Growth
Key assigned to new nodes = H(IP)
Scales as log(n) until n ~
At 40,000 nodes, RTs are full
Protocol
Nodes with frequently changing IPs use ARKs (address-resolution keys)
Return address specified in requests: a threat?
Messages do not always terminate when hops-to-live reaches 1
Depth is initialized by the original requestor to a small random value
Request state is maintained at each node, bounded by timers and LRU
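The hops-to-live rule on this slide can be sketched as a single update function: above 1 the counter is decremented, and at 1 the message is forwarded again with some probability instead of always stopping. The probability value and the function name are assumptions chosen for illustration.

```python
import random


def next_hops_to_live(htl: int, keep_going_probability: float = 0.25,
                      rng=random) -> int:
    """Return the hops-to-live value for the next hop (0 means stop).

    keep_going_probability is a made-up parameter, not a Freenet constant.
    """
    if htl > 1:
        return htl - 1
    # At 1 the message may be passed on with hops-to-live still 1, so an
    # observer cannot conclude that the previous node is one hop from the end.
    return 1 if rng.random() < keep_going_probability else 0
```

The point of the randomness is that a node receiving a request with hops-to-live 1 learns less about how far away the originator is.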
Fault Resilience
Median path length < 20 at 30% node failures?
Network becomes ineffective at 40% failures?
Small World
Most nodes form local clusters
A few highly connected nodes link the clusters
Power-law distribution of links provides a high degree of fault tolerance
Security Concerns
Pre-routing: messages are encrypted with public keys that determine the pre-routing path
Protecting the data source: using random and probabilistic methods
Security
File integrity: KSKs are vulnerable to dictionary attacks
DoS attacks: Hashcash to slow them down
Attempts to displace valid files are constrained by the insert procedure
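The idea behind using Hashcash against insert floods can be sketched as a small proof-of-work: the inserter must find a nonce whose hash has a given number of leading zero bits, which is costly to produce but cheap to check. This is not the real Hashcash stamp format; the challenge string, the difficulty of 12 bits, and the function names are assumptions.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the front of the digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def stamp_hash(challenge: bytes, nonce: int) -> bytes:
    return hashlib.sha1(challenge + str(nonce).encode("ascii")).digest()


def mint(challenge: bytes, bits: int) -> int:
    """Search for a nonce; expected work is about 2**bits hash evaluations."""
    for nonce in count():
        if leading_zero_bits(stamp_hash(challenge, nonce)) >= bits:
            return nonce


def verify(challenge: bytes, nonce: int, bits: int) -> bool:
    """Checking a stamp costs a single hash evaluation."""
    return leading_zero_bits(stamp_hash(challenge, nonce)) >= bits


nonce = mint(b"insert:example-key", 12)
```

The asymmetry between minting and verifying is what slows an attacker down: each bogus insert costs real computation, while honest nodes verify almost for free.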
Conclusion
Provides a network to anonymously store and request files
Adaptive routing whose efficiency increases with experience
Deals with privacy and data integrity in various scenarios
Applications?
  – Freedom of speech
  – Unaccountable, decentralized Napster