
Tim Benke Supervisors: Josiane Xavier Parreira, Sebastian Michel Bachelor thesis.


1 Tim Benke Supervisors: Josiane Xavier Parreira, Sebastian Michel Bachelor thesis

2 Why P2P-Networks?
- Decentralisation
- No single point of failure
- No content-control
- Distribution of content, computing power, bandwidth

3 Querying in P2P-networks
[Figure: the query "Dirk Nowitzki" is flooded through the network; TTL shown step by step as 4 hops, 3 hops, 2 hops, 1 hop, 0 hops]

4 Idea: Semantic Overlay Networks
- Querying in unstructured P2P-networks
  - message flooding with TimeToLive (TTL)
  - many redundant messages
- Group peers according to their content
- Querying in a Semantic Overlay Network (SON)
  - ask only the nodes of the specific content field
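To illustrate why TTL-limited flooding produces many redundant messages, here is a minimal sketch. The five-peer topology and the function name `flood` are hypothetical and not taken from the slides; the model is simplified in that a peer also forwards a query back to the neighbour it came from.

```python
from collections import deque

def flood(graph, start, ttl):
    """Flood a query from `start`; each forwarded copy carries a decremented TTL.

    Returns (peers_reached, messages_sent). Peers forward a query only the
    first time they see it, but every copy they receive still costs a message.
    """
    seen = {start}
    messages = 0
    queue = deque([(start, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if remaining == 0:
            continue  # TTL exhausted: this peer does not forward the query
        for neighbour in graph[peer]:
            messages += 1  # counted even if the neighbour already saw the query
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, remaining - 1))
    return seen, messages

# Hypothetical topology (adjacency lists), for illustration only.
graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
reached, messages = flood(graph, "A", ttl=2)
print(sorted(reached), messages)
```

In this toy network 8 messages are sent to reach 3 new peers, and peer E stays out of range of the TTL.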

5 Querying in a SON
[Figure: the query "Dirk Nowitzki" in a network whose peers are grouped by topic: basketball, flowers, geology, computer science]

6 How to build a SON: Dating
- Contact other peer P
- If( isFriend(P) )
  - Add P to list of friends
  - Add P's friends to list of candidates
- isFriend(P) judged by
  - How high is the similarity?
  - How small is the overlap?
  - How well did P cooperate?
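The steps of one "date" can be sketched as follows. This is only an illustration: the `Peer` class, the `date` function and especially the `is_friend` criterion (same topic) are placeholders invented here, since the slide judges friendship by similarity, overlap and cooperation.

```python
from dataclasses import dataclass, field

@dataclass
class Peer:
    name: str
    topic: str  # stand-in for the peer's real document collection
    friends: list = field(default_factory=list)
    candidates: list = field(default_factory=list)

def is_friend(me, other):
    # Placeholder criterion (assumption): peers with the same topic are friends.
    return me.topic == other.topic

def date(me, other):
    """Contact `other`; on success update `me`'s friend and candidate lists."""
    if not is_friend(me, other):
        return False
    if other.name not in me.friends:
        me.friends.append(other.name)
    if other.name in me.candidates:
        me.candidates.remove(other.name)  # promoted from candidate to friend
    # Friends of a friend become candidates for later dates.
    for name in other.friends:
        if name != me.name and name not in me.friends and name not in me.candidates:
            me.candidates.append(name)
    return True

p1 = Peer("p1", "basketball")
p2 = Peer("p2", "basketball", friends=["p3"])
p5 = Peer("p5", "computer science", friends=["p6"])
date(p1, p2)  # accepted: p2 becomes a friend, p3 a candidate
date(p1, p5)  # rejected: lists stay unchanged
print(p1.friends, p1.candidates)
```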

7 Process of P2P-dating
- the peer to contact is chosen from 3 lists: friends, candidates, random
- send check-alive message to friends
- send contact message to candidates and random peers
- receive synopses of collections and compute scores
- friend and candidate lists have fixed lengths
  - add until full, then drop the worst peers
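The fixed-length lists with "drop worst" behaviour could look roughly like this. The helper `add_bounded` is hypothetical, and it assumes that a higher score means a better peer, which the slides do not state.

```python
def add_bounded(entries, peer, score, capacity):
    """Insert (score, peer) and keep only the `capacity` best-scoring entries."""
    # Drop any stale entry for the same peer so its score is refreshed.
    entries = [(s, p) for s, p in entries if p != peer]
    entries.append((score, peer))
    # Assumption: higher score = better peer.
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return entries[:capacity]

friends = []
for peer, score in [("p2", 0.9), ("p7", 0.2), ("p3", 0.7), ("p4", 0.8)]:
    friends = add_bounded(friends, peer, score, capacity=3)
print([p for _, p in friends])
```

With a capacity of 3, the lowest-scoring peer (`p7`) is dropped once the fourth peer arrives.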

8 Search in SON
- peer P sends queries to peers with a similar interest profile, i.e. all its friends
- Each peer sends back only its top-k results
- When all answers have arrived, P merges the results, removes duplicates and delivers the top-k results
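A minimal sketch of the merge step at peer P. It assumes that results are (URL, score) pairs, that scores from different peers are comparable, and that a duplicate keeps its highest score; none of these details are given on the slide.

```python
def merge_top_k(answers, k):
    """Merge per-peer result lists, remove duplicates, return the k best."""
    best = {}
    for results in answers:
        for url, score in results:
            # Duplicate URLs keep only their highest score (assumption).
            if url not in best or score > best[url]:
                best[url] = score
    # Sort by descending score; ties are broken by URL for determinism.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]

answers = [
    [("url_a", 0.9), ("url_b", 0.5)],  # top-k from one friend
    [("url_b", 0.7), ("url_c", 0.6)],  # top-k from another friend
]
top = merge_top_k(answers, k=2)
print(top)
```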

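9 Strategies for scores
- Similarity Only: [formula not included in transcript]
- Overlap Only: [formula not included in transcript]
- Weighted Sum: [formula not included in transcript]
- Random: no score computed
- Similarity(A,B): 0 = the same; > 0 up to ∞ = differs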
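The formulas themselves are not available in this transcript, so the following is only a guess at how the four strategies might be combined in code. The mapping of the divergence to (0, 1], the use of 1 - overlap, and the function name `score` are all assumptions; only alpha = 0.8 appears in the slides (in the chart descriptions).

```python
def score(kl_divergence, overlap, strategy="weighted", alpha=0.8):
    """Score a peer; higher is better. All formulas here are assumptions."""
    # Assumption: divergence 0 (identical) maps to 1, large divergence to ~0.
    similarity = 1.0 / (1.0 + kl_divergence)
    # Assumption: overlap is a fraction in [0, 1]; less overlap is better.
    novelty = 1.0 - overlap
    if strategy == "similarity":
        return similarity
    if strategy == "overlap":
        return novelty
    if strategy == "weighted":
        return alpha * similarity + (1.0 - alpha) * novelty
    if strategy == "random":
        return None  # no score computed
    raise ValueError(f"unknown strategy: {strategy}")

print(score(1.0, 0.5, "similarity"), score(1.0, 0.5, "overlap"))
```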

10 Overlap Measure
- Minwise Independent Permutations (MIPs)
- measure the overlap with formula: [formula not included in transcript], defined over the hashes of documents
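As a rough sketch of the general min-wise technique (not necessarily the exact formula on the slide): each peer keeps, for every one of n random permutations, the minimum permuted document hash, and the fraction of positions where two such vectors agree estimates the resemblance |A ∩ B| / |A ∪ B|. The linear hash family, the prime and all names below are assumptions chosen for illustration.

```python
import random

PRIME = 2_147_483_647  # 2**31 - 1; document hashes must be smaller than this

def make_permutations(n, seed=0):
    """Return n random linear hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]

def mips_vector(doc_hashes, permutations):
    """Synopsis of a collection: the minimum permuted hash per permutation."""
    return [min((a * d + b) % PRIME for d in doc_hashes) for a, b in permutations]

def resemblance(vec_a, vec_b):
    """Fraction of matching minima, an estimate of |A & B| / |A | B|."""
    matches = sum(1 for x, y in zip(vec_a, vec_b) if x == y)
    return matches / len(vec_a)

perms = make_permutations(200)
set_a = set(range(0, 100))    # hypothetical document hashes of peer A
set_b = set(range(50, 150))   # hypothetical document hashes of peer B
estimate = resemblance(mips_vector(set_a, perms), mips_vector(set_b, perms))
print(estimate)  # true resemblance of these sets is 50/150
```

Only the short vectors need to be exchanged between peers, not the collections themselves.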

11 Similarity Measure
- Kullback-Leibler Divergence / Relative Entropy
- Similarity(A,B): 0 = the same; > 0 up to ∞ = differs
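For reference, the Kullback-Leibler divergence of two term distributions P and Q is the sum over all terms t of P(t) * log(P(t) / Q(t)). The sketch below applies it to two tiny term lists; the add-one smoothing (needed so that Q(t) is never zero) and the helper names are assumptions, not details from the slides.

```python
import math
from collections import Counter

def term_distribution(terms, vocabulary, smoothing=1.0):
    """Smoothed relative term frequencies over a shared vocabulary."""
    counts = Counter(terms)
    total = len(terms) + smoothing * len(vocabulary)
    return {t: (counts[t] + smoothing) / total for t in vocabulary}

def kl_divergence(p, q):
    """D(P || Q) = sum_t P(t) * log(P(t) / Q(t)); 0 means identical."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)

# Hypothetical term lists standing in for two peers' collections.
basketball = "playoffs nba playoffs dunk".split()
flowers = "rose tulip garden rose".split()
vocabulary = sorted(set(basketball) | set(flowers))
p = term_distribution(basketball, vocabulary)
q = term_distribution(flowers, vocabulary)
print(kl_divergence(p, p), kl_divergence(p, q))
```

Note that the measure is not symmetric: D(P || Q) generally differs from D(Q || P).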

12 PASTRY: network infrastructure
- Distributed Hash Table
  - maps keys to the peers currently responsible for that key
- MINERVA uses PASTRY
- O(log(N)) hops for any message to reach any destination
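The key-to-peer mapping of a DHT can be illustrated with a simplified model: peers and keys share one circular ID space, and the peer whose ID is numerically closest to the key is responsible for it. This sketch ignores routing tables entirely and does not use the FreePastry API; `to_id` (SHA-1 truncated to 128 bits) and `responsible_peer` are hypothetical helpers.

```python
import hashlib

ID_BITS = 128  # size of the circular ID space assumed in this sketch

def to_id(text):
    """Hash a string (peer address or key) into the ID space (assumption)."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:ID_BITS // 8], "big")

def responsible_peer(key_id, peer_ids):
    """Return the peer ID numerically closest to the key on the ring."""
    space = 1 << ID_BITS

    def ring_distance(peer_id):
        d = abs(peer_id - key_id)
        return min(d, space - d)  # the ID space wraps around

    return min(peer_ids, key=ring_distance)

peer_ids = [to_id(f"peer-{i}") for i in range(1, 13)]  # 12 hypothetical peers
owner = responsible_peer(to_id("basketball"), peer_ids)
print(hex(owner))
```

When a peer joins or leaves, only the keys closest to its ID change owner, which is what "currently responsible" refers to.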

13 Local Collections
- Index file saved on hard disk
- LUCENE index is an inverted index for terms occurring in websites obtained by
  - user, while surfing (e.g. by a plugin)
  - crawler on bookmarks
- Allows additions and deletions
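To show what an inverted index with additions and deletions does, here is a toy in-memory version. It is not the LUCENE API; the class, its methods and the whitespace tokenisation are simplifications made up for this illustration.

```python
from collections import defaultdict

class InvertedIndex:
    """Toy inverted index: maps each term to the pages that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of page URLs
        self.documents = {}               # URL -> set of terms on that page

    def add(self, url, text):
        self.delete(url)  # re-adding a page replaces its old entry
        terms = set(text.lower().split())
        self.documents[url] = terms
        for term in terms:
            self.postings[term].add(url)

    def delete(self, url):
        for term in self.documents.pop(url, set()):
            self.postings[term].discard(url)
            if not self.postings[term]:
                del self.postings[term]  # drop empty posting lists

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))

index = InvertedIndex()
index.add("http://example.org/nba", "Dirk Nowitzki playoffs")
index.add("http://example.org/roses", "rose garden")
hits_before = index.search("playoffs")
index.delete("http://example.org/nba")
hits_after = index.search("playoffs")
print(hits_before, hits_after)
```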

14 Experimental Setup
- NUTCH was used as crawler
- Seeds: 14-16 start URLs on a certain topic from del.icio.us and dmoz.org
- Depth: 2
- each peer ~400 pages
  - peers 1-4: Basketball
  - peers 5-7: Computer Science
  - peers 8-10: Flowers
  - peers 11-12: Geology
- Queries for peer 1: "playoffs", "Dirk Nowitzki"
- Queries for peer 7: "thesis"
- Queries for peer 12: "earth science"

15 Chart 1
- Comparison over 75 iterations between
  - 5 random peers
  - and p2pdating for 5 friends with weighted sum strategy, alpha=0.8
- y-axis: recall; x-axis: iterations in steps of 5
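The slides do not say how recall was computed. A common definition, assumed here, is the fraction of the relevant (reference) results that the asked peers actually returned:

```python
def recall(retrieved, relevant):
    """Fraction of the relevant results that were retrieved."""
    relevant = set(relevant)
    if not relevant:
        return 0.0  # convention chosen here for an empty reference set
    return len(set(retrieved) & relevant) / len(relevant)

# Hypothetical result identifiers, for illustration only.
value = recall(["a", "b", "x"], ["a", "b", "c", "d"])
print(value)
```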

16 Chart 1

17 Chart 2
- Comparison over 50 iterations between
  - random peers asked
  - and p2pdating for x friends with weighted sum strategy, alpha=0.8
- y-axis: recall; x-axis: #peers asked

18 Chart 2

19 Conclusion
- Use of PASTRY as underlying routing/networking infrastructure
- Implementation of details of peer-to-peer network, p2pdating algorithm
- Message handling: several message types, protocol for sending and receiving messages
- Adaptation of NUTCH to crawling
- Use of LUCENE to query indexes
- Experiments show benefit of P2PDating algorithm

20 Future Work
- Further Experiments:
  - real-world data from bookmark lists of active del.icio.us users
  - Firefox or proxy plugin for on-the-fly indexing, querying and display of results
- Further Applications:
  - Adaptation to MINERVA P2P Web Search

21 Thank you for your interest
[Footer: 14.05.2015, Tim Benke, PLAGIA]

22 FreePastry
- Free open-source version under BSD license, called FreePastry
- FreePastry provides an application-level interface to the underlying P2P-Network
- API for Java 1.5
- Version used: 2.0 Beta

23 Overview
- Basics of P2P-networks
- Querying in P2P-networks
- Overlap and Similarity Computation
- Process of P2P-dating
- Application examples: Firefox plugin, del.icio.us

24 Chart 2
- Comparison over 50 iterations between
  - random peers asked
  - and p2pdating for x-1 Friends and 1 Stranger with weighted sum strategy, alpha=0.8
  - only K-L-Divergence
- y-axis: recall; x-axis: #Peers asked

25 Chart 1
- Comparison over 75 iterations between
  - 5 random peers
  - and p2pdating for 4 Friends and 1 Stranger with weighted sum strategy, alpha=0.8
  - only K-L-Divergence
- y-axis: recall; x-axis: iterations in steps of 5

26 O:P2P-Dating Project
- Internet crawls performed with the Apache project NUTCH provide the collections
- Collections are indexed by NUTCH and a LUCENE index is produced
- 1 similarity measure and 1 overlap measure are used to determine if a node is a Friend

27 Process of P2P-dating
[Figure: example with the query "Michael Jordan" and a peer's Friend List]

