Tim Benke Supervisors: Josiane Xavier Parreira, Sebastian Michel Bachelor thesis.

Why P2P networks?
- Decentralisation
- No single point of failure
- No content control
- Distribution of content, computing power and bandwidth

Querying in P2P networks
[Figure: the query "Dirk Nowitzki" is flooded through the network with a TTL; peers are marked 0 hops to 4 hops from the querying peer]

Idea: Semantic Overlay Networks
- Querying in unstructured P2P networks relies on message flooding with a time-to-live (TTL), which produces many redundant messages
- Group peers according to their content
- Querying in a Semantic Overlay Network (SON): ask only the nodes belonging to the relevant content field
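To make the redundancy argument concrete, here is a minimal Python sketch of TTL-limited flooding. The function name `flood`, the toy graph, and the rules that a peer forwards a query only the first time it sees it and never back to its sender are illustrative assumptions, not the thesis implementation.

```python
from collections import deque

def flood(graph, origin, ttl):
    """Flood a query from `origin` with a time-to-live (TTL).

    A peer that sees the query for the first time forwards it to all of its
    neighbours except the sender, with the TTL decreased by one.
    Returns (peers reached, messages sent, redundant messages), where a
    redundant message is one delivered to a peer that had already seen the query.
    """
    seen = {origin}
    messages = redundant = 0
    queue = deque([(origin, None, ttl)])  # (peer, sender, remaining TTL)
    while queue:
        peer, sender, remaining = queue.popleft()
        if remaining == 0:
            continue  # TTL exhausted: do not forward any further
        for neighbour in graph[peer]:
            if neighbour == sender:
                continue
            messages += 1
            if neighbour in seen:
                redundant += 1  # duplicate delivery, dropped by the receiver
            else:
                seen.add(neighbour)
                queue.append((neighbour, peer, remaining - 1))
    return len(seen), messages, redundant

# Four fully connected peers: with TTL 2, most messages are already redundant.
full_mesh = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
print(flood(full_mesh, origin=0, ttl=2))  # (4, 9, 6)
```

In this toy mesh, 6 of the 9 messages reach peers that already have the query, which is the waste a SON tries to avoid by sending the query only to peers of the matching content field.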

Querying in a SON
[Figure: the query "Dirk Nowitzki" in a SON whose peers are grouped into basketball, flowers, geology and computer science]

How to build a SON: dating
- Contact another peer P
- If isFriend(P):
  - add P to the list of friends
  - add P's friends to the list of candidates
- isFriend(P) is judged by:
  - How high is the similarity?
  - How small is the overlap?
  - How well did P cooperate?
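The friend rule above can be sketched as follows. The slides do not say how the three criteria are combined, so the thresholds and the conjunction of all three tests in `is_friend` are hypothetical; similarity is expressed as a KL divergence (lower means more similar), matching the similarity measure used later in the talk.

```python
from dataclasses import dataclass, field

@dataclass
class Peer:
    name: str
    friends: list = field(default_factory=list)
    candidates: list = field(default_factory=list)

def is_friend(kl_divergence, overlap, cooperation,
              max_kl=1.0, max_overlap=0.5, min_cooperation=0.5):
    """Hypothetical rule: similar content (low KL divergence), small overlap
    and good cooperation. All thresholds are illustrative assumptions."""
    return (kl_divergence <= max_kl
            and overlap <= max_overlap
            and cooperation >= min_cooperation)

def contact(me, other, kl_divergence, overlap, cooperation):
    """Contact `other`; if it qualifies, befriend it and adopt its friends as candidates."""
    if not is_friend(kl_divergence, overlap, cooperation):
        return False
    if other.name not in me.friends:
        me.friends.append(other.name)
    for name in other.friends:
        if name != me.name and name not in me.friends and name not in me.candidates:
            me.candidates.append(name)
    return True
```

For example, if peer A contacts B (whose friends are C and A) and B passes the test, A ends up with friends `["B"]` and candidates `["C"]`.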

Process of P2P-dating  peer to send to chosen from 3 lists: friends, candidates, random  send check-alive message to friends  send contact message to candidates and random peers  receive synopses of collections and compute scores  friend and candidate lists have fixed lengths  Add until full then drop worst peers

Search in SON  peer P sends queries to peers with similar interest profile, i.e. all friends  Each peer only sends his top-k results back  When all answers have arrived P merges results, removes duplicates and delivers top-k results

Strategies for scores  Similarity Only:  Overlap Only:  Weighted Sum:  Random: no Score computed Similarity(A,B) 0 = the same >0 until ∞ : differs

Overlap measure
- Min-wise independent permutations are used to measure the overlap of two collections; the sets being compared consist of the hashes of the documents
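The standard min-wise estimator rests on the fact that, for a min-wise independent permutation π, Pr[min π(A) = min π(B)] = |A ∩ B| / |A ∪ B|, so the fraction of agreeing minima over many permutations estimates the resemblance of the two document-hash sets. The sketch below assumes the thesis used this resemblance form; it approximates permutations with linear hashes h(x) = (a·x + b) mod p, which are only approximately min-wise independent. The vector of minima plays the role of the collection synopsis exchanged between peers.

```python
import random
import zlib

PRIME = 2_147_483_647  # 2^31 - 1; document hashes are reduced below this value

def doc_hash(url):
    """Integer hash of a document (CRC32 of its URL, an illustrative choice)."""
    return zlib.crc32(url.encode("utf-8")) % PRIME

def make_permutations(n, seed=42):
    """n random linear hash functions x -> (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]

def synopsis(doc_hashes, perms):
    """Minimum hash value of the collection under each permutation."""
    return [min((a * x + b) % PRIME for x in doc_hashes) for a, b in perms]

def estimated_resemblance(syn_a, syn_b):
    """Fraction of permutations whose minima agree, estimating |A∩B| / |A∪B|."""
    matches = sum(1 for x, y in zip(syn_a, syn_b) if x == y)
    return matches / len(syn_a)
```

Both peers must use the same permutations (here, the same seed) for their synopses to be comparable.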

Similarity measure
- Kullback-Leibler divergence (relative entropy)
- Similarity(A,B) = 0: the collections are the same; > 0 (up to ∞): they differ
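The KL divergence of two term distributions is KL(A‖B) = Σ_t p_A(t) · log(p_A(t) / p_B(t)). It is 0 only for identical distributions and becomes infinite when A uses a term that B never does. The minimal sketch below avoids the infinite case with additive smoothing over the joint vocabulary; the smoothing and its value 0.5 are assumptions, not necessarily what the thesis did.

```python
import math
from collections import Counter

def term_distribution(terms, vocabulary, smoothing=0.5):
    """Smoothed relative term frequencies over a shared vocabulary."""
    counts = Counter(terms)
    total = len(terms) + smoothing * len(vocabulary)
    return {t: (counts[t] + smoothing) / total for t in vocabulary}

def kl_divergence(terms_a, terms_b, smoothing=0.5):
    """KL(A || B) between the term distributions of two collections."""
    vocabulary = set(terms_a) | set(terms_b)
    p = term_distribution(terms_a, vocabulary, smoothing)
    q = term_distribution(terms_b, vocabulary, smoothing)
    return sum(p[t] * math.log(p[t] / q[t]) for t in vocabulary)
```

Note that KL divergence is not symmetric, so Similarity(A,B) and Similarity(B,A) can differ.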

PASTRY: network infrastructure
- Distributed hash table (DHT)
- Maps keys to the peers currently responsible for them
- MINERVA uses PASTRY
- Any message reaches any destination in O(log N) hops
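In Pastry, nodeIds and keys live in a circular 128-bit identifier space, and a key is delivered to the live node whose nodeId is numerically closest to it; with digits of b bits, the expected number of routing steps is ⌈log_{2^b} N⌉ (b is typically 4). The sketch below only illustrates the key-to-peer mapping and that bound; deriving ids from truncated SHA-1 and the helper names are assumptions, and it is not the FreePastry API.

```python
import hashlib

ID_SPACE = 2 ** 128  # Pastry nodeIds and keys are 128-bit numbers

def to_id(name):
    """Map a name (peer address or key) to a 128-bit id via truncated SHA-1 (assumption)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest()[:16], "big")

def circular_distance(a, b):
    d = abs(a - b)
    return min(d, ID_SPACE - d)

def responsible_peer(key, peer_ids):
    """The peer whose id is numerically closest to the key's id."""
    key_id = to_id(key)
    return min(peer_ids, key=lambda peer_id: circular_distance(peer_id, key_id))

def expected_routing_steps(n_peers, b=4):
    """ceil(log_{2^b} N), computed with integers to avoid float rounding."""
    steps, reach = 0, 1
    while reach < n_peers:
        reach *= 2 ** b
        steps += 1
    return steps
```

For 1000 peers and b = 4 this gives 3 routing steps, since 16^2 = 256 < 1000 ≤ 16^3.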

Local Collections  Index file saved on hard disk  LUCENE Index is an Inverted Index for terms occuring in websites obtained by  user – with surfing (e.g. by a plugin)  crawler on bookmarks  Allows additions and deletions

Experimental Setup  NUTCH was used as crawler  Seeds: start URL‘s on a certain topic from del.icio.us and dmoz.org  Depth: 2 each peer ~400 pages peer 1-4 Basketball peer 5-7 Computer Science peer 8-10 Flowers peer Geology  Queries for peer 1: „playoffs“, „Dirk Nowitzki“  Queries for peer 7: „thesis“  Queries for peer 12: „earth science“

Chart 1  Comparision for 75 Iterations between - 5 random peers - and p2pdating for 5 friends with weighted sum strategy, alpha=0.8  y-axis: recall x-axis: iterations in steps of 5

[Figure: Chart 1, recall vs. iterations]

Chart 2  Comparision for 50 Iterations between - random peers asked - and p2pdating for x friends with weighted sum strategy, alpha=0.8  y-axis: recall x-axis: #peers asked

[Figure: Chart 2, recall vs. number of peers asked]
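Both charts plot recall. As a hedged sketch, recall is the fraction of relevant documents that were retrieved; which documents count as relevant is not stated on the slides, so the reference set (for example, the results obtained by querying every peer) is an assumption left to the caller here, as is averaging over queries.

```python
def recall(retrieved, relevant):
    """|retrieved ∩ relevant| / |relevant|; defined as 0.0 for an empty reference set."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)

def mean_recall(runs):
    """Average recall over (retrieved, relevant) pairs, e.g. one per query."""
    return sum(recall(retrieved, relevant) for retrieved, relevant in runs) / len(runs)
```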

Conclusion  Use of PASTRY as underlying routing/networking infrastructure  Implementation of details of peer-to-peer network, p2pdating algorithm  Messages-handling several message types protocol for sending and receiving messages  Adaption of NUTCH to crawling  Use of LUCENE to query indexes  Experiments show benifit of P2PDating algorithm

Future Work  Further Experiments:  real-world data from bookmark lists of active del.icio.us users  Firefox- or Proxy-Plugin for on-the-fly indexing, querying and display of results  Further Applications:  Adaption to MINERVA P2P Web Search

Thank you for your interest
Tim Benke

FreePastry  Free open source version under BSD-license called FreePastry  FreePastry provides application level interface to underlying P2P-Network  API for Java 1.5  Version used: 2.0 Beta

Overview
- Basics of P2P networks
- Querying in P2P networks
- Overlap and similarity computation
- Process of P2P-dating
- Application examples: Firefox plugin, del.icio.us

Chart 2  Comparision for 50 Iterations between - random peers asked - and p2pdating for x-1 Friends and 1 Stranger with weighted sum strategy, alpha=0.8 - only K-L-Divergence y-axis: recall x-axis: #Peers asked

Chart 1  Comparision for 75 Iterations between - 5 random peers - and p2pdating for 4 Friends and 1 Stranger with weighted sum strategy, alpha=0.8  - only K-L-Divergence y-axis: recall x-axis: iterations in steps of 5

P2P-Dating Project
- Internet crawls performed with the Apache project NUTCH provide the collections
- The collections are indexed by NUTCH, which produces a LUCENE index
- One similarity measure and one overlap measure are used to determine whether a node is a friend

Process of P2P-dating
[Figure: example friend list, labelled "Michael Jordan"]