Published by Roy Flowers. Modified over 9 years ago.
1
A Hybrid Search Engine -- Combining Google and P2P
Xuanhui Wang
2
What's wrong with Google?
–Unlikely to index everything that's of interest (deep web)
–Infeasible to run expensive algorithms on 8 billion documents
–Difficult to input human knowledge
3
Peer-to-peer search
Approach 0
–Each peer has a local crawler and index
–Nobody posts any information about local indices
–Search can only be done by (limited) flooding
–No way to know in advance where to find information
–Very low recall for unpopular queries
[Figure labels: "Matrix factorization", "Relevant nerd"]
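To make the recall problem of limited flooding concrete, here is a minimal sketch, not taken from the slides. It assumes a hop limit (`ttl`) as the way flooding is "limited"; the function name `flood_search`, the toy chain topology, and the peer names are all hypothetical.

```python
from collections import deque


def flood_search(neighbors, local_docs, start, query, ttl):
    """Return the peers within `ttl` hops of `start` whose local index holds `query`.

    neighbors:  dict peer -> list of directly connected peers (assumed topology)
    local_docs: dict peer -> set of query strings that peer can answer
    """
    hits = set()
    visited = {start}
    frontier = deque([(start, 0)])
    while frontier:
        peer, hops = frontier.popleft()
        # Each peer can only consult its own local index.
        if query in local_docs.get(peer, set()):
            hits.add(peer)
        # The hop limit is what makes the flooding "limited".
        if hops == ttl:
            continue
        for nxt in neighbors.get(peer, []):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, hops + 1))
    return hits


# Toy chain A - B - C - D; only D holds the unpopular query.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
local_docs = {"D": {"matrix factorization"}}

near = flood_search(neighbors, local_docs, "A", "matrix factorization", ttl=2)
far = flood_search(neighbors, local_docs, "A", "matrix factorization", ttl=3)
```

In this toy network the query issued at A with a hop limit of 2 never reaches D, so nothing is found. Raising the limit to 3 finds D, at the cost of contacting every peer on the way.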
4
P2P Search
Other methods have been proposed (see I. Weber 2004)
What's wrong?
–The protocols for coordinating the peers are too complicated
–Too much data traffic and communication
–Low speed
5
Hybrid: a possible solution
Combine Google and P2P together
–Google indexes all the peer machines, but how?
–Each peer machine has a local index
–When querying, Google selects the "appropriate" peers and sends them the query.
–Finally, Google merges all the results together.
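The select, dispatch, and merge steps above can be sketched as follows. This is only an illustration under stated assumptions, since the slides leave open how the central engine learns about the peers. Here each peer is assumed to publish a summary of the terms in its local index, and relevance is assumed to be a simple term-overlap count. All names (`select_peers`, `local_search`, `hybrid_search`, the peers and documents) are hypothetical.

```python
def select_peers(summaries, query_terms, k=2):
    """Central side: keep the k peers whose term summary covers the most query terms."""
    scored = [(len(terms & query_terms), peer) for peer, terms in summaries.items()]
    scored = [(s, p) for s, p in scored if s > 0]   # skip peers with no overlap
    scored.sort(key=lambda sp: (-sp[0], sp[1]))     # best overlap first, ties by name
    return [p for _, p in scored[:k]]


def local_search(index, query_terms):
    """Peer side: score each local document by the number of matching query terms."""
    results = []
    for doc, terms in index.items():
        score = len(terms & query_terms)
        if score:
            results.append((doc, score))
    return results


def hybrid_search(summaries, peer_indexes, query_terms, k=2):
    """Send the query to the selected peers only, then merge their ranked results."""
    merged = {}
    for peer in select_peers(summaries, query_terms, k):
        for doc, score in local_search(peer_indexes[peer], query_terms):
            merged[doc] = max(score, merged.get(doc, 0))
    return sorted(merged.items(), key=lambda ds: (-ds[1], ds[0]))


# Hypothetical peers and their local indexes (document -> terms).
peer_indexes = {
    "peer1": {"p1/svd.html": {"matrix", "factorization", "svd"},
              "p1/cats.html": {"cats"}},
    "peer2": {"p2/nmf.html": {"matrix", "nmf"}},
    "peer3": {"p3/recipes.html": {"soup"}},
}
# Assumption: each peer publishes the union of its terms as a summary.
summaries = {peer: set().union(*index.values())
             for peer, index in peer_indexes.items()}

query = {"matrix", "factorization"}
chosen = select_peers(summaries, query)
results = hybrid_search(summaries, peer_indexes, query)
```

Only the two peers whose summaries overlap the query are contacted, which is where the efficiency gain over flooding would come from. How to produce scores that are comparable across peers is left open in this sketch.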
6
Hybrid: a possible solution
Benefits:
–Efficient compared to P2P
–May overcome Google's drawbacks
Challenges:
–Google's PageRank benefits from the large scale of its indexed documents; how can it be adapted to the hybrid system?
–How does Google collaborate with the peer machines? How can the peer machines benefit from Google's PageRank?
Funding this with $10M: do you agree?
7
References
I. Weber et al. (2004), Concept-based P2P Search. http://www.mpi-sb.mpg.de/~iweber/peer-to-peer/Concept-based%20P2P%20Search.ppt
Inspired by a discussion with Shui-Lung Chuang