Published by Chrystal Pitts. Modified over 8 years ago.
1
Parallel and Distributed Searching
2
Lecture Objectives
Review Boolean searching
Indicate how searches may be carried out in parallel
Give an overview of distributed searching
– Collection partitioning
– Query processing
– Collection/results fusion
3
Boolean Queries
Queries with terms connected by AND, OR and NOT
– (Internet AND retrieval) AND (NOT english)
– “world wide web” OR internet
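As a rough illustration of how such queries can be evaluated, the sketch below treats each term's postings as a set of document IDs, so AND becomes intersection, OR becomes union and NOT becomes complement against the whole collection. The index contents, document IDs and helper names (`postings`, `and_`, `or_`, `not_`) are hypothetical. Phrase queries such as “world wide web” would also need word positions, which this toy index does not store.

```python
from typing import Dict, Set

# Hypothetical toy inverted index: term -> IDs of the documents containing it.
INDEX: Dict[str, Set[int]] = {
    "internet": {1, 2, 3, 5},
    "retrieval": {2, 3, 4},
    "english": {3},
}
ALL_DOCS: Set[int] = {1, 2, 3, 4, 5}


def postings(term: str) -> Set[int]:
    """Return the documents containing `term` (empty set if it is not indexed)."""
    return INDEX.get(term.lower(), set())


def and_(a: Set[int], b: Set[int]) -> Set[int]:
    return a & b  # intersection: documents in both sets


def or_(a: Set[int], b: Set[int]) -> Set[int]:
    return a | b  # union: documents in either set


def not_(a: Set[int]) -> Set[int]:
    return ALL_DOCS - a  # complement relative to the whole collection


# (Internet AND retrieval) AND (NOT english)
result = and_(and_(postings("Internet"), postings("retrieval")), not_(postings("english")))
print(sorted(result))  # [2]
```

Because each operator only combines independent sets, the sub-expressions on either side of an operator can be computed separately, which is what makes Boolean queries easy to parallelise.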
4
Advantages
Easy to implement
Allow very precise query specifications
Facilitate parallel execution
5
Disadvantages
People are bad at Boolean algebra
Difficult to turn into an effective relevance ranking
Difficult to include sensible query weighting
6
Parallel Searching
Useful for improving performance in very large or heavily used search engines:
– break the query down into several subqueries
– execute each subquery at the same time
– combine the results
– share subqueries between different searches
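A minimal sketch of these four steps, assuming an AND query split into one subquery per term. Subqueries run concurrently in a thread pool, partial results are combined by intersection, and a cache lets a later search reuse a subquery result an earlier search already computed. The index and function names are hypothetical; threads are used because in a real engine each subquery would typically wait on I/O (for example a remote index server) rather than on the CPU.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import FrozenSet, List

# Hypothetical index: term -> documents containing it.
INDEX = {
    "internet": frozenset({1, 2, 3, 5}),
    "retrieval": frozenset({2, 3, 4}),
    "web": frozenset({1, 5}),
}


@lru_cache(maxsize=1024)
def evaluate_subquery(term: str) -> FrozenSet[int]:
    """Evaluate one subquery. The cache shares results between different searches."""
    return INDEX.get(term, frozenset())


def parallel_and(terms: List[str]) -> FrozenSet[int]:
    """Run one subquery per term at the same time, then intersect the partial results."""
    if not terms:
        return frozenset()
    with ThreadPoolExecutor(max_workers=len(terms)) as pool:
        partials = list(pool.map(evaluate_subquery, terms))
    return frozenset.intersection(*partials)


print(sorted(parallel_and(["internet", "retrieval"])))  # [2, 3]
print(sorted(parallel_and(["internet", "web"])))        # [1, 5], "internet" reused from cache
```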
7
Distributed Searching
More about metasearching, and about turning plain searching into metasearching
8
Distribution Methods
Multiple copies of the collection: mirror sites
Why not split the documents between servers according to their topics?
9
Collection Partitioning
Manual/semi-automatic topic partitioning
– medical vs engineering
– books vs CDs
Indexing options:
– one central index
– one index per server
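One way to picture manual topic partitioning with one index per server: hand-written keyword rules decide which server each document goes to, and each server then indexes only its own documents. The rule sets, server names and the "most matching keywords wins" policy below are illustrative assumptions, not a prescribed method. A central index would instead be a single term-to-documents map over every document.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical hand-written topic rules: server name -> keywords that mark its topic.
TOPIC_RULES: Dict[str, Set[str]] = {
    "medical": {"patient", "hip", "surgery", "clinical"},
    "engineering": {"alloy", "steel", "stress", "bridge"},
}


def assign_server(text: str, default: str = "general") -> str:
    """Send a document to the server whose keywords it mentions most often."""
    words = text.lower().split()
    counts = {server: sum(w in kws for w in words) for server, kws in TOPIC_RULES.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else default


def build_per_server_indexes(docs: Dict[int, str]) -> Dict[str, Dict[str, Set[int]]]:
    """Partition documents by topic and build one inverted index per server."""
    indexes: Dict[str, Dict[str, Set[int]]] = defaultdict(lambda: defaultdict(set))
    for doc_id, text in docs.items():
        server = assign_server(text)
        for word in set(text.lower().split()):
            indexes[server][word].add(doc_id)
    return indexes


docs = {
    1: "Steel alloy stress tests",
    2: "Hip surgery outcomes for patient groups",
    3: "Titanium alloy in artificial hip surgery",
}
idx = build_per_server_indexes(docs)
print(sorted(idx))  # ['engineering', 'medical']
```

Note that document 3 mentions an alloy but lands on the medical server, which is exactly the situation that can later hide it from an engineering-oriented query.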
10
Distributed Query Processing
Select the collections to search
Distribute the query to the selected collections
Evaluate the query at the selected servers in parallel
Combine the results into a final result
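The four steps can be sketched end to end as below. Everything here is a stand-in under stated assumptions: `Server` is a hypothetical in-process substitute for a remote search server, "selection" simply keeps servers that index at least one query term, and the final combination merges by raw score, which is only reasonable if the servers' scores are comparable (the results fusion slides discuss better options).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Set, Tuple

Result = Tuple[str, float]  # (document id, score)


class Server:
    """Hypothetical stand-in for a remote search server with its own index."""

    def __init__(self, name: str, index: Dict[str, Dict[str, float]]):
        self.name = name
        self.index = index  # term -> {doc_id: weight}

    def vocabulary(self) -> Set[str]:
        return set(self.index)

    def search(self, terms: List[str]) -> List[Result]:
        scores: Dict[str, float] = {}
        for t in terms:
            for doc, w in self.index.get(t, {}).items():
                scores[doc] = scores.get(doc, 0.0) + w
        return sorted(scores.items(), key=lambda r: r[1], reverse=True)


def distributed_search(servers: List[Server], terms: List[str]) -> List[Result]:
    # 1. Select collections: any server that indexes at least one query term.
    selected = [s for s in servers if s.vocabulary() & set(terms)]
    if not selected:
        return []
    # 2-3. Distribute the query and evaluate it at every selected server in parallel.
    with ThreadPoolExecutor(max_workers=len(selected)) as pool:
        partial = list(pool.map(lambda s: s.search(terms), selected))
    # 4. Combine into one final list, here by raw score.
    merged = [r for results in partial for r in results]
    return sorted(merged, key=lambda r: r[1], reverse=True)


med = Server("medical", {"hip": {"m1": 0.9, "m2": 0.4}, "titanium": {"m1": 0.5}})
eng = Server("engineering", {"titanium": {"e1": 0.7}, "steel": {"e1": 0.6, "e2": 0.8}})
law = Server("law", {"contract": {"l1": 1.0}})
print([d for d, _ in distributed_search([med, eng, law], ["titanium", "steel"])])
# ['e1', 'e2', 'm1']  (the law server is never queried)
```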
11
Source Selection
Obtain global term distribution data
– but how, on the web?
Analyse a central index of collection relevance
Danger: missing gems
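A sketch of source selection from collection-level term statistics, assuming each collection publishes only its size and the document frequency of each term. The scoring formula (fraction of a collection's documents containing each query term, weighted by how few collections contain that term) is a simple illustrative choice, not a standard named method, and the summary numbers are made up. With `k=1` the medical collection is never searched, which is how a "missing gem" can be lost.

```python
import math
from typing import Dict, List

# Hypothetical per-collection summaries: document count and per-term document frequency.
SUMMARIES: Dict[str, dict] = {
    "engineering": {"n_docs": 1000, "df": {"titanium": 120, "steel": 300, "alloys": 200, "wear": 90}},
    "medical": {"n_docs": 5000, "df": {"titanium": 3, "hips": 40, "alloys": 2, "wear": 10}},
    "law": {"n_docs": 800, "df": {"contract": 500}},
}


def collection_score(summary: dict, terms: List[str], cf: Dict[str, int], n_collections: int) -> float:
    """Sum over query terms of (share of documents containing the term) * (rarity across collections)."""
    score = 0.0
    for t in terms:
        df = summary["df"].get(t, 0)
        if df > 0:  # df > 0 implies cf[t] >= 1
            score += (df / summary["n_docs"]) * math.log(1 + n_collections / cf[t])
    return score


def select_sources(terms: List[str], k: int = 1) -> List[str]:
    """Return up to k collections with a positive score, best first."""
    cf = {t: sum(t in s["df"] for s in SUMMARIES.values()) for t in terms}
    scores = {name: collection_score(s, terms, cf, len(SUMMARIES)) for name, s in SUMMARIES.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [name for name in ranked[:k] if scores[name] > 0]


query = ["wear", "titanium", "steel", "alloys"]
print(select_sources(query, k=1))  # ['engineering']
print(select_sources(query, k=3))  # ['engineering', 'medical']
```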
12
Missing Gems
Example query:
– “wear characteristics of high titanium steel alloys”
– a relevant document actually occurs in a medical collection, describing the alloys' use in artificial hips
13
Results Fusion
We want to present a single result list collected from several sources
Also known as collection fusion, because it makes several collections appear as one
14
Results Fusion
How do you put together the results from several web sites/search engines into a single combined result?
– Collection at a time
– Round robin
– Relevance ranked
15
Collection at a Time
Use e.g. tf*idf across each collection to rank the searched collections by relevance
Display the results from the best collection first, then those from the next best, and so on
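Collection-at-a-time fusion reduces to sorting the collections by their score and concatenating their result lists. In this sketch `collection_scores` is assumed to come from something like a summed tf*idf over each collection; the collection names, document IDs and score values are hypothetical.

```python
from typing import Dict, List


def collection_at_a_time(results: Dict[str, List[str]], collection_scores: Dict[str, float]) -> List[str]:
    """Show every result of the best-scoring collection, then the next collection, and so on."""
    order = sorted(results, key=lambda c: collection_scores.get(c, 0.0), reverse=True)
    merged: List[str] = []
    for c in order:
        merged.extend(results[c])  # each collection keeps its own internal ranking
    return merged


results = {"medical": ["m1", "m2"], "engineering": ["e1", "e2", "e3"]}
scores = {"engineering": 0.79, "medical": 0.003}
print(collection_at_a_time(results, scores))  # ['e1', 'e2', 'e3', 'm1', 'm2']
```

The drawback is visible in the output: the best document of a lower-ranked collection (here `m1`) appears only after every result of the top collection.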
16
Tf*idf
tf: term frequency
– terms that are mentioned frequently in an individual document improve recall
idf: inverse document frequency
– decreases as the number of documents mentioning a term increases (commonly idf = log(N/df), for N documents of which df mention the term)
– prefers discriminating terms
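A small sketch of one common tf*idf variant: raw term count for tf and log(N/df) for idf. Many other variants exist (log-scaled tf, smoothed idf); this one is chosen only because it matches the definitions above most directly. The toy documents are hypothetical. A term that appears in every document gets idf = log(1) = 0, so it does not help discriminate.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return a {term: tf*idf weight} map for each document.

    tf  = raw count of the term in the document
    idf = log(N / df), N = number of documents, df = documents containing the term
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: count * math.log(n / df[t]) for t, count in tf.items()})
    return weights


w = tf_idf([["steel", "alloy", "steel"], ["steel", "hip"], ["steel", "alloy", "wear"]])
print(w[0]["steel"])  # 0.0, "steel" is in every document
```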
17
Round Robin
Take the first document from collection 1
Then the first document from collection 2, and so on for each collection
Then the second document from collection 1, and so on
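Round robin is an interleave of the per-collection result lists, rank by rank. This sketch uses `itertools.zip_longest` so that shorter lists simply drop out once they are exhausted; the result lists are hypothetical, and duplicates returned by several collections are not removed here.

```python
from itertools import zip_longest
from typing import List, Sequence

_MISSING = object()  # sentinel marking a list that has run out


def round_robin(result_lists: Sequence[List[str]]) -> List[str]:
    """Take rank 1 from each collection in turn, then rank 2, and so on."""
    merged: List[str] = []
    for rank_slice in zip_longest(*result_lists, fillvalue=_MISSING):
        merged.extend(doc for doc in rank_slice if doc is not _MISSING)
    return merged


print(round_robin([["a1", "a2", "a3"], ["b1"], ["c1", "c2"]]))
# ['a1', 'b1', 'c1', 'a2', 'c2', 'a3']
```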
18
Relevance-Based Methods
Calculate relevance for the documents returned by each selected source
Try to calculate some global statistics
Use some special measures
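A sketch of relevance-based fusion, assuming the broker can see the text of the returned documents. It re-scores every returned document with one shared set of statistics, using the pooled results as a stand-in for global statistics that are usually unavailable. The smoothed idf, log(1 + N/df), is an assumed choice that avoids zero weights on small pools; source names, document IDs and texts are hypothetical. A document returned by two sources is counted once.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def global_rerank(returned: Dict[str, List[Tuple[str, str]]], query: List[str]) -> List[Tuple[str, float]]:
    """Re-score documents from several sources with shared statistics; best first.

    `returned` maps source name -> [(doc_id, text), ...].
    """
    pooled: Dict[str, List[str]] = {}
    for docs in returned.values():
        for doc_id, text in docs:
            pooled[doc_id] = text.lower().split()
    n = len(pooled)
    df = Counter(t for words in pooled.values() for t in set(words))
    terms = [t.lower() for t in query]
    scores: Dict[str, float] = {}
    for doc_id, words in pooled.items():
        tf = Counter(words)
        scores[doc_id] = sum(tf[t] * math.log(1 + n / df[t]) for t in terms if df[t] > 0)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


returned = {
    "engineering": [("e1", "titanium steel alloys wear testing"), ("e2", "steel bridges")],
    "medical": [("m1", "titanium alloys wear in artificial hips titanium")],
}
print([d for d, _ in global_rerank(returned, ["titanium", "wear"])])  # ['m1', 'e1', 'e2']
```

Because every document is scored against the same statistics, the medical document can outrank the engineering ones instead of being held back by its collection's overall rank.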
19
Other Alternatives
Random
First come, first shown
etc.
20
Conclusions
Parallel searching is one way to speed up searching
Distributing information can help ease and speed up searching, but has some dangers
There are several solutions to the results fusion problem