Published by Esther Patrick. Modified over 9 years ago.
1
YZUCSE SYSLAB
A Study of Web Search Engine Bias and its Assessment
Ing-Xiang Chen and Cheng-Zen Yang
Dept. of Computer Science and Engineering, Yuan Ze University, Taiwan, ROC
2006/5/22
Models of Trust for the Web (MTW'06)
2
Outline
– Introduction
– Related Work
– A 2-D Bias Assessment
– Results and Discussion
– Conclusions
3
Introduction
Web search engines have become a significant gateway to the Internet.
People tend to rely on a few particular search engines.
Users may thus be unknowingly affected by biased search results.
4
Search Engine Bias
Search engine bias arises from:
– diverse operating policies and business strategies, e.g. the “Falun Gong” event in China,
– limitations of crawling, indexing, and ranking techniques, e.g. the Googlewashed event,
– opposing political standpoints, diverse cultural backgrounds, and different social customs.
5
A Query Example
6
Our Research
Establish a new mechanism for assessing the bias of Web search engines.
Provide a two-dimensional scheme that uses both indexical bias and content bias to assess search engine bias.
7
Indexical Bias vs. Content Bias
The bias of a search engine represents the “deviation of the norm from the result of a search engine”.
Differences between the set of URLs retrieved by a search engine and the sets retrieved by most Web search engines are termed indexical bias.
Deviations of the contents provided by a search engine from the contents provided by most Web search engines are termed content bias.
8
Related Work
The assessment of indexical bias (proposed by Mowshowitz and Kawaguchi):
1. Select a pool of search engines as the norm.
2. Transform the URLs into vectors.
3. Calculate the similarity of URLs between the search engine to be compared and the norm.
4. Subtract the similarity value from 1 to obtain the bias value.
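The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: it assumes that the URL vectors hold occurrence counts and that the similarity is the cosine measure (consistent with the "1 – cos" formula given later in the slides). The function names `cosine_similarity` and `indexical_bias` are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty result list shares nothing with the norm
    return dot / (norm_a * norm_b)


def indexical_bias(engine_urls, pool_urls):
    """Bias of one engine against a norm built from a pool of engines.

    engine_urls: list of URLs returned by the engine under test.
    pool_urls:   list of URL lists, one per engine in the norm pool.
    """
    engine_vec = Counter(engine_urls)                           # step 2: URLs -> vector
    norm_vec = Counter(u for urls in pool_urls for u in urls)   # step 1: pooled norm
    return 1.0 - cosine_similarity(engine_vec, norm_vec)        # steps 3 and 4
```

For example, `indexical_bias(["a", "b"], [["a", "b"], ["a", "c"]])` compares the vector (1, 1, 0) with the norm (2, 1, 1); an engine whose results match the norm exactly gets a bias of 0.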
9
An Example
– The first 10 URLs were retrieved from 3 search engines by using 2 queries. (10 × 3 × 2 = 60)
– 48 distinct URLs represent 44 Web sites.
– The norm = (7, 3, 2, 2, 2, 2, ..., 1, 1, 1, 1, 1)
– Google = (3, 1, 1, 0, 2, 1, ..., 0, 1, 1, 0, 0)
– Similarity = 48 / (124 × 28)^(1/2) = 0.8146
– Bias value = 1 - 0.8146 = 0.1854
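The arithmetic in this example can be checked directly. The sketch below assumes that 48 is the dot product of the Google vector with the norm vector and that 124 and 28 are the squared lengths of the two vectors; the slide does not label these numbers, so this reading is an interpretation based on the cosine formula.

```python
import math

dot_product = 48       # assumed: Google vector · norm vector
norm_sq_length = 124   # assumed: squared length of one vector
engine_sq_length = 28  # assumed: squared length of the other vector

similarity = dot_product / math.sqrt(norm_sq_length * engine_sq_length)
bias = 1 - similarity

print(round(similarity, 4))  # 0.8146
print(round(bias, 4))        # 0.1854
```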
10
Our Considerations
The method proposed by Mowshowitz and Kawaguchi measures deviations among Web sites, not among their contents.
If we examine bias from both the indexical view and the content view, we can obtain a fuller picture of search engine bias.
11
Selection of the Norm
An explicit norm comes mainly from careful examination by subject experts.
Manual examination is impractical in an extremely large and fast-changing Web environment.
An implicit norm is defined by choosing a collection of search results from several representative search engines.
12
The Selection Criteria
The search engines
– are generally designed for different subject areas.
– are comparable to each other.
– have their own processing rules.
13
The Process of Bias Assessment
14
The Assessment Algorithm (I)
Scores are calculated as follows:
– Score = f (d + t W_t + H W_H + h W_h) * log(n/d)
f: term frequency
d: document frequency
t: title
H: H1
h: H2
n: total number of documents
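The subscripts of the score formula did not survive extraction, so its exact form is uncertain. The sketch below shows one plausible reading only: occurrences of a term in the body, the title, H1 headings, and H2 headings are counted separately, the tagged counts are multiplied by the weights W_t, W_H, and W_h, and the sum is scaled by log(n/d). The function name `term_score`, the default weight values, and the use of the natural logarithm are all assumptions and do not come from the slides.

```python
import math


def term_score(body_count, title_count, h1_count, h2_count,
               doc_freq, total_docs,
               w_title=3.0, w_h1=2.0, w_h2=1.5):
    """Tag-weighted term frequency times an inverse-document-frequency factor.

    The weights are hypothetical placeholders, and the structure of the
    weighted sum is an assumed reading of the garbled slide formula.
    """
    weighted_tf = (body_count
                   + title_count * w_title
                   + h1_count * w_h1
                   + h2_count * w_h2)
    # log(n/d): terms that appear in fewer documents receive a larger factor
    return weighted_tf * math.log(total_docs / doc_freq)
```

Under this reading, a term that appears in every retrieved document (d = n) scores 0, so only terms that distinguish documents contribute to the content vector.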
15
The Assessment Algorithm (II)
X = (x_1, x_2, x_3, ..., x_n)
N = (n_1, n_2, n_3, ..., n_n)
Bias = 1 - cos(X, N)
16
Experimental Environment
10 popular Web search engines are chosen.
– About, AltaVista, Excite, Google, Inktomi, Lycos, MSN, Overture, Teoma, and Yahoo.
The top 10 URLs are retrieved for further calculation. (Silverstein et al. showed that for 85% of queries only the first result screen is viewed.)
17
The Averaged Indexical Bias
18
The Averaged Content Bias
19
2-D Analysis for Hot Queries
20
ANOVA Results (I)
The averaged bias result
– Indexical Bias
– Content Bias
21
ANOVA Results (II)
Between each search engine over the ten hot query terms
– Indexical Bias
– Content Bias
22
The Case of “Second Superpower”
23
Conclusions
The bias of Web search engines has a deep effect upon Internet users.
Assessing indexical bias by considering only URLs may not show the full picture of search engine bias.
We provide users with a more comprehensive reference that exposes the blind spots of one-dimensional bias assessment.
Statistical analyses further indicate that a two-dimensional scheme can fulfill the task of bias assessment.
24
Thank You!
If you have any questions, please email sean@syslab.cse.yzu.edu.tw.