1
Using OWA Fuzzy Operator to Merge Retrieval System Results
Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud Rahgozar
School of Electrical and Computer Engineering, University of Tehran
Farhad Oroumchian
University of Wollongong in Dubai
CAASL2, July 21-22, 2007
2
University of Tehran - Database Research Group
Outline
– The Persian Language
– Used Methods
  – Vector Space Model
  – Language Modeling
  – OWA Operator
– The test collections
– Experimental results
– Conclusion
4
The Persian Language
– Spoken in countries such as Iran, Tajikistan and Afghanistan
– Written in an Arabic-like script of 32 characters, written continuously from right to left
– Its morphological analyzers need to deal with many word forms that are not actually Farsi, such as Arabic loanwords that keep their Arabic plural forms. Examples:
  – "کافر" (unbeliever, singular) and "کفار" (plural)
  – "عادت" (habit), which has two plural forms in Farsi:
    – Farsi form: "عادت ها"
    – Arabic form: "عادات"
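To illustrate why an analyzer has to handle both plural styles, here is a minimal Python sketch of a plural normalizer. The table `ARABIC_BROKEN_PLURALS` and the function `normalize_plural` are hypothetical and contain only the examples above; a real analyzer would need a large dictionary of Arabic plurals and would handle other Persian suffixes as well.

```python
# Hypothetical table mapping Arabic ("broken") plurals to their singulars.
# Only the examples from the slide; a real analyzer needs a large dictionary.
ARABIC_BROKEN_PLURALS = {
    "کفار": "کافر",
    "عادات": "عادت",
}

ZWNJ = "\u200c"  # zero-width non-joiner, often written between a word and "ها"
PERSIAN_PLURAL_SUFFIX = "ها"


def normalize_plural(word: str) -> str:
    """Map a plural form to its singular, covering both plural styles."""
    word = word.strip()
    # Arabic-style plurals change the word's internal pattern, so a suffix
    # stripper cannot recover the singular; use dictionary lookup instead.
    if word in ARABIC_BROKEN_PLURALS:
        return ARABIC_BROKEN_PLURALS[word]
    # Farsi-style plurals append "ها", separated by a space or a ZWNJ.
    for sep in (" ", ZWNJ):
        suffix = sep + PERSIAN_PLURAL_SUFFIX
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


for w in ["عادت ها", "عادات", "کفار"]:
    print(w, "->", normalize_plural(w))
```

Both plural forms of "عادت" end up as the same index term, which is what matters for retrieval.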
6
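Vector Space Model
Weighting schemes that produced the best results (N: number of documents in the collection; n: number of documents containing the term; tf, qtf: term frequency in the document and in the query; s: slope; N.U.W: number of unique words):

| Name | Weighting |
|---|---|
| tf.idf | $\frac{tf \cdot \log(N/n)}{\sqrt{\sum tf^2}\,\sqrt{\sum qtf^2}}$ |
| lnc.ltc | $\frac{(1+\log tf)(1+\log qtf)\log((1+N)/n)}{\sqrt{\sum tf^2}\,\sqrt{\sum qtf^2}}$ |
| nxx.bpx | $(0.5 + 0.5\,tf/\max tf) + \log((N-n)/n)$ |
| tfc.nfc | $\frac{tf \cdot \log(N/n)\,(0.5 + 0.5\,qtf/\max qtf)\log(N/n)}{\sqrt{\sum tf^2}\,\sqrt{\sum qtf^2}}$ |
| tfc.nfx1 | $\frac{tf \cdot \log(N/n)\,(0.5 + 0.5\,qtf/\max qtf)\log(N/n)}{\sqrt{\sum (tf \cdot \log(N/n))^2}}$ |
| tfc.nfx2 | $\frac{tf \cdot \log(N/n)\,(0.5 + 0.5\,qtf/\max qtf)\log(N/n)}{\sqrt{\sum tf^2}}$ |
| Lnu.ltu | $\frac{(1+\log tf)(1+\log qtf)\log((1+N)/n)}{(1+\log \mathrm{avg}\,tf)\left((1-s) + s \cdot \mathrm{N.U.W}/\mathrm{avg}\,\mathrm{N.U.W}\right)^2}$ |

We used the Lnu.ltu and Lnc.btc weighting schemes.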
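The Lnu.ltu row translates directly into code. Below is a minimal Python sketch of the per-term weight exactly as written in the table, assuming natural logarithms (the slide does not give the base) and the slope s = 0.25 that the significance tests refer to. The function and argument names are our own, and taking N.U.W as the number of distinct terms in the document's term-frequency dict is an assumption.

```python
import math


def lnu_ltu_term_weight(tf, qtf, N, n, avg_tf, nuw, avg_nuw, s=0.25):
    """Per-term Lnu.ltu weight following the table's formula.

    tf, qtf  -- term frequency in the document / in the query
    N, n     -- number of documents / documents containing the term
    avg_tf   -- average term frequency within the document
    nuw      -- number of unique words (N.U.W) in the document
    avg_nuw  -- average N.U.W over the collection
    s        -- slope of the pivoted normalization
    """
    numerator = (1 + math.log(tf)) * (1 + math.log(qtf)) * math.log((1 + N) / n)
    pivot = (1 - s) + s * nuw / avg_nuw
    return numerator / ((1 + math.log(avg_tf)) * pivot ** 2)


def lnu_ltu_score(doc_tf, query_tf, df, N, avg_nuw, s=0.25):
    """Sum term weights over terms shared by the query and the document.

    doc_tf and query_tf map term -> frequency; df maps term -> document frequency.
    """
    avg_tf = sum(doc_tf.values()) / len(doc_tf)
    nuw = len(doc_tf)  # distinct terms in the document
    return sum(
        lnu_ltu_term_weight(doc_tf[t], qtf, N, df[t], avg_tf, nuw, avg_nuw, s)
        for t, qtf in query_tf.items()
        if t in doc_tf
    )
```

When a document has exactly the average number of unique words, the pivot term equals 1 regardless of s, so the slope only changes scores for documents that are longer or shorter than average.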
8
Language Modeling
Hiemstra (2001) proposed four ways to rank a document d against a query q with query terms t1, ..., tn. In these, P(D=d) is the prior probability that document d is relevant to the query, and λ is a smoothing parameter that is the same for every query term. Hiemstra (2002) emphasizes that if no prior relevance information is available for a query, each query term is considered equally important.
9
Language Modeling (cont.)
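As an illustration, the following Python sketch implements one widely cited form of Hiemstra's model: the log of a document prior plus, for each query term, the log of a linear mixture (weight λ) of the term's probability in the document and in the whole collection. It is not necessarily the exact variant (for example LM4) used in the experiments. Estimating the collection probability from collection term frequencies, the uniform prior, λ = 0.15, and all function names are our assumptions.

```python
import math
from collections import Counter


def collection_stats(docs):
    """Collection term frequencies and total number of tokens."""
    cf = Counter()
    for tokens in docs.values():
        cf.update(tokens)
    return cf, sum(cf.values())


def lm_score(query, tokens, cf, total, lam=0.15, prior=1.0):
    """log P(D=d) + sum_i log((1 - lam) * P(t_i) + lam * P(t_i | d))."""
    tf = Counter(tokens)
    score = math.log(prior)
    for t in query:
        p_collection = cf[t] / total
        p_doc = tf[t] / len(tokens) if tokens else 0.0
        mixture = (1 - lam) * p_collection + lam * p_doc
        # A term absent from the whole collection would give log(0) for every
        # document; skipping it does not change the ranking.
        if mixture > 0:
            score += math.log(mixture)
    return score


docs = {"d1": ["a", "b", "a"], "d2": ["b", "c"]}
cf, total = collection_stats(docs)
ranking = sorted(docs, key=lambda d: lm_score(["a"], docs[d], cf, total), reverse=True)
print(ranking)  # prints ['d1', 'd2']
```

Because λ is the same for every query term, each term contributes on an equal footing, which matches the assumption that all query terms are equally important when no relevance information is available.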
11
OWA Operator (cont.)
We used the OWA operator as the merge operator. The OWA operator of dimension n is a nonlinear aggregation operator $OWA: [0,1]^n \to [0,1]$ with a weighting vector $W = [w_1, w_2, \dots, w_n]$ such that $\sum_{i=1}^{n} w_i = 1$ and $w_i \in [0,1]$. The OWA weight of each document d is $OWA(x_1, x_2, \dots, x_n)$, where $x_i$ is the score of d in the ith list, assigned by $R_i$, the ith search engine. If d is not present in the ith list, then $x_i = 0$.
12
OWA Operator (cont.)
The OWA weight of each document is computed by:

$$OWA(x_1, x_2, \dots, x_n) = W^T B = \sum_{j=1}^{n} w_j b_j$$

in which $W^T$ is the transpose of the vector W that defines the semantics associated with the OWA operator, and $B = [b_1, b_2, \dots, b_n]$ is the vector $X = [x_1, x_2, \dots, x_n]$ reordered so that $b_j = \mathrm{Min}_j(x_1, x_2, \dots, x_n)$, that is, the jth smallest element of $x_1, x_2, \dots, x_n$. We used a simple function to bring the scores $\{x_i, i = 1, \dots, n\}$ onto the same scale.
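Putting the two OWA slides together, OWA-based merging can be sketched in Python as follows. A document missing from a list gets $x_i = 0$, and the scores are sorted in ascending order before taking the dot product with W, as defined above. The slides do not say which function was used to bring the scores onto the same scale, so the min-max normalization in `min_max` is an assumption, as are all function names.

```python
def min_max(scores):
    """Rescale one engine's {doc: score} list to [0, 1] (assumed normalization)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def owa(xs, weights):
    """OWA(x) = W^T B, where B is xs sorted ascending (b_j = j-th smallest)."""
    if len(xs) != len(weights):
        raise ValueError("need one weight per result list")
    return sum(w * b for w, b in zip(weights, sorted(xs)))


def owa_merge(result_lists, weights):
    """Fuse per-engine {doc: score} dicts into a ranked list of (doc, score)."""
    normalized = [min_max(r) for r in result_lists]
    docs = set().union(*normalized)
    fused = {
        # x_i = 0 when d does not appear in the i-th list
        d: owa([r.get(d, 0.0) for r in normalized], weights)
        for d in docs
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


run1 = {"d1": 12.0, "d2": 7.0, "d3": 3.0}
run2 = {"d2": 0.9, "d3": 0.4, "d4": 0.1}
print(owa_merge([run1, run2], [0.5, 0.5]))
```

With W = [1, 0, ..., 0] the operator returns the smallest of a document's scores (an AND), and with W = [0, ..., 0, 1] it returns the largest (an OR), which is how the weighting vector controls the semantics of the merge.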
13
OWA Operator: Weighting Methods
– Quantifier-based weighting
– Degree-of-importance-based weighting
14
Quantifier Based Weighting
We used the linguistic quantifiers All, Most_n, Few_n, and At-Least-One as weighting schemes:
– All: considers only documents that appear in the lists of all retrieval engines. This quantifier is suitable when the user is looking for a precise answer.
– Most: a fuzzy majority operator that treats retrieval by most of the engines as sufficient for inclusion in the fused list.
– Few: a weaker scheme, in which it is enough for a document to be retrieved by a few retrieval engines.
– At-Least-One: the weakest scheme, in which it is enough for a document to appear in only one retrieval engine's list to be included in the fused list.
Hence the All quantifier has AND semantics and the At-Least-One quantifier has OR semantics, while the Most and Few quantifiers have semantics between those of AND and OR.
15
Degree of Importance Based Weighting
As the second weighting scheme, we use the position of the documents in the retrieved lists to produce the weighting vector $W = [w_1, w_2, \dots, w_n]$. The weight of each document d in $L_{i,q}$ is defined in terms of $N_i$, the number of elements in the ith list $L_{i,q}$, and $POS_i$, the position of document d in $L_{i,q}$.
17
Test Collections
– Qvanin Collection
  – Documents: Iranian Law Collection, 177,089 passages
  – 41 queries and relevance judgments
– Hamshahri Collection
  – Documents: 600+ MB of news from the Hamshahri newspaper, 160,000+ news articles
  – 60 queries and relevance judgments
– BijanKhan Tagged Collection
  – Documents: 100+ MB from different sources
  – A tag set of 41 tags
  – 2,590,000+ tagged words
18
Hamshahri Collection
We used HAMSHAHRI, a test collection for Persian text prepared and distributed by the DBRG (IR team) of the University of Tehran. The 3rd version:
– contains more than 160,000 distinct textual news articles in Farsi
– has 60 queries and relevance judgments for the top 20 relevant documents of each query
19
Outline
– The Persian Language
– Used Methods
  – Pivoted normalization
  – N-Gram approach
  – Local Context Analysis
– Our test collections
– Experimental results
– Conclusion
20
Experimental results
21
Quantifier-Based OWA Weighting

| Method | Weighting Vector | Orness Degree |
|---|---|---|
| All | W=[1,0,0,0,0,0] | 0.00 |
| Most 2 | W=[0,.5,.5,0,0,0] | 0.30 |
| Most 3 | W=[0,.33,.33,.33,0,0] | 0.40 |
| Most 4 | W=[0,.25,.25,.25,.25,0] | 0.50 |
| Few 3 | W=[0,0,.33,.33,.33,0] | 0.59 |
| Few 2 | W=[0,0,0,.5,.5,0] | 0.70 |
| At-Least-One | W=[0,0,0,0,0,1] | 1.00 |
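The Orness Degree column can be reproduced from the weighting vectors. With the ascending ordering defined for B ($b_1$ is the smallest score), a natural reading is $\mathrm{orness}(W) = \frac{1}{n-1}\sum_{j=1}^{n} (j-1)\,w_j$, which is Yager's usual definition rewritten for ascending order. This reading is our assumption, but it matches every row for n = 6 once the .33 entries (rounded thirds) are kept as printed. The `TABLE` dict simply copies the slide.

```python
# Weighting vectors and reported orness copied from the table (n = 6 lists).
TABLE = {
    "All":          ([1, 0, 0, 0, 0, 0], 0.00),
    "Most 2":       ([0, .5, .5, 0, 0, 0], 0.30),
    "Most 3":       ([0, .33, .33, .33, 0, 0], 0.40),
    "Most 4":       ([0, .25, .25, .25, .25, 0], 0.50),
    "Few 3":        ([0, 0, .33, .33, .33, 0], 0.59),
    "Few 2":        ([0, 0, 0, .5, .5, 0], 0.70),
    "At-Least-One": ([0, 0, 0, 0, 0, 1], 1.00),
}


def orness(weights):
    """Orness of W when scores are sorted in ascending order.

    Weight on the smallest score counts 0 (pure AND), weight on the largest
    counts 1 (pure OR): orness = sum_j (j - 1) * w_j / (n - 1), j = 1..n.
    """
    n = len(weights)
    # enumerate starts at 0, so j here already equals (position - 1)
    return sum(j * w for j, w in enumerate(weights)) / (n - 1)


for name, (w, reported) in TABLE.items():
    print(f"{name:13s} computed={orness(w):.3f} reported={reported:.2f}")
```

For example, Most 3 gives (1 + 2 + 3) × 0.33 / 5 = 0.396 and Few 3 gives (2 + 3 + 4) × 0.33 / 5 = 0.594, which round to the reported 0.40 and 0.59.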
22
Experimental results
23
Experimental results
24
Statistical significance tests

Wilcoxon Signed Rank:

| Method | Most 3 | Most 4 | LM4 |
|---|---|---|---|
| LM4 | 0.861 | 0.894 | --- |
| Lnu.ltu 0.25 | 0.081 | 0.077 | 0.444 |

T-Test:

| Method | Most 3 | Most 4 | LM4 |
|---|---|---|---|
| LM4 | 0.456 | 0.383 | -- |
| Lnu.ltu 0.25 | 0.028 | 0.027 | 0.147 |
25
Statistical significance tests
Based on the T-Test, both the Most3 and Most4 methods are significantly better than the LM4 method, which confirms the Wilcoxon Signed Rank test. However, with the T-Test we cannot confirm the significance of the Most3 and Most4 methods over the Lnu.ltu method with a slope of 0.25.
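Both tests compare two methods query by query over the same topics. As a rough illustration of what they compute, here is a standard-library Python sketch of the paired t statistic and the Wilcoxon signed-rank sums W+ and W-, applied to per-query effectiveness scores (for example average precision; the slides do not say which measure was tested). Converting these statistics into p-values requires the t distribution or the Wilcoxon null distribution, for example via `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon`. The function names are our own.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """t = mean(d) / (sd(d) / sqrt(n)) over per-query differences d = a - b."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


def wilcoxon_rank_sums(a, b):
    """Return (W+, W-): rank sums of |d| for positive and negative differences.

    Zero differences are dropped and tied |d| values get their average rank.
    """
    d = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of equal absolute differences
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of 1-based ranks i+1..j+1
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    w_minus = sum(r for r, x in zip(ranks, d) if x < 0)
    return w_plus, w_minus
```

The t-test uses the size of each per-query difference, while the Wilcoxon test uses only the ranks of the differences, which is one reason the two tests can disagree on the same pair of methods.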
26
Conclusion
We used two weighting methods, namely quantifier-based and degree-of-importance-based weighting. The experimental results show that the best OWA operators, Most3 and Most4 (quantifier-based OWA operators), only marginally improve over LM4, the best retrieval method on Persian text. However, they seem to produce a better ranking, since they push relevant documents to higher ranks. The significance tests we conducted seem to confirm that Most3 and Most4 are significantly better than all other methods except Lnu.ltu with a slope of 0.25. The superiority over Lnu.ltu with a slope of 0.25 was not confirmed by the T-Test.
27
Thanks! Questions?
http://ece.ut.ac.ir/dbrg