Slide 1: What Makes a Query Difficult?
David Carmel, Elad Yom-Tov, Adam Darlow, Dan Pelleg
IBM Haifa Research Labs
SIGIR 2006
Slide 2: Outline
Introduction
A model for topic difficulty
Validating the model
Uses of the model
Conclusion
Slide 3: Introduction
Typical TREC topics used for comparing systems are defined by:
–A textual description
–A set of documents relevant to the information need
Experimental results of TREC participants show a wide diversity in effectiveness among topics as well as among systems
Slide 4: Introduction
The goals of the TREC Robust track:
a. To encourage systems to decrease variance by focusing on poorly performing topics
b. To estimate the relative difficulty of each topic
c. To study whether an old and difficult topic is still difficult for current state-of-the-art IR systems
d. To study whether topics difficult in one collection are still difficult in another collection
Why are some topics more difficult than others?
Slide 5: Related Work
Clarity measure for queries
Linguistic features of the query
Number of topic aspects
Features of the entire collection
Reliable Information Access (RIA) workshop
–Ten failure categories were identified
–Most failures are related to failing to identify all aspects of the topic
Slide 6: Outline
Introduction
A model for topic difficulty
Validating the model
Uses of the model
Conclusion
Slide 7: Topic Difficulty Model
Components of a topic: the queries Q and the relevant documents R, both dependent on the collection C
1. d(Q, C) – the distance between the queries Q and the collection C
2. d(Q, Q) – the distance among the queries
3. d(R, C) – the distance between the relevant documents R and the collection C
4. d(R, R) – the distance among the relevant documents
5. d(Q, R) – the distance between the queries Q and the relevant documents R
Figure 1: a general model for topic difficulty
Slide 8: Distance Measure
Jensen-Shannon divergence (JSD)
–A symmetric version of the Kullback-Leibler divergence (KLD)
–Applied to d(Q, C), d(R, C) and d(Q, R)
–For distributions P(w) and Q(w) over the words in the collection, with the mixture M = (P + Q)/2 (the slide's formula image did not survive; this is the standard definition):
  JSD(P || Q) = ½ KLD(P || M) + ½ KLD(Q || M)
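The definition above fits in a few lines of Python; the two-word distributions in the usage lines are toy inputs for illustration, not data from the paper:

```python
import math

def kld(p, q):
    """Kullback-Leibler divergence KLD(P || Q), in bits."""
    return sum(pw * math.log2(pw / qw) for pw, qw in zip(p, q) if pw > 0)

def jsd(p, q):
    """Jensen-Shannon divergence: symmetrize KLD through the mixture M = (P+Q)/2."""
    m = [(pw + qw) / 2 for pw, qw in zip(p, q)]
    return 0.5 * kld(p, m) + 0.5 * kld(q, m)

# Identical distributions are at distance 0; disjoint ones reach the
# maximum of 1 bit.
print(jsd([0.5, 0.5], [0.5, 0.5]))   # 0.0
print(jsd([1.0, 0.0], [0.0, 1.0]))   # 1.0
```

Unlike the raw KLD, this quantity is symmetric and always finite, which is what makes it usable as a distance between queries, documents, and the collection.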
Slide 9: Distribution of Terms
The probability distribution of a word w within a document or query x (the slide's formula image did not survive; shown here is the standard linear interpolation of the maximum-likelihood estimate with the collection model, consistent with the λ values below):
  p_x(w) = λ · tf(w, x)/|x| + (1 − λ) · tf(w, C)/|C|
λ = 0.9 for d(Q, Q), d(Q, R) and d(R, R)
λ = 0.99 for d(Q, C) and d(R, C)
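A minimal sketch of this estimate, assuming the linear-interpolation smoothing stated above; `smoothed_dist` and the toy token lists are illustrative, not the paper's implementation:

```python
from collections import Counter

def smoothed_dist(text_tokens, collection_tokens, lam):
    """Smoothed word distribution for a document or query x:
    p_x(w) = lam * tf(w,x)/|x| + (1-lam) * tf(w,C)/|C|.
    Assumes every token of x also occurs in the collection."""
    tf = Counter(text_tokens)
    cf = Counter(collection_tokens)
    n, big_n = len(text_tokens), len(collection_tokens)
    vocab = sorted(cf)  # the collection vocabulary
    return {w: lam * tf[w] / n + (1 - lam) * cf[w] / big_n for w in vocab}

# Toy corpus: the distribution sums to 1 and words of x dominate.
dist = smoothed_dist(["query", "terms"],
                     ["query", "terms", "other", "words"], 0.9)
print(round(sum(dist.values()), 6))
```

Smoothing toward the collection keeps every collection word at nonzero probability, so the JSD between any pair of these distributions is well defined.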
Slide 10: Topic Aspects and Topic Broadness
The aspect coverage problem is to find documents that cover as many different aspects as possible
–Providing more information to the user
In the model, topic broadness (difficulty) is measured by the distance d(R, R)
JSD has the drawback that near-identical relevant documents are very close together, so a set of duplicates looks no broader than a single document
Topic aspects are therefore used to measure d(R, R)
–The number of clusters of the relevant documents
The square root of the JSD is used as the distance measure between documents
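One way to turn this idea into code is to cluster the relevant documents under the sqrt(JSD) metric and report the number of clusters. The single-link merging and the distance threshold below are illustrative assumptions; the slide does not specify which clustering algorithm the authors used:

```python
import math

def jsd(p, q):
    """Jensen-Shannon divergence between two word distributions, in bits."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    kl = lambda x, y: sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def count_aspects(docs, threshold):
    """Single-link clustering under sqrt(JSD): documents closer than
    `threshold` are merged, and the cluster count estimates broadness.
    Union-find with path halving tracks the clusters."""
    parent = list(range(len(docs)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if math.sqrt(jsd(docs[i], docs[j])) < threshold:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(docs))})

# Two near-duplicate documents collapse into one aspect; the third
# stays separate, giving two aspects.
print(count_aspects([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], 0.5))
```

Taking the square root matters here because sqrt(JSD) is a proper metric (it satisfies the triangle inequality), which plain JSD does not.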
Slide 11: Document Coverage and Query Coverage
Rarely does the information pertaining to both facets of the model (Q and R) exist
When only Q or only R is available, the missing part is approximated via JSD:
–Document coverage (DC)
–Query coverage (QC)
Slide 12: Practical Considerations for Document Coverage
Computing document coverage for a given query is NP-hard, so it is approximated:
–Only the top 100 documents retrieved for the query are considered
–A greedy algorithm:
  The document closest to the query is found
  Iteratively add the document that causes the largest decrease in JSD between the query and the selected documents
  Once a minimum is reached, the value of the JSD is measured and the set of accumulated documents is used as an approximation of the true DC set
Figure 2: A typical JSD curve obtained by the greedy algorithm for document coverage detection
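The greedy procedure above can be sketched as follows. Pooling the selected documents by averaging their term distributions is a simplifying assumption; the slide does not say how the selected set is represented:

```python
import math

def jsd(p, q):
    """Jensen-Shannon divergence between two word distributions, in bits."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    kl = lambda x, y: sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def greedy_dc(query, docs):
    """Greedy document-coverage approximation: start from the document
    closest to the query, keep adding the document that lowers
    JSD(query, selected set) the most, and stop once the JSD stops
    decreasing.  Returns the selected indices and the minimum JSD."""
    remaining = list(range(len(docs)))
    selected, best = [], float("inf")
    while remaining:
        def pooled(extra):
            # Represent the selected set as the average of its
            # distributions (the simplifying assumption noted above).
            group = [docs[i] for i in selected + [extra]]
            return [sum(col) / len(group) for col in zip(*group)]
        cand = min(remaining, key=lambda i: jsd(query, pooled(i)))
        score = jsd(query, pooled(cand))
        if score >= best:  # reached the minimum of the JSD curve
            break
        best = score
        selected.append(cand)
        remaining.remove(cand)
    return selected, best

# Toy example: the first two documents together reproduce the query
# distribution exactly, so the JSD curve bottoms out at 0 after two picks.
sel, score = greedy_dc([0.5, 0.5, 0.0],
                       [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
```

In the paper's setting the candidates would be the term distributions of the top 100 retrieved documents, and the final JSD value is the DC score for the query.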
Slide 13: Practical Considerations for Query Coverage
Query coverage for a given set of relevant documents:
–Only the terms belonging to R are considered by the greedy algorithm
–The iterative process results in a ranked list of the most representative words
Slide 14: Outline
Introduction
A model for topic difficulty
Validating the model
Uses of the model
Conclusion
Slide 15: Experiment Environment
Search engine: Juru
Topics: the 100 topics of the TREC 2004 and 2005 terabyte tracks
Document collection: .GOV2 (25 million docs)
Slide 16: Model-Induced Distances vs. Average Precision
Table 1: Comparison of Pearson and Spearman correlation coefficients between the different distances induced by the topic difficulty model and the AP of the 100 topics (columns reconstructed from the flattened original)

Distance    Juru's AP              TREC median AP
            Pearson   Spearman ρ   Pearson   Spearman ρ
d(Q, C)      0.167     0.170        0.298     0.292
d(R, C)      0.322     0.290        0.331     0.323
d(Q, R)     -0.065    -0.134       -0.019     0.004
d(R, R)     +0.150     0.141        0.119     0.155
Combined     0.447                  0.476
Slide 17: Model-Induced Distances vs. Topic Aspect Coverage
Topic aspect coverage
–Average precision of the top-ranked document for each aspect
Table 2: Correlations between the different distances and the aspect coverage (columns reconstructed from the flattened original)

Distance    Pearson   Spearman ρ
d(Q, C)      0.047     0.047
d(R, C)      0.143     0.194
d(Q, R)     -0.271    -0.285
d(R, R)     -0.364    -0.418
Combined     0.482
Slide 18: Outline
Introduction
A model for topic difficulty
Validating the model
Uses of the model
Conclusion
Slide 19: Uses of the Model
Estimating query average precision
Estimating topic aspect coverage
Estimating topic findability
–The likelihood that documents in a domain (topic) are returned as answers to queries related to that domain
Slide 20: Estimating Average Precision
R' is an approximation of the set of relevant documents, obtained by approximating document coverage
d(Q, C), d(Q, R') and d(R', C) are used as features for a Support Vector Machine (SVM)
–Leave-one-out cross-validation
The Pearson correlation between the actual and the predicted average precision is 0.362
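The leave-one-out protocol is easy to sketch without any ML library. The 1-nearest-neighbour regressor below is a toy stand-in for the paper's SVM, and the four synthetic topics are invented for illustration:

```python
def leave_one_out_predict(features, targets, predict):
    """Leave-one-out cross-validation: each topic's score is predicted
    from a model trained on all the other topics.  `predict` stands in
    for the paper's SVM regressor; any regressor can be plugged in."""
    preds = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = targets[:i] + targets[i + 1:]
        preds.append(predict(train_x, train_y, features[i]))
    return preds

def nearest_neighbour(train_x, train_y, x):
    """Toy stand-in regressor: 1-NN in the 3-dimensional feature space
    spanned by d(Q,C), d(Q,R') and d(R',C)."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_x]
    return train_y[dists.index(min(dists))]

def pearson(xs, ys):
    """Pearson correlation coefficient, used to evaluate the predictions."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Four invented topics: (d(Q,C), d(Q,R'), d(R',C)) feature triples and AP.
features = [[0.1, 0.20, 0.3], [0.1, 0.25, 0.3],
            [0.8, 0.70, 0.9], [0.8, 0.75, 0.9]]
targets = [0.50, 0.55, 0.10, 0.12]
preds = leave_one_out_predict(features, targets, nearest_neighbour)
```

The paper reports a Pearson correlation of 0.362 between actual and predicted AP under this protocol; the toy data above merely shows the mechanics.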
Slide 21: Estimating Aspect Coverage
The same approach as for estimating average precision
The Pearson correlation between the actual aspect coverage and the predicted one is 0.397
The same features are also used to train an estimator that detects low-coverage (<10%) queries
Figure 3: Receiver operating characteristic (ROC) curve for distinguishing queries with low aspect coverage from other queries (area under the curve: 0.88; x-axis: P(decide low coverage | high coverage), y-axis: P(decide low coverage | low coverage))
Slide 22: Estimating Topic Findability
Given a set of documents in a domain, findability represents how easy it is for a user to find these documents
–Related to the field of search engine optimization
For each topic, the 10 best words are selected from the result of the query coverage approximation
–A sequence of queries is built: the best one word, the best two words, and so on
–For each topic, the resulting AP values as a function of the number of terms are its features for K-means clustering
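A plain k-means over the per-topic AP curves might look as follows; the curve data, the choice of k, and the random initialization are illustrative choices, not the paper's setup:

```python
import random

def kmeans(curves, k, iters=20, seed=0):
    """Plain k-means on AP-vs-number-of-best-words curves; each curve is
    a fixed-length list of AP values.  Returns the cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(curves, k)
    for _ in range(iters):
        # Assign each curve to its nearest center (squared Euclidean).
        groups = [[] for _ in range(k)]
        for c in curves:
            d = [sum((a - b) ** 2 for a, b in zip(c, ctr)) for ctr in centers]
            groups[d.index(min(d))].append(c)
        # Recompute each center as the mean of its group; keep the old
        # center if a group happens to be empty.
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers

# Four invented two-point AP curves that fall into two obvious groups:
# two topics with low AP at every query length, two with high AP.
centers = sorted(kmeans([[0.1, 0.1], [0.1, 0.2],
                         [0.9, 0.8], [0.9, 0.9]], k=2))
```

In the paper the curves have up to 10 points (one per added best word), and the resulting cluster centers are the typical findability behaviors shown in Figure 4.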
Slide 23: Results of Estimating Topic Findability
Figure 4: Cluster centers of the AP curves versus the number of best words. The curves represent three typical findability behaviors
Slide 24: Outline
Introduction
A model for topic difficulty
Validating the model
Uses of the model
Conclusion
Slide 25: Conclusion
A novel topic difficulty model is proposed to capture the main components of a topic and to relate those components to topic difficulty
The larger the distance of the queries and the Qrels from the entire collection, the better the topic can be answered
The applicability of the difficulty model is demonstrated
More features affecting topic difficulty are left for further research, e.g. ambiguity of the query terms, or topics with missing content