Presentation transcript:

Axiomatic Analysis and Optimization of Information Retrieval Models
ChengXiang ("Cheng") Zhai
Department of Computer Science, University of Illinois at Urbana-Champaign
Keynote at ICTIR 2011, Sept. 13, 2011, Bertinoro, Italy

Research on retrieval models has a long history; many different models have been proposed and tested:
– Vector Space Models: [Salton et al. 1975], [Singhal et al. 1996], …
– Classic Probabilistic Models: [Maron & Kuhn 1960], [Harter 1975], [Robertson & Sparck Jones 1976], [van Rijsbergen 1977], [Robertson 1977], [Robertson et al. 1981], [Robertson & Walker 1994], …
– Language Models: [Ponte & Croft 1998], [Hiemstra & Kraaij 1998], [Zhai & Lafferty 2001], [Lavrenko & Croft 2001], [Kurland & Lee 2004], …
– Non-Classic Logic Models: [van Rijsbergen 1986], [Wong & Yao 1991], …
– Divergence from Randomness: [Amati & van Rijsbergen 2002], [He & Ounis 2005], …
– Learning to Rank: [Fuhr 1989], [Gey 1994], …

Some models work very well (and roughly equally well):
– Pivoted length normalization (PIV) [Singhal et al. 96]
– BM25 [Robertson & Walker 94]
– PL2 [Amati & van Rijsbergen 02]
– Query likelihood with Dirichlet prior (DIR) [Ponte & Croft 98], [Zhai & Lafferty 01]
… but many others failed to work well.

Questions
– Why do {BM25, PIV, PL2, DIR, …} tend to perform similarly even though they were derived in very different ways?
– Why are they better than many other variants?
– Why does it seem to be hard to beat these strong baseline methods?
– Are they hitting the ceiling of the bag-of-words assumption? If yes, how can we prove it? If not, how can we find a more effective model?

Suggested Answers
– Why do {BM25, PIV, PL2, DIR, …} tend to perform similarly? They share some nice common properties, and these properties are more important than how each model is derived.
– Why are they better than many other variants? The other variants don't have all the nice properties.
– Why does it seem to be hard to beat these strong baselines? We don't have good knowledge about their deficiencies.
– Are they hitting the ceiling of the bag-of-words assumption? We need to formally define the ceiling (= the complete set of nice properties) before we can answer.

Main Point of the Talk: the Axiomatic Relevance Hypothesis (ARH)
– Relevance can be modeled by a set of formally defined constraints on a retrieval function: if a function satisfies all the constraints, it will perform well empirically, and if function Fa satisfies more constraints than function Fb, Fa would perform better than Fb empirically.
– This enables analytical evaluation of retrieval functions (see the sketch below): given a set of relevance constraints C = {c1, …, ck}, function Fa is analytically more effective than function Fb iff the set of constraints satisfied by Fb is a proper subset of those satisfied by Fa, and a function F is optimal iff it satisfies all the constraints in C.
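A minimal sketch of this analytical comparison rule, assuming each constraint is represented as a named predicate that probes a scoring function on synthetic cases (all names below are illustrative, not from the talk):

```python
# Analytical evaluation of retrieval functions via constraint sets.
# Each constraint is assumed to be a predicate taking a scoring function
# and returning True iff the function satisfies it.

def satisfied(score_fn, constraints):
    """Names of the constraints that score_fn satisfies."""
    return {c.__name__ for c in constraints if c(score_fn)}

def analytically_more_effective(fa, fb, constraints):
    """Fa beats Fb analytically iff Fb's satisfied set is a
    proper subset of Fa's satisfied set (the ARH rule)."""
    return satisfied(fb, constraints) < satisfied(fa, constraints)

def is_optimal(f, constraints):
    """F is optimal iff it satisfies every constraint in C."""
    return len(satisfied(f, constraints)) == len(constraints)
```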

Rest of the Talk
1. Modeling relevance with formal constraints
2. Testing the axiomatic relevance hypothesis
3. An axiomatic framework for optimizing retrieval models
4. Open challenge: seeking an ultimately optimal retrieval model

Outline
1. Modeling relevance with formal constraints (this part)
2. Testing the axiomatic relevance hypothesis
3. An axiomatic framework for optimizing retrieval models
4. Open challenge: seeking an ultimately optimal retrieval model

Motivation: Different Models, but Similar Heuristics
The pivoted normalization, Dirichlet prior, and Okapi methods all combine a term frequency component, an inverse document frequency component, and document length normalization, and all exhibit parameter sensitivity. PL2 is a bit more complicated, but implements similar heuristics.
[The slide shows the three scoring formulas with their TF, IDF, and length-normalization components highlighted; standard forms are reproduced below.]
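The formula images did not survive extraction; for reference, these are standard statements of the three functions as analyzed in [Fang et al. 2004]. Here c(t,D) is the count of term t in D, |D| the document length, avdl the average document length, N the number of documents, df(t) the document frequency of t, p(t|C) the collection language model, and s, μ, k1, b, k3 are parameters:

```latex
% Pivoted length normalization (PIV):
S(Q,D)=\sum_{t\in Q\cap D}\frac{1+\ln\big(1+\ln c(t,D)\big)}{1-s+s\,\frac{|D|}{avdl}}\cdot c(t,Q)\cdot\ln\frac{N+1}{df(t)}

% Dirichlet-prior query likelihood (DIR), in its ranking-equivalent form:
S(Q,D)=\sum_{t\in Q\cap D}c(t,Q)\,\ln\Big(1+\frac{c(t,D)}{\mu\,p(t|C)}\Big)+|Q|\,\ln\frac{\mu}{|D|+\mu}

% Okapi BM25 (core form):
S(Q,D)=\sum_{t\in Q\cap D}\ln\frac{N-df(t)+0.5}{df(t)+0.5}\cdot\frac{(k_1+1)\,c(t,D)}{k_1\big((1-b)+b\,\frac{|D|}{avdl}\big)+c(t,D)}\cdot\frac{(k_3+1)\,c(t,Q)}{k_3+c(t,Q)}
```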

Are they performing well because they implement similar retrieval heuristics? Can we formally capture these necessary retrieval heuristics?

Term Frequency Constraint (TFC1)
TF weighting heuristic I: give a higher score to a document with more occurrences of a query term.
TFC1: Let q be a query with only one term w. If |d1| = |d2| and c(w, d1) > c(w, d2), then S(q, d1) > S(q, d2).

Term Frequency Constraint (TFC2)
TF weighting heuristic II: favor a document with more distinct query terms.
TFC2: Let q be a query with two terms w1 and w2. Assume idf(w1) = idf(w2) and |d1| = |d2|. If c(w1, d1) = c(w1, d2) + c(w2, d2) and c(w2, d1) = 0, with c(w1, d2) > 0 and c(w2, d2) > 0 (the two documents contain equally many query-term occurrences in total, but d2 covers both distinct terms), then S(q, d1) < S(q, d2).

Length Normalization Constraints (LNC1, LNC2)
Document length normalization heuristic: penalize long documents (LNC1), but avoid over-penalizing them (LNC2).
LNC1: Let q be a query. If, for some word w not in q, c(w, d2) = c(w, d1) + 1, while c(t, d2) = c(t, d1) for all other words t, then S(q, d1) ≥ S(q, d2).
LNC2: Let q be a query. If k > 1, |d1| = k · |d2|, and c(w, d1) = k · c(w, d2) for all words w (d1 is d2 concatenated with itself k times), then S(q, d1) ≥ S(q, d2).

TF-Length Constraint (TF-LNC)
TF-LN heuristic: regularize the interaction of TF and document length.
TF-LNC: Let q be a query with only one term w. If c(w, d1) > c(w, d2) and |d1| = |d2| + c(w, d1) − c(w, d2) (d1 is longer than d2 only because of its extra occurrences of w), then S(q, d1) > S(q, d2).
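A minimal sketch of checking such constraints mechanically, using the standard single-term pivoted normalization score from above; the collection statistics and test instances are illustrative assumptions, not values from the talk:

```python
import math

AVDL = 100.0             # assumed average document length
N, DF = 1_000_000, 1000  # assumed collection size and document frequency

def piv_score(tf, doclen, s):
    """Single-term pivoted normalization (PIV) score."""
    if tf == 0:
        return 0.0
    tf_part = 1 + math.log(1 + math.log(tf))
    norm = 1 - s + s * doclen / AVDL
    return tf_part / norm * math.log((N + 1) / DF)

def check_tfc1(score):
    # TFC1: same length, more occurrences of the query term => higher score.
    return score(5, 100) > score(3, 100)

def check_lnc2(score, k=3):
    # LNC2: concatenating a document with itself k times must not hurt it.
    return score(k * 2, k * 50) >= score(2, 50)

def check_tf_lnc(score):
    # TF-LNC: extra length due only to extra query-term occurrences helps.
    return score(5, 50 + 5 - 3) > score(3, 50)

for s in (0.05, 0.2, 0.5, 0.8):
    f = lambda tf, dl: piv_score(tf, dl, s)
    print(f"s={s}: TFC1={check_tfc1(f)} LNC2={check_lnc2(f)} TF-LNC={check_tf_lnc(f)}")
```

Scanning s this way reproduces the kind of parameter bound discussed shortly: PIV satisfies LNC2 only when s is small.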

Seven Basic Relevance Constraints [Fang et al. 2011]
[The slide shows a table summarizing the seven basic constraints defined in the cited paper: TFC1, TFC2, TFC3, TDC, LNC1, LNC2, and TF-LNC.]
Hui Fang, Tao Tao, ChengXiang Zhai. Diagnostic Evaluation of Information Retrieval Models. ACM Transactions on Information Systems 29(2): 7 (2011).

Outline
1. Modeling relevance with formal constraints
2. Testing the axiomatic relevance hypothesis (this part)
3. An axiomatic framework for optimizing retrieval models
4. Open challenge: seeking an ultimately optimal retrieval model

Axiomatic Relevance Hypothesis (ARH), restated
– Relevance can be modeled by a set of formally defined constraints on a retrieval function: a function that satisfies all the constraints will perform well empirically, and if function Fa satisfies more constraints than function Fb, Fa would perform better than Fb empirically.
– Analytical evaluation of retrieval functions: given a set of relevance constraints C = {c1, …, ck}, function Fa is analytically more effective than function Fb iff the set of constraints satisfied by Fb is a proper subset of those satisfied by Fa; a function F is optimal iff it satisfies all the constraints in C.

Testing the Axiomatic Relevance Hypothesis
– Is satisfaction of these constraints correlated with good empirical performance of a retrieval function?
– Can we use these constraints to analytically compare retrieval functions without experimentation?
Yes, to both questions:
– Constraint analysis reveals optimal ranges of parameter values.
– When a formula does not satisfy a constraint, this often indicates non-optimality of the formula.
– Violations of constraints can pinpoint where a formula needs to be improved.

Bounding Parameters
Applying LNC2 to the pivoted normalization method yields an upper bound on the parameter s, and the empirically optimal s (for average precision) falls within that bound.
[The slide shows a plot of average precision against s for pivoted normalization, with the LNC2-derived upper bound on s marked.]

Analytical Comparison: Okapi vs. Pivoted
The IDF factor in the Okapi method becomes negative when df(w) is large, which violates many constraints. Empirically, Okapi is competitive with pivoted normalization on keyword queries but degrades on verbose queries, where the negative-IDF problem is more likely to arise.
[The slide shows average precision against s or b for keyword and verbose queries.]

Fixing a Deficiency in BM25 Improves Its Effectiveness
Modifying Okapi so that it satisfies more constraints (replacing the problematic IDF factor) is expected to help verbose queries, and the modified Okapi indeed improves performance on verbose queries.
[The slide shows average precision against s or b for the pivoted, Okapi, and modified Okapi methods on keyword and verbose queries.]

Systematic Analysis of Four State-of-the-Art Models [Fang et al. 11]
– PIV: parameter s must be small.
– DIR: problematic when a query term occurs less frequently in a document than expected.
– Okapi/BM25: negative IDF.
– PL2: problematic with common terms; parameter c must be large.
Question: why are Dirichlet and PL2 still competitive despite these inherent problems, which can't be fixed through parameter tuning?

Outline
1. Modeling relevance with formal constraints
2. Testing the axiomatic relevance hypothesis
3. An axiomatic framework for optimizing retrieval models (this part)
4. Open challenge: seeking an ultimately optimal retrieval model

How can we leverage constraints to find an optimal retrieval model?

Basic Idea of the Axiomatic Framework (Optimization Problem Setup)
[The slide depicts a function space in which each retrieval constraint C1, C2, C3 carves out a subset S1, S2, S3 of functions satisfying it; the target function lies in the intersection of these subsets.]

Three Questions
– How do we define the constraints? We've talked about that; more later.
– How do we define the function space? One possibility: leverage existing state-of-the-art functions.
– How do we search the function space? One possibility: search in the neighborhood of existing state-of-the-art functions.

Inductive Definition of the Function Space
Define the function space inductively (illustrated on the slide with Q = {cat, big} and D = {cat, dog, …, big}); a toy instantiation follows below:
– Primitive weighting function f: for a one-term query and a one-term document, S(Q, D) = f(q, d).
– Document growth function g: adding a term d to the document changes the score additively, S(Q, D ∪ {d}) = S(Q, D) + g(d, Q, D).
– Query growth function h: adding a term q to the query changes the score additively, S(Q ∪ {q}, D) = S(Q, D) + h(q, Q, D).
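A minimal sketch of this inductive construction; the concrete f, g, h below are placeholders chosen purely for illustration, not the functions derived in the talk:

```python
# Sketch of the inductive function space: a score S(Q, D) is assembled
# from a primitive weighting function f (one-term query vs. one-term
# document), a document growth function g, and a query growth function h.

def f(q, d):
    """Primitive weighting: S({q}, {d})."""
    return 1.0 if q == d else 0.0

def g(d, Q, D):
    """Document growth: score change when term d is appended to D."""
    return 0.5 if d in Q else -0.1

def h(q, Q, D):
    """Query growth: score change when term q is appended to Q."""
    return 0.5 * D.count(q)

def score(Q, D):
    """S(Q, D) built inductively from the base case S({q}, {d}) = f(q, d)."""
    s = f(Q[0], D[0])
    partial_d = [D[0]]
    for d in D[1:]:                 # grow the document term by term
        s += g(d, Q[:1], partial_d)
        partial_d.append(d)
    for q in Q[1:]:                 # then grow the query term by term
        s += h(q, Q[:1], D)
    return s

print(score(["big", "cat"], ["cat", "dog", "big"]))  # 0.9
```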

Derivation of New Retrieval Functions
Start from an existing function S: decompose it, generalize the components, constrain the generalized form using the constraints C1, C2, C3, and assemble a new function S'.
[The slide shows this decompose → generalize → constrain → assemble cycle.]

A Sample Derived Function Based on BM25 [Fang & Zhai 05]
[The slide shows the derived formula with its QTF, IDF, TF, and length-normalization components labeled; a reconstruction follows below.]
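The formula image did not survive extraction; as a hedged point of reference, the BM25-inspired function derived in [Fang & Zhai 05] (their F2-EXP) is commonly written as follows, with k a small IDF exponent (around 0.35) and s the length-normalization parameter:

```latex
% Axiomatic function derived from BM25 (F2-EXP in [Fang & Zhai 05]):
% QTF = c(t,Q); IDF = (N/df(t))^k; the last factor is TF with length normalization.
S(Q,D)=\sum_{t\in Q\cap D} c(t,Q)\cdot\Big(\frac{N}{df(t)}\Big)^{k}\cdot\frac{c(t,D)}{c(t,D)+s+s\,\frac{|D|}{avdl}}
```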

The derived function is less sensitive to its parameter setting than the baseline.
[The slide shows retrieval performance against the parameter value for the axiomatic model and the baseline; higher is better, and the axiomatic model's curve stays high across a wide parameter range.]

Inevitability of Heuristic Thinking and Necessity of Axiomatic Analysis
– The theory-effectiveness gap: theoretically motivated models don't automatically perform well empirically; heuristic adjustment seems always necessary; the cause is inaccurate modeling of relevance.
– How can we bridge the gap? The answer lies in axiomatic analysis: use constraints to help identify the errors in modeling relevance, and thus obtain insights into how to improve a model.

Systematic Analysis of Four State-of-the-Art Models [Fang et al. 11] (revisited)
– PIV: parameter s must be small.
– DIR: problematic when a query term occurs less frequently in a document than expected.
– Okapi/BM25: negative IDF, but a modified BM25 satisfies all the constraints!
– PL2: problematic with common terms; parameter c must be large.
Without knowing its deficiency, we couldn't easily have proposed a new model that works better than BM25.

A Recent Success of Axiomatic Analysis: Lower-Bounding TF Normalization [Lv & Zhai 11]
– Existing retrieval functions lack a lower bound on normalized TF with respect to document length: long documents are overly penalized, and a very long document matching two query terms can score lower than a short document matching only one.
– Two new constraints for lower-bounding TF are proposed.
– A general solution fixes the problem for BM25, PL2, Dirichlet, and PIV, leading to improved versions of each (BM25+, PL2+, Dir+, Piv+).

New Constraints: LB1 & LB2
LB1: Let Q be a query. Assume D1 and D2 are two documents such that S(Q, D1) = S(Q, D2). If we reformulate the query by adding a term q ∉ Q, where c(q, D1) = 0 and c(q, D2) > 0, then S(Q ∪ {q}, D1) < S(Q ∪ {q}, D2). (A repeated occurrence of an already-matched query term isn't as important as the first occurrence of an otherwise absent query term.)
LB2: Let Q = {q1, q2} be a query with two terms q1 and q2. Assume td(q1) = td(q2), where td(t) can be any reasonable measure of term discrimination value. If D1 and D2 are two documents such that c(q2, D1) = c(q2, D2) = 0, c(q1, D1) > 0, c(q1, D2) > 0, and S(Q, D1) = S(Q, D2), then S(Q, D1 ∪ {q1} \ {t1}) < S(Q, D2 ∪ {q2} \ {t2}) for all t1 and t2 such that t1 ∈ D1, t2 ∈ D2, t1 ∉ Q, and t2 ∉ Q. (The presence-absence gap, the 0-1 gap, shouldn't be closed by length normalization.)

It turns out that none of BM25, PL2, Dirichlet, or PIV satisfies both constraints. A general heuristic solution: add a small constant lower bound to the normalized TF component (BM25+ is shown below). This worked well for improving all four models.
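For concreteness, one common statement of the resulting BM25+ function [Lv & Zhai 11], with the IDF written in an always-positive form and a constant δ (default δ = 1) added to the TF normalization component:

```latex
% BM25+: a constant lower bound \delta is added to the TF component.
% \tilde{c}(t,D) is the length-normalized term count.
S(Q,D)=\sum_{t\in Q\cap D}\frac{(k_3+1)\,c(t,Q)}{k_3+c(t,Q)}\cdot\ln\frac{N+1}{df(t)}\cdot\Big(\frac{(k_1+1)\,\tilde{c}(t,D)}{k_1+\tilde{c}(t,D)}+\delta\Big),
\quad \tilde{c}(t,D)=\frac{c(t,D)}{1-b+b\,\frac{|D|}{avdl}}
```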

BM25+ Improves over BM25
[The slide shows experimental results comparing BM25+ against BM25.]
For details, see: Yuanhua Lv, ChengXiang Zhai. Lower-Bounding Term Frequency Normalization. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management (CIKM'11), to appear.

More Examples of the Theory-Effectiveness Gap and the Need for Axiomatic Analysis
– The derivation of the query likelihood retrieval function relies on three assumptions: (1) query likelihood scoring, (2) independence of query terms, and (3) a collection LM for smoothing; yet it can't explain why some apparently reasonable smoothing methods perform poorly.
– In the statistical translation model for retrieval [Berger & Lafferty 99], we must ensure sufficient self-translation probability to avoid unreasonable retrieval results, but such a constraint can't be explained by the estimation of the translation model.
– There is no explanation of why other divergence-based similarity functions don't work as well as the asymmetric KL-divergence function D(Q||D).
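For reference, the KL-divergence function mentioned in the last point ranks documents by the negative divergence of the document language model from the query language model, which is rank-equivalent to the cross entropy of the query model with the document model:

```latex
S(Q,D) = -D\big(\theta_Q \,\|\, \theta_D\big)
       = -\sum_{t} p(t\mid\theta_Q)\,\ln\frac{p(t\mid\theta_Q)}{p(t\mid\theta_D)}
       \;\overset{rank}{=}\; \sum_{t} p(t\mid\theta_Q)\,\ln p(t\mid\theta_D)
```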

Outline
1. Modeling relevance with formal constraints
2. Testing the axiomatic relevance hypothesis
3. An axiomatic framework for optimizing retrieval models
4. Open challenge: seeking an ultimately optimal retrieval model (this part)

Open Challenges
– Does there exist a complete set of constraints? If yes, how can we define them? If no, how can we prove it?
– How do we evaluate the constraints? How do we evaluate a single constraint (e.g., should the score contribution of a term be bounded? in BM25, it is), and how do we evaluate a set of constraints?
– How do we define the function space? Search in the neighborhood of an existing function, or search in a new function space? Quantum IR?

Open Challenges (continued)
– How do we check a function against a constraint? How can we quantify the degree of satisfaction? How can we incorporate constraints into a machine learning framework, something like maximum entropy?
– How can we go beyond bag of words? Can we model pseudo feedback? Cross-lingual IR?
– Conditional constraints on specific types of queries? On specific types of documents?

Possible Future Scenario 1: Impossibility Theorems for IR
We may find inconsistencies among constraints and be able to prove impossibility theorems for IR, similar to Kleinberg's impossibility theorem for clustering.
J. Kleinberg. An Impossibility Theorem for Clustering. Advances in Neural Information Processing Systems (NIPS) 15, 2002.

Future Scenario 2: Sufficiently Restrictive Constraints
We may be able to propose a comprehensive set of constraints that is sufficient for deriving a unique (optimal) retrieval function, similar to the axiomatic derivation of the entropy function.

Future Scenario 3 (most likely): an Open Set of Insufficient Constraints
We will have a large set of constraints without conflict, but insufficient for ensuring good retrieval performance. There will always be room for new constraints, but we'll never be sure what they are. We will need to combine axiomatic analysis with a constructive retrieval function space and supervised machine learning.

Summary: the Axiomatic Relevance Hypothesis
– Formal retrieval-function constraints for modeling relevance.
– Axiomatic analysis as a way to assess the optimality of retrieval models.
– Inevitability of heuristic thinking in developing retrieval models, and the role of axiomatic analysis in bridging the theory-effectiveness gap.
– Possibility of leveraging axiomatic analysis to improve state-of-the-art models.
– Axiomatic framework = constraints + a constructive function space based on existing or new models and theories.

Updated Answers
– Why do {BM25, PIV, PL2, DIR, …} tend to perform similarly? They share some nice common properties, and these properties are more important than how each model is derived; relevance is more accurately modeled with constraints.
– Why are they better than many other variants? The other variants don't have all the nice properties.
– Why does it seem to be hard to beat these strong baselines? We didn't find a constraint that they fail to satisfy.
– Are they hitting the ceiling of the bag-of-words assumption? No, they have NOT hit the ceiling yet! (We still need to formally define the ceiling, i.e., the complete set of nice properties.)

The Future of Theory of IR is Bright!
[The slide shows a cartoon road map: "We are here" at Cranfield, "you are miles from destination" at the optimal model, with possible routes through the vector space model, probabilistic model, logic model, machine learning ("this strategy seems to have worked well in the past"), axiomatic analysis, and perhaps quantum IR. Good luck, everyone!]

Thank You! Questions/Comments?

Acknowledgments
Collaborators: Hui Fang, Yuanhua Lv, Tao Tao, Maryam Karimzadehgan, and others.
Funding: [sponsor logos shown on slide].