1
Language Modeling Frameworks for Information Retrieval
John Lafferty
School of Computer Science, Carnegie Mellon University
2
September 11, 2002, Language Modeling and Information Retrieval Workshop
Retrieval As Decision Making
Given a query,
- Which documents should be selected? (D)
- How should these docs be presented to the user? ( )
[Diagram: Query … Ranked list? Clustering? Excerpt?]
3
Decision Theory Framework
[Diagram: variables S (Source), d (Document), U (User), q (Query), R; each marked as observed, partially observed, or inferred]
Unified framework can be built on Bayesian decision theory: models, loss function, risk minimization (Zhai, 2002)
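The risk-minimization idea can be illustrated with a minimal sketch: given a posterior over some hidden state and a loss function, pick the action with the lowest expected loss. The function `bayes_action`, the two states, the two actions, and every number below are hypothetical and chosen only for illustration; they are not taken from the slides or from (Zhai, 2002).

```python
def bayes_action(actions, posterior, loss):
    """Return the action with the smallest posterior expected loss."""
    def risk(action):
        # Expected loss of `action`, averaged over the hidden states.
        return sum(p * loss(action, state) for state, p in posterior.items())
    return min(actions, key=risk)


# Hypothetical posterior over a hidden state (illustrative values only).
posterior = {"relevant": 0.3, "not_relevant": 0.7}

# Hypothetical loss table: skipping a relevant document costs more than
# showing a non-relevant one.
LOSS = {
    ("show", "relevant"): 0.0,
    ("show", "not_relevant"): 1.0,
    ("skip", "relevant"): 3.0,
    ("skip", "not_relevant"): 0.0,
}


def loss(action, state):
    return LOSS[(action, state)]


best = bayes_action(["show", "skip"], posterior, loss)
print(best)
```

With these assumed numbers the expected loss is 0.7 for "show" and 0.9 for "skip", so the sketch selects "show" even though the document is more likely non-relevant. Changing the loss table changes the decision, which is the point of making the loss explicit.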
4
Example: Aspect Retrieval
Query: What are current applications of robotics?
Find as many different applications as possible.
Example Aspects
- A1: spot-welding robotics
- A2: controlling inventory
- A3: pipe-laying robots
- A4: talking robot
- A5: robots for loading & unloading memory tapes
- A6: robot telephone operators
- A7: robot cranes
- …
Aspect judgments
      A1  A2  A3  …   Ak
d1    1   1   0   0   …   0   0
d2    0   1   1   1   …   0   0
d3    0   0   0   0   …   1   0
…
dk    1   0   1   0   …   0   1
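To make the "as many different applications as possible" goal concrete, here is a small sketch that counts how many distinct aspects the top-k documents of a ranking cover, given a binary aspect judgment matrix like the one above. The function name `aspects_covered` and the toy judgments are assumptions made for illustration; the slide's own matrix is elided, so it is not reproduced here.

```python
def aspects_covered(ranking, judgments, k):
    """Count distinct aspects covered by the top-k documents of a ranking."""
    covered = set()
    for doc in ranking[:k]:
        # Add the index of every aspect this document is judged to cover.
        covered.update(i for i, hit in enumerate(judgments[doc]) if hit)
    return len(covered)


# Hypothetical judgments: one row per document, one column per aspect.
judgments = {
    "d1": [1, 1, 0, 0],
    "d2": [0, 1, 1, 1],
    "d3": [1, 1, 0, 0],
}

redundant = aspects_covered(["d1", "d3", "d2"], judgments, k=2)
diverse = aspects_covered(["d1", "d2", "d3"], judgments, k=2)
print(redundant, diverse)
```

In this toy data d1 and d3 cover the same aspects, so ranking them together wastes a slot: the first ordering covers 2 aspects in its top 2, the second covers 4.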
5
Aspect Models (Hofmann, 1999; Blei, Ng and Jordan, 2001)
[Diagram: Aspect 1, Aspect 2; labels 1, 2; Dirichlet (for example)]
- Generative:
- Inference: Given aspects and document, what is posterior for ?
- Learning: Given documents, what are the (ML) aspects?
Studied recently in (Minka and Lafferty, 2002)
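The generative story can be sketched as follows, under one common reading of aspect models: draw per-document mixing weights from a Dirichlet, then for each word pick an aspect according to those weights and draw the word from that aspect's distribution. The slide's own formula did not survive extraction, so this is an assumption rather than a transcription. The function names, the two toy aspects, and the Dirichlet parameters are all hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(aspects, alpha, length, rng):
    """Sample mixing weights, then sample each word via a chosen aspect."""
    weights = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(length):
        # Pick an aspect for this word according to the mixing weights.
        aspect = rng.choices(aspects, weights=weights, k=1)[0]
        vocab = list(aspect)
        probs = [aspect[w] for w in vocab]
        # Draw the word from the chosen aspect's word distribution.
        words.append(rng.choices(vocab, weights=probs, k=1)[0])
    return weights, words


# Hypothetical aspects: each maps words to probabilities.
aspects = [
    {"robot": 0.6, "welding": 0.4},
    {"inventory": 0.7, "control": 0.3},
]

rng = random.Random(0)
weights, words = generate_document(aspects, alpha=[1.0, 1.0], length=8, rng=rng)
print(weights, words)
```

Inference runs this story backwards (recover the weights from the words with the aspects fixed), and learning estimates the aspects themselves from a collection of documents.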
6
Evaluation Measures
- What is the best measure?
  - Requires concrete specification of task
  - Several natural measures are computationally intractable, even assuming aspects known (e.g., aspect coverage, aspect uniqueness)
- Defining aspects is difficult
  - Maximum likelihood cannot be expected to capture “true” semantic relationships in aspects
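One plausible reading of the intractability remark is that finding the k documents that together cover the most aspects is an instance of the maximum coverage problem, which is NP-hard in general. The sketch below shows the standard greedy heuristic for that problem, which is known to achieve at least a (1 - 1/e) fraction of the optimum. It is a swapped-in illustration, not a method described in the slides; `greedy_select` and the toy judgments are hypothetical.

```python
def greedy_select(judgments, k):
    """Greedily pick up to k documents, each adding the most new aspects."""
    covered, chosen = set(), []
    remaining = dict(judgments)
    for _ in range(min(k, len(remaining))):
        # Choose the document contributing the most not-yet-covered aspects.
        best = max(remaining, key=lambda d: len(remaining[d] - covered))
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


# Hypothetical judgments: each document maps to the set of aspects it covers.
judgments = {
    "d1": {"A1", "A2"},
    "d2": {"A2", "A3", "A4"},
    "d3": {"A1", "A2"},
}

chosen, covered = greedy_select(judgments, k=2)
print(chosen, sorted(covered))
```

Here the heuristic takes d2 first (three new aspects) and then a document that adds A1, covering all four aspects with two documents.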
7
Aspect Retrieval Baselines
[Figure panels: Aspect Precision, Aspect Recall]
8
Challenges for IR Models
Probabilistic language models have proven to be an effective way to reason about IR systems. We now need:
- Better task specification and data
  - e.g., TREC interactive data inadequate
- More advanced models
  - Fewer independence assumptions, greater structure
- Improved inference and learning algorithms
  - Accuracy and efficiency
- To handle user preferences, background knowledge
  - Loss function and priors/constraints