Slide 1
Semantic (Language) Models: Robustness, Structure & Beyond
Thomas Hofmann
Department of Computer Science, Brown University, TH@cs.brown.edu
Chief Scientist & Co-founder, RecomMind Inc., www.recommind.com
Slide 2
Three Key Challenges in IR …
– Robustness: insensitivity of search results with respect to variations of the query.
– Structure & topicality: extracting relevant concepts or topics and using those to improve accuracy and to structure the search result.
– Integration: combining statistical methods with prior/expert/linguistic knowledge and with different cues (terms, links, credibility of source, …).
Where do language models come in? Are these problems related?
Slide 3
Concept / Topic-Based View
Concept-specific language model – what is a concept?
– A (sparse) distribution over terms in the vocabulary.
– Probabilities: how likely is it that a term will express a certain concept?
– Concept = hidden, term = observed.
Document-specific "concept" model:
– Concept-based document representation
– (Concept-based user representation)
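Spelled out, the two parts of this slide combine into the standard aspect-model factorization that pLSA (cited on the next slide) estimates: a document mixes concepts via P(z | d), and each concept emits terms via P(w | z),

\[
  P(w \mid d) \;=\; \sum_{z} P(w \mid z)\, P(z \mid d).
\]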
Slide 4
From Concepts to Language Models
Putting both ingredients together: a concept-based language model.
Semantic Language Model:
– Unsupervised learning: Probabilistic Latent Semantic Analysis (pLSA, SIGIR'99).
– Qualitative pre-structuring of concepts based on thesauri, synsets, categories, topics, etc.
– Quantitative model by use of statistical estimation!
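A minimal sketch of the "statistical estimation" step: plain EM for pLSA on a dense document-term count matrix. The function name, the random initialisation, and the small numerical guard are illustrative assumptions, not details from the talk (the published pLSA work additionally describes a tempered-EM variant to control overfitting).

import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit P(z|d) and P(w|z) to a (n_docs, n_terms) count matrix by EM."""
    rng = np.random.default_rng(seed)
    n_docs, n_terms = counts.shape

    # Random initialisation of the two conditional distributions.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)           # P(z | d)
    p_w_z = rng.random((n_topics, n_terms))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)           # P(w | z)

    for _ in range(n_iter):
        # E-step: posterior P(z | d, w) proportional to P(w | z) P(z | d).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]   # shape (d, z, w)
        post = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)

        # M-step: re-estimate both distributions from the expected
        # counts n(d, w) * P(z | d, w).
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12

    return p_z_d, p_w_z

The dense (docs x topics x terms) arrays keep the update rules readable; a practical implementation would exploit the sparsity of the count matrix.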
Slide 5
Why Semantic Language Models?
– "Intelligent", domain-specific smoothing for document-specific unigram models (see the sketch after this list).
– Combines structure and numbers.
– Linguistic resources can be integrated.
– Category & topic information can be integrated.
– User profiles can be integrated (combination with collaborative filtering).
– Results for ambiguous queries can be structured – most relevant for short queries & heterogeneous domains (Web search [finally!]).
– Other ways to intelligently interact with users.
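One way to read the first bullet, building on the pLSA sketch above; the interpolation weight and the helper name are illustrative assumptions, not from the slides. The sparse maximum-likelihood unigram of a document is interpolated with the concept-based model, which moves probability mass onto semantically related terms the document itself never mentions.

import numpy as np

def smoothed_unigram(counts_d, p_z_d_row, p_w_z, lam=0.5):
    """Semantically smoothed unigram model for one document.

    counts_d  : (n_terms,) raw term counts of the document
    p_z_d_row : (n_topics,) P(z | d), e.g. one row of the pLSA fit above
    p_w_z     : (n_topics, n_terms) P(w | z)
    lam       : interpolation weight (illustrative choice)
    """
    p_ml = counts_d / counts_d.sum()           # maximum-likelihood unigram P_ML(w | d)
    p_sem = p_z_d_row @ p_w_z                  # concept-based model: sum_z P(w | z) P(z | d)
    return lam * p_ml + (1.0 - lam) * p_sem    # interpolated, document-specific P(w | d)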
Slide 6
Conclusion
– Using statistical estimation, language models allow us to enrich concept-based retrieval models with quantitative information.
– Semantic smoothing for improved language models.
– Integration of various sources of evidence.
– Richer models for interactive information access (they make sense).