
1 Semantic (Language) Models: Robustness, Structure & Beyond
Thomas Hofmann, Department of Computer Science, Brown University, TH@cs.brown.edu
Chief Scientist & Co-founder, RecomMind Inc., www.recommind.com

2 Three Key Challenges in IR …
Robustness: insensitivity of search results to variations of the query.
Structure & topicality: extracting relevant concepts or topics and using them to improve accuracy and to structure search results.
Integration: combining statistical methods with prior/expert/linguistic knowledge and with different cues (terms, links, credibility of source, …).
Where do language models come in? Are these problems related?

3 Concept / Topic-Based View
Concept-specific language model. What is a concept?
– A (sparse) distribution over terms in the vocabulary.
– Probabilities: how likely is it that a term expresses a certain concept?
– Concept = hidden, term = observed.
Document-specific "concept" model: a concept-based document representation (and, analogously, a concept-based user representation).
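
To make the slide's definition concrete, here is a minimal, purely illustrative sketch in Python: the vocabulary, the two concepts, and all probabilities are invented toy values, not taken from the talk. It shows a concept as a sparse distribution over terms and a document as a mixture over concepts.

    # Two toy concepts as sparse term distributions, P(term | concept).
    p_w_given_z = {
        "finance":   {"bank": 0.5, "loan": 0.5},
        "geography": {"bank": 0.3, "river": 0.4, "water": 0.3},
    }
    # The document's concept-based representation, P(concept | doc).
    p_z_given_d = {"finance": 0.8, "geography": 0.2}

    # Document-specific term model: P(w|d) = sum over concepts of P(w|z) * P(z|d).
    vocab = ["bank", "river", "loan", "water"]
    p_w_given_d = {
        w: sum(p_z_given_d[z] * p_w_given_z[z].get(w, 0.0) for z in p_z_given_d)
        for w in vocab
    }
    print(p_w_given_d)   # bank ≈ 0.46, loan ≈ 0.40, river ≈ 0.08, water ≈ 0.06

The hidden/observed distinction in the slide corresponds directly to the two factors: the concept weights P(z|d) are latent, while the terms w are what we actually observe in the document.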

4 From Concepts to Language Models
Putting both ingredients together: a concept-based language model.
Semantic Language Model:
– Unsupervised learning: Probabilistic Latent Semantic Analysis (pLSA, SIGIR'99).
– Qualitative pre-structuring of concepts based on thesauri, synsets, categories, topics, etc.
– Quantitative model obtained by statistical estimation.
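
pLSA parameters are typically fit with the EM algorithm. The following is a minimal sketch of that estimation step, assuming a dense document-term count matrix in NumPy; the function name, iteration count, and initialisation scheme are illustrative choices, not details from the original slides.

    import numpy as np

    def plsa(counts, n_topics, n_iter=50, seed=0):
        # EM for pLSA on a document-term count matrix `counts` (docs x terms).
        rng = np.random.default_rng(seed)
        n_docs, n_terms = counts.shape
        # Random, normalised initialisation of P(w|z) and P(z|d).
        p_w_z = rng.random((n_topics, n_terms))
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_z_d = rng.random((n_docs, n_topics))
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
        for _ in range(n_iter):
            # E-step: posterior P(z|d,w) ~ P(z|d) P(w|z), shape (docs, terms, topics).
            joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
            post = joint / (joint.sum(axis=2, keepdims=True) + 1e-12)
            # M-step: re-estimate parameters from expected counts n(d,w) * P(z|d,w).
            expected = counts[:, :, None] * post
            p_w_z = expected.sum(axis=0).T
            p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
            p_z_d = expected.sum(axis=1)
            p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
        return p_w_z, p_z_d

    # Tiny usage example on a 2-document, 4-term toy corpus.
    X = np.array([[3, 0, 2, 0],
                  [0, 4, 0, 1]])
    p_w_z, p_z_d = plsa(X, n_topics=2)

The pre-structuring mentioned on the slide would amount to constraining or initialising P(w|z) from external resources (thesauri, synsets, categories) rather than starting from purely random parameters.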

5 Why Semantic Language Models?
"Intelligent", domain-specific smoothing for document-specific unigram models.
Combines structure and numbers.
Linguistic resources can be integrated.
Category & topic information can be integrated.
User profiles can be integrated (combination with collaborative filtering).
Results for ambiguous queries can be structured:
– most relevant for short queries & heterogeneous domains (Web search [finally!]);
– other ways to intelligently interact with users.
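
One way to read the smoothing point above: the document's maximum-likelihood unigram model is interpolated with its pLSA concept mixture, so probability mass flows to topically related terms the document never mentions. This is a hedged sketch of that idea; the linear interpolation form and the weight `lam` are common assumptions, not necessarily the talk's exact formulation.

    import numpy as np

    def smoothed_unigram(counts_d, p_w_z, p_z_d, lam=0.7):
        # P(w|d) = lam * P_ML(w|d) + (1 - lam) * sum_z P(w|z) P(z|d)
        p_ml = counts_d / counts_d.sum()   # maximum-likelihood unigram for the document
        p_sem = p_z_d @ p_w_z              # concept-based ("semantic") term model
        return lam * p_ml + (1 - lam) * p_sem

Here `counts_d` is one row of the document-term matrix and `p_z_d` is that document's row of concept weights from the pLSA fit above.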

6 Conclusion
Using statistical estimation, language models allow us to enrich concept-based retrieval models with quantitative information.
Semantic smoothing yields improved language models.
Integration of various sources of evidence.
Richer models for interactive information access.

