Semantic (Language) Models: Robustness, Structure & Beyond Thomas Hofmann Department of Computer Science Brown University Chief Scientist.

Presentation transcript:

Semantic (Language) Models: Robustness, Structure & Beyond
Thomas Hofmann
Department of Computer Science, Brown University
Chief Scientist & Co-founder, RecomMind Inc.

2 Three Key Challenges in IR …
Robustness: insensitivity of search results to variations of the query.
Structure & topicality: extracting relevant concepts or topics and using them to improve accuracy and to structure search results (e.g.).
Integration: combining statistical methods with prior/expert/linguistic knowledge and with different cues (terms, links, credibility of source, …).
Where do language models come in? Are these problems related?

3 Concept / Topic-Based View
Concept-specific language model
What is a concept?
– A (sparse) distribution over terms in the vocabulary.
– Probabilities: how likely is it that a term will express a certain concept?
– Concept = hidden, term = observed.
Document-specific "concept" model
– Concept-based document representation
– (Concept-based user representation)
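
To make the two ingredients on this slide concrete, here is a minimal Python sketch that is not taken from the talk: the vocabulary, concept names, probabilities, and helper names (`is_distribution`, `concept_posterior`) are all invented for illustration. It stores each concept as a sparse distribution over terms and each document as a distribution over concepts, then applies Bayes' rule to ask which hidden concept most plausibly produced an observed term.

```python
# Toy illustration only: concepts, terms, and numbers are assumptions.

# Concept-specific language models P(term | concept). They are sparse:
# a term that is not listed has probability 0 under that concept.
concept_models = {
    "finance": {"bank": 0.5, "loan": 0.3, "interest": 0.2},
    "rivers": {"bank": 0.3, "water": 0.4, "shore": 0.3},
}

# Document-specific concept models P(concept | document).
doc_concepts = {
    "doc1": {"finance": 0.9, "rivers": 0.1},
    "doc2": {"finance": 0.2, "rivers": 0.8},
}


def is_distribution(dist, tol=1e-9):
    """Return True if all values are non-negative and sum to 1."""
    return all(p >= 0 for p in dist.values()) and abs(sum(dist.values()) - 1.0) < tol


def concept_posterior(term, doc_id):
    """P(concept | term, document) via Bayes' rule.

    The concept is hidden and the term is observed, so we weigh each
    concept's term probability by how prominent the concept is in the document.
    """
    joint = {
        z: doc_concepts[doc_id][z] * concept_models[z].get(term, 0.0)
        for z in concept_models
    }
    total = sum(joint.values())
    if total == 0:
        return joint  # the term cannot be generated by any concept
    return {z: p / total for z, p in joint.items()}


if __name__ == "__main__":
    # The ambiguous term "bank" is read differently depending on the document.
    print(concept_posterior("bank", "doc1"))
    print(concept_posterior("bank", "doc2"))
```

In this toy setup the same ambiguous term leans towards "finance" in one document and towards "rivers" in the other, which is the sense in which concepts are hidden and terms are observed.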

4 From Concepts to Language Models
Putting both ingredients together: concept-based language model
Semantic Language Model:
– Unsupervised learning: Probabilistic Latent Semantic Analysis (pLSA, SIGIR'99)
– Qualitative pre-structuring of concepts based on thesauri, synsets, categories, topics, etc.
– Quantitative model by use of statistical estimation!
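
The slide's formula did not survive in the transcript. In pLSA the concept-based language model is the mixture P(w|d) = sum over z of P(w|z) P(z|d), with both factors estimated from term counts. The following pure-Python sketch fits that mixture with plain EM on a corpus of term counts. It is an illustration under simplifying assumptions, not the talk's implementation: it omits the tempering used in the original pLSA work, and the function names (`fit_plsa`, `term_prob`, `normalized`) and the random initialisation are choices made here.

```python
import random


def normalized(values):
    """Scale a list of non-negative numbers to sum to 1 (all-zero input stays zero)."""
    total = sum(values) or 1.0
    return [v / total for v in values]


def fit_plsa(counts, n_concepts, n_iter=50, seed=0):
    """Fit P(w|z) and P(z|d) by plain EM (no tempering).

    counts[d][w] is the number of times term w occurs in document d.
    Returns (p_w_z, p_z_d) with p_w_z[z][w] = P(w|z) and p_z_d[d][z] = P(z|d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in counts for w in doc})

    # Random positive initialisation, normalised to proper distributions.
    p_w_z = [
        dict(zip(vocab, normalized([rng.random() + 0.1 for _ in vocab])))
        for _ in range(n_concepts)
    ]
    p_z_d = [
        normalized([rng.random() + 0.1 for _ in range(n_concepts)]) for _ in counts
    ]

    for _ in range(n_iter):
        exp_w_z = [dict.fromkeys(vocab, 0.0) for _ in range(n_concepts)]
        exp_z_d = [[0.0] * n_concepts for _ in counts]
        for d, doc in enumerate(counts):
            for w, n in doc.items():
                # E-step: P(z | d, w) is proportional to P(w|z) * P(z|d).
                post = normalized(
                    [p_w_z[z][w] * p_z_d[d][z] for z in range(n_concepts)]
                )
                # Accumulate expected counts for the M-step.
                for z, p in enumerate(post):
                    exp_w_z[z][w] += n * p
                    exp_z_d[d][z] += n * p
        # M-step: renormalise the expected counts.
        p_w_z = [
            dict(zip(vocab, normalized([exp_w_z[z][w] for w in vocab])))
            for z in range(n_concepts)
        ]
        p_z_d = [normalized(row) for row in exp_z_d]
    return p_w_z, p_z_d


def term_prob(w, d, p_w_z, p_z_d):
    """Concept-based language model: P(w|d) = sum_z P(w|z) * P(z|d)."""
    return sum(p_w_z[z].get(w, 0.0) * p_z_d[d][z] for z in range(len(p_w_z)))
```

A call such as `fit_plsa([{"bank": 2, "loan": 1}, {"river": 2, "shore": 1}], n_concepts=2)` returns the two sets of distributions; `term_prob` then gives every vocabulary term a probability under each document, including terms the document itself never contains.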

5 Why Semantic Language Models?
"Intelligent", domain-specific smoothing for document-specific unigram models
Combines structure and numbers
Linguistic resources can be integrated
Category & topic information can be integrated
User profiles can be integrated (combination with collaborative filtering)
Results for ambiguous queries can be structured
– most relevant for short queries & heterogeneous domains (Web Search [finally!])
– other ways to intelligently interact with users.
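
One way to read the first point is as an interpolation between a document's maximum-likelihood unigram model and its concept-based model. The sketch below assumes that reading; the slide does not give a formula, so the linear mixture, the weight `lam`, the toy concepts and documents, and all function names are assumptions made for illustration. Documents are ranked by the log-likelihood of the query under the smoothed model.

```python
import math

# Hypothetical concept-specific language models P(w|z).
p_w_z = {
    "finance": {"bank": 0.4, "loan": 0.3, "credit": 0.3},
    "rivers": {"bank": 0.3, "water": 0.4, "shore": 0.3},
}

# Hypothetical documents: raw term counts plus a concept mixture P(z|d).
docs = {
    "doc_a": {"counts": {"bank": 2, "loan": 1}, "p_z_d": {"finance": 0.9, "rivers": 0.1}},
    "doc_b": {"counts": {"bank": 2, "water": 1}, "p_z_d": {"finance": 0.1, "rivers": 0.9}},
}


def ml_prob(w, counts):
    """Maximum-likelihood unigram estimate: relative frequency in the document."""
    return counts.get(w, 0) / sum(counts.values())


def semantic_prob(w, p_z_d):
    """Concept-based estimate: sum_z P(w|z) * P(z|d)."""
    return sum(weight * p_w_z[z].get(w, 0.0) for z, weight in p_z_d.items())


def smoothed_prob(w, doc, lam=0.5):
    """Assumed interpolation: lam * P_ml(w|d) + (1 - lam) * P_sem(w|d)."""
    return lam * ml_prob(w, doc["counts"]) + (1 - lam) * semantic_prob(w, doc["p_z_d"])


def query_log_likelihood(query_terms, doc, lam=0.5):
    """Log-probability of the query under the smoothed document model."""
    score = 0.0
    for w in query_terms:
        p = smoothed_prob(w, doc, lam)
        if p == 0.0:
            return float("-inf")  # the model cannot generate this term
        score += math.log(p)
    return score


def rank(query_terms, lam=0.5):
    """Document ids sorted by decreasing query log-likelihood."""
    return sorted(
        docs,
        key=lambda d: query_log_likelihood(query_terms, docs[d], lam),
        reverse=True,
    )


if __name__ == "__main__":
    print(rank(["credit"]))
```

Neither toy document contains the term "credit", so the unsmoothed unigram model gives both a probability of zero. With the concept-based component, the finance-heavy document receives the higher score, which is the kind of robustness to query wording the slide has in mind.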

6 Conclusion
Using statistical estimation, language models allow us to enrich concept-based retrieval models with quantitative information.
Semantic smoothing for improved language models.
Integration of various sources of evidence.
Richer models for interactive information access (they make sense).