Challenges in Information Retrieval and Language Modeling
Michael Shepherd
Dalhousie University, Halifax, NS, Canada

Report of a Workshop
The following presentation is based on: James Allan et al., “Challenges in Information Retrieval and Language Modeling”, Report of a Workshop held at the Center for Intelligent Information Retrieval, University of Massachusetts Amherst, September.

Long-Term Challenges
LT Challenge 1: Global Information Access
– Satisfy human information needs through natural, efficient interaction with an automated system that leverages worldwide structured and unstructured data in any language
Need
– Massively distributed, multilingual retrieval systems
– Techniques from distributed retrieval, data fusion, and cross-lingual IR

Long-Term Challenges
LT Challenge 2: Contextual Retrieval
– Combine search technologies and knowledge about query and user context into a single framework in order to provide the most “appropriate” answer for a user's information needs
Need
– Context and query features to infer characteristics of the information need, such as query type, answer type, answer level, and task

Diagram: User Information Need, linked to Query, User Profile, Task, and Activity

Topics
1. Retrieval models
2. Cross-lingual information retrieval
3. Web search
4. User modeling
5. Filtering, topic detection & tracking, and classification
6. Summarization
7. Question answering
8. Metasearch and distributed retrieval
9. Multimedia retrieval
10. Information extraction
11. Testbeds

User Modeling
– Much research over the past several years has abstracted the user out of the retrieval problem
– But in recent years, the rate of improvement of IR systems has slowed
– One reason may be that generic IR systems are “good enough” for everyone but “never great” for anyone
– It is suggested that a greater focus on the user will enable major advances in IR

How Do We Get Info About the User?

A priori
– Ask the user
A posteriori
– Explicit: show the user a document and ask whether it was relevant
– Implicit: track what the user does, for example
  – Web logs
  – Time spent reading a page
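
As a minimal sketch of implicit feedback, the code below treats a page as relevant when the user's total reading time reaches a threshold. The log format, the 30-second threshold, and the name `implicit_judgments` are hypothetical assumptions for illustration; a real system would calibrate such a threshold per user and task.

```python
from collections import defaultdict

# Hypothetical threshold (seconds): total reading time at or above which a
# page counts as an implicit positive judgment. Illustrative only.
DWELL_THRESHOLD_SECONDS = 30.0


def implicit_judgments(log, threshold=DWELL_THRESHOLD_SECONDS):
    """Infer relevance from hypothetical (user, url, seconds_on_page) entries.

    Reading times for repeat visits to the same page are summed, and the
    page is judged relevant if the total meets the threshold.
    """
    total_time = defaultdict(float)
    for user, url, seconds in log:
        total_time[(user, url)] += seconds
    return {key: t >= threshold for key, t in total_time.items()}


if __name__ == "__main__":
    web_log = [
        ("u1", "/news/election", 45.0),
        ("u1", "/sports/scores", 4.0),
        ("u1", "/sports/scores", 6.0),
        ("u1", "/tech/search", 20.0),
        ("u1", "/tech/search", 15.0),
    ]
    for (user, url), relevant in implicit_judgments(web_log).items():
        print(user, url, "relevant" if relevant else "not relevant")
```

Summing repeat visits means two short visits (20 s and 15 s) can together count as interest, while a page skimmed twice for a few seconds does not.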

How Do We Model the User?

IR technique
– A vector of terms or features, supplied by the user or drawn from documents deemed relevant to the user
– May be static or adaptive
Machine learning technique
– An adaptive technique, such as a neural net, that “learns” the user's preferences
– Feature-set selection is important
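
A minimal sketch of the IR technique, assuming bag-of-words term-frequency vectors: the profile starts from terms the user supplies (static) and adapts by blending in terms from documents judged relevant. The exponentially weighted update, the rate `alpha`, and the helper names are hypothetical choices for illustration, not the specific method of the workshop report.

```python
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace tokens)."""
    return Counter(text.lower().split())


def normalize(vec):
    """Scale weights so they sum to 1 (an empty vector stays empty)."""
    total = sum(vec.values())
    return {t: w / total for t, w in vec.items()} if total else {}


def update_profile(profile, relevant_doc, alpha=0.2):
    """Adaptive step (hypothetical rate alpha): keep (1 - alpha) of the old
    profile and add alpha of the relevant document's normalized vector."""
    doc = normalize(term_vector(relevant_doc))
    terms = set(profile) | set(doc)
    return {t: (1 - alpha) * profile.get(t, 0.0) + alpha * doc.get(t, 0.0)
            for t in terms}


if __name__ == "__main__":
    # Static start: terms supplied by the user.
    profile = normalize(term_vector("retrieval evaluation"))
    # Adaptive step: a document the user judged relevant.
    profile = update_profile(profile, "retrieval models for news filtering")
    for term, weight in sorted(profile.items(), key=lambda kv: -kv[1]):
        print(f"{term:12s} {weight:.3f}")
```

Because both inputs sum to 1, the updated profile also sums to 1; a smaller `alpha` makes the profile change more slowly.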

User Model as Filter
Diagram components: information need, query representation, document representation, matching algorithm, results, and the user model acting as a filter
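
The filter arrangement can be sketched as a two-stage pipeline, assuming cosine similarity over term-frequency vectors: the query is matched against the documents as usual, and the user model then removes results that do not resemble the profile. The threshold value, the sample data, and the function names are illustrative assumptions.

```python
import math
from collections import Counter


def vec(text):
    """Term-frequency vector from lowercased whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search_then_filter(query, docs, profile, profile_threshold=0.1):
    """Stage 1: rank documents by cosine similarity to the query.
    Stage 2: the user model drops results too dissimilar to the profile.
    profile_threshold is a hypothetical tuning parameter."""
    q = vec(query)
    results = [(cosine(q, vec(d)), d) for d in docs]
    results = [(s, d) for s, d in results if s > 0]  # keep matches only
    results.sort(key=lambda sd: sd[0], reverse=True)
    return [d for _, d in results if cosine(profile, vec(d)) >= profile_threshold]


if __name__ == "__main__":
    docs = ["python snake habitat", "python programming tutorial",
            "java programming tutorial"]
    profile = vec("programming software code")
    print(search_then_filter("python", docs, profile))
```

Here the query "python" matches the first two documents equally, but the profile keeps only the programming one.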

User Model as Query
Diagram components: information need, the user model (in place of a query representation), document representation, matching algorithm, and results
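
When the user model itself plays the role of the query, no explicit query is needed: documents are ranked directly against the profile, as in a news-filtering service. A minimal sketch, again assuming cosine similarity over term-frequency vectors; the names `rank_by_profile`, `k`, and the sample data are hypothetical.

```python
import math
from collections import Counter


def vec(text):
    """Term-frequency vector from lowercased whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_profile(profile, docs, k=3):
    """The profile vector stands in for the query: score every document
    against it and return up to k documents with nonzero similarity."""
    scored = [(cosine(profile, vec(d)), d) for d in docs]
    scored.sort(key=lambda sd: sd[0], reverse=True)
    return [d for s, d in scored[:k] if s > 0]


if __name__ == "__main__":
    profile = vec("election polls candidates")
    incoming = ["election results and polls", "weather forecast sunny",
                "candidates debate tonight"]
    print(rank_by_profile(profile, incoming))
```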

Integrating the User Model and the Query
Diagram: Query + User Profile → Modified Query (moving the query within the document space)

Integrating the User Model and the Query
Diagram: the document space, showing the profile p, the query q, and the modified query q'

Integrating the User Profile and the Query
Diagram: the document space, showing the profile p and the query q
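
The slides do not give a formula for moving the query, so the following assumes a Rocchio-style linear combination of the query vector q and the profile vector p, q' = αq + βp. The weights α = 1.0 and β = 0.5 and the function name `modify_query` are arbitrary illustrative choices.

```python
from collections import Counter


def vec(text):
    """Term-frequency vector from lowercased whitespace tokens."""
    return Counter(text.lower().split())


def modify_query(q, p, alpha=1.0, beta=0.5):
    """Assumed Rocchio-style move: q' = alpha * q + beta * p (sparse dicts)."""
    terms = set(q) | set(p)
    return {t: alpha * q.get(t, 0) + beta * p.get(t, 0) for t in terms}


if __name__ == "__main__":
    q = vec("jaguar speed")          # ambiguous query
    p = vec("cars engines speed")    # hypothetical user profile
    print(sorted(modify_query(q, p).items()))
```

With β > 0, q' is pulled toward p in the document space, so an ambiguous term such as "jaguar" is matched in the context of the user's interest in cars; β = 0 leaves the query unchanged.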

Short-Term/Long-Term Interests
– Users' interests change over time
– A user may have short-term interests, but we do not want these to skew the model away from their long-term interests
– Our particular focus is electronic news

Single Task/Multiple Tasks
– Most user models are built for a specific task, such as filtering news items for certain types of news
– Most people multitask, so we currently run a separate user model for each task for the same user
– Ideally, a single model would serve multiple tasks

Filtering, Topic Detection & Tracking, and Classification
– Some of these technologies have been widely adopted
– These topics are grouped together because they are similar technologies used in similar applications

Routing of E-mail and Phone Messages for Customer Relationship Management
Diagram: Message → Message Routing System → Service Department, New Accounts, or Customer Complaints
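
Routing can be sketched as nearest-centroid classification, assuming each destination is described by a few example messages: a new message goes to the department whose centroid is most similar. The department names come from the slide; the training messages and the choice of a nearest-centroid classifier are illustrative assumptions.

```python
import math
from collections import Counter


def vec(text):
    """Term-frequency vector from lowercased whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(texts):
    """Mean term-frequency vector of a list of example messages."""
    total = Counter()
    for t in texts:
        total.update(vec(t))
    return {term: c / len(texts) for term, c in total.items()}


def route(message, centroids):
    """Return the destination whose centroid is most similar to the message."""
    m = vec(message)
    return max(centroids, key=lambda dest: cosine(m, centroids[dest]))


# Hypothetical training examples for each destination on the slide.
TRAINING = {
    "Service Department": ["my router stopped working",
                           "repair request for broken phone line"],
    "New Accounts": ["open a new account", "sign up for new service plan"],
    "Customer Complaints": ["complaint about rude staff",
                            "unhappy with billing complaint"],
}
CENTROIDS = {dest: centroid(msgs) for dest, msgs in TRAINING.items()}

if __name__ == "__main__":
    print(route("i want to open an account", CENTROIDS))
```

The same sketch applies to the trouble-ticket example that follows, with trouble categories in place of departments.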

Categorization of Trouble Tickets
Diagram: Trouble Ticket → Ticket Routing System → Trouble Category 1, 2, or 3

Topic Detection
Diagram: News Item → News Item Routing System → Topic 1, Topic 2, Topic 3, or New Topic
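
A common way to decide between an existing topic and a new one is single-pass clustering: compare each incoming news item with the existing topic centroids and start a new topic when none is similar enough. The sketch below assumes cosine similarity over term-frequency vectors and a hypothetical threshold of 0.2; it is not presented as the specific system behind the slide.

```python
import math
from collections import Counter


def vec(text):
    """Term-frequency vector from lowercased whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def detect_topics(items, threshold=0.2):
    """Single-pass clustering: assign each item to the most similar topic,
    or open a new topic if no similarity reaches the threshold."""
    topics, labels = [], []
    for text in items:
        v = vec(text)
        best, best_sim = None, 0.0
        for i, topic in enumerate(topics):
            sim = cosine(v, topic["centroid"])
            if sim > best_sim:
                best, best_sim = i, sim
        if best is None or best_sim < threshold:
            topics.append({"centroid": Counter(v), "items": [text]})
            labels.append(len(topics) - 1)          # new topic
        else:
            # Cosine ignores scale, so a running sum points the same way
            # as the mean centroid.
            topics[best]["centroid"].update(v)
            topics[best]["items"].append(text)
            labels.append(best)
    return labels, topics


if __name__ == "__main__":
    stream = ["earthquake hits city", "earthquake death toll rises",
              "stock market falls", "market rebounds after falls"]
    labels, topics = detect_topics(stream)
    print(labels)
```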

Topic Tracking
Diagram: a topic and a sub-topic

Topic Tracking: example timeline
– Nov ’02: WMD in Iraq
– Mar ’03: Invasion of Iraq to locate WMD
– Jan ’04: Cannot find WMD
– Sept ’04: Bush and Kerry debate reasons for invading Iraq
– Nov ’04: Election Day in USA

Summarization
– Text summarization is an active field of research in both IR and Natural Language Processing (NLP)
– NLP is required for high-quality summarization
– IR summarization can provide efficient access to large repositories of data
– IR summarization shares some basic techniques with indexing, as both are concerned with identifying what a document is “about”

Summarization
A summary can consist of:
– A set of keywords or noun phrases
– A set of sentences containing “important” terms
A summary can be about:
– A single document (though not generally)
– A set of documents
– A web site

Summarization
– Each document is represented as a vector, and tf.idf weights are used to determine the best terms
– Cluster the documents, create the centroids, and determine the best terms
– Sentences are weighted by the terms they contain and those terms' tf.idf weights
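
The term-selection and sentence-weighting steps can be sketched as follows, assuming raw term frequency times log(N / df) computed over a small collection, with each sentence scored by the sum of its terms' tf.idf weights. Splitting sentences on periods, the punctuation stripping, and the function names are simplifying assumptions; the clustering/centroid variant is not shown.

```python
import math
from collections import Counter


def tokens(text):
    """Lowercased word tokens with surrounding punctuation stripped."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in words if w]


def idf_table(collection):
    """idf(t) = log(N / df(t)) over the document collection."""
    n = len(collection)
    df = Counter(t for d in collection for t in set(tokens(d)))
    return {t: math.log(n / c) for t, c in df.items()}


def summarize(doc, collection, n_terms=3, n_sentences=1):
    """Return the top tf.idf keywords and highest-weighted sentences."""
    idf = idf_table(collection)
    tf = Counter(tokens(doc))
    weights = {t: tf[t] * idf.get(t, 0.0) for t in tf}
    keywords = sorted(weights, key=weights.get, reverse=True)[:n_terms]
    sentences = [s.strip() for s in doc.split(".") if s.strip()]

    def score(sentence):
        return sum(weights.get(t, 0.0) for t in tokens(sentence))

    return keywords, sorted(sentences, key=score, reverse=True)[:n_sentences]


if __name__ == "__main__":
    docs = [
        "The storm hit the coast. The storm caused some flooding. Officials met today.",
        "Officials met to discuss the budget. The budget passed.",
        "The team won the game. Fans celebrated the win.",
    ]
    print(summarize(docs[0], docs))
```

Terms such as "the" that occur in every document get idf 0 and so contribute nothing, which is why tf.idf rather than raw frequency is used to find what a document is “about”.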

Metasearch and Distributed Retrieval
Retrieving and combining information from multiple sources:
– Data fusion: the combination of information from multiple sources that index an effectively common data set
– Collection fusion, or distributed retrieval: the combination of information from multiple sources that index effectively disjoint data sets
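
For data fusion over a common data set, one widely used family of methods combines each document's normalized scores from the individual systems. CombSUM (sum of scores) and CombMNZ (that sum multiplied by the number of systems that retrieved the document) are standard examples, swapped in here for illustration. This minimal sketch assumes min-max score normalization and hypothetical run data.

```python
def min_max(run):
    """Normalize one system's {doc: score} run to the range [0, 1]."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {d: 1.0 for d in run}
    return {d: (s - lo) / (hi - lo) for d, s in run.items()}


def comb_fusion(runs, mnz=False):
    """CombSUM (mnz=False) or CombMNZ (mnz=True) over several runs.

    A document counts as retrieved by a system if it appears in that run.
    """
    totals, hits = {}, {}
    for run in runs:
        for d, s in min_max(run).items():
            totals[d] = totals.get(d, 0.0) + s
            hits[d] = hits.get(d, 0) + 1
    fused = {d: totals[d] * (hits[d] if mnz else 1) for d in totals}
    return sorted(fused, key=fused.get, reverse=True)


if __name__ == "__main__":
    runs = [{"d1": 10, "d2": 8, "d3": 0},   # hypothetical system A scores
            {"d2": 5, "d3": 10, "d4": 0}]   # hypothetical system B scores
    print("CombSUM:", comb_fusion(runs))
    print("CombMNZ:", comb_fusion(runs, mnz=True))
```

CombMNZ rewards agreement: d3 ranks third under CombSUM but second under CombMNZ because both systems retrieved it.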

Issues for Metasearch and Distributed Retrieval
– Resource description
– Resource ranking
– Resource selection
– Searching
– Merging of results
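
Resource description, ranking, and selection can be sketched together, assuming each collection is described only by its size and per-term document frequencies. The score used here (the summed fraction of a collection's documents containing each query term) is a deliberately simplified heuristic, not CORI, GlOSS, or any other published method; the collection names and statistics are hypothetical.

```python
def rank_resources(query, descriptions, k=2):
    """Rank collections for a query and select the top k with nonzero score.

    descriptions: {name: {"size": int, "df": {term: doc_frequency}}}
    """
    terms = query.lower().split()
    scores = {}
    for name, desc in descriptions.items():
        size, df = desc["size"], desc["df"]
        scores[name] = sum(df.get(t, 0) / size for t in terms)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [name for name in ranked[:k] if scores[name] > 0]


# Hypothetical resource descriptions.
DESCRIPTIONS = {
    "medline": {"size": 1000, "df": {"cancer": 300, "gene": 200}},
    "news": {"size": 5000, "df": {"cancer": 50, "election": 900}},
    "patents": {"size": 2000, "df": {"gene": 400}},
}

if __name__ == "__main__":
    print(rank_resources("cancer gene", DESCRIPTIONS))
```

Only the selected collections would then be searched, and their results merged.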

Major Issue: Semantic Interoperability
Affects resource description, resource ranking, resource selection, searching, and merging of results

Summary
– IR is no longer the domain of the “specialist”: everyone gets to play
– We are drowning in information
– Next-generation IR tools must be dramatically better than what we have
– The IR field must rethink its basic assumptions and evaluation methodologies, because the ones that brought us to today's level of success will not be sufficient to reach the next level

Long-Term Challenges
– Global Information Access
– Contextual Retrieval