INF3854 / CS395T (Fall 2013): Concepts of Information Retrieval (& Web Search) Instructor: Matt Lease

Slides:



Advertisements
Similar presentations
Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Advertisements

UCLA : GSE&IS : Department of Information StudiesJF : 276lec1.ppt : 5/2/2015 : 1 I N F S I N F O R M A T I O N R E T R I E V A L S Y S T E M S Week.
1 Learning User Interaction Models for Predicting Web Search Result Preferences Eugene Agichtein Eric Brill Susan Dumais Robert Ragno Microsoft Research.
1 DAFFODIL Effective Support for Using Digital Libraries Norbert Fuhr University of Duisburg-Essen, Germany.
Information Retrieval in Practice
Search Engines and Information Retrieval
Basic IR: Queries Query is statement of user’s information need. Index is designed to map queries to likely to be relevant documents. Query type, content,
Intelligent Information Retrieval CS 336 Lisa Ballesteros Spring 2006.
Information Retrieval in Practice
SIMS 202 Information Organization and Retrieval Prof. Marti Hearst and Prof. Ray Larson UC Berkeley SIMS Tues/Thurs 9:30-11:00am Fall 2000.
Overview of Search Engines
Information Retrieval in Practice
Result presentation. Search Interface Input and output functionality – helping the user to formulate complex queries – presenting the results in an intelligent.
IR in a Nutshell: Applications, Research, and Challenges Session 1 Feb 21 st 2013 Tamer Elsayed.
CS598CXZ Course Summary ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada.
Lecture 1: Web Search Overview & Web Crawling
1. Search Engines Architecture Azreen Azman, PhD SMM 5891 All slides ©Addison Wesley, 2008.
Search Engines and Information Retrieval Chapter 1.
Multimedia Databases (MMDB)
Modern Information Retrieval Computer engineering department Fall 2005.
Chapter 2 Architecture of a Search Engine. Search Engine Architecture n A software architecture consists of software components, the interfaces provided.
Thanks to Bill Arms, Marti Hearst Documents. Last time Size of information –Continues to grow IR an old field, goes back to the ‘40s IR iterative process.
Internet Information Retrieval Sun Wu. Course Goal To learn the basic concepts and techniques of internet search engines –How to use and evaluate search.
1 Information Retrieval Acknowledgements: Dr Mounia Lalmas (QMW) Dr Joemon Jose (Glasgow)
Information in the Digital Environment Information Seeking Models Dr. Dania Bilal IS 530 Spring 2006.
Proposal for Term Project J. H. Wang Mar. 2, 2015.
Search Result Interface Hongning Wang Abstraction of search engine architecture User Ranker Indexer Doc Analyzer Index results Crawler Doc Representation.
Search Engines. Search Strategies Define the search topic(s) and break it down into its component parts What terms, words or phrases do you use to describe.
Search Engine Architecture
GUIDED BY DR. A. J. AGRAWAL Search Engine By Chetan R. Rathod.
IT-522: Web Databases And Information Retrieval By Dr. Syed Noman Hasany.
WIRED Week 3 Syllabus Update (next week) Readings Overview - Quick Review of Last Week’s IR Models (if time) - Evaluating IR Systems - Understanding Queries.
Information in the Digital Environment Information Seeking Models Dr. Dania Bilal IS 530 Spring 2005.
Data Integration Hanna Zhong Department of Computer Science University of Illinois, Urbana-Champaign 11/12/2009.
Introduction to Information Retrieval Example of information need in the context of the world wide web: “Find all documents containing information on computer.
Search Result Interface Hongning Wang Abstraction of search engine architecture User Ranker Indexer Doc Analyzer Index results Crawler Doc Representation.
1 Information Retrieval LECTURE 1 : Introduction.
Information Retrieval
Pengantar Temu Balik Informasi Dosen: Hapnes Toba Ruang: R. Dosen Lt. 8 GWM.
Toward Semantic Search: RDFa based facet browser Jin Guang Zheng Tetherless World Constellation.
IR. SI 650/EECS 549 Information Retrieval People search the Web daily Search engines –Google –Bing –Baidu –Yandex Information Retrieval is about search.
Relevance Models and Answer Granularity for Question Answering W. Bruce Croft and James Allan CIIR University of Massachusetts, Amherst.
1 CS 430: Information Discovery Lecture 14 Usability I.
Web Information Retrieval Textbook by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schutze Notes Revised by X. Meng for SEU May 2014.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
Rensselaer Polytechnic Institute CSCI-4220 – Network Programming David Goldschmidt, Ph.D. from Search Engines: Information Retrieval in Practice, 1st edition.
Text Information Management ChengXiang Zhai, Tao Tao, Xuehua Shen, Hui Fang, Azadeh Shakery, Jing Jiang.
R.G. Bias | | Name that tune. Song title? Performer(s)? 1.
Reading literacy. Definition of reading literacy: “Reading literacy is understanding, using and reflecting on written texts, in order to achieve one’s.
Trends in NL Analysis Jim Critz University of New York in Prague EurOpen.CZ 12 December 2008.
Information Retrieval in Practice
Information Retrieval in Practice
Visual Information Retrieval
Designing Cross-Language Information Retrieval System using various Techniques of Query Expansion and Indexing for Improved Performance  Hello everyone,
Search Engine Architecture
Information Retrieval (in Practice)
Modern Information Retrieval
Proposal for Term Project
Evaluation of IR Systems
Information Retrieval and Web Search
Search Engine Architecture
Datamining : Refers to extracting or mining knowledge from large amounts of data Applications : Market Analysis Fraud Detection Customer Retention Production.
Information Retrieval and Web Search
CSE 635 Multimedia Information Retrieval
Introduction to Information Retrieval
Lecture 8 Information Retrieval Introduction
Search Engine Architecture
ADVANCED TOPICS IN INFORMATION RETRIEVAL AND WEB SEARCH
ADVANCED TOPICS IN INFORMATION RETRIEVAL AND WEB SEARCH
Presentation transcript:

INF3854 / CS395T (Fall 2013): Concepts of Information Retrieval (& Web Search) Instructor: Matt Lease

Some IR Tasks Ad-hoc search –Find relevant documents for an arbitrary text query Filtering –Identify relevant user profiles for a new document Classification –Identify relevant labels for documents Question answering –Give a specific answer to a question

Dimensions of IR ContentApplicationsTasks TextWeb searchAd hoc search ImagesVertical searchFiltering VideoEnterprise searchClassification Scanned docsDesktop searchQuestion answering AudioForum search MusicP2P search Literature search Some slides ©Addison Wesley, 2008

Verticals/content: news, sports, classifieds, … Format: text, images, audio, video –text: html/xml, text, , chat, transcribed, blog, … Repository/archive/collection –desktop/mobile, enterprise, Web Query: descriptive (textual/spoken), by example –Typically inexact (NOT ISBN, barcode, GUID, etc.) Typically both content & query are unstructured or only semi-structured (e.g., not database) Search/Retrieval Landscape

Relevance What is it? –Simplistic definition: A relevant document contains the information that a person was looking for when they submitted a query to the search engine –Many factors influence a person’s decision about what is relevant: e.g., task, context, novelty, style –Topical relevance vs. user relevance

Modeling Relevance Ranking algorithms used in search engines Ranking is typically statistical and based on its observable properties rather than underlying linguistic properties –i.e. counting simple text features such as words instead of inferring underlying linguistic syntax –However, both kinds of features / evidence can be incorporated into a statistical model

Evaluation Experimental procedures and measures for comparing system output to user expectations –Originated in Cranfield experiments in the 60s Experiments often use one or more pre-defined test collections of documents, queries, and relevance judgments Recall and precision are two examples of effectiveness measures

Users and Information Needs Search evaluation is user-centered Keyword queries are often poor descriptions of actual information needs Interaction and context are important for inferring user intent Query refinement techniques such as query expansion, query suggestion, relevance feedback improve ranking

IR and Search Engines Relevance -Effective ranking Evaluation -Testing and measuring Information needs -User interaction Performance -Efficient search and indexing Incorporating new data -Coverage and freshness Scalability -Growing with data and users Adaptability -Tuning for applications Specific problems -e.g. Spam Information Retrieval Search Engines

Web++ Search Today Search suggestions Query-biased summarization / snippet generation Sponsored search Search shortcuts Vertical search (news, blog, image)

Web++ Search Today II Vertical search (local) Spelling correction Personalized search / social ranking

Web++ Search Today III

Web++ Search Today IV

Indexing Process

Query Process

Who and Where?

Collaborative Search

User Search Engine Feedback Cycle Query formulation reflects an ongoing dialog between users and search engines Users formulate queries for the search engine, based on a mental model of what it “understands” Search engines optimize their “understanding” for the (most frequent) submitted queries Individual session and long term, personal and aggregate Result: query “language” is continually evolving “Handwriting recognition”

Verbosity and Complexity Complex information requires complex description –Information theory [Shannon’51] –Human discourse implicitly respects this [Grice’67] Simple searches easily expressed in keywords –navigation: “alaska airlines” –information: “american revolution” Verbosity naturally increases with complexity –More specific information needs [Phan et al.’07] –Iterative reformulation [Lau and Horvitz’99]

Query Disambiguation Given (typically terse like “apple”) query, infer possible underlying intents / needs / tasks With longer queries, detect key concepts and/or segment (e.g. “new york times square”)

User queries from Analysis by [Bendersky and Croft’09] Long / Verbose Web Queries Mean Reciprocal Rank rank clicked vs. query Length

Vertical Search Aka/related: federated / distributed / specialty Searching the “Deep” web One-size-fits-all vs. niche search –Query formulation, content, usability/presentation

Cross-Lingual IR 2/3 of the Web is in English About 50% of Web users do not use English as their primary language Many (maybe most) search applications have to deal with multiple languages –monolingual search: search in one language, but with many possible languages –cross-language search: search in multiple languages at the same time

Cross-Lingual IR Ideal Let user express query in native language Search information in multiple languages Translate results into user’s native language

Routing / Filtering Given standing query, analyze new information as it arrives –Input: all , RSS feed or listserv, … –Typically classification rather than ranking –Simple example: Ham vs. spam –Anomaly detection

Spoken Search Longer and more natural queries emerge given support for spoken input [Du and Crestiani’06] See also: studies by Nick Belkin

Location-based Search

Content-based music search

Spoken “Document” Retrieval

Other Visual Interfaces

Retrieving Information, not Documents

Entity Search

Question Answering & Focused Retrieval

Community QA

Expertise Search

Social Media

Blog Search

μ-Blog Search (e.g. Twitter)

What’s Happening / Trending?

Social Bookmarking/Tagging

News Tracking (Living Stories)

Memetracker

“Hyper-local” Search

e-Discovery

Book Search Find books or more focused results Detect / generate / link table of contents Classification: detect genre (e.g. for browsing) Detect related books, revised editions Challenges –Variable scan quality, OCR accuracy –Copyright –Monetary model

The Information’s Out There

Crowdsourcing

Mechanical Turk