How Do We Find Information?

Key Questions
- What are we looking for?
- How do we find it?
- Why is it difficult?

“A prudent question is one-half of wisdom” (Francis Bacon)

Search Engines

What are we looking for?
We are:
- Looking for X.
  - Q&A: population of China
  - Known-item Search: “Catcher in the Rye”
- Looking for something like/about X.
  - General/background info: Taliban
  - Collection Development: IR Literature
  - Similar to (known) X: like “Catcher in the Rye”
  - WhatyoumacallX: “the rye-boy story”
- Looking for something.
  - Problem Resolution: how can we fight terrorism?
  - Knowledge Development: what is IR?
- Looking.
  - Need something, but don’t know what: what’s it all about?
  - Serendipity: Web surfing

How do we find it?
- Brute force search
  - Easy to build, maintain, and use
  - Searcher does all the work; Hard to get satisfaction
- Organize/structure the data (Information Organization)
  - Intuitive to use
  - Hard to build and maintain
  - Knowledge of builder’s language & organization structure is crucial
- Use a search tool (Information Retrieval)
  - Easier to build and maintain: Less manipulation of data
  - Sometimes works, sometimes not (Helps to know the language of the data)
- Ask the experts (Expert System)
  - Easy and satisfying to use (by definition)
  - “Expert” knowledge is transitory, hard to encapsulate
- Go with the crowd (User Ratings > Recommender System > PageRank)
  - Relatively easy to build and maintain
  - Limited utility: doesn’t work with “unpopular” X
- Zen-Fusion search.
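The brute-force option can be illustrated with a minimal sketch, assuming a tiny in-memory collection. The function name `brute_force_search` and the sample documents are hypothetical and are not taken from the slides. Every document is scanned for every query, which is why this approach is easy to build but leaves most of the work to the searcher.

```python
def brute_force_search(documents, query):
    """Return ids of documents that contain every query word (case-insensitive).

    No index is built: each call scans the full text of every document.
    """
    words = query.lower().split()
    hits = []
    for doc_id, text in documents.items():
        tokens = set(text.lower().split())
        if all(word in tokens for word in words):
            hits.append(doc_id)
    return hits


# Hypothetical sample collection, for illustration only.
documents = {
    "a": "information retrieval creates a searchable index",
    "b": "information organization adds structure and annotation",
    "c": "data mining discovers knowledge",
}

results = brute_force_search(documents, "information index")
```

The cost of each query grows with the total size of the collection, which is the motivation for organizing the data or building a searchable index first.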

Information Seeking Process: Dynamic, Interactive, Iterative

User
- What am I looking for? Identification of info. need
- How do I find it? Query formulation

Intermediary
- What are we looking for? Discovery of user’s information need; Query representation
- Where is it? Query-document matching

Information
- What is it? Collection; Classification
- How is it found? Data structure; Representation

IR vs. IO
- Information Organization: Add structure & annotation
- Information Retrieval: Create a searchable index
- Information Access: Retrieve information
- Data Mining: Discover Knowledge

Information Retrieval
- Raw Data
- Representation: indexing, term weighting
- Searchable Index
- Query Formulation: “What is IR?”
- Search Results: (ranked) document list

Raw Data:
  D1: wd1 wd2 wd3
  D2: wd2 wd4 wd2 wd3
  D3: wd1 wd4

Searchable Index (term counts per document):
        D1  D2  D3
  wd1    1   0   1
  wd2    1   2   0
  wd3    1   1   0
  wd4    0   1   1

Search Results (ranked):
  1. D2
  2. D1
  3. D3
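The indexing and ranking steps on this slide can be sketched as follows. The three documents are the ones shown on the slide; the function names and the query terms (`wd2`, `wd3`) are assumptions, since the slide does not say which query produced its ranking. Scoring by summed term frequency is also an assumption, chosen because it is the simplest form of term weighting.

```python
from collections import Counter

# Raw data as shown on the slide.
docs = {
    "D1": ["wd1", "wd2", "wd3"],
    "D2": ["wd2", "wd4", "wd2", "wd3"],
    "D3": ["wd1", "wd4"],
}


def build_index(docs):
    """Build a term -> {doc_id: term frequency} index from tokenized documents."""
    index = {}
    for doc_id, words in docs.items():
        for term, tf in Counter(words).items():
            index.setdefault(term, {})[doc_id] = tf
    return index


def rank(index, query_terms, doc_ids):
    """Score each document by the summed frequency of the query terms it contains."""
    scores = {doc_id: 0 for doc_id in doc_ids}
    for term in query_terms:
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    # Highest score first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


index = build_index(docs)
# Hypothetical query terms; the slide does not state them.
ranking = rank(index, ["wd2", "wd3"], docs)
```

With this assumed query the sketch orders the documents D2, D1, D3, the same order as the slide's ranked list, though other queries and weighting schemes could give that order as well.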

Information Organization
- Raw Data
- Representation: NLP & Machine Learning
- Organized Data
- Query Formulation: “What is IR?”
- Search Results: document groups

Natural Language Processing (NLP)
- Research Area, technique, tool for
  - Knowledge Discovery, Data Mining
- Lexical Analysis using
  - Part-of-Speech (POS) tagging
  - Sentence Parsing

Machine Learning
- Research Area, technique, tool for
  - Information Organization, Knowledge Discovery, Data Mining
- Information Organization via
  - Supervised Learning (Automatic Classification)
  - Unsupervised Learning (Clustering)

[Figure: two panels, Classification and Clustering, each showing Class 1 and Class 2]

IO for IR
- Clustering
  - Document Clustering
    - Cluster Hypothesis: Documents having similar contents tend to be relevant to the same query
    - Rank clusters by Query-Cluster Similarity: Cluster documents based on vector similarity
    - Post-retrieval clustering: Scatter-Gather
  - Keyword Clustering
    - Automatic Thesaurus Construction: Query Expansion
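Ranking clusters by query-cluster similarity can be sketched as below. This is a minimal illustration under stated assumptions: the document vectors reuse the term counts from the Information Retrieval slide (term order wd1 to wd4), while the cluster assignments, the query vector, the use of cosine similarity, and the use of centroids are all assumptions, since the slides only say "vector similarity".

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def rank_clusters(clusters, doc_vectors, query):
    """Order cluster names by cosine similarity between the query and each centroid."""
    scored = {
        name: cosine(query, centroid([doc_vectors[d] for d in members]))
        for name, members in clusters.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Term order: wd1, wd2, wd3, wd4.
doc_vectors = {
    "D1": [1, 1, 1, 0],
    "D2": [0, 2, 1, 1],
    "D3": [1, 0, 0, 1],
}
clusters = {"A": ["D1", "D2"], "B": ["D3"]}  # assumed grouping, for illustration
query = [0, 1, 1, 0]                         # assumed query on wd2 and wd3

cluster_ranking = rank_clusters(clusters, doc_vectors, query)
```

Under the cluster hypothesis, the documents in the top-ranked cluster would be presented first, since they are expected to be relevant to the same query.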

IO for IR
- Classification
  - Document Categorization
    - classify documents into manually defined categories: supports hierarchical browsing, query expansion via relevance feedback
  - Document Indexing
    - assign keywords to documents: automatic indexing with controlled vocabulary, metadata generation
  - Document Filtering
    - e.g. news delivery, spam filtering
  - Query Classification
    - collection selection
    - algorithm selection
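Document filtering by supervised classification can be sketched with a small multinomial naive Bayes classifier. The slides do not name a specific algorithm, so naive Bayes is a swapped-in choice here, and the training examples, labels, and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Collect per-class document counts, per-class term counts, and the vocabulary."""
    class_docs = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        class_docs[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, term_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability (add-one smoothing)."""
    class_docs, term_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        score = math.log(class_docs[label] / total_docs)
        total_terms = sum(term_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            count = term_counts[label][token]
            score += math.log((count + 1) / (total_terms + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical manually labeled training data.
training = [
    ("win free prize now", "spam"),
    ("free offer click now", "spam"),
    ("lecture notes on information retrieval", "ham"),
    ("meeting agenda and lecture schedule", "ham"),
]
model = train(training)
label = classify("free prize offer", model)
```

The same pattern applies to the other tasks on the slide: replacing the spam/ham labels with manually defined subject categories gives document categorization, and classifying queries instead of documents gives collection or algorithm selection.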
