SIMS 202 Final Review, Prof. Marti Hearst

Search and Retrieval: Outline of Part II of SIMS 202 (Marti Hearst, UCB SIMS 202)
- Overview: Finding Out About
- Standard Information Retrieval Models
- Evaluation of IR Systems
- IR Systems (Implementation Issues)
- Web-Specific Issues
- Search Strategies and Tactics
- User Interface Issues
- Search on Metadata
- Search on Hypertext

Human Aspects
- Finding Out About
  - types of information needs
  - specifying information needs (queries)
  - the process of information access
  - search strategies
  - “sensemaking”
- Relevance
- User Interface

Finding Out About (This discussion is drawn from Belew’s manuscript)
- Three phases:
  - Asking of a question
  - Construction of an answer
  - Assessment of the answer
- Part of an iterative process

Information Retrieval
- Revised task statement: Build a system that retrieves documents that users are likely to find relevant to their queries.
- This set of assumptions underlies the field of Information Retrieval.

[Figure: the information retrieval process. Labeled components: information need, query, text input, parse, pre-process, collections, index, rank.]
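
The parse and pre-process steps in this process turn both collection text and query text into index terms. A minimal sketch, assuming lowercasing, alphanumeric tokenization, and a small hypothetical stopword list (no stemming); real systems make different choices at each of these steps.

```python
import re

# Hypothetical stopword list, kept tiny for illustration.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


def preprocess(text: str) -> list[str]:
    """Parse raw text into lowercase alphanumeric tokens, dropping stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


if __name__ == "__main__":
    # The same function is applied to documents and to queries,
    # so that query terms and index terms can match.
    print(preprocess("The Process of Information Access"))  # ['process', 'information', 'access']
```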

Query Languages
- Express the user’s information need
- Components:
  - query language
  - program to interpret the language
  - document collection from which to retrieve documents that suit the interpreted query

Types of Query Languages
- Boolean
- Natural language (free style)
- Hybrid structured (metadata) and free text
- Form-based
- SQL (for database queries)

Boolean Queries
- How queries are satisfied
  - Boolean logic
  - meaning of AND, OR, NOT
  - De Morgan’s laws
  - precedence ordering
- Variations
  - faceted Boolean
  - proximity operators
  - phrases
  - filters/segments
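
The meaning of AND, OR, and NOT can be sketched as set operations over the documents that contain each term. This assumes a hypothetical four-document collection and hypothetical helper names; nesting the function calls makes grouping explicit instead of relying on any particular system's precedence ordering.

```python
# Toy collection (hypothetical data): document id -> set of index terms.
DOCS = {
    1: {"information", "retrieval", "boolean"},
    2: {"information", "visualization"},
    3: {"retrieval", "ranking"},
    4: {"database", "query"},
}
ALL_IDS = set(DOCS)


def postings(term: str) -> set[int]:
    """Ids of documents that contain the term."""
    return {doc_id for doc_id, terms in DOCS.items() if term in terms}


def b_and(a: set[int], b: set[int]) -> set[int]:
    return a & b  # intersection: documents in both sets


def b_or(a: set[int], b: set[int]) -> set[int]:
    return a | b  # union: documents in either set


def b_not(a: set[int]) -> set[int]:
    return ALL_IDS - a  # complement relative to the whole collection


if __name__ == "__main__":
    info, vis, ret = postings("information"), postings("visualization"), postings("retrieval")
    # information AND NOT visualization
    print(b_and(info, b_not(vis)))  # {1}
    # De Morgan: NOT (a OR b) == (NOT a) AND (NOT b)
    print(b_not(b_or(info, ret)) == b_and(b_not(info), b_not(ret)))  # True
```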

Evaluation of IR Systems
- Why, What, and How?
- Relevance
- Measuring Effectiveness
  - Precision and Recall
  - F-measure
  - Cutoff levels
- TREC
- Blair & Maron study
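
Precision, recall, and the F-measure at a cutoff level can be computed directly from a ranked list and a set of relevance judgments. A minimal sketch using one common definition: precision = relevant retrieved / retrieved, recall = relevant retrieved / all relevant, and F-beta = (1 + beta^2)PR / (beta^2 P + R), which for beta = 1 is the harmonic mean of P and R. The ranked list and judgments below are hypothetical.

```python
def evaluate_at_cutoff(ranked: list[str], relevant: set[str], k: int, beta: float = 1.0):
    """Return (precision, recall, F-beta) computed over the top-k ranked results."""
    retrieved = ranked[:k]  # apply the cutoff level
    hits = sum(1 for doc in retrieved if doc in relevant)
    # Precision is relative to what was retrieved (may be fewer than k documents).
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


if __name__ == "__main__":
    ranked = ["d1", "d2", "d3", "d4", "d5"]  # hypothetical system output
    relevant = {"d1", "d3", "d6"}            # hypothetical relevance judgments
    p, r, f = evaluate_at_cutoff(ranked, relevant, k=4)
    print(f"P@4={p:.3f} R@4={r:.3f} F1={f:.3f}")  # P@4=0.500 R@4=0.667 F1=0.571
```

Moving the cutoff trades precision against recall: raising k can only keep or increase recall, while precision may fall as more non-relevant documents are admitted.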

Ranking Algorithms
- As opposed to Boolean
- How they work
  - The vector document representation
  - Assigning weights to terms
    - why do it
    - tf*idf measure
  - Similarity measures
    - vector space similarity measure
- How do ranking algorithms behave?
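
The vector representation, tf*idf weighting, and vector space similarity fit together as follows. A minimal sketch, assuming raw term frequency times idf(t) = ln(N / df(t)) and cosine similarity; many other tf and idf variants exist, and the documents are hypothetical, already-tokenized examples.

```python
import math
from collections import Counter


def idf_table(docs: list[list[str]]) -> dict[str, float]:
    """idf(t) = ln(N / df(t)), where df(t) is the number of documents containing t."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Weight each term by raw term frequency times idf; unknown terms are dropped."""
    tf = Counter(tokens)
    return {t: tf[t] * idf[t] for t in tf if t in idf}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query: list[str], docs: list[list[str]]) -> list[tuple[int, float]]:
    """Return (doc index, score) pairs, best match first."""
    idf = idf_table(docs)
    q = tfidf(query, idf)
    scores = [(i, cosine(q, tfidf(doc, idf))) for i, doc in enumerate(docs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = [
        ["information", "retrieval", "ranking"],
        ["information", "visualization"],
        ["database", "query", "ranking"],
    ]
    for i, s in rank(["retrieval", "ranking"], docs):
        print(i, round(s, 3))
```

Unlike a Boolean query, every document receives a score, so a document matching only "ranking" still appears, just below the one matching both terms. Note that with this idf, a term occurring in every document gets weight zero.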

Web Search Engines
- Ranking algorithms
- Web crawling algorithms
- How web search differs from other kinds of search
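
A basic web crawling algorithm is a breadth-first traversal of the link graph starting from seed URLs, remembering which URLs have already been queued. A minimal sketch over a hypothetical in-memory link graph (no network access); a real crawler would also fetch and parse pages, obey robots.txt, and throttle requests per host.

```python
from collections import deque

# Hypothetical link graph standing in for the Web: page -> outgoing links.
LINKS = {
    "http://a.example/": ["http://a.example/x", "http://b.example/"],
    "http://a.example/x": ["http://a.example/"],
    "http://b.example/": ["http://c.example/", "http://a.example/x"],
    "http://c.example/": [],
}


def crawl(seeds: list[str], max_pages: int = 100) -> list[str]:
    """Breadth-first crawl from seed URLs; returns pages in visit order."""
    frontier = deque(seeds)  # URLs waiting to be fetched
    seen = set(seeds)        # URLs already queued, so each is fetched once
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)  # stands in for fetching and indexing the page
        for link in LINKS.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


if __name__ == "__main__":
    print(crawl(["http://a.example/"]))
```

The `seen` set is what keeps cycles in the link graph (here, a.example/ and a.example/x link to each other) from causing repeated fetches; `max_pages` is a hypothetical budget parameter.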

IR Systems
- Inverted Files/Indexes
  - How documents are converted to inverted indexes
  - How the files are used for ranking documents
- The Cheshire II system
- Using Lexis/Nexis
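
Converting documents into an inverted index, and then using the postings to rank documents, can be sketched in a few lines. This assumes an in-memory index mapping each term to {document id: term frequency} and simple term-at-a-time tf*idf accumulation without length normalization; it does not describe how Cheshire II or Lexis/Nexis are implemented, and the documents are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, dict[str, int]]:
    """Invert documents into term -> {doc_id: term frequency} postings."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return dict(index)


def search(query: str, index: dict[str, dict[str, int]], n_docs: int) -> list[tuple[str, float]]:
    """Term-at-a-time tf*idf scoring; only documents in some posting list are touched."""
    scores: dict[str, float] = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term)
        if not postings:
            continue  # term does not occur in the collection
        idf = math.log(n_docs / len(postings))  # len(postings) is the document frequency
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "d1": "inverted files and indexes",
        "d2": "ranking documents with inverted indexes",
        "d3": "boolean queries",
    }
    index = build_index(docs)
    print(index["inverted"])  # {'d1': 1, 'd2': 1}
    print(search("ranking inverted", index, len(docs)))
```

The point of inverting is visible in `search`: scoring walks only the posting lists of the query terms, so "d3" is never examined.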

Relevance Feedback
- Modify the existing query based on relevance judgments
  - add terms and/or reweight terms
- Automatic, or allow users to select from an automated list
- Rocchio algorithm
- How it affects search outcome
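
The Rocchio algorithm moves the query vector toward documents judged relevant and away from those judged non-relevant: q_new = alpha * q + beta * (centroid of relevant vectors) - gamma * (centroid of non-relevant vectors). A minimal sketch over sparse term-weight dicts; the default alpha, beta, and gamma values and the clipping of negative weights are illustrative assumptions, not values taken from the course.

```python
from collections import defaultdict

Vector = dict[str, float]


def rocchio(query: Vector, relevant: list[Vector], nonrelevant: list[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Move the query toward the relevant centroid and away from the non-relevant one."""
    new_q: dict[str, float] = defaultdict(float)
    for term, w in query.items():
        new_q[term] += alpha * w
    for docs, coef in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, w in doc.items():
                new_q[term] += coef * w / len(docs)  # accumulates coef * centroid
    # Terms with non-positive weight are dropped, a common practical choice.
    return {term: w for term, w in new_q.items() if w > 0}


if __name__ == "__main__":
    q = {"jaguar": 1.0}
    rel = [{"jaguar": 1.0, "car": 1.0}]     # hypothetical judged-relevant document
    nonrel = [{"jaguar": 1.0, "cat": 1.0}]  # hypothetical judged-non-relevant document
    # "car" is added, "jaguar" is reweighted, "cat" is dropped.
    print(rocchio(q, rel, nonrel))
```

This one function covers both kinds of modification listed above: new terms enter through the relevant centroid, and existing terms are reweighted.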

Information Seeking Behavior
- Search tactics
- Search strategies
- Theories or Models
  - Bates
  - O’Day and Jeffries
  - Russell et al.
- How information is used after it is found

User Interfaces
- Why they are important; the role of the interface
- How to show the relationship between query, collection, and retrieval results
  - TileBars
- How to support the process of search
  - Sketchtrieve Informal Interface
  - DLITE (only on videotape)
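
TileBars relate a query to each retrieved document by showing how often each set of query terms occurs in each segment of the document's text, with darker tiles for more occurrences. A rough sketch of the underlying grid, assuming fixed-size word windows in place of topic-based segmentation and a text-character shading scale; `tilebar` and `render` are hypothetical names, not part of the original system.

```python
import re


def tilebar(text: str, term_sets: list[set[str]], segment_size: int = 50) -> list[list[int]]:
    """Rows are query term sets, columns are text segments, cells are hit counts."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    segments = [words[i:i + segment_size] for i in range(0, len(words), segment_size)]
    return [[sum(1 for w in seg if w in terms) for seg in segments] for terms in term_sets]


def render(grid: list[list[int]], shades: str = " .:#") -> list[str]:
    """Darker characters stand in for darker tiles (more hits)."""
    darkest = len(shades) - 1
    return ["".join(shades[min(count, darkest)] for count in row) for row in grid]


if __name__ == "__main__":
    text = "Osteoporosis bone filler filler filler filler prevention bone"
    grid = tilebar(text, [{"osteoporosis", "bone"}, {"prevention"}], segment_size=4)
    print(grid)  # [[2, 1], [0, 1]]
    print("\n".join(render(grid)))
```

Reading across a row shows where in the document a term set is concentrated; reading down a column shows whether several term sets co-occur in the same segment.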

Metadata in Search
- What is metadata for?
- Pros and cons of search using controlled vocabulary
- Pros and cons of search using uncontrolled vocabulary
- Combining metadata and uncontrolled vocabulary in search
  - Convert free text into controlled vocabulary
  - Organizing result sets (Cat-a-Cone)
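
One simple way to convert free text into controlled vocabulary is to look up phrases from the text in a table that maps entry terms to controlled headings. A minimal sketch, assuming a small hypothetical entry vocabulary and greedy longest-phrase matching; practical systems typically use much larger mappings, often learned from already-indexed records.

```python
import re

# Hypothetical entry vocabulary: free-text phrases -> controlled headings.
ENTRY_VOCAB = {
    "heart attack": "Myocardial Infarction",
    "myocardial infarction": "Myocardial Infarction",
    "high blood pressure": "Hypertension",
    "hypertension": "Hypertension",
}


def to_controlled(text: str, max_len: int = 3) -> list[str]:
    """Greedy longest-match of phrases against the entry vocabulary."""
    words = re.findall(r"[a-z]+", text.lower())
    headings: list[str] = []
    i = 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in ENTRY_VOCAB:
                if ENTRY_VOCAB[phrase] not in headings:
                    headings.append(ENTRY_VOCAB[phrase])
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no phrase starts here; move to the next word
    return headings


if __name__ == "__main__":
    print(to_controlled("Risk of heart attack with high blood pressure"))
    # ['Myocardial Infarction', 'Hypertension']
```

The resulting headings can then be searched as metadata alongside the original free-text terms.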

Hypertext and Search
- Components of a hypertext system
- Browsing vs. search on hypertext
- General tendencies for searching hypertext
  - Egan et al. study (SuperBook)
  - Campagnoni & Ehrlich study

Things We Didn’t Get To
- Search Issues
  - Source Selection
  - Genre
  - Quality/Verity
- Collaborative Filtering (in reader part II)
- Question Answering
- Multilingual Search (in reader part II)
- Machine Learning (in reader part II)
- AI/Language Analysis (in reader part I)

Some Follow-on Courses
- 240 Principles of Information Retrieval (Larson, Sp 98)
- 257 Database Management (Larson, Sp 98)
- 247 Information Visualization (Hearst, Sp 98)
- 213 User Interface Design and Development (Hearst, Sp 99)
- 214 Needs Assessment and Evaluation of Information Systems
- 245 Organization of Information in Collections