Search and Retrieval: Finding Out About Prof. Marti Hearst SIMS 202, Lecture 18.

1 Search and Retrieval: Finding Out About Prof. Marti Hearst SIMS 202, Lecture 18

2 Search and Retrieval: Outline of Part II of SIMS 202
Marti A. Hearst, SIMS 202, Fall 1997
- Human Aspects
- Standard Information Retrieval Models
- Evaluation of IR Systems
- Implementation Issues
- Web-Specific Issues
- User Interface Issues
- Special Kinds of Search

3 Human Aspects
- Finding Out About
  - types of information needs
  - specifying information needs (queries)
  - the process of information access
  - search strategies
  - “sensemaking”
- Relevance
- Modeling the User

4 Today
- Finding Out About
- Intro to Standard Information Retrieval
- Intro to Boolean Queries

5 Finding Out About
(This discussion is drawn from Belew’s manuscript)
- Three phases:
  - Asking of a question
  - Construction of an answer
  - Assessment of the answer
- Part of an iterative process

6 Question Asking
- Person asking = “user”
- In a frame of mind, a cognitive state
- Aware of a gap in their knowledge
- May not be able to fully define this gap
- Paradox of FOA:
  - If user knew the question to ask, there would often be no work to do.
- Query
  - External expression of this ill-defined state

7 Question Answering
- Say question answerer is human.
  - Can they translate the user’s ill-defined question into a better one?
  - Do they know the answer themselves?
  - Are they able to verbalize this answer?
  - Will the user understand this verbalization?
  - Can they provide the needed background?
- Say answerer is a computer system.

8 Assessing the Answer
- How well does it answer the question?
- Complete answer? Partial?
- Background Information?
- Hints for further exploration?
- How relevant is it to the user?

9 Finding Out About is an Iterative Process
[Diagram labels: Repositories, Workspace, Goals]

10 Finding Out About is a Dialog
- The exchange doesn’t end with the first answer
- User can recognize elements of a useful answer
- Questions and understanding change as the process continues.

11 Assessing the Answer
- Example:
  - Student asks Prof about a topic
  - Prof responds with three or four pearls of wisdom assumed closest to your needs
  - Student’s response:
    - Wait, that isn’t what I meant!
    - Let me ask it another way…
    - That helps, but I still have this problem…
- Why is it wrong to send student away?

12 Relevance Feedback (part of the iterative process)
- The student’s feedback to the prof:
  - “thanks, that helps”
  - “silence” or “huh?”
  - “what does that have to do with anything?”
- System: for each document retrieved
  - user responds with relevance assessment
  - binary: + or -
  - utility assessment (between 0 and 1)
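The two kinds of assessment on this slide can be recorded in one structure. The sketch below is only an illustration: the `Judgment` class, the helper names, and the choice to store a binary "+" or "-" as a utility of 1.0 or 0.0 are assumptions, not something the lecture specifies.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """One user assessment of one retrieved document (hypothetical structure)."""
    doc_id: int
    utility: float  # between 0 and 1, as on the slide


def from_binary(doc_id: int, sign: str) -> Judgment:
    """Record a binary assessment; mapping +/- to 1.0/0.0 is an assumption."""
    if sign not in ("+", "-"):
        raise ValueError("binary assessment must be '+' or '-'")
    return Judgment(doc_id, 1.0 if sign == "+" else 0.0)


def from_utility(doc_id: int, value: float) -> Judgment:
    """Record a graded assessment, rejecting values outside [0, 1]."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("utility must be between 0 and 1")
    return Judgment(doc_id, value)


# One assessment per retrieved document.
feedback = [from_binary(1, "+"), from_binary(2, "-"), from_utility(3, 0.4)]
```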

13 Later in the course:
- Search Strategies
- User interfaces to improve FOA process
- Incorporation of Content Analysis into better systems

14 Restricted Form of the Problem
- The system has available only pre-existing, “canned” text passages.
- Its response is limited to selecting from these passages and presenting them to the user.
- It must select, say, 10 or 20 passages out of millions or billions!

15 Information Retrieval
- Revised Task Statement: Build a system that retrieves documents that users are likely to find relevant to their queries.
- This set of assumptions underlies the field of Information Retrieval.

16 Some IR History
- Roots in the scientific “Information Explosion” following WWII
- Interest in computer-based IR from the mid-1950s
- H.P. Luhn at IBM (1958)
- Probabilistic models at Rand (Maron & Kuhns) (1960)
- Boolean system development at Lockheed (’60s)
- Vector Space Model (Salton at Cornell 1965)
- Statistical Weighting methods and theoretical advances (’70s)
- Refinements and Advances in application (’80s)
- User Interfaces, Large-scale testing and application (’90s)

17 Structure of an IR System
[Diagram: Information Storage and Retrieval System. Adapted from Soergel, p. 19]
- Search Line: Interest profiles & Queries; Formulating query in terms of descriptors; Storage of profiles; Store1: Profiles/Search requests
- Storage Line: Documents & data; Indexing (Descriptive and Subject); Storage of Documents; Store2: Document representations
- Rules of the game = Rules for subject indexing + Thesaurus (which consists of Lead-In Vocabulary and Indexing Language)
- Comparison/Matching
- Potentially Relevant Documents

18 Relevance
- In what ways can a document be relevant to a query?
  - Answer precise question precisely.
    - Who is buried in Grant’s tomb? Grant.
  - Partially answer question.
    - Where is Danville? Near Walnut Creek.
  - Suggest a source for more information.
    - What is lymphedema? Look it up in MEDLINE.
  - Give background information.
  - Remind the user of other knowledge.
  - Others...

19 Query Languages
- A way to express the question (information need)
- Types:
  - Boolean
  - Natural Language
  - Stylized Natural Language
  - Form-Based (GUI)

20 Simple query language: Boolean
- Terms + Connectors
- terms
  - words
  - normalized (stemmed) words
  - phrases
  - thesaurus terms
- connectors
  - AND
  - OR
  - NOT

21 Boolean Queries
- Cat
- Cat OR Dog
- Cat AND Dog
- (Cat AND Dog)
- (Cat AND Dog) OR Collar
- (Cat AND Dog) OR (Collar AND Leash)
- (Cat OR Dog) AND (Collar OR Leash)
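As a rough sketch of how queries like these could be evaluated, the example below builds a small inverted index and applies AND, OR, and NOT as set operations. The toy documents, the nested-tuple query format, and the function names are assumptions made for illustration; the lecture does not prescribe an implementation.

```python
from collections import defaultdict

# Hypothetical toy collection, invented for illustration.
DOCS = {
    1: "cat with a collar",
    2: "dog on a leash",
    3: "cat and dog",
    4: "collar and leash",
    5: "cat dog collar leash",
}


def build_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def evaluate(query, index, all_ids):
    """Evaluate a query given as a term string or a tuple (connector, operand, ...)."""
    if isinstance(query, str):
        return set(index.get(query.lower(), set()))
    op, *args = query
    results = [evaluate(arg, index, all_ids) for arg in args]
    if op == "AND":
        return set.intersection(*results)
    if op == "OR":
        return set.union(*results)
    if op == "NOT":
        return all_ids - results[0]
    raise ValueError(f"unknown connector: {op}")


INDEX = build_index(DOCS)
ALL_IDS = set(DOCS)

# The last two queries on the slide use the same terms but group them differently.
q1 = ("OR", ("AND", "Cat", "Dog"), ("AND", "Collar", "Leash"))
q2 = ("AND", ("OR", "Cat", "Dog"), ("OR", "Collar", "Leash"))
print(sorted(evaluate(q1, INDEX, ALL_IDS)))
print(sorted(evaluate(q2, INDEX, ALL_IDS)))
```

On this toy collection the two groupings retrieve different documents, which is why the parentheses matter.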

22 Boolean Queries
- (Cat OR Dog) AND (Collar OR Leash)
- Each of the following combinations works:
[Table: rows Cat, Dog, Collar, Leash marked with x; column alignment lost in extraction]

23 Boolean Queries
- (Cat OR Dog) AND (Collar OR Leash)
- None of the following combinations work:
[Table: rows Cat, Dog, Collar, Leash marked with x; column alignment lost in extraction]
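The split between combinations that work and combinations that do not can be checked by enumerating every subset of the four terms. This is a minimal sketch; the `matches` helper and the set-of-present-terms representation are assumptions made for illustration.

```python
from itertools import product

TERMS = ("Cat", "Dog", "Collar", "Leash")


def matches(present):
    """Evaluate (Cat OR Dog) AND (Collar OR Leash) for a set of present terms."""
    return (("Cat" in present or "Dog" in present)
            and ("Collar" in present or "Leash" in present))


works, fails = [], []
# Each term is either present or absent, giving 2**4 = 16 combinations.
for flags in product([False, True], repeat=len(TERMS)):
    present = {term for term, flag in zip(TERMS, flags) if flag}
    (works if matches(present) else fails).append(present)

print(len(works), "combinations satisfy the query")
print(len(fails), "combinations do not")
```

A combination works only when it contains at least one term from each parenthesized group, so Cat with Dog alone fails while Cat with Collar succeeds.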

24 Boolean Logic
[Diagram labels: A, B]

25 Pseudo-Boolean Queries
- A new notation, from web search
  - +cat dog +collar leash
- Does not mean the same thing!
- Need a way to group combinations.
- Phrases:
  - “stray cat” AND “frayed collar”
  - +“stray cat” +“frayed collar”
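One way to see why the two notations differ is to compare them on a few documents. The sketch below assumes the common web-search reading in which a term prefixed with `+` must be present and unmarked terms are optional; that reading, the helper names, and the sample documents are assumptions, and quoted phrases are not handled.

```python
def plus_match(query, doc_words):
    """Assumed reading: '+term' is required; unmarked terms are optional."""
    required = {tok[1:].lower() for tok in query.split() if tok.startswith("+")}
    return required <= doc_words


def boolean_match(doc_words):
    """(Cat OR Dog) AND (Collar OR Leash), for comparison."""
    return bool(doc_words & {"cat", "dog"}) and bool(doc_words & {"collar", "leash"})


QUERY = "+cat dog +collar leash"

# Hypothetical documents, each reduced to its set of words.
doc_a = {"cat", "collar"}
doc_b = {"dog", "leash"}

for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    print(name, "plus:", plus_match(QUERY, doc), "boolean:", boolean_match(doc))
```

Under this reading the `+` query requires both cat and collar, so a document about a dog and a leash satisfies the grouped Boolean query but not the `+` query.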

26 Preview of Later Lectures: Web-Specific Issues
- Web Crawling
- Search Issues
- Source Selection
- Hyperlinks
- Genre
- Quality/Verity
- Special Implementation Issues

27 Preview of Later Lectures: Special Kinds of Search
- Question Answering
- Source Selection
- Multimedia Information Access
- Image Search
- Hypertext Search
- Incorporating MetaData
- Multilingual Search

