1
Information Retrieval
Introduction to Biomedical Informatics, 10/27/2015: Information Retrieval
2
The Problem: a set of journal articles and an information need.
What strategies would you use to get the journal articles that meet your information need?
3
What Is Information Retrieval?
Activity of obtaining information resources relevant to an information need from a collection of information resources. Field concerned with the organization, storage, and retrieval of information. Related Topics: Information Extraction, Natural Language Processing
4
Examples
5
Examples
6
Examples
7
Examples
8
Is It a Database?
9
Comparison Database vs IR
Data format. Database: structured data with a schema. IR: unstructured documents.
Queries. Database: a structured query language such as SQL, e.g., SELECT Cust_No, First_Name FROM Customers WHERE Last_Name='Smith'. IR: keywords or natural language, e.g., causes of diabetes.
Results. Database: exact results. IR: ranked results that may include both hits and misses.
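To make the contrast concrete, here is a minimal, hypothetical Python sketch: the customer lookup runs as an exact SQL match (using the standard sqlite3 module and made-up rows), while the keyword query is scored against toy free-text documents and returns a ranking rather than an exact answer.

    import sqlite3

    # Database side: structured data with a schema, exact results.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Customers (Cust_No INTEGER, First_Name TEXT, Last_Name TEXT)")
    con.execute("INSERT INTO Customers VALUES (1, 'Ann', 'Smith'), (2, 'Bob', 'Jones')")
    print(con.execute("SELECT Cust_No, First_Name FROM Customers WHERE Last_Name='Smith'").fetchall())
    # [(1, 'Ann')]  -- exactly the rows satisfying the predicate

    # IR side: unstructured documents, ranked results that may include misses.
    docs = {
        "d1": "risk factors and causes of type 2 diabetes",
        "d2": "treatment options for hypertension",
    }
    query = "causes of diabetes".split()
    scores = {d: sum(word in text.split() for word in query) for d, text in docs.items()}
    print(sorted(scores, key=scores.get, reverse=True))
    # ['d1', 'd2']  -- a ranking, not an exact answer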
10
Specific Definition of IR
Objective: find documents relevant to a query.
11
Information Retrieval Types
Ad hoc: find documents in a static collection (e.g., articles about diabetes in PubMed). Routing: route documents to groups as they arrive (e.g., is this message about work or personal matters?). Filtering: remove irrelevant documents as they arrive (e.g., spam or not spam?).
12
Types
Question answering: What are the causes of lung cancer? (e.g., smoking, asbestos). Image recognition: What does a melanoma look like? Social search. Geo search.
13
The Information Retrieval Cycle
Source selection (resource) → query formulation (query) → search (ranked list) → selection (documents) → result, with loops for query reformulation and relevance feedback. (Slide from Jimmy Lin's tutorial.)
14
Information Need / Query Formulation
15
Unrecognized Needs
Photocoagulation in diabetic retinopathy (1979): two years after publication, how many PCPs were aware of the result? 50%. Hypertension and Follow-Up Study (1981): 2-6 months after publication, how many physicians knew the benefit of antihypertensive therapy? 20-50%.
16
Recognized Needs. Various studies measured recognized needs per patient: 2 unmet needs for every 3 patients (Covell); 1.4 questions per patient (Osheroff); 0.33 questions per patient (Dee and Blazek); 0.32 questions per patient (Ely).
17
Recognized Needs. Ely developed a taxonomy of clinical questions; the top 10 were:
What is the cause of symptom X? What is the dose of drug X? How should I manage disease or finding X? How should I treat finding or disease X? What is the cause of physical finding X? What is the cause of test finding X? Could this patient have disease or condition X? Is test X indicated in situation Y? What is the drug of choice for condition X? Is drug X indicated in situation Y?
18
Pursuing a Need. 30-36% of questions were pursued by physicians.
Factors most associated with pursuit (Gorman 1995): urgency, answerability, generalizability. Factors not associated: knowledge uneasiness (of the physician), potential help (an answer could help the patient), potential harm (not having an answer could hurt the patient), edification, liability (the problem involved liability risk), knowledge of peers (whether peers know the answer), difficulty (how difficult it is to find the answer).
19
Obstacles to answering clinical questions (Ely).
Excessive time required to find information. Difficulty modifying the original question, which was often vague or open to interpretation. Difficulty selecting an optimal strategy to search for information. Failure of a seemingly appropriate resource to cover the topic. Uncertainty about how to know when all the relevant evidence has been found so the search can stop. Inadequate synthesis of multiple bits of evidence into a clinically useful statement.
20
Reasons for not pursuing an answer (Ely)
Doubted existence of relevant information – 25% Readily available consultation leading to referral rather than pursuit – 22% Lack of time to pursue – 19% Not important enough to pursue answer – 15% Uncertain where to look for answer – 8%
21
The Information Retrieval Cycle
Source selection (resource) → query formulation (query) → search (ranked list) → selection (documents) → result, with loops for query reformulation and relevance feedback. (Slide from Jimmy Lin's tutorial.)
22
The IR Black Box: a query and a collection of documents go in; results come out.
Slide is from Jimmy Lin’s tutorial
23
Inside the IR Black Box
Documents pass through document representation into an index; the query passes through query representation; a comparison function matches the two representations against the index and produces the results. (Slide from Jimmy Lin's tutorial.)
24
The Central Problem in IR
The information seeker has concepts in mind and expresses them as query terms; authors have concepts in mind and express them as document terms. Do the query terms and the document terms represent the same concepts?
25
What Makes IR Hard? Text is unstructured. Nuance in language. Ambiguity.
Nuance: Is mastectomy the best treatment for breast cancer? Ambiguity: APC can mean adenomatous polyposis coli or age-period-cohort; "direct bilirubin" can refer to a lab value, a substance, or a result.
26
Classic Information Retrieval
27
IR Steps: Index the documents. Process the query. Score the documents against the query. Show the results.
28
Indexing: Tokenize Documents
The document "This is a document in information retrieval" is split into the tokens: this, is, a, document, in, information, retrieval.
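A minimal Python sketch of this tokenization step, assuming a simple lowercase-and-split rule based on a regular expression; real tokenizers handle punctuation, hyphens, and dates more carefully.

    import re

    def tokenize(text):
        # Lowercase, then pull out runs of letters and digits as tokens.
        return re.findall(r"[a-z0-9]+", text.lower())

    print(tokenize("This is a document in information retrieval"))
    # ['this', 'is', 'a', 'document', 'in', 'information', 'retrieval']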
29
Indexing: Remove stopwords
Very frequent words are not good discriminators. A stop word list typically includes: a, about, above, according, across, after, afterwards, again, against, albeit, all, almost, alone, already, also, although, always, among, as, at, ...
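A minimal sketch of stop word removal, assuming a short hand-written stop word set; the list above is only an excerpt, and production systems use much longer lists.

    STOP_WORDS = {"a", "about", "above", "after", "again", "all", "also", "among",
                  "as", "at", "in", "is", "of", "the", "this", "to", "with"}

    def remove_stopwords(tokens):
        # Keep only tokens that are not in the stop word set.
        return [t for t in tokens if t not in STOP_WORDS]

    print(remove_stopwords(["this", "is", "a", "document", "in", "information", "retrieval"]))
    # ['document', 'information', 'retrieval']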
30
Indexing: Collapse word forms
Simplest approach: suffix stripping. For example: smoke → smok, smoker → smok, smoking → smok.
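A minimal sketch of suffix stripping that reproduces the smoke/smoker/smoking example; the suffix list is made up for illustration, and a real system would use a full stemmer such as the Porter stemmer.

    SUFFIXES = ["ing", "er", "e", "s"]

    def strip_suffix(word):
        # Remove the longest matching suffix, but only if a reasonable stem remains.
        for suffix in sorted(SUFFIXES, key=len, reverse=True):
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                return word[: -len(suffix)]
        return word

    for w in ["smoke", "smoker", "smoking"]:
        print(w, "->", strip_suffix(w))
    # smoke -> smok, smoker -> smok, smoking -> smok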
31
Inverted Index Example
Doc 1: "This is a sample document with one sample sentence". Doc 2: "This is another sample document". The dictionary records, for each term, the number of documents it appears in and its total frequency (e.g., this: 2 documents; sample: total frequency 3; another: 1). The postings record, for each term, the document IDs and the term's frequency in each (e.g., sample → doc 1, frequency 2). (Slide from ChengXiang Zhai.)
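A minimal sketch of building the dictionary and postings for these two sample documents; tokenization here is just lowercasing and splitting, with no stop word removal or stemming, to keep the example short.

    from collections import defaultdict

    docs = {
        1: "This is a sample document with one sample sentence",
        2: "This is another sample document",
    }

    # postings[term] maps document id -> term frequency in that document
    postings = defaultdict(lambda: defaultdict(int))
    for doc_id, text in docs.items():
        for token in text.lower().split():
            postings[token][doc_id] += 1

    # Dictionary view: document frequency and total collection frequency per term.
    for term in sorted(postings):
        plist = postings[term]
        print(term, "docs:", len(plist), "total:", sum(plist.values()), "postings:", dict(plist))
    # e.g. sample docs: 2 total: 3 postings: {1: 2, 2: 1}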
32
Query Processing. Same process as indexing: stop word removal and stemming.
The goal is to get the query into the same form as the indexed documents.
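A minimal sketch of query processing, assuming the tokenize, remove_stopwords, and strip_suffix helpers from the indexing sketches above are in scope; the query string is made up.

    def process_query(query):
        # Apply the same normalization used at indexing time.
        tokens = tokenize(query)                  # split and lowercase
        tokens = remove_stopwords(tokens)         # drop very frequent words
        return [strip_suffix(t) for t in tokens]  # collapse word forms

    print(process_query("causes of smoking"))
    # ['cause', 'smok']  -- now in the same form as the indexed terms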
33
Scoring Documents Boolean Model
Documents that contain more of the query words are scored higher. Simple, but not very effective.
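A minimal sketch of this kind of scoring: count how many distinct query terms each document contains and rank by that count. The index contents and query here are toy, already-stemmed terms.

    def overlap_score(query_terms, doc_terms):
        # Score = number of distinct query terms found in the document.
        return len(set(query_terms) & set(doc_terms))

    index = {
        1: ["sampl", "document", "one", "sampl", "sentenc"],
        2: ["anoth", "sampl", "document"],
    }
    query = ["sampl", "sentenc"]

    ranked = sorted(index, key=lambda doc_id: overlap_score(query, index[doc_id]), reverse=True)
    print(ranked)
    # [1, 2]  -- doc 1 matches both query terms, doc 2 matches only one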
34
Scoring Documents Vector Space Model
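In the vector space model, documents and the query are represented as term-weight vectors (commonly TF-IDF) and documents are ranked by the cosine similarity between their vector and the query vector. Below is a minimal, illustrative sketch; the toy, already-stemmed documents and the smoothed IDF weighting are assumptions for this example, not the lecture's exact formulation.

    import math
    from collections import Counter

    docs = {
        1: ["diabet", "caus", "diet", "diabet"],
        2: ["cancer", "caus", "smok"],
    }
    query = ["diabet", "caus"]

    N = len(docs)
    df = Counter(t for terms in docs.values() for t in set(terms))  # document frequency
    idf = {t: 1.0 + math.log(N / df[t]) for t in df}                # smoothed so shared terms keep some weight

    def tfidf(terms):
        tf = Counter(terms)
        return {t: tf[t] * idf.get(t, 0.0) for t in tf}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
        return dot / norm if norm else 0.0

    qvec = tfidf(query)
    for doc_id, terms in sorted(docs.items(), key=lambda kv: -cosine(qvec, tfidf(kv[1]))):
        print(doc_id, round(cosine(qvec, tfidf(terms)), 3))
    # Doc 1 (about diabetes) ranks above doc 2.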
35
Evaluation requires a gold standard.
Recall (aka sensitivity): the proportion of all relevant documents that are retrieved. Precision: the proportion of retrieved documents that are actually relevant.
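A minimal sketch computing both measures from sets of document IDs; the IDs are made up for illustration.

    retrieved = {1, 2, 3, 5, 8}       # documents the system returned
    relevant = {2, 3, 4, 8, 9, 10}    # gold-standard relevant documents

    true_positives = retrieved & relevant
    precision = len(true_positives) / len(retrieved)  # proportion of retrieved that are relevant
    recall = len(true_positives) / len(relevant)      # proportion of relevant that were retrieved

    print(f"precision={precision:.2f} recall={recall:.2f}")
    # precision=0.60 recall=0.50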
36
Precision vs. Recall: within the set of all documents, the retrieved set and the relevant set overlap; precision measures the overlap relative to what was retrieved, recall relative to what is relevant.
37
Case Study
38
Emerging Applications
Automatic evidence recommendations from the medical record. Medical record summarization. Image search in publications. Identifying subsets of patients from the medical record.
39
In Summary
40
Summary. Defined information retrieval. Explored physician questions. Showed examples of retrieval.
41
A neat trick