Information Retrieval


Information Retrieval Introduction to Biomedical Informatics 10/27/2015 Information Retrieval

The Problem. Given: a set of journal articles and an information need. What strategies would you use to get the journal articles that meet your information need?

What is Information Retrieval? The activity of obtaining information resources relevant to an information need from a collection of information resources. Also the field concerned with the organization, storage, and retrieval of information. Related topics: Information Extraction, Natural Language Processing.

Examples

Is It a database?

Comparison: Database vs. IR
Data format. Database: structured data with a schema. IR: unstructured documents.
Queries. Database: a language such as Structured Query Language. Ex: SELECT Cust_No, First_Name FROM Customers WHERE Last_Name='Smith'; IR: keywords or natural language. Ex: causes of diabetes
Result. Database: exact result. IR: ranked results (hit or miss).

Specific Definition of IR. Objective: find documents relevant to a query. Key terms: documents, relevance, query.

Information Retrieval Types
Ad-hoc: find documents in a static collection. Ex: articles about diabetes in PubMed.
Routing: route documents to groups as they arrive. Ex: is this email about work or personal?
Filtering: remove irrelevant documents as they arrive. Ex: spam or not spam?

Types (continued)
Question Answering: What are the causes of lung cancer? Ex: smoking, asbestos.
Image Recognition: What does a melanoma look like?
Social Search
Geo Search

The Information Retrieval Cycle: Source Selection (yields a resource) → Query Formulation (yields a query) → Search (yields a ranked list) → Selection (yields documents) → result. Feedback loop: query reformulation, relevance feedback. Slide is from Jimmy Lin’s tutorial

Information Need/ Query Formulation

Unrecognized Needs
Photocoagulation in diabetic retinopathy (1979): 2 years after publication, how many PCPs were aware of the result? 50%.
Hypertension and Follow-Up Study (1981): 2-6 months after publication, how many physicians knew the benefit of antihypertensive therapy? 20 to 50%.

Recognized Needs. Various studies measured recognized needs per patient: 2 unmet needs for every 3 patients (Covell). 1.4 questions per patient (Osheroff). 0.33 questions per patient (Dee and Blazek). 0.32 questions per patient (Ely).

Recognized Needs. Ely taxonomized questions (top 10):
1. What is the cause of symptom X?
2. What is the dose of drug X?
3. How should I manage disease or finding X?
4. How should I treat finding or disease X?
5. What is the cause of physical finding X?
6. What is the cause of test finding X?
7. Could this patient have disease or condition X?
8. Is test X indicated in situation Y?
9. What is the drug of choice for condition X?
10. Is drug X indicated in situation Y?

Pursuing a Need
30-36% of questions were pursued by physicians.
Factors most associated with pursuit (Gorman 95): urgency, answerability, generalizability.
Factors not associated: knowledge; uneasiness (of physician); potential help (an answer could help the patient); potential harm (not having an answer could hurt the patient); edification; liability (problem involved liability risk); knowledge of peers (peers know the answer); difficulty (how difficult to find the answer).

Obstacles to answering clinical questions (Ely)
Excessive time required to find information.
Difficulty modifying the original question, which was vague or open to interpretation.
Difficulty selecting an optimal strategy to search for information.
Failure of a seemingly appropriate resource to cover the topic.
Uncertainty about how to know when all the relevant evidence has been found so the search can stop.
Inadequate synthesis of multiple bits of evidence into a clinically useful statement.

Reasons for not pursuing an answer (Ely)
Doubted existence of relevant information: 25%
Readily available consultation leading to referral rather than pursuit: 22%
Lack of time to pursue: 19%
Not important enough to pursue answer: 15%
Uncertain where to look for answer: 8%

The Information Retrieval Cycle: Source Selection (yields a resource) → Query Formulation (yields a query) → Search (yields a ranked list) → Selection (yields documents) → result. Feedback loop: query reformulation, relevance feedback. Slide is from Jimmy Lin’s tutorial

The IR Black Box: Documents and a Query go in; Results come out. Slide is from Jimmy Lin’s tutorial

Inside the IR Black Box: Documents → Document Representation → Index. Query → Query Representation. A Comparison Function matches the query representation against the index to produce Results. Slide is from Jimmy Lin’s tutorial

The Central Problem in IR. Information seeker: Concepts → Query Terms. Authors: Concepts → Document Terms. Do these represent the same concepts?

What Makes IR Hard?
Text is unstructured.
Nuance in language. Ex: Is mastectomy the best treatment for breast cancer?
Ambiguity. Ex: APC (adenomatous polyposis coli, age period cohort); direct bilirubin (lab value, substance, result).

Classic Information Retrieval

IR steps
1. Index the documents.
2. Process the query.
3. Score the documents according to the query.
4. Show results.

Indexing: Tokenize Documents. Ex: the document "This is a document in information retrieval" is split into individual tokens such as This, Is, Information, Retrieval. 9/22/2018 Introduction to Information Retrieval
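
The tokenization step can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the lecture: the function name `tokenize` and the choice to lowercase and split on non-alphanumeric characters are assumptions made for the example.

```python
import re

def tokenize(text):
    """Lowercase the text and return its alphanumeric word tokens in order."""
    # Assumption: a token is a maximal run of ASCII letters or digits.
    return re.findall(r"[a-z0-9]+", text.lower())

tokens = tokenize("This is a document in information retrieval")
print(tokens)
```

Real systems usually need more careful rules (hyphens, abbreviations, gene names), which this sketch ignores.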

Indexing: Remove stop words. Very frequent words are not good discriminators. List of stop words: a, about, above, according, across, after, afterwards, again, against, albeit, all, almost, alone, already, also, although, always, among, as, at
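
A rough sketch of stop word removal, using only the excerpt of the stop word list shown on the slide. The name `remove_stop_words` and the sample tokens are made up for illustration; because the excerpt is partial, a common word such as "is" is not removed here.

```python
# Stop words copied from the excerpt on the slide (a partial list).
STOP_WORDS = {
    "a", "about", "above", "according", "across", "after", "afterwards",
    "again", "against", "albeit", "all", "almost", "alone", "already",
    "also", "although", "always", "among", "as", "at",
}

def remove_stop_words(tokens):
    """Keep only the tokens that are not in the stop word list."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

tokens = ["smoking", "is", "also", "a", "risk", "factor", "among", "adults"]
print(remove_stop_words(tokens))
```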

Indexing: Collapse word forms. Simplest: suffix stripping. smoke → smok, smoker → smok, smoking → smok
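
Suffix stripping can be illustrated with a toy stemmer. This is not the Porter stemmer or any published algorithm; the suffix list and the minimum stem length of three characters are assumptions chosen only so that the three slide examples collapse to the same form.

```python
# Toy suffix list, checked in this order (assumption for illustration only).
SUFFIXES = ("ing", "er", "e")

def strip_suffix(token):
    """Remove the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

for word in ("smoke", "smoker", "smoking"):
    print(word, "->", strip_suffix(word))
```

A stemmer this naive will also damage words it should leave alone, which is why production systems rely on more carefully designed rule sets.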

Inverted Index Example
Doc 1: This is a sample document with one sample sentence
Doc 2: This is another sample document
Dictionary and postings:
Term      # docs   Total freq   Postings (doc id: freq)
this      2        2            1: 1, 2: 1
is        2        2            1: 1, 2: 1
sample    2        3            1: 2, 2: 1
another   1        1            2: 1
…
Slide is from ChengXiang Zhai
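
The two example documents can be turned into an inverted index with a short sketch like the one below. The function name `build_inverted_index` and the whitespace tokenization are assumptions for the example; the dictionary statistics (number of documents, total frequency) are derived from the postings.

```python
from collections import Counter, defaultdict

def build_inverted_index(docs):
    """Map each term to its postings: a dict of {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Assumption: lowercase and split on whitespace; no stop words or stemming.
        for term, freq in Counter(text.lower().split()).items():
            index[term][doc_id] = freq
    return index

docs = {
    1: "This is a sample document with one sample sentence",
    2: "This is another sample document",
}
index = build_inverted_index(docs)

for term in ("this", "is", "sample", "another"):
    postings = index[term]
    # Dictionary entry: term, number of documents, total frequency, then postings.
    print(term, len(postings), sum(postings.values()), postings)
```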

Query Processing. Same process as indexing: stop word removal, stemming. The goal is to get the query into the same form as the indexed version of the documents.

Scoring Documents: Boolean Model. Documents that contain more of the query words are scored higher. Simple, but not very effective.
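
One way to make sure queries and documents end up in the same form is to run both through a single normalization function. The sketch below assumes a tiny stop word subset and the same toy suffix list as a stand-in for a real stemmer; all names are hypothetical.

```python
import re

STOP_WORDS = {"a", "about", "all", "also", "as", "at"}  # assumed small subset
SUFFIXES = ("ing", "er", "e")  # toy suffix list, not a real stemmer

def stem(token):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def normalize(text):
    """Tokenize, drop stop words, and stem; used for documents and queries alike."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]

indexed_terms = normalize("All about smoking")  # document side
query_terms = normalize("smoker")               # query side
print(indexed_terms, query_terms)
```

Because both sides pass through `normalize`, the query "smoker" lines up with a document that only mentions "smoking".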
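
The scoring rule described on this slide (more query words present, higher score) can be sketched as a simple match count. Note that a strict Boolean model only returns the set of documents satisfying the query, without ranking; the counting below follows the slide's description instead. The function names and the sample documents are invented for the example.

```python
def match_count_score(query_terms, doc_terms):
    """Score a document by how many distinct query terms it contains."""
    return len(set(query_terms) & set(doc_terms))

def rank(query_terms, docs):
    """Return (doc_id, score) pairs with score > 0, best first."""
    scored = [(doc_id, match_count_score(query_terms, terms))
              for doc_id, terms in docs.items()]
    matches = [(doc_id, score) for doc_id, score in scored if score > 0]
    # Sort by descending score; ties are broken by document id.
    return sorted(matches, key=lambda pair: (-pair[1], pair[0]))

docs = {
    "d1": ["causes", "diabetes", "obesity"],
    "d2": ["diabetes", "treatment"],
    "d3": ["lung", "cancer"],
}
ranking = rank(["causes", "diabetes"], docs)
print(ranking)
```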

Scoring Documents Vector Space Model
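
The slide names the vector space model without giving details. A common instantiation, assumed here, represents documents and queries as TF-IDF weighted term vectors and ranks by cosine similarity. The particular IDF formula (log(N/df) + 1) and all names below are choices made for this sketch, not taken from the lecture.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return ({doc_id: {term: weight}}, idf) using raw term frequency times IDF."""
    n = len(docs)
    df = Counter(term for terms in docs.values() for term in set(terms))
    # Assumed IDF variant: log(N / df) + 1, so terms in every document keep weight.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = {doc_id: {t: f * idf[t] for t, f in Counter(terms).items()}
               for doc_id, terms in docs.items()}
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

docs = {
    "d1": ["causes", "diabetes", "obesity"],
    "d2": ["diabetes", "treatment"],
    "d3": ["lung", "cancer"],
}
vectors, idf = tfidf_vectors(docs)
query = ["causes", "diabetes"]
# Query terms unseen in the collection are ignored.
query_vec = {t: f * idf[t] for t, f in Counter(query).items() if t in idf}
scores = {doc_id: cosine(query_vec, vec) for doc_id, vec in vectors.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Unlike the match count, this weighting lets a rare query term such as "causes" count for more than a term that appears in many documents.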

Evaluation. Judged against a gold standard. Recall (aka sensitivity): proportion of all relevant documents that were retrieved. Precision: proportion of retrieved material that is actually relevant.

Precision vs. Recall All docs Retrieved Relevant
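
Precision and recall follow directly from the overlap between the retrieved set and the relevant (gold standard) set. The document ids below are invented for illustration, and returning 0.0 for an empty set is a convention assumed for this sketch.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against a gold standard."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # documents both retrieved and relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d5"],
)
print(precision, recall)
```

Here two of the four retrieved documents are relevant (precision 2/4), and two of the three relevant documents were retrieved (recall 2/3).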

Case Study http://fac.med.nyu.edu

Emerging Applications Automatic evidence recommendations from the medical record. Medical record summarization. Image Search in Publications. Identifying Subsets of Patients from the Medical Record.

In Summary

Summary: Defined information retrieval. Explored physician questions. Showed examples of retrieval.

A neat trick