Retrieval 2/2 BDK12-6 Information Retrieval William Hersh, MD Department of Medical Informatics & Clinical Epidemiology Oregon Health & Science University.

Similar presentations
Data Mining and the Web Susan Dumais Microsoft Research KDD97 Panel - Aug 17, 1997.
UCLA : GSE&IS : Department of Information Studies JF : 276lec1.ppt : 5/2/2015 : 1 INFS INFORMATION RETRIEVAL SYSTEMS Week.
IS530 Lesson 12 Boolean vs. Statistical Retrieval Systems.
Computer Information Technology – Section 3-2. The Internet Objectives: The Student will: 1. Understand Search Engines and how they work 2. Understand.
Search Engines. 2 What Are They? – Four Components – A database of references to webpages – An indexing robot that crawls the WWW – An interface – Enables.
Information Retrieval in Practice
Search Engines and Information Retrieval
Information Retrieval Review
Basic IR: Queries Query is statement of user’s information need. Index is designed to map queries to likely to be relevant documents. Query type, content,
Practical Considerations for Exploiting the World Wide Web to Create Infobuttons James J. Cimino, Jianhua Li, Mureen Allen, Leanne M. Currie, Mark Graham,
1 CS 430: Information Discovery Lecture 21 Web Search 3.
A Mobile World Wide Web Search Engine Wen-Chen Hu Department of Computer Science University of North Dakota Grand Forks, ND
What is the Internet? The Internet is a computer network connecting millions of computers all over the world It has no central control - works through.
ISP 433/633 Week 7 Web IR. Web is a unique collection Largest repository of data Unedited Can be anything –Information type –Sources Changing –Growing.
Recommender Systems; Social Information Filtering.
Integration of Information Resources at the Point of Need James J. Cimino, M.D. Departments of Medicine and Medical Informatics Columbia University.
Information Retrieval
Search engines fdm 20c introduction to digital media lecture warren sack / film & digital media department / university of california, santa.
How Search Engines Work: A Technology Overview Avi Rappoport Search Tools Consulting UC Berkeley SIMS class.
Overview of Search Engines
Setting up a Profile LRC Information Literacy Series: 7 (Google Scholar Citations) By Shri Ram.
Result presentation. Search Interface Input and output functionality – helping the user to formulate complex queries – presenting the results in an intelligent.
Indexing 1/2 BDK12-3 Information Retrieval William Hersh, MD Department of Medical Informatics & Clinical Epidemiology Oregon Health & Science University.
Document databases in medicine. Alpe Adria Master Course :: Medical Informatics :: Dr. J. Dimec: Document databases in medicine.2 Bibliographic databases:
Search Engines and Information Retrieval Chapter 1.
The Technology Behind. The World Wide Web In July 2008, Google announced that they found 1 trillion unique webpages! Billions of new web pages appear.
INF 141 COURSE SUMMARY Crista Lopes. Lecture Objective Know what you know.
NCBI/WHO PubMed/Hinari Course Introduction Session #1, Sept 13, 2005 Session #2, Sept 14, 2005 Internet Concepts and Scientific Literature Resources Ho.
CSE 6331 © Leonidas Fegaras Information Retrieval 1 Information Retrieval and Web Search Engines Leonidas Fegaras.
Thanks to Bill Arms, Marti Hearst Documents. Last time Size of information –Continues to grow IR an old field, goes back to the ‘40s IR iterative process.
1 CS 430: Information Discovery Lecture 9 Term Weighting and Ranking.
1 Information Retrieval Acknowledgements: Dr Mounia Lalmas (QMW) Dr Joemon Jose (Glasgow)
Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
Web Searching. How does a search engine work? It does NOT search the Web (when you make a query) It contains a database with info on numerous Web sites.
McLean HIGHER COMPUTER NETWORKING Lesson 7 Search engines Description of search engine methods.
Chapter 6: Information Retrieval and Web Search
The ISI Web of Knowledge nce/training/wok/#tab3.
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
Search Engine Architecture
Retrieval 1/2 BDK12-5 Information Retrieval William Hersh, MD Department of Medical Informatics & Clinical Epidemiology Oregon Health & Science University.
Introduction to Information Retrieval Aj. Khuanlux MitsophonsiriCS.426 INFORMATION RETRIEVAL.
Determining the Suitability of Online Research Materials Beth Thompson.
Google Apps: Scholar, Alerts & Gadgets Sam Jacob & Shelby Jackson.
Iana Atanassova Research: – Information retrieval in scientific publications exploiting semantic annotations and linguistic knowledge bases – Ranking algorithms.
Web Information Retrieval Prof. Alessandro Agostini 1 Context in Web Search Steve Lawrence Speaker: Antonella Delmestri IEEE Data Engineering Bulletin.
Information Retrieval Transfer Cycle Dania Bilal IS 530 Fall 2007.
THE SEMANTIC WEB By Conrad Williams. Contents – What is the Semantic Web? – Technologies – XML – RDF – OWL – Implementations – Social Networking – Scholarly.
The World Wide Web. What is the worldwide web? The content of the worldwide web is held on individual pages which are gathered together to form websites.
1 CS 430: Information Discovery Lecture 18 Web Search Engines: Google.
11 Why tune relevance Because we want to find the one single best item, among a large group of possible candidates….
Basics of Databases and Information Retrieval1 Databases and Information Retrieval Lecture 1 Basics of Databases and Information Retrieval Instructor Mr.
Xiaoying Gao Computer Science Victoria University of Wellington COMP307 NLP 4 Information Retrieval.
Manual Query Modification and Automated Translation to Improve Cross-language Medical Image Retrieval Jeffery R. Jensen William R. Hersh Department of.
WISER: What’s new in Science SCOPUS, SCIRUS and Google Scholar Kate Williams and Juliet Ralph May 2006.
1 CS 430 / INFO 430: Information Retrieval Lecture 20 Web Search 2.
Searching the Web for academic information Ruth Stubbings.
Search Engine Optimization
Automated Information Retrieval
Designing Cross-Language Information Retrieval System using various Techniques of Query Expansion and Indexing for Improved Performance – Hello everyone,
Agenda What is SEO ? How Do Search Engines Work? Measuring SEO success ? On Page SEO – Basic Practices? Technical SEO - Source Code. Off Page SEO – Social.
Introduction to Information Retrieval
How to Search in PubMed and ESGO Journal
Information Retrieval and Web Design
Presentation transcript:


Natural language retrieval
User enters natural language words without Boolean operators
– Output is usually ranked by the number of words common to the query and content items (non-Web) or by the number of links to items (Web)
– This is implicitly an OR, although some systems (e.g., Web search engines) apply an AND
Usually used in conjunction with weighted indexing (Salton, 1991)
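A minimal sketch of the unweighted ranking described above, assuming simple lowercase word matching: documents are ranked by how many query words they contain, first under the implicit OR and then under the AND that some Web engines apply. The function name `rank_by_overlap` and the sample documents are hypothetical.

```python
import re


def tokenize(text):
    # Lowercase and split on runs of non-alphanumeric characters
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_by_overlap(query, docs, require_all=False):
    """Rank documents by the number of query words they share.

    require_all=False mimics the implicit OR: any shared word retrieves a document.
    require_all=True mimics the implicit AND: every query word must occur.
    """
    q = tokenize(query)
    results = []
    for doc_id, text in docs.items():
        overlap = len(q & tokenize(text))
        if overlap == 0:
            continue
        if require_all and overlap < len(q):
            continue
        results.append((doc_id, overlap))
    # Highest overlap first; ties broken by document id for a stable order
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results


docs = {
    "d1": "Congestive heart failure treatment",
    "d2": "Heart attack symptoms",
    "d3": "Treatment of asthma",
}
print(rank_by_overlap("heart failure treatment", docs))
print(rank_by_overlap("heart failure treatment", docs, require_all=True))
```

Under OR, d2 and d3 are still retrieved with one matching word each; under AND, only d1 survives, which is why AND tends to trade recall for precision.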

Natural language retrieval approach
User enters a free-text query
If indexing applied a stop list or stemming, the same processing must be applied to the query words as well
Content items are scored based on the weights of words common to the query and the content item
– Sums TF*IDF weights for all words that occur in both the query and the content item
– Content items may be “normalized” to account for length
Ranked list is sorted and presented to the user
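The scoring steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not any particular system's implementation: TF is the raw term count, IDF is log(N/df), the stop list and suffix-stripping stemmer are toy stand-ins (the stemmer is not the Porter algorithm), and length normalization uses cosine (Euclidean) length. All names are hypothetical.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "of", "and", "a", "in", "for", "to"}  # hypothetical stop list


def stem(word):
    # Toy suffix stripping for illustration only (not the Porter stemmer)
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def process(text):
    # Apply the same stop list and stemming to documents and queries alike
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


def build_index(docs):
    """Return length-normalized TF*IDF weights for each document."""
    tfs = {doc_id: Counter(process(text)) for doc_id, text in docs.items()}
    df = Counter()
    for tf in tfs.values():
        df.update(tf.keys())
    n = len(docs)
    idf = {term: math.log(n / count) for term, count in df.items()}
    index = {}
    for doc_id, tf in tfs.items():
        weights = {term: count * idf[term] for term, count in tf.items()}
        # Cosine length normalization so longer documents are not favored
        length = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        index[doc_id] = {term: w / length for term, w in weights.items()}
    return index


def search(query, index):
    """Sum the weights of terms shared by the query and each document."""
    query_terms = set(process(query))
    scores = {}
    for doc_id, weights in index.items():
        score = sum(weights[t] for t in query_terms if t in weights)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by document id
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


docs = {
    "d1": "Congestive heart failure",
    "d2": "Heart attacks in the elderly",
    "d3": "Asthma in children",
}
index = build_index(docs)
results = search("heart failures", index)
print(results)
```

Because the query passes through the same `process` function as the documents, "failures" in the query matches "failure" in d1; skipping that step would silently miss the match and rank d1 no higher than d2.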

This approach allows other features
Relevance feedback
– Allows the system to “find me more documents like these ones”
– After the user designates relevant content items (documents), the query is modified:
  – New words from relevant content items are added
  – Query words not in relevant content items are downweighted
– Used in the PubMed Related Articles feature
Query expansion
– Relevance feedback without designation of relevant content items, i.e., top-ranking content items are assumed to be relevant
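One way to express the relevance feedback steps above in code, assuming queries and documents are represented as term-to-weight dictionaries (for example, TF*IDF vectors). Averaging the relevant documents and the `add_weight`/`downweight` factors are hypothetical choices; Rocchio's formula is a well-known, more general formulation of the same idea. Query expansion reuses the same function, treating the top-ranked documents as relevant.

```python
from collections import defaultdict


def relevance_feedback(query, relevant_docs, add_weight=0.5, downweight=0.5):
    """Return a modified query (term -> weight) given relevant documents.

    add_weight and downweight are hypothetical tuning factors.
    """
    if not relevant_docs:
        return dict(query)
    # Centroid: average weight of each term over the relevant documents
    centroid = defaultdict(float)
    for doc in relevant_docs:
        for term, weight in doc.items():
            centroid[term] += weight / len(relevant_docs)
    # Query words that appear in no relevant document are downweighted
    new_query = {t: (w if t in centroid else w * downweight) for t, w in query.items()}
    # New words from the relevant documents are added (existing ones reinforced)
    for term, weight in centroid.items():
        new_query[term] = new_query.get(term, 0.0) + add_weight * weight
    return new_query


def query_expansion(query, ranked_docs, top_k=2, **kwargs):
    """Pseudo-relevance feedback: assume the top-k ranked documents are relevant."""
    return relevance_feedback(query, ranked_docs[:top_k], **kwargs)


query = {"heart": 1.0, "failure": 1.0, "cure": 1.0}
relevant = [{"heart": 0.6, "failure": 0.8}, {"heart": 0.4, "congestive": 1.0}]
print(relevance_feedback(query, relevant))
```

Here `cure` is halved because neither relevant document contains it, while `congestive` enters the query for the first time.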

Web searching
Searching the Web, e.g., Google, Yahoo, Health Finder, etc.
Searching on the Web, e.g., bibliographic databases, textbooks, etc.
The visible Web
The invisible or deep Web

Searching the Web
Web search engines tend to use natural language search, although most allow some Boolean operators, usually:
– + before a word indicates the word must occur (AND), e.g., +congestive
– - before a word indicates the word must not occur (NOT), e.g., -congestive
Most Web search engines use an implicit AND between search terms
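A rough sketch of how a search engine might interpret the + and - prefixes alongside an implicit AND, assuming whitespace-separated query tokens and word-level matching. `parse_query` and `matches` are hypothetical helpers; real engines differ in syntax details.

```python
import re


def parse_query(query):
    """Split a query into required (+), excluded (-), and plain terms."""
    required, excluded, plain = set(), set(), set()
    for token in query.lower().split():
        prefix = token[0] if token[0] in "+-" else ""
        term = re.sub(r"[^a-z0-9]", "", token[len(prefix):])
        if not term:
            continue
        if prefix == "+":
            required.add(term)
        elif prefix == "-":
            excluded.add(term)
        else:
            plain.add(term)
    return required, excluded, plain


def matches(query, text):
    """True if text satisfies the query under an implicit AND."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    required, excluded, plain = parse_query(query)
    if excluded & words:
        return False  # NOT: an excluded word occurs
    # +words and plain words must all occur (implicit AND)
    return (required | plain) <= words


print(matches("heart failure -congestive", "Congestive heart failure"))
print(matches("heart failure -congestive", "Heart failure after infarction"))
```

Because plain terms are already ANDed here, + only makes a difference in systems that would otherwise treat plain terms as optional (an implicit OR).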

Web searching – dominated by the “big three”

Search engine     Searches per month   Share
Google            12.1B                64.4%
Microsoft Bing    3.8B                 20.1%
Yahoo!            2.4B                 12.7%
Ask               0.3B                 1.8%
AOL               0.2B                 1.1%

Data from (March, 2015)
The only change over the last few years has been Microsoft's steady growth, overtaking Yahoo! as the second-largest search engine

Google has other features
AdWords – matching search terms to advertising, but clearly demarcated from regular search results
Image – images on pages retrieved by the query
Scholar – searching of scientific papers (on the Web) (Beel, 2010)
Maps and satellite photos
News – latest news

Why does Google work so well?
PageRank algorithm ranks pages based on the links to them, weighting each link by the rank of the page it comes from (Brin, 1998)
– Even though it has had to be “schooled” over the years (Lohr, 2011)
Default AND between search terms also helps, since the Web is large enough that requiring all terms still retrieves many pages
This approach works well for Web pages but not necessarily for other types of content
Google has many other nifty features, including an API for programmers (Dornfest, 2006)
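PageRank can be computed by power iteration: each page repeatedly passes its current score, split evenly, to the pages it links to. The sketch below assumes the damping factor of 0.85 used in Brin and Page's paper and spreads the score of pages with no outgoing links evenly over all pages, which is one common convention and not necessarily what Google does in production. The link graph is made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank scores by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a baseline share (the "random jump")
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
print(sorted(ranks.items(), key=lambda item: -item[1]))
```

Page c ranks highest because three pages link to it; d, with no incoming links, keeps only the baseline (1 - 0.85)/4 share. The scores always sum to 1.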

Another feature of Google Scholar allows researchers to create profiles

Retrieval on smartphones and other mobile devices
Very popular in clinical settings, with many applications, both proprietary and free, e.g.,
– NLM PubMed4Hh
– NLM BabelMeSH
– Publishers such as Unbound Medicine
Portability and instant-on features are appealing
iOS and Android also allow voice searching
But the small form factor may not be amenable to more complex searching or to viewing large documents, images, etc.

Infobuttons: direct linkage of patient-based information to knowledge
Contexts in the EHR or PHR (e.g., specific diagnoses, test results, etc.) lead to generic queries that can be passed to online resources
The wide variety of content accessible from the Web facilitates this linkage
The leading researcher in this area has been Cimino (1996), who developed the Infobutton Manager to manage context and communications between applications (Cimino, 2006)
Now an HL7 standard and a requirement for EHR certification in the Stage 2 rules for meaningful use (Del Fiol, 2012)
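A hedged sketch of how an EHR context (here, a problem-list diagnosis) might be turned into a generic query URL for an online resource. The base URL is hypothetical. The parameter names follow the pattern of the HL7 Infobutton URL-based implementation guide as we understand it, and the code values are illustrative; check the specification before relying on any of them.

```python
from urllib.parse import urlencode

# Hypothetical knowledge-resource endpoint; a real deployment would use the
# URL published by the resource or by an infobutton manager.
BASE_URL = "https://example.org/infobutton"


def infobutton_url(code, code_system, display_name, task_context):
    """Turn an EHR context into a query URL (parameter names: see lead-in caveat)."""
    params = {
        "mainSearchCriteria.v.c": code,           # concept code from the record
        "mainSearchCriteria.v.cs": code_system,   # OID of the code system
        "mainSearchCriteria.v.dn": display_name,  # human-readable term
        "taskContext.c.c": task_context,          # what the user is doing
    }
    return BASE_URL + "?" + urlencode(params)


url = infobutton_url(
    code="42343007",                       # illustrative SNOMED CT concept code
    code_system="2.16.840.1.113883.6.96",  # SNOMED CT code system OID
    display_name="Congestive heart failure",
    task_context="PROBLISTREV",            # illustrative: problem list review
)
print(url)
```

Passing coded context rather than free text lets the receiving resource match on concepts, while the display name gives resources that only support text search something to work with.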

Retrieval of other “objects”
Image retrieval
– As with indexing, can use semantic or visual queries (Müller, 2004; Müller, 2010)
– Semantic (textual) queries are usually used to find images of structures, processes, diseases, etc.; e.g., Goldminer, Yottalook, VisualDx
– Visual queries are usually used for finding similar images, e.g., “find me more like this” (Grauman, 2010)
Annotated content
– Searching over metadata fields, e.g., learning objects (Hersh, 2006)
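A toy illustration of a visual “find me more like this” query, assuming grayscale images given as flat lists of 0 to 255 intensities and comparing them by intensity histograms with histogram intersection. This is a deliberately simple stand-in; the systems reviewed in the cited work use richer color, texture, shape, or learned features. All names and pixel values are hypothetical.

```python
def histogram(pixels, bins=4):
    """Normalized intensity histogram of a grayscale image (values 0-255)."""
    counts = [0] * bins
    for value in pixels:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = len(pixels) or 1
    return [c / total for c in counts]


def intersection(h1, h2):
    # Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint
    return sum(min(a, b) for a, b in zip(h1, h2))


def more_like_this(query_pixels, collection, bins=4):
    """Rank images in the collection by similarity to the query image."""
    q = histogram(query_pixels, bins)
    scored = [(name, intersection(q, histogram(px, bins))) for name, px in collection.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


collection = {
    "dark_scan": [10, 20, 30, 40, 200, 15],
    "bright_scan": [250, 240, 230, 220, 10, 245],
    "mid_scan": [100, 120, 140, 90, 110, 130],
}
results = more_like_this([12, 25, 35, 50, 210, 18], collection)
print(results)
```

No text is involved: the query is an image, which is what distinguishes visual queries from the semantic (textual) queries used by tools such as Goldminer.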