Indexing 1/2 BDK12-3 Information Retrieval William Hersh, MD Department of Medical Informatics & Clinical Epidemiology Oregon Health & Science University.

Indexing
Assignment of metadata to content to facilitate retrieval
Two major types
  – Human indexing with controlled vocabulary
  – Automated indexing of all words
Also address
  – Indexing other “objects”
  – UMLS Metathesaurus
  – Web indexing

Human indexing
Usually performed by a professional indexer with some background in biomedicine
Follows a protocol to scan the resource and select terms from a controlled vocabulary
Most vocabularies are hierarchical and have specific definitions for when a term is to be assigned

Medical Subject Headings (MeSH) vocabulary (Colletti, 2001)
Over 26,000 terms, with many synonyms for those terms
Hierarchical, based on 16 trees, e.g., Anatomy, Diseases, Chemicals and Drugs
Contains 83 subheadings, which can be used to make a heading more specific, such as Diagnosis or Therapy
MeSH browser allows exploration

A slice of MeSH
[Figure: a slice of the MeSH hierarchy]

MEDLINE indexing
Indexing done by professionals who follow a protocol first devised by Bachrach (1978)
  – Read title, introduction, and conclusion and then scan methods, results, figures, tables, and, lastly, abstract
  – Ignore “key words” of publisher
  – Assign 2-4 headings (with or without subheadings) as central concepts (or major headings) and another 5-10 as minor headings
  – Assign the most specific headings available in the hierarchy
In 1991, added Publication Types
  – e.g., Randomized Controlled Trial, Meta-Analysis, Practice Guideline, Review
Many modern tools have been developed to assist indexing, such as term suggestion and look-up

Other bibliographic indexing
Other NLM databases use MeSH
Some non-NLM resources use MeSH
  – MeSH freely available from NLM
Other non-NLM databases have their own subject headings, e.g.,
  – CINAHL subject headings
  – EMTREE

Other metadata
Indexing covers more than content
Other attributes of documents to index can include
  – Author(s)
  – Source: journal name, issue, pages
  – Publication or resource type
  – Relationship to other information, e.g., gene identifier, grant number, etc.

Automated indexing
Indexing of all words that occur in content items
  – In bibliographic databases, will usually include title, abstract, and often other fields, e.g., author or subject heading
  – In full-text documents, will usually include all text, including title
Often use a stop word list to remove common words (e.g., the, and, which)
Some systems “stem” words to root form (e.g., coughs or coughing to cough)
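The word-indexing steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular retrieval system works: the stop word list is a small made-up sample, `naive_stem` is a toy suffix stripper standing in for a real stemming algorithm, and the three documents are invented.

```python
import re
from collections import defaultdict

# Illustrative stop word list (assumption); real systems use longer lists.
STOP_WORDS = {"the", "and", "which", "a", "of", "is", "in", "or"}


def naive_stem(word):
    """Strip a few common suffixes; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Lowercase, split into words, drop stop words, and stem the rest."""
    words = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(w) for w in words if w not in STOP_WORDS]


def build_inverted_index(docs):
    """Map each indexing term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in index_terms(text):
            index[term].add(doc_id)
    return index


# Invented example documents.
docs = {
    1: "The patient coughs at night",
    2: "Coughing and fever in children",
    3: "Treatment of fever",
}
index = build_inverted_index(docs)
print(sorted(index["cough"]))  # [1, 2]
```

Because both "coughs" and "coughing" reduce to "cough", a search on the stem retrieves documents 1 and 2, while the stop word "the" never enters the index.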

Weighted indexing (Salton, 1991)
Usually used with automated indexing
Gives weight to words that are frequent but discriminating
Most common approach is for weight to equal the product $\mathrm{TF}_{ij} \times \mathrm{IDF}_i$
  – Inverse document frequency of word $i$: $\mathrm{IDF}_i = \log\left(\frac{\text{number of documents}}{\text{number of documents with word } i}\right) + 1$
  – Term frequency of word $i$ in document $j$: $\mathrm{TF}_{ij} = \text{frequency of word } i \text{ in document } j$
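As a rough sketch, the TF*IDF weighting defined above can be computed as follows. The slide does not state the base of the logarithm, so the natural log is assumed here; the function name and the tiny three-document collection are invented for illustration.

```python
import math
from collections import Counter


def tf_idf_weights(docs):
    """Return {doc_id: {term: weight}} with weight = TF * IDF.

    docs maps a document id to its list of indexing terms.
    IDF = log(number of documents / number of documents with the term) + 1,
    using the natural log (an assumption; the base is not given in the slide).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for terms in docs.values():
        doc_freq.update(set(terms))  # count each term once per document

    weights = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        weights[doc_id] = {
            term: count * (math.log(n_docs / doc_freq[term]) + 1)
            for term, count in tf.items()
        }
    return weights


# Invented mini-collection in which every document mentions "aids".
docs = {
    1: ["aids", "retinopathy", "aids"],
    2: ["aids", "therapy"],
    3: ["aids", "vaccine"],
}
weights = tf_idf_weights(docs)
print(weights[1])
```

In document 1, "aids" occurs twice but appears in every document, so its IDF is log(3/3) + 1 = 1 and its weight is 2.0. "retinopathy" occurs once but in only one document, so its weight is log(3) + 1, slightly above 2.0.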

Weighted indexing examples
From a database on AIDS
  – The word AIDS will likely occur in almost every document, while retinopathy will be much more “discriminating”
In a general medical database
  – AIDS will occur much less frequently, so it is a better indexing term

“Visual” indexing
  – e.g., Wordle
[Figure: scientific publications of your instructor (from SciVal app)]

Citation indexing
Other content items that “cite” this one, e.g., references, links, etc.
Indexing is at content item level
Goal is to designate related or important content items
Citation databases list all other articles that cite a specific article in journals
  – e.g., Science Citation Index, SCOPUS, and Google Scholar
Novel feature of Google search engine (Brin, 1998) was giving higher weight to Web pages that have more links to them
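A citation index can be thought of as reference lists turned around: instead of "what does this item cite", it answers "what cites this item". The sketch below builds that reverse mapping and ranks items by how many others point to them. It is a deliberate simplification of link-based weighting, since it only counts inbound citations and does not weight them by the importance of the citing item; the item names and function names are invented.

```python
from collections import defaultdict


def build_citation_index(references):
    """Invert {citing item: [cited items]} into {cited item: {citing items}}."""
    cited_by = defaultdict(set)
    for citing, cited_items in references.items():
        for cited in cited_items:
            cited_by[cited].add(citing)
    return cited_by


def rank_by_inbound(references):
    """Order items by number of inbound citations, ties broken by name."""
    cited_by = build_citation_index(references)
    items = set(references) | set(cited_by)
    return sorted(items, key=lambda item: (-len(cited_by.get(item, ())), item))


# Invented example: each key cites the items in its list.
references = {
    "A": ["C"],
    "B": ["A", "C"],
    "C": [],
    "D": ["C", "A"],
}
cited_by = build_citation_index(references)
print(sorted(cited_by["C"]))        # ['A', 'B', 'D']
print(rank_by_inbound(references))  # ['C', 'A', 'B', 'D']
```

The same structure applies to Web pages if "cites" is read as "links to".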

Limitations of human indexing
Inconsistency
  – When MEDLINE records indexed in duplicate, consistency varies from 63% for central concept headings to 36% for heading-subheading combination (Funk, 1983)
  – Results verified even with modern indexing tools and methods (Marcetich, 2004)
Inadequate indexing vocabulary
  – Up to 25% of all concepts not represented in MeSH (Hersh, 1994)
  – Ambiguities and other naming problems with genes, proteins, etc. (Yandell, 2002; Tuason, 2004)
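To make the idea of indexing consistency concrete, here is one simple way to score agreement between two indexers who indexed the same record: the number of terms both assigned divided by the number of distinct terms either assigned. The slide does not say which measure the cited studies used, so treating consistency as this overlap ratio is an assumption, and the term sets are invented.

```python
def indexing_consistency(terms_a, terms_b):
    """Shared terms divided by all distinct terms used by either indexer."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 1.0  # two empty term sets agree trivially
    return len(a & b) / len(a | b)


# Invented headings assigned to the same article by two indexers.
indexer_1 = {"Hypertension", "Diuretics", "Aged"}
indexer_2 = {"Hypertension", "Aged", "Blood Pressure"}
print(indexing_consistency(indexer_1, indexer_2))  # 0.5
```

Here the indexers agree on two headings out of four distinct ones, giving a consistency of 0.5.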

Limitations of word indexing
Synonymy
  – e.g., cancer/carcinoma
Polysemy
  – e.g., lead
Context
  – e.g., high blood pressure
Focus
  – e.g., central vs. incidental concepts
Granularity
  – e.g., antibiotics vs. specific ones
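The synonymy and polysemy problems are easy to reproduce with a toy word index. In the sketch below, the index contents and the synonym table are invented for illustration; the `expand` option shows one possible mitigation (adding synonyms to the query), which is an assumption of this example rather than something the slide prescribes.

```python
# Invented word index: term -> set of document ids.
word_index = {
    "cancer": {1},
    "carcinoma": {2},
    "lead": {3, 4},  # doc 3 is about the metal, doc 4 about an ECG lead
}

# Hypothetical synonym table used only when expand=True.
SYNONYMS = {"cancer": {"cancer", "carcinoma"}}


def search(index, word, expand=False):
    """Return ids of documents containing the word (or its synonyms)."""
    terms = SYNONYMS.get(word, {word}) if expand else {word}
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return hits


print(search(word_index, "cancer"))               # {1}
print(search(word_index, "cancer", expand=True))  # {1, 2}
print(search(word_index, "lead"))                 # {3, 4}
```

Searching on "cancer" alone misses the document that says "carcinoma" (synonymy), while searching on "lead" returns documents about two unrelated meanings of the word (polysemy).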