Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications.


Semi-Automatic Indexing of Full Text Biomedical Articles Washington D.C. October 25, 2005 Clifford W. Gay Lister Hill National Center for Biomedical Communications Bethesda, Maryland - USA

2 Acknowledgments
- Alan R. Aronson, PhD
- Mehmet Kayaalp, MD, PhD

3 Outline
- Introduction
  - The System: Medical Text Indexer (MTI)
  - The Data: Online biomedical journals
  - The Task: Emulate Medline indexing using full text
- Results
  - Observations on PubMed Central articles
  - Model selection results
  - Recent work

5 Why Semi-Automatic Indexing?
- U.S. National Library of Medicine indexes 5000 journal titles
  - Supports over 60 million PubMed searches each month
  - Has 130 indexers
  - Indexed 570,000 articles in 2004
    - Will need to index 1,000,000 very soon
  - Automated support is helping to meet this demand
    - MTI was used on 26% of articles in 2004
- More about MTI
  - Aronson AR, Mork JG, Gay CW, Humphrey SM, Rogers WJ. The NLM Indexing Initiative's Medical Text Indexer. Medinfo. 2004; 11(Pt 1): PMID:

Medical Text Indexer (MTI)
[Diagram. Labels: Title + Abstract et al.; MetaMap; Phrases; UMLS Concepts; Restrict to MeSH; Trigram Phrase Matching; Phrasex; PubMed Related Citations (Rel. Cits.); Extract MeSH; MeSH Headings; Postprocessing; Ordered list of MeSH Terms]

7 DCMS with MTI Suggestions

9 Why Full Text?
- Medical Text Indexer uses article title and abstract
- However
  - Human indexers are taught not to use the abstract
  - Author's complete intent may not be in the abstract
  - Check tags may only appear in a table or methods section
- If MTI indexes from full text articles it may
  - Find central concepts missing from the abstract
  - Identify terms when the article has no abstract
  - More accurately select check tags
  - Be in better compliance with indexing policy

10 Test Collection Selection
- Available online from PubMed Central
- Consistent XML format
  - Identifies title, abstract, sections, tables, figures, references, etc.
- 500 articles from 17 diverse biomedical journals
- Did not use:
  - References
  - Graphics
  - Math

11 Test Collection
- 5 Clinical journals (165):
  - Breast Cancer Research (11)
  - Journal of Clinical Microbiology (80)
- 3 Organization-based journals (28):
  - Journal of the American Medical Informatics Assoc. (10)
  - Proceedings of the National Academy of Sciences (11)
- 9 Journals in other categories:
  - Pharmacology (65); Biochemistry (65); Plants (46); Molecular Biology (45); Learning (30); Hospitals (22)

13 Indexing Task

Example Article
- Medline Indexing
  - beta-Lactamases/*genetics/*metabolism
  - Enterobacteriaceae/drug effects/*enzymology/genetics
  - Plasmids/*genetics
  - Genes, Bacterial/genetics
  - Genotype
  - Kinetics
  - Microbial Sensitivity Tests
  - Molecular Sequence Data
  - Research Support, Non-U.S. Gov't
- MTI Indexing (legend on the slide marks each term's source: MMI, REL, or MMI & REL)
  - DNA Transposable Elements
  - Escherichia coli
  - Genes, Bacterial
  - Cloning, Molecular
  - Klebsiella pneumoniae
  - Amino Acid Sequence
  - Microbial Sensitivity Tests
  - Cephalothin
  - Proteus mirabilis
  - Erwinia
  - Salmonella typhimurium
  - Enterobacteriaceae Infections
  - Lactams
  - beta-Lactamases
  - Plasmids
  - Enterobacteriaceae
  - beta-Lactam Resistance
  - Conjugation, Genetic
  - Cephalosporin Resistance
  - Cefotaxime
  - Nucleotide Sequences
  - Molecular Sequence Data
  - Cephalosporins
  - Chromosomes, Bacterial
  - DNA, Bacterial
- Recall = 0.67; Precision = 0.24; F2 measure = 0.492

15 Evaluation
- Measure: F2 measure
  - Weighted harmonic mean of Recall and Precision
  - Weights Recall twice as important as Precision
  - Values: 0.0 to 1.0
- Computed for each article and averaged
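As a worked illustration of the F2 measure, the sketch below recomputes the scores quoted for the example article. It assumes that terms are matched on MeSH main headings only (subheadings such as /*genetics are ignored) and that every Medline entry, including the Research Support tag, counts toward recall; both are assumptions that happen to reproduce the quoted figures. The function names are illustrative and are not part of MTI.

```python
def f_beta(recall, precision, beta=2.0):
    """Weighted harmonic mean of recall and precision.

    With beta=2 recall is weighted twice as heavily as precision:
    F2 = 5 * P * R / (4 * P + R).
    """
    if recall == 0.0 and precision == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def score_article(gold, suggested):
    """Return (recall, precision, F2) for one article's term sets."""
    hits = len(gold & suggested)
    recall = hits / len(gold) if gold else 0.0
    precision = hits / len(suggested) if suggested else 0.0
    return recall, precision, f_beta(recall, precision)


# Main headings assigned by Medline indexers to the example article.
medline = {
    "beta-Lactamases", "Enterobacteriaceae", "Plasmids", "Genes, Bacterial",
    "Genotype", "Kinetics", "Microbial Sensitivity Tests",
    "Molecular Sequence Data", "Research Support, Non-U.S. Gov't",
}

# The 25 terms suggested by MTI for the same article.
mti = {
    "DNA Transposable Elements", "Escherichia coli", "Genes, Bacterial",
    "Cloning, Molecular", "Klebsiella pneumoniae", "Amino Acid Sequence",
    "Microbial Sensitivity Tests", "Cephalothin", "Proteus mirabilis",
    "Erwinia", "Salmonella typhimurium", "Enterobacteriaceae Infections",
    "Lactams", "beta-Lactamases", "Plasmids", "Enterobacteriaceae",
    "beta-Lactam Resistance", "Conjugation, Genetic",
    "Cephalosporin Resistance", "Cefotaxime", "Nucleotide Sequences",
    "Molecular Sequence Data", "Cephalosporins", "Chromosomes, Bacterial",
    "DNA, Bacterial",
}

recall, precision, f2 = score_article(medline, mti)
print(round(recall, 2), round(precision, 2), round(f2, 3))  # 0.67 0.24 0.492
```

Six of the nine Medline headings appear among the 25 MTI suggestions, which gives the recall of 6/9 and precision of 6/25 shown above. Averaging the per-article F2 values over the collection gives the collection-level score.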

17 Section Header Classes
- Semantically equivalent section headers
- MATERIALS AND METHODS class:
  - Materials and Method(s)
  - Method(s)
  - Scoring Methods
  - Experimental Procedures
  - Other Methods Tested
- CAPTIONS class:
  - the titles and captions from tables and figures
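Grouping headers into classes can be pictured as a lookup after light normalization. The sketch below is a minimal illustration, not the procedure used in the study: the lookup table, the normalization rules, and the function names are assumptions, and only the MATERIALS AND METHODS entries are taken from the slide. Headers that match nothing fall into OTHER, and sections without a header into NO HEADER.

```python
import re

# Hypothetical lookup table; only these header variants come from the slide.
HEADER_CLASSES = {
    "materials and methods": "MATERIALS AND METHODS",
    "materials and method": "MATERIALS AND METHODS",
    "methods": "MATERIALS AND METHODS",
    "method": "MATERIALS AND METHODS",
    "scoring methods": "MATERIALS AND METHODS",
    "experimental procedures": "MATERIALS AND METHODS",
    "other methods tested": "MATERIALS AND METHODS",
}


def normalize(header):
    """Lowercase, collapse whitespace, and drop trailing '.' or ':'."""
    collapsed = re.sub(r"\s+", " ", header.strip().lower())
    return collapsed.rstrip(".:")


def classify_header(header):
    """Map a raw section header to a section class name."""
    if header is None or not header.strip():
        return "NO HEADER"
    return HEADER_CLASSES.get(normalize(header), "OTHER")


print(classify_header("Experimental Procedures"))    # MATERIALS AND METHODS
print(classify_header("Kinetic and IEF analyses."))  # OTHER
```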

18 Section Class Performance
[Table: Section Class vs. Average F2. Section classes listed, in slide order: CAPTIONS, ABSTRACT, INTRODUCTION, RESULTS, DISCUSSION, NO HEADER, ……, CONCLUSIONS, ABBREVIATIONS]

20 Experiments
- Varied MTI components used
  - MetaMap Indexing (MMI)
  - Related Citations (REL)
- Varied section classes processed
  - Used model selection
  - Used binary weighting for sections
- A model is
  - A selection of section classes and
  - The text in those sections
  - That represents the article
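The definition of a model above can be sketched as a binary weight per section class: a weight of 1 includes that class's text in the article representation and a weight of 0 drops it. The data layout, class names, and placeholder text below are hypothetical and chosen only for illustration.

```python
def model_text(sections, weights):
    """Build the text that represents an article under one model.

    sections: list of (section_class, text) pairs in document order.
    weights:  dict mapping section class to 0 or 1 (binary weighting);
              classes missing from the dict are treated as 0.
    """
    return "\n".join(text for cls, text in sections if weights.get(cls, 0) == 1)


# Hypothetical article with placeholder text for each section.
article = [
    ("TITLE", "title text"),
    ("ABSTRACT", "abstract text"),
    ("INTRODUCTION", "introduction text"),
    ("MATERIALS AND METHODS", "methods text"),
    ("RESULTS", "results text"),
    ("CAPTIONS", "caption text"),
]

# Two candidate models: title and abstract only, and one that adds sections.
baseline = {"TITLE": 1, "ABSTRACT": 1}
augmented = {"TITLE": 1, "ABSTRACT": 1, "RESULTS": 1, "CAPTIONS": 1}

print(model_text(article, baseline))
print(model_text(article, augmented))
```

Model selection then amounts to running the indexer on the text produced by each candidate weight assignment and keeping the assignment with the best average F2.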

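21 Production Baseline
- Text: Title+Abstract
- MTI paths shown: MMI, REL
- F2 = 0.457

22 Naive Mode
- Text: Title+Abstract
- Section classes: All Section Classes (diagram shows Materials and Methods, Results and Discussion, No Header)
- MTI paths shown: MMI, REL
- F2: -0.9% vs. production baseline

23 MetaMap Indexing Mode
- Text: Title+Abstract
- Section classes: Captions, Introduction, Results, Discussion, Other, No Header
- MTI paths shown: MMI, REL
- F2: -18.4% vs. production baseline

24 Augmented Mode
- Text: Title+Abstract
- Section classes: Captions, Introduction, Results, Discussion, Other, No Header
- MTI paths shown: MMI, REL
- F2: +3.9% vs. production baseline

25 Refined Augmented Mode
- Text: Title+Abstract
- Section classes: Captions, Results, Background
- MTI paths shown: MMI, REL
- F2: +6.1% vs. production baseline

26 Full MTI Mode
- Text: Title+Abstract
- Section classes (labeled "MMI model"): Captions, Introduction, Results, Discussion, Other, No Header
- MTI paths shown: MMI, REL
- F2: +6.8% vs. production baseline

27 Refined Full MTI
- Text: Title+Abstract
- Section classes: Captions, Results, Results and Discussion, Conclusions, No Header
- MTI paths shown: MMI, REL
- F2: +7.4% vs. production baseline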

28 MTI Performance Summary
[Table. Columns: Indexing Model; Recall; Precision; Avg. F2. Rows:]
- Production Baseline (Ti, Ab)
- Naive Mode (full text)
- Augmented Mode (MMI + REL (Ti, Ab))
- Augmented Mode (refined)
- Full MTI (MMI + REL common sections)
- Full MTI (refined)

30 Improvement Potential
- With current model
  - Removing the cutoff at 25 terms yields a maximum recall of 0.79
- If all good terms were prioritized correctly
  - F2 = 0.64
  - Improvement over baseline: 7% → 40%
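The 40% figure can be checked as a relative gain of the potential F2 over the production baseline F2 of 0.457. This reading of the percentages as relative (not absolute) changes is an assumption, although it is consistent with the numbers on the slides. The helper name is illustrative.

```python
def relative_gain(new_score, base_score):
    """Relative improvement of new_score over base_score, as a fraction."""
    return (new_score - base_score) / base_score


BASELINE_F2 = 0.457   # production baseline (title + abstract)
POTENTIAL_F2 = 0.64   # F2 if all good terms were prioritized correctly

print(f"{relative_gain(POTENTIAL_F2, BASELINE_F2):.0%}")  # 40%
```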

31 Increase REL Citations
- MTI currently uses 10 Related Citations
- Optimal number for full text articles is 15
- Best model confirmed for this setting
- Additional improvement in F2 = 0.01

32 Summarization
- Selecting important text before MTI processing
- Using Yeh, Ke, Yang, Meng approach
- Combines
  - Latent Semantic Analysis and
  - Salton's Text Relationship Map
- Start with current model
- Document representation includes
  - Bag of words
  - MetaMap identified concepts

NLM Indexing Initiative Clifford W. Gay Lister Hill National Center for Biomedical Communications Bethesda, Maryland - USA

34 NONE Sections
- Most appear in articles that have no abstract
  - 20/23
- Some are errors
  - 4 have "Introduction" header in publisher version
  - 2 appear within other sections with headers
- Many contain the primary text of the article
  - Comments, Editorials, Letters (11/23)

35 Other Sections
- Other section class has 525 sections (16%)
- Non-standard article organization
  - Common in Review articles
- Example
  - β-Lactamases of Kluyvera ascorbata, Probable Progenitors of Some Plasmid-Encoded CTX-M Types
    - Bacterial strains.
    - Antimicrobial agents and susceptibility testing.
    - Kinetic and IEF analyses.
    - Genetic characterization of blaKLUA.
    - Genetic environment of blaKLUA-1.
    - Arguments for mobilization of chromosomal blaKLUA gene.

36 Ranking Function
- Made ranking function for Related Citations more like MetaMap Indexing
- Resulted in a more inclusive model
  - Materials and Methods
  - Introduction
- F2 measure =

37 Tuning Path Weight
- Ratio of weights between the two indexing paths
  - MetaMap Indexing: 7
  - Related Citations: 2
- No improvement possible
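One way to picture a weight ratio between the two indexing paths is a weighted sum of per-term scores followed by a cutoff on the ranked list. The sketch below is only an illustration of that idea: the linear combination, the score dictionaries, and the function name are assumptions and do not reproduce MTI's actual ranking function. The 7:2 ratio and the 25-term limit are the values mentioned on the slides.

```python
MMI_WEIGHT = 7  # MetaMap Indexing path weight
REL_WEIGHT = 2  # Related Citations path weight


def combine_paths(mmi_scores, rel_scores, max_terms=25):
    """Rank terms by a weighted sum of the two path scores (hypothetical).

    mmi_scores, rel_scores: dicts mapping a MeSH term to that path's score;
    a term missing from one path contributes 0 for that path.
    """
    terms = set(mmi_scores) | set(rel_scores)
    combined = {
        term: MMI_WEIGHT * mmi_scores.get(term, 0.0)
        + REL_WEIGHT * rel_scores.get(term, 0.0)
        for term in terms
    }
    # Sort by descending score, breaking ties alphabetically for stability.
    ranked = sorted(combined, key=lambda term: (-combined[term], term))
    return ranked[:max_terms]


# Made-up scores for three terms, for illustration only.
mmi = {"Plasmids": 0.5, "Erwinia": 0.2}
rel = {"Plasmids": 0.5, "Cefotaxime": 0.9}
print(combine_paths(mmi, rel))  # ['Plasmids', 'Cefotaxime', 'Erwinia']
```

Tuning the ratio means re-running the evaluation with different weight pairs and comparing the average F2.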

38 Partial Weight for Singleton Headers
- OTHER section class
  - Header is unique
  - Contains content terms
- Gave section class weight between 0 and 1
  - Some recall improvement
  - No collection-wide improvement in F2