Download presentation
Presentation is loading. Please wait.
Published byDavid Perkins Modified over 9 years ago
1
Semantic indexing in PubMed CERN Workshop on Innovations in Scholarly Communication (OAI8) CERN Workshop on Innovations in Scholarly Communication (OAI8) Geneva, Switzerland June 20, 2013 Olivier Bodenreider Lister Hill National Center for Biomedical Communications Bethesda, Maryland - USA
2
Orientation u NLM is the world's largest biomedical library l Located in Bethesda, Maryland, near Washington, DC u PubMed provides access to MEDLINE, NLM’s bibliographic database of over 20M citations l MEDLINE covers 5600 journals and adds almost 1M new citations each year l PubMed is part of the Entrez system of the National Center for Biotechnology Information (NCBI) 2
3
3 Outline u Anatomy of a MEDLINE citation u Types of PubMed searches l Simple text search l Search based on MeSH indexing u Automatic indexing u Beyond topics
4
Anatomy of a MEDLINE citation 4 Title Abstract Indexing
5
5 MeSH main heading[/subheading(s)] [+ * for major topic]
6
Types of PubMed searches http://www.ncbi.nlm.nih.gov/pubmed
7
Non-semantic search u PubMed does not require the use of MeSH for querying l Supports “Google-like” text searches n “no librarian required” l But can identify MeSH terms even if they are not labeled as such 7
8
Non-semantic search Example u Find articles about the cheese Gruyère l Gruyère 8
9
MeSH (semantic) search u Medical Subject Headings (MeSH) l Controlled vocabulary developed at NLM for indexing and retrieval of MEDLINE citations l ~26,000 descriptors (main headings) l <100 qualifiers (subheadings) l 214,000 supplementary concept records u Hierarchical structure (“tree numbers”) l Supports query expansion (“explosion”) n Search for a descriptor or any of its descendants 9 http://www.nlm.nih.gov/mesh/2013/mesh_browser/MBrowser.html
10
Simple MeSH search Example u Find articles about drug-induced psychoses l "Psychoses, Substance-Induced"[Mesh] 10
11
Search with “Explosion” u By default, PubMed retrieves articles indexed with a descriptor or any of its descendants Use mesh:noexp to prevent “explosion” from happening 11
12
“Explosion” Example u Find articles about fluoroquinolones (or desc.) l "fluoroquinolones"[Mesh] 12
13
Search leveraging synonymy in MeSH u MeSH descriptors include related concepts (Entry terms) l Synonyms l Closely related (and clustered or indexing and retrieval purposes) u All terms from a descriptor and its entry terms are used for retrieval in PubMed 13
14
14
15
Entry terms for “Addison Disease” 15
16
Search leveraging UMLS Synonymy u Unified Medical Language System (UMLS) l Terminology integration system l ~130 biomedical terminologies l Synonymous terms clustered into concepts u UMLS synonymy used in PubMed l Query translation happens “behind the scenes” l E.g., search on “primary adrenocortical insufficiency” n Retrieves articles about “Addison’s disease” 16
17
17
18
No entry term for “Heart attack” 18
19
Query translation 19
20
Subheading restrictions u Subheadings represent the context of use of a particular descriptor l Ciprofloxacin/Adverse effects l Mood Disorders/Chemically induced u Assigned during indexing u Can be queried in PubMed 20
21
Subheading restrictions Example u Find articles about drugs involved in adverse events l "Chemicals and Drugs Category“/adverse effects[MeSH] 21
22
Recapitulative example u Find articles about drugs involved in adverse events and drug-induced manifestations l (("Chemicals and Drugs Category"[Mesh]) AND (adverse effects[sh] OR contraindications[sh] OR mortality[sh])) AND (chemically induced[sh] OR (("Drug-Induced Liver Injury"[Mesh:noexp]) OR ("Drug Eruptions"[Mesh:noexp]) OR ("Epidermal Necrolysis, Toxic"[Mesh]) OR ("Drug-Induced Liver Injury, Chronic"[Mesh]) OR ("Erythema Nodosum"[Mesh]) OR ("Serotonin Syndrome"[Mesh]) OR ("Hand-Foot Syndrome"[Mesh]) OR ("Neuroleptic Malignant Syndrome"[Mesh]) OR ("MPTP Poisoning"[Mesh]) OR ("Dyskinesia, Drug-Induced"[Mesh]) OR ("Neurotoxicity Syndromes"[Mesh:noexp]) OR ("Psychoses, Substance-Induced"[Mesh]) OR ("Akathisia, Drug- Induced"[Mesh]))) AND (medline[sb]) 22
23
Automatic indexing
24
Automatic indexing Motivation u Indexing by humans is costly and has limited reproducibility u Natural language processing can effectively support named entity recognition u Automatic indexing can produce l Suggestions for human indexers l Final indexing for some journals 24
25
Automatic indexing Principles u Hybrid approach l Concepts extracted from title and abstract n Mapped from UMLS to MeSH l MeSH descriptors extracted from related citations u Post-processing l Clustering and ranking l Integrate indexing rules n E.g., “rule of 3” –Index with a higher-level descriptor rather than with 3 or more lower-level descriptors 25
26
u Medical Text Indexer 26 http://ii.nlm.nih.gov/mti.shtml Automatic indexing Workflow
27
Automatic indexing Applications u MEDLINE indexing l Support MEDLINE indexing at NLM n 3600 new citations processed every weeknight n Suggestions displayed in the indexing environment l “First-line” indexing n For 75 journals n MTI recommendations are used as an indexer n Simply reviewed by a senior indexer u Cataloging and History of Medicine l Assisted indexing 27
28
Beyond topics
29
Beyond concepts… relations u Also known as l Facts l Predications l Nano-publications l … u Relation extraction l Usually based on natural language processing (NLP) n E.g., SemRep l Relations stored in (subject, predicate, object) form n With provenance information 29
30
Experimental application Semantic MEDLINE u Multi-document summarization u Based on a database of 60M predications extracted from MEDLINE u Entities normalized to the UMLS Metathesaurus u Relations aligned with the UMLS Semantic Network u Interfaced with PubMed (for retrieving PMIDs) on a given topic l Forms the basis for summarization 30 http://skr3.nlm.nih.gov/SemMedDemo/
31
31
32
32 Relation extraction Applications u Enhanced information retrieval l Indexing on relations in addition to concepts or association main heading/subheading u Multi-document summarization l Extract and visualize the facts extracted from 250 recent abstracts on the treatment of Parkinson’s disease u Question answering l Clinical and biological questions u Knowledge discovery l Connect facts from heterogeneous resources
33
Medical Ontology Research Olivier Bodenreider Lister Hill National Center for Biomedical Communications Bethesda, Maryland - USA Contact:Web:olivier@nlm.nih.govmor.nlm.nih.gov
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.