1 Classification of Semantic Relations in Noun Compounds via a Domain-Specific Lexical Hierarchy Barbara Rosario, Marti Hearst SIMS, UC Berkeley.

Presentation transcript:

1 Classification of Semantic Relations in Noun Compounds via a Domain-Specific Lexical Hierarchy Barbara Rosario, Marti Hearst SIMS, UC Berkeley

2 LINDI Project
Goal: Extract semantics from text
Method: statistical corpus analysis
Focus: Biomedical text
  Rich lexical resources
Semantic NLP problems
  Noun Compounds

3 Noun Compounds (NCs)
Any sequence of nouns that itself functions as a noun
  asthma hospitalizations
  asthma hospitalization rates
  health care personnel
  hand wash
Technical text is rich with NCs
  Open-labeled long-term study of the subcutaneous sumatriptan efficacy and tolerability in acute migraine treatment.

4 NCs: 3 computational tasks (Lauer & Dras ’94)
Identification
Syntactic analysis (attachments)
  [Baseline [headache frequency]]
  [[Tension headache] patient]
Semantic analysis
  Headache treatment: treatment for headache
  Corticosteroid treatment: treatment that uses corticosteroid

5 Outline
Classification schema for NC relations in the biomedical domain
Experiments
  Supervised learning for classification of NC relations
  Examine generalization over lexical items using a lexical hierarchy
Related work
Conclusions

6 NC Semantic relations
38 Relations found by iterative refinement based on 2245 NCs
Goals:
  More specific than case roles
  Allow for domain-specific relations

7 Semantic relations
Frequency/time of: influenza season, headache interval
Measure of: relief rate, asthma mortality, hospital survival
Instrument: aciclovir therapy, laser irradiation, aerosol treatment
“Purpose”: headache drugs, hiv medications, influenza treatment
Defect: hormone deficiency, csf fistulas, gene mutation
Inhibitor: adrenoreceptor blockers, influenza prevention

8 Semantic relations
Cause: asthma hospitalization, aids death
Change: papilloma growth, disease development
Activity/Physical Process: bile delivery, virus reproduction
Person Afflicted: aids patients, headache group
….

9 Multi-class Assignment
Some NCs can be described by more than one semantic relation
  eyelid abnormalities: location and defect
  food allergy: cause and activator
  cell growth: change and activity

10 NC Semantic Relations
Linguistic theories regarding the nature of the relations between constituents in NCs all conflict.
  J. Levi ’78
  P. Downing ’77
  B. Warren ’78

11 Extraction of NCs
1. Titles and abstracts from Medline (medical bibliographic database)
2. Part-of-Speech Tagger
3. Extraction of sequences of units tagged as nouns
4. Collection of 2245 NCs with 2 nouns
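Steps 2 to 4 of the extraction pipeline can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes the text has already been tagged, that nouns carry Penn Treebank style tags (`NN`, `NNS`, `NNP`, `NNPS`), and that only maximal runs of exactly two nouns are kept. The function name `extract_two_noun_compounds` is hypothetical.

```python
from typing import Iterable, List, Tuple

# Assumption: the tagger emits Penn Treebank style noun tags.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def extract_two_noun_compounds(
    tagged: Iterable[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Return every maximal run of exactly two consecutive nouns."""
    compounds: List[Tuple[str, str]] = []
    run: List[str] = []
    # A sentinel non-noun token flushes a run that ends the input.
    for word, tag in list(tagged) + [("", "END")]:
        if tag in NOUN_TAGS:
            run.append(word.lower())
        else:
            if len(run) == 2:
                compounds.append((run[0], run[1]))
            run = []
    return compounds


sentence = [
    ("the", "DT"), ("headache", "NN"), ("treatment", "NN"),
    ("reduced", "VBD"), ("asthma", "NN"), ("hospitalization", "NN"),
    ("rates", "NNS"), ("in", "IN"), ("patients", "NNS"),
]
two_noun_ncs = extract_two_noun_compounds(sentence)
```

Here the three-noun run "asthma hospitalization rates" and the single noun "patients" are skipped, which mirrors the restriction to NCs with 2 nouns.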

12 Models
Lexical (words)
Class-based model using MeSH descriptors

13 MeSH Tree Structures
1. Anatomy [A]
2. Organisms [B]
3. Diseases [C]
4. Chemicals and Drugs [D]
5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E]
6. Psychiatry and Psychology [F]
7. Biological Sciences [G]
8. Physical Sciences [H]
9. Anthropology, Education, Sociology and Social Phenomena [I]
10. Technology and Food and Beverages [J]
11. Humanities [K]
12. Information Science [L]
13. Persons [M]
14. Health Care [N]
15. Geographic Locations [Z]

14 MeSH Tree Structures
1. Anatomy [A]
  Body Regions [A01] +
  Musculoskeletal System [A02]
  Digestive System [A03] +
  Respiratory System [A04] +
  Urogenital System [A05] +
  Endocrine System [A06] +
  Cardiovascular System [A07] +
  Nervous System [A08] +
  Sense Organs [A09] +
  Tissues [A10] +
  Cells [A11] +
  Fluids and Secretions [A12] +
  Animal Structures [A13] +
  Stomatognathic System [A14]
  (…..)
Body Regions [A01]
  Abdomen [A01.047]
    Groin [A ]
    Inguinal Canal [A ]
    Peritoneum [A ] +
    Umbilicus [A ]
  Axilla [A01.133]
  Back [A01.176] +
  Breast [A01.236] +
  Buttocks [A01.258]
  Extremities [A01.378] +
  Head [A01.456] +
  Neck [A01.598]
  (….)

15 Mapping Nouns to MeSH Concepts
headache recurrence: C C
headache pain: C G
breast cancer cells: A C04 A11

16 Levels of Description
headache pain
MeSH 2: C.23 G.11
MeSH 3: C G
MeSH 4: C G
MeSH 5: C G
MeSH 6: C G
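The idea of describing a noun at different levels of the hierarchy can be sketched as truncating its MeSH tree number. This is only an illustration under stated assumptions: level 1 is taken to be the tree letter alone, level 2 the letter plus the first numeric group, and so on. The tree numbers in the example are invented placeholders, not real MeSH entries (the actual numbers are missing from this transcript), and the helper names are hypothetical.

```python
from typing import Sequence, Tuple


def truncate_tree_number(tree_number: str, level: int) -> str:
    """Keep the tree letter plus the first (level - 1) numeric groups."""
    if level < 1:
        raise ValueError("level must be at least 1")
    letter, rest = tree_number[0], tree_number[1:]
    if level == 1:
        return letter
    groups = rest.split(".")
    # Slicing past the end simply returns the full tree number.
    return letter + ".".join(groups[: level - 1])


def describe_compound(tree_numbers: Sequence[str], level: int) -> Tuple[str, ...]:
    """Describe each noun of a compound at the same hierarchy level."""
    return tuple(truncate_tree_number(t, level) for t in tree_numbers)


# Placeholder tree numbers for a two-noun compound (NOT real MeSH codes).
example_nc = ("C23.100.200.300", "G11.400.500.600")
level_2 = describe_compound(example_nc, 2)
level_3 = describe_compound(example_nc, 3)
```

A shallow level groups many nouns under one class, while a deep level approaches the purely lexical model.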

17 Classification Task & Method
Multi-class (18) classification problem
Multi-layer neural networks to classify across all relations simultaneously
Evaluation: distinguish between
  Seen: NCs where 1 or 2 words appeared in the training set
  Unseen: NCs in which neither word appeared in the training set
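The seen/unseen distinction used for evaluation can be sketched as a simple partition of the test set. This is an illustrative reading of the slide's definitions, assuming each NC is a pair of lowercase words; the function name `split_seen_unseen` and the tiny data set are hypothetical.

```python
from typing import List, Sequence, Tuple

NounCompound = Tuple[str, str]


def split_seen_unseen(
    train: Sequence[NounCompound], test: Sequence[NounCompound]
) -> Tuple[List[NounCompound], List[NounCompound]]:
    """Partition test NCs by whether any of their words occurs in training."""
    vocabulary = {word for nc in train for word in nc}
    seen: List[NounCompound] = []
    unseen: List[NounCompound] = []
    for nc in test:
        if any(word in vocabulary for word in nc):
            seen.append(nc)      # 1 or 2 words appeared in training
        else:
            unseen.append(nc)    # neither word appeared in training
    return seen, unseen


train_ncs = [("headache", "treatment"), ("asthma", "mortality")]
test_ncs = [("headache", "drugs"), ("influenza", "season"), ("asthma", "therapy")]
seen_ncs, unseen_ncs = split_seen_unseen(train_ncs, test_ncs)
```

Reporting accuracy separately on the unseen portion shows how far a model generalizes beyond the words it was trained on.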

18 Accuracy for 18-way Classification
Training: 855 NCs (50%)
Testing: 805 NCs (75 unseen)
Correct answer ranked first (61%-62%)
Correct answer in first two (71%-73%)
Correct answer in first three (76%-78%)
Logistic Regression (31%)
Guessing (1/18 ≈ 5.6%)
[Chart: accuracy of the Lexical and MeSH models]
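The "correct answer in first k" figures correspond to what is usually called top-k accuracy. A minimal sketch, assuming the classifier returns the relations for each NC ranked from most to least likely; the toy rankings and labels below are made up for illustration and are not results from the talk.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]], gold: Sequence[str], k: int
) -> float:
    """Fraction of items whose gold label is among the k top-ranked labels."""
    if len(ranked_predictions) != len(gold) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1 for ranked, label in zip(ranked_predictions, gold) if label in ranked[:k]
    )
    return hits / len(gold)


# Toy rankings over a few relation names (illustrative only).
rankings = [
    ["Purpose", "Instrument", "Cause"],
    ["Cause", "Purpose", "Defect"],
    ["Defect", "Change", "Cause"],
]
gold_labels = ["Purpose", "Purpose", "Cause"]

top1 = top_k_accuracy(rankings, gold_labels, 1)
top2 = top_k_accuracy(rankings, gold_labels, 2)
top3 = top_k_accuracy(rankings, gold_labels, 3)

# Chance level for uniform guessing over 18 relations.
chance = 1 / 18
```

With 18 classes, uniform guessing gives 1/18, or about 5.6%, which is the baseline the reported accuracies should be compared against.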

19 Accuracies for 18-way classification: generalization on unseen NCs
Training: 73 NCs (5%)
Testing: 1587 NCs (810 unseen) (95%)
[Chart legend: MeSH, Lexical, MeSH on unseen, Lexical on unseen]

20 Accuracy for each relation

21 Accuracy for sample relations
Frequency/time of
Test Set: disease recurrence, headache recurrence, enterovirus season, influenza season, mosquito season, pollen season, disease stage, transcription stage, drive time, injection time, ischemia time, travel time

22 Accuracy for sample relations
Produces (genetic)
Ex. Test Set: thymidine allele, tumor dna, csf mrna, acetylase gene, virion rna (…)

23 Accuracy for sample relations
Purpose
Test Set: varicella vaccine, influenza vaccination, influenza immunization, abscess drainage, disease treatment, asthma therapy
Training Set:
  Instrument: antigen vaccine
  Object: vaccine development
  Subtype-of: opv vaccine

24 Related work (Noun Compound Relations)
Finin (1980)
  Detailed AI analysis, hand-coded
Rindflesch et al. (2000)
  Hand-coded rule base to extract certain types of assertions

25 Related work (Noun Compound Relations)
Vanderwende (1994)
  automatically extracts semantic information from an on-line dictionary
  manipulates a set of handwritten rules
  13 classes
  52% accuracy
Lapata (2000)
  classifies nominalizations into subject/object
  binary distinction
  80% accuracy
Lauer (1995)
  probabilistic model
  8 classes
  47% accuracy

26 Related work (Lexical Hierarchies)
Prepositional Phrase Attachment
  Attachment, not semantics
  Binary choice
Approaches
  Word occurrences (Hindle & Rooth ’93)
  Using a lexical hierarchy
    Conceptual association using a lexical hierarchy (Resnik ’93, Resnik & Hearst ’93)
    Transformation-based incorporating counts from a lexical hierarchy (Brill & Resnik ’94)
    MDL to find optimal tree cut (Li & Abe ’98)
    finds improvements over lexical

27 Conclusions
A simple method for assigning semantic relations to noun compounds
  Does not require complex hand-coded rules
  Does make use of existing lexical resources
  Off-the-shelf ML algorithms
High accuracy levels for an 18-way class assignment
  ~60% accuracy on mixed seen and unseen words
  ~40% accuracy on entirely unseen words on a tiny training set (73 NCs)

28 Future work
Analysis of erroneous cases
Other statistical models
Bootstrapping & Active learning for labeling
NCs with > 2 terms
  [[growth hormone] deficiency] (purpose + defect)
Other syntactic structures
Non-biomedical words
Other ontologies (e.g., WordNet)?

29 Relations

30

31 Accuracies by Unseen Noun
Training: 73 NCs (5%)
Testing: 1587 NCs (810 unseen) (95%)
Case 1: first N unseen (424)
Case 2: second N unseen (252)
Case 3: both N seen (810)
Case 4: neither N seen (810)

32 Using Relations
Eventual plan: combine relations with constituents’ ontology memberships
Examples
  Instrument_2(biopsy, needle) -> Instrument_2(Diagnostic, Tool)
  Procedure(brain, biopsy) -> Procedure(Anatomical-Element, Diagnostic)
  Procedure(tumor, marker) -> Procedure(Disease-element, Indicator)
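The generalization step in these examples can be sketched as replacing each constituent with its ontology class. The word-to-class table below is read directly off the three examples on the slide; the dictionary, the function name `generalize`, and the tuple representation are hypothetical, since the slide describes this only as an eventual plan.

```python
from typing import Dict, Tuple

# Word-to-class memberships taken from the slide's three examples.
ONTOLOGY_CLASS: Dict[str, str] = {
    "biopsy": "Diagnostic",
    "needle": "Tool",
    "brain": "Anatomical-Element",
    "tumor": "Disease-element",
    "marker": "Indicator",
}


def generalize(relation: str, first: str, second: str) -> Tuple[str, str, str]:
    """Lift a relation over two words to a relation over their classes.

    Raises KeyError if a word has no known ontology membership.
    """
    return (relation, ONTOLOGY_CLASS[first], ONTOLOGY_CLASS[second])


instrument = generalize("Instrument_2", "biopsy", "needle")
procedure_a = generalize("Procedure", "brain", "biopsy")
procedure_b = generalize("Procedure", "tumor", "marker")
```

Raising on unknown words is a deliberate choice for the sketch: silently guessing a class would hide gaps in the ontology coverage.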

33 Levels of Description
headache pain ( C G )
Only Tree: C G
  C (Diseases)
  G (Biological Sciences)
Level 1: C 23 G 11
  C 23 (Diseases: Pathological Conditions)
  G 11 (Biological Sciences: Musculoskeletal, Neural, and Ocular Physiology)
Level 2: C G
  C (Diseases: Pathological Conditions: Signs and symptoms)
  G (Biological Sciences: Musculoskeletal, Neural, and Ocular Physiology: Nervous System Physiology)
Level 3: C G
  C (Diseases: Pathological Conditions: Signs and symptoms: Neurologic Manifestations)
  G (Biological Sciences: Musculoskeletal, Neural, and Ocular Physiology: Nervous System Physiology: Sensation)