1 Classification of Semantic Relations in Noun Compounds via a Domain-Specific Lexical Hierarchy Barbara Rosario, Marti Hearst SIMS, UC Berkeley
2 LINDI Project Goal: Extract semantics from text Method: statistical corpus analysis Focus: Biomedical text Rich lexical resources Semantic NLP problems Noun Compounds
3 Noun Compounds (NCs): any sequence of nouns that itself functions as a noun, e.g. asthma hospitalizations; asthma hospitalization rates; health care personnel hand wash. Technical text is rich with NCs, e.g.: Open-labeled long-term study of the subcutaneous sumatriptan efficacy and tolerability in acute migraine treatment.
4 NCs: 3 computational tasks (Lauer & Dras ’94). Identification. Syntactic analysis (attachments): [Baseline [headache frequency]]; [[Tension headache] patient]. Semantic analysis: headache treatment -> treatment for headache; corticosteroid treatment -> treatment that uses corticosteroid
5 Outline Classification schema for NC relations in the biomedical domain Experiments Supervised learning for classification of NC relations Examine generalization over lexical items using a lexical hierarchy Related work Conclusions
6 NC Semantic relations 38 Relations found by iterative refinement based on 2245 NCs Goals: More specific than case roles Allow for domain-specific relations
7 Semantic relations Frequency/time of influenza season, headache interval Measure of relief rate, asthma mortality, hospital survival Instrument aciclovir therapy, laser irradiation, aerosol treatment “Purpose” headache drugs, hiv medications, influenza treatment Defect hormone deficiency, csf fistulas, gene mutation Inhibitor Adrenoreceptor blockers, influenza prevention
8 Semantic relations Cause Asthma hospitalization, aids death Change Papilloma growth, disease development Activity/Physical Process Bile delivery, virus reproduction Person Afflicted Aids patients, headache group ….
9 Multi-class Assignment Some NCs can be described by more than one semantic relation: eyelid abnormalities: location and defect; food allergy: cause and activator; cell growth: change and activity
10 NC Semantic Relations Linguistic theories regarding the nature of the relations between constituents in NCs all conflict. J. Levi ‘78 P. Downing ’77 B. Warren ‘78
11 Extraction of NCs 1. Titles and abstracts from Medline (medical bibliographic database) 2. Part-of-Speech Tagger 3. Extraction of sequences of units tagged as nouns 4. Collection of 2245 NCs with 2 nouns
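Steps 3 and 4 can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the tagger emits Penn Treebank-style tags (noun tags starting with "NN"), and the function names and the sample sentence are hypothetical.

```python
from typing import List, Tuple

TaggedToken = Tuple[str, str]  # (word, part-of-speech tag)


def is_noun(tag: str) -> bool:
    # Assumption: Penn Treebank-style tags, where every noun tag starts with "NN".
    return tag.startswith("NN")


def noun_sequences(tagged: List[TaggedToken]) -> List[List[str]]:
    """Return maximal runs of two or more consecutive tokens tagged as nouns."""
    runs: List[List[str]] = []
    current: List[str] = []
    for word, tag in tagged:
        if is_noun(tag):
            current.append(word.lower())
        else:
            if len(current) > 1:
                runs.append(current)
            current = []
    if len(current) > 1:  # flush a run that ends the sentence
        runs.append(current)
    return runs


def two_noun_compounds(tagged: List[TaggedToken]) -> List[Tuple[str, ...]]:
    """Keep only the runs made of exactly two nouns."""
    return [tuple(run) for run in noun_sequences(tagged) if len(run) == 2]


# Made-up tagged sentence, for illustration only.
example = [
    ("Asthma", "NN"), ("hospitalizations", "NNS"), ("rose", "VBD"),
    ("among", "IN"), ("health", "NN"), ("care", "NN"),
    ("personnel", "NNS"), (".", "."),
]
compounds = two_noun_compounds(example)
print(compounds)
```

The three-noun run "health care personnel" is found by `noun_sequences` but dropped by `two_noun_compounds`, matching the restriction to NCs with 2 nouns.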
12 Models Lexical (words) Class based model using MeSH descriptors
13 MeSH Tree Structures 1. Anatomy [A] 2. Organisms [B] 3. Diseases [C] 4. Chemicals and Drugs [D] 5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E] 6. Psychiatry and Psychology [F] 7. Biological Sciences [G] 8. Physical Sciences [H] 9. Anthropology, Education, Sociology and Social Phenomena [I] 10. Technology and Food and Beverages [J] 11. Humanities [K] 12. Information Science [L] 13. Persons [M] 14. Health Care [N] 15. Geographic Locations [Z]
14 MeSH Tree Structures 1. Anatomy [A] Body Regions [A01] + Musculoskeletal System [A02] Digestive System [A03] + Respiratory System [A04] + Urogenital System [A05] + Endocrine System [A06] + Cardiovascular System [A07] + Nervous System [A08] + Sense Organs [A09] + Tissues [A10] + Cells [A11] + Fluids and Secretions [A12] + Animal Structures [A13] + Stomatognathic System [A14] (…..) Body Regions [A01] Abdomen [A01.047] Groin [A01.047.365] Inguinal Canal [A01.047.412] Peritoneum [A01.047.596] + Umbilicus [A01.047.849] Axilla [A01.133] Back [A01.176] + Breast [A01.236] + Buttocks [A01.258] Extremities [A01.378] + Head [A01.456] + Neck [A01.598] (….)
15 Mapping Nouns to MeSH Concepts headache recurrence C23.888.592.612.441 C23.550.291.937 headache pain C23.888.592.612.441 G11.561.796.444 breast cancer cells A01.236 C04 A11
16 Levels of Description: headache pain
MeSH 2: C23 G11
MeSH 3: C23.888 G11.561
MeSH 4: C23.888.592 G11.561.796
MeSH 5: C23.888.592.612 G11.561.796.444
MeSH 6: C23.888.592.612.441 G11.561.796.444
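A level of description can be read as a truncation of the MeSH tree number. The sketch below assumes that "MeSH k" keeps the tree letter plus the first k-1 numeric components, and that a descriptor shorter than k is left whole; `truncate_descriptor` is a hypothetical helper, not part of any MeSH tooling.

```python
def truncate_descriptor(descriptor: str, level: int) -> str:
    """Truncate a MeSH tree number such as "C23.888.592.612.441".

    Assumption: the tree letter counts as level 1, so level 2 gives "C23",
    level 3 gives "C23.888", and so on.
    """
    if level < 1:
        raise ValueError("level must be at least 1")
    parts = descriptor.split(".")
    # Split the leading "C23" into the tree letter and its first number.
    components = [parts[0][0], parts[0][1:]] + parts[1:]
    kept = components[:level]
    if len(kept) == 1:
        return kept[0]
    return ".".join([kept[0] + kept[1]] + kept[2:])


headache = "C23.888.592.612.441"
pain = "G11.561.796.444"
for level in (2, 3, 4):
    print(level, truncate_descriptor(headache, level), truncate_descriptor(pain, level))
```

Shallower levels merge more nouns into the same class, which is what lets a class-based model generalize to words it has not seen.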
17 Classification Task & Method Multi-class (18) classification problem Multi layer Neural Networks to classify across all relations simultaneously. Evaluation: distinguish between Seen: NCs where 1 or 2 words appeared in the training set Unseen: NCs in which neither word appeared in the training set
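The seen/unseen distinction can be made concrete with a short sketch. It assumes each NC is a pair of lowercase words; `split_seen_unseen` and the tiny train/test lists (built from NCs quoted elsewhere in these slides) are illustrative only.

```python
from typing import List, Tuple

NC = Tuple[str, str]


def split_seen_unseen(train: List[NC], test: List[NC]) -> Tuple[List[NC], List[NC]]:
    """Split test NCs into seen (1 or 2 words in training) and unseen (neither)."""
    vocab = {word for nc in train for word in nc}
    seen = [nc for nc in test if any(word in vocab for word in nc)]
    unseen = [nc for nc in test if not any(word in vocab for word in nc)]
    return seen, unseen


train = [("headache", "recurrence"), ("influenza", "season")]
test = [("headache", "interval"), ("pollen", "season"), ("bile", "delivery")]
seen, unseen = split_seen_unseen(train, test)
print(seen)
print(unseen)
```

Here "bile delivery" is the only unseen NC, since neither of its words occurs in the training pairs.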
18 Accuracy for 18-way Classification. Training: 855 NCs (50%); Testing: 805 NCs (75 unseen). [Chart: accuracy of the Lexical and MeSH models] Correct answer ranked first: 61%-62%; in first two: 71%-73%; in first three: 76%-78%. Logistic Regression: 31%. Guessing: 1/18 ≈ 5.6%
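The "ranked first / first two / first three" figures are top-k accuracies. A minimal sketch of that metric follows, assuming the classifier outputs one score per relation for each NC; the scores and gold labels are made-up toy values with 4 classes rather than 18.

```python
from typing import List, Sequence


def top_k_accuracy(scores: List[Sequence[float]], gold: List[int], k: int) -> float:
    """Fraction of items whose gold class is among the k highest-scoring classes."""
    hits = 0
    for row, answer in zip(scores, gold):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if answer in ranked[:k]:
            hits += 1
    return hits / len(gold)


# Toy scores (rows: NCs, columns: relation classes); not results from the slides.
scores = [
    [0.1, 0.6, 0.2, 0.1],
    [0.5, 0.3, 0.1, 0.1],
    [0.2, 0.1, 0.3, 0.4],
]
gold = [1, 1, 0]
for k in (1, 2, 3):
    print(k, top_k_accuracy(scores, gold, k))

# Chance level when guessing uniformly among 18 relations.
chance = 1 / 18
```

Top-k accuracy can only grow with k, which matches the ordering of the three reported ranges.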
19 Accuracies for 18-way classification: generalization on unseen NCs. Training: 73 NCs (5%); Testing: 1587 NCs (95%; 810 unseen). [Chart series: MeSH; Lexical; MeSH on unseen; Lexical on unseen]
20 Accuracy for each relation
21 Accuracy for sample relations Frequency/time of Test Set: disease recurrence headache recurrence enterovirus season influenza season mosquito season pollen season disease stage transcription stage drive time injection time ischemia time travel time
22 Accuracy for sample relations Produces (genetic) Ex. Test Set: thymidine allele tumor dna csf mrna acetylase gene virion rna (…)
23 Accuracy for sample relations: Purpose. Test Set: varicella vaccine, influenza vaccination, influenza immunization, abscess drainage, disease treatment, asthma therapy. Training Set: Instrument: antigen vaccine; Object: vaccine development; Subtype-of: opv vaccine
24 Related work (Noun Compound Relations) Finin (1980) Detailed AI analysis, hand-coded Rindflesch et al. (2000) Hand-coded rule base to extract certain types of assertions
25 Related work (Noun Compound Relations) Vanderwende (1994): automatically extracts semantic information from an on-line dictionary; manipulates a set of handwritten rules; 13 classes; 52% accuracy. Lapata (2000): classifies nominalizations into a subject/object binary distinction; 80% accuracy. Lauer (1995): probabilistic model; 8 classes; 47% accuracy
26 Related work (Lexical Hierarchies) Prepositional Phrase Attachment Attachment, not semantics Binary choice Approaches Word occurrences (Hindle & Rooth ’93) Using a lexical hierarchy Conceptual association using a lexical hierarchy (Resnik ’93, Resnik & Hearst ’93) Transformation-based incorporating counts from a lexical hierarchy (Brill & Resnik ’94) MDL to find optimal tree cut (Li & Abe ’98) finds improvements over lexical
27 Conclusions A simple method for assigning semantic relations to noun compounds Does not require complex hand-coded rules Does make use of existing lexical resources Off-the-shelf ML algorithms High accuracy levels for an 18-way class assignment ~60% accuracy on mixed seen and unseen words ~40% accuracy on entirely unseen words on a tiny training set (73 NCs)
28 Future work Analysis of erroneous cases Other statistical models Bootstrapping & Active learning for labeling NCs with > 2 terms [[growth hormone] deficiency] (purpose + defect) Other syntactic structures Non-biomedical words Other ontologies (e.g., WordNet)?
29 Relations
31 Accuracies by Unseen Noun. Training: 73 NCs (5%); Testing: 1587 NCs (95%; 810 unseen). Case 1: first N unseen (424); Case 2: second N unseen (252); Case 3: both N seen (810); Case 4: neither N seen (810)
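The four cases can be assigned mechanically from the training vocabulary. The sketch below assumes Case 1 means only the first noun is unseen and Case 2 means only the second noun is unseen; `unseen_case` and the sample vocabulary are hypothetical.

```python
from typing import Set, Tuple


def unseen_case(nc: Tuple[str, str], vocab: Set[str]) -> int:
    """Return the case number (1-4) for a two-noun compound.

    Assumption: cases 1 and 2 cover NCs where exactly one noun is unseen.
    """
    first_seen = nc[0] in vocab
    second_seen = nc[1] in vocab
    if first_seen and second_seen:
        return 3  # both N seen
    if not first_seen and not second_seen:
        return 4  # neither N seen
    if not first_seen:
        return 1  # first N unseen
    return 2      # second N unseen


vocab = {"headache", "season"}  # words observed in a toy training set
for nc in [("pollen", "season"), ("headache", "interval"),
           ("headache", "season"), ("bile", "delivery")]:
    print(nc, unseen_case(nc, vocab))
```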
32 Using Relations Eventual plan: combine relations with constituents’ ontology memberships Examples Instrument_2 (biopsy,needle) -> Instrument_2(Diagnostic, Tool) Procedure(brain,biopsy) -> Procedure(Anatomical-Element, Diagnostic) Procedure(tumor, marker) -> Procedure(Disease-element, Indicator)
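One way to picture the plan is to replace each constituent with its ontology category while keeping the relation label. This is only a sketch: the `CATEGORY` table is hand-written from the three examples on the slide, and a real system would look the categories up in an ontology rather than a hard-coded dictionary.

```python
from typing import Dict, Tuple

# Hypothetical lookup table, copied from the slide's examples only.
CATEGORY: Dict[str, str] = {
    "biopsy": "Diagnostic",
    "needle": "Tool",
    "brain": "Anatomical-Element",
    "tumor": "Disease-element",
    "marker": "Indicator",
}


def generalize(relation: str, first: str, second: str,
               categories: Dict[str, str] = CATEGORY) -> Tuple[str, str, str]:
    """Map relation(word1, word2) to relation(category1, category2)."""
    return (relation, categories[first], categories[second])


print(generalize("Instrument_2", "biopsy", "needle"))
print(generalize("Procedure", "brain", "biopsy"))
print(generalize("Procedure", "tumor", "marker"))
```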
33 Levels of Description: headache pain (C23.888.592.612.441 G11.561.796.444)
Only Tree: C G
  C (Diseases)
  G (Biological Sciences)
Level 1: C23 G11
  C23 (Diseases: Pathological Conditions)
  G11 (Biological Sciences: Musculoskeletal, Neural, and Ocular Physiology)
Level 2: C23.888 G11.561
  C23.888 (Diseases: Pathological Conditions: Signs and symptoms)
  G11.561 (Biological Sciences: Musculoskeletal, Neural, and Ocular Physiology: Nervous System Physiology)
Level 3: C23.888.592 G11.561.796
  C23.888.592 (Diseases: Pathological Conditions: Signs and symptoms: Neurologic Manifestations)
  G11.561.796 (Biological Sciences: Musculoskeletal, Neural, and Ocular Physiology: Nervous System Physiology: Sensation)