The Descent of Hierarchy, and Selection in Relational Semantics*

The Descent of Hierarchy, and Selection in Relational Semantics*
Barbara Rosario, Marti Hearst, Charles Fillmore
UC Berkeley
*with apologies to Charles Darwin

Noun Compounds (NCs)
Technical text is rich with NCs, e.g.: "Open-labeled long-term study of the subcutaneous sumatriptan efficacy and tolerability in acute migraine treatment."
An NC is any sequence of nouns that itself functions as a noun: asthma hospitalizations, health care personnel, hand wash.

NCs: 3 computational tasks
1. Identification
2. Syntactic analysis (attachments): [Baseline [headache frequency]], [[Tension headache] patient]
3. Our goal: semantic analysis
   Headache treatment → treatment for headache
   Corticosteroid treatment → treatment that uses corticosteroid

Descent of Hierarchy
Idea: use the top levels of a lexical hierarchy to identify semantic relations.
Hypothesis: a particular semantic relation holds between all 2-word NCs that can be categorized by a lexical category pair.

Outline
- Related work
- Linguistic motivation
- Lexical hierarchy: MeSH
- Labeling NC relations
- Method and results
- Discussion of ambiguity

Related work (Semantic analysis of NCs)
Rule-based:
- Finin (1980): detailed AI analysis, hand-coded
- Vanderwende (1994): automatically extracts semantic information from an online dictionary and manipulates a set of handwritten rules; 13 classes, 52% accuracy
Probabilistic:
- Lauer (1995): probabilistic model; 8 classes, 47% accuracy
- Lapata (2000): classifies nominalizations into subject/object; 2 classes, 80% accuracy

Related work (Semantic analysis of NCs, cont.)
Lexical hierarchy:
- Barrett et al. (2001): WordNet; heuristics to classify an NC by its similarity to a known NC
- Rosario and Hearst (2001): MeSH, neural network; 18 classes, 60% accuracy; relations pre-defined

Linguistic Motivation
Semantics of NCs: the head-modifier relationship
- The head noun has argument structure
- The meaning of the head noun determines what kinds of things can be done to it, what it is made of, what it is a part of, …

Linguistic Motivation (cont.)
- Material + Cutlery → made of: steel knife, plastic fork, wooden spoon
- Food + Cutlery → used on: meat knife, dessert spoon, salad fork
- Profession + Cutlery → used by: chef's knife, butcher's knife

The lexical Hierarchy: MeSH Tree Structures
1. Anatomy [A]
2. Organisms [B]
3. Diseases [C]
4. Chemicals and Drugs [D]
5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E]
6. Psychiatry and Psychology [F]
7. Biological Sciences [G]
8. Physical Sciences [H]
9. Anthropology, Education, Sociology and Social Phenomena [I]
10. Technology and Food and Beverages [J]
11. Humanities [K]
12. Information Science [L]
13. Persons [M]
14. Health Care [N]
15. Geographic Locations [Z]

Descending the Hierarchy
1. Anatomy [A]
   Body Regions [A01]
      Abdomen [A01.047]
      Back [A01.176]
      Breast [A01.236]
      Extremities [A01.378]
      Head [A01.456]
      Neck [A01.598]
      ….
   Musculoskeletal System [A02]
   Digestive System [A03]
   Respiratory System [A04]
   Urogenital System [A05]
   ……
2. [B] … 7. [G]
8. Physical Sciences [H]
   Electronics
      Amplifiers
      Electronics, Medical
      Transducers
   Astronomy
   Nature
   Time
   Weights and Measures
      Calibration
      Metric System
      Reference Standard
   ….
9. [I] … 13. [M]
Anatomy subtree: homogeneous. Physical Sciences subtree: heterogeneous.

Mapping Nouns to MeSH Concepts
headache recurrence → C23.888.592.612.441 C23.550.291.937
headache pain → C23.888.592.612.441 G11.561.796.444
breast cancer cells → A01.236 C04 A11

Levels of Description
headache pain
Level 0: C23 G11
…
Original: C23.888.592.612.441 G11.561.796.444
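The level-k description of a noun can be read off its MeSH tree number by keeping the first k + 1 dot-separated segments, so level 0 of headache's C23.888.592.612.441 is C23. A minimal Python sketch, assuming that reading of "level"; the function name `truncate` is hypothetical:

```python
def truncate(tree_number: str, level: int) -> str:
    """Return the ancestor of a MeSH tree number at the given level.

    Level 0 keeps only the first dot-separated segment (e.g. "C23");
    each additional level keeps one more segment. If the tree number is
    shallower than the requested level, it is returned unchanged.
    """
    if level < 0:
        raise ValueError("level must be >= 0")
    return ".".join(tree_number.split(".")[: level + 1])


# Tree numbers of "headache" and "pain" from the slide.
headache = "C23.888.592.612.441"
pain = "G11.561.796.444"

for level in range(3):
    print(f"Level {level}: {truncate(headache, level)} {truncate(pain, level)}")
# Level 0: C23 G11
# Level 1: C23.888 G11.561
# Level 2: C23.888.592 G11.561.796
```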

Descent of Hierarchy
Idea: words falling in homogeneous MeSH subhierarchies behave “similarly” with respect to relation assignment.
Hypothesis: a particular semantic relation holds between all 2-word NCs that can be categorized by a MeSH category pair.

Grouping the NCs (CP = category pair)
CP: A02 C04 (Musculoskeletal System, Neoplasms): skull tumors, bone cysts, bone metastases, skull osteosarcoma, …
CP: C04 M01 (Neoplasms, Person): leukemia survivor, lymphoma patients, cancer physician, cancer nurses, …
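Grouping amounts to mapping both nouns of an NC to their level-0 MeSH codes and bucketing by the resulting pair. A minimal sketch; the lexicon below is an illustrative assumption that assigns each noun only the level-0 code implied by the slide's groups, and `group_by_cp` is a hypothetical name:

```python
from collections import defaultdict

# Illustrative lexicon (assumption): level-0 MeSH codes for the slide's nouns,
# inferred from the category pairs shown; deeper tree numbers are omitted.
LEXICON = {
    "skull": "A02", "bone": "A02",
    "tumors": "C04", "cysts": "C04", "metastases": "C04", "osteosarcoma": "C04",
    "leukemia": "C04", "lymphoma": "C04", "cancer": "C04",
    "survivor": "M01", "patients": "M01", "physician": "M01", "nurses": "M01",
}


def level0(code: str) -> str:
    """First dot-separated segment of a tree number, e.g. 'C04.557' -> 'C04'."""
    return code.split(".")[0]


def group_by_cp(compounds, lexicon=LEXICON):
    """Group two-word NCs by their (modifier, head) category pair at level 0.

    Compounds containing a noun missing from the lexicon are skipped.
    """
    groups = defaultdict(list)
    for modifier, head in compounds:
        if modifier in lexicon and head in lexicon:
            cp = (level0(lexicon[modifier]), level0(lexicon[head]))
            groups[cp].append(f"{modifier} {head}")
    return dict(groups)


ncs = [("skull", "tumors"), ("leukemia", "survivor"), ("bone", "cysts"),
       ("lymphoma", "patients"), ("bone", "metastases"), ("cancer", "physician"),
       ("skull", "osteosarcoma"), ("cancer", "nurses")]
for cp, members in group_by_cp(ncs).items():
    print(cp, members)
```

With real data, each noun would have one or more full tree numbers; this sketch sidesteps multiple senses by giving each noun a single code.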

Distribution of Category Pairs (figure not included in the transcript)

Collection
- ~70,000 NCs extracted from titles and abstracts of Medline
- 2,627 CPs at level 0 (with at least 10 unique NCs)
- We analyzed:
  - 250 CPs with Anatomy (A)
  - 21 CPs with Natural Science (H01)
  - 3 CPs with Neoplasms (C04)
- This represents 10% of total CPs and 20% of total NCs
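The "at least 10 unique NCs" filter counts distinct compounds per CP, so repeated occurrences of the same NC count once. A small sketch of that filter on hypothetical toy data, followed by the arithmetic behind the "10% of total CPs" figure using the counts from the slide; `frequent_cps` is a hypothetical name:

```python
from collections import defaultdict


def frequent_cps(nc_cp_pairs, min_unique=10):
    """Group NCs by category pair and keep pairs with >= min_unique distinct NCs.

    `nc_cp_pairs` is an iterable of (noun_compound, category_pair) tuples;
    repeated occurrences of the same NC are counted once.
    """
    groups = defaultdict(set)
    for nc, cp in nc_cp_pairs:
        groups[cp].add(nc)
    return {cp: ncs for cp, ncs in groups.items() if len(ncs) >= min_unique}


# Hypothetical toy data: 12 distinct NCs for one CP, 3 (plus a repeat) for another.
data = [(f"nc{i}", ("A02", "C04")) for i in range(12)]
data += [("apple fruits", ("B06", "B06")), ("rice grains", ("B06", "B06")),
         ("potato plants", ("B06", "B06")), ("rice grains", ("B06", "B06"))]
kept = frequent_cps(data)
print(sorted(kept))  # [('A02', 'C04')]

# Share of level-0 CPs analyzed, using the counts from the slide.
analyzed = 250 + 21 + 3
print(f"{analyzed} of 2,627 CPs = {analyzed / 2627:.1%}")  # 274 of 2,627 CPs = 10.4%
```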

Classification Method
For each CP:
- Divide its NCs into “training” and “testing” sets
- “Training”: inspect the NCs by hand
  - Start from level 0
  - While the NCs are not all similar, descend one level of the hierarchy
  - Repeat until all NCs for that CP are similar
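The descent loop can be sketched as a recursion in which a similarity judgment decides whether a group's current category pair becomes a classification decision. In the study that judgment is made by hand; here it is a callback, simulated with hypothetical relation labels. This sketch only descends the second (head) noun, as in the slides' C04 M01 example, and assumes all NCs passed in share the same category pair at the starting level; `descend`, `same_relation`, and `LABELS` are hypothetical names:

```python
from collections import defaultdict
from typing import Callable, List, Tuple

# (text, modifier tree number, head tree number)
NC = Tuple[str, str, str]


def truncate(code: str, level: int) -> str:
    """Keep the first level + 1 dot-separated segments of a tree number."""
    return ".".join(code.split(".")[: level + 1])


def descend(ncs: List[NC], are_similar: Callable[[List[NC]], bool],
            mod_level: int = 0, head_level: int = 0,
            max_level: int = 5) -> List[Tuple[str, str]]:
    """Return classification decisions (category pairs) for a group of NCs.

    If the group is judged similar (or max_level is reached), its current
    category pair becomes a decision; otherwise the head noun is descended
    one level and each resulting subgroup is handled recursively.
    """
    _, mod, head = ncs[0]
    cp = (truncate(mod, mod_level), truncate(head, head_level))
    if are_similar(ncs) or head_level >= max_level:
        return [cp]
    subgroups = defaultdict(list)
    for nc in ncs:
        subgroups[truncate(nc[2], head_level + 1)].append(nc)
    decisions = []
    for sub in subgroups.values():
        decisions += descend(sub, are_similar, mod_level, head_level + 1, max_level)
    return decisions


# Hypothetical annotator labels standing in for the manual similarity check.
LABELS = {"leukemia survivor": "afflicted", "lymphoma patients": "afflicted",
          "cancer physician": "treats", "cancer nurses": "treats"}


def same_relation(group: List[NC]) -> bool:
    return len({LABELS[text] for text, _, _ in group}) == 1


group = [("leukemia survivor", "C04", "M01.643"), ("lymphoma patients", "C04", "M01.643"),
         ("cancer physician", "C04", "M01.526"), ("cancer nurses", "C04", "M01.526")]
print(descend(group, same_relation))  # [('C04', 'M01.643'), ('C04', 'M01.526')]
```

The `max_level` cap is a design choice of the sketch: it guarantees termination when nouns run out of deeper tree levels.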

Using the CPs for classification
CP: A02 C04 (Musculoskeletal System, Neoplasms): skull tumors, bone cysts, bone metastases, skull osteosarcoma
Similar NCs: all NCs under the CP A02 C04 have the same semantic relationship (Location of disease? Disease in anatomy?)
→ Add CP A02 C04 to the list of classification decisions
Classification decisions: A02 C04

Using the CPs for classification
CP: B06 B06 (Plants, Plants): eucalyptus trees, apple fruits, rice grains, potato plants
Similar: same relationship → add CP B06 B06
Classification decisions: A02 C04, B06 B06

Using the CPs for classification
CP: C04 M01 (Neoplasms, Person): leukemia survivor, lymphoma patients, cancer physician, cancer nurses, …
Person afflicted by Disease? Person who treats Disease? Too different!
The second noun needs to be more specific: descend one level for the second noun (Person):
- CP: C04 M01.643 (Neoplasms, Patients): leukemia survivor, lymphoma patients → Person afflicted by Disease
- CP: C04 M01.526 (Neoplasms, Occupational Groups): cancer physician, cancer nurses, … → Person who treats Disease
Both subgroups: OK
Classification decisions: A02 C04, B06 B06, C04 M01 → C04 M01.643, C04 M01.526

Classification Decisions + Relations (future work)
A02 C04 → Location of Disease
B06 B06 → Kind of Plants
C04 M01
   C04 M01.643 → Person afflicted by Disease
   C04 M01.526 → Person who treats Disease
A01 H01
   A01 H01.770
   A01 H01.671
      A01 H01.671.538
      A01 H01.671.868
A01 M01
   A01 M01.643 → Person afflicted by Disease
   A01 M01.526
   A01 M01.898
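Labeling relations is listed as future work, so the following is only a hypothetical sketch of how a decision list could be applied to a new NC: find every decision whose two categories are ancestors of (or equal to) the NC's tree numbers, and use the most specific match. The decision table copies the labeled decisions from the slide; `classify` and `is_under` are hypothetical names:

```python
from typing import Dict, Optional, Tuple

# Decisions that carry a relation label on the slide.
DECISIONS: Dict[Tuple[str, str], str] = {
    ("A02", "C04"): "Location of Disease",
    ("B06", "B06"): "Kind of Plants",
    ("C04", "M01.643"): "Person afflicted by Disease",
    ("C04", "M01.526"): "Person who treats Disease",
    ("A01", "M01.643"): "Person afflicted by Disease",
}


def is_under(code: str, ancestor: str) -> bool:
    """True if `code` equals `ancestor` or lies below it in the tree."""
    return code == ancestor or code.startswith(ancestor + ".")


def classify(mod_code: str, head_code: str,
             decisions: Dict[Tuple[str, str], str] = DECISIONS) -> Optional[str]:
    """Label an NC with the relation of the most specific matching decision."""
    matches = [cp for cp in decisions
               if is_under(mod_code, cp[0]) and is_under(head_code, cp[1])]
    if not matches:
        return None
    # Deeper decisions (more dot-separated segments) win over shallower ones.
    best = max(matches, key=lambda cp: cp[0].count(".") + cp[1].count("."))
    return decisions[best]


print(classify("A02", "C04"))          # Location of Disease
print(classify("A01.236", "M01.643"))  # Person afflicted by Disease (Breast falls under A01)
print(classify("C04", "M01"))          # None: C04 M01 itself was split, not labeled
```

Matching on whole dot-separated segments (`ancestor + "."`) avoids false matches such as treating a code "A020" as lying under "A02".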

Classification Decision Levels
Anatomy: 250 CPs
- 187 (75%) remain at the first level
- 56 (22%) descend one level
- 7 (3%) descend two levels
Natural Science (H01): 21 CPs
- 1 (5%) remains at the first level
- 8 (38%) descend one level
- 12 (57%) descend two levels
Neoplasms (C04): 3 CPs
- 3 (100%) descend one level
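The percentages follow directly from the raw counts. A quick check, rounding each share to the nearest percent; `LEVEL_COUNTS` and `percentages` are hypothetical names:

```python
# Counts from the slide: CPs that stay at the first level, descend one level,
# and descend two levels.
LEVEL_COUNTS = {
    "Anatomy": (187, 56, 7),
    "Natural Science (H01)": (1, 8, 12),
    "Neoplasms (C04)": (0, 3, 0),
}


def percentages(counts):
    """Each count as a percentage of the total, rounded to the nearest integer."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]


for name, counts in LEVEL_COUNTS.items():
    print(f"{name}: {sum(counts)} CPs -> {percentages(counts)}")
# Anatomy: 250 CPs -> [75, 22, 3]
# Natural Science (H01): 21 CPs -> [5, 38, 57]
# Neoplasms (C04): 3 CPs -> [0, 100, 0]
```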

Evaluation
Test the decisions on the “testing” set: count how many NCs that fall into the groups defined by the classification decisions are similar to each other.
Accuracy:
- Anatomy: 91%
- Natural Science: 79%
- Neoplasms: 100%
- Total accuracy: 90.8%
Generalization: our 415 classification decisions cover ~46,000 possible CPs
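A few hundred decisions can generalize widely because a decision made at a shallow level covers every pair of descriptors beneath its two categories. A sketch, assuming coverage is counted as distinct (modifier, head) descriptor pairs; this is not necessarily how the ~46,000 figure was computed, and the toy hierarchy uses only codes shown on the slides. `covered_pairs` is a hypothetical name:

```python
from itertools import product


def is_under(code: str, ancestor: str) -> bool:
    """True if `code` equals `ancestor` or lies below it in the tree."""
    return code == ancestor or code.startswith(ancestor + ".")


def covered_pairs(decisions, tree_numbers):
    """All (modifier, head) descriptor pairs that fall under some decision.

    A decision (m, h) covers every pair of descriptors below m and below h;
    the set removes pairs covered by more than one decision.
    """
    covered = set()
    for mod_anc, head_anc in decisions:
        mods = [t for t in tree_numbers if is_under(t, mod_anc)]
        heads = [t for t in tree_numbers if is_under(t, head_anc)]
        covered.update(product(mods, heads))
    return covered


# Toy hierarchy built only from codes shown on the slides.
TREE = ["A01", "A01.047", "A01.176", "A01.236",
        "C04", "M01", "M01.526", "M01.643", "M01.898"]
decisions = [("A01", "M01"), ("C04", "M01.643"), ("C04", "M01.526")]
print(len(covered_pairs(decisions, TREE)))  # 18
```

Here the single shallow decision A01 M01 accounts for 16 of the 18 covered pairs (4 descriptors under A01 times 4 under M01).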

Ambiguity – Two Types
Lexical ambiguity: mortality
- state of being mortal
- death rate
Relationship ambiguity: bacteria mortality
- death of bacteria
- death caused by bacteria

Lexical Ambiguity vs. Multiple MeSH Senses
Lexical ambiguity is different from having multiple MeSH senses. Ex: Mortality has 4 MeSH senses:
- Public Health (G) → Data Collection → Vital Statistics → Mortality
- Investigative Techniques (E) → Data Collection → Vital Statistics → Mortality
- Information Science (L) → Data Collection → Vital Statistics → Mortality
- Population Characteristics (N) → Demography → Vital Statistics → Mortality
All four paths end in the same concept (Vital Statistics → Mortality), placed under different branches of the hierarchy.
On average, there are 1.5 MeSH senses per word for the nouns in our collection.
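Because a noun with several MeSH senses maps to several tree numbers, an NC can fall into more than one CP: one per combination of its nouns' senses. A minimal sketch, assuming each sense is represented only by its top-level branch letter; the single Organisms [B] sense for "bacteria" is an assumption for illustration, and `candidate_cps` is a hypothetical name:

```python
from itertools import product

# Top-level branches of each noun's MeSH senses: "mortality" as listed on the
# slide; "bacteria" assumed to have a single sense under Organisms [B].
SENSES = {
    "mortality": ["G", "E", "L", "N"],
    "bacteria": ["B"],
}


def candidate_cps(modifier: str, head: str, senses=SENSES):
    """Every category pair an NC can map to: one per combination of senses."""
    return list(product(senses[modifier], senses[head]))


print(candidate_cps("bacteria", "mortality"))
# [('B', 'G'), ('B', 'E'), ('B', 'L'), ('B', 'N')]
```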

Four Cases (ambiguity of MeSH senses vs. ambiguity of relationship)
- Single MeSH senses, only one possible relationship: abdomen radiography, aciclovir treatment
- Multiple MeSH senses, only one possible relationship: alcoholism treatment
- Single MeSH senses, multiple relationships: hospital databases, education efforts, kidney metabolism
- Multiple MeSH senses, multiple relationships: bacteria mortality (the most problematic cases … but rare!)

Conclusions
- A very simple method for assigning semantic relations to two-word technical NCs: 90.8% accuracy
- Groups the NCs with respect to their semantic descriptors
- A lexical resource (MeSH) is useful for this task
- Using the upper levels of the lexical hierarchy gives accurate classification while reducing the size of the problem space

Future work
- Analyze the full spectrum of the hierarchy
- NCs with > 2 terms: [[growth hormone] deficiency]
- Other syntactic structures
- Non-biomedical words
- Other ontologies (e.g., WordNet)?

And given enough data… skull character, jaw depression, nose resuscitation, cadaver motion

Thanks! For more information: http://bailando.sims.berkeley.edu/lindi/