Classifying Semantic Relations in Bioscience Texts Barbara Rosario Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510.

Classifying Semantic Relations in Bioscience Texts Barbara Rosario Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from Genentech

Problem: Which relations hold between two entities, Treatment and Disease? Cure? Prevent? Side Effect?

Hepatitis Examples
Cure: These results suggest that con A-induced hepatitis was ameliorated by pretreatment with TJ-135.
Prevent: A two-dose combined hepatitis A and B vaccine would facilitate immunization programs.
Vague: Effect of interferon on hepatitis B

Two tasks
Relationship extraction: identify which of several semantic relations holds between the entities disease and treatment in bioscience text
Entity extraction (a related problem): identify such entities

The Approach
Data: MEDLINE abstracts and titles
Graphical models: combine relation and entity extraction in one framework; both static and dynamic models
A simple discriminative approach: neural network
Lexical, syntactic, and semantic features

Outline
Related work
Data and semantic relations
Features
Models and results
Conclusions

Several DIFFERENT Relations between the Same Types of Entities
This differs from the problem statement of other work on relations, where many find one relation that holds between two entities (many based on ACE):
Agichtein and Gravano (2000): lexical patterns for location-of
Zelenko et al. (2002): SVM for person-affiliation and organization-location
Hasegawa et al. (ACL 2004): Person-Organization -> "President" relation; doesn't identify the actual relation
Craven (1999, 2001): HMM for subcellular-location and disorder-association

Related work: Bioscience
Many hand-built rules: Feldman et al. (2002), Friedman et al. (2001), Pustejovsky et al. (2002), Saric et al. (this conference)

Data and Relations
MEDLINE abstracts and titles
3662 sentences labeled: relevant 1724, irrelevant 1771 (e.g., "Patients were followed up for 6 months")
2 types of entities (treatment and disease), many instances
7 relationships between these entities
The labeled data is available at

Semantic Relationships
Cure (810): Intravenous immune globulin for recurrent spontaneous abortion
Only Disease (616): Social ties and susceptibility to the common cold
Only Treatment (166): Fluticasone propionate is safe in recommended doses
Prevent (63): Statins for prevention of stroke

Semantic Relationships (cont.)
Vague (36): Phenylbutazone and leukemia
Side Effect (29): Malignant mesodermal mixed tumor of the uterus following irradiation
Does NOT Cure (4): Evidence for double resistance to permethrin and malathion in head lice

Features
Word
Part of speech
Phrase constituent
Orthographic features: 'is number', 'all letters are capitalized', 'first letter is capitalized', ...
MeSH (semantic features): replace words, or sequences of words, with generalizations via MeSH categories (e.g., Peritoneum -> Abdomen)
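
For concreteness (this is not the authors' code), a minimal Python sketch of per-token features of the kinds listed above; the mesh_map lookup from words to MeSH generalizations is a hypothetical stand-in:

def orthographic_features(token):
    # Surface-form flags like the ones listed above.
    return {
        "is_number": token.isdigit(),
        "all_caps": token.isupper(),
        "init_cap": token[:1].isupper(),
    }

def token_features(token, pos_tag, mesh_map):
    # mesh_map: hypothetical dict from lowercased words or phrases to MeSH
    # generalizations, e.g. {"peritoneum": "Abdomen"}.
    feats = {"word": token.lower(), "pos": pos_tag}
    feats.update(orthographic_features(token))
    feats["mesh"] = mesh_map.get(token.lower(), "NONE")
    return feats

# token_features("Peritoneum", "NN", {"peritoneum": "Abdomen"})
# -> {'word': 'peritoneum', 'pos': 'NN', 'is_number': False,
#     'all_caps': False, 'init_cap': True, 'mesh': 'Abdomen'}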

Features (cont.): MeSH
MeSH Tree Structures:
1. Anatomy [A]
2. Organisms [B]
3. Diseases [C]
4. Chemicals and Drugs [D]
5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E]
6. Psychiatry and Psychology [F]
7. Biological Sciences [G]
8. Physical Sciences [H]
9. Anthropology, Education, Sociology and Social Phenomena [I]
10. Technology and Food and Beverages [J]
11. Humanities [K]
12. Information Science [L]
13. Persons [M]
14. Health Care [N]
15. Geographic Locations [Z]

Features (cont.): MeSH
1. Anatomy [A]
  Body Regions [A01] +
  Musculoskeletal System [A02]
  Digestive System [A03] +
  Respiratory System [A04] +
  Urogenital System [A05] +
  Endocrine System [A06] +
  Cardiovascular System [A07] +
  Nervous System [A08] +
  Sense Organs [A09] +
  Tissues [A10] +
  Cells [A11] +
  Fluids and Secretions [A12] +
  Animal Structures [A13] +
  Stomatognathic System [A14]
  (...)
Body Regions [A01]
  Abdomen [A01.047]
    Groin
    Inguinal Canal
    Peritoneum +
    Umbilicus
  Axilla [A01.133]
  Back [A01.176] +
  Breast [A01.236] +
  Buttocks [A01.258]
  Extremities [A01.378] +
  Head [A01.456] +
  Neck [A01.598]
  (...)
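
The generalization step can be pictured as truncating a term's MeSH tree code to an ancestor category, a small sketch under that assumption (the tree code used for Peritoneum is illustrative and may differ across MeSH releases):

def generalize(tree_code, depth=1):
    # MeSH tree codes are dot-separated; keeping the first `depth`
    # segments moves a term up to an ancestor category.
    return ".".join(tree_code.split(".")[:depth])

code_to_term = {"A01": "Body Regions", "A01.047": "Abdomen"}
peritoneum_code = "A01.047.025.600"  # illustrative code for Peritoneum

print(code_to_term[generalize(peritoneum_code, depth=2)])  # Abdomen
print(code_to_term[generalize(peritoneum_code, depth=1)])  # Body Regions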

Models
2 static generative models
3 dynamic generative models
1 discriminative model (neural network)

Static Graphical Models
S1: observations depend on the Role but are independent of the Relation given the roles
S2: observations depend on both the Relation and the Role
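
Read as joint distributions, the two static models can plausibly be written as follows (a sketch of the factorization implied by the description above, not a formula quoted from the paper; Rel is the relation, r_i the role of word i, and f_i its observed features):

S1:  P(Rel, r_1..r_n, f_1..f_n) = P(Rel) \prod_{i=1}^{n} P(r_i \mid Rel) \, P(f_i \mid r_i)
S2:  P(Rel, r_1..r_n, f_1..f_n) = P(Rel) \prod_{i=1}^{n} P(r_i \mid Rel) \, P(f_i \mid Rel, r_i)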

Dynamic Graphical Models
D1, D2: as in S1, S2
D3: only one observation per state depends on both the relation and the role
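
Under the same assumptions, the dynamic variants add a chain over the role states; one plausible way to write D1 and D2 (D3 would condition only one of the per-state observations on Rel):

D1:  P(Rel, r_1..r_n, f_1..f_n) = P(Rel) \, P(r_1 \mid Rel) \prod_{i=2}^{n} P(r_i \mid r_{i-1}, Rel) \prod_{i=1}^{n} P(f_i \mid r_i)
D2:  as D1, but with P(f_i \mid Rel, r_i) in the last product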

Graphical Models
Relation node: the semantic relation (cure, prevent, none, ...) expressed in the sentence

Graphical Models Role nodes: 3 choices: treatment, disease, or none

Graphical Models Feature nodes (observed): word, POS, MeSH…

Graphical Models
The models (S1, S2, D1, D2, D3) differ in the dependencies between the feature nodes and the relation node

Graphical Models
For dynamic model D1: joint probability distribution over the relation, role, and feature nodes
Parameters estimated with maximum likelihood and absolute discounting smoothing
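
For such a generative model, maximum-likelihood estimation reduces to normalized counts over the labeled training sentences, for instance (a sketch, before smoothing):

\hat{P}(f_i = v \mid r_i = s) = \frac{count(f = v, r = s)}{count(r = s)}

Absolute discounting (see the additional slides) then subtracts a small constant from each seen count and spreads the freed mass over unseen values.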

Our D1 and Thompson et al.
Thompson et al.: frame classification and role labeling for FrameNet sentences
Target word must be observed
More relations and roles

Neural Networks
Feed-forward network (MATLAB)
Training with conjugate gradient descent
One hidden layer (hyperbolic tangent activation)
Logistic sigmoid function for the output layer representing the relationships
Same features
Discriminative approach
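
For illustration only, a compact NumPy sketch of a network with this shape: one tanh hidden layer and logistic sigmoid outputs (one per relation), trained here with plain gradient descent rather than the conjugate-gradient MATLAB routine the slides mention; all sizes and names are assumptions:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, n_hidden=20, lr=0.1, epochs=500, seed=0):
    # X: (n_examples, n_features) feature matrix; Y: (n_examples, n_relations) one-hot labels.
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], Y.shape[1]
    W1 = rng.normal(scale=0.1, size=(n_in, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, n_out)); b2 = np.zeros(n_out)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)             # hidden layer (hyperbolic tangent)
        P = sigmoid(H @ W2 + b2)             # logistic sigmoid outputs
        dZ2 = (P - Y) / len(X)               # gradient of sigmoid + cross-entropy
        dW2, db2 = H.T @ dZ2, dZ2.sum(axis=0)
        dZ1 = (dZ2 @ W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
        dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return W1, b1, W2, b2

def predict(X, params):
    # Pick the relation whose output unit has the highest activation.
    W1, b1, W2, b2 = params
    return sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2).argmax(axis=1)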

Relation extraction
Results in terms of classification accuracy (with and without irrelevant sentences)
2 cases: roles hidden, roles given
Graphical models and NN (for the NN this is a simple classification problem)

Relation classification: Results
Neural Net always best

Relation classification: Results
With no smoothing, D1 is the best graphical model

Relation classification: Results
With smoothing and no roles, D2 is the best graphical model

Relation classification: Results
With smoothing and roles given, D1 is the best graphical model

Relation classification: Results
Dynamic models always outperform static models

Relation classification: Confusion Matrix
Computed for model D2, "rel. + irrel.", "only features"

Role extraction
Results in terms of F-measure
Graphical models: junction tree algorithm (BNT); relation hidden and marginalized over
NN: couldn't run it (feature vectors too large)
(Graphical models can do role extraction and relationship classification simultaneously)
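
A sketch of the inference this describes, with the hidden relation summed out when predicting roles (notation as in the earlier factorization sketches):

\hat{r}_{1..n} = \arg\max_{r_1..r_n} \sum_{Rel} P(Rel, r_1..r_n \mid f_1..f_n)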

Role Extraction: Results (F-measures)
D1 best when no smoothing is used

Role Extraction: Results (F-measures)
D2 best with smoothing, but smoothing doesn't boost scores as much as in relation classification

Features impact: Role Extraction (rel. + irrel.)
Most important features: 1) Word, 2) MeSH
                          D1         D2
All features
No word                              -14.1%
No MeSH                              -8.4%

Features impact: Relation classification (rel. + irrel.)
Most important features: Roles
Accuracy:                     D1        D2        NN
All feat. + roles
All feat. - roles                       -8.7%     -17.8%
All feat. + roles - Word                -2.8%     -0.5%
All feat. + roles - MeSH                3.1%      0.4%

Features impact: Relation classification (rel. + irrel.)
Most realistic case: roles not known
Most important features: 1) MeSH, 2) Word for D1 and NN (but vice versa for D2)
Accuracy:                     D1        D2        NN
All feat. - roles
All feat. - roles - Word                -11.8%    -4.3%
All feat. - roles - MeSH                -3.2%     -6.9%

Conclusions
Classification of subtle semantic relations in bioscience text
Discriminative model (neural network) achieves high classification accuracy
Graphical models for the simultaneous extraction of entities and relationships
Importance of the lexical hierarchy
Future work: a new collection of disease/treatment data; different entities/relations; unsupervised learning to discover relation types

Thank you! Barbara Rosario Marti Hearst SIMS, UC Berkeley

Additional slides

Smoothing: absolute discounting
Lower the probability of seen events by subtracting a constant from their count (the ML estimate is P(x) = c(x)/N)
The remaining probability mass is evenly divided among the unseen events
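
One common way to write this (a sketch; the exact variant used may differ): with total count N, discount d, S distinct seen events, and N_0 unseen events,

P(x) = \frac{c(x) - d}{N}  if c(x) > 0,    P(x) = \frac{d \cdot S}{N \cdot N_0}  if c(x) = 0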

F-measures for role extraction as a function of the smoothing factor

Relation classification accuracies as a function of the smoothing factor

Role Extraction: Results
Static models better than dynamic for
Note: no neural networks

Features impact: Role Extraction (rel. + irrel.)
Most important features: 1) Word, 2) MeSH
                          D1         D2         Average
All features
No word                              -14.1%     -13.7%
No MeSH                              -8.4%      -7.2%

Features impact: Role extraction (only rel.)
Most important features: 1) Word, 2) MeSH
F-measures:               D1         D2         Average
All features
No word                              -9.6%      -9.6%
No MeSH                              -5.5%      -4.8%

Features impact: Role extraction (only rel.)
Most important features: 1) Word, 2) MeSH
F-measures:               D1         D2
All features
No word                              -9.6%
No MeSH                              -5.5%

Features impact: Relation classification (rel. + irrel.)
Most important features: Roles
Accuracy:                     D1        D2        NN        Avg.
All feat. + roles
All feat. - roles                       -8.7%     -17.8%    -17.1%
All feat. + roles - Word                -2.8%     -0.5%     -1.1%
All feat. + roles - MeSH                3.1%      0.4%      1.1%

Features impact: Relation classification (rel. + irrel.)
Most realistic case: roles not known
Most important features: 1) MeSH, 2) Word for D1 and NN (but vice versa for D2)
Accuracy:                     D1        D2        NN        Avg.
All feat. - roles
All feat. - roles - Word                -11.8%    -4.3%     -6.4%
All feat. - roles - MeSH                -3.2%     -6.9%     -6.4%