I256 Applied Natural Language Processing Fall 2009 Lecture 14 Information Extraction (2) Barbara Rosario.

2 Today
Midterm evaluations
Discuss schedule for next class
Finish slides from lecture 13
Information Extraction (2)

3 Text Mining/Information Extraction
Text:
–Stress is associated with migraines
–Stress can lead to loss of magnesium
–Calcium channel blockers prevent some migraines
–Magnesium is a natural calcium channel blocker
1: Extract semantic entities from text

4 Text Mining
Text:
–Stress is associated with migraines
–Stress can lead to loss of magnesium
–Calcium channel blockers prevent some migraines
–Magnesium is a natural calcium channel blocker
Entities: Stress, Migraine, Magnesium, Calcium channel blockers
1: Extract semantic entities from text

5 Text Mining (cont.)
Text:
–Stress is associated with migraines
–Stress can lead to loss of magnesium
–Calcium channel blockers prevent some migraines
–Magnesium is a natural calcium channel blocker
Entities: Stress, Migraine, Magnesium, Calcium channel blockers
2: Classify relations between entities
Relations: Associated with, Lead to loss, Prevent, Subtype-of (is a)

6 Text Mining (cont.)
Text:
–Stress is associated with migraines
–Stress can lead to loss of magnesium
–Calcium channel blockers prevent some migraines
–Magnesium is a natural calcium channel blocker
Entities: Stress, Migraine, Magnesium, Calcium channel blockers
3: Do reasoning: find new correlations
Relations: Associated with, Lead to loss, Prevent, Subtype-of (is a)

7 Text Mining (cont.)
Text:
–Stress is associated with migraines
–Stress can lead to loss of magnesium
–Calcium channel blockers prevent some migraines
–Magnesium is a natural calcium channel blocker
Entities: Stress, Migraine, Magnesium, Calcium channel blockers
4: Do reasoning: infer causality
Relations: Associated with, Lead to loss, Prevent, Subtype-of (is a)
No prevention: Deficiency of magnesium -> migraine
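
Steps 2 to 4 above can be sketched by storing the extracted relations as (subject, relation, object) triples and propagating relations along the subtype-of link. This is only an illustration: the relation names, the `inherit_via_is_a` helper and the single-step inheritance rule are assumptions made here, not the method used in the lecture.

```python
from collections import defaultdict

# The four statements from the slide, written as hypothetical triples.
TRIPLES = [
    ("stress", "associated_with", "migraine"),
    ("stress", "leads_to_loss_of", "magnesium"),
    ("calcium channel blocker", "prevents", "migraine"),
    ("magnesium", "is_a", "calcium channel blocker"),
]


def inherit_via_is_a(triples):
    """Return new triples obtained by the rule: X is_a Y and Y r Z gives X r Z.

    Only one inheritance step is applied (no transitive closure).
    """
    parents = defaultdict(set)
    for subject, relation, obj in triples:
        if relation == "is_a":
            parents[subject].add(obj)

    inferred = set()
    for subject, relation, obj in triples:
        if relation == "is_a":
            continue
        for child, child_parents in parents.items():
            if subject in child_parents:
                inferred.add((child, relation, obj))
    return inferred - set(triples)


new_facts = inherit_via_is_a(TRIPLES)
```

On the slide's four sentences the only new triple is that magnesium prevents migraine. Read together with "stress can lead to loss of magnesium", this is what suggests the hypothesis on slide 7 that a magnesium deficiency may be linked to migraine.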

8 Apoptosis Network
[Figure: signaling network. Inputs: Death Receptors Signaling, Survival Factors Signaling, Ca++ Signaling, p53 pathway, ER Stress, Genotoxic Stress, Loss of Attachment, Cell Cycle stress, etc. Nodes: Initiator Caspases (8, 10), Caspase 12, Caspase 9, Effector Caspases (3, 6, 7), Apaf 1, IAPs, NFkB, Mitochondria, Cytochrome c, Smac, AIF, Bax, Bak, Bcl-2 like, BH3 only. Output: Apoptosis]

9 Project at UC Berkeley
The network nodes are deduced by experts from reading and processing experimental knowledge.
Every month, more than 1000 apoptosis papers are published.
We need to keep track of ALL the information in order to understand the system better.
Ultimate goal: produce models capable of predicting system behavior, identifying critical control points and proposing critical experiments to extend current knowledge
Project: develop an automatic literature analysis tool
–Zhang and Arkin, UC Berkeley

10 To convert free text into structured format (manually!)

11 To convert free text into structured format (manually!)

12 Problem: Which relations hold between 2 entities?
Treatment, Disease: Cure? Prevent? Side Effect?

13 Hepatitis Examples
Cure
–These results suggest that con A-induced hepatitis was ameliorated by pretreatment with TJ-135.
Prevent
–A two-dose combined hepatitis A and B vaccine would facilitate immunization programs
Vague
–Effect of interferon on hepatitis B

14 Two tasks
Relationship extraction:
–Identify the several semantic relations that can occur between the entities disease and treatment in bioscience text
–Often the different relationships are determined by the entities involved (e.g., Location of: LOCATION and ORGANIZATION)
–Here: different relations between the same entities
Entity extraction:
–Related problem: identify such entities

15 The Approach
Data: MEDLINE abstracts and titles
Graphical models
–Combine in one framework both relation and entity extraction
–Both static and dynamic models
Simple discriminative approach:
–Neural network
Lexical, syntactic and semantic features

16 Several DIFFERENT Relations between the Same Types of Entities
Thus differs from the problem statement of other work on relations
Many find one relation which holds between two entities (many based on ACE)
–Agichtein and Gravano (2000): lexical patterns for location of
–Zelenko et al. (2002): SVM for person affiliation and organization-location
–Hasegawa et al. (ACL 2004): Person-Organization -> President “relation”
–Craven (1999, 2001): HMM for subcellular-location and disorder-association
Doesn’t identify the actual relation

17 Related work: Bioscience
Many hand-built rules
–Feldman et al. (2002)
–Friedman et al. (2001)
–Pustejovsky et al. (2002)
–Saric et al.; this conference

18 Data and Relations
MEDLINE, abstracts and titles
3662 sentences labeled
–Relevant: 1724
–Irrelevant: 1771 (e.g., “Patients were followed up for 6 months”)
2 types of entities, many instances
–treatment and disease
7 relationships between these entities
The labeled data is available at

19 Labeling

20 Inter-annotator agreement
The F-measure between the 2 annotations was 81% (an upper limit for the system performance)

21 Annotators’ disagreement

22 Semantic Relationships
810: Cure
–Intravenous immune globulin for recurrent spontaneous abortion
616: Only Disease
–Social ties and susceptibility to the common cold
166: Only Treatment
–Fluticasone propionate is safe in recommended doses
63: Prevent
–Statins for prevention of stroke

23 Semantic Relationships
36: Vague
–Phenylbutazone and leukemia
29: Side Effect
–Malignant mesodermal mixed tumor of the uterus following irradiation
4: Does NOT cure
–Evidence for double resistance to permethrin and malathion in head lice

24 Preprocessing
Sentence splitter
Tokenizer (Penn Treebank)
Brill’s POS tagger
Collins parser

25 Preprocessing

26 Preprocessing
Chunking
Semantic tagging with MeSH: map the words into MeSH terms.

27 MeSH
MeSH Tree Structures
1. Anatomy [A]
2. Organisms [B]
3. Diseases [C]
4. Chemicals and Drugs [D]
5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E]
6. Psychiatry and Psychology [F]
7. Biological Sciences [G]
8. Physical Sciences [H]
9. Anthropology, Education, Sociology and Social Phenomena [I]
10. Technology and Food and Beverages [J]
11. Humanities [K]
12. Information Science [L]
13. Persons [M]
14. Health Care [N]
15. Geographic Locations [Z]

28 MeSH
1. Anatomy [A]
–Body Regions [A01] +
–Musculoskeletal System [A02]
–Digestive System [A03] +
–Respiratory System [A04] +
–Urogenital System [A05] +
–Endocrine System [A06] +
–Cardiovascular System [A07] +
–Nervous System [A08] +
–Sense Organs [A09] +
–Tissues [A10] +
–Cells [A11] +
–Fluids and Secretions [A12] +
–Animal Structures [A13] +
–Stomatognathic System [A14]
–( …..)
Body Regions [A01]
–Abdomen [A01.047]
    Groin [A ]
    Inguinal Canal [A ]
    Peritoneum [A ] +
    Umbilicus [A ]
–Axilla [A01.133]
–Back [A01.176] +
–Breast [A01.236] +
–Buttocks [A01.258]
–Extremities [A01.378] +
–Head [A01.456] +
–Neck [A01.598]
–( ….)
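
Because MeSH tree numbers are dotted paths, a word mapped to a MeSH term can be generalized to a coarser category by keeping only the first levels of its tree number. The sketch below illustrates this idea; the `MESH_LEXICON` lookup table and the helper names are hypothetical, and only tree numbers that appear on the slide are used.

```python
# Hypothetical word-to-MeSH lookup; the tree numbers are the ones on the slide.
MESH_LEXICON = {
    "abdomen": "A01.047",
    "axilla": "A01.133",
    "back": "A01.176",
}


def generalize(tree_number, depth):
    """Keep only the first `depth` levels of a dotted MeSH tree number."""
    return ".".join(tree_number.split(".")[:depth])


def mesh_feature(word, depth=1):
    """Return the MeSH tree number of `word` cut to `depth` levels, or None."""
    tree_number = MESH_LEXICON.get(word.lower())
    if tree_number is None:
        return None
    return generalize(tree_number, depth)


coarse = mesh_feature("Abdomen")         # top-level category of the term
fine = mesh_feature("Abdomen", depth=2)  # the full two-level tree number
```

With `depth=1` the three body parts in the lookup table all map to the same value, `A01` (Body Regions), which is the kind of generalization a lexical hierarchy makes available to a classifier.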

29 Preprocessing

30 Features
Word
Part of speech
Phrase constituent (the phrase type from the shallow parse: NP, PP, ...)
Belongs to the same chunk as previous word?
Orthographic features
–Is number?
–Only part is number
–Is negation
–First letter is capital
–All capital letters
–All non-word characters
–Contains non-word character
MeSH (semantic features)
Extract these features with Python
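
The orthographic features in the list above are simple tests on a single token, so they can be sketched in a few lines of Python. The exact definitions used in the lecture are not given; the regular expressions and the `NEGATIONS` word list below are assumptions chosen for illustration.

```python
import re

# Assumed negation cue words; the slides do not list the ones actually used.
NEGATIONS = {"no", "not", "never", "without", "neither", "nor"}


def orthographic_features(token):
    """Return the orthographic features from the slide as a dict of booleans."""
    is_number = bool(re.fullmatch(r"\d+(\.\d+)?", token))
    has_digit = any(ch.isdigit() for ch in token)
    return {
        "is_number": is_number,
        "part_number": has_digit and not is_number,
        "is_negation": token.lower() in NEGATIONS,
        "first_capital": token[:1].isupper(),
        "all_capitals": token.isalpha() and token.isupper(),
        "all_non_word": bool(re.fullmatch(r"\W+", token)),
        "contains_non_word": bool(re.search(r"\W", token)),
    }


features = orthographic_features("TJ-135")
```

For a token such as `TJ-135` (from the hepatitis example earlier), this marks the token as partly numeric, capitalized and containing a non-word character, but not as a number or as all capitals.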

31 Models
2 static generative models
3 dynamic generative models
–Smoothing: absolute discounting
1 discriminative model (neural networks)
These models are implemented in Matlab

32 Architecture
Get text (Python)
Annotate
Preprocessing, extract features and transform features into numbers (Python)
-> numerical features ->
Algorithms (Matlab)
-> prediction ->
Process output (Python)
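
The step "transform features into numbers" can be sketched as assigning each distinct feature value an integer index, so that the numerical side of the pipeline receives a plain matrix. The function names and the 1-based numbering (picked because Matlab indexes from 1) are assumptions, not details given in the lecture.

```python
def build_index(values):
    """Map each distinct value to an integer, numbered from 1 in order of appearance."""
    index = {}
    for value in values:
        if value not in index:
            index[value] = len(index) + 1
    return index


def encode(rows, feature_names):
    """Turn a list of feature dicts into a matrix of integers plus the lookup tables."""
    indexes = {
        name: build_index(row[name] for row in rows) for name in feature_names
    }
    matrix = [[indexes[name][row[name]] for name in feature_names] for row in rows]
    return matrix, indexes


# Tokens of "Statins for prevention of stroke" with illustrative POS tags.
rows = [
    {"word": "statins", "pos": "NNS"},
    {"word": "for", "pos": "IN"},
    {"word": "prevention", "pos": "NN"},
    {"word": "of", "pos": "IN"},
    {"word": "stroke", "pos": "NN"},
]
matrix, indexes = encode(rows, ["word", "pos"])
```

Keeping the `indexes` tables alongside the matrix lets the final "process output" step translate numeric predictions back into readable labels.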

33 Static Graphical Models
–S1: observations dependent on Role but independent from Relation given roles
–S2: observations dependent on both Relation and Role
[Figure: models S1 and S2]
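
One way to read the two static models is as factorizations of the joint distribution over the relation, the roles and the observed features. The following is a sketch under assumptions that the slide does not state: each role node is taken to depend on the relation node, features are taken to be conditionally independent of one another, and the symbols (Rel, Role_t for the role of word t, f_{jt} for feature j of word t, T words, n features) are introduced here for illustration.

```latex
% S1: features depend on the role only
P(\mathit{Rel}, \mathit{Role}_{1:T}, f_{1:n,\,1:T})
  = P(\mathit{Rel}) \prod_{t=1}^{T} \Big[ P(\mathit{Role}_t \mid \mathit{Rel})
    \prod_{j=1}^{n} P(f_{jt} \mid \mathit{Role}_t) \Big]

% S2: features depend on both the role and the relation
P(\mathit{Rel}, \mathit{Role}_{1:T}, f_{1:n,\,1:T})
  = P(\mathit{Rel}) \prod_{t=1}^{T} \Big[ P(\mathit{Role}_t \mid \mathit{Rel})
    \prod_{j=1}^{n} P(f_{jt} \mid \mathit{Role}_t, \mathit{Rel}) \Big]
```

The only difference between the two expressions is the conditioning set of the feature terms, which matches the slide's description of S1 and S2.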

34 Dynamic Graphical Models
D1, D2 as in S1, S2
D3: only one observation per state is dependent on both the relation and the role
[Figure: models D1, D2 and D3]

35 Graphical Models Relation node: –Semantic relation (cure, prevent, none..) expressed in the sentence

36 Graphical Models Role nodes: –3 choices: treatment, disease, or none

37 Graphical Models Feature nodes (observed): –word, POS, MeSH…

38 Graphical Models
Different dependencies between the features and the relation nodes
[Figure: models S1, S2, D1, D2 and D3]

39 Graphical Models
For Dynamic Model D1:
–Joint probability distribution over relation, roles and features nodes
–Parameters estimated with maximum likelihood and absolute discounting smoothing
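
Absolute discounting subtracts a fixed amount from every observed count and gives the freed probability mass to unseen events, so that no event ends up with probability zero. The sketch below shows one simple variant that spreads the freed mass uniformly over the unseen values; the slides do not say which variant or discount value was used, so the function name, the default `discount=0.5` and the uniform redistribution are assumptions.

```python
def absolute_discounting(counts, vocabulary, discount=0.5):
    """Estimate P(value) from counts, reserving mass for unseen values.

    counts: dict mapping a value to its observed count.
    vocabulary: all possible values, seen or unseen.
    discount: amount subtracted from each nonzero count (0 < discount < 1).
    """
    total = sum(counts.get(value, 0) for value in vocabulary)
    if total == 0:
        # Nothing observed: fall back to a uniform distribution.
        return {value: 1.0 / len(vocabulary) for value in vocabulary}

    seen = [value for value in vocabulary if counts.get(value, 0) > 0]
    unseen = [value for value in vocabulary if counts.get(value, 0) == 0]
    if not unseen:
        # Every value was observed: plain maximum likelihood estimate.
        return {value: counts[value] / total for value in vocabulary}

    freed_mass = discount * len(seen) / total
    probabilities = {}
    for value in seen:
        probabilities[value] = (counts[value] - discount) / total
    for value in unseen:
        probabilities[value] = freed_mass / len(unseen)
    return probabilities


# Toy counts of words observed in the "treatment" role.
probs = absolute_discounting(
    {"aspirin": 3, "statin": 1},
    ["aspirin", "statin", "insulin", "placebo"],
)
```

In the toy example, the two unseen words receive a small nonzero probability taken from the two observed ones, and the distribution still sums to one.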

40 Neural Networks
Feed-forward network (MATLAB)
–Training with conjugate gradient descent
–One hidden layer (hyperbolic tangent function)
–Logistic sigmoid function for the output layer representing the relationships
Same features
Discriminative approach
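
The forward pass of such a network (one tanh hidden layer, one logistic sigmoid output per relation) can be sketched in plain Python. The weights, layer sizes and input vector below are made-up toy values, and training with conjugate gradients is not shown; the lecture's network was built in MATLAB, so this is an illustration of the architecture only.

```python
import math


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Compute output scores for input x with a tanh hidden layer and sigmoid outputs."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return [
        1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(row, hidden)) + b)))
        for row, b in zip(w_out, b_out)
    ]


# Toy network: 3 input features, 2 hidden units, 3 output relations.
x = [1.0, 0.0, 1.0]
w_hidden = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
b_hidden = [0.0, 0.1]
w_out = [[1.0, -1.0], [-0.5, 0.5], [0.2, 0.2]]
b_out = [0.0, 0.0, 0.0]

scores = forward(x, w_hidden, b_hidden, w_out, b_out)
# The predicted relation is the output unit with the highest score.
predicted = max(range(len(scores)), key=scores.__getitem__)
```

Each output unit yields a score between 0 and 1 for one relation, and classification picks the relation with the highest score.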

41 Relation extraction
Results in terms of classification accuracy (with irrelevant sentences)
2 cases:
–Roles hidden
–Roles given
Graphical models
NN: simple classification problem

42 Relation classification: Results
Accuracies on the relation classification task
Columns: Input, Base, Static (S1, S2), Dynamic (D1, D2, D3), NN
Rows: only features; features + roles

43 Relation classification: Confusion Matrix Computed for the model D2, “only features”

44 Role extraction
Results in terms of F-measure
Graphical models
–Junction tree algorithm (BNT)
–Relation hidden and marginalized over
NN
–Couldn’t run it (feature vectors too large)
(Graphical models can do role extraction and relationship classification simultaneously)

45 Evaluation
POS: Possible. The total number of truth values. POS = COR + INC + MIS
ACT: Actual. The total number of predictions. ACT = COR + INC + SPU
REC: Recall. A measure of how many of the truth values were produced: REC = COR / POS
PRE: Precision. A measure of how many of the predictions are actually in the truth: PRE = COR / ACT
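
The definitions above translate directly into code. In the sketch below, the argument names follow the slide's abbreviations, read here as counts of correct (COR), incorrect (INC), missing (MIS) and spurious (SPU) predictions; that reading, the function name and the balanced F-measure 2 * PRE * REC / (PRE + REC) are assumptions, since the slides do not say which F-measure weighting was used.

```python
def score(cor, inc, mis, spu):
    """Compute POS, ACT, recall, precision and a balanced F-measure from counts."""
    pos = cor + inc + mis  # total number of truth values
    act = cor + inc + spu  # total number of predictions
    rec = cor / pos if pos else 0.0
    pre = cor / act if act else 0.0
    f_measure = 2 * pre * rec / (pre + rec) if (pre + rec) else 0.0
    return {"POS": pos, "ACT": act, "REC": rec, "PRE": pre, "F": f_measure}


# Toy counts: 8 correct, 1 incorrect, 1 missing, 3 spurious.
result = score(cor=8, inc=1, mis=1, spu=3)
```

With these toy counts there are 10 truth values and 12 predictions, so recall is 8/10 and precision is 8/12.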

46 Evaluation
Alignment: Prediction, True value
From these we derive F-measures

47 Role Extraction: Results
F-measures
Columns: Static (S1, S2), Dynamic (D1, D2, D3)

48 Features impact: Role Extraction
Most important features: 1) Word, 2) MeSH
Models          D1      D2
All features
No word         %       -14.1%
No MeSH         %       -8.4%

49 Features impact: Relation classification
Most important features: Roles
Accuracy:                     D1      D2      NN
All feat. + roles
All feat. – roles             %       -8.7%   -17.8%
All feat. + roles – Word      %       -2.8%   -0.5%
All feat. + roles – MeSH      %       3.1%    0.4%

50 Features impact: Relation classification
Most realistic case: Roles not known
Most important features: 1) MeSH, 2) Word for D1 and NN (but vice versa for D2)
Accuracy:                     D1      D2      NN
All feat. – roles
All feat. – roles – Word      %       -11.8%  -4.3%
All feat. – roles – MeSH      %       -3.2%   -6.9%

51 Conclusions
Classification of subtle semantic relations in bioscience text
–Discriminative model (neural network) achieves high classification accuracy
–Graphical models for the simultaneous extraction of entities and relationships
–Importance of lexical hierarchy