Coreference Based Event-Argument Relation Extraction on Biomedical Text
Katsumasa Yoshikawa 1), Sebastian Riedel 2), Tsutomu Hirao 3), Masayuki Asahara 1), Yuji Matsumoto 1)
1) Nara Institute of Science and Technology, Japan
2) University of Massachusetts, Amherst, USA
3) NTT Communication Science Lab., Japan
SMBM 2010, 25th-26th October, 2010, Hinxton, Cambridge, UK

2 Outline
–Research summary
–Related work of event extraction
–Proposed coreference based approach
–Experimental setup and highlighted data
–Conclusion and future work

3 Summary of Our Research
Coreference Based Approach for biomedical event extraction with Markov Logic
Why coreference?
–Extraction of valuable event-argument relations in discourse structure
–Identification of arguments crossing sentence boundaries
Why Markov Logic?
–Implementation of Salience in Discourse and Transitivity in a very direct fashion

4 Event-Argument Relation with Coreference Information
Arguments are often related to other mentions through coreference relations.
Example sentences (labeled S1, S2, S3 on the slide; arcs: Cause, Theme):
We analyzed the effect on the binding and the activity of transcription factors at a regulatory element.
TPA induction inhibits the binding of the transcription factor NF-E2 to this transcriptional control element.
TPA induction increases the binding of AP-1 factors to this element.

5 Event-Argument Relation with Coreference Information
"this element" in S2 is coreferent to "a regulatory element" in S1.
Example sentences (labeled S1, S2, S3 on the slide; arcs: Corefer, Cause, Theme):
We analyzed the effect on the binding and the activity of transcription factors at a regulatory element.
TPA induction inhibits the binding of the transcription factor NF-E2 to this transcriptional control element.
TPA induction increases the binding of AP-1 factors to this element.

6 Event-Argument Relation with Coreference Information
The true argument (Theme) of binding is "a regulatory element"; "this element" is just an anaphor of it.
Transitivity enables us to extract it: (A) Theme & (B) Corefer => (C) Theme
Example sentences (labeled S1, S2, S3 on the slide; arcs: (A) Theme, (B) Corefer, (C) Theme, Cause, Theme):
We analyzed the effect on the binding and the activity of transcription factors at a regulatory element.
TPA induction inhibits the binding of the transcription factor NF-E2 to this transcriptional control element.
TPA induction increases the binding of AP-1 factors to this element.

7 Event-Argument Relation with Coreference Information
Arguments mentioned over and over again have higher salience in discourse and should be extracted at any cost.
Our approach can aggressively extract such arguments, which are valuable in the discourse structure.
Example sentences (labeled S1, S2, S3 on the slide; arcs: Corefer, Theme, Cause, Theme, Corefer, Theme):
We analyzed the effect on the binding and the activity of transcription factors at a regulatory element.
TPA induction inhibits the binding of the transcription factor NF-E2 to this transcriptional control element.
TPA induction increases the binding of AP-1 factors to this element.

9 Biomedical Event Extraction (BioNLP'09 Task 1)
Extracting events, arguments, and their relations in a document
Main targets: Event-Argument relations (E-As)
Example: TPA induction increases the binding of AP-1 factors to this element.
(Figure: the sentence annotated with event and argument spans and Cause/Theme arcs.)
event: induction, increases, binding
argument: TPA, AP-1 factors, this element, induction, binding
event-argument: Theme(induction, TPA), Cause(increases, induction), Theme(increases, binding), Theme(binding, AP-1 factors), Theme(binding, this element)

10 Previous Work [in BioNLP’09]
Pairwise pipeline by SVM classifiers [Bjorne et al., 2009]
1. Identification of events
2. Coupling with proteins and labeling the roles
Collective approach by Markov Logic [Riedel et al., 2009] [Poon et al., 2010]
1. Jointly identify the most probable E-A assignments in a sentence
(Figure labels: event, arg1, arg2; No, No, Theme; event1, arg1, arg2, event2, arg3; Theme, Cause.)

12 Markov Logic [Richardson and Domingos, 2006]
A Statistical Relational Learning framework
An expressive template language of Markov Networks
Not only hard but also soft constraints
A Markov Logic Network (MLN) is a set of pairs (φ, w) where
–φ is a formula in first-order logic
–w is a real number weight
Higher weight → stronger constraint
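The slide does not show the probability model itself. For reference, the standard formulation from [Richardson and Domingos, 2006] can be written as below. The symbols M (the set of weighted formulas), n_φ (the number of true groundings of φ in a possible world y), and Z (the normalization constant) are notation introduced here, not taken from the slide.

```latex
P(\mathbf{y}) = \frac{1}{Z} \exp\Bigl( \sum_{(\phi, w) \in M} w \, n_{\phi}(\mathbf{y}) \Bigr),
\qquad
Z = \sum_{\mathbf{y}'} \exp\Bigl( \sum_{(\phi, w) \in M} w \, n_{\phi}(\mathbf{y}') \Bigr)
```

A formula with a larger positive weight raises the probability of worlds that satisfy more of its groundings, which is the sense in which a higher weight acts as a stronger constraint.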

13 Coreference Based Event Extraction with Markov Logic
Hidden predicates (Query)
predicate        description
event(i)         token i is an event
eventType(i,t)   token i is an event with type t
role(i,j,r)      token i has an argument j with role r
Observed predicates (Given)
predicate        description
pos(i,p)         token i has part-of-speech p
protein(i)       token i is a protein
dep(i,j,d)       token i depends on token j
Features are described by combinations of these predicates

14 Example of Markov Logic Networks
Feature definition by weighted First-Order Logic (※ all features are binary)
Weight function         Weight value   Ground formula
w_a(Verb)               3.1            pos(3,Verb) ⇒ event(3)
w_b(regulation,Theme)   -0.9           event(3) ^ eventType(3,regulation) ^ protein(6) ⇒ role(3,6,Theme)
(Figure: the ground network over pos(3,Verb), event(3), eventType(3,regulation), protein(6), dep(3,6,obj), and role(3,6,Theme), with factors labeled w_a(Verb), w_b(regulation, Theme), and w_c(obj,Theme); the formulas are grounded into this network.)
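To make the role of the weights concrete, the following minimal Python sketch scores three candidate worlds using only the two ground formulas whose weights appear on the slide (3.1 and -0.9). The tuple encoding of atoms, the helper names, and the three candidate worlds are assumptions made for illustration; w_c(obj,Theme) is left out because the slide gives no value for it. This is not how Markov thebeast represents a network internally.

```python
from typing import Callable, FrozenSet, List, Tuple

Atom = Tuple  # a ground atom such as ("pos", 3, "Verb") (assumed encoding)
World = FrozenSet[Atom]


def implies(body: bool, head: bool) -> bool:
    """Material implication: body => head."""
    return (not body) or head


# Ground formulas from the slide, paired with their weights.
GROUND_FORMULAS: List[Tuple[float, Callable[[World], bool]]] = [
    # w_a(Verb) = 3.1: pos(3,Verb) => event(3)
    (3.1, lambda w: implies(("pos", 3, "Verb") in w, ("event", 3) in w)),
    # w_b(regulation,Theme) = -0.9:
    # event(3) ^ eventType(3,regulation) ^ protein(6) => role(3,6,Theme)
    (-0.9, lambda w: implies(
        ("event", 3) in w
        and ("eventType", 3, "regulation") in w
        and ("protein", 6) in w,
        ("role", 3, 6, "Theme") in w,
    )),
]


def score(world: World) -> float:
    """Sum the weights of all ground formulas that hold in `world`."""
    return sum(weight for weight, holds in GROUND_FORMULAS if holds(world))


# Observed atoms are fixed; the hidden atoms differ per candidate world.
OBSERVED: World = frozenset({("pos", 3, "Verb"), ("protein", 6), ("dep", 3, 6, "obj")})
NO_EVENT: World = OBSERVED
EVENT_ONLY: World = OBSERVED | {("event", 3), ("eventType", 3, "regulation")}
EVENT_WITH_THEME: World = EVENT_ONLY | {("role", 3, 6, "Theme")}

if __name__ == "__main__":
    for name, world in [("no event", NO_EVENT),
                        ("event only", EVENT_ONLY),
                        ("event with Theme", EVENT_WITH_THEME)]:
        print(name, round(score(world), 2))
```

Under these two formulas alone, a formula with a negative weight is rewarded when it is violated, so the world with the event but without the Theme role scores highest. In a full model, many more weighted formulas contribute to the sum.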

15 Basic Ideas of Proposed Method
Effective employment of coreference information based on discourse structure
–Salience in Discourse: aggressive extraction of valuable E-As
Consider event-argument relations crossing sentence boundaries
–Transitivity involving coreference relations

16 How to Use Coreference with Markov Logic?
1. Salience in Discourse
2. Transitivity
3. Feature Copy
Added predicate
predicate      description
corefer(i,j)   token i is coreferent to token j
Example (tokens indexed 1 to 17 across both sentences; arcs: Theme, Cause, Corefer, Theme):
S1: The IRF-2 promoter region contains a CpG island.
S2: The region is inducible by both interferons.

17 Coreference Based Approach ① (Salience in Discourse)
Tokens coreferent to something have higher salience in discourse and are more likely to be arguments of events.
If "The region" is coreferent to "The IRF-2...", then there is at least one event related to "The region" ... (SiD)
Example (tokens indexed 1 to 17 across both sentences; arcs: Theme, Corefer):
S1: The IRF-2 promoter region contains a CpG island.
S2: The region is inducible by both interferons.
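The SiD rule can be read as "every token that is coreferent to something should fill at least one argument slot". The sketch below checks that reading over a candidate assignment. It is a simplification under stated assumptions: the tuple encodings, the function name, and the token indices (region = 11 in S2, region = 4 in S1, inducible = 13, counting periods as tokens) are chosen here for illustration, and the constraint is treated as a plain check rather than as a weighted formula inside the MLN.

```python
from typing import Set, Tuple

Role = Tuple[int, int, str]  # (event token, argument token, role label)
Corefer = Tuple[int, int]    # (anaphor token, antecedent token)


def sid_violations(corefer: Set[Corefer], roles: Set[Role]) -> Set[int]:
    """Return anaphor tokens that are coreferent to something
    but are not the argument of any event."""
    argument_tokens = {argument for _, argument, _ in roles}
    return {anaphor for anaphor, _ in corefer if anaphor not in argument_tokens}


# Assumed indices: "region"(11) in S2 is coreferent to "region"(4) in S1.
COREFER: Set[Corefer] = {(11, 4)}

if __name__ == "__main__":
    # With Theme(inducible=13, region=11) the rule is satisfied.
    print(sid_violations(COREFER, {(13, 11, "Theme")}))
    # With no role assigned to token 11 the rule is violated.
    print(sid_violations(COREFER, set()))
```

In a soft-constraint setting, each violation would lower the score of the assignment instead of ruling it out.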

18 Coreference Based Approach ② (Transitivity)
Transition rules involving coreference relations allow us to extract cross-sentential event-arguments in a sentence-by-sentence manner.
(A) Theme & (B) Corefer => (C) Theme ... (T)
Example (tokens indexed 1 to 17 across both sentences; arcs: (A) Theme, (B) Corefer, (C) Theme):
S1: The IRF-2 promoter region contains a CpG island.
S2: The region is inducible by both interferons.
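The rule (A) & (B) => (C) can be sketched as a small closure computation: whenever an event has an anaphor as its argument, the same role is also assigned to the antecedent. The encodings, the function name, and the token indices (inducible = 13, region = 11, antecedent head region = 4) are assumptions made for illustration; in the MLN the rule is a weighted formula, not a deterministic post-processing step.

```python
from typing import Set, Tuple

Role = Tuple[int, int, str]  # (event token, argument token, role label)
Corefer = Tuple[int, int]    # (anaphor token, antecedent token)


def apply_transitivity(roles: Set[Role], corefer: Set[Corefer]) -> Set[Role]:
    """Add role(e, antecedent, r) for every role(e, anaphor, r) and
    corefer(anaphor, antecedent), repeating until nothing new is derived."""
    derived = set(roles)
    changed = True
    while changed:
        changed = False
        for event, argument, label in list(derived):
            for anaphor, antecedent in corefer:
                candidate = (event, antecedent, label)
                if argument == anaphor and candidate not in derived:
                    derived.add(candidate)
                    changed = True
    return derived


if __name__ == "__main__":
    intra_sentence = {(13, 11, "Theme")}  # (A) Theme, found within S2
    coreference = {(11, 4)}               # (B) Corefer, S2 -> S1
    # The result also contains (13, 4, "Theme"): (C), the cross-sentence Theme.
    print(sorted(apply_transitivity(intra_sentence, coreference)))
```

Iterating to a fixed point lets the rule follow coreference chains of more than one link; for a single link, one pass is enough.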

19 Coreference Based Approach ③ (Feature Copy)
If a token is coreferent to something, then we exploit the features of its antecedent to identify intra-sentential E-A relations ... (FC)
Example (tokens indexed 1 to 17 across both sentences; arcs: Theme, Corefer; features are copied from the antecedent):
S1: The IRF-2 promoter region contains a CpG island.
S2: The region is inducible by both interferons.
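One way to picture Feature Copy is as an enrichment step that gives each anaphor a marked copy of its antecedent's features. The sketch below is only an illustration of that idea: the feature strings, the "ant:" prefix, the token indices, and the function name are all hypothetical, and the slide does not say which features are copied or how they are encoded.

```python
from typing import Dict, Set, Tuple

Corefer = Tuple[int, int]  # (anaphor token, antecedent token)


def copy_antecedent_features(
    features: Dict[int, Set[str]],
    corefer: Set[Corefer],
) -> Dict[int, Set[str]]:
    """Return a new feature map in which every anaphor also carries its
    antecedent's features, prefixed with 'ant:' (a hypothetical marker)."""
    enriched = {token: set(feats) for token, feats in features.items()}
    for anaphor, antecedent in corefer:
        copied = {f"ant:{feat}" for feat in features.get(antecedent, set())}
        enriched.setdefault(anaphor, set()).update(copied)
    return enriched


# Hypothetical features for the heads of the two mentions.
FEATURES: Dict[int, Set[str]] = {
    4: {"word=region", "modifier=IRF-2"},  # "The IRF-2 promoter region" (S1)
    11: {"word=region"},                   # "The region" (S2)
}

if __name__ == "__main__":
    enriched = copy_antecedent_features(FEATURES, {(11, 4)})
    print(sorted(enriched[11]))
```

Keeping the copied features under a separate prefix lets a learner weight them differently from the anaphor's own features; whether the original system does so is not stated on the slide.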

21 Experimental Setup
Data: GENIA Event Corpus ver. 0.9 [Kim et al., 2008]
–Preprocess: POS tagging, NE tagging, Parsing
Coreference resolver: pairwise model [Soon et al., 2001]
–Learning & Inference: SVM
Event extraction:
–Joint Markov Logic model [Riedel et al., 2009]
  Learning: one-best MIRA
  Inference: ILP solver with CPI [Riedel, 2008]
  Provided by Markov thebeast
–SVM pipeline [Bjorne et al., 2009]
  Learning & Inference: multi-class SVM

22 Experimental Result (Summary)
Results of Event Extraction (F1)
We obtained statistically significant improvements with both models, SVM and MLN.
   Model  Coreference    event  eventType  role
1) SVM    w/o            77.0   67.8       52.3 ( 0.0)
2) SVM    with resolver  77.0   67.8       53.6 (+1.3)
3) SVM    with gold      77.0   67.8       55.4 (+3.1)
4) MLN    w/o            80.5   70.6       51.7 ( 0.0)
5) MLN    with resolver  80.8   70.6       53.8 (+2.1)
6) MLN    with gold      81.2   70.8       56.7 (+5.0)
p < 0.01 (McNemar’s test, 2-tailed)
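The parenthesized figures in the role column read as gains over the same model's "w/o" (no coreference) row. The short sketch below recomputes them from the reported role F1 scores as a consistency check; the dictionary layout and function name are chosen here for illustration.

```python
from typing import Dict

# Role F1 scores as reported on the slide.
ROLE_F1: Dict[str, Dict[str, float]] = {
    "SVM": {"w/o": 52.3, "with resolver": 53.6, "with gold": 55.4},
    "MLN": {"w/o": 51.7, "with resolver": 53.8, "with gold": 56.7},
}


def role_gains(scores: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Gain of each coreference setting over the same model's 'w/o' baseline."""
    return {
        model: {
            setting: round(f1 - by_setting["w/o"], 1)
            for setting, f1 in by_setting.items()
        }
        for model, by_setting in scores.items()
    }


if __name__ == "__main__":
    for model, gains in role_gains(ROLE_F1).items():
        print(model, gains)
```

The recomputed gains match the slide: +1.3 and +3.1 for SVM, +2.1 and +5.0 for MLN.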

23 Three Types of E-A Relations
Evaluation for the three types of E-A relations
Type                  Description
(1) Cross-link        E-A relations crossing sentence boundaries
(2) With-Antecedent   Intra-sentence E-As with antecedents
(3) Normal            Neither Cross-link nor With-Antecedent
Example (tokens indexed 1 to 17 across both sentences; arcs: (1) Cross, (2) W-ANT, (3) Normal, Corefer):
S1: The IRF-2 promoter region contains a CpG island.
S2: The region is inducible by both interferons.
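The three categories can be expressed as a small decision rule over an event token, an argument token, and the set of tokens that have an antecedent. The sketch below assumes a tokenization in which periods count as tokens (S1 = tokens 1 to 9, S2 = tokens 10 to 17); the function names and the example relations are illustrative, not taken from the evaluation scripts.

```python
from typing import List, Set

# Assumed sentence boundaries: S1 starts at token 1, S2 at token 10.
SENTENCE_STARTS: List[int] = [1, 10]


def sentence_of(token: int) -> int:
    """Return the 1-based number of the sentence containing `token`."""
    return sum(1 for start in SENTENCE_STARTS if token >= start)


def relation_type(event: int, argument: int, anaphors: Set[int]) -> str:
    """Classify an E-A relation as 'Cross', 'W-ANT', or 'Normal'."""
    if sentence_of(event) != sentence_of(argument):
        return "Cross"   # (1) crosses a sentence boundary
    if argument in anaphors:
        return "W-ANT"   # (2) intra-sentence, argument has an antecedent
    return "Normal"      # (3) neither of the above


if __name__ == "__main__":
    anaphors = {11}  # "region"(11) in S2 has an antecedent in S1 (assumed index)
    print(relation_type(13, 4, anaphors))   # event in S2, argument in S1
    print(relation_type(13, 11, anaphors))  # both in S2, argument is an anaphor
    print(relation_type(5, 8, anaphors))    # both in S1, no antecedent
```

The sentence check comes first so that a cross-sentence relation is never counted as With-Antecedent, which matches the definition of Normal as "neither" of the other two.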

24 Experimental Result (E-A Relation)
Results of E-A Relation Extraction (F1)
Both Transitivity and Salience in Discourse work well.
MLN with gold coreference annotations outperforms the SVM pipeline on both Cross and W-ANT.
   Model  Coreference    Cross-link  With-Antecedent  Normal
1) SVM    w/o             0.0        56.0             53.6
2) SVM    with resolver  27.9        57.0             54.3
3) SVM    with gold      54.1        57.3             55.4
4) MLN    w/o             0.0        49.8 (  0.0)     53.2
5) MLN    with resolver  39.3        56.5 ( +6.7)     54.3
6) MLN    with gold      69.7        66.7 (+16.9)     55.3

26 Summary
We proposed a new method for biomedical event extraction with coreference information.
Our systems successfully extract cross-sentential E-As by transitivity involving coreference relations.
The concept of salience in discourse can also help E-A extraction.
We obtained further improvements with gold coreference annotations, especially for MLN.

27 Future Work
Put more effort into coreference resolution
–From pairwise model to clustering approach
Full joint approach of event extraction and coreference resolution
–Fighting against computational costs
–Narrative Event Chains [Chambers et al., 2008]
