
Jointly Identifying Temporal Relations with Markov Logic
Katsumasa Yoshikawa†, Sebastian Riedel‡, Masayuki Asahara†, Yuji Matsumoto†
† Nara Institute of Science and Technology, Japan
‡ University of Massachusetts, Amherst
ACL-IJCNLP 2009, 2–7 August 2009, Suntec, Singapore

2 Outline
 Background and Motivation
 Related work of temporal relation identification
 Proposed global approach with Markov Logic
 Experimental setup and highlighted data
 Summary and future work

3 Background and Motivation
 Temporal Relation Identification (temporal ordering): identifying the temporal order of events and time expressions in a document
 Essential work for document understanding
 With the introduction of the TimeBank corpus (Pustejovsky et al., 2003), machine learning approaches to temporal ordering became possible.
[Figure: Past/Present/Future timeline relative to the Document Creation Time (August 2009), showing the events "introduction" and "became" and the time expression "2003", with a BEFORE relation]

4 Outline
 Background and Motivation
 Related work of temporal relation identification
 Proposed global approach with Markov Logic
 Experimental setup and highlighted data
 Summary and future work

5 Allen's Temporal Logic [Allen, 1983] / TimeML and TimeBank [Pustejovsky et al., 2003]
 We regard temporal ordering as a classification task
 With TimeML, the TimeBank corpus was created
Correspondence between Allen's relations (13 labels) and TimeML (11 labels):
 before (<) → BEFORE; meets (m) → IBEFORE; starts (s) → BEGINS; started-by (si) → BEGUN_BY; finishes (f) → ENDS; finished-by (fi) → ENDED_BY; during (d) → DURING; contains (c) → INCLUDES; equal (=) → SIMULTANEOUS; met-by (mi) → IAFTER; after (>) → AFTER
 overlaps (o) and overlapped-by (oi) have no direct TimeML counterpart
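The label correspondence on this slide can be written down as a lookup table. A minimal sketch (the pairing follows the slide's two label lists; the function name `to_timeml` is just for illustration):

```python
# Mapping from Allen's 13 interval relations to the 11 TimeML labels.
# The two "overlap" relations (o, oi) have no exact TimeML counterpart,
# which is why the two inventories differ in size.
ALLEN_TO_TIMEML = {
    "before": "BEFORE",         # <
    "meets": "IBEFORE",         # m
    "starts": "BEGINS",         # s
    "started-by": "BEGUN_BY",   # si
    "finishes": "ENDS",         # f
    "finished-by": "ENDED_BY",  # fi
    "during": "DURING",         # d
    "contains": "INCLUDES",     # c
    "equal": "SIMULTANEOUS",    # =
    "met-by": "IAFTER",         # mi
    "after": "AFTER",           # >
}

def to_timeml(allen_relation):
    """Return the TimeML label for an Allen relation, or None if none exists."""
    return ALLEN_TO_TIMEML.get(allen_relation)
```

Looking up `to_timeml("overlaps")` returns `None`, reflecting the 13-vs-11 mismatch between the two schemes.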

6 TempEval (SemEval 2007 Task 15)
 Temporal Relation Identification in the SemEval 2007 Shared Task (TempEval)
 Six temporal relation labels
  Main labels: BEFORE, AFTER, OVERLAP
  Sub-labels: BEFORE-OR-OVERLAP, OVERLAP-OR-AFTER, VAGUE
 TempEval includes three types of tasks (A, B, and C)

7 Task A of TempEval
 Temporal relations between events and time expressions that occur within the same sentence
Example: "With the introduction of the TimeBank corpus (Pustejovsky et al., 2003), machine learning approaches to temporal ordering became possible."
[Figure: timeline with the DCT (August 2009); the event "became" and the time expression "2003" stand in an OVERLAP relation]

8 Task B of TempEval
 Temporal relations between events and the Document Creation Time (DCT)
Example: "With the introduction of the TimeBank corpus (Pustejovsky et al., 2003), machine learning approaches to temporal ordering became possible."
[Figure: timeline showing the event "became" BEFORE the DCT (August 2009)]

9 Task C of TempEval
 Temporal relations between the main events of adjacent sentences
Example: "The TimeBank corpus was created (Pustejovsky et al., 2003). As a result, machine learning approaches to temporal ordering became possible."
[Figure: timeline showing the main event "created" BEFORE the main event "became"; DCT is August 2009]

10 Issues with the TempEval Participants
 Many participants in TempEval employed local machine learning approaches
  Considering only a single relation at a time
  A local approach cannot take the other relations into account
 A global approach can be useful in that case
[Figure: the Task B decisions are made (EVENT 1 BEFORE DCT, EVENT 2 AFTER DCT), but the Task C relation between EVENT 1 and EVENT 2 is left unresolved ("BEFORE? AFTER?")]

11 Issues with the TempEval Participants
 Many participants in TempEval employed local machine learning approaches
  Considering only a single relation at a time
  A local approach cannot take the other relations into account
 A global approach can be useful in that case
[Figure: the two Task B decisions (EVENT 1 BEFORE DCT, EVENT 2 AFTER DCT) jointly determine the Task C relation: EVENT 1 BEFORE EVENT 2]

12 Outline
 Background and Motivation
 Related work and task reviews of temporal relation identification
 Proposed global approach with Markov Logic
 Experimental setup and highlighted data
 Summary and future work

13 Overview of Our Global Approach: Global Approach with Markov Logic
 Ensure consistency among the multiple relations with hard and soft constraints based on transition rules
 Jointly identify the three types of relations in TempEval
  Learning one global model for the three tasks

14 Markov Logic [Richardson and Domingos, 2006]
 A statistical relational learning framework
 An expressive template language for Markov Networks
  Not only hard but also soft constraints
 A Markov Logic Network (MLN) is a set of pairs (φ, w) where
  φ is a formula in first-order logic
  w is a real number weight
 Higher weight → stronger constraint

15 An Example of Markov Logic Networks (e1 and e2 are events)
 hasPastTense(a): indicates that an event a has past tense
 beforeDCT(a): indicates that an event a happens before the DCT
 before(a,b): indicates that an event a happens before another event b
ID / Weight function / Weight value / Ground formula:
 (A1) w_a(e1) 3.1 hasPastTense(e1) ⇒ beforeDCT(e1)
 (A2) w_a(e2) -0.9 hasPastTense(e2) ⇒ beforeDCT(e2)
 (B1) w_b(e1,e2) 1.7 beforeDCT(e1) ∧ ¬beforeDCT(e2) ⇒ before(e1,e2)
[Figure: grounding of the weighted formulas into a Markov Network over the atoms hasPastTense(e1), hasPastTense(e2), beforeDCT(e1), beforeDCT(e2), and before(e1,e2)]
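In an MLN, the unnormalized score of a possible world is the exponential of the summed weights of the satisfied ground formulas. A minimal sketch using only the three ground formulas and weights from this slide; the truth values chosen for `world` are hypothetical:

```python
import math

def implies(a, b):
    """Material implication: a => b."""
    return (not a) or b

def world_score(world):
    """Unnormalized MLN score: exp(sum of weights of satisfied ground formulas).
    `world` maps ground atoms to truth values."""
    total = 0.0
    # (A1) w = 3.1: hasPastTense(e1) => beforeDCT(e1)
    if implies(world["hasPastTense(e1)"], world["beforeDCT(e1)"]):
        total += 3.1
    # (A2) w = -0.9: hasPastTense(e2) => beforeDCT(e2)
    if implies(world["hasPastTense(e2)"], world["beforeDCT(e2)"]):
        total += -0.9
    # (B1) w = 1.7: beforeDCT(e1) ^ !beforeDCT(e2) => before(e1,e2)
    if implies(world["beforeDCT(e1)"] and not world["beforeDCT(e2)"],
               world["before(e1,e2)"]):
        total += 1.7
    return math.exp(total)

# A hypothetical world in which all three ground formulas happen to be satisfied,
# so the score is exp(3.1 - 0.9 + 1.7) = exp(3.9).
world = {
    "hasPastTense(e1)": True, "beforeDCT(e1)": True,
    "hasPastTense(e2)": False, "beforeDCT(e2)": False,
    "before(e1,e2)": True,
}
```

Note that (A2)'s negative weight *penalizes* worlds that satisfy its formula; the model prefers worlds where highly weighted formulas hold and negatively weighted ones do not.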

16 Global Feature Representation (Predicate Definition)
 relE2T(e, t, r): the relation r between an event e and a time expression t (Task A)
 relDCT(e, r): the relation r between an event e and the DCT (Task B)
 relE2E(e1, e2, r): the relation r between two events e1 and e2 (Task C)
 relT2T(t1, t2, r): the relation r between two time expressions t1 and t2
 dctOrder(t, r): the relation r between a time expression t and the DCT
[Figure: graph over EVENT (e1), EVENT (e2), TIME (t1), TIME (t2), and the DCT, with edges labeled by the predicates above]

17 Global Feature Representation (Transition Rules)
 We jointly solve the three tasks of TempEval
 We use global features named joint formulae
 A joint formula is based on a transition rule
  B→C (BEFORE & AFTER ⇒ BEFORE): if e1 happens before the DCT and e2 happens after the DCT, then e1 is before e2
  C→B (BEFORE & AFTER ⇒ BEFORE): if e1 happens before the DCT and e1 happens after e2, then e2 happens before the DCT
[Figure: two timelines over EVENT (e1), EVENT (e2), and the DCT illustrating the B→C and C→B rules]

18 Global Feature Representation (Templates of All the Joint Formulae)
They are developed over events, time expressions, and relations.
Tasks / Joint formula (first-order logic):
 A→B: dctOrder(t1,r) & relE2T(e1, t1, r1) ⇒ relDCT(e1, r2)
 B→A: dctOrder(t1,r) & relDCT(e1, r1) ⇒ relE2T(e1, t1, r2)
 B→C: relDCT(e1, r1) & relDCT(e2, r2) ⇒ relE2E(e1, e2, r3)
 C→B: relDCT(e2, r1) & relE2E(e1, e2, r2) ⇒ relDCT(e1, r3)
 A→C: relE2T(e1,t1,r1) & relT2T(t1,t2,r2) & relE2T(e2,t2,r3) ⇒ relE2E(e1,e2,r4)
 C→A: relE2T(e2,t2,r2) & relT2T(t1,t2,r1) & relE2E(e1,e2,r3) ⇒ relE2T(e1,t1,r4)
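The B→C transition rule from the slides can be read off directly as a deterministic inference step. A minimal sketch (in the actual MLN these are weighted, soft formulas handled by the inference engine, not hard if-then rules; the function name is illustrative):

```python
def infer_e2e(rel_dct_e1, rel_dct_e2):
    """B->C transition rule: derive the Task C relation between two events
    from their Task B relations to the Document Creation Time.
    Returns None when the rule does not fire."""
    # Slide's rule: e1 BEFORE DCT and e2 AFTER DCT  =>  e1 BEFORE e2
    if rel_dct_e1 == "BEFORE" and rel_dct_e2 == "AFTER":
        return "BEFORE"
    # Symmetric counterpart, which follows by the same reasoning.
    if rel_dct_e1 == "AFTER" and rel_dct_e2 == "BEFORE":
        return "AFTER"
    return None
```

For example, `infer_e2e("BEFORE", "AFTER")` yields `"BEFORE"`, while label pairs such as `("OVERLAP", "OVERLAP")` leave the Task C relation undetermined.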

19 Global Feature Representation (Templates of All the Joint Formulae)
They are developed over events, time expressions, and relations.
Tasks / Joint formula (first-order logic):
 A→B: dctOrder(t1,r) & relE2T(e1, t1, r1) ⇒ relDCT(e1, r2)
 B→A: dctOrder(t1,r) & relDCT(e1, r1) ⇒ relE2T(e1, t1, r2)
 B→C: relDCT(e1, BEFORE) & relDCT(e2, AFTER) ⇒ relE2E(e1, e2, BEFORE)
 C→B: relDCT(e1, BEFORE) & relE2E(e1, e2, AFTER) ⇒ relDCT(e2, BEFORE)
 A→C: relE2T(e1,t1,r1) & relT2T(t1,t2,r2) & relE2T(e2,t2,r3) ⇒ relE2E(e1,e2,r4)
 C→A: relE2T(e2,t2,r2) & relT2T(t1,t2,r1) & relE2E(e1,e2,r3) ⇒ relE2T(e1,t1,r4)

20 Outline
 Background and Motivation
 Related work and task reviews of temporal relation identification
 Proposed global approach with Markov Logic
 Experimental setup and highlighted data
 Summary and future work

21 Experimental Setup
 Use the MLN engine "Markov thebeast"
  Weight learning: MIRA
  Inference: Cutting Plane Inference (base solver: ILP) [Riedel, 2008]
 Employ local features based on the earlier work in TempEval [SemEval, 2007]
 Select joint formulae as global features
 Use the same data and evaluation schemes as TempEval

22 Comparison of Local and Global
Results with 10-fold cross-validation on training data (all scores denote F1):
 Improvements of Global over Local: Task A +0.049, Task B +0.010, Task C +0.019, All +0.022
 Over all tasks, Global is better than Local
 On Task A, the Global model significantly outperformed the Local one (p < 0.01, McNemar's test, 2-tailed)

23 Comparison to State-of-the-Art
Results compared with the other systems on test data (all scores denote F1):
 Outperformed the others on Tasks A and C
 Always performed better than the best pure machine-learning-based system, CU-TMP [Bethard and Martin, 2007]
[Table: F1 on Tasks A, B, and C for TempEval Best, TempEval Average, CU-TMP, and our Local and Global systems]

24 Outline
 Background and Motivation
 Related work and task reviews of temporal relation identification
 Proposed global approach with Markov Logic
 Experimental setup and highlighted data
 Summary and future work

25 Summary
 We proposed a global framework with Markov Logic for temporal relation identification
 Our global model with joint formulae successfully improved identification performance
 Our approach achieved competitive results among all participants in TempEval

26 Future Work
 Issues inherent to the task and the dataset
  Low inter-annotator agreement
  Low transitive connectivity
  Small size
 Semi-supervised approaches may ease some of these issues
[Table: numbers of labeled relations for Tasks A, B, and C across the TRAIN, DEV, and TEST sets]

27

28 Previous Global Framework for Temporal Relation Identification
 Used Integer Linear Programming (ILP) [Chambers and Jurafsky, 2008]
  Minimizes contradictions of local classifiers' outputs by building ILP constraint problems
  Targets only relations between events
  Identifies only BEFORE, AFTER, UNKNOWN
  Manually constructed ILPs
 Manually constructing an ILP is often painful work, especially when we need many constraints
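The essence of that ILP approach is a search for a joint label assignment that maximizes the local classifiers' scores while respecting transitivity. A minimal brute-force sketch of the same search (exhaustive enumeration stands in for the ILP solver; the scores and the single constraint are hypothetical stand-ins for real classifier outputs):

```python
from itertools import product

LABELS = ["BEFORE", "AFTER", "UNKNOWN"]

# Hypothetical local classifier scores for three event pairs.
local_scores = {
    ("e1", "e2"): {"BEFORE": 0.8, "AFTER": 0.1, "UNKNOWN": 0.1},
    ("e2", "e3"): {"BEFORE": 0.6, "AFTER": 0.3, "UNKNOWN": 0.2},
    ("e1", "e3"): {"BEFORE": 0.1, "AFTER": 0.7, "UNKNOWN": 0.2},
}

def consistent(assignment):
    """Hard transitivity constraint: e1<e2 and e2<e3 forbid e1>e3."""
    return not (assignment[("e1", "e2")] == "BEFORE"
                and assignment[("e2", "e3")] == "BEFORE"
                and assignment[("e1", "e3")] == "AFTER")

def best_global_assignment():
    """Exhaustively pick the highest-scoring consistent labeling."""
    pairs = list(local_scores)
    best, best_score = None, float("-inf")
    for labels in product(LABELS, repeat=len(pairs)):
        assignment = dict(zip(pairs, labels))
        if not consistent(assignment):
            continue
        score = sum(local_scores[p][assignment[p]] for p in pairs)
        if score > best_score:
            best, best_score = assignment, score
    return best

# The locally best labels (BEFORE, BEFORE, AFTER) are contradictory,
# so the global optimum flips the weakest of the three decisions.
```

Enumeration is exponential in the number of pairs, which is exactly why the cited work offloads the search to an ILP solver, and why hand-writing many such constraints becomes painful.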

29 Used Data (TempEval)
 TimeML format (based on TimeBank)
  Events, time expressions, temporal relations
 Inter-annotator agreement scores: 72% on Tasks A and B, 68% on Task C
[Table: numbers of labeled relations for Tasks A, B, and C across the TRAIN, DEV, and TEST sets]

30 The Distribution of the Labels in TempEval
[Table: counts of BEFORE, OVERLAP, AFTER, BEFORE-OR-OVERLAP, OVERLAP-OR-AFTER, and VAGUE for Tasks A, B, and C]

31 Evaluation Schemes
 Strict scoring scheme: give full credit if the relations match, and no credit otherwise
 Relaxed scoring scheme: give partial credit based on a score table over the label pairs (BEFORE, OVERLAP, AFTER, BEFORE-OR-OVERLAP, OVERLAP-OR-AFTER, VAGUE)
[Table: relaxed-scoring credit for each gold/predicted label pair]
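The two schemes differ only in how a single gold/predicted pair is credited. A minimal sketch; the partial-credit values below are purely illustrative, since the actual TempEval credit table is not reproduced here:

```python
# Illustrative partial-credit table for the relaxed scheme (NOT the official
# TempEval values): compatible main/sub label pairs get half credit.
PARTIAL_CREDIT = {
    ("BEFORE", "BEFORE-OR-OVERLAP"): 0.5,
    ("OVERLAP", "BEFORE-OR-OVERLAP"): 0.5,
    ("OVERLAP", "OVERLAP-OR-AFTER"): 0.5,
    ("AFTER", "OVERLAP-OR-AFTER"): 0.5,
}

def credit(gold, predicted, relaxed=False):
    """Strict: 1.0 for an exact match, else 0.0.
    Relaxed: additionally give table-based partial credit, in either order."""
    if gold == predicted:
        return 1.0
    if relaxed:
        return (PARTIAL_CREDIT.get((gold, predicted))
                or PARTIAL_CREDIT.get((predicted, gold))
                or 0.0)
    return 0.0

def score(golds, preds, relaxed=False):
    """Average credit over a list of gold/predicted label pairs."""
    return sum(credit(g, p, relaxed) for g, p in zip(golds, preds)) / len(golds)
```

Under this sketch, predicting BEFORE-OR-OVERLAP for a gold BEFORE scores 0 strictly but 0.5 relaxed, which is why relaxed figures run higher than strict ones throughout the result slides.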

32 Comparison of Local and Global
Results with 10-fold cross-validation on training data (F1):
 Global relaxed F1 (improvement over Local): Task A 0.691 (+0.046), Task B 0.819 (+0.009), Task C 0.623 (+0.015), All 0.727 (+0.020)
 Global strict F1 improvements over Local: Task A +0.049, Task B +0.010, Task C +0.019, All +0.022
 Over all tasks, Global is better than Local
 On Task A, the Global model significantly outperformed the Local one (p < 0.01, McNemar's test, 2-tailed)

33 Comparison to State-of-the-Art
Results compared with the other systems on test data:
 The Global model outperformed the others, especially on Tasks A and C
 Our system always performed better than the best pure machine-learning-based system (CU-TMP)
[Table: strict and relaxed F1 on Tasks A, B, and C for TempEval Best, TempEval Average, CU-TMP, Local Model, and Global Model]