Capturing Salience with a Trainable Cache Model for Zero-anaphora Resolution

Ryu Iida (Tokyo Institute of Technology) ryu-i@cl.cs.titech.ac.jp
Kentaro Inui, Yuji Matsumoto (Nara Institute of Science and Technology) {inui,matsu}@is.naist.jp
Introduction
- Anaphora (coreference) resolution has been studied extensively, motivated by NLP applications such as IE and MT
- Anaphora resolution: search for an antecedent within the search space

Example passage [slide annotations mark an antecedent, its anaphor, and the search space within the text]:
NTSB Chairman Jim Hall is to address a briefing on the investigation in Seattle Thursday, but board spokesman Mike Benson said Hall isn't expected to announce any findings. Benson said investigators are simulating air loads on the 737's rudder. "It's a slow, methodical job since we don't have adequate black boxes," he said. Newer models of flight data recorders, or "black boxes," would record the angle of the rudder and the pedal controlling it.
Problem
- A large search space makes practical anaphora resolution difficult
- Task: reduce the search space

Example passage (the entire preceding text constitutes the search space):
The National Transportation Safety Board is borrowing a Boeing 737 from Seattle's Museum of Flight as part of its investigation into why a similar jetliner crashed near Pittsburgh in 1994. The museum's aircraft, ironically enough, was donated by USAir, which operated the airplane that crashed, killing 132 people on board. The board is testing the plane's rudder controls to learn why Flight 427 suddenly rolled and crashed while on its approach to the Pittsburgh airport Sept. 8, 1994. Aviation safety investigators say a sharp movement of the rudder (the movable vertical piece in the plane's tail) could have caused the jet's deadly roll. NTSB Chairman Jim Hall is to address a briefing on the investigation in Seattle Thursday, but board spokesman Mike Benson said Hall isn't expected to announce any findings. Benson said investigators are simulating air loads on the 737's rudder. "It's a slow, methodical job since we don't have adequate black boxes," he said. Newer models of flight data recorders, or "black boxes," would record the angle of the rudder and the pedal controlling it.
Previous work
- Machine learning-based approaches (Aone and Bennett, 1995; McCarthy and Lehnert, 1995; Soon et al., 2001; Ng and Cardie, 2002; Seki et al., 2002; Isozaki and Hirao, 2003; Iida et al., 2005; Iida et al., 2007a; Yang et al., 2008)
  - Pay less attention to the search space problem
  - Heuristically limit the search space, e.g. deal only with candidates occurring in the N previous sentences (Yang et al., 2008)
  - Problem: this excludes any antecedent located more than N sentences away from its anaphor
Previous work (cont'd)
- Rule-based approaches (e.g. approaches based on Centering Theory (Grosz et al., 1995))
  - Deal only with the discourse entities that are salient at each point of the discourse
  - Drawback: Centering Theory retains information only about the previous sentence
  - Exceptions: Suri and McCoy (1994) and Hahn and Strube (1997) overcome this drawback, but remain limited by restrictions fundamental to Centering Theory
Our solution
- Reduce the search space for a given anaphor by applying the notion of "caching" introduced by Walker (1996)
- Extract the most salient candidates from the preceding text (in the NTSB example above: NTSB Chairman Jim Hall, investigators, the rudder) into a cache, and search for the antecedent within the cache instead of the full text
Implementation of cache models
- Walker's (1996) cache model uses two devices:
  - Cache: holds the most salient discourse entities
  - Main memory: retains all other entities
  - However, the model is not fully specified for implementation
- Our approach: specify how to retain salient candidates using machine learning, capturing both the local and global foci of the discourse
  - Dynamic cache model (DCM)
Dynamic cache model (DCM)
- Dynamically updates the cache information sentence by sentence
- Takes local transitions of salience into account

[Slide figure: the entities e_i1 ... e_iN of sentence S_i and the entries c_i1 ... c_iM of cache C_i are fed to the dynamic cache model, which retains some candidates as the new cache C_(i+1) (entries c_(i+1)1 ... c_(i+1)M) and discards the rest.]
Dynamic cache model (DCM) (cont'd)
- It is difficult to create training instances directly for the problem of retaining the N most salient candidates

[Same cache-update figure as on the previous slide.] A sketch of the update step follows below.
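To make the update step concrete, here is a minimal sketch of one sentence-wise DCM update, assuming entities are plain strings and a `score` callable stands in for the trained ranker; the function name and interface are illustrative, not the authors' implementation.

```python
from typing import Callable, List

def update_cache(cache: List[str],
                 sentence_entities: List[str],
                 score: Callable[[str], float],
                 n: int) -> List[str]:
    # One DCM step: the old cache entries and the entities of the new
    # sentence compete; the N highest-scoring candidates are retained,
    # the rest are discarded.
    candidates = cache + sentence_entities
    return sorted(candidates, key=score, reverse=True)[:n]
```

In the paper, the scores come from a Ranking SVM trained on the "retained"/"discarded" instances described on the next slides.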
DCM: ranking candidates
- Recast candidate selection as a ranking problem in machine learning
- Training instances are created from the anaphoric relations annotated in a corpus
- For a given candidate C in the current context (i.e. C is either in the current cache or appears in the current sentence):
  - if C is referred to by an anaphor appearing in the following context: "retained" (1st place)
  - otherwise: "discarded" (2nd place)
DCM: creating training instances
[Slide figure: an annotated corpus with sentences S1-S3 containing candidates C1-C8 and anaphors A_i, A_j, A_k (C: candidate, A: anaphor).
- After S1: C1 is referred to by A_i in S2, so C1 is retained (1st); C2 is not referred to by any anaphor in the following context, so C2 is discarded (2nd).
- After S2: retained (1st): C1, C4; discarded (2nd): C3, C5, C6.]
A code sketch of this step follows below.
Zero-anaphora resolution process
(cache size = 2; φ: zero-pronoun)

Tom-wa kouen-o sanpos-iteimashita / Tom was walking in the park
(φ-ga) John-ni funsui-no mae-de a-tta / (He) met John in front of the fountain
(φ-ga) (φ-ni) kinou-no shiai-no kekka-o ki-kimashita / (Tom) asked (John) the result of yesterday's game
(φ-ga) amari yoku na-katta youda / (The result) does not seem to be very good.

Cache contents as the text is processed sentence by sentence:
- after sentence 1: Tom (Tom), kouen (park)
- after sentence 2: Tom (Tom), John (John)
- after sentence 3: Tom (Tom), kekka (result)
Evaluating the caching mechanism on Japanese zero-anaphora resolution
- Investigate how the cache model contributes to candidate reduction
- Explore the candidate reduction ratio of each cache model and its coverage:
  Coverage = (# of antecedents retained in the cache) / (# of all antecedents)
- Create a ranker using Ranking SVM (Joachims, 2002)
A snippet computing coverage follows below.
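Coverage as defined above could be computed as follows; `cache_snapshots` (the cache state at the moment each zero-pronoun is resolved) and its pairing with gold antecedents are assumptions made for this sketch.

```python
from typing import List

def coverage(cache_snapshots: List[List[str]],
             gold_antecedents: List[str]) -> float:
    # Fraction of gold antecedents still present in the cache at the
    # moment their zero-pronoun is resolved.
    retained = sum(1 for cache, ant in zip(cache_snapshots, gold_antecedents)
                   if ant in cache)
    return retained / len(gold_antecedents)
```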
Data set
- NAIST Text Corpus (Iida et al., 2007)
- Data set for cross-validation: 287 articles, 699 zero-pronouns
- Conducted 5-fold cross-validation
Baseline cache models
- Centering-based cache model
  - Stores the preceding 'wa' (topic)-marked or 'ga' (subject)-marked candidate antecedents
  - An approximation of the model proposed by Nariyama (2002)
- Sentence-based cache model (Soon et al., 2001; Yang et al., 2008, etc.)
  - Stores the candidate antecedents in the N sentences preceding a zero-pronoun (see the sketch below)
- Static cache model
  - Does not capture the dynamics of the text
  - Ranks all candidates at once according to the global focus of the text
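As a point of comparison, the sentence-based baseline admits a very simple sketch; the sentence and candidate representations here are assumptions.

```python
from typing import List

def sentence_window_candidates(sentences: List[List[str]],
                               anaphor_sentence_idx: int,
                               n: int) -> List[str]:
    # The search space is every candidate antecedent mentioned in the
    # N sentences preceding the zero-pronoun's sentence.
    start = max(0, anaphor_sentence_idx - n)
    window = sentences[start:anaphor_sentence_idx]
    return [cand for sent in window for cand in sent]
```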
Feature set for cache models
- Default features
  - Part of speech
  - Whether the candidate is located in a quoted sentence
  - Whether it is located at the beginning of the text
  - Case marker (i.e. wa, ga)
  - Whether it syntactically depends on the last bunsetsu unit (the basic unit in Japanese) of a sentence
- Features used only in the DCM (for a candidate C_i and the current sentence S)
  - The set of connectives intervening between C_i and the beginning of S
  - The number of anaphoric chains
  - Whether C_i is currently stored in the cache
  - The distance between S and C_i in sentences
A sketch of this feature encoding follows below.
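An illustrative encoding of these features; the `Candidate` fields and feature names are assumptions made for the sketch, not the paper's actual representation.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Candidate:
    surface: str
    pos: str                           # part of speech
    case: str                          # case marker, e.g. 'wa' or 'ga'
    sentence_idx: int                  # index of the sentence it appears in
    in_quote: bool                     # inside a quoted sentence
    depends_on_last_bunsetsu: bool     # depends on the sentence-final bunsetsu
    chain_length: int = 1              # size of its anaphoric chain so far
    connectives: Set[str] = field(default_factory=set)  # connectives between
                                                        # the candidate and the
                                                        # current sentence start

def cache_features(c: Candidate, current_sentence_idx: int,
                   cache: List[Candidate]) -> dict:
    return {
        # default features
        "pos": c.pos,
        "in_quote": c.in_quote,
        "text_initial": c.sentence_idx == 0,
        "case_marker": c.case,
        "depends_on_last_bunsetsu": c.depends_on_last_bunsetsu,
        # DCM-only features
        "connectives": sorted(c.connectives),
        "chain_length": c.chain_length,
        "in_cache": c in cache,
        "sentence_distance": current_sentence_idx - c.sentence_idx,
    }
```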
Results: caching mechanism
[Slide chart: search space size for each cache model. CM: centering-based model, SM: sentence-based model.]
Evaluating antecedent identification
- Task: antecedent identification for inter-sentential zero-anaphora resolution
- Cache size: from 5 up to all candidates
- Compare the three cache models: centering-based, sentence-based, and dynamic
- Also investigate computational time
Antecedent identification and anaphoricity determination models
- Antecedent identification: tournament model (Iida et al., 2003)
  - Selects the most likely candidate antecedent by conducting a series of matches in which candidates compete with each other (see the sketch below)
- Anaphoricity determination: selection-then-classification model (Iida et al., 2005)
  - Judges an anaphor as anaphoric only if its most likely candidate is judged to be its antecedent
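A minimal sketch of the tournament model's selection step, assuming a pairwise classifier `prefer(current_winner, challenger, anaphor)` that returns True when the challenger wins the match; this interface is an assumption for illustration, not the model's actual API.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

def tournament_select(candidates: List[T], anaphor: T,
                      prefer: Callable[[T, T, T], bool]) -> T:
    # A series of matches over a non-empty candidate list: the winner of
    # each pairwise comparison advances; the final winner is returned as
    # the most likely antecedent.
    winner = candidates[0]
    for challenger in candidates[1:]:
        if prefer(winner, challenger, anaphor):
            winner = challenger
    return winner
```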
Results of antecedent identification

Model                  Accuracy  Runtime  Coverage
CM                     0.441     11m03s   0.651
SM (s=1)               0.381      6m54s   0.524
SM (s=2)               0.448     13m14s   0.720
SM (s=3)               0.466     19m01s   0.794
DCM (n=5)              0.446      4m39s   0.664
DCM (n=10)             0.441      8m56s   0.764
DCM (n=15)             0.442     12m53s   0.858
DCM (n=20)             0.443     16m35s   0.878
DCM (n=#candidates)    0.452     53m44s   0.928

CM: centering-based model, SM: sentence-based model, DCM: dynamic cache model
Conclusion
- Proposed a machine learning-based cache model to reduce the computational cost of anaphora resolution
- Recast discourse status updates as a ranking problem over discourse entities, using anaphoric relations annotated in a corpus as clues
- The learning-based cache model drastically reduces the search space while preserving accuracy
Future work
- The current procedure resolves zero-anaphora linearly, i.e. each antecedent is selected independently, without taking other zero-pronouns into account
- Recent work has shifted toward more sophisticated approaches that globally optimize the interpretation of all referring expressions in a text, e.g. Poon and Domingos (2008) using Markov Logic Networks
- Incorporate our caching mechanism into such global approaches
Thank you for your kind attention
Feature set used in antecedent identification models
[Slide table not captured in the transcript.]
Overall zero-anaphora resolution
- Investigate the effect of introducing the cache model on overall zero-anaphora resolution, including intra-sentential zero-anaphora resolution
- Compare the zero-anaphora resolution model under different cache sizes
- Based on Iida et al.'s (2006) model, which exploits syntactic patterns as features
Results of overall zero-anaphora resolution
- All models achieved almost the same performance
Static cache model (SCM)
- Based on Grosz and Sidner's (1995) notion of global focus: an entity or set of entities salient throughout the entire discourse
- Characteristics of the SCM:
  - Does not capture the dynamics of the text
  - Selects the N most salient candidates according to a ranking based on the global focus of the text
SCM: training and test phases
[Slide figure, training phase: an annotated text with sentences S1-S4, candidates C1-C10, and zero-pronouns φ_i, φ_j, φ_k, φ_l (C_i: candidate antecedent, φ_j: zero-pronoun); the candidates referred to by some zero-pronoun (C1, C4, C7) are labeled 1st, the rest (C2, C3, C5, C6, C8, C9, C10) 2nd, yielding the training instances.]
[Slide figure, test phase: the trained ranker orders new candidates C'1 ... C'9 (1st: C'1, 2nd: C'6, ..., Nth: C'3) and the N most salient candidates are kept.]
A code sketch follows below.
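In contrast to the DCM's sentence-wise updates, the static model ranks once per text; a sketch under the same assumed ranker interface as the earlier `update_cache`:

```python
from typing import Callable, List

def static_cache(all_candidates: List[str],
                 score: Callable[[str], float],
                 n: int) -> List[str]:
    # SCM: rank every candidate in the text once by a global-focus score
    # and keep the N most salient; the cache is never updated afterwards.
    return sorted(all_candidates, key=score, reverse=True)[:n]
```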
Zero-anaphora resolution process
For a given zero-pronoun φ in sentence S:
1. Intra-sentential anaphora resolution: search for an antecedent A_i in S; if A_i is found, return A_i; otherwise go to step 2
2. Inter-sentential anaphora resolution: search for an antecedent A_j in the cache; if A_j is found, return A_j; otherwise judge φ as exophoric
3. Cache update: rank the candidates in S together with the candidates already retained in the cache
A sketch of this procedure follows below.
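Steps 1 and 2 above as a sketch; `find_intra` and `find_inter` stand in for the antecedent identification models (e.g. the tournament model), and returning None corresponds to judging φ exophoric. Step 3, the cache update, runs once per sentence as in `update_cache` earlier. All names here are illustrative assumptions.

```python
from typing import Callable, List, Optional

def resolve_zero_pronoun(zero_pronoun: str,
                         sentence_entities: List[str],
                         cache: List[str],
                         find_intra: Callable[[str, List[str]], Optional[str]],
                         find_inter: Callable[[str, List[str]], Optional[str]]
                         ) -> Optional[str]:
    # Step 1: intra-sentential resolution within the current sentence.
    antecedent = find_intra(zero_pronoun, sentence_entities)
    if antecedent is not None:
        return antecedent
    # Step 2: inter-sentential resolution restricted to the cache;
    # None means the zero-pronoun is judged exophoric.
    return find_inter(zero_pronoun, cache)
```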
Zero-anaphora
- Zero-anaphor: a gap with an anaphoric function
- Zero-anaphora resolution is becoming important in many applications
- In Japanese, even obligatory arguments of predicates are often omitted when they are inferable from the context: 45% of nominatives are omitted in newspaper articles
Zero-anaphora (cont'd)
Two sub-tasks:
- Anaphoricity determination: determine whether a zero-pronoun is anaphoric
- Antecedent identification: select an antecedent for a given zero-pronoun

Example:
Mary_i-wa John_j-ni (φ_j-ga) tabako-o yameru-youni it-ta.
Mary_i-TOP John_j-DAT (φ_j-NOM) smoking-OBJ quit-COMP say-PAST PUNC
Mary_i told John_j to quit smoking.
(φ_i-ga) tabako-o kirai-dakarada.
(φ_i-NOM) smoking-OBJ hate-BECAUSE PUNC
Because (she_i) hates people smoking.