Anaphora resolution in connectionist networks
Florian Niefind, University of Saarbrücken, Institute for Computational Linguistics
Helmut Weldle, University of Freiburg, Centre for Cognitive Science
Workshop "Representation and Processing of Language", University of Freiburg, 20.11.2009
Connectionism and Language
Connectionist approaches to language processing have been applied to various levels and processes (Christiansen & Chater, 1999, 2001)
Sequence processing: Simple Recurrent Networks (SRNs; Elman, 1990, 1991, 1993)
Linguistic representation in SRNs: associationist, probabilistic, distribution-sensitive; constraint satisfaction
…Grammar? …Syntactic structures?
– Categorization by collocation
– Transitions in phase states
Sentence processing in SRNs
Feed-forward network performing word prediction
Context layer provides memory for syntactic context
Probability derivation: context-dependent word (transition) probabilities
Internal representations: syntactic word classes, context-specific features
[Diagram: SRN predicting the next word of "the boy saw …"]
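The word-prediction mechanism described above can be sketched as a minimal Elman-style SRN forward pass. This is an illustrative toy only: the three-word vocabulary, hidden size, and random untrained weights are assumptions, not the simulations reported in this talk.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class SimpleRecurrentNetwork:
    """Minimal Elman SRN: the hidden state at t-1 is copied into a context
    layer and fed back together with the input word at t."""
    def __init__(self, vocab_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        self.W_ih = rng.normal(0, 0.5, (hidden_size, vocab_size))   # input -> hidden
        self.W_ch = rng.normal(0, 0.5, (hidden_size, hidden_size))  # context -> hidden
        self.W_ho = rng.normal(0, 0.5, (vocab_size, hidden_size))   # hidden -> output
        self.context = np.zeros(hidden_size)

    def step(self, word_index):
        x = np.zeros(self.W_ih.shape[1])
        x[word_index] = 1.0                       # localist (one-hot) word encoding
        h = np.tanh(self.W_ih @ x + self.W_ch @ self.context)
        self.context = h                          # context layer = copy of hidden layer
        return softmax(self.W_ho @ h)             # next-word probability distribution

vocab = ["the", "boy", "saw"]
srn = SimpleRecurrentNetwork(vocab_size=3, hidden_size=8)
for w in ["the", "boy"]:                          # present "the boy" word by word
    probs = srn.step(vocab.index(w))
print(probs.shape, round(probs.sum(), 6))
```

Training such a network with backpropagation (not shown) makes these output distributions reflect context-dependent transition probabilities, which is what the slide's "probability derivation" refers to.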
SRNs as models for language processing?
SRNs are merely semantics-free POS taggers (Steedman, 1999, 2002)
Limited systematicity, but see Frank (2006); Brakel & Frank (2009); Frank, Haselager & van Rooij (2009)
Sensitive to irrelevant structural relations (Frank, Mathis & Badecker, 2005)
No extrapolation and variable binding (Marcus, 1998; concerning the eliminative view: Holyoak & Hummel, 2000)
Only structural relations:
– Language grounding: acquisition in a situated fashion (Harnad, 1990; Glenberg, 1997; Barsalou, 1999)
– Connectionist approaches to grounded acquisition (e.g., Cangelosi, 2005; Plunkett et al., 1992; Coventry et al., 2004)
Anaphora Resolution
Anaphora resolution factors (constraints vs. preferences):
– Gender/number agreement
– Semantic consistency
– Salience
– Semantic/syntactic parallelism
Global structural constraints: c-command
Structurally determined complementary binding domains for pronouns and reflexives (Government & Binding theory):
a) Reflexives need a c-commanding NP as antecedent
b) Pronouns must not have a c-commanding NP as antecedent (inside the boundaries of one sentence)
(a) Ken_i, who likes John_j, saw himself_{i/*j}.
(b) Ken_i, who likes John_j, saw him_{j/*i}.
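The binding contrast in (a)/(b) can be made concrete with a toy c-command check. The hand-built tree and the simplified definition (using the parent node rather than the first branching node) are illustrative assumptions, not a parser:

```python
# Toy illustration of Binding Principle A via c-command on a hand-built tree.
# Node labels and the bracketing are made up for this example.

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for c in self.children:
            c.parent = self

def dominates(a, b):
    """True if node a lies on the path from b up to the root."""
    while b is not None:
        if b is a:
            return True
        b = b.parent
    return False

def c_commands(a, b):
    # Simplified: a c-commands b iff a does not dominate b and a's
    # parent dominates b (standing in for the first branching ancestor).
    return (not dominates(a, b)) and a.parent is not None and dominates(a.parent, b)

# "[S [NP Ken [CP who likes John]] [VP saw himself]]"
ken, john = Node("NP:Ken"), Node("NP:John")
rc = Node("CP", [Node("C:who"), Node("V:likes"), john])
subj = Node("NP", [ken, rc])           # complex subject NP headed by "Ken"
refl = Node("NP:himself")
vp = Node("VP", [Node("V:saw"), refl])
s = Node("S", [subj, vp])

# The matrix subject NP c-commands the reflexive; the RC-internal NP does not,
# so only the "Ken" NP is a licit antecedent for "himself".
print(c_commands(subj, refl), c_commands(john, refl))  # True False
```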
Anaphora Resolution
Is online anaphora resolution globally structure-driven?
– Pro: sensitivity to structural binding constraints (Asudeh & Keller, 2001; Badecker & Straub, 2002; but with influences of gender marking by inaccessible antecedents)
– Contra I: against the dominance of exclusively structural principles
  Logophors (Kaiser et al., 2009; Runner, Sussman & Tanenhaus, 2003, 2006)
  Referential commitment (MacWhinney, 2008)
– Contra II: sensitivity to structural constraints, but within a local rather than a global frame
Anaphora Resolution in SRNs
Origins of our studies: investigation of the performance capacity of SRNs (Frank, Mathis & Badecker, 2005)
– How abstract are the grammatical generalizations derived by SRNs?
– Anaphora resolution (subsequently: variable binding)
Acquisition of binding constraints for pronouns and reflexives:
– Lexically complex (variable reference)
– Structurally complex (bridging irrelevant structures)
Architecture: stepwise cascading SRNs
Anaphora resolution in SRNs
[Architecture diagram: word-prediction and reference-assignment components]
Anaphora Resolution in SRNs
Results (Frank, Mathis & Badecker, 2005):
– Word prediction: good performance
– Reference assignment: good performance for simple sentences; bad performance for complex sentences that impose long-distance constraints
– Internal representations reveal the problem: assignment is based on irrelevant structural generalizations, e.g., pronoun/reflexive position after SRCs vs. ORCs
New Approach
SRNs are capable of integrating multiple cues (e.g., Christiansen, Allen & Seidenberg, 1998)
SRNs are capable of processing anaphors (Weldle, Konieczny, Müller, Wolfer & Baumann, 2009), despite restrictions concerning variable binding
– Interesting behaviour and predictions of SRNs
– Behaviour and predictions for anaphora resolution?
– Error correspondence of performance: locality effects, false alarms, local syntactic coherences (Konieczny, Müller & Ruh, 2009)
Improved replication of Frank, Mathis & Badecker (2005): mature grammatical representations by means of complex stimuli; task-driven representations forced by integrated cascading SRNs
Architecture: cascading SRNs
Sentence Processing Component (SPC): 70 hidden/context units, 27 input/output units, localist lexical encoding; word prediction (t+1)
Reference Assignment Component (RAC): 35 hidden/context units, 9 output units (referents); reference assignment (t0)
Input: word-by-word (t0)
Learning rate: 0.2 – 0.02 (gradually decreasing)
Momentum: 0.6
Initial weight range: 0.5
Training: 10 epochs, backpropagation through time
Integrative training allows the SRN to stay sensitive to the structural information required to solve the reference assignment task.
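The cascaded data flow can be sketched with the layer sizes stated on the slide (27 input/output words, 70 SPC hidden units, 35 RAC hidden units, 9 referents). The weights here are random and untrained, so only the wiring is meaningful; how the SPC hidden layer exactly feeds the RAC is an assumption of this sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
softmax = lambda z: (lambda e: e / e.sum())(np.exp(z - z.max()))

# Layer sizes from the slide; initial weight range 0.5 as stated.
N_WORDS, SPC_HID, RAC_HID, N_REF = 27, 70, 35, 9
W_in  = rng.uniform(-0.5, 0.5, (SPC_HID, N_WORDS))   # word -> SPC hidden
W_ctx = rng.uniform(-0.5, 0.5, (SPC_HID, SPC_HID))   # SPC context -> SPC hidden
W_out = rng.uniform(-0.5, 0.5, (N_WORDS, SPC_HID))   # SPC hidden -> next word
V_in  = rng.uniform(-0.5, 0.5, (RAC_HID, SPC_HID))   # cascade: SPC hidden -> RAC
V_ctx = rng.uniform(-0.5, 0.5, (RAC_HID, RAC_HID))   # RAC context -> RAC hidden
V_out = rng.uniform(-0.5, 0.5, (N_REF, RAC_HID))     # RAC hidden -> referents

spc_ctx, rac_ctx = np.zeros(SPC_HID), np.zeros(RAC_HID)

def step(word_idx):
    """One word in: predict the next word (t+1) and the current referent (t0)."""
    global spc_ctx, rac_ctx
    x = np.zeros(N_WORDS)
    x[word_idx] = 1.0                                 # localist lexical encoding
    h = np.tanh(W_in @ x + W_ctx @ spc_ctx); spc_ctx = h
    g = np.tanh(V_in @ h + V_ctx @ rac_ctx); rac_ctx = g
    return softmax(W_out @ h), softmax(V_out @ g)

next_word, referent = step(0)
print(next_word.shape, referent.shape)
```

Because the RAC reads the SPC's hidden layer, training both tasks jointly pressures that hidden layer to retain the structural information reference assignment needs, which is the point of the "integrative training" remark above.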
Training corpus
Artificial training corpus, generated with a PCFG
– 20,000 sentences, presented word-by-word
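Corpus generation from a PCFG can be sketched as weighted top-down rewriting. The rules, probabilities, and English-glossed lexicon below are invented for illustration; the actual training grammar of the simulations is not shown on the slide:

```python
import random

# Tiny illustrative PCFG: nonterminal -> list of (expansion, probability).
# All rules and words here are made up, not the real training grammar.
PCFG = {
    "S":   [(["NP", "V", "ANA"], 0.5), (["NP", "RC", "V", "ANA"], 0.5)],
    "RC":  [(["who", "V", "NP"], 1.0)],
    "NP":  [(["the", "philologist"], 0.5), (["the", "biologist"], 0.5)],
    "V":   [(["sees"], 0.5), (["scratches"], 0.5)],
    "ANA": [(["himself"], 0.5), (["him"], 0.5)],
}

def generate(symbol="S"):
    """Expand a symbol top-down, sampling rules by their probabilities."""
    if symbol not in PCFG:                      # terminal: emit the word
        return [symbol]
    expansions, weights = zip(*PCFG[symbol])
    rule = random.choices(expansions, weights=weights)[0]
    return [word for sym in rule for word in generate(sym)]

random.seed(42)
corpus = [generate() for _ in range(5)]         # word-by-word training material
for sentence in corpus:
    print(" ".join(sentence))
```

Presenting such sentences one word at a time, with the next word as the prediction target, yields exactly the kind of transition statistics an SRN can exploit.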
Test corpora
SRC: Während der Germanist, der den Biologen sieht, sich/ihn kratzte, …
("While the philologist, who sees the biologist, scratched himself/him, …")
ORC: Während der Germanist, den der Biologe sieht, sich/ihn kratzt, …
("While the philologist, who the biologist sees, scratches himself/him, …")
Test sets:
– Common test set
– Complex test set: anaphora resolution and N/V agreement in complex syntactic embeddings
Results
Examination of:
– Output performance for word prediction
– Output performance for reference assignment
– Internal representations at the anaphoric expression
Measure: Grammatical Prediction Error (Christiansen & Chater, 1999)
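A stripped-down version of the Grammatical Prediction Error can be sketched as the share of output activation falling on ungrammatical continuations. Note this is a simplification of Christiansen & Chater's (1999) measure, which additionally scores misses against corpus-derived target probabilities; the activation vector and grammaticality mask below are made-up numbers:

```python
import numpy as np

def gpe(output_activations, grammatical_mask):
    """Simplified Grammatical Prediction Error: fraction of the network's
    output activation assigned to ungrammatical next words (0 = perfect)."""
    act = np.asarray(output_activations, dtype=float)
    ok = np.asarray(grammatical_mask, dtype=bool)
    hits = act[ok].sum()           # activation on grammatical continuations
    false_alarms = act[~ok].sum()  # activation on ungrammatical continuations
    return false_alarms / (hits + false_alarms)

# Hypothetical output distribution over a 5-word vocabulary; the first
# three words are grammatical continuations at this point in the sentence.
acts = np.array([0.4, 0.3, 0.1, 0.15, 0.05])
mask = np.array([True, True, True, False, False])
print(round(gpe(acts, mask), 2))  # 0.2 of the activation is misplaced
```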
Word prediction
"While the philologist, who saw the biologist, scratched him/himself …"
[Results chart]
Reference: pronouns
"While the philologist, who saw the biologist, scratched him …"
[Results chart]
Reference: reflexives
"While the philologist, who saw the biologist, scratched himself …"
[Results chart]
Local syntactic coherences
Analysis of probability vectors at the anaphor position:
– Activations are influenced by the antecedent directly preceding the anaphoric expression
– Locally coherent sub-sequence crossing the RC boundary (cf. converging previous simulation findings: Konieczny, Ruh & Müller, 2009)
a) Enables access to normally inaccessible antecedents
b) Inhibits access to normally accessible antecedents
Internal representations (multivariate statistics):
– Do not reflect dependence on preceding phrase structure
– Categorization highlights gender and agreement marking of the MC subject
– The network develops trans-structural generalizations
Local sub-sequence "…the biologist, scratched himself" in "While the philologist_i, who saw the biologist_j, scratched himself_{i/*j} …"
Local sub-sequence "…the biologist, scratched him" in "While the philologist_i, who saw the biologist_j, scratched him_{j/*i} …"
Conclusions
SRNs with proper prerequisites are in principle capable of anaphora resolution, within the limits of interpolation
Previous results (Frank, Mathis & Badecker, 2005) are most likely simulation artefacts of the architecture, training procedure, and limited grammar
Interference from locally coherent sub-sequences should be seen in terms of error correspondence: a prediction of local coherence effects in anaphora resolution
– Local syntactic coherence effects (Konieczny, 2005; Konieczny et al., 2007, 2009)
– These effects also affect reference assignment (Weldle et al., 2009; Wolfer, previous talk)