1
Evaluating Models of Computation and Storage in Human Sentence Processing
Thang Luong, Tim J. O’Donnell & Noah D. Goodman
CogACLL 2015
2
What is computed and what is stored? A basic question for theories of language representation, processing, and acquisition.
At the sub-word level (O’Donnell, 2015): – “ness” in pine-scentedness vs. “th” in warmth.
At the phrase level: “kick the bucket” can be computed from “kick”, “the”, “bucket” or stored as a single unit.
Much empirical & theoretical work, but little of it applies to cognitive datasets.
3
Human Sentence Processing
Probabilistic syntax models + incremental parsing algorithms have been used to predict human reading difficulty: – Reading times: (Roark et al., 2009). – Eye fixation times: (Demberg & Keller, 2008).
No work has examined the influence of storage and computation in syntax.
4
This work
Propose a framework to evaluate C&S models, ranging from maximal computation to maximal storage.
Study the influence of storage units in predicting reading difficulty.
[Diagram: C&S models → incremental parser → surprisals → mixed-effects analysis → reading difficulty]
5
Models of computation & storage
Three models of computation & storage (C&S).
Gold parse trees are assumed to be known – can do MAP estimation.
[Diagram: maximal computation (Dirichlet-multinomial PCFGs) – Fragment Grammars – maximal storage (MAP Adaptor Grammars)]
6
C&S Models – Maximal Computation
Dirichlet-Multinomial PCFG (Johnson et al., 2007) – Storage: minimal abstract units (PCFG rules). – Computation: maximal.
Puts too little probability mass on frequent structures.
7
C&S Models – Maximal Storage
MAP Adaptor Grammar (Johnson et al., 2007) – Storage: DMPCFG rules + maximally specific units. – Computation: minimal.
Puts probability mass on too many infrequent structures.
8
C&S Models – Inference-based
Fragment Grammars (O’Donnell et al., 2009) – Storage: inferred – the set of rules that best explains the data: rules in MAG + rules that rewrite to mixtures of non-terminals and terminals. – Computation: optimal.
Makes the right trade-off between storage and computation.
9
Human reading time prediction
[Diagram: C&S models → incremental parser → surprisals → mixed-effects analysis → reading difficulty]
We improve our parser to handle different grammars.
10
Surprisal Theory
Lexical predictability of words given contexts – (Hale, 2001) and (Levy, 2008). – Surprisal value: surprisal(w_i) = -log P(w_i | w_1 … w_{i-1}).
Strong correlation with: – Eye-tracking time: (Demberg and Keller, ’08). – Self-paced reading time: (Roark et al., ’09).
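Below is a minimal sketch (not from the slides) of the surprisal definition above; the conditional model `p_next` is a hypothetical stand-in for whatever probabilistic model supplies P(w_i | w_1 … w_{i-1}).

```python
import math

def surprisal(p_next, prefix, word):
    """Surprisal of `word` given the preceding context, in bits.

    `p_next(prefix, word)` is assumed to return the conditional
    probability P(word | prefix) under some probabilistic model.
    """
    return -math.log2(p_next(prefix, word))

# Toy usage with a hypothetical uniform model over a 10-word vocabulary.
uniform = lambda prefix, word: 1.0 / 10
print(surprisal(uniform, ["the", "dogs"], "chase"))  # log2(10) ≈ 3.32 bits
```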
11
Incremental Parser
Top-down approach for CFGs (Earley, 1970). Earley algorithm for PCFGs (Stolcke, 1995): – Computes prefix probabilities – needed to compute surprisal values: surprisal(w_i) = -log [ P(w_1 … w_i) / P(w_1 … w_{i-1}) ].
Our parser: based on Levy (2008)’s parser. – Additional features to handle different grammars. – Publicly available.
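To illustrate the relation between prefix probabilities and surprisal (this is a sketch of the identity above, not the released parser), the following assumes a list of prefix probabilities such as an Earley parser for PCFGs would output.

```python
import math

def surprisals_from_prefix_probs(prefix_probs):
    """Per-word surprisals (in bits) from prefix probabilities.

    `prefix_probs[i]` is assumed to be P(w_1 ... w_{i+1}): the probability
    mass of all sentences beginning with the first i+1 words.
    surprisal(w_i) = -log2( P(w_1 .. w_i) / P(w_1 .. w_{i-1}) ).
    """
    surprisals, prev = [], 1.0  # P(empty prefix) = 1
    for p in prefix_probs:
        surprisals.append(-math.log2(p / prev))
        prev = p
    return surprisals

# Toy usage: prefix probabilities of "dogs chase cats" under the grammar on
# the Earley slide below (0.5, 0.5, 0.25) give surprisals of 1, 0, 1 bits.
print(surprisals_from_prefix_probs([0.5, 0.5, 0.25]))
```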
12
Incremental parser – Features
Handles arbitrary PCFG rewrite rules (sketched below): – MAP Adaptor Grammars: VP -> kick the bucket – Fragment Grammars: VP -> kick NP
Handles large grammars:

Grammar    # rules
DM-PCFG    75K
FG         146K
MAG        778K
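A minimal sketch (assumed, not the released parser’s actual data structures) of how rewrite rules whose right-hand sides mix terminals and non-terminals, like the MAG and FG rules above, can be represented so that one parser handles all three grammars.

```python
from dataclasses import dataclass
from typing import Tuple

NONTERMINALS = {"S", "NP", "VP", "V"}

def is_terminal(symbol: str) -> bool:
    return symbol not in NONTERMINALS

@dataclass(frozen=True)
class Rule:
    lhs: str               # left-hand-side non-terminal
    rhs: Tuple[str, ...]   # right-hand side: any mix of terminals and non-terminals
    prob: float            # rule probability

# A fully lexicalized MAG-style unit, a partially abstract FG-style fragment,
# and an ordinary PCFG rule, all represented uniformly.
rules = [
    Rule("VP", ("kick", "the", "bucket"), 0.1),
    Rule("VP", ("kick", "NP"), 0.2),
    Rule("VP", ("V", "NP"), 0.7),
]
```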
13
Human reading time prediction
[Diagram: C&S models → incremental parser → surprisals → mixed-effects analysis → reading data]
We show consistent results in two different corpora.
14
Experiments
Grammars: DMPCFG, MAG, FG – trained on WSJ (sentence length < 40 words).
Corpora: – Eye-tracking: Dundee corpus (Kennedy & Pynte, ’05). – Self-paced reading: MIT corpus (Bachrach et al., ’09).

Corpus    Sent     Word    Subj    Orig    Filtered
Dundee    2,370    58K     10      586K    229K
MIT       199      3.5K    23      81K     70K
15
Model Prediction Evaluation
How well do the models predict words in the test data? – Average surprisal (lower is better).
Ranking: FG ≻ DMPCFG ≻ MAG

Avg. surprisal    Dundee    MIT
DMPCFG            6.82      6.80
MAG               6.91      6.95
FG                6.35
16
Evaluation on Cognitive Data
How well do the models explain reading times? – Mixed-effects analysis. – Surprisal values for DMPCFG, MAG, FG as predictors.
Settings: similar to (Fossum and Levy, 2012). – Random effects: by-word and by-subject intercepts. – Eye fixation and reading times: log-transformed.
Nested model comparisons with χ² tests.
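The analyses were presumably run with standard mixed-effects tooling; purely as an illustration of the nested χ² (likelihood-ratio) comparison, here is a Python sketch using statsmodels, with hypothetical column names and only by-subject random intercepts (the actual setup also includes by-word intercepts).

```python
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import chi2

# Hypothetical data: one row per word token, with log-transformed reading time
# and per-model surprisal predictors (column names are assumptions).
data = pd.read_csv("reading_times.csv")  # log_rt, subject, surp_dmpcfg, surp_mag, surp_fg

def fit(formula):
    # ML (not REML) fits so that log-likelihoods of nested models are comparable.
    return smf.mixedlm(formula, data, groups="subject").fit(reml=False)

base = fit("log_rt ~ surp_dmpcfg + surp_mag")             # without the FG predictor
full = fit("log_rt ~ surp_dmpcfg + surp_mag + surp_fg")   # with the FG predictor

# Nested model comparison: chi-squared likelihood-ratio test, 1 extra predictor.
stat = 2 * (full.llf - base.llf)
print(f"chi2 = {stat:.1f}, p = {chi2.sf(stat, df=1):.4g}")
```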
17
Additive tests
Effect of adding each grammar predictor to the base model.
Ranking: FG ≻ DMPCFG ≻ MAG

χ² values        Dundee     MIT
Base + DMPCFG    70.9**     38.5**
Base + MAG       10.9*      0.1
Base + FG        118.3**    62.5**

(**: 99% significant, *: 95% significant)
18
Subtractive tests
Whether each grammar predictor explains variance above and beyond the others.
Ranking: FG ≻ MAG ≻ DMPCFG – DMPCFG doesn’t explain above and beyond FG.

χ² values        Dundee    MIT
Full - DMPCFG    4.0*      3.5*
Full - MAG       14.3**    23.6**
Full - FG        62.5**    42.9**

(**: 99% significant, *: 95% significant)
19
Mixed-effects coefficients
Full setting: with predictors from all models.
MAG is negatively correlated with reading time. – Syntax is still mostly compositional. – Only a small fraction of structures are stored.

Coefficient    Dundee      MIT
DMPCFG         0.00195     0.00324
MAG            -0.00141    -0.00282
FG             0.00549     0.00697
20
Conclusion
Studied the effect of computation & storage in predicting reading difficulty: [Diagram: maximal computation (Dirichlet-multinomial PCFGs) – Fragment Grammars – maximal storage (MAP Adaptor Grammars)]
Provided a framework for future research in human sentence processing.
Thank you!
21
Earley parsing algorithm
Top-down approach developed by Earley (1970): – States – pending derivations: [l, r] X ↦ Y . Z – Operations – state transitions: predict, scan, complete (see the sketch below).

Example: parsing “dogs chase cats” (positions 0–3) with the grammar S ↦ NP VP, VP ↦ V NP, NP ↦ dogs, NP ↦ cats, V ↦ chase:
Position 0 (predict): Root ↦ . S, S ↦ . NP VP, NP ↦ . dogs
Scan “dogs” → position 1: NP ↦ dogs .
Position 1 (complete, predict): S ↦ NP . VP, VP ↦ . V NP, V ↦ . chase
Scan “chase” → position 2: V ↦ chase .
Position 2 (complete, predict): VP ↦ V . NP, NP ↦ . cats
Scan “cats” → position 3: NP ↦ cats .
Position 3 (complete): VP ↦ V NP ., S ↦ NP VP ., Root ↦ S .
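To make predict/scan/complete concrete, here is a minimal Earley recognizer sketch for the toy grammar above (illustrative only: it assumes no epsilon rules and is not the paper’s parser, which additionally tracks probabilities).

```python
# States are (lhs, rhs, dot, origin); chart[i] holds states ending at position i.
GRAMMAR = {
    "Root": [("S",)],
    "S":    [("NP", "VP")],
    "VP":   [("V", "NP")],
    "NP":   [("dogs",), ("cats",)],
    "V":    [("chase",)],
}

def earley_recognize(words, start="Root"):
    n = len(words)
    chart = [set() for _ in range(n + 1)]
    for rhs in GRAMMAR[start]:
        chart[0].add((start, rhs, 0, 0))
    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in GRAMMAR:
                    # PREDICT: expand the non-terminal after the dot at position i.
                    for prod in GRAMMAR[sym]:
                        state = (sym, prod, 0, i)
                        if state not in chart[i]:
                            chart[i].add(state)
                            agenda.append(state)
                elif i < n and words[i] == sym:
                    # SCAN: the next input word matches the terminal after the dot.
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # COMPLETE: advance states in chart[origin] that were waiting for lhs.
                for (l2, r2, d2, o2) in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        state = (l2, r2, d2 + 1, o2)
                        if state not in chart[i]:
                            chart[i].add(state)
                            agenda.append(state)
    return any((start, rhs, len(rhs), 0) in chart[n] for rhs in GRAMMAR[start])

print(earley_recognize(["dogs", "chase", "cats"]))  # True
print(earley_recognize(["dogs", "cats", "chase"]))  # False
```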
22
Earley algorithm for PCFGs (Stolcke, ’95)
Earley path: a sequence of states linked by Earley operations (predict, scan, complete). – Partial derivations correspond to Earley paths. – P(d) = product of the rule probabilities used in predicted states.
Prefix probability: sum of derivation probabilities across all paths yielding a prefix x.
[Figure: Earley paths d_1, d_2, …, d_n each yielding the prefix w_0 w_1 … w_i; prefix probability P(w_0 w_1 … w_i) = P(d_1) + P(d_2) + … + P(d_n)]
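As a sanity check on the definition (this enumerates derivations directly rather than implementing Stolcke’s incremental computation), here is a brute-force sketch for a small, epsilon-free PCFG with a finite language.

```python
# rhs tuples may mix non-terminals and terminals; probabilities per LHS sum to 1.
PCFG = {
    "Root": [(("S",), 1.0)],
    "S":    [(("NP", "VP"), 1.0)],
    "VP":   [(("V", "NP"), 1.0)],
    "NP":   [(("dogs",), 0.5), (("cats",), 0.5)],
    "V":    [(("chase",), 1.0)],
}

def derivations(symbols):
    """Yield (terminal_yield, probability) for every complete derivation of the
    symbol sequence. Assumes a finite language (no recursion, no epsilon rules)."""
    if not symbols:
        yield (), 1.0
        return
    first, rest = symbols[0], symbols[1:]
    if first not in PCFG:                          # terminal: keep it in the yield
        for tail, p in derivations(rest):
            yield (first,) + tail, p
    else:                                          # non-terminal: expand each rule
        for rhs, rule_p in PCFG[first]:
            for tail, p in derivations(rhs + rest):
                yield tail, rule_p * p

def prefix_probability(prefix):
    """Sum of derivation probabilities over all sentences starting with `prefix`."""
    prefix = tuple(prefix)
    return sum(p for sent, p in derivations(("Root",)) if sent[:len(prefix)] == prefix)

print(prefix_probability(["dogs", "chase"]))          # 0.5
print(prefix_probability(["dogs", "chase", "cats"]))  # 0.25
```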