Grammatical Noriegas interaction in corpora and treebanks ICAME 30 Lancaster 27-31 May 2009 Sean Wallis Survey of English Usage University College London.

1 Grammatical Noriegas interaction in corpora and treebanks ICAME 30 Lancaster 27-31 May 2009 Sean Wallis Survey of English Usage University College London s.wallis@ucl.ac.uk

2 Outline
  The probability of Noriega
  What can a parsed corpus tell us?
  Individual choices
  Repeating choices
  Potential sources of interaction
  Case interaction
  LITEs
  What use is interaction evidence?

3 The probability of Noriega (Church 2000)
  Ken Church looked at word frequency in corpus data
    – Method
        Find the probability of the word occurring overall, pr(w)
        Divide each text into two halves: T1, T2
        Q: What is the probability of the word in T2 if it has already been found in T1, pr(w in T2 | w in T1)?
    – Result
        ‘Content words’ like Noriega leap in probability if seen before: pr(w in T2 | w in T1) >> pr(w in T2)
        Pronouns, determiners, etc.: no change
  [Figure: a text divided into halves T1 and T2]
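The split-half comparison on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Church's implementation: it assumes probabilities are estimated at the level of whole texts (the share of texts containing the word), and the function name `adaptation` and the four toy texts are hypothetical.

```python
def adaptation(texts, word):
    """Estimate pr(w in T2) and pr(w in T2 | w in T1) over a list of texts.

    Each text is a list of tokens, split at its midpoint into T1 and T2.
    Assumption: probabilities are document-level proportions.
    """
    in_t1 = in_t2 = in_both = 0
    for tokens in texts:
        mid = len(tokens) // 2
        t1, t2 = set(tokens[:mid]), set(tokens[mid:])
        if word in t2:
            in_t2 += 1
        if word in t1:
            in_t1 += 1
            if word in t2:
                in_both += 1
    prior = in_t2 / len(texts)
    # The conditional is undefined if the word never occurs in any T1.
    adapted = in_both / in_t1 if in_t1 else None
    return prior, adapted


# Hypothetical toy texts, for illustration only.
texts = [
    "noriega fled and the army sought noriega".split(),
    "the court met and the court ruled".split(),
    "rain fell on the town all day".split(),
    "the general said noriega would return soon".split(),
]
prior, adapted = adaptation(texts, "noriega")
print(prior, adapted)
```

A content word shows `adapted` well above `prior`; for a function word the two values would be expected to stay close.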

4 What can a parsed corpus tell us?
  Parsed corpora contain (lots of) trees
    – Use Fuzzy Tree Fragment (FTF) queries to get data
    – An FTF
    – A matching case in a tree
    – Using ICECUP

5 What can a parsed corpus tell us?
  Three kinds of evidence may be obtained from a parsed corpus
    1. Frequency evidence of a particular known rule, structure or linguistic event
    2. Coverage evidence of new rules, etc.
    3. Interaction evidence of the relationship between rules, structures and events
  Evidence is necessarily framed within a particular grammatical scheme
    – So… (an obvious question) how might we evaluate this grammar?

6 Individual choices (Nelson, Wallis & Aarts 2002)
  What factors affect a lexical / grammatical choice?
    – experiment: does IV → DV?
        Independent Variable (IV) = sociolinguistic or grammatical
        Dependent Variable (DV) = grammatical alternation
    – carry out a χ² test
    – e.g. does the type of preceding NP head affect the choice between relative and non-finite postmodification?
        people who live in Hawaii vs. those living in Hawaii
    – a significant but small interaction
    – for more complex experiments repeat with multiple variables (ICECUP IV)

  IV (rows) = type of NP head; DV (columns) = type of postmodification

             non-fin.     rel.    Total
    N           6,790    6,193   12,983
    PRON          771      446    1,217
    Total       7,561    6,639   14,200
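The χ² test on this slide can be reproduced from the four cell counts. The sketch below computes Pearson's χ² by hand, without a continuity correction (an assumption; the slide does not say which variant was used). The function name `chi_square` is illustrative, and 3.841 is the standard critical value for 1 degree of freedom at p = 0.05.

```python
def chi_square(table):
    """Pearson chi-square for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Cell counts from the slide: rows N and PRON, columns non-finite and relative.
observed = [[6790, 6193],
            [771, 446]]
chi2 = chi_square(observed)
CRITICAL_1DF_P05 = 3.841  # chi-square critical value, 1 d.f., p = 0.05
print(round(chi2, 2), chi2 > CRITICAL_1DF_P05)
```

The statistic comes out far above the critical value, so the association is significant, but with 14,200 cases even a small difference in proportions reaches significance. That is consistent with the slide's "significant but small" reading.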

9 Repeating choices (Wallis, submitted)
  Construction often involves repetition
    – e.g. repeated decisions to add an attributive AJP to specify an NP head: the tall white ship
  Sequential probability analysis
    – calculate probability of adding each AJP
  [Figure: the ship → the tall ship → the tall white ship, adding one AJP at each step]

10 Repeating choices (Wallis, submitted)
  Construction often involves repetition
    – e.g. repeated decisions to add an attributive AJP to specify an NP head: the tall white ship
  Sequential probability analysis
    – calculate probability of adding each AJP
    – probability falls
        second < first
        third < second
        fourth < second
    – choices interact
    – a feedback loop
  [Figure: probability of adding each successive AJP]
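One way to read "calculate probability of adding each AJP" is as a ratio of successive counts: of the NPs that already have k - 1 attributive AJPs, what share go on to have at least k? The sketch below assumes that reading. The function name and the counts are hypothetical and are not figures from the corpus.

```python
def sequential_probabilities(at_least):
    """at_least[k] = number of NPs with at least k attributive AJPs.

    Returns the probability of adding the k-th AJP given k - 1 are present,
    for k = 1, 2, ...
    """
    return [at_least[k] / at_least[k - 1] for k in range(1, len(at_least))]


# Hypothetical counts: all NPs, then NPs with >= 1, >= 2, >= 3 AJPs.
counts = [10000, 2000, 200, 10]
probs = sequential_probabilities(counts)

# If the choices were independent, these probabilities would stay constant.
falling = all(later < earlier for earlier, later in zip(probs, probs[1:]))
print(probs, falling)
```

A constant sequence would indicate that each decision is made independently of the previous one; a falling sequence is the pattern the slide describes as interaction.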

11 Repeating choices - more examples
  Adjectives before a noun
    – similar to AJPs before a noun NP head
  AVPs before a verb
    – no interaction
  NP postmodification, embedded vs. multiple
    – both interact
    – the probability of postmodification of the same head falls faster than that for embedding
  [Figure: probability for multiple vs. embedded postmodification]

12 Potential sources of interaction
  shared context
    – topic or ‘content words’ (Noriega)
  idiomatic conventions
    – semantic ordering of attributive adjectives (tall white ship)
  logical semantic constraints
    – exclusion of incompatible adjectives (?tall short ship)
  communicative constraints
    – brevity on repetition (just say ship next time)
  psycholinguistic processing constraints
    – attention and memory of speakers

13 Case interaction (new research)
  Individual choice experiments
    – measure interaction between variables
    – statistics assume that cases are independent
        we know AJPs in an NP interact: what if we study AJPs?
  Cases from the same text may also interact
  [Figure: interaction between variables vs. interaction between cases]

14 Case interaction (new research)
  Cases should be independent
    – what can we do?
        1. ignore problem
        2. discount ‘obvious’ duplicate cases
        3. randomly subsample
        4. take only one case per text
        5. score each case by the degree to which it interacts with others from the same text
  We need a model of case interaction
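The "take only one case per text" option is simple to sketch. The version below picks one case at random from each text, with a fixed seed for repeatability. The function name, the text identifiers and the case labels are all hypothetical.

```python
import random
from collections import defaultdict


def one_case_per_text(cases, seed=0):
    """Keep a single randomly chosen case from each text.

    cases: iterable of (text_id, case) pairs.
    Returns a dict mapping text_id to the chosen case.
    """
    rng = random.Random(seed)
    by_text = defaultdict(list)
    for text_id, case in cases:
        by_text[text_id].append(case)
    return {text_id: rng.choice(group) for text_id, group in by_text.items()}


# Hypothetical cases, for illustration only.
cases = [
    ("text-001", "relative"),
    ("text-001", "non-finite"),
    ("text-001", "relative"),
    ("text-002", "non-finite"),
    ("text-003", "relative"),
    ("text-003", "relative"),
]
sample = one_case_per_text(cases)
print(sample)
```

This guarantees that no two sampled cases share a text, at the cost of discarding most of the data, which is why the slide goes on to ask for a model that scores cases instead.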

17 Case interaction (new research)
  An a posteriori model of case interaction
    1. classify grammatical relationships between A and B
    2. measure interaction strength dp(A, B) between A and B in each relationship
    3. compute marginal probability for each case A from dependent probabilities dp(A, B), dp(A, C)...

18 Classify grammatical relationships
  Order
    – word order, dominance (parent-child vs. child-parent), etc.
  Topology
    – basic relationship: word, sibling, dominance etc.
  Grammar
    – subclassify topology by grammar
    – e.g. distinguishing co-ordination from other clauses
  Distance
    – steps along an axis and how steps are measured
    – e.g. whether to include all intermediate elements

20 Measure interaction strength
  Previous experiments involved single events
    – Bayesian probability differences (‘swing’)
        Noriega ‘content words’: pr(a | b) – pr(a)
        Repeating choices: pr(a2 | a1) – pr(a1 | a0)
  Interaction between two groups of (alternate) events
    – Difference in probabilities of choice
    – Bayesian dependence dpB
        sum relative probability difference
    – Cramér’s φc
        based on chi-square (χ²)
        not affected by direction
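Two of the measures on this slide have standard definitions and can be sketched directly: the ‘swing’ as a simple difference of probabilities, and Cramér's φ as the square root of χ² divided by N × (k – 1), where k is the smaller of the number of rows and columns. The function names are illustrative. The Bayesian dependence measure dpB is not sketched, because the slide does not give its formula.

```python
from math import sqrt


def swing(p_a_given_b, p_a):
    """Bayesian probability difference: pr(a | b) - pr(a)."""
    return p_a_given_b - p_a


def cramers_phi(table):
    """Cramer's phi for an r x c contingency table given as a list of rows.

    Assumes every row and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    return sqrt(chi2 / (n * (k - 1)))


table = [[30, 10],
         [10, 30]]
transposed = [list(col) for col in zip(*table)]
# Swapping rows and columns leaves phi unchanged ("not affected by direction").
print(cramers_phi(table), cramers_phi(transposed))
```

φ ranges from 0 (no association) to 1 (complete association), which makes it comparable across tables of different sizes in a way that raw χ² is not.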

23 Compute marginal probability
  Find the probability that A is dependent on other cases
    – Suppose two other cases B and C exist with dependent probabilities dp(A, B), dp(A, C), and B and C also interact with φc(B, C)
    – if φc(B, C) = 1 then dp(A) = maximum dp
    – if φc(B, C) = 0 then dp(A) = area
    – interpolate for other values of φc
  Then compute marginal probability
    – ip(A) = 1 – dp(A) + {dp(A) / 2 + φc(B, C)}
  Extend to more than three cases!
  [Figure: dependent and independent regions]
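The two-step calculation on this slide can be sketched as below, but only under several assumptions that the slide itself does not settle: that "area" means the combined probability dp(A, B) + dp(A, C) – dp(A, B) × dp(A, C), that the interpolation is linear in φc, and that the braces in the ip(A) formula group as dp(A) / (2 + φc(B, C)). The function names are hypothetical.

```python
def dependent_probability(dp_ab, dp_ac, phi_bc):
    """Interpolate dp(A) between the two end points given on the slide.

    phi_bc = 1: B and C are fully dependent, so dp(A) = maximum dp.
    phi_bc = 0: B and C are independent, so dp(A) = 'area'.
    Assumptions: 'area' is the probabilistic union; interpolation is linear.
    """
    maximum = max(dp_ab, dp_ac)
    area = dp_ab + dp_ac - dp_ab * dp_ac
    return phi_bc * maximum + (1 - phi_bc) * area


def independent_probability(dp_a, phi_bc):
    """ip(A) = 1 - dp(A) + dp(A) / (2 + phi_bc).

    Assumption: the denominator is the whole of (2 + phi_bc).
    """
    return 1 - dp_a + dp_a / (2 + phi_bc)


dp_a = dependent_probability(0.4, 0.2, 0.5)
print(dp_a, independent_probability(dp_a, 0.5))
```

Under these assumptions a case with no dependence on others keeps a score of 1, and the score falls as dp(A) grows.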

26 LITEs (new research)
  Case interaction models
    – classify grammatical relationships
    – measure interaction strength between two choices
  A legitimate experimental method?
    – cf. transmission experiments in physics
  Linguistic interaction transmission experiments?
  [Figure: emitter, medium, receiver]

28 LITEs (new research)
  A LITE investigates the interaction between two choices in a defined relationship
    – emitter/receiver
        non-finite vs. relative clauses
    – medium
        up+down distance d via a clause C
        co-ordinated clauses; other clauses
    – Plot φc over d
        skip intermediate co-ordination nodes
    – Result
        co-ordination exhibits >1.5x interaction for this choice

29 What use is interaction evidence?
  New methods for evaluating interaction along grammatical axes
    – General purpose, robust, structural
    – Based on grammar in corpus
    – Classifying grammatical relationships allows us to experiment with the corpus grammar
  Methods have philosophical implications
    – Grammar = structure framing linguistic choices
    – Linguistics as an evaluable observational science
        Signature (trace) of language production decisions
    – A unification of theoretical and corpus linguistics?

30 What use is interaction evidence?
  Corpus linguistics
    – Optimising existing grammar
        e.g. co-ordination, compound nouns
  Theoretical linguistics
    – Comparing different grammars, same language
    – Comparing different languages or periods
  Psycholinguistics
    – Search for evidence of language production constraints in spontaneous speech corpora
        speech and language therapy
        language acquisition and development

31 More information
  Useful links
    – Survey of English Usage: www.ucl.ac.uk/english-usage
    – Fuzzy Tree Fragments: www.ucl.ac.uk/english-usage/resources/ftfs
    – Individual choice experiments with FTFs: www.ucl.ac.uk/english-usage/resources/ftfs/experiment.htm
    – To obtain ICE-GB (or DCPSE): www.ucl.ac.uk/english-usage/resources/sales.htm
  References
    Church, K. 2000. Empirical Estimates of Adaptation: The chance of Two Noriegas is closer to p/2 than p². Proceedings of Coling-2000. 180-186.
    Nelson, G., Wallis, S.A. & Aarts, B. 2002. Exploring Natural Language: Working with the British Component of the International Corpus of English. Amsterdam: John Benjamins.
    Wallis, S.A. (submitted). Capturing linguistic interaction in a grammar: a method for empirically evaluating the grammar of a parsed corpus. Language. Available from www.ucl.ac.uk/english-usage/staff/sean/resources/analysing-grammatical-interaction.pdf

