Capturing patterns of linguistic interaction in a parsed corpus
A methodological case study
Sean Wallis, Survey of English Usage, University College London
s.wallis@ucl.ac.uk
Capturing linguistic interaction...
Repetition and priming
Parsed corpus linguistics
Experiments
– Attributive AJPs
– Preverbal AVPs
– Embedded postmodifying clauses
Conclusions
– Comparing grammars or corpora
– Potential applications
Repetition and priming
Lexical repetition
– lexical strings and ‘poetics’ (Tannen 1987)
– content words (Church 2000)
Structural priming
Different-structure priming
Intra-structural priming
Repetition and priming
Lexical repetition
Structural priming
– repetition of structure
  A-B priming: the tendency for speaker B to reuse a structure used by speaker A
  self-priming: the tendency for a speaker to reuse their own structure
– the priming effect persists over some distance: multiple sentences in a text; it can last several minutes
– studied by lab experiments (Pickering & Ferreira 2008) and corpus studies (Szmrecsanyi 2006, Gries 2011)
Different-structure priming
Intra-structural priming
Repetition and priming
Lexical repetition
Structural priming
Different-structure priming
– one structure priming or ‘licensing’ another: interaction between particular decisions within a construction, aka “grammatical interaction” (Nelson et al. 2002)
Intra-structural priming
Repetition and priming
Lexical repetition
Structural priming
Different-structure priming
Intra-structural priming
– structural priming within a construction: interaction between elements within a larger structure
– typically self-priming, but could be A-B priming (B completes A’s structure)
– may be negative (e.g. horror aequi)
– requires a sizeable parsed corpus for study
Repetition and priming
“Knock-on” effects
– one structure priming a different structure: grammatical interaction (Nelson et al. 2002), linguistic choice experiments (Wallis 2003)
Intra-structural priming
– structural priming within the construction: interaction between elements within a larger structure
– typically self-priming, but could be A-B priming (where B completes A’s structure)
“Priming” may be negative
– e.g. horror aequi
Requires substantial grammatically annotated data
– a sizeable parsed corpus
Parsed corpus linguistics
Several million-word parsed corpora exist
– each sentence is analysed in the form of a tree
– different languages have been analysed
– limited amount of spontaneous speech data
Commitment to a particular grammar is required
– different schemes have been applied
– problems: computational completeness + manual consistency
Tools support linguistic research in corpora
Parsed corpus linguistics
An example tree from ICE-GB (spoken), S1A-006 #23 [tree diagram not reproduced]
Parsed corpus linguistics
Three kinds of evidence may be obtained from a parsed corpus:
– Frequency evidence of a particular known rule, structure or linguistic event
– Coverage evidence of new rules, etc.
– Interaction evidence of the relationship between rules, structures and events
This evidence is necessarily framed within a particular grammatical scheme
– How might we evaluate this grammar?
Intra-structural priming
Study: repeating an additive step in structures
Consider
– a phrase or clause that may (in principle) be extended ad infinitum, e.g. an NP with a noun head
– a single additive step applied to this structure, e.g. adding an attributive AJP before the head
Q. What is the effect of repeatedly applying the same operation to the structure?
– the ship → the tall ship → the tall green ship…
Experiment 1: attributive AJPs
Adjective phrases before a noun in English
Simple idea: plot the frequency of NPs with at least n = 0, 1, 2, 3… attributive AJPs
Experiment 1: attributive AJPs
Adjective phrases before a noun in English
Simple idea: plot the frequency of NPs with at least n = 0, 1, 2, 3… attributive AJPs
[Charts: raw frequency and log frequency against n. NB: the log-frequency line is not straight]
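The ‘at least n’ counts form a cumulative series, so the plot can be sketched directly from per-NP counts. A minimal Python sketch; all figures below are invented for illustration (the real counts come from ICE-GB):

```python
import math

# Hypothetical counts: number of NPs whose noun head is preceded by
# exactly k attributive AJPs (invented figures, for illustration only).
exact = {0: 100000, 1: 17000, 2: 1900, 3: 150, 4: 9}

# F[n] = number of NPs with AT LEAST n attributive AJPs:
# cumulate from the highest k downwards.
max_n = max(exact)
F = {}
running = 0
for k in range(max_n, -1, -1):
    running += exact.get(k, 0)
    F[k] = running

# Raw and log frequency for each n; a straight log-frequency line
# would fall by the same amount at every step.
for n in range(max_n + 1):
    print(n, F[n], round(math.log10(F[n]), 3))
```

Plotting the third column against n gives the log-frequency chart; with these figures (as in ICE-GB) the successive falls are unequal, i.e. the line is not straight.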
Experiment 1: analysis of results
If the log-frequency line is straight:
– exponential fall in frequency (constant probability)
– no interaction between decisions (cf. coin tossing)
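The connection between the two bullet points can be made explicit. Writing F(n) for the number of NPs with at least n attributive AJPs, a constant additive probability p implies

```latex
\frac{F(n+1)}{F(n)} = p
\quad\Longrightarrow\quad
F(n) = F(0)\,p^{n}
\quad\Longrightarrow\quad
\log F(n) = \log F(0) + n \log p .
```

So log F(n) is linear in n with slope log p; any significant curvature means p varies with n, i.e. successive decisions interact.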
Experiment 1: analysis of results
If the log-frequency line is straight:
– exponential fall in frequency (constant probability)
– no interaction between decisions (cf. coin tossing)
Sequential probability analysis
– calculate the probability of adding each AJP
– error bars (Wilson)
– the probability falls: second < first, third < second, fourth < second
– the decisions interact
[Chart: probability of adding the nth AJP, with error bars; y-axis 0.00–0.20]
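Sequential probability analysis divides successive ‘at least n’ frequencies: the probability of adding AJP number n+1, given n already present, is F(n+1)/F(n), with a Wilson score interval supplying the error bars. A sketch with invented frequencies (the Wilson formula itself is standard):

```python
import math

def wilson(p, n, z=1.96):
    """Wilson score interval for a proportion p observed in n trials."""
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - half, centre + half

# Hypothetical frequencies: F[n] = NPs with at least n attributive AJPs.
F = [119059, 19059, 2059, 159, 9]

probs = []
for n in range(len(F) - 1):
    p = F[n + 1] / F[n]          # probability of adding AJP number n+1
    lo, hi = wilson(p, F[n])
    probs.append(p)
    print(f"p({n + 1}) = {p:.4f}  Wilson interval [{lo:.4f}, {hi:.4f}]")
```

With these figures each probability is lower than the last; where the intervals of successive estimates do not overlap, the fall is significant, which is the pattern the experiment reports.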
Experiment 1: explanations?
Feedback loop: with each successive AJP, it becomes more difficult to add a further AJP
– Explanation 1: logical-semantic constraints
  speakers tend to say tall green ship
  they do not tend to say tall short ship or green tall ship
– Explanation 2: communicative economy
  once a speaker has said tall green ship, they tend subsequently to say only ship
– Explanation 3: memory/processing constraints
  unlikely: this is a small structure
General principle:
– a statistically significant change (fall?) in probability is evidence of an interaction along a grammatical axis
Experiment 2: restricting the head
Common vs. proper noun heads
– common nouns: similar results
– proper nouns appear to behave differently
BUT: classification of adjectives in titular compounds (Northern England vs. lower Loire)
– the result may be an artefact of the annotation
Not every interaction of this type is necessarily a fall (negative)
Experiment 3: speech vs. writing
Spoken vs. written subcorpora
– same overall pattern: the method is robust
– spoken data tends to have fewer attributive AJPs
Support for the communicative economy hypothesis?
Significance tests
– paired Wilson tests (Wallis 2011)
– these allow us to conclude that the first and second observed spoken probabilities are significantly smaller than the written ones
[Chart: probability of adding the nth AJP, written vs. spoken]
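Wallis (2011) applies paired Wilson tests to the spoken/written comparison; a closely related, standard construction is Newcombe’s method, which combines two Wilson intervals into an interval for the difference between independent proportions. A simplified sketch (the counts are invented, and this is not the exact paired test used in the paper):

```python
import math

def wilson(p, n, z=1.96):
    """Wilson score interval for a proportion p observed in n trials."""
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - half, centre + half

def newcombe_diff(p1, n1, p2, n2, z=1.96):
    """Newcombe's interval for p1 - p2 between independent samples."""
    l1, u1 = wilson(p1, n1, z)
    l2, u2 = wilson(p2, n2, z)
    d = p1 - p2
    lower = d - math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = d + math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return lower, upper

# Hypothetical first-AJP probabilities: written vs. spoken.
lo, hi = newcombe_diff(0.18, 60000, 0.14, 59000)
print(f"difference interval [{lo:.4f}, {hi:.4f}]")
# If the interval excludes 0, the written/spoken difference is significant.
```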
Experiment 4: preverbal AVPs
Consider adverb phrases before a verb
– results are very different:
  the probability does not fall significantly between the first and second AVP
  the probability does fall from the second to the third AVP
– possible constraints: (weak) communicative, not (strong) semantic
– further investigation needed
[Chart: probability of adding the nth AVP; y-axis 0.00–0.10]
Experiment 5: embedded clauses
Another way to specify nouns in English: add a clause after the noun to explicate it
– the ship [that was tall and green]
– the ship [in the port]
The clause may be embedded
– the ship [in the port [with the ancient lighthouse]]
or the noun successively postmodified
– the ship [in the port][with a very old mast]
Compare successive embedding and sequential postmodifying clauses
– axis = embedding depth / sequence length
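The axis can be illustrated with the bracketed examples above: embedding depth is the maximum nesting of the [ ] postmodifiers, and sequence length is the number of top-level postmodifiers on the same head. A toy Python sketch over bracket strings, standing in for real FTF queries over parsed trees:

```python
def depth_and_sequence(np_string):
    """Return (max embedding depth, number of top-level [..] postmodifiers)."""
    depth = max_depth = sequence = 0
    for ch in np_string:
        if ch == "[":
            if depth == 0:
                sequence += 1        # a new top-level postmodifying clause
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "]":
            depth -= 1
    return max_depth, sequence

# Embedded postmodification: depth 2, one top-level clause.
print(depth_and_sequence("the ship [in the port [with the ancient lighthouse]]"))  # → (2, 1)

# Sequential postmodification: depth 1, two top-level clauses.
print(depth_and_sequence("the ship [in the port][with a very old mast]"))  # → (1, 2)
```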
Experiment 5: method
Extract examples with FTFs (Fuzzy Tree Fragments)
– at least n levels of embedded postmodification
Experiment 5: method
Extract examples with FTFs
– at least n levels of embedded postmodification
[FTF diagrams for n = 0, 1, 2, etc.]
Experiment 5: method
Extract examples with FTFs
– at least n levels of embedded postmodification [FTF diagrams for n = 0, 1, 2, etc.]
– problems:
  multiple matching cases (use ICECUP IV to classify)
  overlapping cases (subtract the extra case)
  co-ordination of clauses or NPs (use alternative patterns)
Experiment 5: analysis of results
The probability of adding a further embedded postmodifying clause falls
– all data: second < first, third < first
– spoken: second < first
– written: third < second
Compare with the effect of sequential postmodification of the same head
Experiment 5: analysis of results
The probability of sequential postmodifying falls, and for spoken data it falls, then rises
– all data: second < first
– spoken: third > second
– option: count conjoins separately or treat them as a single item
Either way, the results show the same pattern
– negative feedback: the ‘in for a penny’ effect
[Chart: probability of adding the nth postmodifying clause, written vs. spoken; y-axis 0.00–0.15]
Experiment 5: analysis of results
Embedding vs. serial postmodification
– at the second level: embedding > sequence
  it is slightly easier to modify the latest head than a more remote one: semantic constraints? backtracking cost?
– at the third level: embedding < sequence (if counting conjoins)
  long sequences seem to be easier to construct than comparable layers of embedding
Experiment 5: explanations?
Lexical adjacency?
– no: 87% of 2-level cases have at least one VP, NP or clause between the upper and lower heads
Misclassified embedding?
– no: very few (5%) semantically ambiguous cases
Language production constraints?
– possibly; could also be communicative economy
Differences between speech and writing
– spoken data tends towards multiple sequential postmodification more than written
– contrast spontaneous speech with other modes
Conclusions
A new method for evaluating interactions along grammatical axes
– general purpose, robust, structural
– more abstract than ‘linguistic choice’ experiments
– depends on a concept of grammatical distance along an axis, based on the chosen grammar
The method has philosophical implications
– grammar viewed as a structure of linguistic choices
– linguistics as an evaluable observational science
A signature (trace) of language production decisions
– a unification of theoretical and corpus linguistics?
Comparing grammars or corpora
Can we reliably retrieve known interaction patterns with different grammars?
– do these patterns differ across corpora?
Benefits over individual event retrieval
– non-circular: generalisation across local syntax
– not subject to redundancy: arbitrary terms make trends more difficult to retrieve
– not atomic: based on patterns of interaction
– general: patterns may have multiple explanations
Supplements retrieval of events
Potential applications
Corpus linguistics
– optimising the existing grammar, e.g. co-ordination, compound nouns
Theoretical linguistics
– comparing different grammars of the same language
– comparing different languages or periods
Psycholinguistics
– searching for evidence of language production constraints in spontaneous speech corpora
  speech and language therapy
  language acquisition and development
Links and further reading
Survey of English Usage
– www.ucl.ac.uk/english-usage
Corpora and grammar
– .../projects/ice-gb
Full paper
– .../staff/sean/resources/analysing-grammatical-interaction.pdf
Sequential analysis spreadsheet (Excel)
– .../staff/sean/resources/interaction-trends.xls
Empirical evaluation of grammar
Many theories, frameworks and grammars
– no agreed evaluation method exists
– linguistics is divided into competing camps
– the status of parsed corpora is ‘suspect’
Possible method: retrievability of events
– circularity: you get out what you put in
– redundancy: ‘improvement’ by mere addition
– atomic: based on single events, not patterns
– specificity: based on particular phenomena
New method: retrievability of event sequences