Implications and applications of corpus-based analysis
Christiane Fellbaum
Princeton University

2 Implications
The linguist is no longer his/her own corpus
Corpus data don't necessarily agree with introspection and intuition
A broad speaker community may reveal the linguist's idiolect

3 Implications
New research methodology requires new analyses
Statistical, "soft" rules rather than hard yes/no rules
Messy theories (hence remaining resistance to CL!)

4 Applications
Gain better theoretical understanding of linguistic phenomena
For lexical semantics work:
--New challenges for lexicographic representation
--Natural Language Processing applications, e.g., text understanding, language generation

5 Two examples
Large-scale corpus analysis of German VP idioms
Discovery of scales

6 Corpus-based study of idioms
Linguists, lexicographers, psycholinguists assume that idioms are fixed
kein Blatt vor den Mund nehmen
no leaf in front of the mouth take
"speak freely and frankly"
Non-compositional, opaque

7 Corpus data show
Morphosyntactic variation:
Ein Blatt nehmen sie dabei vor keinen Mund
a leaf take they in front of no mouth
(topicalization, shift of negation)
Lexical variation:
Ein Regierungssprecher ist ein Mann,
a government spokesman is a man
der sich 100 Blaetter vor den Mund nimmt
who 100 leaves in front of his mouth takes
No theory of idiom grammar/representation accounts for all phenomena

8 Discovering scales
Scalar adjectives (Sheinman & Tokunaga 2009; Schulam & Fellbaum 2010)
…terrible-lousy-bad-mediocre-good-great-outstanding…
Gradable emotions (Fellbaum & Mathieu 2010)
…alarm-frighten-scare-terrify…
Where on the scale are these words placed?
What is their relative position (their strength, intensity)?

9 Discovering scales
Corpus searches with a seed pair reveal lexical-semantic patterns for asymmetry, such as
X even Y (Y is stronger than X)
If not X, at least Y (X is stronger than Y)
X but not Y (X is weaker than Y)
Patterns can be applied to all members of a scale to establish their relative order
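
The pattern-based ordering on this slide can be illustrated with a minimal Python sketch. It is not the procedure used in the cited studies: the toy sentences, the regular-expression renderings of the three patterns, the names `collect_votes` and `order_scale`, and the majority-vote plus topological-sort step are all assumptions made for illustration.

```python
import re
from collections import Counter
from graphlib import TopologicalSorter  # Python 3.9+
from itertools import permutations

# Hypothetical regex renderings of the three patterns on the slide.
# The second field names which slot (x or y) holds the stronger word.
PATTERNS = [
    (r"\b{x},? even {y}\b", "y"),              # X even Y: Y is stronger
    (r"\bif not {x},? at least {y}\b", "x"),   # If not X, at least Y: X is stronger
    (r"\b{x},? but not {y}\b", "y"),           # X but not Y: Y is stronger
]

# Toy stand-in for a corpus (invented sentences, not attested data).
SENTENCES = [
    "The food was good, even great.",
    "The service was, if not great, at least good.",
    "The film was good but not outstanding.",
    "The sequel was great, even outstanding.",
    "The acting was, if not good, at least mediocre.",
]
WORDS = ["great", "mediocre", "outstanding", "good"]


def collect_votes(sentences, words):
    """Count pattern hits as (weaker, stronger) votes for every word pair."""
    votes = Counter()
    for x, y in permutations(words, 2):
        for template, stronger in PATTERNS:
            regex = re.compile(
                template.format(x=re.escape(x), y=re.escape(y)), re.IGNORECASE
            )
            hits = sum(len(regex.findall(s)) for s in sentences)
            if hits:
                pair = (x, y) if stronger == "y" else (y, x)
                votes[pair] += hits
    return votes


def order_scale(words, votes):
    """Order words from weakest to strongest using majority-vote edges."""
    sorter = TopologicalSorter({w: set() for w in words})
    for (weaker, stronger), n in votes.items():
        # Keep an edge only if this direction has more hits than its reverse.
        if n > votes.get((stronger, weaker), 0):
            sorter.add(stronger, weaker)  # weaker must precede stronger
    return list(sorter.static_order())


if __name__ == "__main__":
    print(order_scale(WORDS, collect_votes(SENTENCES, WORDS)))
    # ['mediocre', 'good', 'great', 'outstanding']
```

On real corpus data the votes may conflict; in that case `static_order` raises `graphlib.CycleError`, and a more tolerant ranking step would be needed. Words with no pattern evidence relative to each other are left in an arbitrary relative order.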

10 Conclusion
A corpus may reveal linguistic data that
--challenge current theories
--escape introspection

