Collocationality and how to measure it. Adam Kilgarriff (Lexical Computing Ltd, Lexicography MasterClass Ltd, University of Sussex)
Euralex Sept 2006 Kilgarriff: Collocationality
Collocation: central; acknowledged for 10+ years; stats well-explored; good dictionaries give main collocates.
New question: Which words tend to appear in collocations? Which words are more ‘collocational’?
Dictionary publishers: where should collocation boxes go? Coursebook authors: which words do we use to teach about collocations?
Intuition. Standard words occur equally with lots of other words (flat distribution). Collocational words have a large number of occurrences with a small number of other words (skewed distribution).
Entropy: measures how flat a distribution is; simple maths; well-established (Shannon-Weaver information theory).
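For reference, the standard Shannon entropy can be written as follows. The symbols are introduced here for illustration and are not from the slides: f_i is the number of times the word occurs with its i-th collocate, N is the word's total count in the relation, and p_i is the resulting probability estimate. The base-2 logarithm is an assumption; the slides do not state a base.

```latex
H = -\sum_{i} p_i \log_2 p_i, \qquad p_i = \frac{f_i}{N}
```

H is largest when all the p_i are equal (a flat distribution) and falls to 0 when a single collocate accounts for every occurrence (a maximally skewed distribution).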
Which distribution? First pass: a noun and the verbs it is the object of.
Entropy for advantage (object relation). Table columns: Verb; Freq; MLE-prob (freq/3730); Log; Prob x log. Rows: Take, Gain, Offer, See, Enjoy, Obtain, ……………, Clarify, ……………, Total.
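The arithmetic behind such a table can be sketched roughly as follows. This is a minimal illustration, not the author's code: the verb counts are hypothetical (they are not the BNC figures for advantage), the function name `entropy_table` is invented for this example, and the base-2 logarithm is an assumption.

```python
import math

# Hypothetical verb counts for one noun in the object relation.
# These are NOT the BNC figures from the slide; they only show the arithmetic.
verb_freq = {"take": 60, "gain": 20, "offer": 10, "see": 6, "enjoy": 4}


def entropy_table(freqs):
    """Return (rows, entropy) for a dict mapping collocate -> frequency.

    Each row is (verb, freq, MLE prob, log2 prob, prob x log).
    """
    total = sum(freqs.values())
    rows = []
    for verb, freq in freqs.items():
        prob = freq / total         # MLE probability: freq / total occurrences
        log_prob = math.log2(prob)  # base 2 is an assumption
        rows.append((verb, freq, prob, log_prob, prob * log_prob))
    # Entropy is the negated sum of the "prob x log" column.
    entropy = -sum(row[4] for row in rows)
    return rows, entropy


rows, h = entropy_table(verb_freq)
for verb, freq, prob, log_prob, term in rows:
    print(f"{verb:<6} {freq:>4} {prob:6.3f} {log_prob:8.3f} {term:8.3f}")
print(f"entropy = {h:.3f} bits")
```

With these made-up counts, one verb dominates, so the entropy comes out well below the maximum possible for five collocates (log2 of 5), which is the skew that the intuition slide associates with collocational words.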
Calculate for all nouns found 50+ times as objects in the BNC. Higher-frequency nouns occur with more different verbs, giving an “upward trend” of entropy with frequency. Make adjustment.
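The slides do not say which adjustment was made. One possible approach, shown here purely as an assumption, is to fit a straight line of entropy against log frequency and score each word by its residual, so that words with lower entropy than expected for their frequency stand out. The function name `adjusted_entropy` and the data are hypothetical.

```python
import math


def adjusted_entropy(items):
    """items: list of (word, frequency, entropy) triples.

    Fits entropy = intercept + slope * ln(frequency) by least squares and
    returns {word: residual}. A negative residual means the word's
    distribution is more skewed than expected for its frequency.
    This is one possible adjustment, not necessarily the one in the talk.
    """
    xs = [math.log(freq) for _, freq, _ in items]
    ys = [h for _, _, h in items]
    n = len(items)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {
        word: h - (intercept + slope * x)
        for (word, _, h), x in zip(items, xs)
    }


# Hypothetical (word, frequency, entropy) triples, for illustration only.
data = [("w1", 100, 3.0), ("w2", 1000, 4.0), ("w3", 10000, 5.0), ("w4", 1000, 2.5)]
residuals = adjusted_entropy(data)
most_collocational = min(residuals, key=residuals.get)
print(most_collocational, round(residuals[most_collocational], 3))
```

Ranking by residual rather than raw entropy keeps low-frequency words from looking collocational simply because they have had fewer chances to appear with different verbs.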
place (17881), attention (8476), door (8426), care (4884), step (4277), advantage (3730), rise (3334), attempt (2825), impression (2596), notice (2462), chapter (2318), mistake (2205), breath (2140), hold (1949), birth (1016), living (953), indication (812), tribute (720), debut (714), button (661), eyebrow (649), anniversary (637), mention (615), glimpse (531), suicide (486), toll (472), refuge (470), spokesman (453), sigh (436), birthday (429), wicket (412), appendix (410), pardon (399), precaution (396), temptation (374), goodbye (372), fuss (366), resemblance (350), goodness (288), precedence (285), havoc (270), tennis (266), comeback (260), farewell (228), prominence (228), go-ahead (202), sip (198),
Single-collocate items: take the mickey, come a cropper, enter (or join) the fray, make headway, bear the brunt (of), take the piss, beg pardon, give the go-ahead. Should be in dictionary already.
List of strong collocates is long enough and diverse: attention (draw, pay, attract, give, focus, turn), care (take, provide, need), impression (give, make, get, create), notice (serve, take, give), breath (catch, draw, take, hold), hold (grab, get, take, catch, keep). Useful for dictionaries.
Most collocational verbs (in relation to their objects, BNC data): take (106749), pay (18925), play (17832), raise (15477), spend (15267), open (11362), close (6106), shake (5483), sign (5100), answer (4177), exercise (3265), speak (3013), solve (2555), score (2495), live (2201), waste (2091), thank (1926), pose (1897), fulfil (1885), wait (1768), shut (1675), last (1521), incur (1365), research (1072), devote (1025), age (1009), exert (966), bite (919), park (836), beg (739), slam (634), sip (574), narrow (540), levy (450), nod (433), part (425), adjourn (424), pave (420), clasp (411), ratify (391), reap (376), bridge (337), shrug (324), enlist (322), clench (313), bow (303), wage (299), clap (256), redress (248), dial (232), retrace (205), poll (202), cock (200), coin (194), comb (193), purse (191), grit (170), stake (169), allay (167), wring (157), wag (154), peacekeep (151), fell (147), incline (139), wreak (138), ruffle (136), wrinkle (134), preheat (134), adduce (133), broach (122), foot (121), hunch (109), blink (103), bide (103), disobey (99), whet (89), sclerose (86), jog (85), buck (85), moisten (81), jumble (81), recharge (81), wuther (81), overstep (74), scroll (74), crane (74), hazard (70), mince (66), pervert (65), elapse (60), hesitate (60), grope (59), elbow (57), re-run (57), transact (55), contort (55), redouble (55), immunise (53), pry (52)
Summary: collocationality is a feature of some words; it can be measured; the results are pedagogically promising, both for choosing which words to use to teach collocation and for deciding where to place collocation panels in dictionaries.
Summary, the measure: clean, simple maths; treats each grammatical relation separately (at the moment); demonstrates corpus analysis to find non-obvious facts about words.
The long journey from text towards meaning (diagram labels): Raw text; Pure meaning; Rationalists; Empiricists; lemmatizer; POS-tagger; parser; thesaurus; thematic relations/frame elements.