How dominant is the commonest sense of a word?
Adam Kilgarriff
Lexicography MasterClass / University of Brighton
What do you think? (zero-freq senses don’t count)
The WSD task
- Select the correct sense in context
- Sense inventory given in a dictionary
- Old problem; corpus methods are best
Lower bound
- Gale, Church and Yarowsky 1992
- Baseline system: always choose the commonest sense
- Around 70%
- Only a small sample available
- SEMCOR: bigger sample, still too small
- SENSEVAL: big problem
Overview
- Mathematical model
- Evaluation (against SEMCOR)
- Implications for WSD evaluation
Model: assumptions
- Meanings unrelated
- Word-sense frequency distribution same as word frequency distribution
Model
- All k word senses in a bag
- Randomly select 2 for a 2-sense word
- k(k-1)/2 possible 2-sense words
Set the frequency
- For a 2-sense word with frequency 101, possibilities include:
  - 100:1 split: how many times?
  - 50:51 split: how many times?
Words to model word senses
- Brown, or BNC
- How many types for each frequency
- Smooth to give a monotonic-decreasing curve (sketch below)
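A minimal sketch of this step, assuming the corpus word frequencies are already available as a dict from word to count. The slides do not say which smoother was used, so pool-adjacent-violators stands in here as one way of forcing the curve to be monotonic-decreasing.

```python
from collections import Counter

def freq_of_freq(word_freqs):
    """Map each frequency f to the number of word types occurring f times."""
    return Counter(word_freqs.values())

def smooth_decreasing(values):
    """Pool-adjacent-violators: the least-squares fit of a monotonically
    decreasing sequence to `values`. One smoothing choice among many;
    the slides do not specify theirs."""
    blocks = [[v, 1] for v in values]            # [block mean, block size]
    i = 0
    while i < len(blocks) - 1:
        if blocks[i][0] < blocks[i + 1][0]:      # violates 'decreasing'
            total = blocks[i][0] * blocks[i][1] + blocks[i + 1][0] * blocks[i + 1][1]
            size = blocks[i][1] + blocks[i + 1][1]
            blocks[i] = [total / size, size]     # merge the two blocks
            del blocks[i + 1]
            i = max(i - 1, 0)                    # merged block may violate backwards
        else:
            i += 1
    return [mean for mean, size in blocks for _ in range(size)]

# e.g. type counts at frequencies 1, 2, 3, ...:
# ff = freq_of_freq(word_freqs)
# raw = [ff.get(f, 0) for f in range(1, max(ff) + 1)]
# smooth = smooth_decreasing(raw)
```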
Number of word types having each frequency:
Freq | Brown (raw) | Brown (smooth) | BNC (raw) | BNC (smooth)
…    | …           | …              | …         | …
Using Brown frequencies
- 100:1 split: how many times? 16278 * 11.03 = 179,546
- 50:51 split: how many times? 43.13 * 41.86 = 1805
- Ratio 179,546:1805, i.e. the 100:1 split is 99 times likelier than the 50:51 split
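The same arithmetic as a sketch: a split arises once for every pair of senses with the right frequencies, so its count is the product of the two (smoothed) type counts. The four table entries are the ones quoted on the slide; the `split_count` helper is illustrative.

```python
# Smoothed Brown type counts quoted on the slide:
# 16278 types at frequency 1, 11.03 at 100, 43.13 at 50, 41.86 at 51.
brown = {1: 16278, 50: 43.13, 51: 41.86, 100: 11.03}

def split_count(fof, m, n):
    """Ways to build a 2-sense 'word' of total frequency n whose
    commonest sense has frequency m: pair any sense of frequency m
    with any sense of frequency n - m."""
    return fof.get(m, 0) * fof.get(n - m, 0)

skewed = split_count(brown, 100, 101)   # 16278 * 11.03 ~= 179,546
even = split_count(brown, 51, 101)      # 43.13 * 41.86 ~= 1,805
print(skewed / even)                    # ~99: the skewed split dominates
```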
Generalising
- For a 2-sense word with frequency n:
  - select the 'commonest' sense with frequency m, where n/2 < m < n
  - select the other sense from the subset with frequency n - m
  - find all possible selections
- Calculate the average ratio, commonest:other
- Answers the title question (sketch below)
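A sketch of that loop, reusing the frequency-of-frequency table from above. It reports the commonest sense's expected share of the n occurrences, which is one way of reading "average ratio, commonest:other"; the slide does not pin down the averaging.

```python
def dominance(fof, n):
    """Expected share of the commonest sense in a 2-sense 'word' of
    total frequency n, weighting each split (m, n - m) by the number
    of sense pairs that produce it."""
    weight_sum = 0.0
    share_sum = 0.0
    for m in range(n // 2 + 1, n):          # the strictly commoner sense
        w = fof.get(m, 0) * fof.get(n - m, 0)
        weight_sum += w
        share_sum += w * m / n
    return share_sum / weight_sum if weight_sum else None
```

Running this over the full smoothed table for a range of n, with the analogous triple and quadruple selections for 3- and 4-sense 'words', would give figures of the kind summarised on the next slide.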
Model: answers (BNC)
[Table: n vs. 2-sense, 3-sense and 4-sense 'words'; figures not preserved]
SEMCOR
- 250,000-word corpus
- Manually sense-tagged
- WordNet senses
Evaluate model against SEMCOR
[Table: per frequency class n, # and % for 2-sense and 3-sense words, SEMCOR vs. BNC model; figures not preserved]
Discussion
- Same trend
- Assumption untrue; SFIP principle: a reading must be Sufficiently Frequent and Insufficiently Predictable to get into a dictionary
- generous vs pike
  - generous: donation / person / helping (related readings)
  - pike: fish or weapon or hill or turnpike (unrelated senses)
Discussion
- More data, more meanings (without end)
  - not changing ratios for known senses
  - but adding new senses
- The model fits pike, not generous
- Dominated by singletons
SENSEVAL
- Evaluation exercise for WSD: 1998, 2001, 2004
- Two task-types:
  - Lexical sample: choose a small sample of words and disambiguate multiple instances of each
  - All-words: choose a text or two, disambiguate all words
Lower bound and SENSEVAL
- All-words: samples too small to see the extent of the skew
  - e.g. a 2-sense word with frequency 3: lower bound = 67% (worked below)
- Lexical sample: skew in manual sample selection
  - "good" candidate words show "balance" (amazing)
- Are systems better than the baseline?
  - SENSEVAL-3: systems scarcely beat the baseline
  - Not proven (and not likely)
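Worked out: since zero-frequency senses don't count, a 2-sense word seen n = 3 times can only split 2:1, so the always-choose-commonest baseline scores

```latex
\frac{m}{n} = \frac{2}{3} \approx 67\%
```

However skewed the word's true sense distribution, a sample of three occurrences caps the observable baseline at 67%, which is why small samples understate the skew.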
What is the commonest sense?
- Varies with domain
- Finding it gives more mileage than disambiguation
  - cf. the default strategy in commercial MT
  - McCarthy, Koeling, Weeds and Carroll, ACL-04
- A 3-sentence window does not allow domain-identification methods
- The domain-identification task is more interesting and worthwhile than WSD
Thank you