Slide 1: CPSC 503 Computational Linguistics. Computational Lexical Semantics, Lecture 14. Giuseppe Carenini. CPSC503, Winter 2009.

Slide 2: Today 23/10. Three well-defined semantic tasks:
– Word Sense Disambiguation (corpus and thesaurus methods)
– Word Similarity (thesaurus and corpus methods)
– Semantic Role Labeling (corpus methods)

Slide 3: WSD example: table + context -> one of senses [1-6]. The noun "table" has 6 senses in WordNet:
1. table, tabular array (a set of data ...)
2. table (a piece of furniture ...)
3. table (a piece of furniture with tableware ...)
4. mesa, table (flat tableland ...)
5. table (a company of people ...)
6. board, table (food or meals ...)

Slide 4: WSD methods.
– Machine learning: supervised, unsupervised
– Dictionary / thesaurus methods (Lesk)

Slide 5: Supervised ML approaches to WSD. The training data is a set of labeled pairs ((word + context_1) -> sense_1), ..., ((word + context_n) -> sense_n). A machine learning algorithm trains a classifier on these pairs; the classifier then maps a new (word + context) instance to a sense.

Slide 6: Training data example. "..after the soup she had bass with a big salad..." Each example is a pair ((word + context)_i -> sense_i). The sense is one of the 8 possible senses for "bass" in WordNet, or, more coarsely, one of the 2 key distinct senses (music vs. fish).

Slide 7: WordNet senses for "bass": music vs. fish. The noun "bass" has 8 senses in WordNet:
1. bass (the lowest part of the musical range)
2. bass, bass part (the lowest part in polyphonic music)
3. bass, basso (an adult male singer with ...)
4. sea bass, bass (flesh of lean-fleshed saltwater fish of the family Serranidae)
5. freshwater bass, bass (any of various North American lean-fleshed ...)
6. bass, bass voice, basso (the lowest adult male singing voice)
7. bass (the member with the lowest range of a family of musical instruments)
8. bass (nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)

Slide 8: Representations for context. GOAL: an informative characterization of the window of text surrounding the target word. Supervised ML requires a simple representation for the training data: vectors of feature/value pairs. TASK: select relevant linguistic information and encode it as a feature vector.

Slide 9: Relevant linguistic information (1). Collocational features: information about the words that appear in specific positions to the left and right of the target word; typically the words and their POS tags. Example text (WSJ): "An electric guitar and bass player stand off to one side, not really part of the scene, ..." Assuming a window of +/-2 around the target, the vector is [word in position -n, POS in position -n, ..., word in position +n, POS in position +n], here [guitar, NN, and, CJC, player, NN, stand, VVB].
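A minimal sketch of this collocational encoding, assuming the sentence is already tokenized and POS-tagged (the function name and padding token are illustrative choices, not from the slides):

```python
def collocational_features(tagged_tokens, target_index, window=2):
    """Words and POS tags in fixed positions around the target word."""
    features = []
    for offset in list(range(-window, 0)) + list(range(1, window + 1)):
        i = target_index + offset
        if 0 <= i < len(tagged_tokens):
            word, pos = tagged_tokens[i]
        else:
            word, pos = "<PAD>", "<PAD>"   # position falls outside the sentence
        features.extend([word, pos])
    return features

# "An electric guitar and bass player stand off ..."
tagged = [("an", "AT0"), ("electric", "AJ0"), ("guitar", "NN"),
          ("and", "CJC"), ("bass", "NN"), ("player", "NN"),
          ("stand", "VVB"), ("off", "AVP")]
print(collocational_features(tagged, target_index=4))
# ['guitar', 'NN', 'and', 'CJC', 'player', 'NN', 'stand', 'VVB']
```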

Slide 10: Relevant linguistic information (2). Co-occurrence features: information about the words that occur anywhere in the window, regardless of position. Find the k content words that most frequently co-occur with the target in the corpus (for bass: fishing, big, sound, player, fly, ..., guitar, band). The vector for one case counts each of these words in the window: [c(fishing), c(big), c(sound), c(player), c(fly), ..., c(guitar), c(band)]. Example text (WSJ): "An electric guitar and bass player stand off to one side, not really part of the scene, ..." -> [0,0,0,1,0,0,0,0,0,0,1,0].
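A matching sketch for the co-occurrence encoding, restricted to the seven co-occurrence words the slide names explicitly (the slide's full vocabulary, and hence its 12-dimensional vector, is larger):

```python
def cooccurrence_vector(tokens, target_index, vocab, window=2):
    """Count occurrences of each vocabulary word anywhere in the window."""
    counts = [0] * len(vocab)
    index = {w: i for i, w in enumerate(vocab)}
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    for i in range(lo, hi):
        if i != target_index and tokens[i] in index:
            counts[index[tokens[i]]] += 1
    return counts

vocab = ["fishing", "big", "sound", "player", "fly", "guitar", "band"]
tokens = "an electric guitar and bass player stand off".split()
print(cooccurrence_vector(tokens, target_index=4, vocab=vocab))
# [0, 0, 0, 1, 0, 1, 0]  -> 'player' and 'guitar' occur in the window
```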

Slide 11: Training data examples. Assume bass-music is encoded as 0 and bass-fish as 1. Collocational training vectors (label last):
[guitar, NN, and, CJC, player, NN, stand, VVB, 0]
[a, AT0, sea, CJC, to, PRP, me, PNP, 1]
[play, VVB, the, AT0, with, PRP, others, PNP, 0]
[...]
Co-occurrence training vectors (label last):
[0,0,0,1,0,0,0,0,0,0,1,0, 0]
[1,0,0,0,0,0,0,0,0,0,0,0, 1]
[1,0,0,0,0,0,0,0,0,0,0,1, 1]
[...]
Unlabeled inputs to the classifier:
[guitar, NN, and, CJC, could, VM0, be, VVI]
[1,1,0,0,0,1,0,0,0,0,0,0]

Slide 12: ML for classifiers. The training data (collocational or co-occurrence vectors) feeds a machine learning algorithm, which produces a classifier. Candidate classifiers: naïve Bayes, decision lists, decision trees, neural nets, support vector machines, nearest-neighbor methods, ...

Slide 13: Naïve Bayes. Choose the sense that maximizes the posterior probability of the sense given the feature vector, under the independence assumption that the features are conditionally independent given the sense: $\hat{s} = \arg\max_{s \in S} P(s \mid \vec{f}\,) \approx \arg\max_{s \in S} P(s) \prod_{j=1}^{n} P(f_j \mid s)$.
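A self-contained sketch of this decision rule with add-one smoothing; the toy feature lists and sense labels below are made up for illustration:

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (features, sense) pairs; returns log-prior and
    smoothed log-likelihood functions implementing the formula above."""
    sense_counts = Counter(sense for _, sense in examples)
    feat_counts = defaultdict(Counter)            # sense -> feature -> count
    vocab = set()
    for feats, sense in examples:
        feat_counts[sense].update(feats)
        vocab.update(feats)
    log_prior = {s: math.log(c / len(examples)) for s, c in sense_counts.items()}
    def log_lik(f, s):                            # add-one (Laplace) smoothing
        total = sum(feat_counts[s].values()) + len(vocab)
        return math.log((feat_counts[s][f] + 1) / total)
    return log_prior, log_lik

def classify(feats, log_prior, log_lik):
    return max(log_prior,
               key=lambda s: log_prior[s] + sum(log_lik(f, s) for f in feats))

data = [(["guitar", "player", "play"], "music"),
        (["sea", "fishing", "boat"], "fish"),
        (["band", "sound", "play"], "music"),
        (["river", "fishing", "catch"], "fish")]
log_prior, log_lik = train_naive_bayes(data)
print(classify(["play", "guitar"], log_prior, log_lik))   # -> music
```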

Slide 14: Naïve Bayes: evaluation. An experiment comparing different classifiers [Mooney 96]: naïve Bayes and a neural network achieved the highest performance, 73%, in assigning one of six senses to "line". Is this good?
– Simplest baseline: "most frequent sense"
– Ceiling: human inter-annotator agreement, 75%-80% on refined sense distinctions (WordNet), closer to 90% for binary distinctions

Slide 15: Bootstrapping. What if you don't have enough data to train a system? Start from a few hand-picked seeds, train a classifier on this small training set, use it to label more data, and feed the newly classified data back in as additional training data.

Slide 16: Bootstrapping: how to pick the seeds.
– Hand-labeling (Hearst 1991): seeds are likely correct and likely to be prototypical.
– One sense per collocation (Yarowsky 1995): e.g., for bass, "play" is strongly associated with the music sense whereas "fish" is strongly associated with the fish sense.
– One sense per discourse: multiple occurrences of a word in one discourse tend to have the same sense.
A schematic bootstrapping loop follows.
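A Yarowsky-style bootstrapping sketch; the confidence threshold, round limit, and function names are illustrative assumptions, not values from the lecture:

```python
def bootstrap(train, seeds, unlabeled, threshold=0.9, max_rounds=5):
    """'train' builds a classifier from labeled (context, sense) pairs;
    the returned classifier maps a context to a (sense, confidence) pair."""
    labeled, pool = list(seeds), list(unlabeled)
    for _ in range(max_rounds):
        classify = train(labeled)          # e.g. the naive Bayes trainer above
        grown, keep = [], []
        for context in pool:
            sense, confidence = classify(context)
            (grown if confidence >= threshold else keep).append((context, sense))
        if not grown:
            break                          # nothing confidently labeled; stop
        labeled += grown                   # confident predictions become training data
        pool = [context for context, _ in keep]
    return labeled
```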

Slide 17: Unsupervised methods [Schütze '98]. Cluster the training instances (word + vector)_1 ... (word + vector)_n into K clusters c_i; hand-label each cluster with a sense (c_1 -> sense_1, ...); then assign a new (word + vector) instance the sense of its closest cluster (vector/cluster similarity).

Slide 18: Agglomerative clustering.
– Assign each instance to its own cluster
– Repeat: merge the two clusters that are most similar
– Until the specified number of clusters is reached
If there are too many training instances -> random sampling. (A sketch follows.)
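A direct transcription of this procedure, using average pairwise cosine similarity between clusters (one common choice; the slide does not fix the cluster-similarity function):

```python
import math

def cosine(v, w):
    dot = sum(a * b for a, b in zip(v, w))
    norm = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w))
    return dot / norm

def agglomerative(instances, k, sim=cosine):
    """Start with singleton clusters; repeatedly merge the two most
    similar clusters until only k clusters remain."""
    clusters = [[x] for x in instances]
    def cluster_sim(a, b):     # average pairwise similarity
        return sum(sim(x, y) for x in a for y in b) / (len(a) * len(b))
    while len(clusters) > k:
        i, j = max(((i, j) for i in range(len(clusters))
                           for j in range(i + 1, len(clusters))),
                   key=lambda ij: cluster_sim(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] += clusters.pop(j)     # merge the most similar pair
    return clusters

print(agglomerative([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], k=2))
# -> [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0]]]
```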

Slide 19: Problems. Given these general ML approaches, how many classifiers do I need to perform WSD robustly? One for each ambiguous word in the language. How do you decide what set of tags/labels/senses to use for a given word? It depends on the application.

Slide 20: WSD: dictionary and thesaurus methods. Most common: the Lesk method. Choose the sense whose dictionary gloss shares the most words with the target word's neighborhood, excluding stop words. Definition: the set of words in the gloss for a sense is called its signature.

Slide 21: Lesk example. Two senses for channel:
S1: (n) channel (a passage for water (or other fluids) to flow through) "the fields were crossed with irrigation channels"; "gutters carried off the rain water into a series of channels under the street"
S2: (n) channel, television channel, TV channel (a television station and its programs) "a satellite TV channel"; "surfing through the channels"; "they offer more than one hundred channels"
Target: "most streets close to the TV station were flooded because the main channel was clogged by heavy rain."
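A simplified-Lesk sketch over this example using NLTK's WordNet interface; it assumes the nltk package with the wordnet and stopwords data installed, and it also folds the example sentences into each signature (one common variant):

```python
from nltk.corpus import wordnet as wn, stopwords

STOP = set(stopwords.words("english"))

def simplified_lesk(word, sentence):
    context = {t.strip('.,;"') for t in sentence.lower().split()} - STOP
    best, best_overlap = None, -1
    for sense in wn.synsets(word):
        # signature = gloss words (plus example-sentence words), minus stop words
        signature = set(sense.definition().lower().split())
        for example in sense.examples():
            signature |= set(example.lower().split())
        overlap = len((signature - STOP) & context)
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best

print(simplified_lesk("channel",
    "most streets close to the TV station were flooded "
    "because the main channel was clogged by heavy rain"))
```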

Slide 22: Corpus Lesk. The best performer among the Lesk variants. If a corpus annotated with senses is available, then for each sense add to its signature the words that frequently appear in sentences containing that sense. Corpus example: "... most streets close to the TV station were flooded because the main channel was clogged by heavy rain. ..."

Slide 23: WSD: more recent trends.
– SemEval workshops; Cross Language Evaluation Forum (CLEF)
– Better ML techniques (e.g., combining classifiers)
– Combining ML and Lesk (Yuret, 2004)
– Other languages
– Building better/larger corpora

Slide 24: Today 22/10. Word Sense Disambiguation; Word Similarity; Semantic Role Labeling.

Slide 25: Word similarity / semantic distance. Actually a relation between two senses: sun vs. moon, mouth vs. food, hot vs. cold. Applications? Thesaurus methods measure distance in online thesauri (e.g., WordNet); distributional methods determine whether the two words appear in similar contexts.

Slide 26: WS: thesaurus methods (path length). Path-length similarity based on is-a hierarchies: $\text{sim}_{\text{path}}(c_1, c_2) = -\log \text{pathlen}(c_1, c_2)$. If we do not have word sense disambiguation, take the maximum over all sense pairs: $\text{wordsim}(w_1, w_2) = \max_{c_1 \in \text{senses}(w_1),\, c_2 \in \text{senses}(w_2)} \text{sim}(c_1, c_2)$.
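For instance, NLTK exposes a path-length measure over WordNet's is-a hierarchy (its score is 1/(shortest path length + 1), a variant of the formula above); this sketch assumes the wordnet data is installed:

```python
from nltk.corpus import wordnet as wn

print(wn.synset("dog.n.01").path_similarity(wn.synset("cat.n.01")))

# Without word sense disambiguation: maximize over all sense pairs.
def word_sim(w1, w2):
    return max((s1.path_similarity(s2) or 0.0)   # None if no connecting path
               for s1 in wn.synsets(w1) for s2 in wn.synsets(w2))

print(word_sim("sun", "moon"))
```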

Slide 27: WS: thesaurus methods (information content). Not all edges are equal, so add probabilistic information derived from a corpus: $P(c)$ is the probability that a randomly selected word is an instance of concept $c$; the information content is $IC(c) = -\log P(c)$; and similarity is based on the information content of the Lowest Common Subsumer, e.g. $\text{sim}_{\text{Resnik}}(c_1, c_2) = -\log P(\text{LCS}(c_1, c_2))$.

Slide 28: WS: thesaurus methods (info content). One of the best performers: the Jiang-Conrath distance, $\text{dist}_{\text{JC}}(c_1, c_2) = IC(c_1) + IC(c_2) - 2 \cdot IC(\text{LCS}(c_1, c_2))$. This is a measure of distance; take its reciprocal for similarity. See also extended Lesk.
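A sketch of the same measure through NLTK, which returns the reciprocal of the Jiang-Conrath distance directly; it assumes the wordnet and wordnet_ic data are installed:

```python
from nltk.corpus import wordnet as wn, wordnet_ic

# Information content estimated from the Brown corpus.
brown_ic = wordnet_ic.ic("ic-brown.dat")
dog, cat = wn.synset("dog.n.01"), wn.synset("cat.n.01")
# NLTK returns a similarity: 1 / (IC(c1) + IC(c2) - 2 * IC(LCS(c1, c2)))
print(dog.jcn_similarity(cat, brown_ic))
```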

Slide 29: Best performers: Jiang-Conrath and extended Lesk. Both are implemented in the WordNet::Similarity package (Pedersen et al. 2004).

Slide 30: WS: distributional methods. You may not have a thesaurus for the target language, and even if you do:
– domain-specific (e.g., technical) words are missing
– hyponym knowledge is poor for verbs, and absent for adjectives and adverbs
– it is difficult to compare senses from different hierarchies
Solution: extract similarity from corpora. Basic idea: two words are similar if they appear in similar contexts.

Slide 31: WS distributional methods (1). Represent the context as a feature vector: f_i is the number of times word w_i appeared in the neighborhood of the target word w (words on a stop list are excluded).

Slide 32: WS distributional methods (2). More informative values (referred to as weights or measures of association in the literature):
– Pointwise mutual information: $\text{assoc}_{\text{PMI}}(w, f) = \log_2 \frac{P(w, f)}{P(w)\,P(f)}$
– t-test: $\text{assoc}_{\text{t-test}}(w, f) = \frac{P(w, f) - P(w)\,P(f)}{\sqrt{P(w)\,P(f)}}$
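Both association measures computed from raw counts, as a sketch; the counts below are made up for illustration:

```python
import math

def assoc_pmi(c_wf, c_w, c_f, n):
    """Pointwise mutual information from raw counts over n context windows."""
    p_wf, p_w, p_f = c_wf / n, c_w / n, c_f / n
    return math.log2(p_wf / (p_w * p_f))

def assoc_ttest(c_wf, c_w, c_f, n):
    """t-test association measure, as in the formula above."""
    p_wf, p_w, p_f = c_wf / n, c_w / n, c_f / n
    return (p_wf - p_w * p_f) / math.sqrt(p_w * p_f)

# Toy counts: the target word occurs 200 times, the feature word 150 times,
# and they co-occur 30 times across 10,000 windows.
print(assoc_pmi(30, 200, 150, 10_000))     # > 0: co-occur more than chance
print(assoc_ttest(30, 200, 150, 10_000))
```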

Slide 33: WS distributional methods (3). Similarity between the two feature vectors $\vec{v}$ and $\vec{w}$: the normalized (weighted) number of overlapping features, which is not sensitive to extreme values:
$\text{sim}_{\text{Jaccard}}(\vec{v}, \vec{w}) = \frac{\sum_i \min(v_i, w_i)}{\sum_i \max(v_i, w_i)}$, $\text{sim}_{\text{Dice}}(\vec{v}, \vec{w}) = \frac{2 \sum_i \min(v_i, w_i)}{\sum_i (v_i + w_i)}$

Slide 34: WS distributional methods (4). Best combination overall (Curran 2003): t-test for the weights, and Jaccard (or Dice) for the vector similarity. (A sketch of the similarity half follows.)
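A sketch of the vector-similarity half of that combination; the weighted vectors below are made-up stand-ins for t-test-weighted context vectors:

```python
def sim_jaccard(v, w):
    """Weighted Jaccard: overlapping feature mass over total feature mass."""
    return (sum(min(a, b) for a, b in zip(v, w)) /
            sum(max(a, b) for a, b in zip(v, w)))

def sim_dice(v, w):
    return 2 * sum(min(a, b) for a, b in zip(v, w)) / (sum(v) + sum(w))

v = [0.0, 1.2, 0.8, 0.0]   # t-test-weighted context vector for word 1
w = [0.5, 1.0, 0.0, 0.0]   # t-test-weighted context vector for word 2
print(sim_jaccard(v, w))   # 1.0 / 2.5 = 0.4
print(sim_dice(v, w))      # 2.0 / 3.5 ~= 0.571
```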

Slide 35: Today 22/10. Word Sense Disambiguation; Word Similarity; Semantic Role Labeling.

Slide 36: Semantic role labeling. Typically framed as a classification problem [Gildea and Jurafsky 2002]:
1. Assign a parse tree to the input
2. Find all predicate-bearing words (PropBank, FrameNet)
3. For each "governing" predicate, determine for each syntactic constituent which role (if any) it plays with respect to the predicate
Common constituent features: predicate, phrase type, head word and its POS, path, voice, linear position, and many others.

Slide 37: Semantic role labeling: example. The constituent's feature vector [issued, NP, Examiner, NNP, NP↑S↓VP↓VBD, active, before, ...] is labeled ARG0. The features, in order: predicate, phrase type, head word and its POS, path, voice, linear position, and many others.
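The same example rendered as the input/output pair a classifier would see; the dict keys are illustrative names for the slide's features, and the constituent is the NP headed by "Examiner":

```python
# One training instance for the SRL classifier, with respect to the
# predicate "issued" (values follow the slide's example vector).
features = {
    "predicate":   "issued",
    "phrase_type": "NP",
    "head_word":   "Examiner",
    "head_pos":    "NNP",
    "path":        "NP↑S↓VP↓VBD",   # constituent-to-predicate path in the parse
    "voice":       "active",
    "position":    "before",        # constituent precedes the predicate
}
label = "ARG0"   # the role the classifier should assign
```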

Slide 38: Next time. Discourse and dialogue: overview of Chapters 21 and 24.

Slide 39: Word similarity: thesaurus methods (extended Lesk). For each n-word phrase that occurs in both glosses, extended Lesk adds in a score of n².

Slide 40: WS: thesaurus methods (summary). Path-length-based similarity on hypernym/hyponym hierarchies; information-content word similarity (not all edges are equal), using corpus-derived probability, information content, and the Lowest Common Subsumer.

