Lecture 24: Distributional based Similarity II
Topics: Distributional based word similarity
Readings: NLTK book Chapter 2 (wordnet); Text Chapter 20
April 10, 2013
CSCE 771 Natural Language Processing
CSCE 771 Spring 2013
Overview
Last Time (Programming): examples of thesaurus-based word similarity
  path-similarity: memory fault; sim-path(c1,c2) = -log pathlen(c1,c2)
  Resnik, Lin
  extended Lesk: glosses of words need to include hypernyms
Today: Distributional methods
Readings: Text 19, 20; NLTK Book: Chapter 10
Next Time: Distributional based Similarity II
Figure 20.8: Summary of thesaurus similarity measures
Elderly moment IS-A memory fault IS-A mistake
sim-path: correct in table
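The sim-path measure can be tried on the IS-A chain named on this slide. The following is a minimal sketch, not the NLTK implementation: it assumes pathlen counts edges on the shortest path, uses the natural log, and treats the three-node chain as the whole thesaurus. The helper names `pathlen` and `sim_path` are chosen here for illustration.

```python
import math
from collections import deque

# IS-A edges (child, parent) taken from the slide; a real thesaurus has many more.
IS_A = [("elderly moment", "memory fault"), ("memory fault", "mistake")]

def pathlen(a, b, edges=IS_A):
    """Number of edges on the shortest path between a and b (assumed definition)."""
    graph = {}
    for child, parent in edges:
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    queue, seen = deque([(a, 0)]), {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # no path between the two concepts

def sim_path(a, b, edges=IS_A):
    """sim-path(c1,c2) = -log pathlen(c1,c2); undefined for identical or unconnected nodes."""
    n = pathlen(a, b, edges)
    if not n:
        raise ValueError("sim-path needs two distinct, connected concepts")
    return -math.log(n)

print(sim_path("elderly moment", "mistake"))
```

Under these assumptions a longer path gives a lower (more negative) score, so "elderly moment" is closer to "memory fault" than to "mistake".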
Example: computing PPMI
Need counts, so let's make up some.
We need to edit this table to have counts.
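In the spirit of "make up some counts", here is a small sketch of the PPMI computation. The words, contexts, and counts are invented for illustration, and `ppmi_table` is a hypothetical helper name; PPMI is taken as PMI with negative values replaced by zero.

```python
import math

# Made-up word-by-context co-occurrence counts (illustrative only).
counts = {
    "apricot":     {"pinch": 1, "sugar": 1},
    "pineapple":   {"pinch": 1, "sugar": 1},
    "digital":     {"computer": 2, "data": 1, "result": 1},
    "information": {"computer": 1, "data": 6, "result": 4},
}

def ppmi_table(counts):
    """Return PPMI(w, c) = max(0, log2 P(w,c) / (P(w) P(c))) for every cell."""
    total = sum(sum(row.values()) for row in counts.values())
    word_totals = {w: sum(row.values()) for w, row in counts.items()}
    ctx_totals = {}
    for row in counts.values():
        for c, n in row.items():
            ctx_totals[c] = ctx_totals.get(c, 0) + n
    table = {}
    for w, row in counts.items():
        table[w] = {}
        for c in ctx_totals:
            n = row.get(c, 0)
            if n == 0:
                table[w][c] = 0.0  # unseen pair: PMI is -infinity, clipped to 0
                continue
            p_wc = n / total
            p_w = word_totals[w] / total
            p_c = ctx_totals[c] / total
            table[w][c] = max(0.0, math.log2(p_wc / (p_w * p_c)))
    return table

ppmi = ppmi_table(counts)
print(round(ppmi["information"]["data"], 2))
```

With these counts, "information" and "data" co-occur more often than chance would predict, so the cell is positive, while "digital" and "data" co-occur less often than chance and the cell is clipped to zero.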
Associations
PMI-assoc: assoc_PMI(w, f) = log2 [ P(w, f) / (P(w) P(f)) ]
Lin-assoc (f composed of r (relation) and w'): assoc_LIN(w, f) = log2 [ P(w, f) / (P(w) P(r|w) P(w'|w)) ]
t-test-assoc (20.41)
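To see how the PMI and Lin association measures differ, the sketch below evaluates both on a handful of invented (word, relation, related-word) counts. All counts and the helper names `count`, `assoc_pmi`, and `assoc_lin` are assumptions made for illustration; the Lin measure is written with P(w) P(r|w) P(w'|w) in the denominator, and both functions assume the triple has been seen at least once.

```python
import math
from collections import Counter

# Hypothetical (word, relation, related-word) counts, invented for illustration.
triples = Counter({
    ("drink", "obj", "wine"): 4,
    ("drink", "obj", "tea"): 2,
    ("drink", "subj", "we"): 2,
    ("spill", "obj", "wine"): 1,
    ("spill", "obj", "tea"): 1,
})
N = sum(triples.values())

def count(w=None, r=None, w2=None):
    """Sum counts over triples matching the given fields (None acts as a wildcard)."""
    return sum(n for (a, b, c), n in triples.items()
               if (w is None or a == w)
               and (r is None or b == r)
               and (w2 is None or c == w2))

def assoc_pmi(w, r, w2):
    """log2 P(w,f) / (P(w) P(f)), with the feature f = (r, w2)."""
    p_wf = count(w, r, w2) / N
    p_w = count(w) / N
    p_f = count(r=r, w2=w2) / N
    return math.log2(p_wf / (p_w * p_f))

def assoc_lin(w, r, w2):
    """log2 P(w,f) / (P(w) P(r|w) P(w2|w)), with the feature f = (r, w2)."""
    p_wf = count(w, r, w2) / N
    p_w = count(w) / N
    p_r_given_w = count(w, r) / count(w)
    p_w2_given_w = count(w, w2=w2) / count(w)
    return math.log2(p_wf / (p_w * p_r_given_w * p_w2_given_w))

print(assoc_pmi("drink", "obj", "wine"), assoc_lin("drink", "obj", "wine"))
```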
Figure: Co-occurrence vectors
Dependency-based parser: special case of shallow parsing
Identify from “I discovered dried tangerines.” (20.32):
  discover (subject I)
  I (subject-of discover)
  tangerine (obj-of discover)
  tangerine (adj-mod dried)
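The relation features above can be collected into sparse co-occurrence vectors. This sketch assumes the dependency arcs have already been produced by a parser and are simply written out by hand here; `cooccurrence_vectors` is a hypothetical helper. As a simplification it emits a feature in both directions for every arc, so it yields a few features beyond the four listed on the slide.

```python
from collections import Counter, defaultdict

# Hand-written (head, relation, dependent) arcs for "I discovered dried tangerines."
arcs = [
    ("discover", "subject", "I"),
    ("discover", "obj", "tangerine"),
    ("tangerine", "adj-mod", "dried"),
]

def cooccurrence_vectors(arcs):
    """Map each word to a Counter over (relation, other-word) features."""
    vectors = defaultdict(Counter)
    for head, rel, dep in arcs:
        vectors[head][(rel, dep)] += 1           # e.g. discover (subject I)
        vectors[dep][(rel + "-of", head)] += 1   # e.g. I (subject-of discover)
    return vectors

vectors = cooccurrence_vectors(arcs)
for word, features in vectors.items():
    print(word, dict(features))
```

Run over a large parsed corpus, the same loop accumulates counts, and each word's Counter becomes its feature vector.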
Figure: Objects of the verb drink (Hindle 1990)
Vectors review: dot-product, length, sim-cosine
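The three quantities in this review fit in a few lines. This is a plain-Python sketch for dense vectors of equal length; the function names are chosen here, and returning 0.0 for a zero-length vector is a convention assumed for the example.

```python
import math

def dot(v, w):
    """Dot product: sum of element-wise products."""
    return sum(a * b for a, b in zip(v, w))

def length(v):
    """Euclidean length: square root of the dot product of v with itself."""
    return math.sqrt(dot(v, v))

def sim_cosine(v, w):
    """Cosine of the angle between v and w: dot(v, w) / (|v| |w|)."""
    denom = length(v) * length(w)
    return dot(v, w) / denom if denom else 0.0

print(sim_cosine([1, 2, 3], [2, 4, 6]))
```

Because cosine divides by both lengths, vectors pointing in the same direction score 1 regardless of magnitude, which is why it suits frequency-based word vectors.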
Figure: Similarity of Vectors
Figure: Vector Similarity Summary
Figure: Hand-built patterns for hypernyms (Hearst 1992)
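One hand-built pattern of this kind is "NP such as NP, NP and NP", where the first noun phrase is taken as the hypernym of the listed ones. The sketch below is a rough approximation under stated assumptions: it works on raw text with a regular expression, treats the single word before "such as" as the hypernym, and uses an invented example sentence. A fuller version would match noun-phrase chunks instead.

```python
import re

# Hypernym word, optional comma, "such as", then a list running to the next "." or ";".
SUCH_AS = re.compile(r"(\w+),? such as ([^.;]+)")
# Separators inside the list: ", and ", " or ", ", " and similar.
LIST_SEP = re.compile(r",\s*(?:and|or)\s+|\s+(?:and|or)\s+|,\s*")

def hearst_such_as(sentence):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1)
        for hyponym in LIST_SEP.split(match.group(2)):
            hyponym = hyponym.strip()
            if hyponym:
                pairs.append((hyponym, hypernym))
    return pairs

print(hearst_such_as("She grows fruits such as apples, pears and plums."))
```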
Figure 20.15
Figure 20.16
How to do it in NLTK
NLTK 3.0a1 released: February 2013. This version adds support for NLTK’s graphical user interfaces.
Which similarity function in nltk.corpus.wordnet is appropriate for finding the similarity of two words? I want to use a function for word clustering and the Yarowsky algorithm to find similar collocations in a large text.