Lecture 24: Distributional Based Similarity II. Topics: Distributional based word similarity. Readings: NLTK book Chapter 2 (wordnet); Text Chapter 20. April.


1 Lecture 24: Distributional Based Similarity II. Topics: Distributional based word similarity. Readings: NLTK book Chapter 2 (wordnet); Text Chapter 20. April 10, 2013. CSCE 771 Natural Language Processing

2 – CSCE 771 Spring 2013. Overview. Last Time (Programming): examples of thesaurus based word similarity; path-similarity – memory fault; sim-path(c1,c2) = -log pathlen(c1,c2); Resnik, Lin; extended Lesk – glosses of words need to include hypernyms. Today: Distributional methods. Readings: Text 19, 20; NLTK Book: Chapter 10. Next Time: Distributional based Similarity II

3 Figure 20.8: Summary of thesaurus similarity measures. Elderly moment IS-A memory fault IS-A mistake. sim-path is correct in the table.

4 Example: computing PPMI. We need counts, so let's make some up. We need to edit this table to have counts.
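Since the slide's table is not available here, the following is a minimal sketch of the PPMI computation on a tiny table of made-up counts. The words, contexts, counts, and the function name `ppmi_table` are all assumptions chosen for illustration; they are not the lecture's table.

```python
import math

# Made-up co-occurrence counts: counts[word][context].
# These numbers are invented for illustration only.
counts = {
    "apricot":     {"sugar": 2, "data": 0},
    "digital":     {"sugar": 0, "data": 2},
    "information": {"sugar": 0, "data": 4},
}

def ppmi_table(counts):
    """Return PPMI(w, c) = max(0, log2(P(w,c) / (P(w) P(c)))) for every cell."""
    total = sum(sum(row.values()) for row in counts.values())
    word_totals = {w: sum(row.values()) for w, row in counts.items()}
    context_totals = {}
    for row in counts.values():
        for c, n in row.items():
            context_totals[c] = context_totals.get(c, 0) + n

    table = {}
    for w, row in counts.items():
        table[w] = {}
        for c, n in row.items():
            if n == 0:
                # log2(0) is undefined; PPMI treats unseen pairs as 0.
                table[w][c] = 0.0
                continue
            p_wc = n / total
            p_w = word_totals[w] / total
            p_c = context_totals[c] / total
            table[w][c] = max(0.0, math.log2(p_wc / (p_w * p_c)))
    return table

ppmi = ppmi_table(counts)
for word, row in ppmi.items():
    print(word, {c: round(v, 3) for c, v in row.items()})
```

With these counts, "apricot" and "sugar" each have probability 0.25 and co-occur with probability 0.25, so their PMI is log2(0.25 / 0.0625) = 2.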

5 Associations. PMI-assoc: $\text{assoc}_{\text{PMI}}(w, f) = \log_2 \frac{P(w,f)}{P(w)\,P(f)}$. Lin-assoc, where $f$ is composed of $r$ (relation) and $w'$: $\text{assoc}_{\text{Lin}}(w, f) = \log_2 \frac{P(w,f)}{P(w)\,P(r|w)\,P(w'|w)}$. t-test-assoc: see (20.41).
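As a rough sketch, the PMI and t-test association measures can be written as functions of the three probabilities. The t-test form used here, (P(w,f) - P(w)P(f)) / sqrt(P(w)P(f)), is an assumption about what equation (20.41) states; the function names and the example probabilities are invented for illustration.

```python
import math

def assoc_pmi(p_wf, p_w, p_f):
    """PMI association: log2 of observed over expected joint probability."""
    return math.log2(p_wf / (p_w * p_f))

def assoc_ttest(p_wf, p_w, p_f):
    """t-test association (assumed form): difference between observed and
    expected joint probability, scaled by sqrt of the expected probability."""
    expected = p_w * p_f
    return (p_wf - expected) / math.sqrt(expected)

# Made-up probabilities: the pair co-occurs four times more often than chance.
pmi_value = assoc_pmi(0.25, 0.25, 0.25)
ttest_value = assoc_ttest(0.25, 0.25, 0.25)

# Independent case: observed equals expected, so both measures are 0.
pmi_indep = assoc_pmi(0.0625, 0.25, 0.25)
ttest_indep = assoc_ttest(0.0625, 0.25, 0.25)

print(pmi_value, ttest_value, pmi_indep, ttest_indep)
```

Both measures are zero when the word and feature are independent and positive when they co-occur more often than chance.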

6 Figure 20.10: Co-occurrence vectors. Dependency based parser – special case of shallow parsing. Features identified from "I discovered dried tangerines." (20.32): discover(subject I), I(subject-of discover), tangerine(obj-of discover), tangerine(adj-mod dried)
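One way to picture how such features become co-occurrence vectors is the sketch below. It assumes the dependency triples have already been produced by a parser and are written by hand here; the triple layout `(head, relation, dependent)` and the function name `feature_vectors` are assumptions for illustration. Each triple contributes a feature in both directions, so it yields a slight superset of the four features listed on the slide.

```python
from collections import Counter, defaultdict

# Hand-written (head, relation, dependent) triples for
# "I discovered dried tangerines." No parser is run here.
triples = [
    ("discover", "subject", "I"),
    ("discover", "obj", "tangerine"),
    ("tangerine", "adj-mod", "dried"),
]

def feature_vectors(triples):
    """Map each word to a sparse vector of (relation, other-word) counts."""
    vectors = defaultdict(Counter)
    for head, rel, dep in triples:
        vectors[head][(rel, dep)] += 1          # e.g. discover(subject I)
        vectors[dep][(rel + "-of", head)] += 1  # e.g. I(subject-of discover)
    return vectors

vectors = feature_vectors(triples)
for word, features in vectors.items():
    print(word, dict(features))
```

Over a large corpus, the counts in each word's vector would then be reweighted with an association measure such as PMI before comparing words.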

7 Figure 20.11: Objects of the verb drink (Hindle 1990)

8 Vectors review: dot product, length, sim-cosine
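The three quantities in this review can be sketched in a few lines of plain Python. The function names and example vectors are assumptions for illustration; vectors are assumed to be equal-length lists of numbers with nonzero length.

```python
import math

def dot(v, w):
    """Dot product: sum of element-wise products."""
    return sum(a * b for a, b in zip(v, w))

def length(v):
    """Euclidean length: square root of the vector's dot product with itself."""
    return math.sqrt(dot(v, v))

def sim_cosine(v, w):
    """Cosine similarity: dot product normalized by both lengths."""
    return dot(v, w) / (length(v) * length(w))

# Made-up vectors: the second is the first scaled by 2, so the angle is 0.
same_direction = sim_cosine([1, 2, 2], [2, 4, 4])
# Orthogonal vectors share no features, so the cosine is 0.
orthogonal = sim_cosine([1, 0], [0, 1])

print(dot([1, 2, 2], [2, 4, 4]), length([1, 2, 2]), same_direction, orthogonal)
```

Normalizing by length is what keeps frequent words, whose raw counts are large, from looking similar to everything.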

9 Figure 20.12: Similarity of vectors

10 Figure 20.13: Vector similarity summary
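The figure itself is not reproduced in this transcript. Assuming the summary covers, alongside cosine, the weighted Jaccard and Dice measures that are commonly used for distributional vectors, a minimal sketch might look like this. The function names and example vectors are invented, and the weights are assumed to be non-negative (for example PPMI values) with at least one nonzero entry.

```python
def sim_jaccard(v, w):
    """Weighted Jaccard: sum of element-wise minima over sum of maxima."""
    return sum(min(a, b) for a, b in zip(v, w)) / sum(max(a, b) for a, b in zip(v, w))

def sim_dice(v, w):
    """Weighted Dice: twice the sum of minima over the sum of all weights."""
    return 2 * sum(min(a, b) for a, b in zip(v, w)) / sum(a + b for a, b in zip(v, w))

# Made-up non-negative feature weights for two words.
v = [1, 2, 0]
w = [2, 1, 1]

jaccard_value = sim_jaccard(v, w)  # minima sum to 2, maxima sum to 5
dice_value = sim_dice(v, w)        # 2 * 2 over a total weight of 7

print(jaccard_value, dice_value)
```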

11 Figure 20.14: Hand-built patterns for hypernyms (Hearst 1992)
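To give a feel for how a hand-built pattern works, here is a rough sketch of one Hearst-style pattern, "X such as Y1, Y2 and Y3", implemented as a regular expression. It is heavily simplified: noun phrases are assumed to be single words, no parsing or lemmatization is done, and the function name `hearst_such_as` and the example sentence are invented for illustration.

```python
import re

# "HYPERNYM such as HYPONYM(, HYPONYM)* ((and|or) HYPONYM)?"
# Longer separators come first so ", and " is not split as ", " + "and".
SUCH_AS = re.compile(
    r"(\w+),? such as ((?:\w+(?:, and |, or |, | and | or ))*\w+)"
)

def hearst_such_as(text):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1)
        hyponyms = re.split(r",? and |,? or |, ", match.group(2))
        pairs.extend((hyponym, hypernym) for hyponym in hyponyms)
    return pairs

pairs = hearst_such_as("He studied fruits such as apples, pears and oranges.")
print(pairs)
```

A realistic version would match noun-phrase chunks rather than single words, which is why such patterns are usually applied after shallow parsing.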

12 Figure 20.15

13 Figure 20.16

14 How to do this in NLTK: http://www.cs.ucf.edu/courses/cap5636/fall2011/nltk.pdf
NLTK 3.0a1 released: February 2013. This version adds support for NLTK's graphical user interfaces. http://nltk.org/nltk3-alpha/
Question: Which similarity function in nltk.corpus.wordnet is appropriate for finding the similarity of two words? I want to use a function for word clustering and the Yarowsky algorithm to find similar collocations in a large text.
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Linguistics
http://en.wikipedia.org/wiki/Portal:Linguistics
http://en.wikipedia.org/wiki/Yarowsky_algorithm
http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html

