
CS626/449 : Speech, NLP and the Web/Topics in AI Programming (Lecture 9: Resnick’s measures of word Similarity; coverage of Jiang and Conrath, 1997) Pushpak.


1 CS626/449: Speech, NLP and the Web/Topics in AI Programming (Lecture 9: Resnik’s measures of word similarity; coverage of Jiang and Conrath, 1997) Pushpak Bhattacharyya, CSE Dept., IIT Bombay

2 Path-length-based similarity between house and lock
[Figure: house, which has 12 senses; for sense 1, the nodes house, study, wall, doorway, door and lock joined by has-part links (a belongs-to link is also shown)]

3 Properties that a path-length-based measure should satisfy
– Zero property: the self-distance is 0, d(A,A) = 0
– Symmetric property: d(A,B) = d(B,A)
– Positive property: d is always non-negative, d(A,B) ≥ 0
– Triangle inequality: d(A,C) ≤ d(A,B) + d(B,C)
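The following is a small sketch of path-length distance. It assumes a hypothetical chain of has-part links: house, study, wall, doorway, door, lock. The figure on slide 2 names these nodes, but its exact edges are not recoverable here. The sketch computes path length by breadth-first search on the undirected graph, then checks the four properties above over every node pair and triple. The helper names are illustrative.

```python
from collections import deque
from itertools import product

# Hypothetical has-part chain for sense 1 of "house"; the edges are an assumption.
EDGES = [("house", "study"), ("study", "wall"), ("wall", "doorway"),
         ("doorway", "door"), ("door", "lock")]


def build_graph(edges):
    """Undirected adjacency map: path length ignores link direction."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def path_length(graph, src, dst):
    """Number of edges on a shortest path, found by breadth-first search."""
    if src == dst:
        return 0
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph[node]:
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return float("inf")  # the two nodes are not connected


graph = build_graph(EDGES)
nodes = sorted(graph)
d = {(a, b): path_length(graph, a, b) for a, b in product(nodes, repeat=2)}

# Check the four properties listed on the slide.
assert all(d[a, a] == 0 for a in nodes)                               # zero
assert all(d[a, b] == d[b, a] for a, b in product(nodes, repeat=2))   # symmetric
assert all(d[a, b] >= 0 for a, b in product(nodes, repeat=2))         # positive
assert all(d[a, c] <= d[a, b] + d[b, c]
           for a, b, c in product(nodes, repeat=3))                   # triangle
print(d["house", "lock"])  # 5
```

Shortest-path distance on any connected undirected graph satisfies all four properties, so the assertions pass for any edge set chosen here.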

4 Motivating Resnik’s measure through the hypernymy (is-a) hierarchy
Sense 1
lock -- (a fastener fitted to a door or drawer to keep it firmly closed)
=> fastener, fastening, holdfast, fixing -- (restraint that attaches to something or holds something in place)
=> restraint, constraint -- (a device that retards something's motion; "the car did not have proper restraints fitted")
=> device -- (an instrumentality invented for a particular purpose; "the device is small enough to wear on your wrist"; "a device intended to conserve water")
=> instrumentality, instrumentation -- (an artifact (or system of artifacts) that is instrumental in accomplishing some end)
=> artifact, artefact -- (a man-made object taken as a whole)
=> whole, unit -- (an assemblage of parts that is regarded as a single entity; "how big is that part compared to the whole?"; "the team is a unit")
=> object, physical object -- (a tangible and visible entity; an entity that can cast a shadow; "it was full of rackets, balls and other objects")
=> physical entity -- (an entity that has physical existence)
=> entity -- (that which is perceived or known or inferred to have its own distinct existence (living or nonliving))

5 House: sense 1
house -- (a dwelling that serves as living quarters for one or more families; "he has a house on Cape Cod"; "she felt she had to get out of the house")
=> dwelling, home, domicile, abode, habitation, dwelling house -- (housing that someone is living in; "he built a modest dwelling near the pond"; "they raise money to provide homes for the homeless")
=> housing, lodging, living accommodations -- (structures collectively in which people are housed)
=> structure, construction -- (a thing constructed; a complex entity constructed of many parts; "the structure consisted of a series of arches"; "she wore her hair in an amazing construction of whirls and ribbons")
=> artifact, artefact -- (a man-made object taken as a whole)
=> whole, unit -- (an assemblage of parts that is regarded as a single entity; "how big is that part compared to the whole?"; "the team is a unit")
=> object, physical object -- (a tangible and visible entity; an entity that can cast a shadow; "it was full of rackets, balls and other objects")
=> physical entity -- (an entity that has physical existence)
=> entity -- (that which is perceived or known or inferred to have its own distinct existence (living or nonliving))
Overlap: from artifact, artefact upward, this chain coincides with the hypernym chain of lock (sense 1).

6 House: sense 2
Sense 2
house -- (an official assembly having legislative powers; "a bicameral legislature has two houses")
=> legislature, legislative assembly, legislative, general assembly, law-makers -- (persons who make or amend or repeal laws)
=> assembly -- (a group of persons gathered together for a common purpose)
=> gathering, assemblage -- (a group of persons together in one place)
=> social group -- (people sharing some social relation)
=> group, grouping -- (any number of entities (members) considered as a unit)
=> abstraction -- (a general concept formed by extracting common features from specific examples)
=> abstract entity -- (an entity that exists only abstractly)
=> entity -- (that which is perceived or known or inferred to have its own distinct existence (living or nonliving))

7 House: sense 11
Sense 11
sign of the zodiac, star sign, sign, mansion, house, planetary house -- ((astrology) one of 12 equal areas into which the zodiac is divided)
=> region, part -- (the extended spatial location of something; "the farming regions of France"; "religions in all parts of the world"; "regions of outer space")
=> location -- (a point or extent in space)
=> object, physical object -- (a tangible and visible entity; an entity that can cast a shadow; "it was full of rackets, balls and other objects")
=> physical entity -- (an entity that has physical existence)
=> entity -- (that which is perceived or known or inferred to have its own distinct existence (living or nonliving))
Overlap: from object, physical object upward, this chain coincides with the hypernym chain of lock (sense 1).
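The overlaps can be made concrete with a short sketch. It copies the hypernym chains for lock (sense 1) and house (senses 1, 2 and 11) from the slides, keeping only the first synonym of each synset; the sense-11 head is relabelled house. For each house sense it finds the lowest concept shared with lock and counts the is-a edges from both words up to that concept. The function names are illustrative, not from the lecture.

```python
# Hypernym chains from the slides, bottom-up, first synonym of each synset only.
LOCK = ["lock", "fastener", "restraint", "device", "instrumentality",
        "artifact", "whole", "object", "physical entity", "entity"]
HOUSE = {
    1: ["house", "dwelling", "housing", "structure", "artifact", "whole",
        "object", "physical entity", "entity"],
    2: ["house", "legislature", "assembly", "gathering", "social group",
        "group", "abstraction", "abstract entity", "entity"],
    11: ["house", "region", "location", "object", "physical entity", "entity"],
}


def lowest_common_subsumer(chain_a, chain_b):
    """Walk up chain_a and return the first node that also appears in chain_b."""
    ancestors_b = set(chain_b)
    for node in chain_a:
        if node in ancestors_b:
            return node
    return None


def hypernym_path_length(chain_a, chain_b):
    """Is-a edges from each word up to their lowest common subsumer, summed."""
    lcs = lowest_common_subsumer(chain_a, chain_b)
    return chain_a.index(lcs) + chain_b.index(lcs)


for sense, chain in HOUSE.items():
    print(sense, lowest_common_subsumer(LOCK, chain),
          hypernym_path_length(LOCK, chain))
# 1 artifact 9
# 2 entity 17
# 11 object 10
```

By this path count, sense 1 of house is the sense closest to lock. The unrelated sense 2 meets lock only at the root entity.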

8 Measures of semantic relatedness: Resnik
The Resnik measure
– An information-content (IC) based relatedness measure
– Concepts specific to particular topics have high information content; general concepts have low information content (carving fork: high IC; entity: low IC)
– The idea: two concepts are semantically related in proportion to the amount of information they share

9 Sense-marked corpora: SemCor
Example: "He succeeds Buck_Shaw, who retired at the end of last season."

10 Measures of semantic relatedness
– Considers the position of nouns in the is-a hierarchy
– Semantic relatedness (SR) is determined by the information content of the lowest common concept that subsumes both concepts
– For example, nickel and dime are subsumed by coin; nickel and credit card by medium of exchange
– P(c) is the probability of encountering concept c
– If a is-a b, then P(a) ≤ P(b)
– Information content is calculated as $\mathrm{IC}(c) = -\log P(c)$
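Here is a minimal sketch of how P(c) and IC(c) can be estimated. It assumes a hypothetical toy taxonomy built on the nickel, dime and credit card example, with made-up corpus counts. Each occurrence of a concept also counts for every concept that subsumes it. This is why P(a) ≤ P(b) holds whenever a is-a b. The logarithm base (2 here, giving bits) is a free choice.

```python
import math

# Hypothetical toy taxonomy (child -> parent) and made-up corpus counts.
PARENT = {"nickel": "coin", "dime": "coin", "coin": "medium of exchange",
          "credit card": "medium of exchange"}
COUNTS = {"nickel": 30, "dime": 20, "coin": 10, "credit card": 40,
          "medium of exchange": 0}


def ancestors(concept):
    """The concept itself followed by every concept that subsumes it."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def concept_probabilities(counts):
    """P(c): each occurrence of a concept also counts for all its subsumers."""
    freq = {c: 0 for c in counts}
    for concept, n in counts.items():
        for anc in ancestors(concept):
            freq[anc] += n
    total = sum(counts.values())
    return {c: f / total for c, f in freq.items()}


def information_content(p):
    """IC = -log P, written as log(1/P); base 2 gives bits."""
    return math.log2(1 / p)


P = concept_probabilities(COUNTS)
IC = {c: information_content(p) for c, p in P.items()}

# If a is-a b then P(a) <= P(b), so IC never increases going up the hierarchy.
assert all(P[c] <= P[PARENT[c]] for c in PARENT)
print(round(IC["coin"], 3), IC["medium of exchange"])  # 0.737 0.0
```

The root always gets probability 1 and therefore IC 0, matching the intuition that a very general concept like entity carries little information.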

11 Measures of semantic relatedness (contd.)
– Relatedness is thus given by $\mathrm{sim}_{\mathrm{res}}(c_1, c_2) = \mathrm{IC}(\mathrm{LCS}(c_1, c_2))$
– Considers neither the information content of the concepts themselves nor the path length
– A problem: many concepts may share the same subsumer and therefore receive the same score
– Inappropriate word senses can yield high scores, e.g. tobacco and horse
– Newer methods: the Jiang-Conrath, Lin and Leacock-Chodorow measures
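A sketch of $\mathrm{sim}_{\mathrm{res}}(c_1, c_2) = \mathrm{IC}(\mathrm{LCS}(c_1, c_2))$ follows. It uses a hypothetical toy taxonomy with made-up probabilities, so the numbers are illustrative only. It also shows the tie problem mentioned above: every pair of coins shares the subsumer coin and so gets exactly the same score.

```python
import math

# Hypothetical toy taxonomy (child -> parent) with made-up probabilities P(c);
# each parent's probability is at least the sum of its children's.
PARENT = {"nickel": "coin", "dime": "coin", "penny": "coin",
          "coin": "medium of exchange", "credit card": "medium of exchange"}
P = {"nickel": 0.2, "dime": 0.15, "penny": 0.25, "coin": 0.7,
     "credit card": 0.3, "medium of exchange": 1.0}


def ancestors(c):
    """c itself followed by all of its subsumers, bottom-up."""
    chain = [c]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def lcs(c1, c2):
    """Lowest common subsumer: first ancestor of c1 that also subsumes c2."""
    up2 = set(ancestors(c2))
    return next(a for a in ancestors(c1) if a in up2)


def sim_res(c1, c2):
    """Resnik similarity: IC(LCS(c1, c2)) = -log2 P(LCS)."""
    return math.log2(1 / P[lcs(c1, c2)])


print(lcs("nickel", "dime"), lcs("nickel", "credit card"))
# coin medium of exchange
assert sim_res("nickel", "dime") > sim_res("nickel", "credit card")
# Every pair of coins shares the subsumer "coin", so they all tie:
assert sim_res("nickel", "dime") == sim_res("dime", "penny")
```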

12 Similarity for words with multiple senses, where sen(w) denotes the set of possible senses of word w.
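Resnik's word-level measure takes the maximum concept similarity over all sense pairs: sim(w1, w2) = max over c1 ∈ sen(w1), c2 ∈ sen(w2) of sim(c1, c2). A sketch, assuming that definition, follows. The sense labels and concept scores are hypothetical placeholders; in practice the scores would come from IC(LCS).

```python
from itertools import product
from typing import Callable, Dict, List


def word_similarity(w1: str, w2: str, senses: Dict[str, List[str]],
                    concept_sim: Callable[[str, str], float]) -> float:
    """Max of concept_sim(c1, c2) over c1 in sen(w1) and c2 in sen(w2)."""
    return max(concept_sim(c1, c2)
               for c1, c2 in product(senses[w1], senses[w2]))


# Hypothetical sense labels and hypothetical concept scores, for illustration only.
SENSES = {"house": ["house#1", "house#2", "house#11"], "lock": ["lock#1"]}
SCORES = {("house#1", "lock#1"): 2.5, ("house#2", "lock#1"): 0.0,
          ("house#11", "lock#1"): 1.1}

best = word_similarity("house", "lock", SENSES, lambda a, b: SCORES[(a, b)])
print(best)  # 2.5
```

Taking the maximum means a single well-matched sense pair decides the score. This is also how an unintended sense can inflate similarity, as in the tobacco and horse example.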

13 Relevant formulae: Classes(w) is the number of senses word w has; Words(c) is the set of words subsumed (directly or indirectly) by class c.
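The slide's exact formula is not in the transcript. One plausible estimate consistent with these definitions works in three steps. First, split each occurrence of word w equally among its Classes(w) senses. Next, credit each sense's share to every class that subsumes it. Finally, divide by the corpus size N to get P(c). The sketch below assumes that reading; the words, senses, counts and taxonomy are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy data: word counts, each word's senses, and a
# child -> parent map for the taxonomy.
COUNTS = {"nickel": 10, "dime": 6, "copper": 4}
CLASSES = {"nickel": ["nickel (coin)", "nickel (metal)"],
           "dime": ["dime (coin)"], "copper": ["copper (metal)"]}
PARENT = {"nickel (coin)": "coin", "dime (coin)": "coin",
          "nickel (metal)": "metal", "copper (metal)": "metal",
          "coin": "entity", "metal": "entity"}


def subsumers(c):
    """c itself plus every ancestor of c."""
    out = [c]
    while out[-1] in PARENT:
        out.append(PARENT[out[-1]])
    return out


def class_frequencies(counts, classes):
    """Split each word's count equally over its senses, then credit every
    sense's share to all classes that subsume that sense."""
    freq = defaultdict(float)
    for word, n in counts.items():
        share = n / len(classes[word])
        for sense in classes[word]:
            for c in subsumers(sense):
                freq[c] += share
    return freq


freq = class_frequencies(COUNTS, CLASSES)
N = sum(COUNTS.values())
P = {c: f / N for c, f in freq.items()}
IC = {c: math.log2(1 / p) for c, p in P.items()}
print(P["coin"], P["metal"], P["entity"])  # 0.55 0.45 1.0
```

Because each word contributes its full count exactly once overall, the root still gets probability 1. An ambiguous word such as nickel spreads its evidence across both branches rather than inflating either.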

14 Example of Resnik similarity in action

15 Structural characteristics of a hierarchical network
– Local network density (the number of child links that span out from a parent node): in the plant/flora section of WordNet, the hierarchy is very dense
– Depth of a node in the hierarchy: distance shrinks as one descends the hierarchy, since differentiation is based on finer and finer details
– Type of link
– Strength of an edge link: corpus statistics have to play a role; theoretical soundness and computational efficiency are needed

16 Link strength: probability and IC theoretic
The strength of a child link is proportional to the conditional probability of encountering an instance of the child concept $c_i$ given an instance of its parent concept $p$: $P(c_i \mid p)$.
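Jiang and Conrath define the link strength as the negative log of that conditional probability, LS(c_i, p) = -log P(c_i | p). Every instance of c_i is also an instance of p, so P(c_i | p) = P(c_i)/P(p). LS therefore reduces to IC(c_i) - IC(p). The sketch below checks this identity on a hypothetical is-a chain with made-up probabilities.

```python
import math

# Hypothetical is-a chain (child -> parent) with made-up probabilities.
PARENT = {"carving fork": "fork", "fork": "cutlery", "cutlery": "entity"}
P = {"carving fork": 0.001, "fork": 0.01, "cutlery": 0.05, "entity": 1.0}


def ic(c):
    """IC(c) = -log2 P(c)."""
    return math.log2(1 / P[c])


def link_strength(child):
    """LS(c, p) = -log2 P(c | p), with P(c | p) = P(c) / P(p)."""
    parent = PARENT[child]
    return math.log2(P[parent] / P[child])


for child in PARENT:
    # The link strength equals the IC gained by moving from parent to child.
    assert math.isclose(link_strength(child), ic(child) - ic(PARENT[child]))
    print(child, "->", PARENT[child], round(link_strength(child), 3))
```

A link is strong when the child is much rarer than its parent, that is, when specialising from p to c_i adds a lot of information.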

17 Link strength: intuition, actual formula, formulation
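Take the simplest case and ignore the density, depth and link-type factors. Jiang and Conrath's distance is then the sum of link strengths along the path joining two concepts through their lowest super-ordinate (LSO). The sum telescopes to Dist(c1, c2) = IC(c1) + IC(c2) - 2 IC(LSO(c1, c2)). The sketch computes both forms on a hypothetical taxonomy with made-up probabilities and checks that they agree. The function names are illustrative.

```python
import math

# Hypothetical toy taxonomy (child -> parent) and made-up probabilities.
PARENT = {"nickel": "coin", "dime": "coin", "coin": "money",
          "credit card": "money"}
P = {"nickel": 0.2, "dime": 0.15, "coin": 0.7, "credit card": 0.1,
     "money": 1.0}


def ic(c):
    return math.log2(1 / P[c])  # IC(c) = -log2 P(c)


def ancestors(c):
    chain = [c]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def lso(c1, c2):
    """Lowest super-ordinate: first ancestor of c1 that also subsumes c2."""
    up2 = set(ancestors(c2))
    return next(a for a in ancestors(c1) if a in up2)


def dist_path_sum(c1, c2):
    """Sum LS(c, parent) = IC(c) - IC(parent) over edges from c1 and c2 up to the LSO."""
    top = lso(c1, c2)
    total = 0.0
    for c in (c1, c2):
        while c != top:
            total += ic(c) - ic(PARENT[c])
            c = PARENT[c]
    return total


def dist_jcn(c1, c2):
    """Closed form: IC(c1) + IC(c2) - 2 * IC(LSO(c1, c2))."""
    return ic(c1) + ic(c2) - 2 * ic(lso(c1, c2))


for a, b in [("nickel", "dime"), ("nickel", "credit card")]:
    assert math.isclose(dist_path_sum(a, b), dist_jcn(a, b))
print(dist_jcn("nickel", "dime") < dist_jcn("nickel", "credit card"))  # True
```

Unlike Resnik's score, this distance uses the information content of the two concepts themselves. Pairs under the same subsumer therefore no longer tie automatically.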

18 What does all this buy us?

19 Correlations

20 PageRank
– Developed by Larry Page and Sergey Brin
– A link analysis algorithm that assigns a numerical weighting to each page in a hyperlinked set of documents
– Measures the relative importance of a page within the set
– A link to a page counts as a vote of support, which increases the rank of that page
– The ranks form a probability distribution: the likelihood that a person clicking links at random ends up on a specific page

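21 PageRank-based algorithm
– Assume the universe has 4 pages: A, B, C and D
– The initial value of every page is 0.25
– Now suppose B, C and D link only to A. The rank of A is then $PR(A) = PR(B) + PR(C) + PR(D) = 0.75$
– If B also links to other pages, the rank of A becomes $PR(A) = \frac{PR(B)}{L(B)} + \frac{PR(C)}{L(C)} + \frac{PR(D)}{L(D)}$, where L(B) is the number of outbound links from B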
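The one-step computation above can be checked directly. The first case follows the slide. The second link structure is a hypothetical choice for illustration: B links to A and C, C links to A, and D links to A, B and C.

```python
# Four pages, each starting with PageRank 0.25, as on the slide.
PR = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}


def one_step_rank(target, links, pr):
    """Sum PR(v) / L(v) over pages v linking to target; L(v) = number of outbound links of v."""
    return sum(pr[v] / len(outs) for v, outs in links.items() if target in outs)


# Case 1: B, C and D link only to A.
links_only_a = {"B": ["A"], "C": ["A"], "D": ["A"]}
print(one_step_rank("A", links_only_a, PR))  # 0.75

# Case 2 (hypothetical): B also links to C, and D links to A, B and C.
links_more = {"B": ["A", "C"], "C": ["A"], "D": ["A", "B", "C"]}
print(one_step_rank("A", links_more, PR))  # about 0.458 = 0.125 + 0.25 + 0.0833
```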

22 PageRank-based algorithm (contd.)
– The PageRank of a page U depends on the rank of each page V linking to U, divided by the number of outbound links from V
– The general formula is $PR(U) = \sum_{V \in B_U} \frac{PR(V)}{L(V)}$, where $B_U$ is the set of pages that link to U
– Thus the PageRanks of all pages in the corpus sum to 1 (provided every page has at least one outbound link)
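Iterating the general formula from a uniform start gives a sketch of simplified, undamped PageRank. It assumes every page has at least one outbound link. Under that assumption each page hands out all of its rank, so the total stays at 1. The link graph is hypothetical.

```python
def simplified_pagerank(links, iterations=50):
    """Iterate PR(u) = sum over v in B_u of PR(v) / L(v), with no damping.
    Assumes every page has at least one outbound link (no dangling pages)."""
    pages = list(links)
    pr = {p: 1 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: 0.0 for p in pages}
        for v, outs in links.items():
            for u in outs:
                new[u] += pr[v] / len(outs)  # v splits its rank over its out-links
        pr = new
    return pr


# Hypothetical graph: D has no inbound links, so its rank drops to 0.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
pr = simplified_pagerank(links)
print(round(sum(pr.values()), 6))  # 1.0
print({p: round(r, 3) for p, r in pr.items()})
```

In this graph the ranks settle at A = C = 0.4 and B = 0.2. Convergence is not guaranteed in general without damping: a graph that is a single cycle, for example, keeps oscillating.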

23 PageRank-based algorithm (contd.)
– Damping factor: an imaginary surfer eventually stops clicking on links; d is the probability that the user continues clicking
– The damping factor is estimated at 0.85 here
– The new PageRank formula using d is:
– To obtain the actual rank of a page, this formula has to be iterated many times
– Problem of dangling links
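Below is a sketch of damped PageRank, assuming the normalised form PR(u) = (1 - d)/N + d Σ_{v ∈ B_u} PR(v)/L(v). The original Brin and Page paper uses 1 - d without dividing by N. Dangling pages are handled by spreading their rank uniformly over all pages. This is one common remedy; the slide does not say which one is intended. The graph is hypothetical.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=1000):
    """Damped PageRank: PR(u) = (1 - d)/N + d * sum_{v in B_u} PR(v)/L(v).
    Dangling pages (no outbound links) are treated as linking to every page,
    an assumed remedy rather than one taken from the lecture."""
    pages = sorted(set(links) | {u for outs in links.values() for u in outs})
    n = len(pages)
    pr = {p: 1 / n for p in pages}
    for _ in range(max_iter):
        dangling = sum(pr[p] for p in pages if not links.get(p))
        # Teleport share plus the redistributed rank of dangling pages.
        new = {p: (1 - d) / n + d * dangling / n for p in pages}
        for v, outs in links.items():
            for u in outs:
                new[u] += d * pr[v] / len(outs)
        delta = sum(abs(new[p] - pr[p]) for p in pages)
        pr = new
        if delta < tol:  # stop once successive iterates barely change
            break
    return pr


# D is a dangling page: it has no outbound links.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A", "D"], "D": []}
ranks = pagerank(links)
print({p: round(r, 3) for p, r in ranks.items()})
print(round(sum(ranks.values()), 6))  # 1.0
```

Without the dangling-page term, D's rank would leak out of the system on every iteration and the ranks would no longer sum to 1.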

