
Vincent Schickel-Zuber - AI Lab @ IJCAI’07. US provisional patent number: 60/819,290. IJCAI 2007 Conference, Hyderabad, India, January 6-12, 2007.


1 OSS: A Semantic Similarity Function based on Hierarchical Ontologies
Vincent Schickel-Zuber & Boi Faltings, {vincent.schickel-zuber,boi.faltings}@epfl.ch
AI Lab @ IJCAI’07, US provisional patent number: 60/819,290
IJCAI 2007 Conference, Hyderabad, India, January 6-12, 2007

2 Presentation Layout
- Introduction
  - The Problem
  - Review of Existing Techniques
- The OSS approach
  - Definition and Hypothesis
  - Inference of score and its transfer
  - Similarity Metric
- Validation of the model
  - Similarity Metric: WordNet
  - Similarity Metric: Gene Ontology
- Conclusion & Future work

3 Problem Definition
- General problem definition: find semantic similarity in a hierarchical ontology
  - Ontology: a DAG made of IS-A relations
    - In WordNet, ~82% of the relations are IS-A relations
    - In the Gene Ontology, ~87% of the relations are IS-A relations
  - A similarity metric on IS-A relations can be generalized [1]
- Examples (its concrete benefits to society):
  - Linguistics: word sense disambiguation
  - Search: closely related documents
  - Recommendation systems: finding items in an e-catalog [6]
  - Bioinformatics: finding similar proteins
  - Semantic Web, ...

4 Ontology
- An ontology is a directed acyclic graph where:
  - a node models a concept (the instances being the items)
  - the edges represent the IS-A relations (features)
- Sub-concepts are distinguished by certain features
- Features are usually not made explicit
[Figure: example ontology with the concepts Transport, Vehicle, Boat, Car, Bus, Compact and SUV, and the features On-land, On-sea, City and All_terrain]

5 Similarity Definition
- The similarity between two concepts a and b can be defined as sim(a,b) = 1 - dist(a,b), where dist(a,b) ∈ [0,1] is the distance between a and b
- dist(a,b) is defined as a metric with the properties:
  - Identity: dist(a,b) = 0 ⇔ a = b
  - Triangle inequality: dist(a,c) ≤ dist(a,b) + dist(b,c)
  - Symmetry: dist(a,b) = dist(b,a)
[Figure: three points a, b and c]
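The three metric properties can be checked mechanically on a finite set of points. The sketch below is only an illustration: the names similarity, is_metric, toy_dist, lopsided and POINTS are hypothetical, and the two distance functions are toy examples that do not come from the slides.

```python
from itertools import product


def similarity(dist, a, b):
    """sim(a,b) = 1 - dist(a,b), for a distance normalised to [0, 1]."""
    return 1.0 - dist(a, b)


def is_metric(dist, points, tol=1e-12):
    """Check range, identity, symmetry and the triangle inequality on a finite set."""
    for a, b in product(points, repeat=2):
        d = dist(a, b)
        if not 0.0 <= d <= 1.0:
            return False          # distance must stay in [0, 1]
        if (d == 0.0) != (a == b):
            return False          # identity: dist(a,b) = 0 <=> a = b
        if abs(d - dist(b, a)) > tol:
            return False          # symmetry
    for a, b, c in product(points, repeat=3):
        if dist(a, c) > dist(a, b) + dist(b, c) + tol:
            return False          # triangle inequality
    return True


def toy_dist(a, b):
    """Toy symmetric distance on the numbers 0..10 (illustration only)."""
    return abs(a - b) / 10.0


def lopsided(a, b):
    """Toy asymmetric variant: moving 'down' costs half as much as moving 'up'."""
    return toy_dist(a, b) if a <= b else toy_dist(a, b) / 2.0


POINTS = [0, 2, 5, 10]
print(is_metric(toy_dist, POINTS), is_metric(lopsided, POINTS))
```

The second function fails only the symmetry check, which is the one property an asymmetric distance function gives up.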

6 Existing approaches - I
- 3 approaches:
  1. Edge-based approach
     - Count the number of edges between the nodes
     - sim_edge(c1,c2) = 2*MAX - min[len(c1,c2)] [2], where MAX is the maximum depth of the taxonomy
     - Easy to implement
     - Acceptable accuracy in very simple taxonomies
     - Considers the distance to be uniform across all edges
[Figure: taxonomy with the nodes c1, c2, ci and cj]
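A minimal sketch of the edge-based measure on a toy taxonomy. The hierarchy in PARENT is an assumption read off the labels of the example ontology (Transport, Vehicle, Boat, Car, Bus, Compact, SUV), it is tree-shaped for simplicity, and depth is counted in edges from the root, which is one possible convention. All function names are hypothetical.

```python
from collections import deque

# Assumed toy taxonomy (child -> parent); not taken verbatim from the slides.
PARENT = {
    "Vehicle": "Transport", "Boat": "Transport",
    "Car": "Vehicle", "Bus": "Vehicle",
    "Compact": "Car", "SUV": "Car",
}


def neighbours(concept):
    """Concepts one edge away, ignoring the direction of the edge."""
    linked = [child for child, parent in PARENT.items() if parent == concept]
    if concept in PARENT:
        linked.append(PARENT[concept])
    return linked


def path_length(c1, c2):
    """Minimum number of edges between c1 and c2 (breadth-first search)."""
    queue, seen = deque([(c1, 0)]), {c1}
    while queue:
        node, steps = queue.popleft()
        if node == c2:
            return steps
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + 1))
    raise ValueError(f"no path between {c1} and {c2}")


def depth(concept):
    """Number of edges from the concept up to the root."""
    steps = 0
    while concept in PARENT:
        concept = PARENT[concept]
        steps += 1
    return steps


MAX_DEPTH = max(depth(c) for c in PARENT)  # deepest concept in the toy taxonomy


def sim_edge(c1, c2):
    """Edge-based similarity: 2*MAX minus the shortest path length."""
    return 2 * MAX_DEPTH - path_length(c1, c2)


print(sim_edge("SUV", "Compact"), sim_edge("SUV", "Bus"), sim_edge("SUV", "Boat"))
```

Every edge costs the same here, which is exactly the uniform-distance assumption listed as the weakness of this approach.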

7 Existing approaches - II
  2. Node-based approach (Resnik)
     - Look at the information content, IC(c) = -log p(c), of the ancestors
     - sim_resnik(c1,c2) = max[-log p(Ancestor(c1,c2))] [2]
     - It is also possible to take the differences into account: sim_Lin(c1,c2) = 2*IC(Ancestor(c1,c2)) / (IC(c1) + IC(c2)) [3]
     - Much more accurate than the edge-based approach
     - Requires the computation of the IC of each concept
  3. Hybrid approaches (node + edge based)
     - Jiang [4] and Leacock [5]
[Figure: taxonomy with the nodes c1 and c2 and two common ancestors, IC(ci) = 3 and IC(cj) = 2]
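The node-based measures can be sketched on the same kind of toy taxonomy. Everything concrete below is an assumption made for illustration: the hierarchy in PARENT, the occurrence counts in COUNT, the use of base-2 logarithms, and all function names. Lin's measure is written in its standard form, with the factor 2 in the numerator.

```python
import math

# Assumed toy taxonomy (child -> parent) and hypothetical occurrence counts.
PARENT = {
    "Vehicle": "Transport", "Boat": "Transport",
    "Car": "Vehicle", "Bus": "Vehicle",
    "Compact": "Car", "SUV": "Car",
}
COUNT = {"Boat": 2, "Bus": 2, "Compact": 2, "SUV": 2}
TOTAL = sum(COUNT.values())


def ancestors(concept):
    """The concept itself followed by all of its ancestors up to the root."""
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def probability(concept):
    """p(c): share of occurrences that fall on c or on a concept below it."""
    covered = sum(n for leaf, n in COUNT.items() if concept in ancestors(leaf))
    return covered / TOTAL


def ic(concept):
    """Information content, IC(c) = -log2 p(c)."""
    return -math.log2(probability(concept))


def sim_resnik(c1, c2):
    """IC of the most informative common ancestor."""
    shared = [a for a in ancestors(c1) if a in ancestors(c2)]
    return max(ic(a) for a in shared)


def sim_lin(c1, c2):
    """Shared IC, normalised by the IC of both concepts (undefined if both ICs are 0)."""
    return 2 * sim_resnik(c1, c2) / (ic(c1) + ic(c2))


print(sim_resnik("SUV", "Compact"), sim_lin("SUV", "Compact"))
```

With these counts, IC(Car) = 1 and IC(SUV) = IC(Compact) = 2, so the Resnik value for the pair is 1 and the Lin value is 0.5.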

8 The Score of a Concept - S
- A distance measure should simulate the user’s behavior
  - D(a,b) ~ amount of score S being transferred between concepts a and b
- The score can be seen as a lower-bound function that models how much a user likes a concept
- S is a function that satisfies the assumptions:
  - A1: S depends on the features of the concept
    - Items are modeled by a set of features
  - A2: Each feature contributes independently to S
    - Eliminates the inter-dependence between features
  - A3: Unknown or disliked features make no contribution
    - Reflects the fact that users are risk-averse
    - Liking a concept ⇏ liking a sub-concept
[Figure: example ontology with the concepts Transport, Vehicle, Boat, Car and Bus, and the features On-land, On-sea and City]

9 Similarity Definition - revisited
- dist(a,b) is defined as a metric with the properties:
  - Identity: dist(a,b) = 0 ⇔ a = b
  - Triangle inequality: dist(a,c) ≤ dist(a,b) + dist(b,c)
  - Symmetry: dist(a,b) = dist(b,a)
- dist(a,b) is replaced by an asymmetric “distance function” D(a,b), which drops symmetry:
  - Identity: D(a,b) = 0 ⇔ a = b
  - Normalization: 0 ≤ D(a,b) ≤ 1
  - Triangle inequality: D(a,c) ≤ D(a,b) + D(b,c)
- Intuition (A3): more information is lost going upwards, because known features are removed, than going downwards

10 A-priori Score - APS
- The structure of the ontology contains information
  - Use APS(c) to capture the knowledge of concept c
- If there is no information, assume S(c) is uniform on [0..1]
  - P(S(c)): probability that the concept c is liked
  - $P(S(c) > x) = 1 - x$
- Concepts can have n descendants
  - Assumption A3 ⇒ $P(S(c) > x) = (1 - x)^{n+1}$
  - $E(c) = \int x f_c(x)\,dx = \frac{1}{n+2}$
  - ⇒ $APS(c) = \frac{1}{n+2}$
- APS uses NO user information, ONLY the ontology’s structure
[Figure: plot of APS against the number of descendants, from 0.5 at the leaves down to the root]
[Diagram: progress indicator showing the stages a,b and APS]
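Since APS(c) = 1/(n+2) depends only on the number n of descendants, it can be computed from the ontology alone. A minimal sketch, assuming a tree-shaped toy ontology (the hierarchy in PARENT is an assumption based on the labels of the example figure, and the function names are hypothetical); in a DAG with shared sub-concepts the descendants would need to be collected as a set.

```python
# Assumed toy ontology (child -> parent); not taken verbatim from the slides.
PARENT = {
    "Vehicle": "Transport", "Boat": "Transport",
    "Car": "Vehicle", "Bus": "Vehicle",
    "Compact": "Car", "SUV": "Car",
}


def descendants(concept):
    """All concepts strictly below `concept` (tree-shaped ontology assumed)."""
    found = []
    for child, parent in PARENT.items():
        if parent == concept:
            found.append(child)
            found.extend(descendants(child))
    return found


def aps(concept):
    """A-priori score: 1 / (n + 2), where n is the number of descendants."""
    return 1.0 / (len(descendants(concept)) + 2)


for name in ("SUV", "Car", "Vehicle", "Transport"):
    print(name, aps(name))
```

Leaves have no descendants and therefore get the highest value, 1/2, while the root of this toy ontology, with six descendants, gets 1/8.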

11 Inference Idea - Schickel@AAAI’06
- Known score: S(SUV) = 0.8
- Unknown score: S(bus) = ???
- Select the best lowest common ancestor, lca(SUV, bus)
[Figure: ontology fragment with the concepts Vehicle, Car, Bus and SUV]

12 Upward Inference
- Going up k levels ⇒ removing k known features
  - A1: the score depends on the features of the item
- Removing features ⇒ S ↘ or S ↔ (S is the sum of the feature scores)
- S(vehicle | SUV) = α(vehicle, SUV) * S(SUV)
  - α ∈ [0..1] is the ratio of liked features in common
- How to compute α?
  - α(vehicle, SUV) = #feature(vehicle) / #feature(SUV)
    - Does not take the feature distribution into account
  - α(vehicle, SUV) = APS(vehicle) / APS(SUV)
[Figure: SUV lies k levels below vehicle]
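The upward step with the APS-based ratio can be sketched as follows. The toy hierarchy in PARENT is an assumption (so the resulting numbers are illustrative only), the ontology is taken to be tree-shaped, and the function names are hypothetical; only the formulas for α and APS and the value S(SUV) = 0.8 come from the slides.

```python
# Assumed toy ontology (child -> parent); not taken verbatim from the slides.
PARENT = {
    "Vehicle": "Transport", "Boat": "Transport",
    "Car": "Vehicle", "Bus": "Vehicle",
    "Compact": "Car", "SUV": "Car",
}


def descendants(concept):
    """All concepts strictly below `concept` (tree-shaped ontology assumed)."""
    found = []
    for child, parent in PARENT.items():
        if parent == concept:
            found.append(child)
            found.extend(descendants(child))
    return found


def aps(concept):
    """A-priori score: 1 / (n + 2), where n is the number of descendants."""
    return 1.0 / (len(descendants(concept)) + 2)


def alpha(ancestor, concept):
    """alpha(ancestor, concept) = APS(ancestor) / APS(concept)."""
    return aps(ancestor) / aps(concept)


def infer_upward(score, concept, ancestor):
    """S(ancestor | concept) = alpha(ancestor, concept) * S(concept)."""
    return alpha(ancestor, concept) * score


# S(SUV) = 0.8 is the known score used in the inference example.
print(infer_upward(0.8, "SUV", "Vehicle"))
```

In this toy ontology APS(Vehicle) = 1/6 and APS(SUV) = 1/2, so α = 1/3 and the inferred score drops, in line with the idea that going up removes known features.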

13 Downward Inference
- Going down l levels ⇒ adding l unknown features
- Adding features ⇒ S ↗ or S ↔ (S is the sum of the feature scores)
- S(bus | vehicle) = α * S(vehicle), with α ≥ 1?
  - Not compatible with A3: users are pessimistic, so liking some features ⇏ liking others
- S(bus | vehicle) = S(vehicle) + β(bus, vehicle)
  - A2: features contribute independently to the score
  - β ∈ [0..1] is the ∑ of the features in bus not present in vehicle
- How to compute β?
  - β(bus, vehicle) = APS(bus) - APS(vehicle)
[Figure: bus lies l levels below vehicle]
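The additive downward step can be sketched in the same way. The toy hierarchy in PARENT, the input score S(vehicle) = 0.3 and the function names are all assumptions made for illustration; only the formula β(bus, vehicle) = APS(bus) - APS(vehicle) and the additive update come from the slide.

```python
# Assumed toy ontology (child -> parent); not taken verbatim from the slides.
PARENT = {
    "Vehicle": "Transport", "Boat": "Transport",
    "Car": "Vehicle", "Bus": "Vehicle",
    "Compact": "Car", "SUV": "Car",
}


def descendants(concept):
    """All concepts strictly below `concept` (tree-shaped ontology assumed)."""
    found = []
    for child, parent in PARENT.items():
        if parent == concept:
            found.append(child)
            found.extend(descendants(child))
    return found


def aps(concept):
    """A-priori score: 1 / (n + 2), where n is the number of descendants."""
    return 1.0 / (len(descendants(concept)) + 2)


def beta(concept, ancestor):
    """beta(concept, ancestor) = APS(concept) - APS(ancestor)."""
    return aps(concept) - aps(ancestor)


def infer_downward(score, ancestor, concept):
    """S(concept | ancestor) = S(ancestor) + beta(concept, ancestor)."""
    return score + beta(concept, ancestor)


# Hypothetical input: S(vehicle) = 0.3.
print(infer_downward(0.3, "Vehicle", "Bus"))
```

Because β is added rather than multiplied, the contribution of the new features does not depend on how much the ancestor was liked.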

14 Transfer of score
- Dist(a,b) ~ amount of score, T(a,b), being transferred between concepts a and b
  - Should be independent of the score S(a)
- Less score is transferred upwards than downwards
  - More information is lost going upwards
[Figure: plot of T(a,b) against the number of edges between a and b, with the level 1 marked; ontology fragment with the nodes a, b and cj]

15 Distance
- Dist(a,b) is a real-valued function that must satisfy:
  - Identity: D(a,b) = 0 ⇔ a = b
  - Normalization: 0 ≤ D(a,b) ≤ 1
  - Triangle inequality: D(a,c) ≤ D(a,b) + D(b,c)
[Figure: plot of T(a,b) against the number of edges between a and b, with the level 1 marked]
[Diagram: pipeline stages a,b; APS; S(b|a); T(a,b); D(a|b)]

16 Validation - Similarity Metric - I
- Repeated the experiment done by Resnik [2]
  - But used WordNet version 2.0 (79,869 concepts)
- Experiment setup: tested 6 metrics
  1. Measured the similarity between 30 word pairs using the 6 similarity metrics
  2. Compared it to real users’ similarity values [7] for the same pairs, in the same order
  3. Computed the correlation

               Edge    Leacock  Resnik  Lin     Jiang   OSS
  Correlation  0.603   0.823    0.793   0.823   0.859   0.911

  p-value < 0.02

17 Validation - Similarity Metric - II
- Measured the similarity between proteins
  - Used the Gene Ontology (20,538 concepts)
  - Divided into three sub-parts: molecular function (MF), biological process (BP), cellular component (CC)
- Experiment setup: tested 6 metrics
  1. Measured the alignment between proteins using BLAST
  2. Transformed the alignment into a similarity value
  3. Computed the Mean Absolute Error

        Edge    Leacock  Resnik  Lin     Jiang   OSS
  MF    0.450   0.234    0.224   0.223   0.200   0.185
  BP    0.392   0.275    0.314   0.312   0.269   0.259
  CC    0.351   0.303    0.286   0.292   0.343   0.260

18 Conclusion
- Introduced a new similarity metric for hierarchical ontologies based on the transfer of score
  - The ontology can carry useful information via APS
  - Asymmetric (going up is more costly than going down)
  - Most accurate metric on WordNet and the Gene Ontology
  - And more robust...
  - Requires the computation of the APS of each concept
  - Only applies to IS-A relations
- Future work: extend to other kinds of relationships

19 References
[1] Maguitman, A., Lord, P.W., Menczer, F., Roinestad, H., Vespignani, A. Algorithmic Detection of Semantic Similarity. In Proc. of WWW 2005.
[2] Resnik, P. Using Information content to evaluate semantic similarity. In IJCAI 1995.
[3] Lin, D. An information theoretic definition of similarity. In Proc. of the 15th Conf. on Machine Learning, 1998.
[4] Jiang, J., Conrath, D.W. Semantic Similarity based on corpus statistics and lexical taxonomy. In COLING 1997.
[5] Leacock, C., Chodorow, M. Combining local context and WordNet similarity for word sense identification. In Fellbaum, pages 265-283, 1997.
[6] Schickel, V., Faltings, B. Inferring User’s Preferences using Ontologies. In AAAI 06.
[7] Miller, G.A., Charles, W.G. Contextual correlates of semantic similarity. Language and Cognitive Processes, 6(1), 1991.
Thank you
Slides: http://people.epfl.ch/vincent.schickel-zuber

