
Semantic Grounding of Tag Relatedness in Social Bookmarking Systems Ciro Cattuto, Dominik Benz, Andreas Hotho, Gerd Stumme ISWC 2008 Hyewon Lim January.


1 Semantic Grounding of Tag Relatedness in Social Bookmarking Systems
Ciro Cattuto, Dominik Benz, Andreas Hotho, Gerd Stumme
ISWC 2008
Hyewon Lim, January 11, 2011

2 Outline
• Introduction
• Folksonomy Definition and Data
• Measures of Relatedness
• Qualitative Insights
• Semantic Grounding
• Discussion and Perspectives

3 Introduction
• Folksonomy
  – The underlying data structure of social bookmarking systems
  – Consists of a set of users, a set of tags, a set of resources, & a set of tag assignments
  – Research communities are interested in extracting machine-processable semantic structures from folksonomies
• Focus on the similarity and relatedness of tags
  – Tags carry the semantic information within a folksonomy
  – Tags provide the link to ontologies

4 Introduction
• Two ways to define similarity and relatedness for a folksonomy
  – 1. Map the tags to a thesaurus or lexicon such as Roget's thesaurus or WordNet & measure relatedness there by means of well-known metrics
  – 2. Define measures of relatedness directly on the network structure of the folksonomy
    • Motivation: the vocabulary of folksonomies includes many community-specific terms that a general lexicon may not cover
• Measures of tag relatedness in a folksonomy
  – Use statistical information about different types of co-occurrence
  – Adopt the distributional hypothesis: words that occur in similar contexts tend to have similar meanings

5 Folksonomy Definition and Data
• Del.icio.us (Nov 2006)
  – The dataset was restricted to the 10,000 most frequent tags & to the resources/users that have been associated with at least one of those tags
  – |U| = 476,378 users, |T| = 10,000 tags, |R| = 12,660,470 resources, |Y| = 101,491,722 tag assignments
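The restriction step can be sketched in a few lines of Python. This is a minimal sketch, not the authors' preprocessing code: it assumes Y is given as a list of (user, tag, resource) triples and that "frequency" means the number of tag assignments carrying a tag; the function name `restrict_to_top_tags` and the toy data are hypothetical.

```python
from collections import Counter

# One tag assignment (user, tag, resource), i.e. an element of Y.
Triple = tuple[str, str, str]


def restrict_to_top_tags(Y: list[Triple], k: int) -> list[Triple]:
    """Keep only the tag assignments whose tag is among the k most frequent tags.

    Assumption: frequency = number of tag assignments that use the tag.
    The users and resources left in the result are exactly those associated
    with at least one of the top-k tags.
    """
    tag_freq = Counter(t for _, t, _ in Y)
    top_tags = {t for t, _ in tag_freq.most_common(k)}
    return [(u, t, r) for (u, t, r) in Y if t in top_tags]


# Toy data (hypothetical, not from the del.icio.us crawl).
Y = [
    ("u1", "web", "r1"), ("u2", "web", "r2"), ("u3", "web", "r3"),
    ("u1", "java", "r1"), ("u2", "java", "r2"),
    ("u4", "rare", "r4"),
]
Y_small = restrict_to_top_tags(Y, k=2)
print(len(Y_small), sorted({u for u, _, _ in Y_small}))  # → 5 ['u1', 'u2', 'u3']
```

With k = 10,000 this mirrors the restriction on the slide; in the toy run, user u4 and resource r4 drop out because their only tag is not among the top k.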

6 Measures of Relatedness
• Five measures of tag relatedness
  – Co-occurrence count
  – Three distributional measures
    • Use the cosine similarity in the vector spaces spanned by U, T, & R
  – FolkRank

7 Measures of Relatedness: Co-occurrence
• Tag-tag co-occurrence graph
  – Weighted undirected graph
• Weight of the edge between t1 and t2
  – The number of posts (the tags one user assigned to one resource) that contain both t1 and t2
  – Computation
    • Create a sorted list of all tag pairs that occur together in a post
    • Group this list by each tag and sort by count
Ontologies are us: A unified model of social networks and semantics, Peter Mika, Journal of Web Semantics, 2007
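A minimal Python sketch of the co-occurrence weights, assuming a post is the set of tags one user assigned to one resource. Instead of building and grouping a sorted list of pairs as described on the slide, it counts pairs in a dictionary, which yields the same weights; the function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

Triple = tuple[str, str, str]  # (user, tag, resource), an element of Y


def cooccurrence_weights(Y: list[Triple]) -> Counter:
    """w(t1, t2) = number of posts (user, resource) that contain both t1 and t2."""
    posts: dict[tuple[str, str], set[str]] = defaultdict(set)
    for u, t, r in Y:
        posts[(u, r)].add(t)
    w: Counter = Counter()
    for tags in posts.values():
        # sorted() gives every unordered pair one canonical key (t1 < t2)
        for t1, t2 in combinations(sorted(tags), 2):
            w[(t1, t2)] += 1
    return w


def related_by_cooccurrence(w: Counter, tag: str) -> list[tuple[str, int]]:
    """Tags co-occurring with `tag`, by decreasing weight (ties broken alphabetically)."""
    neighbours: Counter = Counter()
    for (t1, t2), count in w.items():
        if t1 == tag:
            neighbours[t2] += count
        elif t2 == tag:
            neighbours[t1] += count
    return sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))


Y = [
    ("u1", "python", "r1"), ("u1", "programming", "r1"),
    ("u2", "python", "r2"), ("u2", "programming", "r2"), ("u2", "web", "r2"),
    ("u3", "python", "r3"), ("u3", "web", "r3"),
]
w = cooccurrence_weights(Y)
print(related_by_cooccurrence(w, "web"))  # → [('python', 2), ('programming', 1)]
```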

8 Measures of Relatedness: Distributional Measures
• Vector space $\mathbb{R}^X$
  – $X \in \{U, T, R\}$
  – Each tag $t$ is represented by a vector $v_t \in \mathbb{R}^X$
• Tag Context Similarity (TagCont)
  – Computed in the vector space $\mathbb{R}^T$
  – Entries of the vector $v_t \in \mathbb{R}^T$ are defined by $v_{tt'} := w(t, t')$ for $t \neq t'$, where $w$ is the co-occurrence weight
    • $v_{tt} := 0$, because two tags should be considered related when they occur in a similar context, not merely when they occur together
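A minimal sketch of TagCont in Python, assuming the same post-based co-occurrence counts as slide 7 and sparse vectors stored as `Counter` objects; the names and toy data are hypothetical, and the cosine used is the standard formula. The toy data is chosen so that java and python never co-occur yet share their whole context.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

Triple = tuple[str, str, str]  # (user, tag, resource), an element of Y


def tagcont_vectors(Y: list[Triple]) -> dict[str, Counter]:
    """v_t in R^T with v_tt' = w(t, t') (posts containing both) and v_tt = 0."""
    posts: dict[tuple[str, str], set[str]] = defaultdict(set)
    for u, t, r in Y:
        posts[(u, r)].add(t)
    vectors: dict[str, Counter] = defaultdict(Counter)
    for tags in posts.values():
        for t1, t2 in combinations(sorted(tags), 2):
            vectors[t1][t2] += 1
            vectors[t2][t1] += 1
    # combinations() never yields (t, t), so the diagonal v_tt stays 0.
    return vectors


def cossim(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse non-negative vectors."""
    dot = sum(x * b[key] for key, x in a.items())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


Y = [
    ("u1", "java", "r1"), ("u1", "programming", "r1"),
    ("u2", "python", "r2"), ("u2", "programming", "r2"),
    ("u3", "java", "r3"), ("u3", "code", "r3"),
    ("u4", "python", "r4"), ("u4", "code", "r4"),
]
v = tagcont_vectors(Y)
print(round(cossim(v["java"], v["python"]), 3))       # → 1.0
print(round(cossim(v["java"], v["programming"]), 3))  # → 0.0
```

Here java and python have co-occurrence weight 0 but TagCont similarity 1, which is exactly the "similar context, not co-occurrence" behaviour that the zero diagonal is meant to produce.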

9 Measures of Relatedness: Distributional Measures
• Resource Context Similarity (ResCont)
  – Computed in the vector space $\mathbb{R}^R$
  – The vector $v_t \in \mathbb{R}^R$ counts how often tag $t$ is used to annotate each resource $r \in R$: $v_{tr} := \operatorname{card}\{u \in U \mid (u, t, r) \in Y\}$
• User Context Similarity (UserCont)
  – Computed in the vector space $\mathbb{R}^U$
  – The vector $v_t \in \mathbb{R}^U$ counts how often each user $u \in U$ uses tag $t$: $v_{tu} := \operatorname{card}\{r \in R \mid (u, t, r) \in Y\}$
• Cosine similarity
  – $\operatorname{cossim}(t_1, t_2) := \frac{v_{t_1} \cdot v_{t_2}}{\lVert v_{t_1} \rVert_2 \, \lVert v_{t_2} \rVert_2}$
  – Ranges from 0 to 1, since all vector entries are non-negative
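The ResCont and UserCont vectors and the cosine can be sketched in Python as follows. This is a minimal sketch assuming Y is a collection of (user, tag, resource) triples and sparse `Counter` vectors; the function names and toy tags are hypothetical. Deduplicating Y first matters because the definitions count elements of a set.

```python
import math
from collections import Counter, defaultdict

Triple = tuple[str, str, str]  # (user, tag, resource), an element of Y


def rescont_vectors(Y: list[Triple]) -> dict[str, Counter]:
    """v_tr = card{u in U | (u, t, r) in Y}: users who annotated r with t."""
    vectors: dict[str, Counter] = defaultdict(Counter)
    for u, t, r in set(Y):  # Y is a set, so each triple counts once
        vectors[t][r] += 1
    return vectors


def usercont_vectors(Y: list[Triple]) -> dict[str, Counter]:
    """v_tu = card{r in R | (u, t, r) in Y}: resources that u annotated with t."""
    vectors: dict[str, Counter] = defaultdict(Counter)
    for u, t, r in set(Y):
        vectors[t][u] += 1
    return vectors


def cossim(a: Counter, b: Counter) -> float:
    """(a . b) / (|a| |b|); lies in [0, 1] because all entries are non-negative."""
    dot = sum(x * b[key] for key, x in a.items())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


Y = [
    ("u1", "web2.0", "r1"), ("u2", "web2.0", "r1"),
    ("u1", "web20", "r1"), ("u2", "web20", "r1"), ("u3", "web20", "r2"),
]
res, usr = rescont_vectors(Y), usercont_vectors(Y)
print(round(cossim(res["web2.0"], res["web20"]), 3))  # → 0.894
print(round(cossim(usr["web2.0"], usr["web20"]), 3))  # → 0.816
```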

10 Measures of Relatedness: FolkRank
• PageRank algorithm
  – A web page is important if many pages link to it and if those pages are important themselves
• FolkRank
  – A graph-based measure
  – An adaptation of PageRank to folksonomies
  – A resource that is tagged with important tags by important users becomes important itself (the same holds symmetrically for tags and users)
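The FolkRank idea can be sketched as a preference-biased PageRank on the folksonomy graph. This is a simplified sketch, not the authors' implementation: it assumes an undirected tripartite graph over U ∪ T ∪ R in which each tag assignment (u, t, r) adds weight 1 to the edges {u, t}, {t, r} and {u, r}, and it ranks tags by the difference between a run biased toward the query tag and an unbiased baseline run. The damping factor d, the iteration count, the preference boost, the uniform baseline and all function names are assumptions chosen for illustration.

```python
from collections import defaultdict

Triple = tuple[str, str, str]  # (user, tag, resource), an element of Y
Node = tuple[str, str]         # (kind, label) with kind in {"u", "t", "r"}


def folksonomy_graph(Y: list[Triple]) -> dict[Node, dict[Node, float]]:
    """Undirected tripartite graph; every assignment (u, t, r) adds 1 to {u,t}, {t,r}, {u,r}."""
    adj: dict[Node, dict[Node, float]] = defaultdict(lambda: defaultdict(float))
    for u, t, r in set(Y):
        nu, nt, nr = ("u", u), ("t", t), ("r", r)
        for a, b in ((nu, nt), (nt, nr), (nu, nr)):
            adj[a][b] += 1.0
            adj[b][a] += 1.0
    return adj


def spread(adj, pref: dict[Node, float], d: float, iters: int) -> dict[Node, float]:
    """Power iteration of w <- d * (random-walk step of w) + (1 - d) * pref."""
    nodes = list(adj)
    degree = {n: sum(adj[n].values()) for n in nodes}
    w = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - d) * pref[n] for n in nodes}
        for i in nodes:
            share = d * w[i] / degree[i]
            for j, weight in adj[i].items():
                nxt[j] += share * weight
        w = nxt
    return w


def folkrank_related_tags(Y: list[Triple], tag: str, d: float = 0.7,
                          iters: int = 50, boost: float = 1.0) -> list[str]:
    """Other tags ranked by (biased weight - baseline weight), highest first."""
    adj = folksonomy_graph(Y)
    nodes = list(adj)
    uniform = {n: 1.0 / len(nodes) for n in nodes}
    # Preference: uniform plus extra weight on the query tag, renormalized to sum 1.
    biased = {n: uniform[n] + (boost if n == ("t", tag) else 0.0) for n in nodes}
    total = sum(biased.values())
    biased = {n: x / total for n, x in biased.items()}
    w0 = spread(adj, uniform, d, iters)  # baseline without preference
    w1 = spread(adj, biased, d, iters)   # biased toward `tag`
    scores = {label: w1[(kind, label)] - w0[(kind, label)]
              for kind, label in nodes if kind == "t" and label != tag}
    return sorted(scores, key=lambda t: (-scores[t], t))


Y = [
    ("u1", "java", "r1"), ("u1", "programming", "r1"),
    ("u2", "java", "r2"), ("u2", "programming", "r2"),
    ("u3", "cooking", "r3"), ("u3", "recipes", "r3"),
]
print(folkrank_related_tags(Y, "java"))  # → ['programming', 'cooking', 'recipes']
```

In the toy run, the only tag sharing users and resources with java comes first, while the tags from the unrelated cooking posts fall to the bottom.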

11 Qualitative Insights
• A few examples of related tags

12 Qualitative Insights
• A few examples of related tags (cont'd)
  – In many cases TagCont & ResCont provide more synonyms than the other measures
    • web2.0: its alternative spellings
    • games: semantically similar tags
    • tobuy: equivalent functional tags
  – java: python, perl and c++ (TagCont), which are siblings rather than synonyms
    • TagCont measures the frequency of co-occurrence with other tags in the global context of the folksonomy, whereas the co-occurrence measure and FolkRank measure the frequency of co-occurrence with other tags in the same posts
  – TagCont & ResCont seem to yield equivalent results, especially in terms of synonym identification

13 Qualitative Insights

14 Qualitative Insights
• Are the most closely related tags shared across measures of relatedness?
  – UserCont & TagCont do not exhibit a strong similarity
    • 2.66 shared tags with ResCont, which can be attributed to shared synonym tags
  – FolkRank & co-occurrence: 6.81 shared tags
    • Given a tag t, its related tags according to FolkRank are tags with a high frequency of co-occurrence with t
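Overlap figures of this kind can be computed along these lines. A minimal sketch, assuming each measure is given as a mapping from a tag to its ranked list of related tags and that "overlap" means the size of the intersection of the two top-k lists, averaged over tags; k, the function name and the toy lists are hypothetical and are not results from the paper.

```python
def average_overlap(related_a: dict[str, list[str]],
                    related_b: dict[str, list[str]],
                    tags: list[str], k: int = 10) -> float:
    """Mean number of tags shared by the top-k lists of two relatedness measures."""
    shared = [len(set(related_a[t][:k]) & set(related_b[t][:k])) for t in tags]
    return sum(shared) / len(shared)


# Toy ranked lists for two hypothetical measures A and B.
measure_a = {"web2.0": ["web20", "ajax", "blog"], "java": ["python", "perl", "c++"]}
measure_b = {"web2.0": ["ajax", "web", "web20"], "java": ["programming", "software", "python"]}
print(average_overlap(measure_a, measure_b, ["web2.0", "java"], k=3))  # → 1.5
```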

15 Qualitative Insights
• Average frequency rank of the related tags as a function of the rank of the original tag
  – Co-occurrence & FolkRank
    • Most of the related tags are high-frequency tags, independently of the original tag
  – Context measures
    • Tags obtained from context relatedness span a broader range of ranks
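A minimal sketch of this analysis, assuming tags are ranked by frequency (rank 1 = most frequent) and that each measure supplies a ranked list of related tags per tag; the function names, k and the toy counts are hypothetical.

```python
from collections import Counter
from statistics import mean


def frequency_ranks(tag_counts: Counter) -> dict[str, int]:
    """Map each tag to its frequency rank; rank 1 is the most frequent tag."""
    return {t: i for i, (t, _) in enumerate(tag_counts.most_common(), start=1)}


def average_related_rank(related: dict[str, list[str]],
                         ranks: dict[str, int], k: int = 10) -> dict[int, float]:
    """For each original tag's rank, the mean rank of its top-k related tags."""
    result = {}
    for tag, related_tags in related.items():
        top = [ranks[r] for r in related_tags[:k] if r in ranks]
        if tag in ranks and top:
            result[ranks[tag]] = mean(top)
    return result


counts = Counter({"web": 50, "design": 40, "css": 30, "html": 20, "xhtml": 10})
ranks = frequency_ranks(counts)
related = {"css": ["web", "design"], "xhtml": ["html", "css"]}
print(average_related_rank(related, ranks, k=2))  # → {3: 1.5, 5: 3.5}
```

A measure that keeps returning the same high-frequency tags would give low average ranks regardless of the original tag's rank, which is the pattern the slide reports for co-occurrence and FolkRank.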

16 Semantic Grounding
• Ground measures of tag relatedness by using WordNet
  – In WordNet, words are grouped into synsets (sets of synonyms)
• Several factors limit the WordNet coverage of del.icio.us tags
  – WordNet only covers the English language
    • Del.icio.us contains tags from different languages and tags that are not words at all, and it is an open-ended system
  – The structure of WordNet itself
    • Semantic distances are computed on the is-a hierarchy, which WordNet provides only for nouns and verbs, so adjectives cannot be handled

17 Semantic Grounding
• A measure of relatedness can be assessed by measuring, in WordNet, the average semantic distance between a tag and its most closely related tag
  – 1. Loop over the tags that are in both del.icio.us and WordNet
  – 2. For each of those tags, use the chosen measure to find its most closely related tag
  – 3. If the most related tag is also in WordNet, measure the semantic distance between the corresponding synsets
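The three steps can be sketched in Python on a toy is-a hierarchy standing in for WordNet. Everything here is an assumption for illustration: the hand-made synset names, the tag-to-synset mapping and the helper names are hypothetical, and the semantic distance is taken to be the shortest-path length between synsets, minimized over all synsets of both tags. With the real WordNet, NLTK's `Synset.shortest_path_distance` could play the role of `path_length`.

```python
from collections import deque
from typing import Callable, Optional


def undirected(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Adjacency sets of the is-a graph with edge direction ignored."""
    graph: dict[str, set[str]] = {}
    for child, parent in edges:
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def path_length(graph: dict[str, set[str]], a: str, b: str) -> Optional[int]:
    """Breadth-first search; None if b is unreachable from a."""
    if a == b:
        return 0
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nb in graph.get(node, ()):
            if nb == b:
                return dist + 1
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None


def grounding_distances(tags: list[str], synsets_of: dict[str, list[str]],
                        most_related: Callable[[str], Optional[str]],
                        graph: dict[str, set[str]]) -> dict[str, int]:
    distances = {}
    for tag in tags:
        if tag not in synsets_of:                 # step 1: tag must also be in WordNet
            continue
        rel = most_related(tag)                   # step 2: most related tag under the measure
        if rel is None or rel not in synsets_of:  # step 3 needs rel in WordNet too
            continue
        lengths = [path_length(graph, s1, s2)
                   for s1 in synsets_of[tag] for s2 in synsets_of[rel]]
        lengths = [n for n in lengths if n is not None]
        if lengths:
            distances[tag] = min(lengths)
    return distances


graph = undirected([("dog.n", "canine.n"), ("wolf.n", "canine.n"),
                    ("canine.n", "animal.n"), ("cat.n", "feline.n"),
                    ("feline.n", "animal.n")])
synsets_of = {"dog": ["dog.n"], "hound": ["dog.n"], "wolf": ["wolf.n"],
              "cat": ["cat.n"], "animal": ["animal.n"]}
related = {"dog": "hound", "wolf": "dog", "cat": "animal", "web2.0": "ajax"}
print(grounding_distances(["dog", "wolf", "cat", "web2.0"], synsets_of,
                          related.get, graph))  # → {'dog': 0, 'wolf': 2, 'cat': 2}
```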

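18 Semantic Grounding
• The normalized distribution P(n) of the shortest-path lengths n connecting a tag to its closest related tag in WordNet
  – TagCont & ResCont: related tags tend to be synonyms (n = 0, same synset) or siblings of the original tag
  – Other measures: related tags tend to be more general than the original tag
  – The higher value at n = 2 may reflect the sibling relation (two synsets sharing a parent are two edges apart)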
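Given the per-tag path lengths from the grounding step, the normalized distribution P(n) is a histogram divided by the number of tags. A minimal sketch; the function name and the toy lengths are hypothetical.

```python
from collections import Counter


def path_length_distribution(lengths: list[int]) -> dict[int, float]:
    """P(n): fraction of tags whose closest related tag lies n edges away."""
    counts = Counter(lengths)
    total = len(lengths)
    return {n: c / total for n, c in sorted(counts.items())}


# Toy lengths: 0 = same synset, 1 = direct is-a link, 2 = e.g. siblings or a grandparent.
print(path_length_distribution([0, 2, 2, 1, 2]))  # → {0: 0.2, 1: 0.2, 2: 0.6}
```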

19 Semantic Grounding
• Average edge type composition

20 Semantic Grounding
• For every path, measure the hierarchical displacement Δl in WordNet
  – Δl is the difference in hierarchical depth between the two synsets
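A minimal sketch of Δl on a toy single-parent hierarchy. It assumes depth is the number of is-a edges from a synset up to the root and takes Δl = depth(related) - depth(original), so negative values mean the related tag is more general; this sign convention, the synset names and the function names are assumptions. WordNet allows multiple hypernyms, in which case a minimum or maximum depth would have to be chosen.

```python
def depth(parent_of: dict[str, str], synset: str) -> int:
    """Number of is-a edges from `synset` up to the root (single-parent tree)."""
    d = 0
    while synset in parent_of:
        synset = parent_of[synset]
        d += 1
    return d


def hierarchical_displacement(parent_of: dict[str, str],
                              original: str, related: str) -> int:
    """Delta l = depth(related) - depth(original); negative = related is more general."""
    return depth(parent_of, related) - depth(parent_of, original)


parent_of = {"dog.n": "canine.n", "wolf.n": "canine.n", "canine.n": "animal.n"}
print(hierarchical_displacement(parent_of, "dog.n", "animal.n"))  # → -2
print(hierarchical_displacement(parent_of, "dog.n", "wolf.n"))    # → 0
```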

21 Discussion and Perspectives
• Contributions
  – 1. A systematic methodology for characterizing measures of tag relatedness in a folksonomy
  – 2. A contribution to the question of emergent semantics
    • Globally meaningful tag relations can be harvested from an aggregated and uncontrolled folksonomy vocabulary
    • Specifically, the measures based on TagCont & ResCont are capable of identifying tags that belong to a common semantic concept

22 Discussion and Perspectives
• Which relatedness measure is best for which task?
  – Synonym discovery
    • TagCont & ResCont deliver not only spelling variants but also terms that belong to the same WordNet synset
  – Concept hierarchy
    • FolkRank & co-occurrence seem to yield more general tags
    • These more general tags can provide valuable input for algorithms that extract taxonomic relationships between tags
  – Tag recommendations
    • FolkRank & co-occurrence
    • FolkRank delivered superior and more personalized results than co-occurrence
  – Query expansion
    • TagCont & ResCont could be used to discover synonyms
    • Expand the original tag query with the tags obtained by TagCont & ResCont
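Query expansion with a context measure can be sketched as appending each query tag's top related tags. A minimal sketch, assuming the related-tag lists come from TagCont or ResCont; `expand_query`, k and the toy lists are hypothetical.

```python
def expand_query(query: list[str], related: dict[str, list[str]], k: int = 3) -> list[str]:
    """Return the query tags followed by up to k related tags per query tag, without duplicates."""
    expanded = list(query)
    for tag in query:
        for rel in related.get(tag, [])[:k]:
            if rel not in expanded:
                expanded.append(rel)
    return expanded


related = {"web2.0": ["web20", "web_2.0", "ajax"], "howto": ["tutorial", "tutorials"]}
print(expand_query(["web2.0", "howto"], related, k=2))
# → ['web2.0', 'howto', 'web20', 'web_2.0', 'tutorial', 'tutorials']
```

Keeping k small limits drift: synonyms sit at the top of TagCont/ResCont lists, while lower-ranked entries are more likely to be siblings such as python for java.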

