Semantic Grounding of Tag Relatedness in Social Bookmarking Systems Ciro Cattuto, Dominik Benz, Andreas Hotho, Gerd Stumme ISWC 2008 Hyewon Lim January.

Presentation transcript:

Semantic Grounding of Tag Relatedness in Social Bookmarking Systems Ciro Cattuto, Dominik Benz, Andreas Hotho, Gerd Stumme ISWC 2008 Hyewon Lim January 11, 2011

Outline
• Introduction
• Folksonomy Definition and Data
• Measures of Relatedness
• Qualitative Insights
• Semantic Grounding
• Discussion and Perspectives

Introduction
• Folksonomy
  – Underlying data structure of social bookmarking systems
  – Consists of a set of users, a set of tags, a set of resources, and a set of tag assignments
  – Research communities are interested in extracting machine-processable semantic structures from folksonomies
• Focus on similarity and relatedness of tags
  – Tags carry the semantic information within a folksonomy
  – They provide the link to ontologies
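
The four sets named on this slide can be pictured with a small data structure. The sketch below is only an illustration: the user, tag, and resource names are hypothetical toy data, and the `posts` helper assumes that a post is the set of tags one user assigned to one resource, which the slide itself does not spell out.

```python
from typing import NamedTuple


class TagAssignment(NamedTuple):
    """One element of Y: user u annotated resource r with tag t."""
    user: str
    tag: str
    resource: str


# Hypothetical toy folksonomy (not taken from the del.icio.us data).
Y = {
    TagAssignment("alice", "java", "r1"),
    TagAssignment("alice", "programming", "r1"),
    TagAssignment("bob", "java", "r1"),
    TagAssignment("bob", "coffee", "r2"),
}

# The sets U, T and R follow from the tag assignments.
U = {y.user for y in Y}
T = {y.tag for y in Y}
R = {y.resource for y in Y}


def posts(assignments):
    """Group tag assignments into posts, one per (user, resource) pair."""
    grouped = {}
    for user, tag, resource in assignments:
        grouped.setdefault((user, resource), set()).add(tag)
    return grouped
```

With this toy data, `posts(Y)` holds three posts, and the post of alice on `r1` carries the tags `java` and `programming`.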

Introduction
• Two ways to define similarity and relatedness for a folksonomy
  – 1. Map the tags to a thesaurus or lexicon such as Roget's thesaurus or WordNet and measure the relatedness there by means of well-known metrics
  – 2. Define measures of relatedness directly on the network structure of the folksonomy
• The vocabulary of folksonomies includes many community-specific terms that have not made it into any lexical resource
• Measures of tag relatedness in a folksonomy
  – Use statistical information about different types of co-occurrence
  – Adopt the distributional hypothesis

Folksonomy Definition and Data
• Del.icio.us (Nov 2006)
  – Dataset restricted to the 10,000 most frequent tags and to the resources and users associated with at least one of those tags
  – |U| = 476,378 users
  – |T| = 10,000 tags
  – |R| = 12,660,470 resources
  – |Y| = 101,491,722 tag assignments
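
The restriction step described on this slide might look roughly like the sketch below. The function name `restrict_to_top_tags` and the `(user, tag, resource)` tuple layout are assumptions made for illustration; the slides do not say how the authors implemented the filter.

```python
from collections import Counter


def restrict_to_top_tags(assignments, k):
    """Keep only tag assignments whose tag is among the k most frequent tags.

    assignments: iterable of (user, tag, resource) tuples (assumed layout).
    Returns the kept assignments plus the surviving users, tags and resources.
    """
    assignments = list(assignments)
    tag_counts = Counter(tag for _, tag, _ in assignments)
    top_tags = {tag for tag, _ in tag_counts.most_common(k)}
    kept = [(u, t, r) for (u, t, r) in assignments if t in top_tags]
    users = {u for u, _, _ in kept}
    resources = {r for _, _, r in kept}
    return kept, users, top_tags, resources
```

On the real data, `k` would be 10,000; users and resources that never appear with one of the top tags drop out automatically.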

Measures of Relatedness
• Five measures of tag relatedness
  – Co-occurrence count
  – Three distributional measures
    • Use the cosine similarity in the vector spaces spanned by U, T, and R
  – FolkRank

Measures of Relatedness: Co-occurrence
• Tag-tag co-occurrence graph
  – Weighted undirected graph
• Weight of the edge
  – The number of posts that contain both $t_1$ and $t_2$
  – Computation
    • Create a sorted list of all tag pairs which occur together in a post
    • Group this list by each tag and sort by count
Ontologies are us: A unified model of social networks and semantics, Peter Mika, Journal of Web Semantics,
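
The edge weight and the grouping step from this slide can be sketched in a few lines. This is a minimal in-memory version that assumes each post is given as a set of tags; the names `cooccurrence` and `most_related` are hypothetical, and a counter replaces the sort-and-group procedure the slide describes, which would matter at the scale of the real dataset.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(posts):
    """Count, for every unordered tag pair, the posts that contain both tags."""
    weights = Counter()
    for tags in posts:
        # Sorting makes (t1, t2) a canonical key for the undirected edge.
        for t1, t2 in combinations(sorted(set(tags)), 2):
            weights[(t1, t2)] += 1
    return weights


def most_related(weights, tag, n=10):
    """Return the n tags that co-occur most often with `tag`."""
    neighbours = Counter()
    for (t1, t2), count in weights.items():
        if t1 == tag:
            neighbours[t2] = count
        elif t2 == tag:
            neighbours[t1] = count
    return neighbours.most_common(n)


# Hypothetical toy posts.
toy_posts = [
    {"java", "programming"},
    {"java", "programming", "python"},
    {"java", "coffee"},
]
w = cooccurrence(toy_posts)
```

Here `w[("java", "programming")]` is 2, so `programming` is the tag most related to `java` under this measure.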

Measures of Relatedness: Distributional Measures
• Vector space $\mathbb{R}^X$
  – $X \in \{U, T, R\}$
  – Each tag $t$ is represented by a vector $v_t \in \mathbb{R}^X$
• Tag Context Similarity (TagCont)
  – Computed in the vector space $\mathbb{R}^T$
  – Entries of the vector $v_t \in \mathbb{R}^T$ are defined by $v_{tt'} := w(t, t')$, the co-occurrence weight of $t$ and $t'$
    • $v_{tt} = 0$
    • Two tags are considered related when they occur in a similar context, not when they occur together
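
A small sketch may help show why the zero diagonal matters. It builds sparse tag context vectors from posts (each post assumed to be a set of tags) and uses hypothetical toy data in which two spelling variants never appear in the same post yet end up with identical context vectors. The function name `tag_context_vectors` is an assumption.

```python
from collections import defaultdict
from itertools import permutations


def tag_context_vectors(posts):
    """Build sparse vectors v_t with v_{tt'} = w(t, t') and v_{tt} = 0."""
    vectors = defaultdict(lambda: defaultdict(int))
    for tags in posts:
        # permutations() never pairs a tag with itself, so v_{tt} stays 0.
        for t, other in permutations(set(tags), 2):
            vectors[t][other] += 1
    return {t: dict(v) for t, v in vectors.items()}


# Hypothetical toy posts: "web2.0" and "web20" never co-occur.
toy_posts = [
    {"web2.0", "ajax"},
    {"web2.0", "social"},
    {"web20", "ajax"},
    {"web20", "social"},
]
vectors = tag_context_vectors(toy_posts)
```

Both `vectors["web2.0"]` and `vectors["web20"]` equal `{"ajax": 1, "social": 1}`: the two tags share a context even though their direct co-occurrence count is zero.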

Measures of Relatedness: Distributional Measures
• Resource Context Similarity (ResCont)
  – Computed in the vector space $\mathbb{R}^R$
  – The vector $v_t \in \mathbb{R}^R$ counts how often a tag $t$ is used to annotate a certain resource $r \in R$
    • $v_{tr} := \mathrm{card}\{u \in U \mid (u, t, r) \in Y\}$
• User Context Similarity (UserCont)
  – Computed in the vector space $\mathbb{R}^U$
  – The vector $v_t \in \mathbb{R}^U$ counts how often a tag $t$ is used by a certain user $u \in U$
    • $v_{tu} := \mathrm{card}\{r \in R \mid (u, t, r) \in Y\}$
• Cosine similarity
  – $\mathrm{cossim}(t_1, t_2) := \dfrac{v_{t_1} \cdot v_{t_2}}{\lVert v_{t_1} \rVert_2 \, \lVert v_{t_2} \rVert_2}$
  – Ranges from 0 to 1, since all vector entries are non-negative
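
The two context vectors and the cosine similarity can be sketched as below. This assumes tag assignments are `(user, tag, resource)` tuples and uses sparse counters rather than dense vectors; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def resource_context(assignments):
    """v_tr = number of users who annotated resource r with tag t."""
    vectors = {}
    for user, tag, resource in set(assignments):
        vectors.setdefault(tag, Counter())[resource] += 1
    return vectors


def user_context(assignments):
    """v_tu = number of resources that user u annotated with tag t."""
    vectors = {}
    for user, tag, resource in set(assignments):
        vectors.setdefault(tag, Counter())[user] += 1
    return vectors


def cossim(v1, v2):
    """Cosine similarity of two sparse, non-negative vectors."""
    dot = sum(value * v2.get(key, 0) for key, value in v1.items())
    norm1 = math.sqrt(sum(x * x for x in v1.values()))
    norm2 = math.sqrt(sum(x * x for x in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


# Hypothetical toy tag assignments.
toy_Y = [
    ("alice", "java", "r1"),
    ("bob", "java", "r1"),
    ("bob", "python", "r1"),
    ("carol", "coffee", "r2"),
]
res_vectors = resource_context(toy_Y)
user_vectors = user_context(toy_Y)
```

In this toy data `java` and `python` annotate exactly the same resource, so their resource context similarity is 1, while their user context similarity is lower because only one of the two `java` users also used `python`.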

Measures of Relatedness: FolkRank
• PageRank algorithm
  – A web page is important if many pages link to it and those pages are important themselves
• FolkRank
  – A graph-based measure
  – An adaptation of PageRank to folksonomies
  – A resource that is tagged with important tags by important users becomes important itself
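
The slide gives only the intuition behind FolkRank, so the sketch below fills in details that are assumptions rather than statements from this deck: the folksonomy is turned into an undirected graph over users, tags, and resources; a PageRank with a preference vector is run twice (once uniform, once biased toward the query tag); and tags are ranked by the difference of the two results. The damping value 0.7, the size of the preference boost, and all function names are illustrative choices.

```python
from collections import defaultdict


def build_graph(assignments):
    """Undirected weighted graph over users, tags and resources."""
    adj = defaultdict(lambda: defaultdict(float))
    for user, tag, resource in set(assignments):
        nodes = [("user", user), ("tag", tag), ("resource", resource)]
        for a in nodes:
            for b in nodes:
                if a != b:
                    adj[a][b] += 1.0
    return adj


def weighted_pagerank(adj, preference, d=0.7, iterations=100):
    """Iterate w <- d * A w + (1 - d) * p with a normalized preference p."""
    nodes = list(adj)
    total = sum(preference[n] for n in nodes)
    p = {n: preference[n] / total for n in nodes}
    out_weight = {n: sum(adj[n].values()) for n in nodes}
    w = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_w = {n: (1 - d) * p[n] for n in nodes}
        for a in nodes:
            for b, weight in adj[a].items():
                new_w[b] += d * w[a] * weight / out_weight[a]
        w = new_w
    return w


def folkrank_related_tags(assignments, tag, n=10):
    """Rank other tags by biased rank minus baseline rank (assumed scheme)."""
    adj = build_graph(assignments)
    uniform = {node: 1.0 for node in adj}
    biased = dict(uniform)
    biased[("tag", tag)] += len(adj)  # assumed boost for the query tag
    base = weighted_pagerank(adj, uniform)
    focused = weighted_pagerank(adj, biased)
    scores = {
        node[1]: focused[node] - base[node]
        for node in adj
        if node[0] == "tag" and node[1] != tag
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical toy tag assignments forming two separate clusters.
toy_Y = [
    ("alice", "java", "r1"),
    ("alice", "programming", "r1"),
    ("bob", "coffee", "r2"),
    ("bob", "tea", "r2"),
]
```

For the toy data, `folkrank_related_tags(toy_Y, "java", n=1)` returns `programming`, the only other tag reachable from `java` in the graph.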

Qualitative Insights
• A few examples of the related tags

Qualitative Insights
• A few examples of the related tags (cont'd)
  – In many cases TagCont & ResCont provide more synonyms than the other measures
    • web2.0: its alternative spellings
    • games: semantically similar tags
    • tobuy: equivalent functional tags
  – java: python, perl and c++ (TagCont), which can be considered siblings
    • TagCont measures the frequency of co-occurrence with other tags in the global context of the folksonomy, whereas the co-occurrence measure and FolkRank measure the frequency of co-occurrence with other tags in the same posts
  – TagCont & ResCont seem to yield equivalent results, especially in terms of synonym identification

Qualitative Insights
• Whether the most closely related tags are shared across measures of relatedness
  – UserCont & TagCont do not exhibit a strong similarity
    • 2.66 tags with ResCont: can be attributed to shared synonym tags
  – FolkRank and co-occurrence (6.81)
    • Given a tag t, its related tags according to FolkRank are tags with a high frequency of co-occurrence with t
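
Figures such as 2.66 and 6.81 read as average numbers of shared tags between the related-tag lists of two measures. One way such an overlap could be computed is sketched below; the list length `k=10`, the function name, and the toy lists are assumptions, not values taken from the slides.

```python
def average_overlap(related_a, related_b, k=10):
    """Average number of tags shared by the top-k lists of two measures.

    related_a, related_b: dicts mapping a tag to its ranked related tags.
    Only tags present in both dicts are compared.
    """
    common_tags = related_a.keys() & related_b.keys()
    if not common_tags:
        return 0.0
    shared = sum(
        len(set(related_a[t][:k]) & set(related_b[t][:k]))
        for t in common_tags
    )
    return shared / len(common_tags)


# Hypothetical related-tag lists for two measures.
measure_a = {
    "java": ["python", "perl", "c++"],
    "games": ["game", "gaming", "fun"],
}
measure_b = {
    "java": ["programming", "python", "development"],
    "games": ["fun", "flash", "game"],
}
```

Here the lists for `java` share one tag and those for `games` share two, so `average_overlap(measure_a, measure_b)` is 1.5.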

Qualitative Insights
• Average rank of the related tags as a function of the rank of the original tag
  – Co-occurrence & FolkRank
    • Most of the related tags are high-frequency tags, independently of the original tag
  – Context measures
    • Tags obtained from context relatedness span a broader range of ranks

Semantic Grounding
• Ground measures of tag relatedness by using WordNet
  – In WordNet, words are grouped into synsets
• Several factors limit the WordNet coverage of del.icio.us tags
  – WordNet only covers the English language
    • Del.icio.us is an open-ended system that contains tags from different languages and tags that are not words at all
  – Structure of WordNet itself
    • Distances cannot be computed for adjectives, because the is-a network covers only nouns and verbs

Semantic Grounding
• An assessment of the measures of relatedness
  – Can be carried out by measuring, in WordNet, the average semantic distance between a tag and its most closely related tag
    • 1. Loop over the tags that are in both del.icio.us and WordNet
    • 2. For each of those tags, use the chosen measure to find the most closely related tag
    • 3. If the most closely related tag is also in WordNet, measure the semantic distance between the two synsets
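
The three steps above can be sketched on a toy taxonomy. Everything concrete here is an assumption for illustration: a tiny hand-made is-a hierarchy stands in for WordNet, each tag maps to exactly one node (real words can belong to several synsets), and the semantic distance is taken to be the shortest-path length in the hierarchy.

```python
from collections import deque

# Hypothetical toy is-a hierarchy (child -> parent), standing in for WordNet.
HYPERNYM = {
    "java": "programming_language",
    "python": "programming_language",
    "programming_language": "language",
    "coffee": "beverage",
    "tea": "beverage",
}
VOCABULARY = set(HYPERNYM) | set(HYPERNYM.values())


def neighbours(node):
    """Parents and children of a node in the toy hierarchy."""
    result = {child for child, parent in HYPERNYM.items() if parent == node}
    if node in HYPERNYM:
        result.add(HYPERNYM[node])
    return result


def shortest_path_length(start, goal):
    """Breadth-first search; returns None if the nodes are not connected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in neighbours(node):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def ground(most_related):
    """Steps 1-3: distance between each covered tag and its closest tag."""
    distances = {}
    for tag, related in most_related.items():
        if tag not in VOCABULARY or related not in VOCABULARY:
            continue  # tag pair not covered by the lexicon
        dist = shortest_path_length(tag, related)
        if dist is not None:
            distances[tag] = dist
    return distances


# Hypothetical output of some relatedness measure: tag -> most related tag.
toy_most_related = {
    "java": "python",
    "coffee": "tea",
    "python": "programming_language",
    "web2.0": "ajax",
}
distances = ground(toy_most_related)
average_distance = sum(distances.values()) / len(distances)
```

The pair `web2.0`/`ajax` is skipped because neither tag is in the toy lexicon, which mirrors the coverage limits of WordNet for del.icio.us tags.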

Semantic Grounding
• The normalized distribution P(n) of shortest-path lengths n connecting a tag to its closest related tag in WordNet
  – TagCont & ResCont: synonyms or siblings of the original tag
  – Other measures: more general tags
  – Higher value at n = 2 → sibling relation?
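
As a rough illustration of what P(n) denotes, the sketch below normalizes a list of shortest-path lengths into a distribution. The input values are hypothetical and do not come from the study.

```python
from collections import Counter


def path_length_distribution(lengths):
    """Normalized distribution P(n) over observed shortest-path lengths n."""
    counts = Counter(lengths)
    total = sum(counts.values())
    return {n: counts[n] / total for n in sorted(counts)}


# Hypothetical path lengths, one per (tag, closest related tag) pair.
toy_lengths = [0, 2, 2, 1, 2, 4, 2, 3]
P = path_length_distribution(toy_lengths)
```

In a taxonomy, a path of length 0 means both tags fall in the same synset, and a path of length 2 is what a pair of siblings under a common parent produces, which is why a peak at n = 2 hints at sibling relations.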

Semantic Grounding
• Average edge type composition

Semantic Grounding
• For every path, measure the hierarchical displacement ∆l in WordNet: the difference in hierarchical depth between the synsets
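
The displacement can be illustrated on a toy hierarchy. The taxonomy below is hypothetical, depth is counted as the number of is-a steps up to the root, and the sign convention (negative when the related tag is more general) is an assumption, since the slide does not state one.

```python
# Hypothetical toy is-a hierarchy (child -> parent), standing in for WordNet.
HYPERNYM = {
    "java": "programming_language",
    "python": "programming_language",
    "programming_language": "language",
}


def depth(node):
    """Number of is-a steps from the node up to the root of its hierarchy."""
    steps = 0
    while node in HYPERNYM:
        node = HYPERNYM[node]
        steps += 1
    return steps


def hierarchical_displacement(tag, related):
    """Depth of the related tag minus depth of the original tag.

    Assumed convention: negative values mean the related tag is more general.
    """
    return depth(related) - depth(tag)
```

With this convention, `hierarchical_displacement("java", "language")` is -2 (a more general tag), while `hierarchical_displacement("java", "python")` is 0 (a sibling at the same depth).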

Discussion and Perspectives
• Contributions
  – 1. Introduces a systematic methodology for characterizing measures of tag relatedness in a folksonomy
  – 2. Addresses the question of emergent semantics
    • Globally meaningful tag relations can be harvested from an aggregated and uncontrolled folksonomy vocabulary
    • Specifically, the measures based on TagCont & ResCont are capable of identifying tags that belong to a common semantic concept

Discussion and Perspectives
• Which relatedness measure is best for:
  – Synonym discovery
    • TagCont & ResCont deliver not only spelling variants but also terms that belong to the same WordNet synset
  – Concept hierarchy
    • FolkRank & co-occurrence seem to yield more general tags, which provide valuable input for algorithms that extract taxonomic relationships between tags
  – Tag recommendations
    • FolkRank & co-occurrence
    • FolkRank delivered superior and more personalized results than co-occurrence
  – Query expansion
    • TagCont & ResCont could be used to discover synonyms
    • Expand the original tag query using the tags obtained by TagCont & ResCont
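
The query expansion idea in the last item might be realized along these lines. The function name, the `per_tag` cut-off, and the synonym lists are hypothetical; in practice the lists would come from TagCont or ResCont rankings.

```python
def expand_query(query_tags, related, per_tag=2):
    """Append up to `per_tag` related tags for every tag in the query.

    related: dict mapping a tag to its ranked list of related tags.
    """
    expanded = list(query_tags)
    for tag in query_tags:
        for candidate in related.get(tag, [])[:per_tag]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


# Hypothetical related-tag lists, e.g. spelling variants of "web2.0".
toy_related = {"web2.0": ["web20", "web2", "web-2.0"]}
expanded = expand_query(["web2.0", "ajax"], toy_related)
```

The original query tags stay first in the result, so a search backend could still weight them higher than the added variants.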