Ciro Cattuto, Dominik Benz, Andreas Hotho, Gerd Stumme Presented by Smitashree Choudhury.


Overview
• Motivation
• Measures of semantic relatedness
• Semantic grounding of the measures
• Result analysis

Motivation
• A folksonomy is an open-ended, noisy, and large system.
• The tag space lacks explicit semantic relations.
• Existing similarity measures lack a robust semantic grounding.
• Possible applications:
  • Ontology learning
  • Tag recommendation
  • Query expansion

Folksonomies and tagging
• A folksonomy is the result of social annotation of shared resources.
• A folksonomy is a tuple F := (U, T, R, Y)
  • U: the set of users
  • T: the set of tags
  • R: the set of resources
  • Y ⊆ U × T × R: the ternary “tagging” relation, whose elements are tag assignments
• A post is the set of tags assigned by one user to one resource.
[Figure: a tag assignment linking user1, resource1 and tag1]
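The definitions on this slide could be sketched in code as follows. This is only a minimal illustration, assuming tag assignments are stored as (user, tag, resource) triples; the function name `posts_from_assignments` and the sample data are hypothetical and not part of the original talk.

```python
from collections import defaultdict

# Hypothetical tag assignments Y: a set of (user, tag, resource) triples.
Y = {
    ("user1", "tag1", "resource1"),
    ("user1", "tag2", "resource1"),
    ("user2", "tag1", "resource2"),
}

def posts_from_assignments(assignments):
    """Group tag assignments into posts: (user, resource) -> set of tags."""
    posts = defaultdict(set)
    for user, tag, resource in assignments:
        posts[(user, resource)].add(tag)
    return dict(posts)

# U, T and R can be read off the triples.
U = {u for u, _, _ in Y}
T = {t for _, t, _ in Y}
R = {r for _, _, r in Y}

posts = posts_from_assignments(Y)
print(posts[("user1", "resource1")])  # the set containing tag1 and tag2
```

Keeping Y as the primary data and deriving posts from it mirrors the formal definition, where a post is just a grouping of assignments by user and resource.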

Data under study
• Del.icio.us tags for November
• 667,128 users (U)
• 2,454,546 tags (T)
• 18,782,132 resources (R)
• 140,333,714 tag assignments (Y)
• The study focused on the |T| = 10,000 most frequent tags, together with their users (|U| = 476,378), resources (|R| = 12,660,470) and |Y| = 101,491,722 tag assignments.
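Restricting a folksonomy to its most frequent tags might look like the sketch below. It assumes that tag frequency means the number of tag assignments in which a tag appears, which the slide does not state; the function name and the toy data are hypothetical.

```python
from collections import Counter

def restrict_to_top_tags(assignments, k):
    """Keep only the assignments whose tag is among the k most frequent tags."""
    # Assumption: frequency = number of assignments that use the tag.
    tag_counts = Counter(tag for _, tag, _ in assignments)
    top_tags = {tag for tag, _ in tag_counts.most_common(k)}
    kept = [(u, t, r) for (u, t, r) in assignments if t in top_tags]
    users = {u for u, _, _ in kept}
    resources = {r for _, _, r in kept}
    return top_tags, users, resources, kept

# Hypothetical toy data; the study used k = 10,000.
Y = [
    ("u1", "web", "r1"), ("u2", "web", "r2"), ("u3", "web", "r3"),
    ("u1", "java", "r1"), ("u2", "java", "r4"),
    ("u4", "rare", "r5"),
]
tags, users, resources, kept = restrict_to_top_tags(Y, k=2)
print(len(tags), len(users), len(resources), len(kept))  # 2 3 4 5
```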

Similarity and relatedness
• Capture the emergent semantics of the folksonomy
• Similarity can be considered a special case of relatedness
• There are (at least) two options for defining similarity metrics:
  • mapping the tags into a domain where similarity is well-defined
  • using the network structure of the folksonomy

Measures of Relatedness
• Co-occurrence
• Contextual (distributional) measures, based on three different vector space representations of a tag:
  • Tag context
  • Resource context
  • User context
• FolkRank (graph-based)

Co-Occurrence
• Given a folksonomy (U, T, R, Y), the tag-tag co-occurrence graph is a weighted undirected graph whose set of nodes is the set of tags T.
• Two tags t1 and t2 are connected by an edge if they occur together in at least one post.
• The weight of this edge is the number of posts that contain both t1 and t2.
• Example posts:
  • U1-{t1,t2,t3}-r1
  • U2-{t1,t2}-r1
  • U3-{t1,t2,t5}-r
[Figure: co-occurrence graph over the tags t1 to t5 with weighted edges]
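The co-occurrence weights for the example posts on this slide can be computed with a few lines of code. The sketch below is illustrative only; the function name `cooccurrence_weights` and the representation of an edge as a `frozenset` of two tags are assumptions.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_weights(posts):
    """Return {frozenset({t1, t2}): number of posts containing both tags}."""
    weights = Counter()
    for tags in posts:
        # Every unordered pair of distinct tags in a post co-occurs once.
        for t1, t2 in combinations(sorted(set(tags)), 2):
            weights[frozenset((t1, t2))] += 1
    return weights

# Tag sets of the three example posts (users and resources omitted).
posts = [{"t1", "t2", "t3"}, {"t1", "t2"}, {"t1", "t2", "t5"}]
w = cooccurrence_weights(posts)
print(w[frozenset(("t1", "t2"))])  # 3
print(w[frozenset(("t1", "t3"))])  # 1
print(w[frozenset(("t3", "t5"))])  # 0, so there is no edge
```

Using an unordered pair as the key keeps the graph undirected without storing each edge twice.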

Contextual measures (cosine similarity)
• Three measures of tag relatedness are based on three different vector space representations of tags. The elements of a tag vector are weights over tags, users, or resources, depending on the representation.
• If two tags t1 and t2 are represented by the vectors v1 and v2, their cosine similarity is defined as: cossim(t1, t2) := cos ∡(v1, v2) = (v1 · v2) / (‖v1‖ ‖v2‖).
• The cosine similarity is independent of the length of the vectors, which avoids a bias towards frequently used tags.
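A small sketch of cosine similarity on sparse vectors illustrates the length independence mentioned on this slide. Representing vectors as dictionaries and returning 0 for an empty vector are assumptions made for the example.

```python
import math

def cosine_similarity(v1, v2):
    """Cosine of the angle between two sparse vectors given as dicts."""
    dot = sum(weight * v2.get(key, 0.0) for key, weight in v1.items())
    norm1 = math.sqrt(sum(x * x for x in v1.values()))
    norm2 = math.sqrt(sum(x * x for x in v2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # assumption: an empty vector is similar to nothing
    return dot / (norm1 * norm2)

# v2 points in the same direction as v1 but is ten times longer.
v1 = {"a": 1.0, "b": 2.0}
v2 = {"a": 10.0, "b": 20.0}
v3 = {"c": 5.0}
print(round(cosine_similarity(v1, v2), 6))  # 1.0
print(cosine_similarity(v1, v3))            # 0.0
```

A rarely used tag (v1) and a heavily used tag (v2) with the same usage profile get similarity 1, which is the frequency independence the slide refers to.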

Contextual measures (Tag context)
• The Tag Context Similarity (TagCont) is computed in the vector space ℝ^T, where, for a tag t1, the entry of the vector v_t1 for another tag t2 is w(t1, t2), with w the co-occurrence weight defined above.
[Figure: tag-by-tag matrix of co-occurrence weights for t1, t2, t3]
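The tag context vectors could be built as in the sketch below. It assumes that a tag's own entry is left out of its vector (treated as zero), so that two tags count as related when they appear in similar contexts rather than when they appear together; the function name and the toy posts are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def tag_context_vectors(posts):
    """Map each tag t to a sparse vector {other tag: co-occurrence weight}."""
    vectors = defaultdict(lambda: defaultdict(int))
    for tags in posts:
        for t in tags:
            vectors[t]  # make sure every tag gets a (possibly empty) vector
        # Assumption: no self entry, only pairs of distinct tags are counted.
        for t1, t2 in combinations(sorted(set(tags)), 2):
            vectors[t1][t2] += 1
            vectors[t2][t1] += 1
    return {t: dict(v) for t, v in vectors.items()}

# "game" and "games" never share a post but occur with the same tags.
posts = [{"game", "fun"}, {"games", "fun"}, {"game", "play"}, {"games", "play"}]
vectors = tag_context_vectors(posts)
print(vectors["game"])   # {'fun': 1, 'play': 1}
print(vectors["games"])  # {'fun': 1, 'play': 1}
```

The two vectors are identical even though the co-occurrence weight between the two tags is zero, which is the kind of relation the contextual measures are meant to capture.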

Contextual measures (Resource and User Context)
• Resource context: the vector of a tag t is built by counting how often t is used to annotate each resource r.
• User context: the similarity is built in the same way as the resource context, swapping the roles of the sets R and U.
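Because the user context only swaps the roles of R and U, both vector spaces can be produced by one function. The sketch below is a hedged illustration; the `context` parameter, the function name and the sample assignments are assumptions.

```python
from collections import Counter, defaultdict

def context_vectors(assignments, context="resource"):
    """Map each tag to a sparse count vector over resources or users."""
    if context not in ("resource", "user"):
        raise ValueError("context must be 'resource' or 'user'")
    vectors = defaultdict(Counter)
    for user, tag, resource in assignments:
        # Count one unit per tag assignment in the chosen dimension.
        key = resource if context == "resource" else user
        vectors[tag][key] += 1
    return {tag: dict(counts) for tag, counts in vectors.items()}

# Hypothetical (user, tag, resource) assignments.
Y = [
    ("u1", "java", "r1"), ("u2", "java", "r1"), ("u2", "java", "r2"),
    ("u1", "python", "r2"),
]
print(context_vectors(Y, "resource")["java"])  # {'r1': 2, 'r2': 1}
print(context_vectors(Y, "user")["java"])      # {'u1': 1, 'u2': 2}
```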

FolkRank
• Adaptation of PageRank to folksonomies: “A resource which is tagged with important tags by important users becomes important itself” [Hotho].
• FolkRank computes a ranked list of relevant tags using a random surfer model with a preference vector.
• It treats a folksonomy (U, T, R, Y) as an undirected graph.
• Initially each tag is assigned weight 1; the weights are then adjusted iteratively by spreading them along the edges.
• The tags that obtain the highest FolkRank weight for a given tag t1 are considered the most relevant in relation to t1.
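A rough sketch of the weight-spreading idea is given below. It follows the differential formulation described for FolkRank in [Hotho]: weights are spread once without a preference and once with a preference for the query tag, and the difference is used as the score. The damping value d = 0.7, the uniform (rather than unit) starting weights, the fixed iteration count and all names are assumptions for illustration, not the authors' implementation.

```python
from collections import defaultdict

def build_graph(assignments):
    """Undirected weighted graph over users, tags and resources."""
    adj = defaultdict(lambda: defaultdict(float))
    for user, tag, resource in assignments:
        nodes = (("user", user), ("tag", tag), ("resource", resource))
        for a in nodes:
            for b in nodes:
                if a != b:
                    adj[a][b] += 1.0  # one unit per shared tag assignment
    return adj

def spread_weights(adj, preference, d, iterations=100):
    """Iterate w <- d * A * w + (1 - d) * p from uniform starting weights."""
    w = {node: 1.0 / len(adj) for node in adj}
    for _ in range(iterations):
        new_w = {node: (1.0 - d) * preference.get(node, 0.0) for node in adj}
        for node, neighbours in adj.items():
            degree = sum(neighbours.values())
            for other, weight in neighbours.items():
                new_w[other] += d * w[node] * weight / degree
        w = new_w
    return w

def related_tags(assignments, tag, d=0.7):
    """Rank the other tags by biased weight minus baseline weight."""
    adj = build_graph(assignments)
    baseline = spread_weights(adj, {}, d=1.0)
    biased = spread_weights(adj, {("tag", tag): 1.0}, d=d)
    scores = {name: biased[(kind, name)] - baseline[(kind, name)]
              for kind, name in adj if kind == "tag" and name != tag}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy folksonomy with two unrelated topics.
Y = [
    ("alice", "python", "r1"), ("alice", "programming", "r1"),
    ("bob", "python", "r2"), ("bob", "programming", "r2"),
    ("carol", "cooking", "r3"), ("carol", "recipes", "r3"),
]
print(related_tags(Y, "python")[0])  # programming
```

Only the ordering of the scores is used here; subtracting the baseline is what keeps globally popular tags from dominating every ranking.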

Related tags according to various similarity measures
[Table: related tags per measure, with columns Co-occurrence, Cosine, FolkRank]

Result Analysis
• The most related tags were computed for the most frequent tags.
• Tag and resource context similarity provide more synonyms than the other measures. For instance, for the tag web2.0 they return some of its alternative spellings, such as web-2.0, web and web2.
• For the tag games, tag and resource context similarity also provide tags that could be regarded as semantically similar: for instance, the morphological variations game and gaming, or corresponding words in other languages, like spiel (German), jeu (French) and juegos (Spanish).
• The FolkRank and co-occurrence measures, by contrast, provide more general related tags and categories.
• An interesting observation about the tag java is that python, perl and c++ (provided by tag context similarity) could all be considered siblings in some suitable concept hierarchy, presumably under a common parent concept like programming languages.

Result analysis: are related tags shared across relatedness measures?
• Related tags obtained via tag context or resource context appear to be “synonyms” or “siblings” of the original tag.
• Co-occurrence and FolkRank seem to provide “more general” tags.
• In terms of shared tags, the co-occurrence and FolkRank measures are the most similar: on average they share 6.81 of their 10 most related tags, while cosine similarity displays little overlap with either of them.
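An overlap figure such as "6.81 tags out of 10" can be obtained by averaging, over all query tags, the number of tags two measures share in their top-10 lists. The sketch below shows that computation on made-up placeholder lists; the function name and the data are hypothetical and do not reproduce the study's results.

```python
def average_overlap(related_a, related_b, k=10):
    """Average number of shared tags in the top-k lists of two measures."""
    common = related_a.keys() & related_b.keys()
    if not common:
        return 0.0
    total = sum(len(set(related_a[t][:k]) & set(related_b[t][:k]))
                for t in common)
    return total / len(common)

# Placeholder ranked lists of related tags per query tag (not real results).
measure_a = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"]}
measure_b = {"q1": ["a", "c", "x"], "q2": ["d", "e", "y"]}
print(average_overlap(measure_a, measure_b, k=3))  # 2.0
```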

Semantic Grounding
• The strategy is to ground the relations between the original and the related tags by looking up the tags in a formal representation of word meanings.
• Mapping tags to WordNet synsets allows the relatedness measures to be compared against well-studied similarity measures.
• In WordNet, words are grouped into synsets, sets of synonyms that represent one concept. Synsets are nodes in a network, and links between synsets represent semantic relations.
• Only is-a relationships are considered.
• Roughly 61% of the 10,000 most frequent tags in del.icio.us are covered by WordNet.

WordNet similarity
• In WordNet, semantic similarity is measured using both:
  • the taxonomic shortest-path length
  • the Jiang-Conrath metric
    • combines taxonomic path length with an information-theoretic similarity measure
    • validated in user studies
• A first assessment of the measures of relatedness is carried out by measuring, in WordNet, the average semantic distance between a tag and its most closely related tag according to each of the relatedness measures.
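The two grounding measures can be illustrated on a toy is-a hierarchy. The sketch below uses the standard Jiang-Conrath distance, IC(c1) + IC(c2) - 2 * IC(lcs), with information content IC(c) = -log p(c). The taxonomy, the concept probabilities and all function names are made up for illustration; real WordNet is not a simple tree and its probabilities come from corpus statistics.

```python
import math

# Hypothetical is-a taxonomy (child -> parent) with made-up probabilities.
PARENT = {"java": "language", "python": "language", "language": "entity",
          "game": "activity", "activity": "entity", "entity": None}
PROB = {"entity": 1.0, "language": 0.2, "activity": 0.3,
        "java": 0.05, "python": 0.05, "game": 0.1}

def ancestors(concept):
    """Return the path from a concept up to the root, concept first."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path

def lowest_common_subsumer(c1, c2):
    """Most specific concept that is an ancestor of both c1 and c2."""
    seen = set(ancestors(c1))
    return next(c for c in ancestors(c2) if c in seen)

def shortest_path_length(c1, c2):
    """Number of is-a edges between c1 and c2 (valid for a tree)."""
    lcs = lowest_common_subsumer(c1, c2)
    return ancestors(c1).index(lcs) + ancestors(c2).index(lcs)

def information_content(concept):
    return -math.log(PROB[concept])

def jiang_conrath_distance(c1, c2):
    lcs = lowest_common_subsumer(c1, c2)
    return (information_content(c1) + information_content(c2)
            - 2.0 * information_content(lcs))

def average_distance(pairs, distance):
    """Average distance between tags and their most related tags."""
    return sum(distance(a, b) for a, b in pairs) / len(pairs)

print(shortest_path_length("java", "python"))  # 2
print(shortest_path_length("java", "game"))    # 4
print(jiang_conrath_distance("java", "python")
      < jiang_conrath_distance("java", "game"))  # True
```

Passing a list of (tag, most related tag) pairs for each relatedness measure to `average_distance` gives the kind of per-measure average the slide describes.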

Wordnet similarity

Analysis
• The Jiang-Conrath measure has been validated in user studies [Budanitsky], so its semantic distances correspond to the distances cognitively perceived by human subjects.
• Tag and resource context relatedness point to tags that are semantically closer according to both grounding measures.
• The resource context measure gives the best results but is expensive to compute.
• Tag context performs about as well as resource context and is computationally lighter.

Summary
• The work introduces a systematic methodology for characterizing measures of tag relatedness in a folksonomy.
• Several measures of tag relatedness were grounded by mapping the tags of the folksonomy to synsets in WordNet and using the semantic distances defined there.
• A semantic characterization of similarity measures computed on a folksonomy is possible and insightful in terms of the type of relations that can be extracted.
• Given an appropriate measure, globally meaningful tag relations can be harvested from an aggregated and uncontrolled folksonomy vocabulary.
• Admittedly, in their current status, none of the measures studied can be seen as the way to instant ontology creation, but further analysis and combination of measures will help to close the gap towards the Semantic Web.
• The tag or resource context similarities are clearly the first measures to choose when one would like to discover synonyms, and they are also useful for query expansion.
• Both FolkRank and co-occurrence relatedness seemed to extract taxonomic relationships between tags and to be useful for tag recommendation.

References
• Jiang, J.J., Conrath, D.W.: Semantic Similarity based on Corpus Statistics and Lexical Taxonomy. In: Proceedings of the International Conference on Research in Computational Linguistics (ROCLING), Taiwan (1997)
• Hotho, A., Jäschke, R., Schmitz, C., Stumme, G.: Information retrieval in folksonomies: Search and ranking. In Sure, Y., Domingue, J., eds.: The Semantic Web: Research and Applications. Volume 4011 of LNAI, Heidelberg, Springer (2006) 411–426
• Budanitsky, A., Hirst, G.: Evaluating WordNet-based measures of lexical semantic relatedness. Computational Linguistics 32(1) (2006) 13–47
• Salton, G.: Automatic text processing: the transformation, analysis, and retrieval of information by computer. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA (1989)
• And others.