On Stability, Clarity, and Co-occurrence of Self-Tagging Aixin Sun and Anwitaman Datta Nanyang Technological University Singapore.

Presentation transcript:


2 Outline
- Collaborative-tagging vs. Self-tagging
- Dataset overview and characteristics
- Experiments
  - Tag Usage and Stability
  - Tag Clarity vs. Popularity
  - Tag Co-occurrence vs. Semantic distance
- Conclusion
- Questions/suggestions forwarded

3 Collaborative-tagging vs. Self-tagging
- Collaborative tagging
  - A resource may be tagged by multiple users with multiple tags, e.g., del.icio.us and CiteULike.
- Self-tagging
  - A resource can only be tagged by its creator, e.g., most blog posts.
- Questions
  - Are there any differences in tagging behavior?
  - Do observations made on collaborative tagging hold in self-tagging?
  - When tags are used in an application (e.g., tag recommendation, classification/clustering), should the two systems be treated differently?

4 Dataset
- Overview:
  - Blogs listed in http://dir.blogflux.com/ and hosted by blogspot.com
  - Categories: Academic – Zookeeping
  - Blogs: 15,244; Posts: 3.3M
  - Posts with tag(s): 983K
  - Distinct tags: 29K
- Characteristics [Marlow06]

5 Tag Usage

6 Tag Dynamics
- Collaborative tagging systems [Halpin07]
  - The distribution of tags used to collaboratively annotate a particular resource becomes stable after a certain period of time.
  - Tags that describe the resource well are repeatedly received from multiple users.
  - Possible reasons [Golder06]:
    - Imitation of others
    - Shared knowledge
- Self-tagging systems?
  - No direct interaction through which bloggers can influence and imitate each other
  - Bloggers may read each other's posts and tags → shared background? → an implicit consensus of tag usage

7 Tag Stability
- A relatively small set of tags is used to annotate most blog posts.

8 Tag Clarity
- Question
  - Does the same tag tend to be assigned to topically similar blog posts?
- Tag clarity:
  - A tag receives a high clarity score if all posts annotated by the tag are topically cohesive.
  - Inspired by the query clarity score in ad-hoc retrieval [Cronen-Townsend02]
  - The clarity score of a tag is the distance between the tag language model and the collection language model.
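The clarity definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not say how the two language models are estimated, so the sketch assumes maximum-likelihood unigram models, measures the distance with KL divergence in bits (as the query clarity score of [Cronen-Townsend02] does), and smooths the tag model with the collection model. The names `unigram_model`, `tag_clarity`, the weight `lam`, and the toy posts are all hypothetical.

```python
import math
from collections import Counter

def unigram_model(docs):
    """Maximum-likelihood unigram model over a list of tokenized posts."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def tag_clarity(tagged_docs, all_docs, lam=0.8):
    """KL divergence (bits) from the collection model to a smoothed tag model.

    Assumption: tagged_docs is a subset of all_docs, and the tag model is
    linearly interpolated with the collection model using weight lam.
    """
    coll = unigram_model(all_docs)
    tag = unigram_model(tagged_docs)
    score = 0.0
    for w, p_coll in coll.items():
        p_tag = lam * tag.get(w, 0.0) + (1 - lam) * p_coll
        if p_tag > 0:
            score += p_tag * math.log2(p_tag / p_coll)
    return score

# Toy collection: two programming posts and two travel posts (made-up data).
posts = [
    ["python", "code", "bug"],
    ["python", "script", "code"],
    ["beach", "flight", "hotel"],
    ["flight", "city", "food"],
]
focused = tag_clarity(posts[:2], posts)  # tag used only on programming posts
diffuse = tag_clarity(posts, posts)      # tag used on every post
```

In this toy setting a tag attached only to the programming posts scores above zero, while a tag attached to every post has a tag model identical to the collection model and scores zero.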

9 Tag Clarity vs. Tag Popularity
- The number of tags decreases as tag popularity increases.
- Clarity scores of tags decrease as popularity increases.

10 Tag Clarity vs. Tag Popularity
- Less popular tags have clarity scores close to those of dummy tags.
- More popular tags have higher clarity scores than dummy tags.

11 Tag Co-occurrence vs. Semantic distance
- Co-occurrence
- Semantic distance:
  - KL-divergence between the two tag language models
- Question:
  - If tags co-occur in annotating blog posts, is their semantic distance small?
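The semantic distance above can be sketched as a KL divergence between two tag language models. The slide does not specify the smoothing or whether the divergence is symmetrized, so this sketch assumes additive smoothing over a shared vocabulary and the plain (asymmetric) KL divergence in bits. The function names, the parameter `alpha`, and the toy posts are hypothetical.

```python
import math
from collections import Counter

def smoothed_model(docs, vocab, alpha=0.1):
    """Additive-smoothed unigram model over a fixed vocabulary (an assumption)."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p, q):
    """KL(p || q) in bits; both models must cover the same vocabulary."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p)

def semantic_distance(docs_a, docs_b, vocab):
    """Distance between the language models of two tags' posts."""
    return kl_divergence(smoothed_model(docs_a, vocab),
                         smoothed_model(docs_b, vocab))

# Made-up posts for three hypothetical tags.
posts_python = [["python", "code", "bug"], ["python", "script", "code"]]
posts_coding = [["code", "bug", "fix"], ["script", "code", "test"]]
posts_travel = [["beach", "flight", "hotel"], ["flight", "city", "food"]]
vocab = sorted({t for docs in (posts_python, posts_coding, posts_travel)
                for doc in docs for t in doc})

d_related = semantic_distance(posts_python, posts_coding, vocab)
d_unrelated = semantic_distance(posts_python, posts_travel, vocab)
d_self = semantic_distance(posts_python, posts_python, vocab)
```

With this toy data the two programming-related tags end up closer to each other than to the travel tag, and a tag's distance to itself is zero. Because KL divergence is asymmetric, swapping the two arguments generally gives a different value.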

12 Tag Co-occurrence vs. Semantic distance

13 Tag Co-occurrence vs. Semantic distance
- Observations
  - The co-occurrence of two tags does not suggest any semantic relationship between the two tags (correlation coefficient = 0.017).
  - Tag pairs (e.g.,, ) are much clearer in describing posts, as supported by their clarity scores.
  - Tag pairs are likely to be semantically orthogonal, partially consistent with [Weinberger08].
- Possible reasons:
  - Tags are more for personal use than for others' benefit.
  - Because a blogger has a clear understanding of her post, she does not need to tag it with many similar tags. Rather, she may tag the post with tags from different perspectives.
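A correlation check of this kind can be sketched as follows. The slides do not preserve the co-occurrence measure or say which correlation coefficient was used, so this sketch assumes raw pair counts and the Pearson coefficient. The helper names and the toy tag sets are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def cooccurrence_counts(post_tags):
    """Count how many posts each unordered tag pair annotates together."""
    counts = Counter()
    for tags in post_tags:
        for pair in combinations(sorted(set(tags)), 2):
            counts[pair] += 1
    return counts

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up tag sets for three posts.
post_tags = [
    ["python", "coding"],
    ["python", "travel"],
    ["python", "coding", "travel"],
]
pairs = cooccurrence_counts(post_tags)
```

To reproduce the kind of analysis on the slide, one would line up each pair's co-occurrence value with that pair's semantic distance and pass the two lists to `pearson`; a coefficient near zero would indicate no linear relationship between the two quantities.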

14 Tag Clarity vs. Tag Popularity (Revisit)

15 Conclusion
- A preliminary study on tags in self-tagging systems
  - Tag dynamics
  - Tag clarity vs. popularity
  - Tag co-occurrence vs. semantic distance
- Observations:
  - Tags are often assigned to topically similar blog posts, as shown through the notion of tag clarity.
  - Co-occurring tags are not necessarily semantically similar to each other, but are likely to be semantically orthogonal.

16 Questions/suggestions forwarded
- For resources tagged only by their owner, people will avoid redundancy and instead provide different aspects of a single resource. How does this feature influence applications built on such systems?
- Can we expect that different facets can be extracted from a self-tagging system?
- This system only allows one user to tag each resource, and allows the user to use multi-word/phrase tags. The resulting data must be sparsely linked, and tags must co-occur less often than in a free tagging system. Could we expect some differences from this point of view?

17 More questions/suggestions
- How does this difference make research and applications on self-tagging systems challenging?
- I wonder if the convergence of tags to the final set of tags is represented primarily by the dominance of a few tags. If you omit the most common handful of topics, does the remainder also converge?
- Several blogging systems show author tags and reader tags separately. It would be interesting to see the overlap between these and their effect on one another.

18 Acknowledgement
- This work was supported by A*STAR Public Sector R&D, Singapore.

19 References
- [Cronen-Townsend02] S. Cronen-Townsend, Y. Zhou, and W. B. Croft. Predicting query performance. In Proc. of SIGIR'02, pages 299–306, Tampere, Finland, 2002.
- [Golder06] S. A. Golder and B. A. Huberman. Usage patterns of collaborative tagging systems. Journal of Information Science, 32(2):198–208, 2006.
- [Halpin07] H. Halpin, V. Robu, and H. Shepherd. The complex dynamics of collaborative tagging. In Proc. of WWW'07, pages 211–220, Banff, Alberta, Canada, 2007.
- [Marlow06] C. Marlow, M. Naaman, D. Boyd, and M. Davis. HT06, tagging paper, taxonomy, Flickr, academic article, to read. In Proc. of ACM HyperText'06, pages 31–40, Odense, Denmark, 2006.
- [Weinberger08] K. Weinberger, M. Slaney, and R. van Zwol. Resolving tag ambiguity. In ACM Multimedia, Vancouver, Canada, 2008.

Thank you