The Complex Dynamics of Collaborative Tagging
Harry Halpin, University of Edinburgh
Valentin Robu, CWI, Netherlands
Hana Shepherd, Princeton University
WWW 2007

Introduction
A central, ongoing concern: how should metadata for web resources be generated?
– with attention to both efficiency and efficacy
Social bookmarking
– an increasingly influential web application
– e.g. del.icio.us, Flickr, Furl, Rojo, Connotea, Technorati
Folksonomies vs. ontologies
– categorization (tagging) by unsupervised users vs. classification using formal ontologies defined by experts
– multiple categories per resource vs. exactly one class

Benefits and drawbacks of collaborative tagging
Benefits
– higher malleability and adaptability ("users do not have to agree on a hierarchy of tags or detailed taxonomy")
– more efficient retrieval and sharing of data
Drawbacks
– ambiguity in the meaning of tags
– synonyms create informational redundancy
– the central concern: whether the system becomes relatively stable over time and with use
The most problematic claim about tagging systems: because users are not bound by a centralized controlled vocabulary, no coherent categorization scheme can emerge from collaborative tagging at all.

The Dynamics of Tagging
Tag distribution
– the collection of all tags applied to a given resource, with their frequencies, ordered by rank frequency
Features of complex systems
– a large number of users
– a lack of central coordination
– non-linear dynamics
Two important features of collaborative tagging systems
– imitation of others
– shared knowledge
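
As a minimal sketch of the tag distribution defined above, assume a resource's tagging history is available as a flat list of tag strings, one entry per tag application. The list contents and the function name `tag_distribution` are hypothetical; the code simply counts tags and orders them by rank frequency.

```python
from collections import Counter

def tag_distribution(tag_assignments):
    """Return (tag, count) pairs for one resource, ordered by rank frequency.

    `tag_assignments` is a hypothetical flat list of tag strings, one entry
    per time a user applied that tag to the resource.
    """
    counts = Counter(tag_assignments)
    # Sort by descending frequency; break ties alphabetically so the ranking is stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

assignments = ["complexity", "science", "complexity", "networks",
               "science", "complexity", "physics"]
for rank, (tag, count) in enumerate(tag_distribution(assignments), start=1):
    print(rank, tag, count)
```

Ties are broken alphabetically only to make the output deterministic; any consistent tie-breaking rule would serve.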

The Tripartite Structure of Tagging
Figure: tripartite graph structure of a tagging system. An edge linking a user, a tag and a resource (website) represents one tagging instance.
Tags provide the link between users and resources (search ↔ tagging [feedback]).
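
The tripartite structure can be sketched by storing each tagging instance as a (user, tag, resource) triple. The user names, tags and URLs below are placeholders, and the helpers `resources_by_tag` and `tags_by_user` are illustrative only. They show how a user's own tags lead to resources that other users tagged, which is the search side of the search/tagging feedback loop.

```python
from collections import defaultdict

# Each tagging instance is one hyperedge linking a user, a tag and a resource.
# The user names, tags and URLs below are illustrative placeholders.
tagging_instances = [
    ("alice", "complexity", "http://example.org/a"),
    ("bob",   "complexity", "http://example.org/b"),
    ("bob",   "networks",   "http://example.org/a"),
    ("carol", "networks",   "http://example.org/c"),
]

def resources_by_tag(instances):
    """Index resources by tag (the 'search' direction of the link)."""
    index = defaultdict(set)
    for _user, tag, resource in instances:
        index[tag].add(resource)
    return index

def tags_by_user(instances):
    """Collect each user's tag vocabulary (the 'tagging' direction)."""
    vocab = defaultdict(set)
    for user, tag, _resource in instances:
        vocab[user].add(tag)
    return vocab

index = resources_by_tag(tagging_instances)
# A user searching with their own tags reaches resources tagged by others.
reachable = set().union(*(index[t] for t in tags_by_user(tagging_instances)["alice"]))
print(sorted(reachable))
```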

A Generative Model
Preferential attachment
– popularly known as the "rich get richer" model
– P(a) = the probability of a user committing a tagging action
– P(o) = the probability that an "old" tag is reinforced
– if an old tag x is added, it is chosen with probability proportional to how often it has already been used, i.e. $n_x / \sum_i n_i$, where $n_x$ is the current count of tag x
Preferential attachment does not explain why a particular new tag is added.
– In practice, a new tag may be added because it uncovers an informational dimension not captured by older tags.
– Information value: the information conveyed by the tag
Linear combination: the model combines preferential attachment and information value linearly.
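
The preferential-attachment part of the model can be simulated in a few lines. This sketch assumes each step is one tagging action (so P(a) is not modeled), treats the hypothetical parameter `p_old` as P(o), and picks an old tag with probability proportional to its count. It is a plain Yule-Simon-style process and deliberately leaves out the information-value term.

```python
import random

def simulate_preferential_attachment(steps, p_old, seed=0):
    """'Rich get richer' sketch of tag reinforcement for a single resource.

    Each step is one tagging action. With probability p_old an existing tag x
    is reinforced, chosen with probability n_x / sum_i n_i; otherwise a
    brand-new tag is introduced. Tags are represented by integer ids.
    Returns the tag counts sorted in descending order (rank frequency).
    """
    rng = random.Random(seed)
    counts = []    # counts[x] = n_x, how many times tag x has been used
    history = []   # one entry per past tagging action
    for _ in range(steps):
        if history and rng.random() < p_old:
            # Uniform choice over past actions picks tag x with prob n_x / sum_i n_i.
            x = rng.choice(history)
        else:
            x = len(counts)
            counts.append(0)
        counts[x] += 1
        history.append(x)
    return sorted(counts, reverse=True)

ranked = simulate_preferential_attachment(steps=10_000, p_old=0.9)
print(len(ranked), ranked[:5])
```

Sampling uniformly from the action history is a standard trick for preferential selection: it avoids recomputing normalized weights on every step.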

An Example of Preferential Attachment
Figure: an example of how shuffling leads to preferential attachment. This process produces a power-law distribution.

Abstract Example of Information Value
$I(t_1) = 1$, $I(t_3) = 0$, $I(t_2) > I(t_4)$, $I(t_2, t_4) = 1$, $I(t_1, t_5) = 0$ (not additive)
Following Zipf's famous "Principle of Least Effort", users presumably minimize the number of tags they use.

Empirical Study
Data set
– 500 sites from the "Popular" section of del.icio.us: mean … users per site, standard deviation 92.9
– 500 sites from the "Recent" section: mean … users per site, standard deviation 18.2
Power-law distribution
$$y = c\,x^{\alpha} \;\Rightarrow\; \log y = \alpha \log x + \log c$$
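
The log transform above turns the power law into a straight line, so one simple way to estimate α and c is ordinary least squares on (log x, log y). The sketch assumes that approach; the slides do not state which fitting procedure was used. The data are synthetic and noise-free, generated with α = -1.22 over the 25 top ranks, so the fit should recover that exponent.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = c * x**alpha by least squares on log y = alpha * log x + log c.

    Returns (alpha, c). All xs and ys must be positive.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    # Slope of the regression line in log-log space is the exponent alpha.
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    alpha = sxy / sxx
    # Intercept is log c.
    log_c = mean_y - alpha * mean_x
    return alpha, math.exp(log_c)

ranks = list(range(1, 26))                  # the 25 most frequent tags
freqs = [0.3 * r ** -1.22 for r in ranks]   # synthetic, noise-free data
alpha, c = fit_power_law(ranks, freqs)
print(round(alpha, 3), round(c, 3))
```

On real tag data, least squares in log-log space is only a quick estimate; it weights the sparse tail ranks the same as the head ranks.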

Power Law Regression for Popular Sites
Figure: frequency of tag usage by relative position (the 25 most frequently used tags)
Average $\alpha = -1.22$, standard deviation $\pm 0.03$

Empirical Results for Popular Sites
Figure: cumulative frequency of tag use by relative position
Positions seven to ten show a considerably sharper drop.
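
A cumulative curve like the one in the figure can be computed by sorting a resource's tag counts and accumulating each rank's share of all tag uses. The counts in this sketch are made up for illustration, and `cumulative_relative_frequency` is a hypothetical helper name.

```python
def cumulative_relative_frequency(counts):
    """Cumulative share of all tag uses covered by the top-k tags, k = 1..len(counts).

    `counts` are the tag frequencies for one resource, in any order.
    """
    ranked = sorted(counts, reverse=True)
    total = sum(ranked)
    cumulative, running = [], 0
    for n in ranked:
        running += n
        cumulative.append(running / total)
    return cumulative

print(cumulative_relative_frequency([5, 20, 10, 15, 50]))
```

A sharper drop in per-rank frequency shows up on this curve as a flattening: the increments between consecutive positions get smaller.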

Regression Results for Less Popular Sites
Average $\alpha = -3.9$, standard deviation $\pm 4.63$

The Dynamics of Tag Distributions
We study how the shape of these distributions forms over time from the tagging actions of individual users.
Kullback-Leibler divergence (relative entropy)
Two complementary ways to detect whether a distribution has converged to a steady state:
– take the relative entropy between every two consecutive time points of the distribution
– take the relative entropy of the tag distribution at each time point with respect to the final tag distribution
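
Both convergence checks can be sketched with cumulative snapshots of one resource's tag stream. Assumptions: snapshots are taken every `step` tagging actions (plus one at the end), logarithms are base 2, and the divergence is computed as D(P_t || Q) with Q the later distribution. That direction keeps the value finite, because a later cumulative snapshot contains every tag seen earlier; the direction the authors used is not stated on the slides. The stream must be non-empty, and the function names are hypothetical.

```python
import math
from collections import Counter

def normalize(counts):
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits.

    Assumes q(x) > 0 wherever p(x) > 0, which holds here because q is always
    a later cumulative snapshot of the same resource's tags.
    """
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

def snapshots(tag_stream, step):
    """Cumulative tag distributions after every `step` tagging actions (and at the end)."""
    counts, result = Counter(), []
    for i, tag in enumerate(tag_stream, start=1):
        counts[tag] += 1
        if i % step == 0 or i == len(tag_stream):
            result.append(normalize(counts))
    return result

def convergence_curves(tag_stream, step):
    dists = snapshots(tag_stream, step)
    final = dists[-1]
    # (1) relative entropy between consecutive time points
    consecutive = [kl_divergence(dists[t], dists[t + 1]) for t in range(len(dists) - 1)]
    # (2) relative entropy of each time point with respect to the final distribution
    to_final = [kl_divergence(d, final) for d in dists]
    return consecutive, to_final

stream = ["x"] * 5 + ["y"] * 5
consecutive, to_final = convergence_curves(stream, step=5)
print(consecutive, to_final)
```

A stream whose tag proportions stop changing drives both curves toward zero, which is the signature of a stabilized distribution.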

Empirical Results for Tag Dynamics (1/2)
Figure: relative entropy between tag frequency distributions at consecutive time steps

Empirical Results for Tag Dynamics (2/2)
Figure: relative entropy of the tag distribution at each time point with respect to the final distribution

Constructing Inter-Tag Correlation Graphs
The information value of tags is a central aspect governing the evolution of tag distributions.
Distance between two tags
– $N(T_i)$ = the number of pages tagged by $T_i$
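
The slide defines N(T_i) but does not preserve the full distance measure, so the following sketch rests on an assumption: it counts N(T_i) and pairwise co-occurrences from a hypothetical page-to-tags map and normalizes them as a Jaccard ratio (pages carrying both tags divided by pages carrying either). This is a swapped-in measure, not necessarily the one used in the paper; a distance could then be taken, for example, as one minus the correlation.

```python
from collections import Counter
from itertools import combinations

def tag_correlations(pages):
    """Co-occurrence correlation for every pair of tags that share a page.

    `pages` maps a page id to the set of tags applied to it. The Jaccard
    normalization is an illustrative assumption.
    """
    n = Counter()       # n[t] = N(t), the number of pages tagged by t
    n_pair = Counter()  # n_pair[(a, b)] = number of pages tagged by both a and b
    for tags in pages.values():
        n.update(tags)
        n_pair.update(combinations(sorted(tags), 2))
    return {(a, b): both / (n[a] + n[b] - both) for (a, b), both in n_pair.items()}

pages = {
    "p1": {"complexity", "networks", "science"},
    "p2": {"complexity", "networks"},
    "p3": {"complexity", "chaos"},
    "p4": {"science"},
}
corr = tag_correlations(pages)
print(corr[("complexity", "networks")])
```

Pair keys are stored in alphabetical order, so lookups must use the sorted pair.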

Tag Correlation Network
Figure: visualization of a tag correlation network, considering only the correlations involving one central node, "complexity"
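
An ego network like the one in the figure keeps only the edges incident to one central tag. In this sketch the correlation values and the `threshold` parameter are hypothetical; in practice they would come from a co-occurrence measure computed over the data set.

```python
def ego_network(correlations, center, threshold):
    """Neighbours of `center` whose correlation exceeds `threshold`, strongest first.

    `correlations` maps (tag_a, tag_b) pairs to a correlation score.
    """
    edges = {}
    for (a, b), w in correlations.items():
        if w > threshold and center in (a, b):
            neighbour = b if a == center else a
            edges[neighbour] = w
    return dict(sorted(edges.items(), key=lambda kv: -kv[1]))

correlations = {  # illustrative values only
    ("complexity", "networks"): 0.67,
    ("complexity", "science"): 0.25,
    ("chaos", "complexity"): 0.33,
    ("networks", "science"): 0.33,
}
print(ego_network(correlations, "complexity", threshold=0.3))
```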

Tag Correlation Network
Figure: visualization of a tag correlation network, considering all relevant correlations ("small world" structure → Zipf's law)
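
A "small world" network is usually characterized by high clustering together with short average path lengths. The sketch below computes both statistics for an undirected tag graph given as a hypothetical edge list (for instance, tag pairs whose correlation passes a threshold). A full analysis would also compare the values against a random graph of the same size, which is omitted here; the graph must contain at least one edge.

```python
from collections import deque
from itertools import combinations

def build_graph(edges):
    """Undirected adjacency sets from a list of (tag_a, tag_b) edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count links among this node's neighbours.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

def average_shortest_path(adj):
    """Mean BFS hop distance over all reachable ordered pairs of distinct tags."""
    dist_sum, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        dist_sum += sum(dist.values())
        pairs += len(dist) - 1
    return dist_sum / pairs

graph = build_graph([("complexity", "networks"), ("networks", "science"),
                     ("complexity", "science"), ("science", "physics")])
print(average_clustering(graph), average_shortest_path(graph))
```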

Conclusion and Future Work
This work has explored a number of issues highly relevant to the question of whether a coherent way of organizing metadata can emerge from distributed tagging systems.
We showed that tagging distributions tend to stabilize into power-law distributions.
Using an example domain, we explored one of the most empirically challenging aspects of the generative model: the information value of a tag as a function of the number of pages.
Future work will elaborate on these results regarding categorization schemes based on tag co-occurrence and information value, and will examine whether they hold across many different tagging applications.