Content Reuse and Interest Sharing in Tagging Communities


Content Reuse and Interest Sharing in Tagging Communities
Elizeu Santos-Neto, Matei Ripeanu (University of British Columbia)
Adriana Iamnitchi (University of South Florida)

Motivation
There is growing interest in leveraging collective behavior in tagging communities (e.g., for recommendation and spam detection).
To date, no quantitative study is available that:
- estimates collaboration levels in tagging communities
- evaluates the impact of the observed levels on applications
Our finding: collaboration levels are low!
AAAI Spring Symposium 2008 Social Information Processing

Tagging Communities
Users collect items and annotate them with tags.
Items can be URLs, photos, citation records, blog posts, etc.

Example - CiteULike
[Figure labels: Tags, Item, User, Other Users]

Goals
Assess the levels of collaboration
- Define metrics
- Analyze real communities (CiteULike and Connotea)
Discuss the impact of collaboration levels on
- Recommendation systems
- Detection of malicious behavior (e.g., tag spam)

Metrics to Assess Collaboration
Content Reuse
- Percentage of activity that refers to existing items (or tags)
Interest Sharing
- The level of overlap between the sets of items (or tags) of two users
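The content reuse metric above can be illustrated with a minimal sketch. It assumes a trace of `(day, user, item)` tuples and treats an item as "existing" if it appeared on any earlier day; both the trace layout and that reading of "existing" are assumptions for illustration, not the authors' stated procedure. The function name `daily_reuse` and the sample data are hypothetical.

```python
from collections import defaultdict

def daily_reuse(trace):
    """Map each day to the percentage of that day's events whose item
    was already seen on an earlier day (assumed reading of 'reuse')."""
    by_day = defaultdict(list)
    for day, _user, item in trace:
        by_day[day].append(item)

    seen = set()   # items observed on any earlier day
    reuse = {}
    for day in sorted(by_day):
        items = by_day[day]
        reused = sum(1 for item in items if item in seen)
        reuse[day] = 100.0 * reused / len(items)
        seen.update(items)   # today's items count as existing from tomorrow on
    return reuse

# Hypothetical toy trace: (day, user, item)
trace = [
    (1, "ana", "paper-a"),
    (1, "eve", "paper-b"),
    (2, "otto", "paper-a"),   # paper-a was posted on day 1: reuse
    (2, "ana", "paper-c"),
    (3, "eve", "paper-c"),    # paper-c was posted on day 2: reuse
]
result = daily_reuse(trace)
print(result)   # {1: 0.0, 2: 50.0, 3: 100.0}
```

The same function gives tag reuse if the third field of each tuple holds a tag instead of an item.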

Data Sets

                     CiteULike   Connotea
Users                ~21K        ~10K
Items (unique)       ~625K       ~267K
Tags (unique)        ~188K       ~110K
Tag assignments      ~3.3M       ~890K

Activity traces start at each community's inception
Traces represent more than 2 years of activity
Explicit activity only (no browsing histories or click traces)
Data collection
- CiteULike: publicly available trace
- Connotea: our own crawler

Item Reuse
[Plots: CiteULike, Connotea]
Note on slide: add a plot with the number of tag assignments
The percentage of daily item reuse is low

User Activity
[Plots: CiteULike, Connotea]
Existing users perform the largest portion of daily activity

Tag Reuse
[Plots: CiteULike, Connotea]
A high percentage of tags is reused daily

Interest Sharing
[Figure labels: users Ana, Eve, and Otto; Items; Tags]

Interest Sharing - Definition
Intuition
- User similarity based on their activity
- Metric: Jaccard Index
Definitions
- Item-based
- Tag-based
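The formulas on this slide did not survive in the transcript. As a minimal sketch, the standard Jaccard index of two sets is the size of their intersection divided by the size of their union; applying it to two users' item sets gives an item-based value, and applying it to their tag sets gives a tag-based value. The helper name `jaccard` and the sample sets are hypothetical, and returning 0 for two empty sets is an assumption.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections, in [0, 1]."""
    a, b = set(a), set(b)
    union = a | b
    if not union:          # assumption: two empty sets share no interest
        return 0.0
    return len(a & b) / len(union)

# Hypothetical item sets for three users
items = {
    "ana": {"i1", "i2", "i3"},
    "eve": {"i2", "i3", "i4"},
    "otto": {"i5"},
}

ana_eve = jaccard(items["ana"], items["eve"])     # 2 shared / 4 total = 0.5
ana_otto = jaccard(items["ana"], items["otto"])   # nothing shared = 0.0
print(ana_eve, ana_otto)
```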

Interest Sharing - Results

                      CiteULike                  Connotea
                      Item-based   Tag-based     Item-based   Tag-based
Average               7.6%         13.1%         4.5%         2.5%
Median                2.3%         2.2%          0.9%         1.4%
Standard deviation    16.7%        27.2%         11.2%        4.7%

No interest sharing: 99%, 98% (two values given; their columns are not recoverable from the transcript)

Interest sharing level is low for both communities
Observed interest sharing values are dispersed
Note: the table above includes the percentage of zero interest sharing

Interest Sharing - Results (2)
Note on slide: larger labels…
The interest sharing levels are concentrated around low values

Impact on System Design
Collaboration levels are low. What is the impact on system design?
Recommendation systems
- New item problem
- Data set sparsity
Misbehavior detection
- It is harder to detect legitimate behavior

Summary
Assess collaboration levels
- Content Reuse and Interest Sharing
- Collaboration levels: lower than expected
- Impact on recommendation and spam detection
Future Work
- Other formulations of similarity (e.g., rare items = stronger similarity: Adamic-Adar Index)
- Does the content type influence collaboration?
- Evaluate the impact on anti-spam techniques
- What is the role of different relationship types?
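The "rare items = stronger similarity" idea listed under future work can be sketched with the usual form of the Adamic-Adar index: each shared item contributes 1 / log(popularity), so an item held by few users counts for more than one held by many. This is an illustration under assumptions, not the authors' formulation; popularity is taken here as the number of users holding the item, and the function name and sample data are hypothetical.

```python
import math
from collections import Counter

def adamic_adar(user_a, user_b, items_by_user):
    """Sum of 1 / log(popularity) over the items shared by two users.

    Popularity of an item = number of users who hold it (assumption).
    """
    popularity = Counter(
        item for items in items_by_user.values() for item in set(items)
    )
    shared = set(items_by_user[user_a]) & set(items_by_user[user_b])
    # Items held by a single user have log(1) = 0 and are skipped; this can
    # only happen when a user is compared with themselves.
    return sum(1.0 / math.log(popularity[i]) for i in shared if popularity[i] > 1)

# Hypothetical data: "rare" is held by 2 users, "popular" by 4
items_by_user = {
    "ana": {"rare", "popular"},
    "eve": {"rare", "popular"},
    "otto": {"popular"},
    "ivy": {"popular"},
}

score_ana_eve = adamic_adar("ana", "eve", items_by_user)    # rare + popular
score_ana_otto = adamic_adar("ana", "otto", items_by_user)  # popular only
print(score_ana_eve, score_ana_otto)
```

In this toy data the rare item contributes 1 / log(2) while the popular one contributes only 1 / log(4), which is the weighting the slide alludes to.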

Questions http://netsyslab.ece.ubc.ca

Interest Sharing Structure
Interest sharing graph
- Users are nodes
- Two users are connected if their pairwise interest sharing is not zero

                                              CiteULike (21,980 nodes)    Connotea (10,667 nodes)
                                              Item-based   Tag-based      Item-based   Tag-based
Singleton nodes                               9,737        599            5,695        859
Connected components (excluding singletons)   767          8              226          14
Nodes in the largest component                8,636        21,369         4,205        9,782
Largest component density                     0.0121       0.1703         0.0131       0.0995
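The graph construction described on this slide can be sketched as follows: connect two users whenever their sets intersect (equivalent to a non-zero Jaccard index), then count singletons, components, and the density of the largest component. Density is assumed here to be the usual undirected definition, edges divided by n(n-1)/2; the slide does not state which definition was used. All function names and sample data are hypothetical.

```python
from itertools import combinations

def interest_graph(sets_by_user):
    """Adjacency sets: two users are linked if they share at least one element."""
    adj = {u: set() for u in sets_by_user}
    for u, v in combinations(sets_by_user, 2):
        if sets_by_user[u] & sets_by_user[v]:   # non-empty overlap <=> Jaccard > 0
            adj[u].add(v)
            adj[v].add(u)
    return adj

def components(adj):
    """Connected components found by iterative depth-first search."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for nb in adj[node] - seen:
                seen.add(nb)
                stack.append(nb)
        comps.append(comp)
    return comps

def summarize(adj):
    comps = components(adj)
    non_single = [c for c in comps if len(c) > 1]
    largest = max(non_single, key=len, default=set())
    n = len(largest)
    edges = sum(len(adj[u] & largest) for u in largest) // 2  # each edge counted twice
    return {
        "singletons": sum(1 for c in comps if len(c) == 1),
        "components": len(non_single),
        "largest_size": n,
        "largest_density": edges / (n * (n - 1) / 2) if n > 1 else 0.0,
    }

# Hypothetical item sets for six users
sets_by_user = {
    "ana": {"i1", "i2"}, "eve": {"i2", "i3"}, "otto": {"i3"},
    "bob": {"i9"}, "cat": {"i7"}, "dan": {"i7"},
}
summary = summarize(interest_graph(sets_by_user))
print(summary)
```

Comparing every pair of users is quadratic, so for traces of the size reported above an inverted index from items to users would likely be a better way to enumerate overlapping pairs.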

Interest Sharing Dynamics - Results
[Plot: Connotea]

Interest Sharing Over Time
[Plots: Item-based, Tag-based]