ON INCENTIVE-BASED TAGGING Xuan S. Yang, Reynold Cheng, Luyi Mo, Ben Kao, David W. Cheung {xyang2, ckcheng, lymo, kao, The University.

Presentation transcript:

ON INCENTIVE-BASED TAGGING Xuan S. Yang, Reynold Cheng, Luyi Mo, Ben Kao, David W. Cheung {xyang2, ckcheng, lymo, kao, The University of Hong Kong

Outline
- Introduction
- Problem Definition & Solution
- Experiments
- Conclusions & Future Work

Collaborative Tagging Systems
- Examples: Delicious, Flickr
- Users / Taggers
- Resources
  - Webpages
  - Photos
- Tags
  - Descriptive keywords
- Post
  - Non-empty set of tags

Applications with Tag Data
- Search [1][2]
- Recommendation [3]
- Clustering [4]
- Concept Space Learning [5]

[1] Optimizing web search using social annotations. S. Bao et al. WWW’07
[2] Can social bookmarking improve web search? P. Heymann et al. WSDM’08
[3] Structured approach to query recommendation with social annotation data. J. Guo CIKM’10
[4] Clustering the tagged web. D. Ramage et al. WSDM’09
[5] Exploring the value of folksonomies for creating semantic metadata. H. S. Al-Khalifa IJWSIS’07

Problem of Collaborative Tagging
- Most posts are given to a small number of highly popular resources
- Dataset from Delicious [6]
  - 30m URLs in total
  - Under-tagging: over 10m URLs are tagged only once
  - Over-tagging: 39% of posts go to the top 1% of URLs

[6] Analyzing Social Bookmarking Systems: A del.icio.us Cookbook. ECAI Mining Social Data Workshop

Under-Tagging
- Resources with very few posts have low-quality tag data
- A single post is of low quality. It may:
  - Be irrelevant to the resource, e.g. {3dmax}
  - Not cover all aspects of the resource, e.g. {geography, education}
  - Not show which tag is more important, e.g. {maps, education}
- Goal: improve tag data quality for an under-tagged resource by giving it a sufficient number of posts

Having a Sufficient Number of Posts
- All aspects of the resource will be covered
- The relative occurrence frequency of a tag t reflects its importance
  - Irrelevant tags rarely appear
  - Important tags occur frequently
- Can we always improve tag data quality by giving more posts to a resource?

Over-Tagging
- Relative frequency vs. number of posts
  - With >= 250 posts, the relative frequencies are stable
- Tagging efforts spent after that are wasted!

Incentive-Based Tagging
- Guide users’ tagging effort
- Reward users for annotating under-tagged resources
- Reduce the number of under-tagged resources
- Save the tagging effort wasted on over-tagged resources

Incentive-Based Tagging (cont’d)
- Limited budget
- Incentive allocation
- Objective: maximize quality improvement
[Diagram labels: Selected Resource; Quality Metric for Tag Data]

Effect of Incentive-Based Tagging
- Top-10 most similar query
- 5,000 tagged resources
- Simulation for physics experiments
- Implemented in Java

Related Work
- Tag recommendation [7][8][9]
  - Automatically assigns tags to resources
  - Difference: tag recommendation relies on machine-learning-based methods, whereas incentive-based tagging relies on human labor

[7] Social Tag Prediction. P. Heymann, SIGIR’08
[8] Latent Dirichlet Allocation for Tag Recommendation, R. Krestel, RecSys’09
[9] Learning Optimal Ranking with Tensor Factorization for Tag Recommendation, S. Rendle, KDD’09

Related Work (cont’d)
- Data cleaning under limited budget [10]
  - Similarity: both improve data quality with human labor
  - Opposite directions: data cleaning removes uncertainty (“-”), whereas incentive-based tagging enriches information (“+”)

[10] Explore or Exploit? Effective Strategies for Disambiguating Large Databases. R. Cheng VLDB’10

Outline
- Introduction
- Problem Definition & Solution
- Experiments
- Conclusions & Future Work

Data Model
- Set of resources
- For a specific resource r_i
  - Post: a set of tags
  - Post sequence {p_i(k)}
  - Relative frequency distribution (rfd) after r_i has k posts
- Example posts: {maps, education}, {geography, education}, {3dmax}
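As a rough illustration of the rfd, the sketch below computes it for the three example posts. The slide does not spell out the normalization, so this assumes each tag's weight is its occurrence count divided by the total number of tag occurrences so far; the function name `rfd` is hypothetical.

```python
from collections import Counter


def rfd(posts):
    """Relative frequency distribution of tags over a sequence of posts.

    Assumption (not stated on the slide): a tag's weight is its occurrence
    count divided by the total number of tag occurrences in the posts.
    """
    counts = Counter(tag for post in posts for tag in post)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


# The three example posts from the slide, in the order listed there.
posts = [{"maps", "education"}, {"geography", "education"}, {"3dmax"}]

# Print the rfd after the first k posts, for k = 1, 2, 3.
for k in range(1, len(posts) + 1):
    print(k, sorted(rfd(posts[:k]).items()))
```

With only three posts, "education" dominates and the irrelevant tag "3dmax" still carries visible weight, which is the under-tagging problem described earlier.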

Quality Model: Tagging Stability
- Stability of the rfd
  - Average similarity between ω rfds, i.e., the (k-ω+1)-th, ..., k-th rfds
- Stable point
  - Threshold
- Stable rfd
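One way to read the stability definition is sketched below. It is an interpretation under stated assumptions, not necessarily the authors' exact formula: similarity is taken to be cosine similarity, the "average similarity between ω rfds" is taken over adjacent pairs inside the window, and the stable point is taken to be the first k whose score reaches the threshold. All function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse tag-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rfd_sequence(posts):
    """rfds after 1, 2, ..., len(posts) posts (posts are non-empty tag sets)."""
    counts, total, seq = {}, 0, []
    for post in posts:
        for tag in post:
            counts[tag] = counts.get(tag, 0) + 1
            total += 1
        seq.append({t: n / total for t, n in counts.items()})
    return seq


def stability(rfds, k, omega):
    """Average similarity of adjacent rfds among the (k-omega+1)-th..k-th."""
    if omega < 2 or k < omega:
        raise ValueError("need omega >= 2 and k >= omega")
    window = rfds[k - omega:k]
    sims = [cosine(window[j - 1], window[j]) for j in range(1, omega)]
    return sum(sims) / len(sims)


def stable_point(rfds, omega, tau):
    """Smallest k whose stability reaches the threshold tau, or None."""
    for k in range(omega, len(rfds) + 1):
        if stability(rfds, k, omega) >= tau:
            return k
    return None


posts = [{"a"}, {"b"}] + [{"a"}] * 6
seq = rfd_sequence(posts)
k_star = stable_point(seq, omega=3, tau=0.99)
print("stable point:", k_star)
if k_star is not None:
    print("stable rfd:", seq[k_star - 1])
```

Under this reading, the stable rfd is simply the rfd observed at the stable point.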

Quality
- For one resource r_i with k posts
  - Similarity between its current rfd and its stable rfd
- For a set of resources R
  - Average quality of all the resources
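Written out, the two definitions above could look as follows. The symbols are assumptions introduced here for illustration: F_i(k) is the rfd of r_i after k posts, \varphi_i is its stable rfd, s is the similarity function, and k_i is the current number of posts of r_i.

```latex
% Assumed notation, not taken verbatim from the slides.
q_i(k) = s\bigl(F_i(k), \varphi_i\bigr),
\qquad
q(R) = \frac{1}{|R|} \sum_{r_i \in R} q_i(k_i)
```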

Incentive-Based Tagging
- Input
  - A set of resources
  - Initial posts
  - Budget
- Output
  - Incentive assignment: how many new posts each r_i should get
- Objective
  - Maximize quality
[Figure: post timelines of r_1, r_2, r_3 up to the current time]

Incentive-Based Tagging (cont’d)
- Optimal solution
  - Dynamic programming
  - Best quality improvement
  - Assumption: the stable rfd and the future posts are known
[Figure: post timelines of r_1, r_2, r_3 extending beyond the current time]
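The slide names dynamic programming but gives no recurrence, so the following is only a sketch of one plausible formulation: a knapsack-style DP over resources and budget. It assumes an oracle table `quality[i][x]` holding the quality of resource i after x extra posts (available only because the stable rfd and future posts are assumed known) and that each post costs one unit of budget. The function name `optimal_allocation` is hypothetical.

```python
def optimal_allocation(quality, budget):
    """Maximize the summed quality subject to a budget of extra posts.

    quality[i][x] is the (oracle) quality of resource i after x extra posts.
    Returns (best total quality, posts assigned to each resource).
    Maximizing the sum also maximizes the average over a fixed resource set.
    """
    n = len(quality)
    neg_inf = float("-inf")
    # best[i][b]: best total quality of the first i resources with budget <= b
    best = [[neg_inf] * (budget + 1) for _ in range(n + 1)]
    pick = [[0] * (budget + 1) for _ in range(n + 1)]
    best[0] = [0.0] * (budget + 1)  # the budget need not be spent in full
    for i in range(1, n + 1):
        q = quality[i - 1]
        for b in range(budget + 1):
            for x in range(min(b, len(q) - 1) + 1):
                cand = best[i - 1][b - x] + q[x]
                if cand > best[i][b]:
                    best[i][b] = cand
                    pick[i][b] = x
    # Walk back through the choices to recover the assignment.
    alloc, b = [0] * n, budget
    for i in range(n, 0, -1):
        alloc[i - 1] = pick[i][b]
        b -= alloc[i - 1]
    return best[n][budget], alloc


# Made-up quality curves for three resources, for illustration only.
quality = [
    [0.20, 0.50, 0.60],  # moderately under-tagged
    [0.90, 0.95, 0.96],  # already close to stable
    [0.10, 0.50, 0.90],  # heavily under-tagged
]
total, alloc = optimal_allocation(quality, budget=2)
print(total, alloc)
```

Because it needs future posts, such an allocation serves as an upper bound against which practical strategies can be compared rather than as a deployable method.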

Strategy Framework

Implementing CHOOSE()
- Free Choice (FC)
  - Users freely decide which resource they want to tag
- Round Robin (RR)
  - The resources have an even chance of getting posts

Implementing CHOOSE()
- Fewest Post First (FP)
  - Prioritize under-tagged resources
- Most Unstable First (MU)
  - Resources with unstable rfds need more posts
  - Window size
- Hybrid (FP-MU)
[Figure: post timelines of r_1, r_2, r_3]
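The strategies above differ only in how the next resource is picked. A minimal sketch of that idea follows, covering FP and RR only. It assumes that each unit of budget buys exactly one post and that the rewarded user always delivers it; the loop structure and all names (`run_strategy`, `choose_fp`, `make_choose_rr`) are hypothetical, since the framework slide itself did not survive as text.

```python
import itertools


def choose_fp(post_counts):
    """Fewest Post First: pick the resource with the fewest posts.

    Ties are broken by the lowest index (an arbitrary choice made here).
    """
    return min(range(len(post_counts)), key=lambda i: post_counts[i])


def make_choose_rr(n):
    """Round Robin: cycle through the n resources in index order."""
    order = itertools.cycle(range(n))
    return lambda post_counts: next(order)


def run_strategy(post_counts, budget, choose):
    """Spend the budget one post at a time; return posts assigned per resource."""
    counts = list(post_counts)
    assigned = [0] * len(counts)
    for _ in range(budget):
        i = choose(counts)
        counts[i] += 1  # assumption: the rewarded user submits one post
        assigned[i] += 1
    return assigned


initial = [3, 10, 1]  # made-up initial post counts for r_1, r_2, r_3
print("FP:", run_strategy(initial, 4, choose_fp))
print("RR:", run_strategy(initial, 4, make_choose_rr(len(initial))))
```

An MU variant would plug in a `choose` function that ranks resources by a stability score over the last ω rfds instead of by post count.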

Outline
- Introduction
- Problem Definition & Solution
- Experiments
- Conclusions & Future Work

Setup
- Delicious dataset from the year 2007
- 5,000 resources
  - Passed their stable point
  - The entire post sequence is known
- Simulation from Feb
  - 148,471 posts in total
  - 7% passed stable point
  - 25% under-tagged (# of posts < 10)
[Figure: post timelines of r_1, r_2, r_3 with the simulation start marked]

Quality vs. Budget
- FP & FP-MU are close to optimal
- FC does NOT increase the quality
- Budget = 1,000
  - 0.7% more posts compared with the initial number
  - 6.7% quality improvement
- Making all resources reach their stable point
  - FC: over 2 million more posts
  - FP & FP-MU: 90% saved

Over-Tagging
- Free Choice: 50% of posts are over-tagging, i.e., wasted
- FP, MU and FP-MU: 0%

Top-10 Similar Sites (Cont’d)
- On Feb
  - 3 posts
  - Top-10 all Java related
- 10,000 more posts by FC
  - Gets 4 more posts
  - 4/10 physics related

Top-10 Similar Sites (Cont’d)
- On Dec
  - 270 posts
  - Top-10 all physics related
  - Perfect result
- 10,000 more posts by FP
  - Gets 11 more posts
  - Top 9 physics related
  - 9 included in the perfect result
  - Top 6 in the same order as the perfect result

Conclusion
- Define tag data quality
- Problem of incentive-based tagging
- Effective solutions
  - Improve data quality
  - Improve the quality of application results, e.g. top-k search

Future Work
- Different costs of tagging operations
- User preference in the allocation process
- System development

References
[1] Optimizing web search using social annotations. S. Bao et al. WWW’07
[2] Can social bookmarking improve web search? P. Heymann et al. WSDM’08
[3] Structured approach to query recommendation with social annotation data. J. Guo CIKM’10
[4] Clustering the tagged web. D. Ramage et al. WSDM’09
[5] Exploring the value of folksonomies for creating semantic metadata. H. S. Al-Khalifa IJWSIS’07
[6] Analyzing Social Bookmarking Systems: A del.icio.us Cookbook. ECAI Mining Social Data Workshop
[7] Social Tag Prediction. P. Heymann, SIGIR’08
[8] Latent Dirichlet Allocation for Tag Recommendation, R. Krestel, RecSys’09
[9] Learning Optimal Ranking with Tensor Factorization for Tag Recommendation, S. Rendle, KDD’09
[10] Explore or Exploit? Effective Strategies for Disambiguating Large Databases. R. Cheng VLDB’10

Thank you!
Contact Info: Xuan Shawn Yang, University of Hong Kong

Effectiveness of Quality Metric (Backup)
- All-pair similarity
  - Represent each resource by its tags
  - Calculate the similarity between all pairs of resources
  - Compare the similarity result with the gold standard

Under-Tagged Resources (Backup)

Other Top-10 Similar Sites (Backup)

Problem of Collaborative Tagging (Backup)
- Most posts are given to a small number of highly popular resources
- Dataset from delicious.com
  - 30m URLs in total
  - 39% of posts vs. top 1% of URLs
  - Over 10m URLs are tagged only once
- Selected 5,000 resources
  - High-quality resources
  - 7% passed stable points
  - 50% over-tagging posts
  - 25% under-tagged (< 10 posts)

Tagging Stability (Backup)
- Example
  - Window size
  - Threshold
  - Stable point: 100
  - Stable rfd: