Mining Tag Semantics for Social Tag Recommendation
Hsin-Chang Yang
Department of Information Management, National University of Kaohsiung

Presentation transcript:

Outline
- Introduction
- Text Mining by SOM
- Tag Recommendation Process
- Experimental Results
- Conclusions

Social Bookmarking: Why?
Social bookmarking services (also known as folksonomies) are gaining popularity because they offer the following benefits:
- Less effort in Web page annotation
- Improved retrieval precision
- Simpler Web page classification

How does a folksonomy work?
Simple: a user (u_i) annotates a Web page (o_j) with a set of tags, called a post (T_ij). A folksonomy is generally represented as a set of tuples (u_i, o_j, T_ij).
[Figure: a user u_i views a Web page o_j ("interesting… Let me add some tags.") and attaches a post T_ij. Labels shown: GrC2011 program, Granular Computing.]
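The tuple representation above can be sketched in a few lines of Python. This is only an illustration: the `Annotation` type, the `tags_by_page` helper, the user names, and the URLs are hypothetical and do not come from the slides.

```python
from collections import defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    """One folksonomy tuple (u_i, o_j, T_ij)."""
    user: str          # u_i
    page: str          # o_j
    post: frozenset    # T_ij, the tags assigned at once


# Hypothetical sample data, loosely based on the labels in the slide's figure.
folksonomy = {
    Annotation("u1", "http://example.org/grc2011-program",
               frozenset({"granular", "computing"})),
    Annotation("u2", "http://example.org/grc2011-program",
               frozenset({"conference", "computing"})),
    Annotation("u1", "http://example.org/som-tutorial",
               frozenset({"som", "clustering"})),
}


def tags_by_page(annotations):
    """Merge the posts of all users into one tag set per page."""
    merged = defaultdict(set)
    for annotation in annotations:
        merged[annotation.page] |= annotation.post
    return dict(merged)


page_tags = tags_by_page(folksonomy)
```

Here `page_tags` maps each page to the union of the tags that any user attached to it.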

Characteristics of Folksonomy
- Collaboration
- Semantic relatedness helps improve retrieval precision
- Social tagging is not a trivial task

Tag Recommendation
Tag recommendation is the mechanism of suggesting proper tags to ordinary users when they try to add tags to a Web page. It can:
- save users the effort of selecting tags from scratch
- constrain the formulation of tags
An automatic tag recommendation process is thus beneficial for social bookmarking services as well as search engines.

Text Mining by SOM
[Figure: processing flow. Training Web pages and training posts → Preprocessing → Web page vectors and post vectors → SOM training → synaptic weight vectors → Labeling → page clusters and tag clusters → Association discovery → page/tag associations.]

Preprocessing
- A bag-of-words approach is used to describe pages and posts.
- Post: the collection of tags annotated to a page at once.
- Web page P_i is transformed to a binary vector P_i.
- T_i, which is the post of P_i, is transformed to a binary vector T_i.
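A minimal sketch of the binary bag-of-words encoding described above. The helper names (`build_vocabulary`, `to_binary_vector`) and the toy pages and posts are assumptions made for illustration; the slides do not say how the vocabulary was built.

```python
def build_vocabulary(documents):
    """Return a sorted list of all distinct terms across the documents."""
    return sorted({term for doc in documents for term in doc})


def to_binary_vector(terms, vocabulary):
    """Component is 1 if the vocabulary term occurs in `terms`, else 0."""
    present = set(terms)
    return [1 if term in present else 0 for term in vocabulary]


# Hypothetical toy data: each page is a list of words, each post a list of tags.
pages = [["granular", "computing", "program"], ["map", "training", "program"]]
posts = [["grc", "conference"], ["som", "conference"]]

# Pages and posts get separate vocabularies, hence separate vector spaces.
page_vocab = build_vocabulary(pages)
tag_vocab = build_vocabulary(posts)
page_vectors = [to_binary_vector(p, page_vocab) for p in pages]
post_vectors = [to_binary_vector(t, tag_vocab) for t in posts]
```

With this toy input the page vocabulary has five terms and the tag vocabulary three, so each page becomes a 5-dimensional and each post a 3-dimensional 0/1 vector.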

SOM Training
All page vectors P_i and post vectors T_i were trained with the self-organizing map (SOM) algorithm, separately for pages and for posts. Two maps, M_P and M_T, were obtained after the training.
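For readers unfamiliar with SOM training, the following is a small pure-Python sketch of the standard online algorithm. It is not the author's implementation: the linear decay schedules, the Gaussian neighbourhood, the random initialisation, and the function name `train_som` are all assumptions. Only the learning rate of 0.4 is taken from the parameter slide.

```python
import math
import random


def train_som(vectors, rows, cols, epochs=20, learning_rate=0.4, seed=0):
    """Train a rows x cols SOM and return {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    initial_radius = max(rows, cols) / 2.0
    for epoch in range(epochs):
        # Assumed schedule: learning rate and radius shrink linearly.
        decay = 1.0 - epoch / epochs
        alpha = learning_rate * decay
        radius = max(initial_radius * decay, 0.5)
        for x in rng.sample(vectors, len(vectors)):  # shuffled pass
            # Best-matching unit: the neuron closest to x (Euclidean).
            bmu = min(weights, key=lambda n: sum(
                (w - v) ** 2 for w, v in zip(weights[n], x)))
            for (r, c), w in weights.items():
                grid_dist2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-grid_dist2 / (2 * radius ** 2))
                # Pull the neuron towards x, scaled by the neighbourhood.
                for i in range(dim):
                    w[i] += alpha * h * (x[i] - w[i])
    return weights


# Hypothetical toy page vectors; posts would be trained on a second map.
page_vectors = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
map_p = train_som(page_vectors, rows=2, cols=2, epochs=10)
```

Calling `train_som` once on the page vectors and once on the post vectors would yield the two maps M_P and M_T; the slides report a 25 × 25 map for the real data.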

Labeling
We labeled each Web page on M_P by finding its most similar neuron. A page cluster map (PCM) was obtained after all pages had been labeled. The same approach was applied to all posts on M_T to obtain the tag cluster map (TCM).
[Figure: PCM and TCM side by side, with example neurons labeled P_1, P_5, P_65 (PCM) and T_1, T_8 (TCM).]
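The labeling step can be sketched as a nearest-neuron assignment. The slides say only "most similar neuron", so the use of squared Euclidean distance here is an assumption, as are the function names and the toy weights.

```python
from collections import defaultdict


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def label(vectors, weights):
    """Map each neuron to the indices of the vectors it wins."""
    cluster_map = defaultdict(list)
    for index, vector in enumerate(vectors):
        winner = min(weights,
                     key=lambda n: squared_distance(weights[n], vector))
        cluster_map[winner].append(index)
    return dict(cluster_map)


# Hypothetical trained weights for a 1 x 2 map over 3-dimensional vectors.
weights = {(0, 0): [0.9, 0.8, 0.1], (0, 1): [0.1, 0.2, 0.9]}
page_vectors = [[1, 1, 0], [0, 0, 1], [1, 0, 0]]

pcm = label(page_vectors, weights)
# pcm == {(0, 0): [0, 2], (0, 1): [1]}
```

Neurons that win no vector simply do not appear in the returned map; these correspond to the unlabeled neurons counted in the experimental summary.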

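Association Discovery
- Find associations between page clusters and post clusters.
- We used a voting scheme to find the associations.
[Figure: PCM and TCM. A page P_i in page cluster P_x and its post T_i in post cluster T_y add one vote (+1) to the pair (P_x, T_y). Also shown: P_j, T_j.]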

Association Discovery
Similarity between a page cluster P_x and a post cluster T_y: [equation missing from the transcript], where
- I: index set operator
- C_{k,l} = 1 if P_k is annotated by T_l; C_{k,l} = 0 otherwise
P_x is associated with the post cluster T_y that has the maximum similarity.
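Since the similarity formula itself did not survive in the transcript, the sketch below shows one plausible reading that is consistent with the voting scheme: count the pairs (k, l) with k in I(P_x), l in I(T_y) and C_{k,l} = 1. Whether the original formula also normalizes this count (for example by cluster size) is unknown, so treat this as an assumption rather than the author's definition. All names and data are hypothetical.

```python
def cluster_similarity(page_cluster, post_cluster, annotated):
    """Votes: pairs (k, l), k in P_x and l in T_y, with C[k, l] = 1."""
    return sum(1 for k in page_cluster for l in post_cluster
               if (k, l) in annotated)


def discover_associations(pcm, tcm, annotated):
    """Associate each page cluster with its highest-voted post cluster."""
    associations = {}
    for px, pages in pcm.items():
        best = max(tcm, key=lambda ty: cluster_similarity(
            pages, tcm[ty], annotated))
        # Leave a page cluster unassociated if it received no votes at all.
        if cluster_similarity(pages, tcm[best], annotated) > 0:
            associations[px] = best
    return associations


# Hypothetical toy maps: neuron -> indices of pages (PCM) or posts (TCM).
pcm = {(0, 0): [0, 2], (0, 1): [1]}
tcm = {(0, 0): [1], (1, 1): [0, 2]}
# C[k, l] = 1 exactly for the pairs in this set (page k annotated by post l).
annotated = {(0, 0), (1, 1), (2, 2)}

associations = discover_associations(pcm, tcm, annotated)
# associations == {(0, 0): (1, 1), (0, 1): (0, 0)}
```

In the toy data, page cluster (0, 0) collects two votes for post cluster (1, 1) and none for (0, 0), so it is associated with (1, 1).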

Architecture of Tag Recommendation
[Figure: processing flow. Incoming Web page → Preprocessing → incoming page vector → Labeling (using the PCM) → labeled page cluster → Tag Recommendation (using the page/tag associations) → recommended tags.]

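Tag Recommendation
- P_x: the incoming Web page.
- Let the page P_x be labeled to the page cluster P_x.
- Let T_x be the tag cluster most related to that page cluster. All tags in T_x will be recommended.
[Figure: PCM and TCM. The incoming page P_x is mapped to page cluster P_x on the PCM, which points to tag cluster T_x on the TCM; the tags in T_x are marked "recommended!".]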
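The recommendation step combines labeling with an association lookup. The sketch below assumes the page map weights, the associations, and the tags of each tag cluster are already available; the function name, the distance measure, and the toy values are hypothetical. It returns the tags in alphabetical order because the slides do not explain how the tags are ranked.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def recommend_tags(page_vector, page_map_weights, associations, tcm_tags):
    """Label the incoming page, then return the tags of its tag cluster."""
    # Step 1: label the incoming page vector to its nearest page cluster.
    px = min(page_map_weights,
             key=lambda n: squared_distance(page_map_weights[n], page_vector))
    # Step 2: follow the discovered page/tag association, if there is one.
    tx = associations.get(px)
    if tx is None:
        return []
    # Step 3: recommend every tag in the associated tag cluster.
    return sorted(tcm_tags.get(tx, set()))


# Hypothetical toy model.
page_map_weights = {(0, 0): [0.9, 0.8, 0.1], (0, 1): [0.1, 0.2, 0.9]}
associations = {(0, 0): (1, 1), (0, 1): (0, 0)}
tcm_tags = {(1, 1): {"som", "clustering"}, (0, 0): {"conference"}}

recommended = recommend_tags([1, 1, 0], page_map_weights,
                             associations, tcm_tags)
# recommended == ["clustering", "som"]
```

Note that only the content-derived page vector is needed at recommendation time; no information about the requesting user enters the lookup.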

Experimental Results: Dataset
- ECML/PKDD Discovery Challenge 2008 (RSDC 2008) tag recommendation dataset
- Over 132K tags posted by 468 users
- Bookmarked items, either Web pages or BibTeX entries
- Contains some noisy data:
  - items without much content
  - items without tags

Experimental Results: Preprocessing
- Discard tags that contain non-English characters
- Remove numeric tags
- Remove tags that are stop words, such as 'for' and 'the'
- Transform all tags to lowercase
- Ignore extremely short tags
- Ignore extremely long tags
- Stem the remaining tags
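The cleaning rules above could look roughly like this in Python. Several details are assumptions, since the slides do not give them: "non-English characters" is read as non-ASCII, the length thresholds are invented, the stop-word list is a tiny illustrative subset, and `naive_stem` is a toy suffix stripper standing in for a real stemmer such as Porter's.

```python
STOP_WORDS = {"for", "the", "a", "an", "of", "and", "to", "in"}  # subset only
MIN_LENGTH, MAX_LENGTH = 2, 25  # assumed thresholds; the slides give none


def naive_stem(tag):
    """Toy suffix stripper; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if tag.endswith(suffix) and len(tag) - len(suffix) >= 3:
            return tag[:-len(suffix)]
    return tag


def clean_tags(tags):
    """Apply the filtering rules in order and stem whatever survives."""
    cleaned = []
    for tag in tags:
        tag = tag.lower()
        if not tag.isascii():          # assumed meaning of "non-English"
            continue
        if tag.isdigit():              # purely numeric tags
            continue
        if tag in STOP_WORDS:
            continue
        if not MIN_LENGTH <= len(tag) <= MAX_LENGTH:
            continue
        cleaned.append(naive_stem(tag))
    return cleaned


raw = ["Clustering", "2008", "the", "x", "Schlüssel", "maps", "SOM"]
tags = clean_tags(raw)
# tags == ["cluster", "map", "som"]
```

Lowercasing is applied first here so that the stop-word test also catches capitalized forms such as "The"; the slide lists it later, and the order used in the original experiments is not known.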

Experimental Results: Parameters for SOM training
(Columns on the slide: Parameter, Posts, Web pages)
- Size of vocabulary / dimension of vectors: [values missing]
- Number of training data: 16194
- Size of map: 25 × 25
- Learning rate: 0.4
- Maximal training epoch count: [values missing]

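Experimental Results: Summary of PCM and TCM
(Columns on the slide: PCM, TCM. Where digits are run together below, the split between the two columns cannot be recovered from the transcript.)
- Number of neurons (clusters): 625
- Number of unlabeled neurons: 4417
- Average cluster size: [values missing]
- Maximum cluster size: 7345
- Minimum cluster size (excluding unlabeled neurons): 312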

Experimental Results
- For each page we recommended a ranked set of 10 tags.
- The recommended tags were then compared with the original tags.
- We used the F1-measure to compare with the results of RSDC 2008.
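As a reminder of how the F1-measure is computed for one page, here is a small sketch. It scores a single recommendation list against the original tags; how the per-page scores were aggregated for the RSDC 2008 comparison is not stated in the slides, so that part is left out. The function name and sample tags are hypothetical, and the recommended list is assumed to contain no duplicates.

```python
def f1_at_k(recommended, original, k=10):
    """F1 of the top-k recommended tags against the original tag set."""
    top = recommended[:k]
    truth = set(original)
    if not top or not truth:
        return 0.0
    hits = sum(1 for tag in top if tag in truth)
    if hits == 0:
        return 0.0
    precision = hits / len(top)    # share of recommended tags that are right
    recall = hits / len(truth)     # share of original tags that were found
    return 2 * precision * recall / (precision + recall)


score = f1_at_k(["som", "clustering", "web", "mining"],
                ["som", "mining", "text"])
# precision = 2/4, recall = 2/3, so score = 4/7 (about 0.571)
```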

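Experimental Results: Evaluation result
(Columns on the slide: Method, F1. The rank numbers and F1 values are missing from the transcript.)
- RSDC 2008 Rank [missing]
- RSDC 2008 Rank [missing]
- This work
- RSDC 2008 Rank [missing]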

Conclusions
- A novel scheme for tag recommendation based on text mining.
- Relatedness between Web pages and tags was discovered from the clustering results of the self-organizing map.
- The scheme uses only the content of Web pages rather than user behavior.

Thank you!