Mining Tag Semantics for Social Tag Recommendation Hsin-Chang Yang Department of Information Management National University of Kaohsiung.


1 Mining Tag Semantics for Social Tag Recommendation Hsin-Chang Yang Department of Information Management National University of Kaohsiung

2 Outline: Introduction; Text Mining by SOM; Tag Recommendation Process; Experimental Results; Conclusions

3 Social Bookmarking: Why? Social bookmarking services (also known as folksonomies) are gaining popularity because they offer the following benefits: they reduce the effort of Web page annotation; they improve retrieval precision; they simplify Web page classification.

4 How does a folksonomy work? Simply: a user (u_i) annotates a Web page (o_j) with a set of tags, called a post (T_ij). A folksonomy is generally represented as a set of tuples (u_i, o_j, T_ij). [Figure: a user u_i looks at a page o_j and says "interesting… Let me add some tags."; the post T_ij is attached to the page. Labels shown: GrC2011 program, Granular Computing.]
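The tuple representation on this slide can be sketched as a small data structure. This is only an illustration: the class name `Annotation`, the helper `tags_by_page`, and the sample data are hypothetical and do not come from the slides.

```python
from collections import defaultdict
from typing import FrozenSet, NamedTuple


class Annotation(NamedTuple):
    """One folksonomy tuple (u_i, o_j, T_ij). Names are illustrative."""
    user: str             # u_i
    page: str             # o_j
    tags: FrozenSet[str]  # T_ij, the post


def tags_by_page(folksonomy):
    """Collect every tag attached to each page, across all users."""
    merged = defaultdict(set)
    for annotation in folksonomy:
        merged[annotation.page] |= annotation.tags
    return dict(merged)


# Hypothetical sample data, not taken from the dataset in the talk.
folksonomy = [
    Annotation("u1", "o1", frozenset({"granular", "computing"})),
    Annotation("u2", "o1", frozenset({"conference"})),
    Annotation("u1", "o2", frozenset({"som"})),
]
```

Keeping the post as a frozen set reflects that a post is an unordered collection of tags added at once.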

5 Characteristics of Folksonomy: collaboration; semantic relatedness helps improve retrieval precision; social tagging is not a trivial task.

6 Tag Recommendation: the mechanism of suggesting proper tags to ordinary users when they try to add tags to a Web page. It saves users the effort of selecting tags from scratch and constrains the formulation of tags. An automatic tag recommendation process is thus beneficial for social bookmarking services as well as search engines.

7 Outline: Introduction; Text Mining by SOM; Tag Recommendation Process; Experimental Results; Conclusions

8 Text Mining by SOM [Flowchart: training Web pages and training posts → Preprocessing → Web page vectors and post vectors → SOM training → synaptic weight vectors → Labeling → page clusters and tag clusters (page associations, tag associations) → Association discovery → page/tag associations.]

9 Preprocessing: a bag-of-words approach is used to describe pages and posts. A post is the collection of tags annotated to a page at once. Web page P_i is transformed into a binary vector, also written P_i. T_i, the post of P_i, is likewise transformed into a binary vector T_i.
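The binary bag-of-words encoding can be sketched as follows. The function names and the tiny example are hypothetical; the slides do not say how the vocabulary is ordered, so sorting is assumed here purely to make the output deterministic.

```python
def build_vocabulary(documents):
    """Map each distinct term to a vector position (sorted order is an assumption)."""
    terms = sorted({term for doc in documents for term in doc})
    return {term: index for index, term in enumerate(terms)}


def to_binary_vector(terms, vocabulary):
    """Return a 0/1 vector: 1 where the vocabulary term occurs in `terms`."""
    vector = [0] * len(vocabulary)
    for term in terms:
        index = vocabulary.get(term)
        if index is not None:  # terms outside the vocabulary are ignored
            vector[index] = 1
    return vector


# Hypothetical posts, each a collection of tags.
posts = [["som", "text"], ["tag", "som"]]
vocabulary = build_vocabulary(posts)
post_vectors = [to_binary_vector(post, vocabulary) for post in posts]
```

The same two functions apply to pages, with page words in place of tags, which gives one vocabulary (and one vector dimension) per map.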

10 SOM Training: the page vectors P_i and the post vectors T_i were each used to train a self-organizing map, separately. Two maps, M_P and M_T, were obtained after training.
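A minimal self-organizing map trainer, in plain Python, might look like the sketch below. Everything beyond the basic SOM update is an assumption: the slides do not state the neighborhood function, the decay schedule, the distance measure, or the initialization, so a Gaussian neighborhood, linear decay, squared Euclidean distance, and random initial weights are used here. The default learning rate of 0.4 matches the parameter table later in the talk.

```python
import math
import random


def best_matching_unit(weights, vec):
    """Grid coordinates (row, col) of the neuron closest to vec."""
    best, best_dist = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((a - b) ** 2 for a, b in zip(w, vec))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_som(vectors, rows, cols, epochs, learning_rate=0.4, seed=0):
    """Train a rows x cols map; weights[r][c] is that neuron's weight vector."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    initial_radius = max(rows, cols) / 2.0
    for epoch in range(epochs):
        # Assumed schedule: rate and radius both shrink linearly per epoch.
        decay = 1.0 - epoch / epochs
        rate = learning_rate * decay
        radius = max(initial_radius * decay, 0.5)
        for vec in vectors:
            br, bc = best_matching_unit(weights, vec)
            for r in range(rows):
                for c in range(cols):
                    grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                    influence = math.exp(-grid_dist2 / (2 * radius ** 2))
                    w = weights[r][c]
                    for d in range(dim):
                        # Pull the neuron toward the input vector.
                        w[d] += rate * influence * (vec[d] - w[d])
    return weights
```

Calling `train_som` once on the page vectors and once on the post vectors would yield the two maps M_P and M_T. A pure-Python loop like this is far too slow for a 25 × 25 map over thousands of vectors; it is meant only to show the update rule.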

11 Labeling: we labeled each Web page on M_P by finding its most similar neuron. A page cluster map (PCM) was obtained after all pages had been labeled. The same approach was applied to all posts on M_T, yielding the tag cluster map (TCM). [Figure: a PCM neuron labeled with P_1, P_5, P_65 and a TCM neuron labeled with T_1, T_8.]
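The labeling step can be sketched as assigning each item to its nearest neuron and grouping items per neuron. The slides say only "most similar"; squared Euclidean distance is assumed here, and the function names and sample weights are hypothetical.

```python
from collections import defaultdict


def nearest_neuron(weights, vec):
    """Index of the neuron whose weight vector is closest to vec.

    Squared Euclidean distance is an assumption; the slides do not
    name the similarity measure.
    """
    return min(
        range(len(weights)),
        key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], vec)),
    )


def label_map(weights, vectors):
    """Group item ids by winning neuron; neurons that win nothing stay unlabeled."""
    clusters = defaultdict(list)
    for item_id, vec in vectors.items():
        clusters[nearest_neuron(weights, vec)].append(item_id)
    return dict(clusters)


# Hypothetical trained weights for three neurons (map flattened to a list).
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
pages = {"P1": [1, 0], "P5": [0.9, 0.1], "P65": [0, 1]}
pcm = label_map(weights, pages)
```

In this toy example neuron 2 wins no page, which corresponds to the "unlabeled neurons" counted in the results.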

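12 Association Discovery: finding associations between page clusters and post clusters. We used a voting scheme to find the associations. [Figure: PCM and TCM side by side; page P_i in page cluster P_x and its post T_i in post cluster T_y add +1 to the link between P_x and T_y; a second pair P_j, T_j is also shown.]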

13 Association Discovery: similarity between a page cluster P_x and a post cluster T_y: [equation not preserved in the transcript], where I is the index set operator and C_k,l = 1 if P_k is annotated by T_l, and 0 otherwise. P_x is associated with the post cluster T_y of maximum similarity.
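One plausible reading of the voting scheme is sketched below. It is an assumption, not the formula from the slide (which is not preserved): the similarity of a pair (P_x, T_y) is taken to be the plain count of pages in P_x whose own post falls in T_y, with no normalization, and ties go to the pair seen first. All names are hypothetical.

```python
from collections import Counter


def discover_associations(page_cluster_of, post_cluster_of):
    """Vote for (page cluster, post cluster) pairs and pick the best per page cluster.

    page_cluster_of[i] is the PCM neuron of page P_i;
    post_cluster_of[i] is the TCM neuron of its post T_i.
    """
    votes = Counter()
    for i, px in page_cluster_of.items():
        ty = post_cluster_of.get(i)
        if ty is not None:
            votes[(px, ty)] += 1  # the "+1" vote for this pair

    best = {}
    for (px, ty), count in votes.items():
        # Strict ">" keeps the first-seen pair on ties (an assumption).
        if px not in best or count > votes[(px, best[px])]:
            best[px] = ty
    return votes, best


# Hypothetical labeling results: item index -> neuron index.
page_cluster_of = {1: 0, 5: 0, 65: 0, 8: 3}
post_cluster_of = {1: 7, 5: 7, 65: 2, 8: 2}
votes, associations = discover_associations(page_cluster_of, post_cluster_of)
```

Here page cluster 0 receives two votes for post cluster 7 and one for post cluster 2, so it is associated with post cluster 7.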

14 Outline: Introduction; Text Mining by SOM; Tag Recommendation Process; Experimental Results; Conclusions

15 Architecture of Tag Recommendation [Flowchart: incoming Web page → Preprocessing → incoming page vector → Labeling (against the PCM) → labeled page cluster → Tag Recommendation (using the page/tag associations) → recommended tags.]

16 Tag Recommendation: let P_x be the incoming Web page, and let it be labeled to a page cluster, also written P_x. Let T_x be the tag cluster most related to that page cluster; all tags in T_x are recommended. [Figure: P_x on the PCM linked to T_x on the TCM, whose tags are marked "recommended!".]
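The recommendation step can be sketched as a lookup chain: label the incoming page vector, follow the page/tag association, and return every tag in the associated tag cluster. The data layout, function names, and the use of squared Euclidean distance are assumptions made for this illustration.

```python
def nearest_neuron(weights, vec):
    """Index of the closest neuron (squared Euclidean distance assumed)."""
    return min(
        range(len(weights)),
        key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], vec)),
    )


def recommend_tags(page_vec, pcm_weights, associations, tcm_clusters):
    """Return all tags of the tag cluster associated with the page's cluster.

    associations: page-cluster index -> tag-cluster index.
    tcm_clusters: tag-cluster index -> list of posts (each a set of tags).
    """
    page_cluster = nearest_neuron(pcm_weights, page_vec)
    tag_cluster = associations.get(page_cluster)
    if tag_cluster is None:
        return []  # no association was discovered for this page cluster
    tags = set()
    for post in tcm_clusters.get(tag_cluster, []):
        tags.update(post)
    return sorted(tags)  # alphabetical order, only for a stable output


# Hypothetical inputs.
pcm_weights = [[1.0, 0.0], [0.0, 1.0]]
associations = {0: 4}
tcm_clusters = {4: [{"som", "cluster"}, {"som"}]}
recommended = recommend_tags([0.8, 0.2], pcm_weights, associations, tcm_clusters)
```

This sketch returns the tags unranked; how the talk ranks tags to produce a top-10 list is not described in the slides.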

17 Outline: Introduction; Text Mining by SOM; Tag Recommendation Process; Experimental Results; Conclusions

18 Experimental Results: Dataset. ECML/PKDD Discovery Challenge 2008 (RSDC 2008) tag recommendation dataset: over 132K tags posted by 468 users; 16235 bookmarked items, either Web pages or BibTeX entries. It contains some noisy data: items without much content and items without tags.

19 Experimental Results: Preprocessing. Discard tags that contain non-English characters; remove numeric tags; remove tags that are stop words such as 'for' and 'the'; transform all tags to lowercase; ignore extremely short tags; ignore extremely long tags; stem the remaining tags.
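The tag-cleaning steps listed on this slide can be sketched as a single filter. Several details are assumptions: "non-English characters" is approximated as non-ASCII, the length thresholds and the stop-word list are invented for illustration, and stemming is left as a pluggable function (identity by default) because the slides do not name a stemmer. Lowercasing is done first so that the stop-word check also catches capitalized tags.

```python
STOP_WORDS = {"for", "the", "a", "an", "of", "and"}  # illustrative subset only
MIN_LEN, MAX_LEN = 2, 30  # assumed thresholds; the slides give no numbers


def clean_tags(tags, stem=lambda tag: tag):
    """Apply the filtering steps in turn and stem whatever survives."""
    cleaned = []
    for tag in tags:
        tag = tag.lower()
        if not tag.isascii():       # approximates "non-English characters"
            continue
        if tag.isdigit():           # numeric tags
            continue
        if tag in STOP_WORDS:       # stop words such as 'for' and 'the'
            continue
        if not (MIN_LEN <= len(tag) <= MAX_LEN):  # too short or too long
            continue
        cleaned.append(stem(tag))
    return cleaned


# Hypothetical raw tags.
raw_tags = ["The", "2008", "Web", "データ", "x", "Mining"]
cleaned = clean_tags(raw_tags)
```

A real stemmer can be passed as `stem`; any callable that maps a string to a string works.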

20 Experimental Results: Parameters for SOM training
Parameter                                  Posts     Web pages
Size of vocabulary / dimension of vectors  1320      1715
Number of training data                    16194     16194
Size of map                                25 × 25   25 × 25
Learning rate                              0.4       0.4
Maximal training epoch count               1100      800

21 Experimental Results: Summary of PCM and TCM
                                                    PCM    TCM
Number of neurons (clusters)                        625    625
Number of unlabeled neurons                         44     17
Average cluster size                                27.9   26.6
Maximum cluster size                                73     45
Minimum cluster size (excluding unlabeled neurons)  3      12

22 Experimental Results: for each page we recommended a set of 10 ranked tags. These recommended tags were then compared with the original tags. We used the F1-measure to compare our results with those of RSDC 2008.
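For a single page, comparing the recommended tags with the original tags by F1 can be sketched as below. This shows only the standard precision, recall, and F1 definitions; how the scores are averaged over pages in the RSDC 2008 evaluation is not stated on the slides and is not modeled here. The function name and example tags are hypothetical, and the recommended list is assumed to contain no duplicates.

```python
def precision_recall_f1(recommended, original, k=10):
    """Score the top-k recommended tags against the original tags of one page."""
    top = recommended[:k]
    hits = len(set(top) & set(original))
    precision = hits / len(top) if top else 0.0
    recall = hits / len(original) if original else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 2 of 4 recommended tags match 3 original tags.
precision, recall, f1 = precision_recall_f1(
    ["som", "tag", "web", "mining"], {"som", "mining", "cluster"}
)
```

In this example precision is 2/4, recall is 2/3, and F1 is 4/7.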

23 Experimental Results: Evaluation result
Method            F1
RSDC 2008 Rank 1  0.19325
RSDC 2008 Rank 2  0.18674
This work         0.1446
RSDC 2008 Rank 3  0.0284

24 Conclusions: a novel scheme for tag recommendation based on text mining. Relatedness between Web pages and tags was discovered from the clustering results of self-organizing maps. The scheme uses only the content of Web pages rather than user behavior.

25 Thank you!

