Published by Thomasine Burns, modified over 9 years ago
1
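Tag-based Social Interest Discovery. Presented by yjhuang, 2008.5. Researchers: Xin Li, Lei Guo, Yihong (Eric) Zhao, Yahoo! Inc. The copyright of these slides belongs to their original authors; they are used here for explanation only, and the source is listed at the end.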
2
Outline: Introduction, Data Set, Analysis of Tags, The Architecture, Evaluation
3
Introduction. Social network systems: Del.icio.us, Facebook, MySpace, YouTube. Discovering social interests: the main challenge is that interests are difficult to detect and represent; existing approaches rely on online connections between users.
4
This paper's work: based on user-generated tags, it analyzes real-world traces of tags and web content and develops the Internet Social Interest Discovery system (ISID), which discovers common user interests and clusters users and URLs by topic. The system is then evaluated.
5
Data Set: del.icio.us bookmarks, about 4.3M bookmarks from 0.2M users on 1.4M URLs.
6
Data Collection and Pre-Processing: crawl the URLs and download the URL pages; discard all non-HTML objects; convert page encodings to UTF-8 and remove non-English pages; remove stopwords using a stopword list; stem words with the Porter stemming algorithm. Result: 298,350 distinct tags and 4,072,265 keywords.
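A minimal sketch of the keyword-extraction part of this pipeline, assuming pages are already downloaded, decoded to UTF-8 and filtered to English. The stopword list below is a hypothetical stand-in (the slides do not give the list used), and stemming is a pluggable callable: the default leaves words unchanged, while passing NLTK's `PorterStemmer().stem` would apply Porter stemming.

```python
import re
from collections import Counter
from typing import Callable, Iterable

# Hypothetical stopword list; the slides do not say which list was used.
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is",
                       "for", "on", "with", "this", "that"})


def extract_keywords(text: str,
                     stopwords: Iterable[str] = STOPWORDS,
                     stem: Callable[[str], str] = lambda word: word) -> Counter:
    """Lower-case and tokenize a page, drop stopwords, stem, and count keywords."""
    stop = set(stopwords)
    tokens = re.findall(r"[a-z]+", text.lower())  # crude tokenizer: runs of ASCII letters
    return Counter(stem(tok) for tok in tokens if tok not in stop)


if __name__ == "__main__":
    page = "The resolv.conf file configures the DNS resolver in Linux."
    print(extract_keywords(page).most_common(3))
```

With NLTK installed, `from nltk.stem import PorterStemmer` followed by `extract_keywords(page, stem=PorterStemmer().stem)` plugs in a real Porter stemmer; the regular-expression tokenizer is deliberately simple.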
7
Users, URLs and Tags. Figure 1: Distribution of the number of times each URL was bookmarked in the data set (log-log scale).
8
Users, URLs and Tags. Figure 2: Distribution of user bookmarking activity (log-log scale).
9
Users, URLs and Tags Figure 3: Distribution of tag frequencies
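Distributions like those in Figures 1 to 3 can be computed from raw bookmark records. A minimal sketch, assuming each bookmark is a hypothetical `(user, url, tags)` tuple: count how often each item occurs, then count how many items share each occurrence count. The resulting (count, number of items) pairs are what one would plot on log-log axes.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def frequency_distribution(items: Iterable[Hashable]) -> Dict[int, int]:
    """Map each occurrence count k to the number of distinct items seen exactly k times."""
    per_item = Counter(items)                  # item -> number of occurrences
    histogram = Counter(per_item.values())     # k -> number of items occurring k times
    return dict(sorted(histogram.items()))


# Hypothetical toy bookmarks: (user, url, tags)
bookmarks = [
    ("u1", "http://a.example", ["linux", "dns"]),
    ("u2", "http://a.example", ["linux"]),
    ("u3", "http://a.example", ["dns", "config"]),
    ("u1", "http://b.example", ["python"]),
]

url_dist = frequency_distribution(url for _, url, _ in bookmarks)               # URL popularity
user_dist = frequency_distribution(user for user, _, _ in bookmarks)            # user activity
tag_dist = frequency_distribution(t for _, _, tags in bookmarks for t in tags)  # tag frequency
print(url_dist, user_dist, tag_dist)
```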
10
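Analysis of Tags. Use the vector space model (VSM): each URL is represented by two vectors, one in the space of all tags and one in the space of document keywords. A corpus with $t$ terms and $d$ documents is represented by a $t \times d$ term-document matrix $A = [a_{ij}]$, where $a_{ij}$ is the weight of term $i$ in document $j$.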
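A minimal sketch of building the $t \times d$ term-document matrix, assuming raw counts as weights and a sorted vocabulary as the row order (both choices are illustrative, not taken from the slides). The same function is applied twice to hypothetical toy data, once over tags and once over page keywords, mirroring the two vectors per URL.

```python
from collections import Counter
from typing import List, Tuple


def term_document_matrix(docs: List[List[str]]) -> Tuple[List[str], List[List[float]]]:
    """Build a t x d matrix A where A[i][j] is the raw count of term i in document j."""
    vocab = sorted({term for doc in docs for term in doc})   # the t terms, in a fixed order
    row = {term: i for i, term in enumerate(vocab)}
    A = [[0.0] * len(docs) for _ in vocab]                   # t rows, d columns
    for j, doc in enumerate(docs):
        for term, count in Counter(doc).items():
            A[row[term]][j] = float(count)
    return vocab, A


# Each URL gets two vectors: one over tags, one over page keywords (toy data).
url_tags = [["linux", "dns", "dns"], ["linux"]]
url_keywords = [["resolv", "conf", "nameserver"], ["kernel", "linux"]]
tag_terms, A_tags = term_document_matrix(url_tags)
keyword_terms, A_keywords = term_document_matrix(url_keywords)
```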
11
Weight Measurements: tf-based and tf-idf-based weights.
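The two weighting schemes can be sketched as follows. The tf weight is the raw count; for tf-idf this uses one common variant, $a_{ij} = tf_{ij} \cdot \log(d / df_i)$, where $df_i$ is the number of documents containing term $i$. The slides do not show the exact formula the authors used, so treat this variant as an assumption.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_weights(docs: List[List[str]]) -> List[Dict[str, int]]:
    """tf-based weight: a_ij is the number of times term i occurs in document j."""
    return [dict(Counter(doc)) for doc in docs]


def tfidf_weights(docs: List[List[str]]) -> List[Dict[str, float]]:
    """tf-idf weight, assumed variant: a_ij = tf_ij * log(d / df_i)."""
    d = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # documents containing each term
    return [{term: tf * math.log(d / df[term]) for term, tf in Counter(doc).items()}
            for doc in docs]
```

A term that occurs in every document gets a tf-idf weight of zero, which is why tf-idf ranks distinctive keywords above ubiquitous ones.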
12
An Example of Tags vs. Keywords: a URL about resolv.conf in Linux, bookmarked by users. The table shows its top 10 keywords.
13
The Vocabulary of Tags: compare the vocabulary of tags with that of the keywords in web documents, to check whether the most important words are covered. Figures 4 and 5: coverage of user-generated tags for the tf-ranked (Figure 4) and tf-idf-ranked (Figure 5) keywords of 7,000 random documents.
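A minimal sketch of one plausible coverage measure, assuming coverage means the fraction of a document's top-k keywords (ranked by tf or tf-idf weight) that also appear among its tags. The slides do not give the exact definition, and `tag_coverage` is a hypothetical name; averaging it over the sampled documents for increasing k would give one curve per weighting scheme.

```python
from typing import Dict, Set


def tag_coverage(tags: Set[str], keyword_weights: Dict[str, float], k: int) -> float:
    """Fraction of a document's top-k keywords that users also used as tags."""
    # Rank by descending weight; break ties alphabetically so the result is deterministic.
    top = sorted(keyword_weights, key=lambda w: (-keyword_weights[w], w))[:k]
    if not top:
        return 0.0
    return sum(1 for word in top if word in tags) / len(top)


weights = {"resolv": 5.0, "dns": 4.0, "file": 3.0, "server": 1.0}   # toy keyword weights
print(tag_coverage({"dns", "linux"}, weights, k=2))
```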
16
The Convergence of Tag Selections: measure how tag choices converge across all URLs. X-axis: the popularity of URLs (number of bookmarks); Y-axis: the number of distinct tags.
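The data behind this plot can be sketched as follows, assuming hypothetical `(user, url, tags)` bookmark tuples and averaging the distinct-tag count over all URLs with the same popularity (the averaging is an illustrative choice; the slide may plot individual URLs instead).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Bookmark = Tuple[str, str, List[str]]  # (user, url, tags)


def distinct_tags_by_popularity(bookmarks: Iterable[Bookmark]) -> Dict[int, float]:
    """Map URL popularity (bookmark count) to the mean number of distinct tags of such URLs."""
    popularity: Dict[str, int] = defaultdict(int)
    tags: Dict[str, Set[str]] = defaultdict(set)
    for _user, url, url_tags in bookmarks:
        popularity[url] += 1          # one more bookmark of this URL
        tags[url].update(url_tags)    # union of tags across all users
    grouped: Dict[int, List[int]] = defaultdict(list)
    for url, count in popularity.items():
        grouped[count].append(len(tags[url]))
    return {count: sum(sizes) / len(sizes) for count, sizes in sorted(grouped.items())}
```

If tag selections converge, the distinct-tag count grows much more slowly than popularity.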
17
Tags Matched by Documents: do tags capture the main concepts of documents, i.e. are they matched by the content of the URL? Statistical analysis: the number of occurrences of a tag serves as its weight. Tag match ratio $e(T, U)$, where $T = \{t_i\}$ is the set of tags attached to a given URL $U$: based on the total weight of the tags that also appear in the keyword set of $U$.
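A minimal sketch of the tag match ratio, assuming a tag's weight $w(t_i)$ is the number of times it was attached to $U$ and that the ratio divides the matched weight by the total weight of all tags, i.e. $e(T, U) = \sum_{t_i \in T \cap K_U} w(t_i) / \sum_{t_i \in T} w(t_i)$, where $K_U$ (a symbol introduced here) is $U$'s keyword set. The normalization is an assumption; the slide only names the matched weight.

```python
from typing import Dict, Set


def tag_match_ratio(tag_counts: Dict[str, int], keywords: Set[str]) -> float:
    """e(T, U): weight of tags also found among U's keywords, divided by total tag weight.

    Assumption: a tag's weight is how many times it was attached to U, and the
    ratio is normalized by the total weight of all tags in T.
    """
    total = sum(tag_counts.values())
    if total == 0:
        return 0.0
    matched = sum(w for tag, w in tag_counts.items() if tag in keywords)
    return matched / total
```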
18
Tags Matched by Documents
19
Architecture for Social Interest Discovery: (1) find topics of interest; (2) clustering; (3) indexing.
20
Topic Discovery: find frequent tag patterns for a given set of bookmarks using association rule algorithms (support, implication rules), and identify the frequent tag patterns. Special case: a frequent tag pattern $\{a, b\}$ with $w(\{a,b\}) = w(\{a\}) = w(\{b\})$.
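The slides do not say which association rule algorithm ISID uses, so here is a simplified Apriori-style sketch of frequent tag pattern mining, assuming each post is a set of tags and support is the number of posts containing a pattern. `frequent_tag_patterns` and `min_support` are hypothetical names.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_tag_patterns(posts: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Apriori-style mining: every tag set contained in >= min_support posts, with its support."""

    def keep_frequent(candidates):
        supports = {c: sum(1 for tags in posts if c <= tags) for c in candidates}
        return {c: s for c, s in supports.items() if s >= min_support}

    current = keep_frequent({frozenset([tag]) for tags in posts for tag in tags})
    result = dict(current)
    k = 2
    while current:
        # Join step: unions of two frequent (k-1)-patterns that form a k-pattern.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        current = keep_frequent(candidates)
        result.update(current)
        k += 1
    return result


posts = [{"linux", "dns"}, {"linux", "dns"}, {"linux", "python"}, {"linux", "dns", "config"}]
patterns = frequent_tag_patterns(posts, min_support=2)
```

On this toy data the result is {linux}: 4, {dns}: 3 and {linux, dns}: 3. Reading w as support, w({linux, dns}) = w({dns}) but not w({linux}), so the slide's special case, which requires all three supports to be equal, does not hold here.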
21
Clustering
22
Indexing
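The clustering and indexing steps of the architecture can be sketched as follows, with hypothetical rules since these slides carry only their titles: a bookmark joins a topic's cluster when its tags contain all of the topic's tags (so each topic collects both URLs and users), and the index maps each tag to the topics that contain it. The names `cluster_by_topic` and `build_index` are illustrative.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Bookmark = Tuple[str, str, List[str]]  # (user, url, tags)
Topic = FrozenSet[str]                 # a frequent tag pattern


def cluster_by_topic(bookmarks: Iterable[Bookmark],
                     topics: List[Topic]) -> Dict[Topic, Dict[str, Set[str]]]:
    """Hypothetical rule: a bookmark joins a topic when its tags contain all topic tags."""
    clusters = {topic: {"urls": set(), "users": set()} for topic in topics}
    for user, url, tags in bookmarks:
        tag_set = set(tags)
        for topic in topics:
            if topic <= tag_set:
                clusters[topic]["urls"].add(url)
                clusters[topic]["users"].add(user)
    return clusters


def build_index(clusters: Dict[Topic, Dict[str, Set[str]]]) -> Dict[str, Set[Topic]]:
    """Inverted index: tag -> topics containing it, so a query tag finds its clusters."""
    index: Dict[str, Set[Topic]] = defaultdict(set)
    for topic in clusters:
        for tag in topic:
            index[tag].add(topic)
    return index
```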
23
Evaluation: URL similarity within topics (intra-topic) and across topics (inter-topic), measured as the cosine similarity of tf-idf keyword term vectors and the cosine similarity of tag term vectors. Intra-topic: 500 interest topics, each with more than 30 bookmarked URLs and sharing 5-6 co-occurring tags. Inter-topic: 10,000 topic pairs.
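A minimal sketch of the similarity computation, assuming each URL is a sparse `{term: weight}` vector (tf-idf keywords or tags) and that intra- and inter-topic scores are averages over URL pairs. The averaging scheme and function names are assumptions, not taken from the slides.

```python
import math
from itertools import combinations, product
from typing import Dict, List

Vector = Dict[str, float]  # sparse term -> weight (tf-idf keywords or tags)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def intra_topic_similarity(urls: List[Vector]) -> float:
    """Mean cosine similarity over all pairs of URLs within one topic."""
    pairs = list(combinations(urls, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


def inter_topic_similarity(topic_a: List[Vector], topic_b: List[Vector]) -> float:
    """Mean cosine similarity over all URL pairs drawn from two different topics."""
    pairs = list(product(topic_a, topic_b))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
```

If topics are meaningful, intra-topic scores should be clearly higher than inter-topic scores.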
27
User Interest Coverage: for each user, sort the user's tags by the number of times the user has used them. Top-5: the user's 5 most frequently used tags; Top-10: the user's 10 most frequently used tags; All: all of the user's tags.
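A minimal sketch of the per-user tag ranking behind Top-5, Top-10 and All, with ties broken alphabetically (an illustrative choice). The slide does not define how coverage is scored, so `interest_coverage` is a hypothetical measure: the share of the selected tags that occur in at least one discovered topic, passed in as a set of topic tags.

```python
from collections import Counter
from typing import List, Optional, Set


def top_tags(tag_uses: List[str], k: Optional[int]) -> List[str]:
    """Return the user's k most-used tags (all tags if k is None)."""
    counts = Counter(tag_uses)
    ranked = sorted(counts, key=lambda t: (-counts[t], t))  # most used first
    return ranked if k is None else ranked[:k]


def interest_coverage(user_tags: List[str], topic_tags: Set[str]) -> float:
    """Hypothetical metric: share of the selected tags that occur in some discovered topic."""
    if not user_tags:
        return 0.0
    return sum(tag in topic_tags for tag in user_tags) / len(user_tags)
```

For Top-5 one would call `top_tags(uses, 5)`, for Top-10 `top_tags(uses, 10)`, and for All `top_tags(uses, None)`.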
28
Human Reviews: 4 human editors reviewed 10 topics, using the 20 most frequently bookmarked URLs of each topic, and gave scores from 1 to 5.
29
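Cluster Properties (Added). This slide was added and is not part of the original authors' slides; for the original version, please consult the source.
30
Cluster Properties (Added). This slide was added and is not part of the original authors' slides; for the original version, please consult the source.
31
Cluster Properties (Added). This slide was added and is not part of the original authors' slides; for the original version, please consult the source.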
32
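Conclusion (Added): proposes a tag-based social interest discovery approach; justifies using user-generated tags to represent user interests; implements the system on a social network such as del.icio.us. This slide was added and is not part of the original authors' slides; for the original version, please consult the source.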
33
References: Xin Li, Lei Guo, Yihong Zhao. Tag-based Social Interest Discovery. WWW '08, Yahoo! Inc.
34
Notes
Slide download source: http://fusion.grids.cn/wiki/download/attachments/1313/Tag-based+Socail+Interest+Discovery-by+yjhuang.ppt?version=1
Data set website: http://delicious.com/