Tag-based Social Interest Discovery. By yjhuang, 2008.5. Yahoo! Inc. researchers: Xin Li, Lei Guo, Yihong (Eric) Zhao. These slides belong to their original authors and are used here for explanation only; the source is given at the end.


1 Tag-based Social Interest Discovery. By yjhuang, 2008.5. Yahoo! Inc. researchers: Xin Li, Lei Guo, Yihong (Eric) Zhao. These slides belong to their original authors and are used here for explanation only; the source is given at the end.

2 Outline
- Introduction
- Data Set
- Analysis of Tags
- The Architecture
- Evaluation

3 Introduction
- Social network systems: Del.icio.us, Facebook, MySpace, YouTube
- Discovering social interests
  - Main challenge: difficult to detect and represent
  - Existing approaches: online connections

4 This paper's work
- Based on user-generated tags
- Analyze real-world traces of tags and web content
- Develop the Internet Social Interest Discovery system (ISID)
  - Discover common user interests
  - Cluster users and URLs by topic
- Evaluation

5 Data Set
- Delicious bookmarks: 4.3M bookmarks, 0.2M users, 1.4M URLs

6 Data Collection and Pre-Processing
- Crawl the URLs and download the URL pages
- Discard all non-HTML objects
- Convert encodings to UTF-8 and remove non-English pages
- Remove stopwords using a stopword list
- Apply the Porter stemming algorithm
- Result: 298,350 distinct tags and 4,072,265 keywords
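
The keyword-extraction part of these steps could be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the stopword list, the regex tokenizer, and the `stem` hook are assumptions. The slide names the Porter stemming algorithm, but no implementation is shown, so the default stemmer here simply returns the word unchanged.

```python
import re
from collections import Counter

# Illustrative stopword list; the list actually used is not given in the slides.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "for"}

def extract_keywords(text, stem=lambda word: word):
    """Lowercase, tokenize, drop stopwords, stem, and count keywords.

    `stem` is a hypothetical hook: a Porter stemmer would be plugged in here.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(stem(token) for token in tokens if token not in STOPWORDS)

counts = extract_keywords("The resolver is configured in the resolv.conf file")
```

Passing a real stemmer as `stem` would merge inflected forms (for example "configured" and "configure") into one keyword before counting.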

7 Users, URLs and Tags
- Figure 1: Distribution of how often the URLs were bookmarked in our data set (log-log scale)

8 Users, URLs and Tags
- Figure 2: Distribution of the bookmarking activity (log-log scale)

9 Users, URLs and Tags
- Figure 3: Distribution of tag frequencies

10 Analysis of Tags
- Use the vector space model (VSM)
- Each URL has two vectors: one in the space of all tags, one in the space of document keywords
- A corpus with t terms and d documents is represented by a term-document matrix A with t rows and d columns

11 Weight Measurements
- Tf-based
- Tf-idf-based
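
The formulas for the two weightings did not survive in this transcript. As a hedged sketch, the code below uses one common textbook variant: tf is the raw count of a term in a document, and tf-idf multiplies it by log(d / df), where d is the number of documents and df the number of documents containing the term. The exact normalization used in the paper may differ; the function names are illustrative.

```python
import math
from collections import Counter

def tf_weights(doc_terms):
    """Term frequency: raw occurrence count of each term in one document."""
    return Counter(doc_terms)

def tfidf_weights(docs):
    """Return one {term: tf * log(d / df)} dict per document.

    d is the number of documents and df the number of documents that
    contain the term. This variant is assumed for illustration only.
    """
    d = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: tf * math.log(d / df[term]) for term, tf in Counter(doc).items()}
        for doc in docs
    ]

docs = [["linux", "dns", "dns"], ["linux", "kernel"]]
weights = tfidf_weights(docs)
```

In this toy corpus "linux" appears in every document, so its tf-idf weight is zero, while "dns" is both frequent in the first document and rare across the corpus.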

12 An Example of Tags vs. Keywords
- A URL bookmarked by users, about resolv.conf in Linux
- The table shows the top 10 keywords

13 The Vocabulary of Tags
- Compare the vocabulary of tags with that of keywords in web documents
- Check whether the most important words are covered
- Figure 4 (5): The coverage of user-generated tags for the tf (tf-idf) keywords of 7,000 random documents
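
One plausible way to measure such coverage is sketched below: take a document's top-k keywords by weight and report the fraction that also occur in the tag vocabulary. The exact definition behind Figures 4 and 5 is not given on the slide, so this formula, the function name, and the sample data are assumptions.

```python
def tag_coverage(keyword_weights, tag_vocabulary, k=10):
    """Fraction of a document's top-k keywords that also occur as tags."""
    top = sorted(keyword_weights, key=keyword_weights.get, reverse=True)[:k]
    if not top:
        return 0.0
    return sum(1 for word in top if word in tag_vocabulary) / len(top)

# Hypothetical keyword weights for one document and a tiny tag vocabulary.
keywords = {"linux": 5.0, "dns": 4.0, "resolv": 3.0, "nameserver": 1.0}
tags = {"linux", "dns", "networking"}
coverage = tag_coverage(keywords, tags, k=3)  # 2 of the top 3 keywords are tags
```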

14

15

16 The Convergence of Tag Selections
- Measure the convergence of tags for all URLs
- X-axis: the popularity of URLs
- Y-axis: the number of distinct tags
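
The two plotted quantities can be computed per URL as in the sketch below. It assumes bookmarks are available as (user, url, tags) tuples and takes popularity to mean the number of bookmarks of a URL; both the record layout and that reading of "popularity" are assumptions.

```python
from collections import defaultdict

def tag_convergence(bookmarks):
    """Map each URL to (popularity, number of distinct tags).

    `bookmarks` is an iterable of (user, url, tags) tuples; this record
    layout is assumed for illustration.
    """
    popularity = defaultdict(int)
    distinct = defaultdict(set)
    for _user, url, tags in bookmarks:
        popularity[url] += 1
        distinct[url].update(tags)
    return {url: (popularity[url], len(distinct[url])) for url in popularity}

# Hypothetical bookmarks: three users tag the same URL with overlapping tags.
bookmarks = [
    ("u1", "http://example.org/a", ["linux", "dns"]),
    ("u2", "http://example.org/a", ["linux", "howto"]),
    ("u3", "http://example.org/a", ["linux"]),
    ("u1", "http://example.org/b", ["python"]),
]
points = tag_convergence(bookmarks)
```

If tag selections converge, the distinct-tag count grows much more slowly than popularity as more users bookmark the same URL.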

17 Tags Matched by Documents
- Do tags capture the main concepts of documents? Are they matched by the content of the URL?
- Statistical analysis
  - Number of occurrences -> weight
  - Tag match ratio e(T, U)
  - T = {ti}: the set of tags attached to a given URL U
  - The total weight of the tags that also appear in the keyword set of U
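
A minimal sketch of the tag match ratio follows. The slide gives only the quantity "total weight of the tags that also appear in the keyword set of U"; dividing it by the total weight of all tags in T is an assumption made here so that the result is a ratio between 0 and 1.

```python
def tag_match_ratio(tag_weights, url_keywords):
    """Weight of tags found among the URL's keywords over total tag weight.

    `tag_weights` maps each tag of a URL to its number of occurrences.
    The denominator (total tag weight) is an assumption; see the lead-in.
    """
    matched = sum(w for tag, w in tag_weights.items() if tag in url_keywords)
    total = sum(tag_weights.values())
    return matched / total if total else 0.0

# Hypothetical URL: "toread" is a personal tag that never appears on the page.
tag_weights = {"linux": 6, "dns": 3, "toread": 1}
url_keywords = {"linux", "dns", "resolv"}
ratio = tag_match_ratio(tag_weights, url_keywords)  # 9 of 10 weight units match
```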

18 Tags Matched by Documents

19 Architecture for Social Interest Discovery
1. Find topics of interest
2. Clustering
3. Indexing

20 Topic Discovery
- Find frequent tag patterns for a given set
- Association rule algorithms
  - Support
  - Implication rules
  - Identify the frequent tag patterns
- A frequent tag pattern {a,b}
  - If w({a,b}) = w({a}) = w({b})
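
The notion of support for a tag pattern can be illustrated with the sketch below. It is a brute-force enumeration, not an association rule algorithm such as the one the paper uses: every tag subset up to `max_size` is counted per post, which is only practical for small inputs. The function name, the `max_size` cap, and the sample posts are assumptions.

```python
from collections import Counter
from itertools import combinations

def frequent_tag_patterns(posts, min_support, max_size=3):
    """Return {pattern: support} for tag sets found in >= min_support posts.

    `posts` is a list of tag collections, one per bookmark. The support of a
    pattern is the number of posts that contain all of its tags.
    """
    support = Counter()
    for tags in posts:
        unique = sorted(set(tags))
        for size in range(1, max_size + 1):
            for pattern in combinations(unique, size):
                support[frozenset(pattern)] += 1
    return {p: s for p, s in support.items() if s >= min_support}

posts = [{"linux", "dns"}, {"linux", "dns", "howto"}, {"linux"}]
patterns = frequent_tag_patterns(posts, min_support=2)
```

Here {linux, dns} has support 2, the same as {dns} alone, which means "dns" never occurs without "linux" in this toy data.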

21 Clustering

22 Indexing

23 Evaluation
- The URL similarity of intra- and inter-topics
  - Cosine similarity of tf-idf keyword term vectors
  - Cosine similarity of tag term vectors
- 500 interest topics
  - More than 30 bookmarked URLs
  - Share 5-6 co-occurring tags
- Inter-: 10,000 topic pairs
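
Cosine similarity between two term vectors can be computed as in the sketch below, which stores each vector as a sparse {term: weight} dict. The representation and the sample vectors are illustrative choices, not taken from the paper.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical tag vectors of two URLs that share one of their two tags.
url_a = {"linux": 1.0, "dns": 1.0}
url_b = {"linux": 1.0, "kernel": 1.0}
similarity = cosine_similarity(url_a, url_b)  # approximately 0.5
```

Averaging this value over URL pairs inside one topic, and over URL pairs drawn from two different topics, gives the intra- and inter-topic figures being compared.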

24

25

26

27 User Interest Coverage
- For each user, sort the tags by the number of times the user has used them
- Top-5: the 5 most-used tags of each user
- Top-10:
- All:
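
Ranking a user's tags by usage count can be sketched as follows. The slide does not spell out how coverage is then computed from the Top-5, Top-10, and All lists, so this example stops at the ranking step; the function name and input layout are assumptions.

```python
from collections import Counter

def top_tags(user_tag_uses, k=None):
    """Return a user's tags sorted by how often the user applied them.

    `user_tag_uses` lists one entry per tag use; k=None returns all tags.
    """
    counts = Counter(user_tag_uses)
    return [tag for tag, _count in counts.most_common(k)]

# Hypothetical tagging history of one user.
uses = ["linux", "dns", "linux", "python", "dns", "linux"]
top_2 = top_tags(uses, k=2)
all_tags = top_tags(uses)
```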

28 Human Reviews
- 4 human editors
- 10 topics
- 20 most frequent URLs for each topic
- Scores: 1-5

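29 Cluster Properties (Add). This page is not from the original authors' slides; for the original version, please refer to the source.

30 Cluster Properties (Add). This page is not from the original authors' slides; for the original version, please refer to the source.

31 Cluster Properties (Add). This page is not from the original authors' slides; for the original version, please refer to the source.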

32 Conclusion (Add)
- Propose a tag-based social interest discovery approach
- Justify the use of user-generated tags to represent user interests
- Implement a system for a social network such as Delicious
This page is not from the original authors' slides; for the original version, please refer to the source.

33 References
- Xin Li, Lei Guo, Yihong Zhao, Tag-based Social Interest Discovery, WWW 2008, Yahoo! Inc.

34 Notes
- Slide download source: http://fusion.grids.cn/wiki/download/attachments/1313/Tag-based+Socail+Interest+Discovery-by+yjhuang.ppt?version=1
- Data set web page: http://delicious.com/

