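Tag-based Social Interest Discovery. By yjhuang, 2008.5. Yahoo! Inc. researchers: Xin Li, Lei Guo, Yihong (Eric) Zhao. The copyright of these slides belongs to the original authors; they are used here for explanation only. The source is given at the end.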

Presentation transcript:

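Tag-based Social Interest Discovery. By yjhuang. Yahoo! Inc. researchers: Xin Li, Lei Guo, Yihong (Eric) Zhao. The copyright of these slides belongs to the original authors; they are used here for explanation only. The source is given at the end.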

Outline
- Introduction
- Data Set
- Analysis of Tags
- The Architecture
- Evaluation

Introduction
- Social network systems: Del.icio.us, Facebook, MySpace, YouTube
- Discovering social interests
  - Main challenge: user interests are difficult to detect and represent
  - Existing approaches: based on users' online connections

This paper's work
- Based on user-generated tags
- Analyze real-world traces of tags and web content
- Develop the Internet Social Interest Discovery system (ISID)
  - Discover the common user interests
  - Cluster users and URLs by topic
- Evaluation

Data Set
- Delicious bookmarks: 4.3M bookmarks, 0.2M users, 1.4M URLs

Data Collection and Pre-Processing
- Crawl the URLs and download the pages
- Discard all non-HTML objects
- Convert the encoding to UTF-8 and remove non-English pages
- Filter with a stopword list
- Stem with the Porter stemming algorithm
- Result: 298,350 distinct tags and 4,072,265 keywords
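
The text-cleaning steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the stopword list is a tiny made-up sample, and `naive_stem` is a crude placeholder standing in for the Porter stemming algorithm, which the slide names but which is not reimplemented here.

```python
import re

# Tiny illustrative stopword list (an assumption; the slides do not give the real one).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "for"}


def naive_stem(word):
    """Crude suffix stripper, used only as a placeholder for Porter stemming."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def preprocess(text, stem=lambda w: w):
    """Lowercase, tokenize, drop stopwords, then stem each remaining token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(tok) for tok in tokens if tok not in STOPWORDS]


keywords = preprocess("Configuring the resolvers in Linux", stem=naive_stem)
print(keywords)
```

In practice the `stem` argument would be replaced by a real Porter stemmer from a text-processing library.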

Users, URLs and Tags
- Figure 1: Distribution of how often each URL was bookmarked in the data set (log-log scale)

Users, URLs and Tags
- Figure 2: Distribution of the bookmarking activity (log-log scale)

Users, URLs and Tags
- Figure 3: Distribution of tag frequencies

Analysis of Tags
- Use the vector space model (VSM)
- Each URL is represented by two vectors: one in the space of all tags, one in the space of document keywords
- A corpus with t terms and d documents is described by a t × d term-document matrix A
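
As a rough illustration of the term-document matrix mentioned on this slide, the sketch below builds a count matrix with one row per term and one column per document. The function name and the toy documents are hypothetical; the slides do not say how the matrix entries were stored or weighted at this stage.

```python
def term_document_matrix(docs):
    """Return (terms, A) where A[i][j] counts occurrences of term i in document j."""
    terms = sorted({term for doc in docs for term in doc})
    index = {term: i for i, term in enumerate(terms)}
    matrix = [[0] * len(docs) for _ in terms]
    for j, doc in enumerate(docs):
        for term in doc:
            matrix[index[term]][j] += 1
    return terms, matrix


# Two toy "documents", already tokenized (made-up data).
docs = [["linux", "dns", "linux"], ["dns", "howto"]]
terms, A = term_document_matrix(docs)
for term, row in zip(terms, A):
    print(term, row)
```

The same function can be applied twice per URL collection, once to tag lists and once to keyword lists, to obtain the two vector spaces the slide describes.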

Weight Measurements
- tf-based
- tf-idf-based
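
The two weighting schemes can be sketched as below. The slide does not give the exact formulas, so this assumes one common textbook variant: tf as the term count divided by document length, and idf as log(N / df). Other variants (raw counts, smoothed idf) would change the numbers but not the idea.

```python
import math
from collections import Counter


def tf_weights(doc):
    """Term frequency: count of each term divided by the document length."""
    counts = Counter(doc)
    total = len(doc)
    return {term: count / total for term, count in counts.items()}


def tfidf_weights(docs):
    """tf-idf per document, with idf = log(N / df) (an assumed variant)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = tf_weights(doc)
        weighted.append({t: w * math.log(n_docs / df[t]) for t, w in tf.items()})
    return weighted


docs = [["linux", "dns", "linux"], ["dns", "howto"]]
print(tf_weights(docs[0]))
print(tfidf_weights(docs))
```

With this variant, a term that occurs in every document gets a tf-idf weight of zero, which is why tf-idf tends to favor words that distinguish a page from the rest of the corpus.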

An Example of Tags vs. Keywords
- A URL bookmarked by users, about resolv.conf in Linux
- The table shows the top 10 keywords

The Vocabulary of Tags
- Compare the vocabulary of tags with that of keywords in web documents
- Check whether the most important words are covered by tags
- Figures 4 and 5: coverage by user-generated tags of the tf (Figure 4) and tf-idf (Figure 5) keywords of 7,000 random documents
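
One way to read the coverage measurement is sketched below: take a document's top-k keywords by weight and compute the fraction that also occur in the tag vocabulary. The function names, the choice of k, and the toy weights are assumptions for illustration; the slide does not define the measurement in this detail.

```python
def top_k_keywords(weights, k):
    """Return the k highest-weighted keywords (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def tag_coverage(keyword_weights, tag_vocabulary, k):
    """Fraction of a document's top-k keywords that also occur in the tag vocabulary."""
    top = top_k_keywords(keyword_weights, k)
    if not top:
        return 0.0
    return sum(1 for term in top if term in tag_vocabulary) / len(top)


# Made-up keyword weights for one document and a made-up tag vocabulary.
weights = {"linux": 0.5, "dns": 0.3, "resolv": 0.15, "nameserver": 0.05}
tags = {"linux", "dns", "network"}
print(tag_coverage(weights, tags, k=3))
```

Passing tf weights or tf-idf weights as `keyword_weights` gives the two variants that Figures 4 and 5 compare.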

The Convergence of Tag Selections
- Measure the convergence of tags for all URLs
- X-axis: the popularity of URLs
- Y-axis: the number of distinct tags

Tags Matched by Documents
- Do tags capture the main concepts of documents? Are they matched by the content of the URL?
- Statistical analysis
  - Number of occurrences -> weight
  - Tag match ratio e(T, U)
  - T = {t_i}: the set of tags attached to a given URL U
  - e(T, U) is based on the total weight of the tags that also appear in the keyword set of U
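
A small sketch of the tag match ratio follows. The slide only mentions the total weight of the tags that also appear in the keyword set of U; this sketch assumes that, being a ratio, it is normalized by the total weight of all tags in T. That denominator, the function name, and the sample counts are assumptions, not taken from the slides.

```python
def tag_match_ratio(tag_weights, keywords):
    """Weight of tags found in the page keywords, over the weight of all tags.

    tag_weights maps each tag of a URL to its weight (number of occurrences);
    keywords is the keyword set of the same URL. The normalization by the
    total tag weight is an assumption.
    """
    total = sum(tag_weights.values())
    if total == 0:
        return 0.0
    matched = sum(w for tag, w in tag_weights.items() if tag in keywords)
    return matched / total


# Made-up example: how often each tag was attached to one URL.
tag_weights = {"linux": 6, "dns": 3, "sysadmin": 1}
keywords = {"linux", "dns", "resolv", "nameserver"}
print(tag_match_ratio(tag_weights, keywords))
```

Under this reading, a ratio close to 1 means that nearly all of the tagging weight falls on words that also occur in the page itself.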

Tags Matched by Documents

Architecture for Social Interest Discovery
1. Find topics of interest
2. Clustering
3. Indexing

Topic Discovery
- Find frequent tag patterns for a given set
- Association rule algorithms
  - Support
  - Implication rules
- Identify the frequent tag patterns
  - A frequent tag pattern {a, b}: if w({a, b}) = w({a}) = w({b})
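
The support-counting part of topic discovery can be illustrated with the brute-force sketch below. It is not the association rule algorithm used in ISID: it simply enumerates every tag combination up to a small size and keeps those whose support w reaches a threshold, which is only practical for toy data. The function name, `min_support`, `max_size`, and the sample posts are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_tag_patterns(posts, min_support, max_size=3):
    """Count the support of every tag combination and keep the frequent ones.

    posts is a list of tag lists, one per bookmark post. The support w(pattern)
    is the number of posts whose tags contain every tag of the pattern.
    """
    support = Counter()
    for tags in posts:
        unique = sorted(set(tags))
        for size in range(1, min(max_size, len(unique)) + 1):
            for pattern in combinations(unique, size):
                support[pattern] += 1
    return {p: s for p, s in support.items() if s >= min_support}


posts = [
    ["linux", "dns"],
    ["linux", "dns", "howto"],
    ["linux", "dns"],
    ["python"],
]
for pattern, w in sorted(frequent_tag_patterns(posts, min_support=3).items()):
    print(pattern, w)
```

In this toy data, w({dns, linux}) = w({dns}) = w({linux}) = 3, which is the situation the last bullet of the slide refers to: the two tags never occur without each other.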

Clustering

Indexing

Evaluation
- URL similarity of intra- and inter-topics
  - Cosine similarity of tf-idf keyword term vectors
  - Cosine similarity of tag term vectors
- Intra-topic: 500 interest topics
  - More than 30 bookmarked URLs each
  - Sharing 5-6 co-occurring tags
- Inter-topic: 10,000 topic pairs
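
The similarity comparison can be sketched as follows, using sparse term vectors stored as dictionaries. Cosine similarity is named on the slide; averaging it over all URL pairs within a topic (intra) and over all cross pairs between two topics (inter) is an assumption about how the numbers were aggregated. The vectors are made-up examples.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {term: weight} dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_intra_similarity(vectors):
    """Average cosine similarity over all pairs of URLs inside one topic."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def mean_inter_similarity(topic_a, topic_b):
    """Average cosine similarity over all URL pairs drawn from two different topics."""
    pairs = [(a, b) for a in topic_a for b in topic_b]
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


topic_linux = [{"linux": 1.0, "dns": 1.0}, {"linux": 1.0}]
topic_python = [{"python": 1.0}]
print(mean_intra_similarity(topic_linux))
print(mean_inter_similarity(topic_linux, topic_python))
```

A clustering is doing its job when the intra-topic average is clearly higher than the inter-topic average, for both the keyword vectors and the tag vectors.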

User Interest Coverage
- For each user, sort the user's tags by the number of times the user has used them
- Top-5: the 5 most-used tags of each user
- Top-10: the 10 most-used tags of each user
- All: all tags of each user
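
The ranking step on this slide can be sketched with a counter over a user's tag uses. The `coverage` helper is a hypothetical reading of "user interest coverage" (the fraction of a user's selected tags that appear in the discovered topics); the slides do not define the measure, so treat it as an assumption.

```python
from collections import Counter


def top_tags(tag_uses, k=None):
    """Return a user's tags sorted by usage count, optionally cut to the top k."""
    counts = Counter(tag_uses)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    if k is not None:
        ranked = ranked[:k]
    return [tag for tag, _ in ranked]


def coverage(user_tags, topic_tags):
    """Fraction of the given user tags that occur in the discovered topic tags (assumed definition)."""
    if not user_tags:
        return 0.0
    return sum(1 for tag in user_tags if tag in topic_tags) / len(user_tags)


# Made-up tag history of one user, and a made-up set of tags used in discovered topics.
uses = ["linux", "dns", "linux", "python", "linux", "dns"]
topic_tags = {"linux", "dns"}
print(top_tags(uses, k=2), coverage(top_tags(uses, k=2), topic_tags))
print(top_tags(uses), coverage(top_tags(uses), topic_tags))
```

Calling `top_tags` with `k=5`, `k=10`, and `k=None` corresponds to the Top-5, Top-10, and All settings.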

Human Reviews
- 4 human editors
- 10 topics
- The 20 most frequent URLs for each topic
- Scores: 1-5

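Cluster Properties (added). This page is not part of the original authors' slides; for the original version, see the source.

Cluster Properties (added). This page is not part of the original authors' slides; for the original version, see the source.

Cluster Properties (added). This page is not part of the original authors' slides; for the original version, see the source.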

Conclusion (added)
- Propose a tag-based social interest discovery approach
- Justify the use of user-generated tags to represent user interests
- Implement a system for social networks such as Delicious
This page is not part of the original authors' slides; for the original version, see the source.

References
- Xin Li, Lei Guo, Yihong Zhao. Tag-based Social Interest Discovery. WWW 2008. Yahoo! Inc.

Notes
- Slide download source: achments/1313/Tag- based+Socail+Interest+Discovery- by+yjhuang.ppt?version=1
- Data set web page