He Xiangnan (PhD student) 11/2/2012 Research Updates.


Research Topic General topic: leveraging user-generated content (UGC) in Web 2.0 to improve IR-related tasks. Current task:  Leveraging user comments to enable popularity-aware ranking of items in Web 2.0

Popularity-aware rank Rank items, based on their current state, so that the ranking reflects their popularity in the future. Motivation of popularity-aware rank:  Popularity is unequally distributed  Popularity has large temporal dynamics A ranking of items that can forecast their future popularity will improve the user experience, especially for time-sensitive queries. Examples...

Examples(I) - Searched “nba” on YouTube on the night of 6/22/2012 (the NBA Finals game was played that morning) - None of the top results are about the Miami Heat's championship

Examples(II) - Searched “nobel prize China” on 10/12/2012 using Google domain search (YouTube) - None of the top results are about Mo Yan’s Nobel Prize in Literature

Challenges Intuitive way:  Use the visiting histories of items, treat them as time series and perform prediction Difficulties:  Visiting histories are difficult (expensive) to obtain and maintain  Traditional time-series prediction approaches tend to fail when items are experiencing bursts My proposal:  Leverage the user comments

Observation in YouTube Observation: the comment history is highly correlated with the view history

Pre-Analysis(I) YouTube dataset (14,509 videos from ten queries). Pearson correlation between comment history and view history: more than 80% of the videos have a correlation above 0.5. Conclusion: the comment history is highly correlated with the view history!
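
The correlation check on this slide can be sketched as follows. This is a minimal illustration, not the code used for the reported numbers: the function names `pearson` and `share_above` and the daily counts are made up, and treating a constant series as correlation 0 is an assumption.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)

def share_above(histories, threshold=0.5):
    """Fraction of (comment_history, view_history) pairs with correlation above threshold."""
    scores = [pearson(comments, views) for comments, views in histories]
    return sum(score > threshold for score in scores) / len(scores)

# Made-up daily counts for two hypothetical videos.
videos = [
    ([1, 3, 8, 4, 2], [100, 320, 900, 410, 150]),  # comments track views
    ([5, 1, 4, 2, 3], [100, 200, 300, 400, 500]),  # comments unrelated to views
]
print(share_above(videos))
```

On the real dataset, `histories` would hold one (comment history, view history) pair per video, and the slide's claim corresponds to `share_above(histories) > 0.8`.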

Pre-Analysis(II) We have shown the tight correlation between comments and views. A natural question: how well do the past comments reflect the future comments? Autocorrelation of a series:  Measures the correlation of a time series with itself at different distances apart (lags)  r_k is the correlation of the series {x_1, ..., x_{n-k}} and {x_{k+1}, ..., x_n}
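
A minimal sketch of the lag-k autocorrelation as defined on this slide, that is, the Pearson correlation between the series with its last k points dropped and the series with its first k points dropped. The function name and the example series are illustrative assumptions.

```python
import math

def autocorrelation(series, k):
    """Correlation of series[0 : n-k] with series[k : n] (lag-k autocorrelation)."""
    n = len(series)
    if not 1 <= k <= n - 2:
        raise ValueError("lag must leave at least two overlapping points")
    head = series[: n - k]   # x_1 .. x_{n-k}
    tail = series[k:]        # x_{k+1} .. x_n
    m = len(head)
    mean_h = sum(head) / m
    mean_t = sum(tail) / m
    cov = sum((h - mean_h) * (t - mean_t) for h, t in zip(head, tail))
    var_h = sum((h - mean_h) ** 2 for h in head)
    var_t = sum((t - mean_t) ** 2 for t in tail)
    if var_h == 0 or var_t == 0:
        return 0.0  # assumption: a constant segment is treated as uncorrelated
    return cov / math.sqrt(var_h * var_t)

# Made-up daily comment counts for one hypothetical item with a single burst.
daily_comments = [1, 2, 4, 9, 15, 11, 6, 3, 2, 1]
for lag in (1, 2, 3):
    print(lag, round(autocorrelation(daily_comments, lag), 3))
```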

Results of Autocorrelation of Comment Series The series exhibits a short-term correlation (r_1 is large and r_k decreases very fast). Conclusion: the recent comments reflect most of the future, and their predictive ability decreases with time.

Intuitions The more comments an item has, the more popular it is.  Each comment contributes to the item’s popularity (or importance) Different users' commenting behavior has different influence on the item’s popularity.  Social interfaces in Web 2.0 systems.  The more active the user is, the more influential the user is.  The more popular the commented items are, the more influential the user is.

Method Overview

User-Item Temporal Bipartite Graph Model The edge weight (decay function with time): The weight matrix of the graph:
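
The edge-weight and weight-matrix formulas are not preserved in this transcript, so the following is only a sketch of how a time-decayed user-item weight matrix could be built. The exponential half-life decay, the 7-day default, the function names and the example comments are all assumptions, not the definitions used in the talk.

```python
def decay_weight(comment_day, now, half_life_days=7.0):
    """Assumed decay: a comment that is half_life_days old counts half as much."""
    age = now - comment_day
    if age < 0:
        raise ValueError("comment is newer than the reference time")
    return 0.5 ** (age / half_life_days)

def build_weight_matrix(comments, now, half_life_days=7.0):
    """Return (users, items, W); W[u][i] sums the decayed weights of u's comments on i."""
    users = sorted({user for user, _, _ in comments})
    items = sorted({item for _, item, _ in comments})
    user_index = {user: n for n, user in enumerate(users)}
    item_index = {item: n for n, item in enumerate(items)}
    W = [[0.0] * len(items) for _ in users]
    for user, item, day in comments:
        W[user_index[user]][item_index[item]] += decay_weight(day, now, half_life_days)
    return users, items, W

# Hypothetical (user, item, day) comment records.
comments = [("alice", "v1", 10), ("alice", "v1", 3), ("bob", "v2", 10)]
users, items, W = build_weight_matrix(comments, now=10)
print(users, items, W)
```

Rows index users and columns index items, so a zero entry means the user never commented on that item.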

Random Walk Process(I) Transition matrix: The natural iterative process (HITS): Problem: if the graph is sparse and disconnected, the process will be trapped in local optima.

Random Walk Process(II) Add smoothing to avoid the local optima case: The process on the bipartite graph can be converted into a random walk on a homogeneous-node graph, and it will converge (proof omitted).
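
The update formulas on these two slides are not preserved, so the sketch below is not the speaker's exact rule. It assumes a PageRank-style uniform smoothing term with damping 0.85 on top of a normalized user-to-item and item-to-user propagation, and it shows that the iteration still converges on a disconnected bipartite graph. The function name and the toy matrix are hypothetical.

```python
def smoothed_bipartite_walk(W, damping=0.85, tol=1e-10, max_iter=1000):
    """Alternate user->item and item->user propagation with uniform smoothing.

    W[u][i] is the non-negative weight of the edge between user u and item i.
    Returns (user_scores, item_scores), each summing to 1.
    """
    n_users, n_items = len(W), len(W[0])
    row_sum = [sum(row) for row in W]
    col_sum = [sum(W[u][i] for u in range(n_users)) for i in range(n_items)]
    if min(row_sum) <= 0 or min(col_sum) <= 0:
        raise ValueError("every user and every item needs at least one positive edge")
    user = [1.0 / n_users] * n_users
    item = [1.0 / n_items] * n_items
    for _ in range(max_iter):
        # Each user spreads its score over its items in proportion to edge weight.
        new_item = [
            damping * sum(user[u] * W[u][i] / row_sum[u] for u in range(n_users))
            + (1 - damping) / n_items
            for i in range(n_items)
        ]
        # Each item spreads its score back over its commenters.
        new_user = [
            damping * sum(new_item[i] * W[u][i] / col_sum[i] for i in range(n_items))
            + (1 - damping) / n_users
            for u in range(n_users)
        ]
        change = sum(abs(a - b) for a, b in zip(new_item, item))
        change += sum(abs(a - b) for a, b in zip(new_user, user))
        user, item = new_user, new_item
        if change < tol:
            break
    return user, item

# Disconnected toy graph: user 0 only touches item 0; users 1 and 2 only touch item 1.
W = [[1.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
user_scores, item_scores = smoothed_bipartite_walk(W)
print(user_scores, item_scores)
```

Without the `(1 - damping)` term, score mass could never move between the two disconnected components, which is the problem the smoothing on the slide is meant to address.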

Experiments Crawled 3 datasets (20k size) to give a comprehensive evaluation of the performance in general Web 2.0 systems. The full experiments are not finished yet; an experimental result on Last.fm is shown. Dataset statistics table, columns: #Item, #User, #Comment; rows: YouTube, Flickr, Last.fm

Preparation 2 time points:  (t0)  (t1)  Ground truth is the #views in (t1-t0) Compared methods:  Comm_Oracle: #comments in the future days (t1-t0).  VC: view count on day t0  CCP: comment count in the past 3 days of t0  Sum_Tscore: the sum of all comments’ contributions (all users have the same weight)  TPR: my approach
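
The count-based baselines and the ground truth can be sketched on per-day count lists. The data layout (one count per day, indexed by day number) and the interval conventions are assumptions: VC is read here as the views on day t0 itself, the CCP window is taken to include day t0, and the future interval is taken as days t0+1 through t1. Sum_Tscore and TPR are omitted because their formulas are not given in the transcript.

```python
def ground_truth(daily_views, t0, t1):
    """Views received after day t0, up to and including day t1."""
    return sum(daily_views[t0 + 1 : t1 + 1])

def comm_oracle(daily_comments, t0, t1):
    """Comments posted in the same future interval (uses information unavailable at t0)."""
    return sum(daily_comments[t0 + 1 : t1 + 1])

def vc(daily_views, t0):
    """View count on day t0."""
    return daily_views[t0]

def ccp(daily_comments, t0, window=3):
    """Comment count over the `window` days ending at day t0."""
    start = max(0, t0 - window + 1)
    return sum(daily_comments[start : t0 + 1])

# Made-up daily counts for one hypothetical item, days 0..5.
views = [10, 20, 30, 40, 50, 60]
comments = [1, 2, 3, 4, 5, 6]
t0, t1 = 2, 5
print(ground_truth(views, t0, t1), comm_oracle(comments, t0, t1), vc(views, t0), ccp(comments, t0))
```

Each baseline yields one score per item; ranking items by that score and comparing against the ranking by `ground_truth` gives the comparison described on the slide.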

Overall Performance

Split by different popularity Preparation:  Sort all items by #views (large -> small)  Split the items into 5 folds of equal size  Evaluate each fold  Report the average performance over all folds
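
The fold procedure above can be sketched as follows. The evaluation metric is not named in the transcript, so `metric` is a hypothetical callable supplied by the caller; giving the leftover items to the first folds when the item count is not divisible by 5 is also an assumption.

```python
def split_into_folds(items, n_folds=5):
    """items: list of (item_id, view_count). Sort by views, largest first, and cut into folds."""
    ranked = sorted(items, key=lambda pair: pair[1], reverse=True)
    size, extra = divmod(len(ranked), n_folds)
    folds, start = [], 0
    for fold_number in range(n_folds):
        end = start + size + (1 if fold_number < extra else 0)
        folds.append(ranked[start:end])
        start = end
    return folds

def average_over_folds(folds, metric):
    """Evaluate each fold with `metric` and return the mean of the per-fold scores."""
    scores = [metric(fold) for fold in folds]
    return sum(scores) / len(scores)

# Ten hypothetical items with made-up view counts 10, 20, ..., 100.
items = [(f"item{n}", n * 10) for n in range(1, 11)]
folds = split_into_folds(items)

# Placeholder metric for the demo only: mean view count within a fold.
mean_views = lambda fold: sum(views for _, views in fold) / len(fold)
print([len(fold) for fold in folds], average_over_folds(folds, mean_views))
```

Keeping the per-fold scores as well as their mean makes it possible to report results fold by fold, from the most viewed items to the least viewed.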

Average Performance of Split Folds Observation: for the 1st fold, VC is the best; for folds 2-5, TPR is the best. Possible reason: in the Last.fm dataset, extremely popular artists from the past, such as The Beatles and Muse, still attract many visits without attracting many new comments.

To do... Finish the experiments on the other datasets. Refine the approach for different types of data, for example:  For extremely popular but old items, use a personalized vector to add a bias.

Questions & Suggestions? Thanks!