Published by Hector Greene. Modified over 8 years ago.
1
Importance Estimation of User-generated Data. He Xiangnan, PhD student
2
Motivation: User-generated data is emerging in Web 2.0 applications (YouTube, Twitter, Foursquare, etc.). We want to estimate its importance and its future popularity. Applications: web crawling, relevance ranking, index selection.
3
Related Work (Page Importance): link-connectivity based (e.g. PageRank); content analysis; user-centric crawling; sitemap based; web-log (server-side) based; learning to rank.
4
Limitations of Existing Work: Link-structure analysis is not suitable for many Web 2.0 websites, where content is generated by users and updated very frequently. Existing methods also ignore the temporal factor, so they cannot reflect the future popularity of a webpage.
5
Empirical Studies: Analysis of user-posted data on YouTube, Digg, etc. shows that popularity changes with time: the view count per unit of time (hour/day) follows a log-normal distribution after posting. Some activities influence popularity for a period of time; for example, external references from other websites or internal related-item recommendations cause a burst in view count.
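One reading of the log-normal observation is that the view rate, as a function of time since posting, has the shape of a log-normal density. The sketch below illustrates that reading only; it is not taken from the slides. The function names and the weighted-moment fit are assumptions chosen for brevity.

```python
import math

def lognormal_pdf(t, mu, sigma):
    """Log-normal density at time t (0 for t <= 0)."""
    if t <= 0:
        return 0.0
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2 * math.pi))

def fit_lognormal(times, views):
    """Estimate (mu, sigma) from per-period view counts.

    Hypothetical helper: treats each view as one observation of ln(t)
    and takes the weighted mean and standard deviation.
    """
    total = sum(views)
    mu = sum(v * math.log(t) for t, v in zip(times, views)) / total
    var = sum(v * (math.log(t) - mu) ** 2 for t, v in zip(times, views)) / total
    return mu, math.sqrt(var)

def expected_views(t, total_views, mu, sigma):
    """Approximate views in the unit period around time t."""
    return total_views * lognormal_pdf(t, mu, sigma)
```

Given daily view counts for one item, `fit_lognormal(days, counts)` yields parameters that `expected_views` can extrapolate to later days. A burst caused by an external reference would show up as a deviation from this smooth curve.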
6
Related Work (Popularity Prediction): There are two main types of prediction methods for user-generated data. (i) A very complex model that considers various factors of a specific website. Problem: the model is too specific and does not generalize. (ii) Statistical analysis over a large volume of data, training a regression model. Problem: it only reflects collective patterns and cannot be used for an individual webpage.
7
Our Goal: Based on features common to user-generated data in Web 2.0 applications, propose a general model that can roughly predict a webpage's importance and future popularity.
8
Our Idea: Take page-level statistics into account: number of views and replies, number of likes and dislikes, and the creation time and comment times. Using these common features, train a model for importance estimation.
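As an illustration of how such page-level statistics could be turned into model input, here is a minimal sketch. The `PageStats` record, the `to_features` function, and the particular transforms (log scaling, like ratio, comments in the last day) are all assumptions made for this example; the slides do not specify a feature set.

```python
import math
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class PageStats:
    """Hypothetical record of the statistics listed above."""
    views: int
    replies: int
    likes: int
    dislikes: int
    created: datetime
    comment_times: list = field(default_factory=list)

def to_features(page, now):
    """Map one page to a numeric feature vector (assumed layout)."""
    age_days = (now - page.created).total_seconds() / 86400.0
    votes = page.likes + page.dislikes
    # With no votes at all, fall back to a neutral ratio of 0.5.
    like_ratio = page.likes / votes if votes else 0.5
    recent = sum(1 for t in page.comment_times
                 if now - t <= timedelta(days=1))
    return [
        math.log1p(page.views),    # damp heavy-tailed counts
        math.log1p(page.replies),
        like_ratio,
        age_days,
        recent,                    # comments in the last 24 hours
    ]
```

Because only counts and timestamps are used, the same vector can be built for a YouTube video or a Digg story, which is the point of relying on features the sites have in common.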
9
Examples
10
Methods: Comments correlate tightly with view count. Popularity prediction: view count is the popularity metric, but it is only a snapshot at the current time. Comments are traces left by users and reflect their response, so we use the comment history to predict future popularity.
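The two steps above, checking the comment/view correlation and predicting from comment history, could be prototyped as follows. This is a rough sketch, not the authors' model: ordinary least squares stands in for whatever predictor is eventually chosen, and all function names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def predict_views(early_comments, slope, intercept):
    """Predict a later view count from an early comment count."""
    return slope * early_comments + intercept
```

In use, `xs` would hold early comment counts for past items and `ys` their later view counts. `pearson` checks how tight the relationship is, and `fit_line` gives coefficients for `predict_views` on a new item. Since view counts are heavy-tailed, fitting on log-transformed counts may be more robust, though that is also an assumption.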
11
Current Progress (with Shawn): analyzing how the features reflect popularity; collecting datasets (YouTube, Digg, etc.); reading related papers.
12
Better suggestions? Thanks!