Published by Alannah Allen. Modified over 9 years ago.
1
Research Trends in Multimedia Content Services
Data Mining and Web Search Group
Computer and Automation Research Institute, Hungarian Academy of Sciences
András A. Benczúr
2
A Benczur – Research Trends in Multimedia Content Services – FuturICT 28 April 2008
Web 2.0, 3.0 …?
  Platform convergence (Web, PC, mobile, television) – information vs. recreation
  Emphasis on social content (blogs, Wikipedia, photo and video sharing)
  From search towards recommendation (query free, profile based, personalized)
  From text towards multimedia
  Glocalization (language, geography)
  Spam
3
A sample service
  Small screen browsing
  Recommendation based on user profile (avoid query typing)
  Read blogs, view media, …
[Diagram labels: RSS, Web 2.0, client software, recommender engine]
4
The user profile
History stored for each user:
  Known ratings, preferences, opinions (scarce!)
  Items read, weighted by time spent
    details seen, scrolling, back button
  Terms in documents read: tf.idf-weighted top list
  User language, region, current location, and known sociodemographic data
Multimedia!
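The "tf.idf weighted top list" of terms in a user's read documents can be sketched as follows. This is only an illustration under simple assumptions: whitespace tokenization, raw term counts, and a logarithmic idf. The function name `top_terms` and the toy corpus are hypothetical and do not come from the talk.

```python
import math
from collections import Counter

def top_terms(read_docs, corpus, k=3):
    """Return the k highest tf.idf terms over the documents a user has read."""
    n_docs = len(corpus)
    # Document frequency: number of corpus documents containing each term.
    df = Counter(term for doc in corpus for term in set(doc.split()))
    # Term frequency summed over everything the user read.
    tf = Counter(term for doc in read_docs for term in doc.split())
    # Terms that never occur in the corpus get no score.
    scores = {t: c * math.log(n_docs / df[t]) for t, c in tf.items() if df[t] > 0}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

# Toy corpus (made up for illustration).
corpus = [
    "spam filter for web search",
    "movie ratings and movie recommendation",
    "web search ranking",
    "blog recommendation for mobile web",
]
profile = top_terms(read_docs=[corpus[1], corpus[3]], corpus=corpus)
```

Weighting each read document by time spent, as the slide suggests, would amount to multiplying its term counts by a per-document weight before summing.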
5
Same item, multiple sources
6
Information vs. recreation: do not mix the two?
7
Spam is increasingly annoying
8
Distribution of categories
  Reputable      70.0%
  Spam           16.5%
  Non-existent    7.9%
  Ad              3.7%
  Weborg          0.8%
  Empty           0.4%
  Unknown         0.4%
  Alias           0.3%
9
Effect of search result position
  Time spent viewing each result position
  Time taken to reach each result
10
Multimedia Information Retrieval
11
Similar objects
Segmentation
12
[Figure labels: Class of Query Image; Pre-classified Images; VOC2007 Original Training Set; Query Images; ImageCLEF Object Retrieval Task]
13
Networked relation
  spam
  social network analysis
  churn
14
Social networks
[Figure labels: home, business, ADSL]
15
Insurance fraud in networks
16
Stacked Graphical Learning
1. Predict churn p(v) of node v
2. For target node u, aggregate p(v) for neighbors to form new feature f(u)
3. Rerun classification by adding feature f(.)
4. Iterate?
[Figure labels: u, v1, v2, v7]
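The four steps of stacked graphical learning can be sketched as below. The slide does not say which classifier or which aggregate was used, so this sketch assumes a tiny logistic regression as a stand-in classifier and the mean of neighbor predictions as f(u). All function names and the toy graph are hypothetical.

```python
import math

def predict(w, x):
    """Logistic model; the last weight in w is the bias."""
    z = sum(wk * xk for wk, xk in zip(w, x + [1.0]))
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, epochs=500, lr=0.5):
    """Stand-in classifier: logistic regression fitted by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = predict(w, xi)
            for k, xk in enumerate(xi + [1.0]):
                grad[k] += (p - yi) * xk
        w = [wk - lr * gk / len(X) for wk, gk in zip(w, grad)]
    return w

def stacked_graphical_learning(features, labels, neighbors, rounds=2):
    """features: node -> list of floats; labels: node -> 0/1 (training nodes only);
    neighbors: node -> list of adjacent nodes. Returns node -> churn probability."""
    nodes = sorted(features)
    train = [v for v in nodes if v in labels]
    y = [labels[v] for v in train]
    X = {v: list(features[v]) for v in nodes}
    # Step 1: predict p(v) from each node's own features.
    w = train_logreg([X[v] for v in train], y)
    p = {v: predict(w, X[v]) for v in nodes}
    for _ in range(rounds):  # Step 4: iterate
        # Step 2: f(u) = mean of p(v) over the neighbors of u (assumed aggregate).
        f = {u: sum(p[v] for v in neighbors[u]) / max(len(neighbors[u]), 1)
             for u in nodes}
        # Step 3: rerun classification with f(.) appended to the original features.
        Xs = {v: X[v] + [f[v]] for v in nodes}
        w = train_logreg([Xs[v] for v in train], y)
        p = {v: predict(w, Xs[v]) for v in nodes}
    return p

# Toy graph: nodes 2 and 5 have identical own features but different neighborhoods.
features = {0: [0.9], 1: [0.8], 2: [0.5], 3: [0.1], 4: [0.2], 5: [0.5]}
labels = {0: 1, 1: 1, 3: 0, 4: 0}          # 1 = churned; nodes 2 and 5 are unlabeled
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
p = stacked_graphical_learning(features, labels, neighbors)
```

In this toy graph the base classifier cannot tell nodes 2 and 5 apart, while the stacked feature lets the neighborhood decide: node 2, surrounded by churners, ends up with the higher predicted churn probability.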
17
Why social networks are hard to analyze
Subgraphs of social networks
  Medium size dense communities attract much algorithmic work
  Tentacles induce noise
18
Mapping into the 2D plane
  spectral
  semidefinite
19
Research Highlights
Recommenders: KDD Cup 2007 Task 1, First Prize
  Predict the probability that a user rated a movie in 2006, based on training data through 2005
Spam filtering: Web Spam Challenge 1, first place
Churn prediction: method presented at KDD Cup 2009 Workshop
Task XXXX
20
Netflix: lessons and differences learned
  Ratings: 1–5 stars
  Predict an unseen rating
  Evaluation: RMSE
    0.8572: $1,000,000
    Current leader: 0.8650
    Oct/07: 0.8712
KDD Cup 2007
  Same data set
  Predict existence of a rating
21
Results of two separate tasks
BellKor team report [Bell, Koren 2007]:
  Low rank approximation
  Restricted Boltzmann Machine
  Nearest neighbor
KDD Cup 2007:
  Predict probability that a user rated a movie in 2006
  Given list of 100,000 user–movie pairs
  Users and movies drawn from Netflix Prize data set
  Winner report [K, B, and our colleagues 2007]
22
Evaluation and Issue 1
For a given user i and movie j:
  RMSE = \sqrt{\frac{1}{N} \sum_{(i,j)} (\hat{r}_{ij} - r_{ij})^2}
where \hat{r}_{ij} is the predicted value, r_{ij} is the actual value, and N is the number of evaluated pairs.
KDD Cup example:
  Our RMSE: 0.256
  First runner up: 0.263
  All zeroes prediction: 0.279 (Place 10-13)
But why do we use RMSE and not precision/recall?
  RMSE prefers correct probability guesses for the majority of infrequently visited items
  The presence of the recommender changes usage
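To make the evaluation concrete, here is a small sketch of RMSE (the square root of the mean squared difference between predicted and actual values) on 0/1 "rated or not" labels. The toy labels are made up and the helper name `rmse` is hypothetical; only the three scores quoted on the slide come from the talk.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Toy labels (made up): 1 = the user rated the movie, 0 = did not.
actual = [1] + [0] * 9

all_zero = rmse([0.0] * 10, actual)    # predict "not rated" everywhere
base_rate = rmse([0.1] * 10, actual)   # predict the overall positive rate everywhere
```

With 0/1 labels, the all-zeroes prediction scores exactly the square root of the fraction of positive pairs. Read backwards, the slide's all-zeroes RMSE of 0.279 corresponds to roughly 0.279² ≈ 7.8% of the listed pairs being rated, which helps explain why such a trivial prediction still placed 10-13.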
23
Method Overview
Probability by naive user-movie independence
  Item frequency estimation (Time Series)
  User frequency estimation
  Reaches RMSE 0.260 in itself (still first place)
Data Mining
  SVD
  Item-item similarities
  Association Rules
Combination (we used linear regression)
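One way to read "probability by naive user-movie independence" is the usual independence estimate for a 0/1 matrix: the expected value of a cell is (row total × column total) / grand total. The sketch below assumes exactly that and takes the user and movie frequencies as simple counts from a list of observed pairs. In the talk these frequencies were themselves estimated (the item frequency by time series), which is not reproduced here. All names and data are hypothetical.

```python
from collections import Counter

def fit_frequencies(ratings):
    """Count ratings per user and per movie from (user, movie) pairs."""
    users = Counter(u for u, _ in ratings)
    movies = Counter(m for _, m in ratings)
    return users, movies, len(ratings)

def independence_probability(user, movie, users, movies, total):
    """Estimated P(user rated movie) if users and movies were independent."""
    if total == 0:
        return 0.0
    # Row total * column total / grand total, clipped so it stays a probability.
    return min(1.0, users[user] * movies[movie] / total)

# Toy data (made up): user "a" is active, movie "x" is popular.
ratings = [("a", "x"), ("a", "y"), ("b", "x"), ("c", "x")]
users, movies, total = fit_frequencies(ratings)
p_b_y = independence_probability("b", "y", users, movies, total)
```

The estimate uses no interaction between a particular user and a particular movie, so it is cheap to compute; the slide reports that this kind of baseline alone already reached RMSE 0.260.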
24
Time series prediction
Interest remains for long time range (several years)
25
Short lifetime of online items
Origo
Very different behavior in time: news articles
http://www.origo.hu/filmklub/20060124kiolte.html
  Publication day
  Next day usage peak
  Third day and gone …
26
K-dim SVD:
  Noise filtering – the essence of the matrix – optimizes RMSE (ℓ2 error)
  SVD explains ratings as the effect of a few linear factors
  10–30 dim: 0.93
Issue: too many news items
  18K Netflix movies vs. a potentially infinite set of items
  -> may recommend the data source but not the item
[Figure labels: SVD, user, movie, news item]
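The idea that a low-rank SVD keeps "the essence of the matrix" can be illustrated with the simplest case, a rank-1 approximation computed by power iteration. This is a sketch for K = 1 only, assuming a small dense nonnegative matrix with at least one nonzero entry; the function name and the toy matrix are hypothetical, and a real system would use a numerical library and K in the 10–30 range mentioned on the slide.

```python
import math

def rank1_approximation(A, iterations=100):
    """Best rank-1 least-squares approximation of A, via power iteration on A^T A."""
    n_rows, n_cols = len(A), len(A[0])
    v = [1.0] * n_cols                      # starting guess for the right singular vector
    for _ in range(iterations):
        u = [sum(A[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]  # u = A v
        v = [sum(A[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]  # v = A^T u
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]           # keep v at unit length
    # A v equals (singular value) * (left singular vector), so (A v) v^T is the rank-1 part.
    u = [sum(A[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    return [[u[i] * v[j] for j in range(n_cols)] for i in range(n_rows)]

# Toy user x movie matrix (made up) driven by a single factor: row i = taste_i * (1, 2).
A = [[2.0, 4.0], [1.0, 2.0], [3.0, 6.0]]
approx = rank1_approximation(A)
```

Because the toy matrix is generated by one factor, the rank-1 approximation reproduces it; on real rating data the discarded components are treated as noise.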
27
Lessons learned
  Content similarity might be the key feature
  Relative success of trivial estimates on KDD Cup!
  Data mining techniques overlap, apparently catch similar patterns
  Precision/recall is more important than RMSE
  Solution must make heavy use of time
28
Future plans and ideas
  New partners and application fields: network infrastructure, new generation services, bioinformatics, …?
  Scaling our solutions to multi-core architectures
  Use our search (cross-lingual, multimedia, etc.) and recommender system capabilities in major solutions: mobile, new generation platforms, etc.
  Expand our European-level collaboration, e.g. KIC participation
29
Questions?
Andras A. Benczur
benczur@sztaki.hu
http://datamining.sztaki.hu