Research Trends in Multimedia Content Services. Data Mining and Web Search Group, Computer and Automation Research Institute, Hungarian Academy of Sciences.

1 Research Trends in Multimedia Content Services. Data Mining and Web Search Group, Computer and Automation Research Institute, Hungarian Academy of Sciences. András A. Benczúr

2 A Benczur – Research Trends in Multimedia Content Services – FuturICT 28 April 2008. Web 2.0, 3.0 …? Platform convergence (Web, PC, mobile, television): information vs. recreation. Emphasis on social content (blogs, Wikipedia, photo and video sharing). From search towards recommendation (query free, profile based, personalized). From text towards multimedia. Glocalization (language, geography). Spam.

3 A sample service: RSS; Web 2.0; small-screen browsing; recommendation based on user profile (avoid query typing); read blogs, view media, …; client software; recommender engine.

4 The user profile. History stored for each user: known ratings, preferences, opinion (scarce!); items read, weighted by time spent, details seen, scrolling, back button; terms in documents read, tf.idf weighted top list; user language, region, current location and known sociodemographic data; multimedia!
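
The "tf.idf weighted top list" of terms in the profile can be illustrated with a minimal Python sketch. The function name `tfidf_profile`, the toy corpus, and the weighting (raw term count times log(N/df)) are assumptions made for illustration; the talk does not say which tf.idf variant was used.

```python
import math
from collections import Counter

def tfidf_profile(read_docs, corpus, top_k=3):
    """Return the top_k terms of the documents a user read, ranked by tf.idf.

    read_docs and corpus are lists of token lists; corpus is a hypothetical
    background collection used only for document frequencies.
    """
    n_docs = len(corpus)
    # Document frequency: number of corpus documents containing each term.
    df = Counter(term for doc in corpus for term in set(doc))
    # Term frequency over everything the user read.
    tf = Counter(term for doc in read_docs for term in doc)
    scores = {
        term: count * math.log(n_docs / df[term])
        for term, count in tf.items()
        if df[term] > 0
    }
    # Sort by descending score; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]

corpus = [
    ["football", "match", "goal"],
    ["election", "vote", "match"],
    ["football", "transfer", "goal"],
    ["weather", "rain", "goal"],
    ["transfer", "market", "vote"],
]
read_docs = [corpus[0], corpus[2]]  # the two articles this toy user read
profile = tfidf_profile(read_docs, corpus, top_k=2)
top_terms = [term for term, _ in profile]
```

In this toy run the term read most often and seen in few documents ranks first; a term that appears in most of the corpus is pushed down even when the user read it repeatedly.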

5 Same item, multiple sources

6 Information vs. recreation: do not mix the two?

7 Spam is increasingly annoying

8 Distribution of categories: Reputable 70.0%; Spam 16.5%; Weborg 0.8%; Ad 3.7%; Non-existent 7.9%; Empty 0.4%; Alias 0.3%; Unknown 0.4%.

9 Effect of search result position: time spent viewing each result position; time to reach the result.

10 Multimedia Information Retrieval

11 Similar objects; segmentation

12 ImageCLEF Object Retrieval Task. Figure labels: class of query image; pre-classified images; VOC2007 original training set; query images.

13 Networked relation: spam; social network analysis; churn

14 Social networks. Figure labels: home; business; ADSL.

15 Insurance fraud in networks

16 Stacked Graphical Learning. 1. Predict churn p(v) of node v. 2. For target node u, aggregate p(v) for neighbors to form new feature f(u). 3. Rerun classification by adding feature f(.). 4. Iterate. Figure labels: ?, u, v1, v2, v7.
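
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the aggregate is taken to be the neighbor average, the number of rounds is fixed, and `toy_classifier` is a hypothetical fixed-weight logistic model standing in for whatever trained classifier is actually used.

```python
import math

def neighbor_average(graph, p):
    """Step 2: f(u) = mean of p(v) over the neighbors v of u (0.0 if isolated)."""
    return {
        u: (sum(p[v] for v in nbrs) / len(nbrs)) if nbrs else 0.0
        for u, nbrs in graph.items()
    }

def stacked_learning(graph, base_features, classify, rounds=2):
    """Run the predict / aggregate / re-predict loop for a fixed number of rounds.

    classify(x, f) maps a node's own feature x and its neighborhood
    feature f to a churn probability.
    """
    # Step 1: initial prediction without any neighborhood information.
    p = {u: classify(x, 0.0) for u, x in base_features.items()}
    for _ in range(rounds):                      # step 4: iterate
        f = neighbor_average(graph, p)           # step 2: aggregate neighbors
        p = {u: classify(base_features[u], f[u])  # step 3: re-classify
             for u in graph}
    return p

def toy_classifier(x, f):
    # Hypothetical weights; a real system would fit them on labeled data.
    return 1.0 / (1.0 + math.exp(-(2.0 * x + 3.0 * f - 2.0)))

# Toy graph: u has two likely churners as neighbors, w is isolated.
graph = {"u": ["v1", "v2"], "v1": ["u"], "v2": ["u"], "w": []}
base = {"u": 0.5, "v1": 1.5, "v2": 1.5, "w": 0.5}
p = stacked_learning(graph, base, toy_classifier)
```

Nodes `u` and `w` share the same own feature, so any difference in their final scores comes only from the stacked neighborhood feature.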

17 Why social networks are hard to analyze. Subgraphs of social networks: medium-size dense communities attract much algorithmic work; tentacles induce noise.

18 Mapping into the 2D plane: spectral; semidefinite

19 Research Highlights. Recommenders: KDD Cup 2007 Task 1 First Prize (predict the probability that a user rated a movie in 2006, based on training data up to 2005). Spam filtering: Web Spam Challenge 1 first place. Churn prediction: method presented at KDD Cup 2009 Workshop Task XXXX.

20 Netflix: lessons and differences learned. Ratings: 1 to 5 stars. Predict an unseen rating. Evaluation: RMSE. 0.8572: $1,000,000; current leader: 0.8650; Oct/07: 0.8712. KDD Cup 2007: same data set, predict existence of a rating.

21 Results of two separate tasks. BellKor team report [Bell, Koren 2007]: low rank approximation; Restricted Boltzmann Machine; nearest neighbor. KDD Cup 2007: predict the probability that a user rated a movie in 2006, given a list of 100,000 user-movie pairs; users and movies drawn from the Netflix Prize data set. Winner report [K, B, and our colleagues 2007].

22 Evaluation and Issue 1. For a given user i and movie j, [formula missing in transcript], where [symbol missing] is the predicted value. KDD Cup example: our RMSE: 0.256; first runner-up: 0.263; all-zeroes prediction: 0.279 (place 10-13). But why do we use RMSE and not precision/recall? RMSE prefers correct probability guesses for the majority of infrequently visited items. The presence of the recommender changes usage.
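
A small worked example may help with the all-zeroes baseline. Assuming the targets are 0/1 labels (rated or not), an all-zeroes prediction has an RMSE equal to the square root of the fraction of positive pairs, so the reported 0.279 would correspond to roughly 0.279² ≈ 7.8% rated pairs. The sketch below uses a hypothetical toy sample, not the KDD Cup data, and the helper name `rmse` is our own.

```python
import math

def rmse(predictions, labels):
    """Root mean squared error between predicted probabilities and 0/1 labels."""
    n = len(labels)
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(predictions, labels)) / n)

# Hypothetical toy sample: 8 of 100 user-movie pairs were actually rated.
labels = [1] * 8 + [0] * 92

all_zero = rmse([0.0] * 100, labels)    # equals sqrt(8 / 100)
base_rate = rmse([0.08] * 100, labels)  # constant guess at the true rate
```

Even a constant guess at the base rate beats all-zeroes here, which is consistent with the slide's remark that RMSE rewards well-calibrated probabilities on the many rarely rated items.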

23 Method Overview. Probability by naive user-movie independence: item frequency estimation (time series); user frequency estimation; reaches RMSE 0.260 in itself (still first place). Data mining: SVD; item-item similarities; association rules. Combination (we used linear regression).
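
The slide does not spell out the independence estimator, so the following is only one plausible reading: if users and movies were independent, the expected number of ratings for a pair is (ratings by the user) × (ratings of the movie) / (total ratings). The function name `independence_model` and the toy data are assumptions; in the talk's setting the counts would presumably be the forecast 2006 frequencies rather than observed ones.

```python
from collections import Counter

def independence_model(ratings):
    """Build p(user, movie) = r_user * r_movie / R from (user, movie) pairs."""
    user_count = Counter(u for u, _ in ratings)
    movie_count = Counter(m for _, m in ratings)
    total = len(ratings)

    def probability(user, movie):
        # Expected number of ratings in this cell under independence,
        # clipped to 1.0 so it can be read as a probability.
        return min(1.0, user_count[user] * movie_count[movie] / total)

    return probability

# Toy rating log: (user, movie) pairs.
ratings = [("a", "x"), ("a", "y"), ("b", "x"), ("c", "z")]
prob = independence_model(ratings)
```

Unseen users or movies get probability 0.0 in this sketch, because `Counter` returns 0 for missing keys.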

24 Time series prediction: interest remains for a long time range (several years).

25 Short lifetime of online items (Origo). Very different behavior in time for news articles, e.g. http://www.origo.hu/filmklub/20060124kiolte.html: publication day; next-day usage peak; third day and gone …

26 K-dim SVD. Noise filtering (the essence of the matrix): optimizes RMSE (ℓ2 error). SVD explains ratings as the effect of a few linear factors. 10-30 dim: 0.93. Issue: too many news items; 18K Netflix movies vs. a potentially infinite set of items -> may recommend the data source but not the item. Figure labels: SVD; user; movie; news item.
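
To make the "few linear factors" idea concrete, here is a minimal pure-Python sketch of the simplest case, a one-factor (rank-1) approximation computed by power iteration. It is not the method used in the talk: a real system would use a k-dimensional SVD from a numerical library and would have to handle missing ratings, which this toy ignores. The function name and the toy matrix are assumptions.

```python
import math

def rank1_approximation(matrix, iterations=100):
    """Best rank-1 least-squares approximation of a dense matrix.

    Uses power iteration on A^T A to find the top right singular vector v,
    then returns (A v) v^T. Assumes the matrix is non-zero.
    """
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        # u = A v
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        # v = A^T u, then normalise to unit length
        v = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # A v equals the top singular value times the left singular vector.
    u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return [[u[i] * v[j] for j in range(cols)] for i in range(rows)]

# Toy user-by-movie matrix that is exactly rank 1: each row is a multiple of [1, 2].
ratings = [[2.0, 4.0], [1.0, 2.0], [3.0, 6.0]]
approx = rank1_approximation(ratings)
max_error = max(
    abs(approx[i][j] - ratings[i][j]) for i in range(3) for j in range(2)
)
```

Because the toy matrix has a single underlying factor, the rank-1 approximation reproduces it up to rounding; on real rating data the residual is what the slide calls noise.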

27 Lessons learned. Content similarity might be the key feature. Relative success of trivial estimates on KDD Cup! Data mining techniques overlap, apparently catch similar patterns. Precision/recall is more important than RMSE. Solution must make heavy use of time.

28 Future plans and ideas. New partners and application fields: network infrastructure, new generation services, bioinformatics, …? Scaling our solutions to multi-core architectures. Use our search (cross-lingual, multimedia etc.) and recommender system capabilities in major solutions: mobile, new generation platforms etc. Expand means of our European-level collaboration, e.g. KIC participation.

29 Questions? Andras A. Benczur, benczur@sztaki.hu, http://datamining.sztaki.hu

