Research Trends in Multimedia Content Services Data Mining and Web Search Group Computer and Automation Research Institute Hungarian Academy of Sciences.


Research Trends in Multimedia Content Services
Data Mining and Web Search Group
Computer and Automation Research Institute, Hungarian Academy of Sciences
András A. Benczúr

A Benczur – Research Trends in Multimedia Content Services – FuturICT 28 April 2008

Web 2.0, 3.0 …?
- Platform convergence (Web, PC, mobile, television): information vs. recreation
- Emphasis on social content (blogs, Wikipedia, photo and video sharing)
- From search towards recommendation (query free, profile based, personalized)
- From text towards multimedia
- Glocalization (language, geography)
- Spam

A sample service
- RSS, Web 2.0
- Small screen browsing
- Recommendation based on user profile (avoid query typing)
- Read blogs, view media, …
- Diagram labels: client software, recommender engine

The user profile
History stored for each user:
- Known ratings, preferences, opinion: scarce!
- Items read, weighted by time spent, details seen, scrolling, back button
- Terms in documents read, tf.idf weighted top list
- User language, region, current location and known sociodemographic data
- Multimedia!
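The "tf.idf weighted top list" of terms could be computed roughly as follows. This is a minimal sketch, not the service's actual method: it assumes the common tf · log(N/df) weighting (the slide does not say which variant is used), and the function name `top_profile_terms` and the background `corpus` argument are hypothetical.

```python
import math
from collections import Counter

def top_profile_terms(read_docs, corpus, k=3):
    """Return the k highest tf.idf terms over the documents a user has read.

    read_docs and corpus are lists of token lists. corpus is an assumed
    background collection used only for document frequencies.
    """
    n_docs = len(corpus)
    # Document frequency: number of corpus documents containing each term.
    df = Counter(term for doc in corpus for term in set(doc))
    # Term frequency summed over everything the user has read.
    tf = Counter(term for doc in read_docs for term in doc)
    scores = {
        term: count * math.log(n_docs / df[term])
        for term, count in tf.items()
        if df[term] > 0  # terms unseen in the corpus are skipped
    }
    # Sort by descending score, then alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

A term that occurs in every corpus document gets weight zero, so the top list is dominated by terms that are frequent in the user's reading but rare overall.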

Same item, multiple sources

Information vs. recreation: do not mix the two?

Spam is increasingly annoying

Distribution of categories

Category      Share
Reputable     70.0%
Spam          16.5%
Weborg         0.8%
Ad             3.7%
Non-existent   7.9%
Empty          0.4%
Alias          0.3%
Unknown        0.4%

Effect of search result position
Figure labels: time spent looking at the result position, time to reach the result

Multimedia Information Retrieval

Similar objects
Segmentation

ImageCLEF Object Retrieval Task
Figure labels: Class of Query Image, Pre-classified Images, VOC2007 Original Training Set, Query Images

Networked relation
- spam
- social network analysis
- churn

Social networks
Figure labels: home, business, ADSL

Insurance fraud: in a network

Stacked Graphical Learning
1. Predict churn p(v) of node v
2. For target node u, aggregate p(v) for neighbors to form new feature f(u)
3. Rerun classification by adding feature f(.)
4. Iterate
Figure labels: ?, u, v1, v2, v7
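The four steps of stacked graphical learning can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the slide only says "aggregate", so the neighbor mean used here is one possible choice, and `fit_predict` is a hypothetical callable standing in for whatever classifier is trained (it takes a dict of node to feature list and returns a dict of node to probability).

```python
def neighbor_average(graph, p):
    """f(u): mean of p(v) over the neighbors v of u (0.0 for isolated nodes)."""
    return {
        u: (sum(p[v] for v in nbrs) / len(nbrs)) if nbrs else 0.0
        for u, nbrs in graph.items()
    }

def stacked_graphical_learning(graph, features, fit_predict, rounds=1):
    """graph: node -> list of neighbors; features: node -> list of floats."""
    # Step 1: base prediction p(v) from the original features only.
    p = fit_predict(features)
    for _ in range(rounds):
        # Step 2: aggregate neighbor predictions into a new feature f(u).
        f = neighbor_average(graph, p)
        # Step 3: rerun classification with f(.) appended to each feature vector.
        extended = {u: features[u] + [f[u]] for u in features}
        p = fit_predict(extended)
        # Step 4: further rounds repeat steps 2 and 3 with the updated p.
    return p
```

Passing the classifier in as a callable keeps the sketch independent of any particular learning library; `rounds` controls how many times the neighbor feature is recomputed.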

Why social networks are hard to analyze
- Subgraphs of social networks
- Medium size dense communities attract much algorithmic work
- Tentacles induce noise

Mapping into the 2D plane
- spectral
- semidefinite

Research Highlights
- Recommenders: KDD Cup 2007 Task 1 First Prize. Predict the probability that a user rated a movie in 2006, based on training data through 2005
- Spam filtering: Web Spam Challenge 1 first place
- Churn prediction: method presented at KDD Cup 2009 Workshop
- Task XXXX

Netflix: lessons and differences learned
- Ratings: 1–5 stars
- Predict an unseen rating
- Evaluation: RMSE, with a $1,000,000 prize
- Current leader: Oct/07:
- KDD Cup 2007: same data set, predict existence of a rating

Results of two separate tasks
BellKor team report [Bell, Koren 2007]:
- Low rank approximation
- Restricted Boltzmann Machine
- Nearest neighbor
KDD Cup 2007:
- Predict probability that a user rated a movie in 2006
- Given list of 100,000 user–movie pairs
- Users and movies drawn from Netflix Prize data set
- Winner report [K, B, and our colleagues 2007]

Evaluation and Issue 1
- For a given user i and movie j, the error over the N test pairs is $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{(i,j)} (w_{ij} - \hat{w}_{ij})^2}$, where $\hat{w}_{ij}$ is the predicted value
- KDD Cup example: our RMSE, the first runner up, and the all zeroes prediction (place 10-13)
- But why do we use RMSE and not precision/recall?
- RMSE prefers correct probability guesses for the majority of infrequently visited items
- The presence of the recommender changes usage
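To make the comparison with an all zeroes prediction concrete, here is a small RMSE sketch. The data is made up for illustration and is not KDD Cup data. With rare positives, a constant guess at the base rate already beats predicting zero everywhere, which is one way to see why RMSE rewards well calibrated probabilities on rarely rated items.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    pairs = list(zip(actual, predicted))
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))

# Toy example (hypothetical values): one rated pair out of four.
actual = [1, 0, 0, 0]
all_zeroes = rmse(actual, [0, 0, 0, 0])          # sqrt(1/4)
base_rate = rmse(actual, [0.25, 0.25, 0.25, 0.25])  # sqrt(0.1875)
```

Here `base_rate` is smaller than `all_zeroes`, even though neither prediction separates the rated pair from the others.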

Method Overview
Probability by naive user-movie independence:
- Item frequency estimation (Time Series)
- User frequency estimation
- Reaches RMSE in itself (still first place)
Data Mining:
- SVD
- Item-item similarities
- Association Rules
Combination (we used linear regression)
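One simple reading of "naive user-movie independence" is sketched below. It is an assumption, not the winning method: it estimates the probability of a (user, movie) pair from past counts only, as user count times movie count divided by the total number of ratings, capped at 1. The slide's item and user frequency estimates (including the time series part) would replace the raw counts used here; the function name is hypothetical.

```python
from collections import Counter

def independence_estimates(ratings):
    """Build a probability estimator from known (user, movie) rating pairs."""
    ratings = list(ratings)
    total = len(ratings)
    user_count = Counter(u for u, _ in ratings)
    movie_count = Counter(m for _, m in ratings)

    def prob(user, movie):
        # Expected number of ratings for this pair if users and movies
        # were independent, capped at 1 so it can be read as a probability.
        return min(1.0, user_count[user] * movie_count[movie] / total)

    return prob
```

For example, with ratings `[("a", "x"), ("a", "y"), ("b", "x"), ("c", "z")]`, the estimate for the unseen pair `("b", "y")` is 1 · 1 / 4 = 0.25, and an unknown user or movie gets 0.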

Time series prediction
Interest remains over a long time range (several years)

Short lifetime of online items
Origo
Very different behavior in time: news articles
- Publication day
- Next day usage peak
- Third day and gone …

K-dim SVD
- Noise filtering: the essence of the matrix
- Optimizes RMSE (ℓ2 error); dim: 0.93
- SVD explains ratings as the effect of a few linear factors
- Issue: too many news items. 18K Netflix movies vs. a potentially infinite set of items, so we may recommend the data source but not the item
Figure labels: SVD, user, movie, news item
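The idea that a K-dim SVD keeps "the essence of the matrix" can be illustrated with a rank-k approximation. The sketch below is a dependency-free illustration, not the method used in the talk: it finds singular triplets one at a time by power iteration with deflation on a small dense matrix, whereas a real ratings matrix is sparse with missing entries and would need a dedicated solver. All function names are hypothetical.

```python
import math
import random

def matvec(a, x):
    """Multiply matrix a (list of rows) by vector x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def top_singular_triplet(a, iters=200, seed=0):
    """Leading (sigma, u, v) of a, by power iteration on a^T a."""
    at = transpose(a)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(at))]  # random start vector
    for _ in range(iters):
        w = matvec(at, matvec(a, v))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # a is the zero matrix (or v fell in its null space)
            return 0.0, [0.0] * len(a), v
        v = [x / norm for x in w]
    av = matvec(a, v)
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av] if sigma else [0.0] * len(a)
    return sigma, u, v

def low_rank_approximation(a, k):
    """Sum of the k leading terms sigma * u * v^T, found by deflation."""
    residual = [row[:] for row in a]
    approx = [[0.0] * len(a[0]) for _ in a]
    for _ in range(k):
        sigma, u, v = top_singular_triplet(residual)
        for i in range(len(a)):
            for j in range(len(a[0])):
                term = sigma * u[i] * v[j]
                approx[i][j] += term
                residual[i][j] -= term
    return approx
```

Among all rank-k matrices, the truncated SVD minimizes the squared (ℓ2) reconstruction error, which is why it lines up naturally with an RMSE objective.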

Lessons learned
- Content similarity might be the key feature
- Relative success of trivial estimates on KDD Cup!
- Data mining techniques overlap, apparently catch similar patterns
- Precision/recall is more important than RMSE
- Solution must make heavy use of time

Future plans and ideas
- New partners and application fields: network infrastructure, new generation services, bioinformatics, …?
- Scaling our solutions to multi-core architectures
- Use our search (cross-lingual, multimedia etc.) and recommender system capabilities in major solutions: mobile, new generation platforms etc.
- Expand means of our European level collaboration, e.g. KIC participation

Questions?
András A. Benczúr