Copyright © 2014 Criteo 2014-11-20 15 millions de prédictions par seconde Les défis de Criteo Nicolas Le Roux Scientific Program Manager - R&D.

Slides:



Advertisements
Similar presentations
Data Mining and Text Analytics Advertising Laura Quinn.
Advertisements

Random Forest Predrag Radenković 3237/10
Sponsored Search Cory Pender Sherwin Doroudi. Optimal Delivery of Sponsored Search Advertisements Subject to Budget Constraints Zoe Abrams Ofer Mendelevitch.
Discrete Choice Model of Bidder Behavior in Sponsored Search Quang Duong University of Michigan Sebastien Lahaie
Data Mining Methodology 1. Why have a Methodology  Don’t want to learn things that aren’t true May not represent any underlying reality ○ Spurious correlation.
Irwin/McGraw-Hill © Andrew F. Siegel, 1997 and l Chapter 12 l Multiple Regression: Predicting One Factor from Several Others.
The Roles of Uncertainty and Randomness in Online Advertising Ragavendran Gopalakrishnan Eric Bax Raga Gopalakrishnan 2 nd Year Graduate Student (Computer.
Simulation Operations -- Prof. Juran.
Instructions First-price No Communication treatment.
Contextual Advertising by Combining Relevance with Click Feedback D. Chakrabarti D. Agarwal V. Josifovski.
Autocorrelation and Linkage Cause Bias in Evaluation of Relational Learners David Jensen and Jennifer Neville.
Intro to RecSys and CCF Brian Ackerman 1. Roadmap Introduction to Recommender Systems & Collaborative Filtering Collaborative Competitive Filtering 2.
Integrating Bayesian Networks and Simpson’s Paradox in Data Mining Alex Freitas University of Kent Ken McGarry University of Sunderland.
Support Vector Machines and Kernel Methods
x – independent variable (input)
Simulating Normal Random Variables Simulation can provide a great deal of information about the behavior of a random variable.
A Heuristic Bidding Strategy for Multiple Heterogeneous Auctions Patricia Anthony & Nicholas R. Jennings Dept. of Electronics and Computer Science University.
Adaboost and its application
Experimental Evaluation
WHO WE ARE ●Website Development & Design ●Web Marketing Strategy, Training, and Analysis ●Web Applications, iOS apps, Android apps.
Peer-to-peer archival data trading Brian Cooper and Hector Garcia-Molina Stanford University.
Sampling Theory Determining the distribution of Sample statistics.
Classification and Prediction: Regression Analysis
Airline Schedule Optimization (Fleet Assignment II) Saba Neyshabouri.
Radial Basis Function Networks
Walter Hop Web-shop Order Prediction Using Machine Learning Master’s Thesis Computational Economics.
AdWords Instructor: Dawn Rauscher. Quality Score in Action 0a2PVhPQhttp:// 0a2PVhPQ.
Overview DM for Business Intelligence.
USING HADOOP & HBASE TO BUILD CONTENT RELEVANCE & PERSONALIZATION Tools to build your big data application Ameya Kanitkar.
GDG DevFest Central Italy Joint work with J. Feldman, S. Lattanzi, V. Mirrokni (Google Research), S. Leonardi (Sapienza U. Rome), H. Lynch (Google)
B. RAMAMURTHY EAP#2: Data Mining, Statistical Analysis and Predictive Analytics for Automotive Domain CSE651C, B. Ramamurthy 1 6/28/2014.
HAL R VARIAN FEBRUARY 16, 2009 PRESENTED BY : SANKET SABNIS Online Ad Auctions 1.
Machine Learning CUNY Graduate Center Lecture 3: Linear Regression.
Sampling Theory Determining the distribution of Sample statistics.
Machine Learning1 Machine Learning: Summary Greg Grudic CSCI-4830.
Save Money... Get Better Results. Save Money On Your Ad Costs & Get More New Customers … With Our Real Time Bidding Ad Platform (RTB)  Save 20 – 30%
Cristian Urs and Ben Riveira. Introduction The article we chose focuses on improving the performance of Genetic Algorithms by: Use of predictive models.
Evaluation Methods and Challenges. 2 Deepak Agarwal & Bee-Chung ICML’11 Evaluation Methods Ideal method –Experimental Design: Run side-by-side.
Marketing and CS Philip Chan. Enticing you to buy a product 1. What is the content of the ad? 2. Where to advertise? TV, radio, newspaper, magazine, internet,
Online Advertising Greg Lackey. Advertising Life Cycle The Past Mass media Current Media fragmentation The Future Target market Audio/visual enhancements.
A Regression Approach to Music Emotion Recognition Yi-Hsuan Yang, Yu-Ching Lin, Ya-Fan Su, and Homer H. Chen, Fellow, IEEE IEEE TRANSACTIONS ON AUDIO,
Chengjie Sun,Lei Lin, Yuan Chen, Bingquan Liu Harbin Institute of Technology School of Computer Science and Technology 1 19/11/ :09 PM.
@delbrians Transfer Learning: Using the Data You Have, not the Data You Want. October, 2013 Brian d’Alessandro.
LOGISTIC REGRESSION David Kauchak CS451 – Fall 2013.
Predictive Analytics World CONFIDENTIAL1 Predictive Keyword Scores to Optimize PPC Campaigns Vincent Granville, Ph.D. Click Forensics February 19, 2009.
A hybrid SOFM-SVR with a filter-based feature selection for stock market forecasting Huang, C. L. & Tsai, C. Y. Expert Systems with Applications 2008.
2005MEE Software Engineering Lecture 11 – Optimisation Techniques.
A Passive Approach to Sensor Network Localization Rahul Biswas and Sebastian Thrun International Conference on Intelligent Robots and Systems 2004 Presented.
Evaluation of Recommender Systems Joonseok Lee Georgia Institute of Technology 2011/04/12 1.
Large-Scale Real-Time Product Recommendation at Criteo
Data Mining and Decision Support
Regression Tree Ensembles Sergey Bakin. Problem Formulation §Training data set of N data points (x i,y i ), 1,…,N. §x are predictor variables (P-dimensional.
26134 Business Statistics Week 4 Tutorial Simple Linear Regression Key concepts in this tutorial are listed below 1. Detecting.
MACHINE LEARNING 3. Supervised Learning. Learning a Class from Examples Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
Executive Summary - Human Factors Heuristic Evaluation 04/18/2014.
IB Business & Management
Predicting Winning Price In Real Time Bidding With Censored Data Tejaswini Veena Sambamurthy Weicong Chen.
Dependency Networks for Inference, Collaborative filtering, and Data Visualization Heckerman et al. Microsoft Research J. of Machine Learning Research.
Chapter 5: Paid Search Marketing
Global RTB Market Size, Share, Global Trends, Company Profiles, Demand, Insights, Analysis, Research, Report, Opportunities, Published By :
Shape2Pose: Human Centric Shape Analysis CMPT888 Vladimir G. Kim Siddhartha Chaudhuri Leonidas Guibas Thomas Funkhouser Stanford University Princeton University.
Opinion spam and Analysis 소프트웨어공학 연구실 G 최효린 1 / 35.
26134 Business Statistics Week 4 Tutorial Simple Linear Regression Key concepts in this tutorial are listed below 1. Detecting.
Part I Project Initiation.
Chapter 14 Introduction to Multiple Regression
Chapter 7. Classification and Prediction
Machine Learning Basics
CIKM Competition 2014 Second Place Solution
Learning Theory Reza Shadmehr
Presentation transcript:

Copyright © 2014 Criteo millions de prédictions par seconde Les défis de Criteo Nicolas Le Roux Scientific Program Manager - R&D

Copyright © 2014 Criteo What does Criteo do? We buy advertising spaces on websites We display ads for our partners We get paid if the user clicks on the ad

Copyright © 2014 Criteo Retargeting

Copyright © 2014 Criteo In practice 1. A user lands on a webpage 2. The website contacts an ad-exchange 3. The ad-exchange contacts Criteo and its competitors 4. It is an auction: each competitor tells how much it bids 5. The highest bidder wins the right to display an ad

Copyright © 2014 Criteo Details of the auction Real-time bidding (RTB) Second-price auction: the winner pays the second highest price Optimal strategy: bid the expected gain Expected gain = price per click (CPC) * probability of click (CTR)

Copyright © 2014 Criteo Goal of the arbitrage Choose the appropriate advertiser Only win the profitable displays: it is a prediction problem An overbid means you are losing money An underbid means you are losing potential revenue

Copyright © 2014 Criteo What to do once we win the display? We are now directly in contact with the website Choose the best products Choose the color, the font and the layout Generate the banner

Copyright © 2014 Criteo Goal of the recommendation Increase the value of a display: it is a ranking problem The potential gains are huge

Copyright © 2014 Criteo Real-time constraints at Criteo More than 2 billion banners displayed per day

Copyright © 2014 Criteo Real-time constraints at Criteo More than 2 billion banners displayed per day

Copyright © 2014 Criteo Real-time constraints at Criteo More than 2 billion banners displayed per day

Copyright © 2014 Criteo Fitting within the real-time constraints To avoid timeouts, only simple models are allowed Logistic regression Decision trees What freedom do we have?

Copyright © 2014 Criteo Predicting the CTR

Copyright © 2014 Criteo Predicting the CTR

Copyright © 2014 Criteo A large number of parameters

Copyright © 2014 Criteo A large number of parameters

Copyright © 2014 Criteo A large number of parameters

Copyright © 2014 Criteo Modeling higher-order information A linear model cannot represent higher order information

Copyright © 2014 Criteo Modeling higher-order information A linear model cannot represent higher order information E.g. CurrentUrl = "disney.com" and Advertiser = "Guns4Life"

Copyright © 2014 Criteo Modeling higher-order information A linear model cannot represent higher order information E.g. CurrentUrl = "disney.com" and Advertiser = "Guns4Life" We model these by creating "cross-features"

Copyright © 2014 Criteo Modeling higher-order information

Copyright © 2014 Criteo Hashing This model has estimation and computational issues

Copyright © 2014 Criteo Hashing

Copyright © 2014 Criteo Hashing

Copyright © 2014 Criteo Hashing

Copyright © 2014 Criteo Hashing

Copyright © 2014 Criteo Dealing with collisions

Copyright © 2014 Criteo Dealing with collisions

Copyright © 2014 Criteo Dealing with collisions

Copyright © 2014 Criteo The cost of adding variables « Hey, I thought of this great variable: Time since last product view. Can we add it to the model? »

Copyright © 2014 Criteo The cost of adding variables « Hey, I thought of this great variable: Time since last product view. Can we add it to the model? » Storage: #Banners/day x #Days x 4 = 480GB

Copyright © 2014 Criteo The cost of adding variables « Hey, I thought of this great variable: Time since last product view. Can we add it to the model? » Storage: #Banners/day x #Days x 4 = 480GB RAM: #Users x #Campaigns x 4 = 40GB

Copyright © 2014 Criteo Feature selection About 40 original features

Copyright © 2014 Criteo Feature selection About 40 original features 780 level-2 cross-features 9880 level-3 cross-features

Copyright © 2014 Criteo Feature selection About 40 original features 780 level-2 cross-features 9880 level-3 cross-features Each feature contains many modalities

Copyright © 2014 Criteo Feature selection at scale Greedy methods score the variables independently Group sparsity methods score them jointly but are biased (think L1)

Copyright © 2014 Criteo Feature selection at scale Greedy methods score the variables independently Group sparsity methods score them jointly but are biased (think L1) Let us use greedy methods to provide a good initial set of features Refine with group sparsity

Copyright © 2014 Criteo Learning the parameters n = 10^9, p = 10^7 Theory tells us that stochastic gradient methods should be used

Copyright © 2014 Criteo Learning the parameters - The actual situation Tens of models are trained several times a day Warm starts favor batch methods Not all points are equal Stochastic methods are harder to parallelize.

Copyright © 2014 Criteo Criteo's optimizer

Copyright © 2014 Criteo Criteo's optimizer

Copyright © 2014 Criteo Dealing with spurious clicks Some clicks do not bring sales Some clicks are pure fake We can build a fraud detection system

Copyright © 2014 Criteo Dealing with spurious clicks Some clicks do not bring sales Some clicks are pure fake We can build a fraud detection system What is the real issue here?

Copyright © 2014 Criteo Dealing with spurious clicks The real goal is to only buy clicks which bring sales

Copyright © 2014 Criteo Dealing with spurious clicks The real goal is to only buy clicks which bring sales We have labeled data

Copyright © 2014 Criteo Dealing with spurious clicks The real goal is to only buy clicks which bring sales We have labeled data Let's build a sale prediction model.

Copyright © 2014 Criteo Are we done yet? We know how to build a good model We know how to train it We are predicting a meaningful target

Copyright © 2014 Criteo Simpson’s paradox CTROverall First position60/9000 (0.67%) Second position50/7000 (0.71%)

Copyright © 2014 Criteo Simpson’s paradox CTROverallLow-value usersHigh-value users First position60/9000 (0.67%)48/8000 (0.6%)12/1000 (1.2%) Second position50/7000 (0.71%)2/1000 (0.2%)48/6000 (0.8%)

Copyright © 2014 Criteo Simpson’s paradox CTROverallLow-value usersHigh-value users First position60/9000 (0.67%)48/8000 (0.6%)12/1000 (1.2%) Second position50/7000 (0.71%)2/1000 (0.2%)48/6000 (0.8%) Because of the underprediction, we might not detect our error.

Copyright © 2014 Criteo Offline model evaluation The data collection process depends on the current algorithm Changing the predictor changes the input distribution How can we predict what will happen?

Copyright © 2014 Criteo Evaluating the model A/B test Slow Costly

Copyright © 2014 Criteo Evaluating the model A/B test Slow Costly Counterfactual analysis: “What would have happened if…?” Based around importance sampling

Copyright © 2014 Criteo Counterfactual analysis Requires randomization in real time Requires the ability to replay what happened “Would I have taken another decision?” Extensive logging: ALL campaigns

Copyright © 2014 Criteo Using side information There is never enough data With more information, we could learn finer behaviours But there are only so many sales, can we do better?

Copyright © 2014 Criteo From unsupervised to weakly supervised learning Unsupervised learning tries to learn about X Weakly supervised learning uses related tasks Long visits on the website Sales which do not follow a click Big data: unstructured targets rather than inputs

Copyright © 2014 Criteo Summary for the prediction The technical constraints shape our solution Accuracy is not the only goal Scaling is much more than the quantity of data

Copyright © 2014 Criteo Summary for the prediction The technical constraints shape our solution Accuracy is not the only goal Scaling is much more than the quantity of data Which latest developments can we incorporate and benefit from?

Copyright © 2014 Criteo Product recommendation

Copyright © 2014 Criteo Product recommendation

Copyright © 2014 Criteo Two-stage approach Stage 1: Product preselection based on: Popularity Browsing history of the user

Copyright © 2014 Criteo Two-stage approach Stage 1: Product preselection based on: Popularity Browsing history of the user Stage 2: Exact scoring using a prediction model

Copyright © 2014 Criteo Major hurdles for product recommendation Products come and go There might be a sale (Black Friday) Complementary vs. similar products

Copyright © 2014 Criteo Other challenges Multiple products in a banner Interaction between products and layout Different timeframes for different products

Copyright © 2014 Criteo A first recipe for success There are many sources of success/failure It is often suboptimal to focus on one The first step to address each source is often manual

Copyright © 2014 Criteo Conclusion Prediction is at the core of our business Huge engineering constraints Different bottlenecks than in academia Build from the ground up, not the other way around

Copyright © 2014 Criteo This is the end Thank you! Questions?