Copyright © 2014 Criteo millions de prédictions par seconde Les défis de Criteo Nicolas Le Roux Scientific Program Manager - R&D
Copyright © 2014 Criteo What does Criteo do? We buy advertising spaces on websites We display ads for our partners We get paid if the user clicks on the ad
Copyright © 2014 Criteo Retargeting
Copyright © 2014 Criteo In practice 1. A user lands on a webpage 2. The website contacts an ad-exchange 3. The ad-exchange contacts Criteo and its competitors 4. It is an auction: each competitor tells how much it bids 5. The highest bidder wins the right to display an ad
Copyright © 2014 Criteo Details of the auction Real-time bidding (RTB) Second-price auction: the winner pays the second highest price Optimal strategy: bid the expected gain Expected gain = price per click (CPC) * probability of click (CTR)
Copyright © 2014 Criteo Goal of the arbitrage Choose the appropriate advertiser Only win the profitable displays: it is a prediction problem An overbid means you are losing money An underbid means you are losing potential revenue
Copyright © 2014 Criteo What to do once we win the display? We are now directly in contact with the website Choose the best products Choose the color, the font and the layout Generate the banner
Copyright © 2014 Criteo Goal of the recommendation Increase the value of a display: it is a ranking problem The potential gains are huge
Copyright © 2014 Criteo Real-time constraints at Criteo More than 2 billion banners displayed per day
Copyright © 2014 Criteo Real-time constraints at Criteo More than 2 billion banners displayed per day
Copyright © 2014 Criteo Real-time constraints at Criteo More than 2 billion banners displayed per day
Copyright © 2014 Criteo Fitting within the real-time constraints To avoid timeouts, only simple models are allowed Logistic regression Decision trees What freedom do we have?
Copyright © 2014 Criteo Predicting the CTR
Copyright © 2014 Criteo Predicting the CTR
Copyright © 2014 Criteo A large number of parameters
Copyright © 2014 Criteo A large number of parameters
Copyright © 2014 Criteo A large number of parameters
Copyright © 2014 Criteo Modeling higher-order information A linear model cannot represent higher order information
Copyright © 2014 Criteo Modeling higher-order information A linear model cannot represent higher order information E.g. CurrentUrl = "disney.com" and Advertiser = "Guns4Life"
Copyright © 2014 Criteo Modeling higher-order information A linear model cannot represent higher order information E.g. CurrentUrl = "disney.com" and Advertiser = "Guns4Life" We model these by creating "cross-features"
Copyright © 2014 Criteo Modeling higher-order information
Copyright © 2014 Criteo Hashing This model has estimation and computational issues
Copyright © 2014 Criteo Hashing
Copyright © 2014 Criteo Hashing
Copyright © 2014 Criteo Hashing
Copyright © 2014 Criteo Hashing
Copyright © 2014 Criteo Dealing with collisions
Copyright © 2014 Criteo Dealing with collisions
Copyright © 2014 Criteo Dealing with collisions
Copyright © 2014 Criteo The cost of adding variables « Hey, I thought of this great variable: Time since last product view. Can we add it to the model? »
Copyright © 2014 Criteo The cost of adding variables « Hey, I thought of this great variable: Time since last product view. Can we add it to the model? » Storage: #Banners/day x #Days x 4 = 480GB
Copyright © 2014 Criteo The cost of adding variables « Hey, I thought of this great variable: Time since last product view. Can we add it to the model? » Storage: #Banners/day x #Days x 4 = 480GB RAM: #Users x #Campaigns x 4 = 40GB
Copyright © 2014 Criteo Feature selection About 40 original features
Copyright © 2014 Criteo Feature selection About 40 original features 780 level-2 cross-features 9880 level-3 cross-features
Copyright © 2014 Criteo Feature selection About 40 original features 780 level-2 cross-features 9880 level-3 cross-features Each feature contains many modalities
Copyright © 2014 Criteo Feature selection at scale Greedy methods score the variables independently Group sparsity methods score them jointly but are biased (think L1)
Copyright © 2014 Criteo Feature selection at scale Greedy methods score the variables independently Group sparsity methods score them jointly but are biased (think L1) Let us use greedy methods to provide a good initial set of features Refine with group sparsity
Copyright © 2014 Criteo Learning the parameters n = 10^9, p = 10^7 Theory tells us that stochastic gradient methods should be used
Copyright © 2014 Criteo Learning the parameters - The actual situation Tens of models are trained several times a day Warm starts favor batch methods Not all points are equal Stochastic methods are harder to parallelize.
Copyright © 2014 Criteo Criteo's optimizer
Copyright © 2014 Criteo Criteo's optimizer
Copyright © 2014 Criteo Dealing with spurious clicks Some clicks do not bring sales Some clicks are pure fake We can build a fraud detection system
Copyright © 2014 Criteo Dealing with spurious clicks Some clicks do not bring sales Some clicks are pure fake We can build a fraud detection system What is the real issue here?
Copyright © 2014 Criteo Dealing with spurious clicks The real goal is to only buy clicks which bring sales
Copyright © 2014 Criteo Dealing with spurious clicks The real goal is to only buy clicks which bring sales We have labeled data
Copyright © 2014 Criteo Dealing with spurious clicks The real goal is to only buy clicks which bring sales We have labeled data Let's build a sale prediction model.
Copyright © 2014 Criteo Are we done yet? We know how to build a good model We know how to train it We are predicting a meaningful target
Copyright © 2014 Criteo Simpson’s paradox CTROverall First position60/9000 (0.67%) Second position50/7000 (0.71%)
Copyright © 2014 Criteo Simpson’s paradox CTROverallLow-value usersHigh-value users First position60/9000 (0.67%)48/8000 (0.6%)12/1000 (1.2%) Second position50/7000 (0.71%)2/1000 (0.2%)48/6000 (0.8%)
Copyright © 2014 Criteo Simpson’s paradox CTROverallLow-value usersHigh-value users First position60/9000 (0.67%)48/8000 (0.6%)12/1000 (1.2%) Second position50/7000 (0.71%)2/1000 (0.2%)48/6000 (0.8%) Because of the underprediction, we might not detect our error.
Copyright © 2014 Criteo Offline model evaluation The data collection process depends on the current algorithm Changing the predictor changes the input distribution How can we predict what will happen?
Copyright © 2014 Criteo Evaluating the model A/B test Slow Costly
Copyright © 2014 Criteo Evaluating the model A/B test Slow Costly Counterfactual analysis: “What would have happened if…?” Based around importance sampling
Copyright © 2014 Criteo Counterfactual analysis Requires randomization in real time Requires the ability to replay what happened “Would I have taken another decision?” Extensive logging: ALL campaigns
Copyright © 2014 Criteo Using side information There is never enough data With more information, we could learn finer behaviours But there are only so many sales, can we do better?
Copyright © 2014 Criteo From unsupervised to weakly supervised learning Unsupervised learning tries to learn about X Weakly supervised learning uses related tasks Long visits on the website Sales which do not follow a click Big data: unstructured targets rather than inputs
Copyright © 2014 Criteo Summary for the prediction The technical constraints shape our solution Accuracy is not the only goal Scaling is much more than the quantity of data
Copyright © 2014 Criteo Summary for the prediction The technical constraints shape our solution Accuracy is not the only goal Scaling is much more than the quantity of data Which latest developments can we incorporate and benefit from?
Copyright © 2014 Criteo Product recommendation
Copyright © 2014 Criteo Product recommendation
Copyright © 2014 Criteo Two-stage approach Stage 1: Product preselection based on: Popularity Browsing history of the user
Copyright © 2014 Criteo Two-stage approach Stage 1: Product preselection based on: Popularity Browsing history of the user Stage 2: Exact scoring using a prediction model
Copyright © 2014 Criteo Major hurdles for product recommendation Products come and go There might be a sale (Black Friday) Complementary vs. similar products
Copyright © 2014 Criteo Other challenges Multiple products in a banner Interaction between products and layout Different timeframes for different products
Copyright © 2014 Criteo A first recipe for success There are many sources of success/failure It is often suboptimal to focus on one The first step to address each source is often manual
Copyright © 2014 Criteo Conclusion Prediction is at the core of our business Huge engineering constraints Different bottlenecks than in academia Build from the ground up, not the other way around
Copyright © 2014 Criteo This is the end Thank you! Questions?