Published by Shayla Earnshaw. Modified over 9 years ago.
1
Copyright © 2014 Criteo
2014-11-20
15 million predictions per second: Criteo's challenges
Nicolas Le Roux, Scientific Program Manager, R&D
2
What does Criteo do?
- We buy advertising space on websites
- We display ads for our partners
- We get paid if the user clicks on the ad
3
Retargeting
4
In practice
1. A user lands on a webpage
2. The website contacts an ad exchange
3. The ad exchange contacts Criteo and its competitors
4. It is an auction: each bidder states how much it bids
5. The highest bidder wins the right to display an ad
5
Details of the auction
- Real-time bidding (RTB)
- Second-price auction: the winner pays the second-highest price
- Optimal strategy: bid the expected gain
- Expected gain = price per click (CPC) * probability of click (CTR)
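The bidding rule on this slide can be sketched in a few lines. This is only an illustration of a second-price auction with an expected-gain bid; the `Bid` class, the function names, the competitor names, and all amounts are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    bidder: str
    amount: float


def expected_gain_bid(cpc: float, ctr: float) -> float:
    """Bid the expected gain: price per click times probability of click."""
    return cpc * ctr


def run_second_price_auction(bids):
    """Return (winner, price): the highest bidder wins, pays the second-highest bid."""
    if len(bids) < 2:
        raise ValueError("a second-price auction needs at least two bids")
    ranked = sorted(bids, key=lambda b: b.amount, reverse=True)
    return ranked[0].bidder, ranked[1].amount


# Hypothetical numbers: a click is worth 0.50 and is predicted to happen 0.4% of the time.
our_bid = expected_gain_bid(cpc=0.50, ctr=0.004)
bids = [
    Bid("us", our_bid),
    Bid("competitor_a", 0.0015),
    Bid("competitor_b", 0.0010),
]
winner, price = run_second_price_auction(bids)
print(winner, price)
```

Because the winner pays the second-highest price rather than its own bid, bidding the true expected gain does not cost more when the bid wins, which is why the quality of the CTR prediction drives the outcome.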
6
Goal of the arbitrage
- Choose the appropriate advertiser
- Only win the profitable displays: it is a prediction problem
- An overbid means you are losing money
- An underbid means you are losing potential revenue
7
What to do once we win the display?
- We are now directly in contact with the website
- Choose the best products
- Choose the color, the font and the layout
- Generate the banner
8
Goal of the recommendation
- Increase the value of a display: it is a ranking problem
- The potential gains are huge
9
Real-time constraints at Criteo
- More than 2 billion banners displayed per day
12
Fitting within the real-time constraints
- To avoid timeouts, only simple models are allowed:
  - Logistic regression
  - Decision trees
- What freedom do we have?
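One way to see why logistic regression fits a tight latency budget: with one-hot categorical features, a prediction is a handful of table lookups, a sum, and a sigmoid. The sketch below assumes such a sparse encoding; the feature strings, weights, and bias are made up for illustration.

```python
import math


def predict_ctr(weights, active_features, bias=0.0):
    """Logistic regression over one-hot features.

    Only the weights of the features present in the event are summed,
    so the cost grows with the number of active features, not with the
    total number of parameters.
    """
    z = bias + sum(weights.get(f, 0.0) for f in active_features)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights and event.
weights = {"CurrentUrl=news.example": 0.4, "Advertiser=shop.example": -0.2}
event = ["CurrentUrl=news.example", "Advertiser=shop.example"]
p_click = predict_ctr(weights, event, bias=-5.0)
print(p_click)
```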
13
Predicting the CTR
15
A large number of parameters
20
Modeling higher-order information
- A linear model cannot represent higher-order information
- E.g. CurrentUrl = "disney.com" and Advertiser = "Guns4Life"
- We model these by creating "cross-features"
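A cross-feature can be built by treating a combination of field values as one new categorical value, which then receives its own weight in the linear model. The sketch below is an assumed construction, not Criteo's actual encoding; `add_cross_features` and the `&` separator are hypothetical.

```python
from itertools import combinations


def add_cross_features(event: dict, order: int = 2) -> dict:
    """Return a copy of `event` with one extra feature per combination of `order` fields."""
    crossed = dict(event)
    for names in combinations(sorted(event), order):
        key = "&".join(names)
        crossed[key] = "&".join(str(event[name]) for name in names)
    return crossed


event = {"CurrentUrl": "disney.com", "Advertiser": "Guns4Life"}
features = add_cross_features(event)
print(features)
```

The linear model can now learn a specific (here, presumably very negative) weight for the pair `Guns4Life&disney.com`, independently of the weights of the two original features.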
21
Modeling higher-order information
22
Hashing
- This model has estimation and computational issues
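The content of the hashing slides did not survive in this transcript, so the following is only a sketch of the standard hashing trick, not necessarily the scheme presented: every (feature, value) pair is mapped to an index in a fixed-size weight vector, so the number of parameters is bounded regardless of how many distinct values appear. `NUM_BUCKETS` and the choice of MD5 are assumptions.

```python
import hashlib

NUM_BUCKETS = 2 ** 24  # assumed size of the weight vector


def bucket(feature_name: str, value: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a (feature, value) pair to a stable index in [0, num_buckets)."""
    digest = hashlib.md5(f"{feature_name}={value}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % num_buckets


def hashed_indices(event: dict, num_buckets: int = NUM_BUCKETS) -> list:
    """Indices of the weights that are active for this event (collisions are merged)."""
    return sorted({bucket(k, str(v), num_buckets) for k, v in event.items()})


event = {"CurrentUrl": "disney.com", "Advertiser": "Guns4Life"}
indices = hashed_indices(event)
print(indices)
```

A stable hash is used instead of Python's built-in `hash`, which is salted per process for strings and would map the same feature to different indices between training and serving. Two different values may land in the same bucket; the smaller the table, the more often such collisions occur.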
27
Dealing with collisions
32
The cost of adding variables
"Hey, I thought of this great variable: time since last product view. Can we add it to the model?"
- Storage: #Banners/day x #Days x 4 = 480GB
- RAM: #Users x #Campaigns x 4 = 40GB
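The two totals can be back-solved for the quantities the slide leaves implicit. The sketch below assumes that the factor 4 is a 4-byte value, that GB means 10^9 bytes, and that the daily volume is the "more than 2 billion banners" quoted earlier in the deck; the resulting number of days and of user-campaign pairs are implied by those assumptions, not stated in the talk.

```python
BYTES_PER_VALUE = 4              # assumption: the "x 4" is the size of one stored value
BANNERS_PER_DAY = 2_000_000_000  # "more than 2 billion banners displayed per day"
STORAGE_BYTES = 480 * 10**9      # 480GB from the slide, decimal gigabytes assumed
RAM_BYTES = 40 * 10**9           # 40GB from the slide, decimal gigabytes assumed

# Storage: #Banners/day x #Days x 4 = 480GB  ->  solve for #Days
implied_days = STORAGE_BYTES // (BANNERS_PER_DAY * BYTES_PER_VALUE)

# RAM: #Users x #Campaigns x 4 = 40GB  ->  solve for #Users x #Campaigns
implied_user_campaign_pairs = RAM_BYTES // BYTES_PER_VALUE

print(implied_days, implied_user_campaign_pairs)
```

Under these assumptions, one extra variable costs about two months of logged history on disk and one value per user-campaign pair in memory.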
35
Feature selection
- About 40 original features
- 780 level-2 cross-features
- 9880 level-3 cross-features
- Each feature contains many modalities
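The counts above are consistent with taking every unordered pair and triple of exactly 40 features, which can be checked directly (the slide says "about 40", so the exact figure is an assumption):

```python
from math import comb

n_features = 40  # "about 40 original features", taken as exactly 40 here

level2 = comb(n_features, 2)  # unordered pairs of features
level3 = comb(n_features, 3)  # unordered triples of features

print(level2, level3)
```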
37
Feature selection at scale
- Greedy methods score the variables independently
- Group sparsity methods score them jointly but are biased (think L1)
- Let us use greedy methods to provide a good initial set of features
- Refine with group sparsity
38
Learning the parameters
- n = 10^9, p = 10^7
- Theory tells us that stochastic gradient methods should be used
39
Learning the parameters: the actual situation
- Tens of models are trained several times a day
- Warm starts favor batch methods
- Not all points are equal
- Stochastic methods are harder to parallelize
40
Criteo's optimizer
43
Dealing with spurious clicks
- Some clicks do not bring sales
- Some clicks are outright fake
- We can build a fraud detection system
- What is the real issue here?
46
Dealing with spurious clicks
- The real goal is to only buy clicks which bring sales
- We have labeled data
- Let's build a sale prediction model
47
Are we done yet?
- We know how to build a good model
- We know how to train it
- We are predicting a meaningful target
50
Simpson’s paradox
CTR              Overall          Low-value users  High-value users
First position   60/9000 (0.67%)  48/8000 (0.6%)   12/1000 (1.2%)
Second position  50/7000 (0.71%)  2/1000 (0.2%)    48/6000 (0.8%)
Because of the underprediction, we might not detect our error.
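The reversal in the table can be reproduced from the raw counts. The sketch below uses the clicks and displays from the slide; only the variable and function names are introduced here.

```python
# (clicks, displays) per position and user segment, copied from the table.
data = {
    "First position": {"Low-value users": (48, 8000), "High-value users": (12, 1000)},
    "Second position": {"Low-value users": (2, 1000), "High-value users": (48, 6000)},
}


def ctr(clicks: int, displays: int) -> float:
    return clicks / displays


def overall(segments: dict) -> float:
    """CTR after pooling all segments of one position."""
    clicks = sum(c for c, _ in segments.values())
    displays = sum(d for _, d in segments.values())
    return ctr(clicks, displays)


overall_ctr = {pos: overall(seg) for pos, seg in data.items()}
segment_ctr = {
    pos: {name: ctr(*counts) for name, counts in seg.items()}
    for pos, seg in data.items()
}

for pos in data:
    print(pos, f"{overall_ctr[pos]:.2%}", {k: f"{v:.2%}" for k, v in segment_ctr[pos].items()})
```

The first position wins within each user segment, yet loses once the segments are pooled, because the second position was shown mostly to high-value users.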
51
Offline model evaluation
- The data collection process depends on the current algorithm
- Changing the predictor changes the input distribution
- How can we predict what will happen?
53
Evaluating the model
- A/B test:
  - Slow
  - Costly
- Counterfactual analysis: “What would have happened if…?”
  - Based around importance sampling
54
Counterfactual analysis
- Requires randomization in real time
- Requires the ability to replay what happened
- “Would I have made a different decision?”
- Extensive logging: ALL campaigns
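A minimal importance-sampling estimator shows why randomization and logging are both required: each logged reward is reweighted by how much more (or less) likely the new policy is to take the logged action than the policy that actually ran. This is the textbook inverse-propensity estimator, offered as an assumed illustration rather than Criteo's implementation; the log format and the policies are hypothetical.

```python
def ips_estimate(logs, new_policy_prob):
    """Estimate the average reward of a new policy from logged, randomized decisions.

    logs: iterable of (context, action, logging_prob, reward), where
          logging_prob is the probability with which the logging policy
          chose `action` (it must have been recorded at decision time).
    new_policy_prob: function (context, action) -> probability that the
          new policy would have chosen `action` in `context`.
    """
    total, count = 0.0, 0
    for context, action, logging_prob, reward in logs:
        if logging_prob <= 0.0:
            raise ValueError("logged actions must have non-zero probability")
        total += reward * new_policy_prob(context, action) / logging_prob
        count += 1
    if count == 0:
        raise ValueError("no logged events")
    return total / count


# Hypothetical logs: the production system picked layout A or B uniformly at random.
logs = [
    ("user1", "A", 0.5, 1.0),
    ("user2", "B", 0.5, 0.0),
    ("user3", "A", 0.5, 1.0),
    ("user4", "B", 0.5, 1.0),
]


def always_a(context, action):
    return 1.0 if action == "A" else 0.0


def uniform(context, action):
    return 0.5


estimate_always_a = ips_estimate(logs, always_a)
estimate_logging = ips_estimate(logs, uniform)
print(estimate_always_a, estimate_logging)
```

Without randomization some actions would have zero logging probability and could never be evaluated; without logging the probabilities, the weights could not be computed.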
55
Using side information
- There is never enough data
- With more information, we could learn finer behaviours
- But there are only so many sales. Can we do better?
56
From unsupervised to weakly supervised learning
- Unsupervised learning tries to learn about X
- Weakly supervised learning uses related tasks:
  - Long visits on the website
  - Sales which do not follow a click
- Big data: unstructured targets rather than inputs
58
Summary for the prediction
- The technical constraints shape our solution
- Accuracy is not the only goal
- Scaling is much more than the quantity of data
- Which latest developments can we incorporate and benefit from?
59
Product recommendation
62
Two-stage approach
- Stage 1: product preselection based on:
  - Popularity
  - Browsing history of the user
- Stage 2: exact scoring using a prediction model
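The two stages can be sketched as a cheap candidate filter followed by a more expensive scorer that only sees the survivors. The function names, the candidate budget `k`, the ordering of history before popularity, and the toy scores are all assumptions made for illustration.

```python
def preselect(catalog, popularity, history, k):
    """Stage 1: keep at most k candidates, browsing history first, then most popular."""
    by_popularity = sorted(catalog, key=lambda p: popularity.get(p, 0), reverse=True)
    candidates = []
    for product in list(history) + by_popularity:
        if len(candidates) >= k:
            break
        if product in catalog and product not in candidates:
            candidates.append(product)
    return candidates


def recommend(catalog, popularity, history, score, k=3, n=2):
    """Stage 2: score only the preselected candidates and keep the n best."""
    candidates = preselect(catalog, popularity, history, k)
    return sorted(candidates, key=score, reverse=True)[:n]


# Hypothetical catalog, popularity counts, and model scores.
catalog = {"shoes", "hat", "bag", "scarf", "belt"}
popularity = {"shoes": 100, "hat": 80, "bag": 60, "scarf": 40, "belt": 20}
history = ["belt"]
model_scores = {"shoes": 0.01, "hat": 0.02, "bag": 0.05, "scarf": 0.01, "belt": 0.03}

candidates = preselect(catalog, popularity, history, k=3)
banner = recommend(catalog, popularity, history, score=model_scores.get, k=3, n=2)
print(candidates, banner)
```

Note that "bag" has the highest model score but never reaches stage 2 in this example: the quality of the preselection bounds what the exact scorer can achieve.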
63
Major hurdles for product recommendation
- Products come and go
- There might be a sale (Black Friday)
- Complementary vs. similar products
64
Other challenges
- Multiple products in a banner
- Interaction between products and layout
- Different timeframes for different products
65
A first recipe for success
- There are many sources of success/failure
- It is often suboptimal to focus on one
- The first step to address each source is often manual
66
Conclusion
- Prediction is at the core of our business
- Huge engineering constraints
- Different bottlenecks than in academia
- Build from the ground up, not the other way around
67
This is the end
- Thank you!
- Questions?