Published by Morgan Fowler, modified over 9 years ago
1
Large-Scale Real-Time Product Recommendation at Criteo
Simon Dollé, RecSys FR, December 1st, 2015
2
Catalog data: feed provided by the merchants
User behavior data: large-scale intent data from all visits to merchant websites (page views, basket, sales events)
Ad display data: displayed and clicked ads
3
We buy Ad spaces
4
We buy Ad spaces We sell Clicks
5
We buy Ad spaces We sell Clicks that convert
6
We buy Ad spaces We sell Clicks that convert a lot
7
We buy Ad spaces We sell Clicks that convert a lot We take the risk
8
displays
9
displays lead to 50 clicks
10
displays lead to 50 clicks, which lead to 1 sale
11
3 billion ads/day, 3 billion products
12
10 ms to pick relevant products
13
7 data centers, 15,000 servers, 1,200-node Hadoop cluster
14
Catalog data: 3B+ products
15
Catalog data: 3B+ products
Browsing history: 2B events/day
16
Catalog data: 3B+ products
Browsing history: 2B events/day
Ad display data: 20B events/day
17
How do we do it?
18
Recommend products for a user
What we want: reco(user) = products
1B users × 3B products!
But we need to scale and keep it fresh.
What we can do:
Pre-select products offline
Refine scoring online to get the final candidates
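The two-step idea on this slide (pre-select offline, refine online) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Criteo's implementation: the `recommend` function, the preselection dictionary, the scoring callback and the scores are all hypothetical. What it shows is that the online step only scores the handful of preselected candidates rather than all 3B products.

```python
from typing import Callable, Dict, List

# Hypothetical offline output: for each product, a short precomputed list of
# candidate products (refreshed periodically by a batch job).
Preselections = Dict[str, List[str]]


def recommend(
    viewed: List[str],
    preselections: Preselections,
    score: Callable[[str], float],
    k: int,
) -> List[str]:
    """Online stage: gather the preselected candidates for the products the
    user viewed, score only those, and return the k best (ties broken by name)."""
    candidates = {c for p in viewed for c in preselections.get(p, [])}
    return sorted(candidates, key=lambda c: (-score(c), c))[:k]


pre = {"orange_shoes": ["red_shoes", "orange_socks", "blue_shoes"]}
scores = {"red_shoes": 0.02, "orange_socks": 0.05, "blue_shoes": 0.01}
top = recommend(["orange_shoes"], pre, scores.__getitem__, k=2)
print(top)  # ['orange_socks', 'red_shoes']
```

Splitting the work this way is what makes the freshness/scale trade-off tractable: the expensive part runs in batch, and the online part stays small enough for a tight latency budget.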
19
Bob saw orange shoes
20
Bob saw orange shoes. Some candidate products: Historical
21
Bob saw orange shoes. Some candidate products: Historical, Most viewed
23
Bob saw orange shoes. Some candidate products: Historical, Most viewed, Similar
25
Bob saw orange shoes. Some candidate products: Historical, Most viewed, Similar, Complementary
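A minimal sketch of how Bob's candidate lists might be assembled from the four sources named on the slide. Assumptions not in the talk: products are plain string IDs, the similar and complementary lookups are precomputed dictionaries keyed on the last product the user viewed, and every name in the code (`candidate_sources`, `red_shoes`, `shoe_laces` and so on) is hypothetical.

```python
from typing import Dict, List, Optional


def candidate_sources(
    history: List[str],
    most_viewed: List[str],
    similar: Dict[str, List[str]],
    complementary: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Collect candidate products per source (hypothetical helper)."""
    last: Optional[str] = history[-1] if history else None
    return {
        # Products the user already saw, most recent first, without duplicates.
        "historical": list(dict.fromkeys(reversed(history))),
        # Popular products, independent of this user.
        "most_viewed": list(most_viewed),
        # Products that look like the last one viewed (other shoes).
        "similar": similar.get(last, []) if last else [],
        # Products that go with the last one viewed (socks, laces).
        "complementary": complementary.get(last, []) if last else [],
    }


bob = candidate_sources(
    history=["orange_shoes", "blue_jacket", "orange_shoes"],
    most_viewed=["white_sneakers", "black_boots"],
    similar={"orange_shoes": ["red_shoes", "yellow_shoes"]},
    complementary={"orange_shoes": ["orange_socks", "shoe_laces"]},
)
print(bob["historical"])  # ['orange_shoes', 'blue_jacket']
```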
26
Recommendation Service
20K qps
27
Hadoop: Browsing history (50B), Preselection computation (Map-Reduce jobs)
Recommendation Service: 20K qps
28
Hadoop: Browsing history (50B), Preselection computation (Map-Reduce jobs)
Preselections: 500M; timing label: 12h
Recommendation Service: 20K qps
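The slide only says that preselections are computed by Map-Reduce jobs over the browsing history. As an illustration, here is a local, single-process sketch of one plausible job of that kind: counting how often two products are viewed by the same user (co-views) and keeping the top-k neighbours of each product. Co-view counting as the similarity measure, the input format and all function names are assumptions; on a real Hadoop cluster the grouping of keys between the map and reduce phases is done by the framework.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, str]


def map_user(products: List[str]) -> Iterator[Tuple[Pair, int]]:
    """Map: for one user's browsing history, emit ((a, b), 1) for every
    ordered pair of distinct products that user viewed."""
    for a, b in combinations(sorted(set(products)), 2):
        yield (a, b), 1
        yield (b, a), 1


def reduce_counts(pairs: Iterable[Tuple[Pair, int]]) -> Dict[Pair, int]:
    """Reduce: sum the counts per key (Hadoop would group the keys for us)."""
    totals: Dict[Pair, int] = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def preselect(histories: Iterable[List[str]], k: int) -> Dict[str, List[str]]:
    """Keep the k most co-viewed products for each product (ties by name)."""
    counts = reduce_counts(kv for h in histories for kv in map_user(h))
    neighbours: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for (a, b), n in counts.items():
        neighbours[a].append((n, b))
    return {
        a: [b for _, b in sorted(lst, key=lambda t: (-t[0], t[1]))[:k]]
        for a, lst in neighbours.items()
    }


histories = [
    ["shoes", "socks"],
    ["shoes", "socks", "laces"],
    ["shoes", "laces"],
    ["shoes", "socks"],
]
similar_products = preselect(histories, k=2)
print(similar_products["shoes"])  # ['socks', 'laces']
```

Capping each product's list at k is what keeps the preselection table bounded, whatever the size of the catalog.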
29
Online: sources (Similarities, Most viewed, Most bought)
30
Online: merge of products (Similarities, Most viewed, Most bought)
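Merging the candidate lists can be as simple as concatenating them and dropping duplicates. The sketch below assumes a fixed quota per source and remembers which source first proposed each product (which could later serve as a model feature); the quota, the source order and the name `merge_sources` are hypothetical choices, not details from the talk.

```python
from typing import Dict, List, Tuple


def merge_sources(sources: Dict[str, List[str]], per_source: int) -> List[Tuple[str, str]]:
    """Take up to `per_source` products from each source in order, drop
    duplicates, and record which source first proposed each product."""
    seen = set()
    merged: List[Tuple[str, str]] = []
    for source_name, products in sources.items():
        for product in products[:per_source]:
            if product not in seen:
                seen.add(product)
                merged.append((product, source_name))
    return merged


candidates = merge_sources(
    {
        "similarities": ["red_shoes", "yellow_shoes"],
        "most_viewed": ["red_shoes", "black_boots"],
        "most_bought": ["orange_socks"],
    },
    per_source=2,
)
print(candidates)
```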
31
ML model: logistic regression models, because:
They scale
They are fast
They can handle lots of features:
  Product-specific: price, category
  User-specific: user segment, user's last category
  User-product interactions: time since last view, category match
  Display-specific: desktop vs. mobile
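A sketch of scoring one display with logistic regression over the four feature families listed on the slide. Assumptions not stated in the talk: categorical features are one-hot encoded with the hashing trick into a fixed-size weight vector, numeric features such as price and time since last view are bucketed into strings, and `NUM_BUCKETS`, the feature names and the weights are invented for illustration.

```python
import math
import zlib
from typing import Dict, List

NUM_BUCKETS = 2 ** 20  # hypothetical size of the hashed weight vector


def hash_feature(name: str, value: str) -> int:
    """Hashing trick: map a 'name=value' feature to a bucket index.
    zlib.crc32 is deterministic across processes, unlike the built-in hash()."""
    return zlib.crc32(f"{name}={value}".encode("utf-8")) % NUM_BUCKETS


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_click(weights: Dict[int, float], bias: float, features: Dict[str, str]) -> float:
    """Logistic regression on one-hot hashed features:
    P(click) = sigmoid(bias + sum of the weights of the active buckets)."""
    active: List[int] = [hash_feature(k, v) for k, v in features.items()]
    return sigmoid(bias + sum(weights.get(i, 0.0) for i in active))


display = {
    "price_bucket": "50-100",        # product-specific (numeric price bucketed)
    "category": "shoes",             # product-specific
    "user_segment": "s3",            # user-specific
    "category_match": "yes",         # user-product interaction
    "hours_since_last_view": "0-1",  # user-product interaction (bucketed)
    "device": "mobile",              # display-specific
}
weights = {hash_feature("device", "mobile"): 0.4, hash_feature("category_match", "yes"): 1.1}
print(predict_click(weights, bias=-4.0, features=display))
```

A score is just a sparse dot product followed by a sigmoid, which is one reason this model family fits a millisecond-scale budget.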
33
Hadoop: Browsing history (50B), Preselection computation (Map-Reduce jobs), Prediction models
Preselections: 500M; timing labels: 6h, 12h
Recommendation Service: 20K qps
34
Hadoop: Browsing history (50B), Preselection computation (Map-Reduce jobs), Prediction models
Preselections: 500M; timing labels: 6h, 12h
Recommendation Service: 20K qps
Display, Click, Sale logs → Hadoop
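The display, click and sale logs flowing back into Hadoop are presumably what the prediction models are retrained on. A minimal sketch, assuming the model is a logistic regression fitted by plain stochastic gradient descent on log loss, with each logged display turned into (hashed feature indices, clicked or not). The learning rate, L2 strength and epoch count are arbitrary, and `train_logistic_regression` is a hypothetical name; a production trainer would run distributed over far more data.

```python
import math
from typing import Dict, Iterable, List, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(
    examples: Iterable[Tuple[List[int], int]],
    epochs: int = 5,
    lr: float = 0.1,
    l2: float = 1e-4,
) -> Tuple[Dict[int, float], float]:
    """SGD on log loss. Each example is (active feature indices, label),
    with label 1 if the display was clicked and 0 otherwise."""
    data = list(examples)
    weights: Dict[int, float] = {}
    bias = 0.0
    for _ in range(epochs):
        for features, label in data:
            p = sigmoid(bias + sum(weights.get(i, 0.0) for i in features))
            g = p - label  # derivative of log loss w.r.t. the logit
            bias -= lr * g
            for i in features:
                w = weights.get(i, 0.0)
                weights[i] = w - lr * (g + l2 * w)  # L2 applied only to touched weights
    return weights, bias


# Toy log: displays with feature 1 were clicked, displays with feature 2 were not.
logs = [([1], 1), ([2], 0)] * 50
weights, bias = train_logistic_regression(logs, epochs=20)
print(sigmoid(bias + weights[1]), sigmoid(bias + weights[2]))
```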
36
Online: scoring (Similarities, Most viewed, Most bought)
[Figure: candidate products from each source, annotated with their predicted scores]
37
Online: scoring (Similarities, Most viewed, Most bought)
[Figure: candidate products from each source, annotated with their predicted scores]
38
Online: candidates
[Figure: resulting ad banner with the selected products, "SHOP" buttons and a "-50%" label]
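Once every candidate has a predicted score, filling the banner can be as simple as keeping the highest-scoring products, each scored independently of the others (the "joint product scoring" challenge listed later is about going beyond this). A sketch using `heapq.nlargest`; the function name, slot count and scores are illustrative assumptions.

```python
import heapq
from typing import Dict, List


def fill_banner(scores: Dict[str, float], slots: int) -> List[str]:
    """Return the `slots` highest-scoring products, best first.
    Products are ranked independently of each other."""
    best = heapq.nlargest(slots, scores.items(), key=lambda kv: kv[1])
    return [product for product, _ in best]


scores = {"red_shoes": 0.007, "orange_socks": 0.012, "black_boots": 0.004, "yellow_shoes": 0.009}
print(fill_banner(scores, slots=3))  # ['orange_socks', 'yellow_shoes', 'red_shoes']
```

`heapq.nlargest` avoids a full sort, which matters little for a dozen candidates but keeps the step linear in the number of candidates for a small banner size.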
39
What’s next?
40
What’s next for us: Upcoming challenges
Long(er)-term user profiles
41
What’s next for us: Upcoming challenges
Long(er)-term user profiles
More and better product information (images, semantics, NLP)
42
What’s next for us: Upcoming challenges
Long(er)-term user profiles
More and better product information (images, semantics, NLP)
Instant update of similarities
43
What’s next for us: Upcoming challenges
Long(er)-term user profiles
More and better product information (images, semantics, NLP)
Instant update of similarities
Joint product scoring (score the full banner, not each product independently)
44
What’s next for you: Fancy a try?
On your own: we published datasets for click prediction
4GB display-click data: Kaggle challenge
1TB display-click data (the industry’s largest dataset): 4 billion observations, 156 billion feature values, available on Microsoft Azure, used by edX (UC Berkeley)
With us!
46
Questions?
47
s.dolle@criteo.com @simondolle @recsysfr
Thank you! @simondolle @recsysfr
Credits: Creative Stall, Gilbert Bages