100+ Machine Learning Models running live: The approach

Lucas Bernardi - Principal Data Scientist

Mission to empower people to experience the world
28+ million reported listings
5.6+ million are homes, apartments and other unique places to stay
141+ thousand destinations
1.5+ million room nights/day
Terabytes of data every day
200+ Machine Learning Models deployed

Machine Learning

Machine Learning: Personalization, NLP, Recommendations, Metric Learning, Ranking, Vision, Prediction

Why do we need 100s of models?

Continuous Learning.

1. Product Team has an Idea
2. Turn the idea into a Hypothesis
3. Design an Experiment
4. Build an ML Model when necessary
5. Run the Experiment
6. Learn from Results
7. Repeat

Insight. About 30% of searches done by users travelling with kids have no information about children

Hypothesis: They forget their children.

Experiment.

How do we support the demand?

RS: A Central Repository for Machine Learning Models.
Deploy: Data Scientists can easily deploy their models.
Discover: Product teams can find new and existing models and use them in their products.
Consume: Developers invoke the model through a standard call to the Repository.
Monitor: The health of the model is monitored in production.
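The four roles of the repository can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual system: the class `ModelRepository`, its method names, the model name `family_trip` and its scoring rule are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ModelEntry:
    """One deployed model plus the metadata used to discover and monitor it."""
    name: str
    description: str
    predict: Callable[[Dict[str, Any]], float]
    responses: List[float] = field(default_factory=list)  # logged outputs


class ModelRepository:
    """Hypothetical in-memory stand-in for a central model repository."""

    def __init__(self) -> None:
        self._models: Dict[str, ModelEntry] = {}

    def deploy(self, name: str, description: str,
               predict: Callable[[Dict[str, Any]], float]) -> None:
        # Deploy: register a model under a unique name.
        self._models[name] = ModelEntry(name, description, predict)

    def discover(self, keyword: str) -> List[str]:
        # Discover: find models whose description mentions the keyword.
        kw = keyword.lower()
        return sorted(n for n, m in self._models.items()
                      if kw in m.description.lower())

    def invoke(self, name: str, features: Dict[str, Any]) -> float:
        # Consume: every model is called the same way; the output is logged.
        entry = self._models[name]
        score = entry.predict(features)
        entry.responses.append(score)
        return score

    def monitor(self, name: str) -> Dict[str, float]:
        # Monitor: summarize the logged responses of a model.
        r = self._models[name].responses
        return {"count": len(r), "mean": sum(r) / len(r) if r else 0.0}


repo = ModelRepository()
repo.deploy(
    "family_trip",
    "Probability that a search is for a family trip",
    lambda f: 0.9 if f.get("children", 0) > 0 else 0.3,  # made-up rule
)
found = repo.discover("family")
score = repo.invoke("family_trip", {"children": 2})
health = repo.monitor("family_trip")
```

The point of the sketch is the single `invoke` entry point: callers do not need to know which language or library produced the model.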

Diversity Gives us Strength: Programming Languages, Libraries, Backgrounds

Decouple Training from Prediction.

Decouple Training from Prediction: Lookup Tables
Table that maps Input to Predictions
A request just requires a lookup
Fast, Scalable, Reliable
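A minimal sketch of the lookup-table idea, assuming a small discrete feature space. The features (`device`, `weekday`), the scores and the function names are invented for illustration; any offline training job could take the place of `train_offline`.

```python
from itertools import product


def train_offline():
    """Stand-in for any offline training job; returns a scoring function."""
    def score(device, weekday):
        base = {"mobile": 0.2, "desktop": 0.4}[device]  # made-up numbers
        bonus = 0.1 if weekday in ("sat", "sun") else 0.0
        return round(base + bonus, 2)
    return score


def build_lookup_table(score, devices, weekdays):
    """Precompute a prediction for every point of the feature space."""
    return {(d, w): score(d, w) for d, w in product(devices, weekdays)}


DEVICES = ("mobile", "desktop")
WEEKDAYS = ("mon", "tue", "wed", "thu", "fri", "sat", "sun")

# Offline: train with any tool, then materialize the table.
table = build_lookup_table(train_offline(), DEVICES, WEEKDAYS)


def predict(device, weekday, default=0.0):
    """Online: a request is a single dictionary lookup."""
    return table.get((device, weekday), default)
```

The table has one row per combination of feature values (14 here), so its size grows with the product of the feature cardinalities.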

Decouple Training from Prediction: Lookup Tables
Tradeoff diagram labels: Feature Space Complexity; Training Flexibility / Model Complexity

Decouple Training from Prediction: Generalized Linear Models
Prediction(X) = F(<W, T(X)>)

Learn W: Naive Bayes Classifier, Logistic Regression, Linear SVM, Linear Regression, Poisson Regression, Neg Binomial Regression, Quantile Regression, Beta Regression
Choose T: Bucketing, Substitution, Interaction
Choose F:
Identity: Continuous Regression
Sigmoid: Probabilistic Classification
Exponential: Discrete Regression
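The formula Prediction(X) = F(<W, T(X)>) means that the serving side only needs a weight vector W, a transform T and a link function F, however W was learned. Below is a minimal sketch; the feature names, bucket edges and weights are made up, and T combines a bias term, bucketing and one interaction as an example.

```python
import math


def bucketize(value, edges):
    """One-hot encode a number by bucket; len(edges) + 1 buckets."""
    idx = sum(value >= e for e in edges)
    return [1.0 if i == idx else 0.0 for i in range(len(edges) + 1)]


def T(x):
    """Hypothetical transform: bias, bucketed nights, nights x children."""
    nights = bucketize(x["nights"], edges=[2, 7])
    interaction = [n * x["has_children"] for n in nights]
    return [1.0] + nights + interaction


def identity(z):
    return z                            # continuous regression


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))   # probabilistic classification


def predict(W, x, F):
    """Prediction(X) = F(<W, T(X)>): a dot product followed by F."""
    z = sum(w * t for w, t in zip(W, T(x)))
    return F(z)


# W would come from any offline trainer; these values are invented.
W = [-1.0, 0.0, 0.5, 1.0, 0.0, 0.5, 1.0]
x = {"nights": 3, "has_children": 1}

probability = predict(W, x, sigmoid)
raw_score = predict(W, x, identity)
expected_count = predict(W, x, math.exp)  # exponential: discrete regression
```

Swapping the trainer (for example logistic regression for a linear SVM) changes only W; `predict` stays the same.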

Ranking(X) = argsort_{i ∈ I} <W_i, T(X, i)>

Learn W_i:
Softmax, One-vs-All, etc.: Multiclass Classification
Cost Sensitive Classification: Multilabel Classification
Word2Vec, GloVe: Cosine / Euclidean k-Nearest Neighbours
Matrix Factorization(s): Recommender Systems
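The ranking formula can be sketched the same way: score each item i with the inner product <W_i, T(X, i)> and sort. The example below assumes a matrix-factorization-style setup in which W_i is an item embedding and T(X, i) simply returns the user embedding. The item names, the vectors and the choice to sort in descending order are assumptions made for illustration.

```python
def dot(u, v):
    """Inner product <u, v> of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def T(x, item_id):
    """Hypothetical transform: the user embedding, whatever the item."""
    return x


def rank(item_weights, x):
    """Return item ids ordered by descending score <W_i, T(X, i)>."""
    scores = {i: dot(w, T(x, i)) for i, w in item_weights.items()}
    # Ties are broken by item id so the order is deterministic.
    return sorted(scores, key=lambda i: (-scores[i], i))


# Invented item embeddings (W_i) and user embeddings (X).
ITEM_WEIGHTS = {
    "amsterdam": [0.9, 0.1],
    "paris": [0.2, 0.8],
    "rome": [0.5, 0.5],
}

ranking_a = rank(ITEM_WEIGHTS, [1.0, 0.0])
ranking_b = rank(ITEM_WEIGHTS, [0.0, 1.0])
```

As with single predictions, serving needs only the stored vectors W_i and a dot product per item.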

Decouple Training from Prediction: Generalized Linear Models
Tradeoff diagram labels: Training Flexibility / Feature Space Complexity; Model Complexity

Decouple Training from Prediction: Beyond
Tree Based Models
Neural Networks
Your unique awesome algorithm

Decouple Training from Prediction: Beyond
Tradeoff diagram labels: Model Complexity / Feature Space Complexity; Training Flexibility

Second Challenge: Monitoring.
Missing Information: Labels are only available for a subsample of the population
Delayed Information: Labels are only available after weeks
Changing Information: Labels and Feature Space distribution are not stationary

Response Distribution Analysis.
Histogram of the Probabilities the model outputs for each presented example
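A minimal sketch of how such a histogram could be computed from logged model outputs, with no labels required. The function names, the bin count and the sample scores are illustrative assumptions.

```python
def response_histogram(probabilities, n_bins=10):
    """Count model outputs per equal-width bin over [0, 1]."""
    counts = [0] * n_bins
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
        # An output of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts


def render(counts):
    """Draw the histogram as text, one line per bin."""
    n = len(counts)
    return "\n".join(
        f"{i / n:.1f}-{(i + 1) / n:.1f} {'#' * c}"
        for i, c in enumerate(counts)
    )


# Invented outputs of a binary classifier on ten served examples.
scores = [0.05, 0.08, 0.12, 0.15, 0.5, 0.85, 0.9, 0.93, 0.97, 1.0]
hist = response_histogram(scores, n_bins=5)
print(render(hist))
```

Because only the predictions are needed, the histogram can be refreshed as soon as the model starts serving traffic.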

Response Distribution Analysis: An Omniscient Model
Is always right, and it knows it

Response Distribution Analysis: A Confused Model
Main Characteristics: Single mode, Central mode, No stable point
Potential Root Causes: High Bayes Error

Response Distribution Analysis: An Overconfident Model
Main Characteristics: Extreme mode, Single mode, High frequency mode
Potential Root Causes: Cold start, Outliers, Wrong Feature Scaling

Response Distribution Analysis: A Maybe-Good Model
Main Characteristics: Bi-modal, Smooth, Wide Support, Single stable point


Response Distribution Analysis: Too Good to be True

Response Distribution Analysis: Robots Travel for Leisure

Second Challenge: Monitoring.
Missing Information: Global Feedback. Considers all examples, even those for which labels will never be available
Delayed Information: Immediate Feedback. It can be computed as soon as the model makes predictions
Changing Information: Responsive Feedback. The histogram is very sensitive to changes

100+ Machine Learning Models Running Live: The approach
Continuous Learning, Centralized ML Repository, Principled Tradeoffs, Heuristics

Thank you! Check out our blog! booking.ai
