Calibration from Probabilistic Classification

Dr. Oscar Olmedo

Outline
  Why calibrate ML probabilities
  How to calibrate probabilities
    Platt’s method
    Isotonic Regression
    Histogram binning

What is Calibration About
Many ML algorithms produce predicted probabilities that do not match empirical probabilities
Learning well-calibrated models has not been researched as extensively as models that discriminate well
Naeini, 2016

Why Calibrate
Calibration is useful when probabilities of predictions are critical
  Reduced bias for model comparison
  People with asymmetric misclassification costs
  Examples: finance, marketing
Calibration may not always be necessary
  If only interested in rank ordering of predictions
  If only interested in an optimal split to get classes
Naeini, 2016

ML algorithms and Calibration
Known to produce well-calibrated probabilities
  Discriminant analysis
  Logistic regression
Not so well-calibrated probabilities
  Naïve Bayes
  SVM
  Tree methods
  Boosting
  Neural networks

How to calibrate
Calibration is a post-processing task
  Should not affect the rank of predictions, only the numerical probability
In a nutshell
  Split data into train and test
  Train ML model
  Calibrate on test set (3 methods discussed later)
  Final model to get probabilities is composed of the ML model and the calibration model
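The steps above can be sketched in plain Python. This is only an illustration under stated assumptions: `build_calibrated_model`, `toy_train`, and `toy_fit_calibrator` are hypothetical names, the "ML model" is a stand-in that returns its input as the score, and the calibrator is a simple increasing map rather than one of the three methods from the talk.

```python
import random

def build_calibrated_model(data, train_model, fit_calibrator,
                           test_fraction=0.3, seed=0):
    """Split, train, calibrate on the held-out part, return the composed model."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    held_out, train = rows[:n_test], rows[n_test:]

    model = train_model(train)                    # returns x -> raw score
    scores = [model(x) for x, _ in held_out]
    labels = [y for _, y in held_out]
    calibrator = fit_calibrator(scores, labels)   # returns score -> probability

    # Final model = calibration model applied on top of the ML model.
    return lambda x: calibrator(model(x))

# Hypothetical stand-ins, for illustration only.
def toy_train(train_rows):
    return lambda x: x                            # score is the input itself

def toy_fit_calibrator(scores, labels):
    base_rate = sum(labels) / len(labels)
    # Strictly increasing map, so the rank order of predictions is preserved.
    return lambda p: 0.5 * p + 0.5 * base_rate

data = [(x / 10.0, int(x >= 5)) for x in range(10)]
final_model = build_calibrated_model(data, toy_train, toy_fit_calibrator)
inputs = [0.9, 0.1, 0.5]
calibrated = [final_model(x) for x in inputs]
```

Because the calibrator is monotone, sorting `inputs` by `final_model` gives the same order as sorting by the raw score; only the numerical values change.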

Platt’s method
This method fits a sigmoid to the predicted values
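A minimal sketch of the sigmoid fit, assuming the form p = 1 / (1 + exp(-(a*s + b))) for a raw score s and fitting a and b by plain gradient descent on log loss. The names `fit_platt` and `platt_predict`, the learning rate, the iteration count, and the toy data are all assumptions of this sketch; Platt's original procedure also smooths the 0/1 targets and uses a more robust optimizer, both omitted here.

```python
import math

def fit_platt(scores, labels, lr=0.1, n_iter=5000):
    """Fit slope a and intercept b of a sigmoid by gradient descent on log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s     # d(log loss)/da for this example
            grad_b += (p - y)         # d(log loss)/db for this example
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def platt_predict(score, a, b):
    """Map a raw score to a calibrated probability."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))

# Toy held-out scores and labels (not separable, so the fit stays finite).
scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0, -0.2, 0.3]
labels = [0, 0, 1, 0, 1, 1, 0, 1]
a, b = fit_platt(scores, labels)
```

With a positive slope the map is strictly increasing, so the ranking of predictions is unchanged.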

Isotonic Regression
Piecewise linear function, assuming a monotonically increasing function
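One common way to fit such a monotone function is the pool-adjacent-violators (PAV) algorithm, sketched below. The function names and toy data are assumptions of this sketch. PAV itself yields a step function; interpolating between the fitted points gives a piecewise linear one, and this sketch keeps the step form for brevity.

```python
import bisect

def fit_isotonic(scores, labels):
    """Pool-adjacent-violators: return (sorted scores, non-decreasing fitted values)."""
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block is [sum of labels, count]; its mean is the fitted value
    for _, y in pairs:
        blocks.append([float(y), 1])
        # Merge while the previous block's mean exceeds the newest block's mean.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    xs = [s for s, _ in pairs]
    return xs, fitted

def isotonic_predict(score, xs, fitted):
    """Step-function lookup: value of the last training score <= score."""
    i = bisect.bisect_right(xs, score) - 1
    i = max(0, min(i, len(xs) - 1))
    return fitted[i]

# Toy held-out scores and labels.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
labels = [0, 1, 0, 0, 1, 1]
xs, fitted = fit_isotonic(scores, labels)
```

Here the labels at scores 0.2, 0.3, and 0.4 violate monotonicity, so they are pooled into one block with value 1/3.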

Histogram binning Naeini, 2016
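A minimal sketch of histogram binning, assuming scores already lie in [0, 1] and using equal-width bins; each bin's calibrated probability is the fraction of positives that fall into it. The function names, the bin count, and the midpoint fallback for empty bins are assumptions of this sketch, and equal-frequency bins are an equally common choice.

```python
def fit_histogram_binning(scores, labels, n_bins=10):
    """Return one calibrated probability per equal-width bin on [0, 1]."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for s, y in zip(scores, labels):
        i = min(int(s * n_bins), n_bins - 1)   # a score of 1.0 goes in the last bin
        sums[i] += y
        counts[i] += 1
    # Empty bins fall back to the bin midpoint (an assumption of this sketch).
    return [sums[i] / counts[i] if counts[i] else (i + 0.5) / n_bins
            for i in range(n_bins)]

def histogram_predict(score, bin_values):
    """Look up the calibrated probability of the bin containing the score."""
    n_bins = len(bin_values)
    return bin_values[min(int(score * n_bins), n_bins - 1)]

# Toy held-out scores and labels, with two bins to keep the arithmetic visible.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1, 0, 1]
bin_values = fit_histogram_binning(scores, labels, n_bins=2)
```

In this toy data one of four low scores is positive and three of four high scores are positive, so the two bins map to 0.25 and 0.75.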

Effects of boosting Niculescu-Mizil & Caruana 2005

Comparison of methods Niculescu-Mizil & Caruana 2005

Platt’s method Niculescu-Mizil & Caruana 2005

Isotonic Regression Niculescu-Mizil & Caruana 2005

Visualizing Probabilities
LetterRecognition dataset
  In R, found in the mlbench library
  Predict the letter “Z”
  16 attributes based on pixels
Reliability Plot
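A reliability plot charts, for each bin of predicted probability, the mean prediction against the observed fraction of positives; a well-calibrated model lies near the diagonal. The points for such a plot could be computed as below. This is a sketch with assumed names and toy numbers, not the LetterRecognition results from the talk.

```python
def reliability_curve(probs, labels, n_bins=10):
    """Return (mean predicted probability, observed positive fraction, count) per non-empty bin."""
    bins = [[0.0, 0.0, 0] for _ in range(n_bins)]  # sum of probs, sum of labels, count
    for p, y in zip(probs, labels):
        i = min(int(p * n_bins), n_bins - 1)
        bins[i][0] += p
        bins[i][1] += y
        bins[i][2] += 1
    return [(sp / n, sy / n, n) for sp, sy, n in bins if n > 0]

# Toy predictions and true labels.
probs = [0.1, 0.2, 0.8, 0.9]
labels = [0, 1, 1, 1]
curve = reliability_curve(probs, labels, n_bins=2)
```

Plotting the first element of each tuple on the x-axis and the second on the y-axis gives the reliability plot; points above the diagonal mean the model underestimates the probability in that bin.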

Applying Isotonic Regression
After calibration

Future Work
  Research into multi-class calibration methods
  Research into non-equal-size (or dynamic) histogram binning methods
  Research into ML methods that produce well-calibrated predictions

References
  Mahdi Pakdaman Naeini. Obtaining Accurate Probabilities Using Classifier Calibration. Diss. University of Pittsburgh, 2017.
  Alexandru Niculescu-Mizil and Rich Caruana. "Predicting good probabilities with supervised learning." Proceedings of the 22nd International Conference on Machine Learning. ACM, 2005.
  Alexandru Niculescu-Mizil and Rich Caruana. "Obtaining Calibrated Probabilities from Boosting." UAI

Part Two: Careers in Data Science

Marketing yourself
Networking
  Meetups: there are a number ongoing in the DC area (Data Science DC, Spark, …)
  Make business cards to hand out to people you meet
Set up a LinkedIn account for an online presence
  This is where recruiters will look
Post your resume to online sites such as indeed.com and monster.com
Follow up with recruiters

Tools and Expectations
Knowledge of
  Statistics
  Machine learning
Tools
  *SQL
  Python
  R
  Java
  Scala
  Spark, an open source library written in Scala for distributed computing
Online courses are a good resource
While a student, take electives to build your bag of tools
