Calibration from Probabilistic Classification
Dr. Oscar Olmedo
Outline
- Why calibrate ML probabilities
- How to calibrate probabilities
  - Platt’s method
  - Isotonic regression
  - Histogram binning
What is Calibration About
- Many ML algorithms produce predicted probabilities that do not match empirical probabilities
- Learning well-calibrated models has not been researched as extensively as learning models that discriminate well (Naeini, 2016)
Why Calibrate
- Calibration is useful when the probabilities of predictions are critical
  - Reduced bias for model comparison
  - Problems with asymmetric misclassification costs
  - Examples: finance, marketing
- Calibration may not always be necessary
  - If only interested in the rank ordering of predictions
  - If only interested in an optimal split to get classes
- (Naeini, 2016)
ML algorithms and Calibration
- Known to produce well-calibrated probabilities
  - Discriminant analysis
  - Logistic regression
- Not so well-calibrated probabilities
  - Naïve Bayes
  - SVM
  - Tree methods
  - Boosting
  - Neural networks
How to calibrate
- Calibration is a post-processing task
- It should not affect the rank ordering of predictions, only the numerical probabilities
- In a nutshell
  1. Split data into train and test
  2. Train ML model
  3. Calibrate on test set (3 methods discussed later)
  4. The final model that produces probabilities is composed of the ML model and the calibration model
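The four steps above can be sketched end to end in plain Python. Everything in this sketch is an illustrative assumption rather than part of the talk: the toy data (`make_toy_data`), the deliberately overconfident base model (`train_base_model`), and the simple per-bin calibrator (`fit_calibrator`), which stands in for any of the three methods discussed later.

```python
import math
import random

def make_toy_data(n=2000, seed=0):
    """Hypothetical data: one feature x in [0, 1] with P(y = 1 | x) = x."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    return [(x, 1 if rng.random() < x else 0) for x in xs]

def train_base_model(train):
    """Toy stand-in for 'Train ML model': a steep (overconfident) sigmoid
    centred midway between the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    midpoint = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1.0 / (1.0 + math.exp(-15.0 * (x - midpoint)))

def fit_calibrator(scores, labels, n_bins=10):
    """Placeholder calibration step: fraction of positives per score bin."""
    hits = [0] * n_bins
    counts = [0] * n_bins
    for s, y in zip(scores, labels):
        b = min(int(s * n_bins), n_bins - 1)
        counts[b] += 1
        hits[b] += y

    def calibrate(s):
        b = min(int(s * n_bins), n_bins - 1)
        # Fall back to the raw score if the bin saw no held-out examples.
        return hits[b] / counts[b] if counts[b] else s

    return calibrate

data = make_toy_data()
train, test = data[:1000], data[1000:]       # 1. split (data is already random)
base_model = train_base_model(train)         # 2. train the ML model
calibrate = fit_calibrator(                  # 3. calibrate on the held-out set
    [base_model(x) for x, _ in test],
    [y for _, y in test],
)

def final_model(x):                          # 4. ML model + calibration model
    return calibrate(base_model(x))
```

Calling `final_model(0.7)` should return a probability noticeably closer to the true value of 0.7 than `base_model(0.7)` does, because the base model in this sketch is built to be overconfident.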
Platt’s method
- This method fits a sigmoid to the predicted values
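A minimal sketch of the sigmoid fit, assuming plain gradient descent on the log loss. Platt's original procedure uses a different optimizer and smoothed targets, so this is a simplification; the function names, learning rate, and the sign convention p = 1 / (1 + exp(-(a*s + b))) are choices made for this illustration.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_platt(scores, labels, lr=0.1, n_iter=2000):
    """Fit p = sigmoid(a * s + b) by gradient descent on the mean log loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y   # derivative of log loss w.r.t. (a*s + b)
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def platt_predict(s, a, b):
    """Map a raw classifier score to a calibrated probability."""
    return sigmoid(a * s + b)

# Hypothetical held-out scores (e.g. SVM margins) and their true labels.
scores = [-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
a, b = fit_platt(scores, labels)
```

Because the fitted slope `a` is positive here, the mapping is strictly increasing, so the rank ordering of the predictions is unchanged.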
Isotonic Regression
- Fits a piecewise linear function, assuming the mapping from scores to probabilities is monotonically increasing
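One common way to compute such a monotone fit is the pool-adjacent-violators (PAV) algorithm, sketched below. The sketch assumes distinct scores and returns a step function (constant between training scores); library implementations typically interpolate between the steps. All function names are illustrative.

```python
from bisect import bisect_right

def pool_adjacent_violators(values):
    """Nondecreasing least-squares fit to `values`, which must be in score order."""
    blocks = []  # each block is [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

def fit_isotonic(scores, labels):
    """Sort by score, then fit monotone probabilities to the 0/1 labels."""
    pairs = sorted(zip(scores, labels))
    xs = [s for s, _ in pairs]
    ys = pool_adjacent_violators([y for _, y in pairs])
    return xs, ys

def isotonic_predict(s, xs, ys):
    """Return the fitted value of the closest training score at or below `s`."""
    i = bisect_right(xs, s) - 1
    return ys[max(i, 0)]

# Hypothetical held-out scores and labels.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
labels = [0, 0, 1, 0, 1, 1, 0, 1]
xs, ys = fit_isotonic(scores, labels)
```

On this toy input the out-of-order labels are pooled into blocks with means 0.5 and 2/3, so `ys` never decreases as the score increases.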
Histogram binning (Naeini, 2016)
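A minimal sketch of histogram binning, assuming equal-size bins (each bin holds the same number of held-out examples) and taking the fraction of positives in a bin as its calibrated probability. The function names and the choice of bin edges are assumptions made for this illustration.

```python
from bisect import bisect_right

def fit_histogram_binning(scores, labels, n_bins=5):
    """Sort by score, cut into n_bins equal-size bins, and store for each bin
    the observed fraction of positives."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    if n < n_bins:
        raise ValueError("need at least one example per bin")
    edges, probs = [], []
    for b in range(n_bins):
        lo, hi = b * n // n_bins, (b + 1) * n // n_bins
        chunk = pairs[lo:hi]
        probs.append(sum(y for _, y in chunk) / len(chunk))
        if hi < n:
            edges.append(pairs[hi][0])  # lowest score of the next bin
    return edges, probs

def binning_predict(s, edges, probs):
    """Look up the bin that contains score `s` and return its probability."""
    return probs[bisect_right(edges, s)]

# Hypothetical held-out scores and labels.
scores = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
labels = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]
edges, probs = fit_histogram_binning(scores, labels, n_bins=5)
```

Unlike a sigmoid or an isotonic fit, the per-bin probabilities are not forced to increase with the score, so this method can reorder predictions when a bin is noisy.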
Effects of boosting (Niculescu-Mizil & Caruana, 2005)
Comparison of methods (Niculescu-Mizil & Caruana, 2005)
Platt’s method (Niculescu-Mizil & Caruana, 2005)
Isotonic Regression (Niculescu-Mizil & Caruana, 2005)
Visualizing Probabilities
- LetterRecognition dataset
  - Found in the R mlbench library
  - Predict the letter “Z”
  - 16 attributes based on pixels
- Reliability plot
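The points of a reliability plot can be computed as sketched below, assuming equal-width bins over [0, 1]: for each non-empty bin, the mean predicted probability goes on the x-axis and the observed fraction of positives on the y-axis. The function name and the toy inputs are illustrative, not the LetterRecognition results.

```python
def reliability_curve(probs, labels, n_bins=10):
    """Return (mean predicted probability, observed positive fraction, count)
    for every non-empty equal-width bin."""
    pred_sum = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        pred_sum[b] += p
        hits[b] += y
        counts[b] += 1
    return [
        (pred_sum[b] / counts[b], hits[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]

# Hypothetical predicted probabilities for the positive class and true labels.
probs = [0.1, 0.15, 0.45, 0.85, 0.9, 0.95]
labels = [0, 0, 1, 1, 0, 1]
curve = reliability_curve(probs, labels, n_bins=5)
```

For a well-calibrated model the points lie close to the diagonal, where the predicted probability equals the observed fraction.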
Applying Isotonic Regression
- After calibration
Future Work
- Research into multi-class calibration methods
- Research into non-equal-size (or dynamic) histogram binning methods
- Research into ML methods that produce well-calibrated predictions
References
- Mahdi Pakdaman Naeini. "Obtaining Accurate Probabilities Using Classifier Calibration." Diss. University of Pittsburgh, 2017.
- Alexandru Niculescu-Mizil and Rich Caruana. "Predicting good probabilities with supervised learning." Proceedings of the 22nd International Conference on Machine Learning. ACM, 2005.
- Alexandru Niculescu-Mizil and Rich Caruana. "Obtaining Calibrated Probabilities from Boosting." UAI. 2005.
Part Two: Careers in Data Science
Marketing yourself
- Networking
  - Meetups: there are a number ongoing in the DC area (Data Science DC, Spark, …)
  - Make business cards to hand out to people you meet
- Set up a LinkedIn account for an online presence
  - This is where recruiters will look
- Post your resume to online sites such as indeed.com and monster.com
- Follow up with recruiters
Tools and Expectations
- Knowledge of
  - Statistics
  - Machine learning
- Tools
  - *SQL
  - Python
  - R
  - Java
  - Scala
  - Spark, an open-source library written in Scala for distributed computing
- Online courses are a good resource
- While a student, take electives to build your bag of tools