1
Using decision trees and their ensembles for analysis of NIR spectroscopic data
WSC-11, Saint Petersburg, 2018. In the light of the morning session on superresolution
2
Outline
3
Outline: Why decision trees? · What decision trees are? · Decision trees ensembles · Cases (Tecator, Olives) · Conclusions
4
Why decision trees? Why not?
5
But why decision trees? Kaggle CEO and Founder Anthony Goldbloom:
“…in the history of Kaggle competitions, there are only two Machine Learning approaches that win competitions: Handcrafted and Neural Networks.” “…It used to be random forest that was the big winner, but over the last six months a new algorithm called XGBoost has cropped up, and it’s winning practically every competition in the structured data category.”
6
Why NIR spectroscopic data?
When can a linear regression be better than decision-tree methods? When the relationship between X and y is fully linear; when there is a very large number of features with a low S/N ratio; when covariate shift is likely.
7
Outline: Why decision trees? · What decision trees are? · Decision trees ensembles · Cases (Tecator, Olives) · Conclusions
8
What decision trees are?
(Figure: a toy decision tree with the splits "Drinks beer?", "Knows statistics?" and "Steals ideas from statisticians?" leading to the leaves "Chemometrician" / "Not chemometrician".)
9
Decision trees for numeric variables
10
Decision trees for numeric variables
Where are the other variables? At every split the best variable is used; the number of splits (tree depth) is limited; the efficiency of a split is the reduction of misclassification error.
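To make the split criterion concrete, here is a minimal sketch (not the presenter's code) of the exhaustive search behind a single split: every variable and every observed value are tried as a threshold, and the "efficiency" is the reduction of misclassification error. It assumes integer-coded class labels.

```python
import numpy as np

def misclassification(y):
    """Error of a node that predicts its majority class (y: integer class labels)."""
    return 0.0 if len(y) == 0 else 1.0 - np.bincount(y).max() / len(y)

def best_split(X, y):
    """Find the (variable, threshold) pair with the largest error reduction."""
    parent, n = misclassification(y), len(y)
    best = (None, None, 0.0)                      # (variable index, threshold, gain)
    for j in range(X.shape[1]):                   # every variable is a candidate...
        for t in np.unique(X[:, j]):              # ...and every observed value a threshold
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            child = (len(left) * misclassification(left) +
                     len(right) * misclassification(right)) / n
            if parent - child > best[2]:          # "efficiency" = error reduction
                best = (j, t, parent - child)
    return best
```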
11
Decision trees for numeric variables
How many splits? Limit the minimum number of objects in each bucket; limit the maximum tree size (depth / number of splits); or grow a big tree and prune all inefficient splits.
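The three strategies on this slide have direct counterparts in scikit-learn's tree parameters; the values below are illustrative, not settings from the talk.

```python
from sklearn.tree import DecisionTreeClassifier

small_buckets = DecisionTreeClassifier(min_samples_leaf=5)   # limit objects per bucket
limited_depth = DecisionTreeClassifier(max_depth=3)          # limit the tree size directly
pruned_tree   = DecisionTreeClassifier(ccp_alpha=0.01)       # grow big, then cost-complexity pruning
```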
16
Decision trees for numeric variables
How many splits? Limit the minimum number of objects in each bucket; limit the maximum tree size (depth / number of splits); or grow a big tree and prune all inefficient splits. Use cross-validation to calculate the errors. (Figure: the successive splits reduce the error by 50%, 44% and 2%, down to 4%.)
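One way to follow the cross-validation advice is to tune the pruning strength by CV error; a sketch on placeholder data (the real calibration set would replace make_classification):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=20, random_state=0)  # placeholder data

alphas = np.logspace(-4, -1, 20)                  # candidate pruning strengths
cv_error = [1 - cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0),
                                X, y, cv=10).mean() for a in alphas]
best_alpha = alphas[int(np.argmin(cv_error))]     # alpha with the lowest CV misclassification
final_tree = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X, y)
```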
17
Decision trees regression
Variable importance: is calculated for each variable individually; takes into account the role of a variable in different splits; is accumulated across all splits and normalized.
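scikit-learn's impurity-based importances follow the same recipe (accumulated over all splits where a variable is used, then normalized to sum to one); a sketch on placeholder regression data:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=50, n_informative=5, random_state=0)  # placeholder

tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
importance = tree.feature_importances_        # accumulated across splits, sums to 1
top10 = np.argsort(importance)[::-1][:10]     # indices of the ten most important variables
```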
18
Decision trees regression
The response variable is split into several bins; splits are chosen to minimize the variance in each node.
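For regression the same machinery applies with a variance (squared-error) split criterion; a minimal sketch, again on placeholder data:

```python
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=50, noise=5.0, random_state=0)  # placeholder data

# The default split criterion minimizes the squared error (variance) within the child
# nodes; each terminal node ("bin") predicts the mean response of its objects.
reg = DecisionTreeRegressor(min_samples_leaf=10, random_state=0).fit(X, y)
y_hat = reg.predict(X)
```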
19
Outline: Why decision trees? · What decision trees are? · Decision trees ensembles · Cases (Tecator, Olives) · Conclusions
20
Decision trees ensembles
Ensemble learning: combine several models together. A group of weak learners can perform better together; ensembles decrease variance and make predictions more stable and reliable.
21
Decision trees ensembles
Bagging: create N random subsets (sampling with replacement); train a model on every subset (in parallel); use a simple average for prediction. Example: random forest. Boosting: train a model on a random subset; make N better models by using new subsets, drawn randomly with replacement (sequential); use a weighted average for prediction. Example: gradient boosting.
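Both recipes are available off the shelf; a sketch contrasting them on placeholder data (random forest for bagging, gradient boosting for boosting):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

X, y = make_regression(n_samples=300, n_features=50, noise=5.0, random_state=0)  # placeholder data

# Bagging: N trees fitted independently on bootstrap subsets, predictions averaged
rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)

# Boosting: trees fitted sequentially, each one correcting the current ensemble,
# combined with weights (the learning rate)
gb = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05, random_state=0).fit(X, y)
```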
22
Outline: Why decision trees? · What decision trees are? · Decision trees ensembles · Cases (Tecator, Olives) · Conclusions
23
Prediction of fat content in chopped meat samples by NIR spectra
Tecator: 100 predictors (NIR spectra from a Tecator Infratec Food and Feed Analyzer, 850–1050 nm); 215 measurements (172 for calibration and 43 for test).
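A possible way to reproduce the single-tree model on this data; the file name and column layout are assumptions, not details given in the talk:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical file: 100 absorbance columns followed by the fat content, one row per sample
data = np.loadtxt("tecator.csv", delimiter=",")
X, y = data[:, :100], data[:, 100]
X_cal, y_cal = X[:172], y[:172]               # 172 calibration samples
X_test, y_test = X[172:], y[172:]             # 43 test samples

tree = DecisionTreeRegressor(min_samples_leaf=5, random_state=0).fit(X_cal, y_cal)
rmsep = np.sqrt(np.mean((tree.predict(X_test) - y_test) ** 2))
print(f"RMSEP: {rmsep:.2f}")
```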
24
Single tree — predictions
25
Single tree — the tree
26
Single tree — variable importance
27
Single tree — variable selection
29
Random forest — predictions
30
Random forests — importance of variables
31
Random forests — variable selection
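Variable selection with a forest usually means ranking the wavelengths by importance and refitting on the top ones; a sketch continuing from the Tecator example above (the cut-off of 20 wavelengths is arbitrary):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# X_cal, y_cal, X_test, y_test come from the Tecator sketch above
rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_cal, y_cal)
keep = np.argsort(rf.feature_importances_)[::-1][:20]      # 20 most important wavelengths
rf_selected = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_cal[:, keep], y_cal)
rmsep = np.sqrt(np.mean((rf_selected.predict(X_test[:, keep]) - y_test) ** 2))
```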
32
Outline: Why decision trees? · What decision trees are? · Decision trees ensembles · Cases (Tecator, Olives) · Conclusions
33
Olives
34
Single tree — the tree and splits
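For the olives case the same tools apply in classification mode; a minimal sketch with placeholder data standing in for the olive spectra and their class labels (the actual data loading is not shown in the talk):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=120, n_features=100, n_classes=3,
                           n_informative=10, random_state=0)   # placeholder for the olive spectra

single_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)   # the tree and splits
forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)   # ensemble counterpart
```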
35
Single tree — classification
36
Random forest — classification
37
Variable importance
38
Outline: Why decision trees? · What decision trees are? · Decision trees ensembles · Cases (Tecator, Olives) · Conclusions
39
Conclusions: “The bottom line is: You can spend 3 hours playing with the data, generating features and interaction variables and get a 77% r-squared; and I can ‘from sklearn.ensemble import RandomForestRegressor’ and in 3 minutes get an 82% r-squared.”
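The quoted baseline really is only a few lines; a sketch assuming a calibration/test split is already at hand (the variable names are placeholders):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# X_cal, y_cal, X_test, y_test: any calibration/test split of tabular data (assumed available)
rf = RandomForestRegressor(random_state=0).fit(X_cal, y_cal)
print("test R2:", r2_score(y_test, rf.predict(X_test)))
```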
40
IASIM-2016
41
IASIM-2018, June 17-20 2018, Seattle, WA, USA
Deadlines: March 12 for student scholarships, April 5 for abstracts.