Direct or Remotely sensed


[Workflow diagram. Inputs: field data (response, coordinates) and remotely sensed covariates/predictors (which may be the same data) are each qualified and prepped, yielding sample data (response, covariates). A random split produces training and test data. Processes: a model is built on the training data, repeated over and over with randomized parameters; the model is validated against the test data (statistics) and used to predict. Outputs: predicted values are summarized into a predictive map, plus uncertainty maps. Randomness enters at the split and at prediction.]

[The same workflow diagram, annotated with where each technique enters: jackknifing on the choice of covariates/predictors, cross-validation on the random training/test split, noise injection on the inputs and on prediction, Monte-Carlo on the repeated model builds with randomized parameters, and sensitivity testing on the finished model.]

Cross-Validation
- Split the data into a training set (build the model) and a test set (validate it)
- Leave-p-out cross-validation: validate on p samples, train on the remainder; repeat for all possible combinations of p samples
- Non-exhaustive cross-validation: leave-p-out, but run on only a subset of the possible combinations
- Randomly splitting into 30% test and 70% training is common (see the sketch below)
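A minimal sketch of the common random 70/30 split, assuming a hypothetical toy data frame with `response` and `covariate` columns standing in for real field and remotely sensed values:

```r
set.seed(42)                                        # make the split reproducible
sample_data <- data.frame(response  = rnorm(100),   # toy data; stands in for the
                          covariate = rnorm(100))   # real sample data
train_idx <- sample(nrow(sample_data), size = round(0.7 * nrow(sample_data)))
training  <- sample_data[train_idx, ]               # 70% used to build the model
test      <- sample_data[-train_idx, ]              # held-out 30% used to validate
```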

K-fold Cross-Validation
- Break the data into K sections ("folds")
- Test on fold i, train on the remaining folds
- Repeat for all i = 1, ..., K
- 10-fold is common; used in rpart()
[Diagram: the data divided into 10 folds; one fold held out as the test set, the other nine used for training]
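A minimal sketch of 10-fold cross-validation done by hand, reusing the hypothetical toy-data setup from the split sketch above:

```r
set.seed(42)
sample_data <- data.frame(response = rnorm(100), covariate = rnorm(100))  # toy data
k     <- 10
folds <- sample(rep(1:k, length.out = nrow(sample_data)))  # random fold label per row
fold_mse <- sapply(1:k, function(i) {
  train <- sample_data[folds != i, ]                # train on the other nine folds
  test  <- sample_data[folds == i, ]                # test on fold i
  fit   <- lm(response ~ covariate, data = train)   # any model-building step goes here
  mean((predict(fit, test) - test$response)^2)      # test error on fold i
})
mean(fold_mse)                                      # cross-validated error estimate
```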

Bootstrapping
- Draw N samples from the sample data (with replacement)
- Build the model
- Repeat the process over and over
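A minimal sketch on the same hypothetical toy data, using the spread of a fitted coefficient across bootstrap resamples as its uncertainty:

```r
set.seed(42)
sample_data <- data.frame(response = rnorm(100), covariate = rnorm(100))  # toy data
boot_slopes <- replicate(500, {
  idx <- sample(nrow(sample_data), replace = TRUE)       # N rows, with replacement
  fit <- lm(response ~ covariate, data = sample_data[idx, ])
  coef(fit)["covariate"]                                 # slope from this resample
})
sd(boot_slopes)   # bootstrap standard error of the slope
```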

Random Forest
- N samples drawn from the data with replacement
- Repeated to create many trees: a "random forest"
- Each split considers only a random subset of the predictors
- Predictions are aggregated across all the trees (majority vote for classification, averaging for regression)
- Bootstrap aggregation, or "bagging"
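A minimal sketch with the randomForest package (one common R implementation, assumed installed), again on hypothetical toy data:

```r
library(randomForest)
set.seed(42)
sample_data <- data.frame(response   = rnorm(100),       # toy data
                          covariate1 = rnorm(100),
                          covariate2 = rnorm(100))
rf <- randomForest(response ~ ., data = sample_data,
                   ntree = 500,   # number of bootstrapped trees
                   mtry  = 1)     # predictors tried at each split
predict(rf, sample_data[1:5, ])   # prediction = average over all 500 trees
```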

Boosting
- "Can a set of weak learners create a single strong learner?" (Wikipedia)
- Lots of "simple" trees combined to create a really complex model
- "Convex potential boosters cannot withstand random classification noise": Philip M. Long (then at Google) and Rocco A. Servedio (Columbia University), 2008
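As an illustration of the weak-learners idea (not the slide's specific method), a minimal AdaBoost-style sketch that combines rpart() stumps on hypothetical toy binary data:

```r
library(rpart)
set.seed(42)
n   <- 200                                           # hypothetical toy binary data
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- factor(ifelse(dat$x1 + dat$x2 + rnorm(n, sd = 0.3) > 0, 1, -1))
yy  <- as.numeric(as.character(dat$y))               # labels as +1 / -1

w <- rep(1 / n, n)                                   # start with uniform case weights
stumps <- list(); alpha <- numeric(0)
for (m in 1:20) {
  fit  <- rpart(y ~ x1 + x2, data = dat, weights = w,
                control = rpart.control(maxdepth = 1))  # one-split "weak learner"
  pred <- as.numeric(as.character(predict(fit, dat, type = "class")))
  err  <- sum(w[pred != yy]) / sum(w)                # weighted error of this stump
  if (err <= 0 || err >= 0.5) break                  # degenerate or chance-level: stop
  alpha[m] <- 0.5 * log((1 - err) / err)             # this stump's vote weight
  w <- w * exp(-alpha[m] * yy * pred)                # up-weight the misclassified rows
  w <- w / sum(w)
  stumps[[m]] <- fit
}
score <- Reduce(`+`, Map(function(f, a)
  a * as.numeric(as.character(predict(f, dat, type = "class"))),
  stumps, alpha))
mean(sign(score) == yy)                              # training accuracy of the vote
```

Each stump is barely better than chance on its own; the weighted vote over all of them is the "single strong learner."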

Boosted Regression Trees
- BRTs combine thousands of trees to reduce the deviance from the data
- Currently popular
- More on this later
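A minimal sketch with the gbm package (one common R implementation of boosted regression trees, assumed installed), on the hypothetical toy data:

```r
library(gbm)
set.seed(42)
sample_data <- data.frame(response = rnorm(100), covariate = rnorm(100))  # toy data
brt <- gbm(response ~ covariate, data = sample_data,
           distribution = "gaussian",   # squared-error (regression) deviance
           n.trees = 2000,              # thousands of small trees
           interaction.depth = 2,       # each tree kept shallow
           shrinkage = 0.01)            # each tree's contribution is damped
predict(brt, sample_data[1:5, ], n.trees = 2000)
```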

Sensitivity Testing
- Injecting small amounts of "noise" into our data to see the effect on the model parameters (Plant)
- The same approach can be used to model the impact of uncertainty on our model outputs and to make uncertainty maps
- Note: I call this "noise injection" to separate it from sensitivity testing of model parameters
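A minimal noise-injection sketch on the hypothetical toy data: perturb the covariate many times and watch how much a fitted parameter moves.

```r
set.seed(42)
sample_data <- data.frame(response = rnorm(100), covariate = rnorm(100))  # toy data
noisy_slopes <- replicate(200, {
  noisy <- sample_data
  noisy$covariate <- noisy$covariate +
    rnorm(nrow(noisy), sd = 0.05)            # small noise added to the inputs
  coef(lm(response ~ covariate, data = noisy))["covariate"]
})
sd(noisy_slopes)   # how sensitive the slope is to input noise
```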

Jackknifing
- Trying all combinations of covariates
- Given covariates 1, 2, 3, try: {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3}
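A minimal sketch that enumerates every covariate combination and builds a model formula for each (the covariate names here are hypothetical):

```r
covs <- c("elev", "slope", "precip")             # hypothetical covariate names
subsets <- unlist(lapply(seq_along(covs), function(k)
  combn(covs, k, simplify = FALSE)), recursive = FALSE)
formulas <- lapply(subsets, reformulate, response = "response")
length(formulas)   # 7 candidate models for 3 covariates
```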

Building Models
- Selecting the method
- Selecting the covariates/predictors ("model selection")
- Optimizing the coefficients/parameters of the model

[Recap: the annotated workflow diagram is shown again.]