COMP6321 MACHINE LEARNING PROJECT PRESENTATION


COMP6321 MACHINE LEARNING PROJECT PRESENTATION
Anh Tuan Tran, MSc Computer Science, Concordia University, Fall 2017

OUTLINE
- Project overview
- Machine learning
- Bootstrap

PROJECT OVERVIEW (1)
- Rolling-shift workers' level of fatigue is affected by work schedule and sleep patterns
- Level of fatigue is measured by (among others) the PVT (Psychomotor Vigilance Test)
- Data collected by:
  - Subjective measures: questionnaires (5 times daily)
  - Objective measures: Actiwatch (wearable device), giving sleep measurements as time series

PROJECT OVERVIEW (2)
- Objective: predict the level of fatigue as a result of sleep deprivation
- How we do it: Decision Trees and Random Forests

MACHINE LEARNING
[Slide: a small table of observations (#, X, Y) is fed to a model, producing an estimate α.]
Simple understanding: how good is α? Is it good? Is it too good? Two tools answer this: Cross-Validation and the Bootstrap.
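The cross-validation idea can be sketched in a few lines of plain Python. The (X, Y) data points and the single-parameter model (a least-squares slope through the origin, playing the role of α) below are invented for illustration:

```python
import statistics

# Hypothetical toy data set Z of (x, y) pairs.
Z = [(1.0, 1.5), (2.0, 1.7), (2.5, 2.2), (3.0, 2.5), (4.0, 3.9), (5.0, 5.2)]

def fit(train):
    # Least-squares slope through the origin: a = sum(x*y) / sum(x*x).
    return sum(x * y for x, y in train) / sum(x * x for x, _ in train)

def mse(a, test):
    # Mean squared prediction error of the fitted slope on held-out data.
    return statistics.mean((y - a * x) ** 2 for x, y in test)

# 3-fold cross-validation: hold out each fold in turn, average the test error.
k = 3
folds = [Z[i::k] for i in range(k)]
errors = []
for i in range(k):
    test = folds[i]
    train = [p for j, f in enumerate(folds) if j != i for p in f]
    errors.append(mse(fit(train), test))

cv_error = statistics.mean(errors)
print(round(cv_error, 3))
```

Because each point is predicted only by a model that never saw it, the averaged error is an honest estimate of how good the fitted parameter really is.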

Bootstrap How-to? (1)
[Slide: from the original data set Z, bootstrap samples Z*1, Z*2, …, Z*b are drawn with replacement, each yielding its own estimate α*1, α*2, …, α*b.]

Bootstrap How-to? (2)
- Draw a set Z* of the same size as Z, with replacement
- Use Z* to calculate an estimate α*
- Repeat the process a number of times (10,000+)
- We get B bootstrap data sets Z*1, Z*2, …, Z*B and corresponding estimates α*1, α*2, …, α*B
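The steps above can be sketched in plain Python. The sample values and the choice of statistic (here α is simply the sample mean) are assumptions for illustration; the spread of the B estimates approximates the standard error of α:

```python
import random
import statistics

random.seed(42)

# Hypothetical original sample Z; alpha is the statistic of interest.
Z = [1.5, 1.7, 2.2, 2.5, 3.1, 1.9, 2.8, 2.0]

def alpha(sample):
    return statistics.mean(sample)

B = 10_000  # the slide suggests 10,000+ resamples
estimates = []
for _ in range(B):
    Z_star = random.choices(Z, k=len(Z))  # same size as Z, with replacement
    estimates.append(alpha(Z_star))

# Standard deviation of the bootstrap estimates ~ standard error of alpha.
se_boot = statistics.stdev(estimates)
print(round(alpha(Z), 3), round(se_boot, 3))
```

The same loop works for any statistic: swap `alpha` for a median, a regression coefficient, or a model's accuracy, and the bootstrap gives its sampling variability "for free".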

Using Bootstrap in Error Prediction
- Bootstrap data sets as training data; the original sample as validation data
- Problems? Yes! Observations appear in both the bootstrap AND the validation data
- This overlap underestimates the true prediction error
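The overlap can be quantified: on average about 1 − 1/e ≈ 63% of the original observations also appear in any given bootstrap sample, so most of the "validation" data was already seen in training. A quick simulation (the sample size and trial count below are arbitrary):

```python
import random

random.seed(0)

n, trials = 100, 2_000
frac = 0.0
for _ in range(trials):
    sample = random.choices(range(n), k=n)  # one bootstrap "training set"
    frac += len(set(sample)) / n            # share of originals appearing in it
frac /= trials

# Roughly 1 - 1/e ~ 63.2% of the original observations show up in each
# bootstrap sample; only the remaining ~37% are genuinely unseen.
print(round(frac, 3))
```

This is why validating each model only on the observations left out of its own bootstrap sample (the "out-of-bag" observations, as Random Forests do) gives a far less biased error estimate.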

A little comparison (1)
- Data set: 497 records in 3 classes (479 Green, 13 Yellow, 5 Red)
- Decision Tree gives 93.8% accuracy: 21 Green classified as Yellow, 10 Green classified as Red

A little comparison (2)
- Random Forests give 96.8% accuracy: 14 Green classified as Yellow, 2 Green classified as Red
- Random Forests with Bootstrap give 99.2% accuracy: 4 Green classified as Yellow, 0 Green classified as Red
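The improvement from bootstrap aggregation ("bagging") can be reproduced on synthetic data. The one-dimensional two-class data set, the 15% label-noise rate, and the decision-stump base learner below are all invented for illustration; this is a minimal bagging sketch, not the project's actual pipeline:

```python
import random
from collections import Counter

random.seed(0)

# Invented 1-D data: "green" below x = 5.0, "yellow" above,
# plus 15% label noise so any single tree is imperfect.
clean = [(i / 10.0, "green" if i < 50 else "yellow") for i in range(100)]
noisy = [(x, ("yellow" if lab == "green" else "green")
             if random.random() < 0.15 else lab)
         for x, lab in clean]

def fit_stump(sample):
    """One-split 'decision stump': pick the threshold minimising training error."""
    best_t, best_err = 0.0, float("inf")
    for t, _ in sample:
        err = sum((x < t) != (lab == "green") for x, lab in sample)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def bagged_predict(stumps, x):
    # Aggregate the stumps' predictions by majority vote.
    votes = Counter("green" if x < t else "yellow" for t in stumps)
    return votes.most_common(1)[0][0]

# Bagging: fit B stumps, each on its own bootstrap sample of the noisy data.
B = 25
stumps = [fit_stump(random.choices(noisy, k=len(noisy))) for _ in range(B)]

accuracy = sum(bagged_predict(stumps, x) == lab for x, lab in clean) / len(clean)
print(accuracy)
```

Each individual stump overfits some of the noise in its bootstrap sample, but the noise differs from sample to sample, so the majority vote averages it away; this variance reduction is the same mechanism behind the Random Forest gains reported above.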