Presentation transcript:

A More Principled Approach to Machine Learning
Michael R. Smith
Brigham Young University, Department of Computer Science
2 February 2015

Machine Learning
- Learn from past experience
- Change their behavior without being explicitly programmed
- Optimization techniques
  - Maximize accuracy
  - Minimize error
- Mine data
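The optimization framing above can be made concrete: a learner adjusts its parameters to minimize an error measure on the training data. The sketch below is illustrative and not from the presentation; the one-parameter model, toy data, and learning rate are all made up.

```python
# Minimize squared error for the model y = w * x by gradient descent.
# The toy data was generated with w = 2, so learning should recover w ≈ 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w = 0.0        # initial parameter
lr = 0.02      # learning rate (hypothetical choice)
for _ in range(500):
    # gradient of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad

print(round(w, 3))  # -> 2.0
```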

Machine Learning Example
- I, Robot

Machine Learning

Machine Learning

Training data:

Weight | Height | Blood Press | Temp  | Has Disease
205    | 78     | good        | 98.2  | yes
157    | 65     | bad         | 100.7 | yes
185    | 71     | mod         | 99.5  | no

Learning Algorithm

New instance:

Weight | Height | Blood Press | Temp  | Has Disease
172    | 67     | bad         | 100.1 | ?
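A minimal sketch of what the learning-algorithm box does with this table. A 1-nearest-neighbour rule stands in for any learner; the numeric encoding of the blood-pressure values is a hypothetical choice for illustration, and the features are deliberately left unscaled to keep the sketch short.

```python
# Classify the query row of the disease table by its nearest
# training instance (1-NN with squared Euclidean distance).
bp = {"bad": 0.0, "mod": 0.5, "good": 1.0}  # hypothetical encoding

# (weight, height, blood pressure, temperature) -> has disease?
train = [
    ((205, 78, bp["good"], 98.2), "yes"),
    ((157, 65, bp["bad"], 100.7), "yes"),
    ((185, 71, bp["mod"], 99.5), "no"),
]

def predict(query):
    """Return the label of the training instance closest to `query`."""
    def dist(x):
        return sum((a - b) ** 2 for a, b in zip(x, query))
    return min(train, key=lambda ex: dist(ex[0]))[1]

print(predict((172, 67, bp["bad"], 100.1)))  # -> no
```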

Machine Learning

Machine Learning

Weight | Height | Blood Press | Temp  | Has Disease
205    | 78     | good        | 98.2  | Yes
157    | 65     | bad         | 100.7 | Yes
185    | 71     | mod         | 99.5  | No

Weight | Height | Blood Press | Temp  | Has Disease
172    | 67     | bad         | 100.1 | ?

Meta-data:

Data Set | # Features | # Classes | Entropy | … | # Nodes | Learning Rate | … | Accuracy
Disease  | 4          | 2         | 0.24    | … | 3       | 0.1           | … | 83.4
Iris     | 4          | 3         | 0.76    | … | 7       | 0.2           | … | 97.4

Meta-Learning

Data Set | # Features | # Classes | Entropy | … | # Nodes | Learning Rate | … | Accuracy
Disease  | 4          | 2         | 0.24    | … | 3       | 0.1           | … | 83.4
Iris     | 4          | 3         | 0.76    | … | 7       | 0.2           | … | 97.4

Meta-features:

Data Set | # Features | # Classes | Entropy | …
Ecology  | 17         | 3         | 0.5     | …
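The meta-features in the table above can be computed directly from a labelled data set. The sketch below derives three of them (# features, # classes, class entropy) from a toy data set; it is a plausible illustration, not the presentation's actual feature-extraction code.

```python
import math
from collections import Counter

def meta_features(X, y):
    """Compute a few data-set-level meta-features."""
    counts = Counter(y)
    n = len(y)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "n_features": len(X[0]),
        "n_classes": len(counts),
        "class_entropy": round(entropy, 2),
    }

# toy disease data set: 4 features, 2 classes
X = [[205, 78, 1.0, 98.2], [157, 65, 0.0, 100.7], [185, 71, 0.5, 99.5]]
y = ["yes", "yes", "no"]
print(meta_features(X, y))  # -> {'n_features': 4, 'n_classes': 2, 'class_entropy': 0.92}
```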

Meta-Learning

Meta-Learning

Previous Work
- Random Search

Instance Hardness
- Learning algorithms are generally evaluated at the data set level
- Are some instances intrinsically hard to classify?
- Why are some instances misclassified?
- Are there instances that are misclassified but should not be?
- Are some instances misclassified by all learning algorithms?
  - If so, why?

Data Set

Overfit

Linear Classifier

Detrimental Instances

Instance Hardness
- Better intuition about learning algorithms and why instances are misclassified
  - Can learning algorithms be improved? Where?
- Informed analysis of learning algorithm performance
  - Is the classification reasonable?
  - Where can the quality of the data be improved?
- Empirical analysis of the classification of 57 data sets by 9 learning algorithms
  - 10-fold cross-validation
  - 178,109 instances
  - 5,310 models created

Instance Hardness

Instance Hardness
- 9 learning algorithms
  - C4.5
  - MLP
  - RIPPER
  - NNge
  - Ridor
  - 5NN
  - Random Forest
  - LWL
  - Naïve Bayes
- Unsupervised Meta-learning
  - Cluster learning algorithms based on diversity
  - Intuition for all of the algorithms in the cluster

Existence of Instance Hardness
- 53% correctly classified by all algorithms
- 5% misclassified by all algorithms
- Learning algorithms disagree on 42% of the instances
- 15% misclassified by the majority of algorithms
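One simple way to quantify the observations above, suggested by the slide's framing, is to score each instance by the fraction of learning algorithms that misclassify it: an instance every algorithm gets right scores 0.0, and one misclassified by all algorithms scores 1.0. The predictions below are made-up illustrative data, not results from the study.

```python
def instance_hardness(true_label, predictions):
    """Fraction of algorithms whose prediction disagrees with the label."""
    wrong = sum(1 for p in predictions if p != true_label)
    return wrong / len(predictions)

# hypothetical predictions of 9 algorithms for a single instance
preds = ["yes", "yes", "no", "yes", "yes", "no", "yes", "yes", "yes"]
print(instance_hardness("yes", preds))  # 2 of 9 algorithms disagree
```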

Modeling Detrimental Instances
- The true class label is generally ignored
  - Regularization
  - Validation sets
  - Pruning

Modeling Detrimental Instances

Instance Quality Learning

Inequality Learning

Inequality Learning

Results: Original

       | MLP     | C4.5    | 5-NN    | LWL      | NB      | NNge    | RandF   | Ridor   | Rip
QW-L   | 47,0,5  | 32,0,20 | 35,1,16 | 28,10,14 | 35,1,16 | 20,1,27 | 33,1,18 | 31,1,19 | 38,0,14
QW-B   | 49,0,3  | 37,1,14 | 32,0,20 | 22,12,18 | 19,1,32 | 21,1,26 | 32,2,18 | 34,1,16 | 37,3,12
Filter | 39,0,13 | 38,3,11 | 38,4,10 | 26,12,14 | 36,1,15 | 40,0,12 | 33,1,18 | 35,3,14 | 40,2,10

Each cell gives the g,e,l counts vs. Orig.


Inequality Learning
- Increases accuracy for all of the investigated learning algorithms
- Advantage to using a continuous quality value rather than a binary one
- Most effective in global learning algorithms such as backpropagation
  - Could be a side effect of how we integrated instance quality into the learning algorithm (future work)
- Focusing on the data: how does it compare with hyper-parameter optimization (HPO)?

Comparison of HPO and Filtering

K-Fold Cross-Validation
- Create K partitions of the data set
- For each partition, use it as the test set and the remaining K-1 partitions for training
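The two steps above can be sketched in a few lines; this is a generic illustration of the procedure, not code from the presentation.

```python
def k_fold_indices(n, k):
    """Split range(n) into k (train, test) index pairs.

    Each index appears in exactly one test set; the remaining
    k-1 folds form the corresponding training set.
    """
    folds = [list(range(i, n, k)) for i in range(k)]
    for test in folds:
        test_set = set(test)
        train = [i for i in range(n) if i not in test_set]
        yield train, test

for train, test in k_fold_indices(10, 5):
    print(len(train), len(test))  # each fold: 8 train, 2 test
```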

K-Fold Cross-Validation
- Use a validation set to determine which set of hyper-parameters to use

(Figure: validation examples)

Experimental Methodology
- Hyper-parameter optimization
  - Bayesian optimization (more than 512 hyper-parameter settings explored for most learning algorithms)
  - Standard: uses the accuracy on a validation set
  - Optimistic: uses the 10-fold cross-validation accuracy
- Filtering
  - Ensemble Filter (L-Filter): removes instances that are misclassified by the majority of a set of learning algorithms
  - Adaptive Filter (A-Filter): greedy search among candidate learning algorithms
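The Ensemble Filter's majority rule can be sketched as follows. The prediction matrix here is hypothetical illustrative data; a real filter would obtain these predictions from trained models rather than a hand-written table.

```python
def ensemble_filter(labels, prediction_matrix):
    """Return indices of instances NOT misclassified by a majority.

    prediction_matrix[a][i] is algorithm a's prediction for instance i.
    """
    n_algs = len(prediction_matrix)
    keep = []
    for i, y in enumerate(labels):
        wrong = sum(1 for preds in prediction_matrix if preds[i] != y)
        if wrong <= n_algs / 2:   # keep unless a strict majority is wrong
            keep.append(i)
    return keep

labels = ["a", "b", "a", "b"]
preds = [["a", "b", "b", "b"],   # algorithm 1
         ["a", "a", "b", "b"],   # algorithm 2
         ["a", "b", "b", "b"]]   # algorithm 3
print(ensemble_filter(labels, preds))  # -> [0, 1, 3]; instance 2 is filtered
```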

Results: Standard Approach

VS Orig | L-Filter | HPO
MLP     | 44,1,7   | 47,0,5
C4.5    | 45,1,6   | 39,0,13
kNN     | 44,2,6   | 41,2,9
NB      | 42,0,10  | 42,1,9
RF      | 38,3,11  | 37,2,13
RIP     | 50,0,2   | 47,1,4

Results: Optimistic Approach

VS HPO | L-Filter | A-Filter
MLP    | 27,3,22  | 45,0,7
C4.5   | 33,4,15  | 48,2,2
kNN    | 30,2,20  | 51,0,1
NB     | 22,2,28  | 34,0,18
RF     | 27,1,24  | 46,0,6
RIP    | 34,1,17  | 48,0,4

No single filtering approach is best for all data sets and learning algorithms.

Why does filtering have such a significant effect?
- Recall: maximize the probability of the hypothesis given the data
- At the instance level:
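Written out, the slide's point is plausibly the following (a reconstruction assuming i.i.d. training instances, not necessarily the exact formulation from the slides):

\[
p(h \mid \mathcal{D}) \;\propto\; p(\mathcal{D} \mid h)\, p(h),
\qquad
p(\mathcal{D} \mid h) \;=\; \prod_{i=1}^{N} p(y_i \mid x_i, h).
\]

Because the likelihood is a product over instances, a single detrimental instance with $p(y_i \mid x_i, h) \approx 0$ can drag down the score of an otherwise good hypothesis $h$; removing such instances can change which hypothesis maximizes the posterior.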

Example Data Set

A Need for Better Understanding
- Filtering has a much higher potential than HPO
- No principled examination

The Need for a Repository



Benefits of a Repository
- Better science
  - Reproducible/saved results
- Save time
- Build reputation
- Easier to compare with other work
- Gives a snapshot of the current state
  - Overall
  - Specific data set
- Meta-learning
  - Provides data sets

Machine Learning Results Repository

Machine Learning Results Repository
- Data Set-Level
- Learning Algorithm-Level
- Instance-Level

Future Directions and Projects
- MLRR
  - Data quality
  - Linking with papers
  - Creating user profiles
  - Anonymous postings for supplemental material
- Meta-learning
  - Combine learning with optimization techniques
  - Meta-features
  - Deep learning
  - Collaborative filtering
- Automate machine learning

Future Directions and Projects
- Incorporate information into the learning process
- Use cases of machine learning
  - How is machine learning actually used?
  - How can it be made easier to use?
- Collaboration/application to other fields
  - Bioinformatics
  - Social media
  - Sports statistics

Thank you