Reading: Pedro Domingos, A Few Useful Things to Know about Machine Learning
Source: https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
Reading moderator: Ondrej Kaššák

INTRODUCTION Machine learning algorithms can figure out how to perform important tasks by generalizing from examples. As more data becomes available, more ambitious problems can be tackled. The paper collects a lot of the “folk knowledge” needed to develop machine learning applications successfully, knowledge that is not readily available in ML textbooks.

LEARNING = REPRESENTATION + EVALUATION + OPTIMIZATION Representation – the formal language in which the learned model is expressed, i.e. the space of classifiers the learner can produce. Evaluation – the scoring function that distinguishes good models from bad ones on concrete data. Optimization – the method used to search for the highest-scoring model, ranging from off-the-shelf optimizers to custom settings.
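
As a concrete (and purely illustrative) reading of this decomposition, the sketch below uses scikit-learn, which the paper does not mention: the representation is the space of decision trees, the evaluation is classification accuracy, and the optimization is a grid search over tree depth. The dataset and parameter values are made up.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a real problem.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

representation = DecisionTreeClassifier(random_state=0)  # hypothesis space: decision trees
evaluation = "accuracy"                                   # scoring function used to compare models
optimization = GridSearchCV(                              # search over the hypothesis space
    representation,
    param_grid={"max_depth": [2, 4, 8, None]},
    scoring=evaluation,
    cv=5,
)
optimization.fit(X, y)
print(optimization.best_params_, round(optimization.best_score_, 3))
```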

IT’S GENERALIZATION THAT COUNTS Distinguish between training and test data! Reaching a good result on the training data is easy. To make the most of the available data, use cross-validation.
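
A minimal sketch of the point, assuming scikit-learn and a synthetic dataset (neither comes from the paper): training accuracy is measured on data the model has already seen, while cross-validation scores every example with a model that was never trained on it.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1, random_state=0)

tree = DecisionTreeClassifier(random_state=0)
# Training accuracy is misleading: an unconstrained tree can memorize the training set.
print("train accuracy:", tree.fit(X, y).score(X, y))
# 10-fold cross-validation: every example is scored by a model that never saw it,
# so all the data is used for both fitting and evaluation without mixing the two roles.
print("cv accuracy:", cross_val_score(tree, X, y, cv=10).mean())
```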

DATA ALONE IS NOT ENOUGH Consider learning a Boolean function of (say) 100 variables from a million examples. There are 2^100 − 10^6 examples whose classes you don’t know (2^100 ≈ 1.3 × 10^30, so a million examples cover only a vanishing fraction of the input space). The functions we want to learn in the real world are not drawn uniformly from the set of all mathematically possible functions! Similar examples often have similar classes, dependences are limited, or complexity is limited. This is often enough to do very well, and it is a large part of why machine learning has been so successful.

OVERFITTING HAS MANY FACES What if the knowledge and data we have are not sufficient to completely determine the correct classifier? Bias – the tendency to consistently learn the same wrong thing (a problem for, e.g., linear predictors). Variance – the tendency to learn random things irrespective of the real signal (a problem for, e.g., unpruned decision trees). Strong false assumptions can be better than weak true ones, because a learner with the latter needs more data to avoid overfitting. Remedies: cross-validation and regularization.
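
A small sketch of both remedies on synthetic, deliberately noisy data (model and parameter choices are illustrative): cross-validation exposes the gap between training and held-out accuracy for an unconstrained decision tree, and limiting its depth acts as a crude form of regularization.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Label noise (flip_y) gives an unconstrained tree something to overfit.
X, y = make_classification(n_samples=300, n_features=20, flip_y=0.2, random_state=0)

for depth in (None, 3):  # None = grow until pure (high variance), 3 = depth-limited ("regularized")
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    train_acc = tree.fit(X, y).score(X, y)
    cv_acc = cross_val_score(tree, X, y, cv=5).mean()
    print(f"max_depth={depth}: train={train_acc:.2f}  cv={cv_acc:.2f}")
```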

INTUITION FAILS IN HIGH DIMENSIONS Curse of dimensionality: generalizing correctly becomes exponentially harder as the dimensionality (number of features) of the examples grows, because a fixed-size training set covers a dwindling fraction of the input space. When gathering more features, their benefits may be outweighed by the curse of dimensionality. For this reason, add new attributes only if they bring new information (avoid highly correlated features).
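
A NumPy-only sketch of one symptom of the curse (the sample sizes and dimensionalities are arbitrary): as the number of dimensions grows, the nearest and the farthest neighbour of a query point become almost equally distant, so similarity-based reasoning degrades.

```python
import numpy as np

rng = np.random.default_rng(0)

for d in (2, 10, 100, 1000):
    points = rng.random((500, d))  # 500 random points in the d-dimensional unit cube
    query = rng.random(d)
    dists = np.linalg.norm(points - query, axis=1)
    # A ratio near 1 means the "nearest" neighbour is barely nearer than the farthest one.
    print(f"d={d:4d}  nearest/farthest distance ratio = {dists.min() / dists.max():.2f}")
```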

FEATURE ENGINEERING IS THE KEY I. The most important factor in an ML project is the features used. With many independent features that each correlate well with the predicted class, learning is easy. On the other hand, if the class is a very complex function of the features, you may not be able to learn it. ML is often the quickest part, but that’s because we’ve already mastered it pretty well! Feature engineering is more difficult because it’s domain-specific, while learners can be largely general-purpose.

FEATURE ENGINEERING IS THE KEY II. One of the holy grails of machine learning is to automate the feature engineering process. Today it is often done by automatically generating large numbers of candidate features and selecting the best by (say) their information gain with respect to the predicted class. Bear in mind that features that look irrelevant in isolation may be relevant in combination. On the other hand, running a learner with a very large number of features to find out which ones are useful in combination may be too time-consuming, or cause overfitting.
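
A sketch of the “generate many candidates, keep the most informative” workflow, using scikit-learn’s mutual information score (a close relative of information gain) on a synthetic dataset; the feature counts below are invented for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# 100 candidate features, only 10 of which actually carry signal about the class.
X, y = make_classification(n_samples=500, n_features=100, n_informative=10,
                           n_redundant=0, random_state=0)

selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_selected = selector.fit_transform(X, y)

print("kept feature indices:", selector.get_support(indices=True))
print("reduced shape:", X_selected.shape)
```

Note that such univariate scores share the caveat above: features that matter only in combination can look irrelevant in isolation.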

MORE DATA BEATS A CLEVERER ALGORITHM I. ML researchers are mainly concerned with creating better learning algorithms, but pragmatically the quickest path to success is often just to get more data. This does bring up another problem, however: scalability. In most of computer science, the two main limited resources are time and memory. In ML there is a third one: training data. Today the bottleneck has shifted from data to time.

MORE DATA BEATS A CLEVERER ALGORITHM II. Try the simplest learners first (e.g., naive Bayes before logistic regression, k-nearest neighbor before support vector machines). More sophisticated learners are seductive, but they are usually harder to use, because they have more knobs you need to turn to get good results. Learners can be divided into two major types: those whose representation has a fixed size, like linear classifiers, and those whose representation can grow with the data, like decision trees. Fixed-size learners can only take advantage of so much data. Variable-size learners can in principle learn any function given sufficient data, but in practice they may not, because of limitations of the algorithm (e.g., greedy search falls into local optima) or computational cost. Because of the curse of dimensionality, no existing amount of data may be enough. For this reason, clever algorithms — those that make the most of the data and computing resources available — often pay off in the end, provided you’re willing to put in the effort.
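
A sketch of “try the simplest learner first” and of the fixed-size versus variable-size distinction, assuming scikit-learn and synthetic data (none of this is from the paper): naive Bayes has a representation of fixed size, while a k-nearest-neighbour classifier’s representation grows with the training set.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Fixed-size representation: one Gaussian per class and feature, regardless of data size.
nb = GaussianNB()
# Variable-size representation: the "model" is essentially the stored training set itself.
knn = KNeighborsClassifier(n_neighbors=5)

for name, model in (("naive Bayes", nb), ("5-NN", knn)):
    print(name, round(cross_val_score(model, X, y, cv=5).mean(), 3))
```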

LEARN MANY MODELS, NOT JUST ONE Ensemble, ensemble, ensemble! Bagging – the simplest technique: generate random variations of the training set by resampling, learn a classifier on each, and combine the results by voting. This works because it greatly reduces variance while only slightly increasing bias. Boosting – training examples have weights, and these are varied so that each new classifier focuses on the examples the previous ones tended to get wrong. Stacking – the outputs of individual classifiers become the inputs of a “higher-level” learner that figures out how best to combine them.
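
The sketch below puts the three schemes side by side with scikit-learn; the base models, ensemble sizes, and dataset are illustrative choices, not taken from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

ensembles = {
    # Bagging: resample the training set, fit a tree on each replica, combine by voting.
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0),
    # Boosting: reweight examples so each new learner focuses on previous mistakes.
    "boosting": AdaBoostClassifier(n_estimators=50, random_state=0),
    # Stacking: a higher-level learner combines the base models' predictions.
    "stacking": StackingClassifier(
        estimators=[("nb", GaussianNB()), ("tree", DecisionTreeClassifier(max_depth=3))],
        final_estimator=LogisticRegression(),
    ),
}

for name, model in ensembles.items():
    print(name, round(cross_val_score(model, X, y, cv=5).mean(), 3))
```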