Machine Learning. Slides: Isabelle Guyon, Erik Sudderth, Mark Johnson, Derek Hoiem, Lana Lazebnik. Photo: CMU Machine Learning Department protests G20.

Machine Learning is… A branch of artificial intelligence concerned with the construction and study of systems that can learn from data. It studies how to automatically learn to make accurate predictions based on past observations.

Machine Learning, aka: data mining (machine learning applied to databases, i.e. collections of data); inference and/or estimation (statistics); pattern recognition (engineering); signal processing (electrical engineering); optimization.

Supervised vs Unsupervised Learning. In supervised learning, the categories are known. In unsupervised learning they are not, and the learning process attempts to discover appropriate categories.

Classification… Spam detection: given the email messages in an inbox, identify those that are spam and those that are not. Credit card fraud detection: given a customer's credit card transactions for a month, identify those that were made by the customer and those that were not.

Classification framework: a prediction function f maps an input image to a label, e.g. f(image) = "apple", f(image) = "tomato", f(image) = "cow". Related prediction tasks: text message prediction, Facebook friend suggestion, Netflix film prediction. Slide credit: L. Lazebnik

Classification, cont. y = f(x), where x is the image feature, f the prediction function, and y the output. Training: given a training set of labeled examples {(x1, y1), …, (xN, yN)}, estimate the prediction function f by minimizing the prediction error on the training set. Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x). Slide credit: L. Lazebnik
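The train-then-test pattern above can be sketched in a few lines. This is a minimal illustration, not from the slides: the nearest-centroid rule and the toy 1-D data are assumptions chosen only to make the example self-contained.

```python
# Minimal sketch of training (estimate f from labeled pairs) and
# testing (apply f to a never-before-seen example).

def train(examples):
    """Learn one centroid per class from labeled 1-D examples [(x, y), ...]."""
    groups = {}
    for x, y in examples:
        groups.setdefault(y, []).append(x)
    return {label: sum(xs) / len(xs) for label, xs in groups.items()}

def f(model, x):
    """Predict the label whose centroid is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))

training_set = [(1.0, "small"), (1.5, "small"), (8.0, "large"), (9.0, "large")]
model = train(training_set)
print(f(model, 2.0))  # a test example not seen during training → "small"
```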

The process. Training: training images → image features → training (with training labels) → learned model. Testing: test image → image features → learned model → prediction. Slide credit: D. Hoiem and L. Lazebnik

Classifiers: nearest neighbor. Given training examples from class 1 and class 2 and a test example, f(x) = label of the training example nearest to x. All we need is a distance function for our inputs; no training required! Slide credit: L. Lazebnik
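A direct transcription of the rule above: all we supply is a distance function, and there is no training step. The 2-D points and labels are hypothetical.

```python
# Nearest-neighbor classifier: f(x) = label of the nearest training example.

def euclidean(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

def nearest_neighbor(train_set, x, distance):
    nearest = min(train_set, key=lambda pair: distance(pair[0], x))
    return nearest[1]

train_set = [((0, 0), 1), ((1, 0), 1), ((5, 5), 2), ((6, 5), 2)]
print(nearest_neighbor(train_set, (0.5, 0.2), euclidean))  # → 1
print(nearest_neighbor(train_set, (5.5, 4.0), euclidean))  # → 2
```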

Classifiers: linear. Find a linear function to separate the classes: f(x) = sgn(w · x + b). Slide credit: L. Lazebnik
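The decision rule f(x) = sgn(w · x + b) in code. The weights and bias here are hand-picked for illustration; the slide does not say how they are learned (a perceptron or SVM would fit them from data).

```python
# Linear classifier: sign of a weighted sum plus bias.

def sgn(v):
    return 1 if v >= 0 else -1

def linear_classifier(w, b, x):
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sgn(score)

w, b = [1.0, -1.0], 0.5  # hypothetical learned parameters
print(linear_classifier(w, b, [2.0, 1.0]))  # 2 - 1 + 0.5 = 1.5 → +1
print(linear_classifier(w, b, [0.0, 3.0]))  # 0 - 3 + 0.5 = -2.5 → -1
```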

Regression. Data is labelled with a real value (numeric) rather than a category label. Useful for predicting time series data, like the price of a stock over time. The decision being modelled is what value to predict for new, unseen data. Learning a linear regression model means estimating the values of the coefficients used in the representation from the available data.
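"Estimating the coefficients" can be made concrete with the closed-form least-squares formulas for the 1-D case, y = a·x + b. The data below are made up so the fit is exact.

```python
# Ordinary least squares for simple linear regression y = a*x + b.

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope: covariance of x and y divided by variance of x
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]        # exactly y = 2x + 1
a, b = fit_line(xs, ys)
print(a, b)               # → 2.0 1.0
```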

Clustering (data mining). Data is not labelled. It can, however, be divided into groups based on similarity and other measures of natural structure in the data. Market segmentation is one of the best-known applications of cluster analysis.
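Grouping unlabelled data by similarity can be sketched with a bare-bones k-means loop. The 1-D points, the fixed initial centers, and k-means itself are illustrative choices (the slide names no specific algorithm); real use would call a library implementation.

```python
# Tiny k-means sketch: assign points to the nearest center, then move
# each center to the mean of its group, and repeat.

def kmeans(points, centers, iters=10):
    for _ in range(iters):
        groups = {i: [] for i in range(len(centers))}
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in groups.items()]
    return centers

points = [1.0, 1.5, 0.5, 9.0, 9.5, 10.0]  # two obvious groups
print(kmeans(points, [0.0, 5.0]))          # → [1.0, 9.5]
```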

Dimensionality Reduction. Most algorithms work on columns (as variables), and datasets with thousands of variables make them run slower. It is important to reduce the number of columns in the data set while losing as little information as possible. Techniques include the Missing Values Ratio, Low Variance Filter, High Correlation Filter, PCA, Random Forests / Ensemble Trees, etc.
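One of the techniques named above, the Low Variance Filter, is simple enough to sketch: drop columns whose variance falls below a threshold, since near-constant columns carry little information. The threshold and data are illustrative.

```python
# Low Variance Filter: keep only columns with variance >= threshold.

def low_variance_filter(rows, threshold=0.01):
    keep = []
    for j in range(len(rows[0])):
        col = [row[j] for row in rows]
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        if var >= threshold:
            keep.append(j)
    return [[row[j] for j in keep] for row in rows]

data = [[1.0, 5.0, 0.5],
        [2.0, 5.0, 0.5],
        [3.0, 5.0, 0.5]]   # columns 2 and 3 are constant
print(low_variance_filter(data))  # → [[1.0], [2.0], [3.0]]
```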

Generalization. Training set (labels known); test set (labels unknown). How well does a learned model generalize from the data it was trained on to a new test set? Slide credit: L. Lazebnik

Generalization. Components of generalization error. Bias: how much the average model over all training sets differs from the true model; error due to inaccurate assumptions/simplifications made by the model. Variance: how much models estimated from different training sets differ from each other. Underfitting: the model is too "simple" to represent all the relevant class characteristics (high bias and low variance; high training error and high test error). Overfitting: the model is too "complex" and fits irrelevant characteristics (noise) in the data (low bias and high variance; low training error and high test error). Slide credit: L. Lazebnik

Bias-Variance Trade-off Models with too few parameters are inaccurate because of a large bias (not enough flexibility). Models with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample). http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_bias_variance.htm Slide credit: D. Hoiem

Bias-Variance Trade-off. E(MSE) = noise² + bias² + variance: unavoidable error, plus error due to incorrect assumptions, plus error due to the variance of training samples. See the following for explanations of bias-variance (also Bishop's "Neural Networks" book): http://www.inf.ed.ac.uk/teaching/courses/mlsc/Notes/Lecture4/BiasVariance.pdf. From the link above: you can only get generalization through assumptions. Thus, in order to minimize the MSE, we need to minimize both the bias and the variance. However, this is not trivial to do. For instance, simply neglecting the input data and predicting the output somehow (e.g., as a constant) would certainly minimize the variance of our predictions: they would always be the same, so the variance would be zero, but the bias of our estimate (i.e., the amount by which we are off from the real function) would be tremendously large. On the other hand, a neural network could perfectly interpolate the training data, i.e., predict y = t for every data point. This makes the bias term vanish entirely, since E(y) = f (insert this into the squared bias term to verify), but the variance term becomes equal to the variance of the noise, which may be significant (see also Bishop, Chapter 9, and the Geman et al. paper). In general, finding an optimal bias-variance tradeoff is hard, but acceptable solutions can be found, e.g., by means of cross-validation or regularization. Slide credit: D. Hoiem
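The constant-predictor vs interpolator argument above can be checked numerically. This simulation is an assumption-laden sketch (the true value, noise level, and single-point "training set" are all invented), but it shows the two extremes: the constant predictor has zero variance and large squared bias, while the interpolator has near-zero bias and variance close to the noise variance (0.25 here).

```python
# Monte Carlo estimate of squared bias and variance for two extreme
# predictors, over many resampled training sets.
import random

random.seed(0)
true_value = 3.0
noise_sd = 0.5
trials = 10000

constant_preds, interp_preds = [], []
for _ in range(trials):
    y = true_value + random.gauss(0, noise_sd)  # one noisy training target
    constant_preds.append(0.0)                  # ignores the data entirely
    interp_preds.append(y)                      # fits the training point exactly

def bias_and_variance(preds):
    mean = sum(preds) / len(preds)
    var = sum((p - mean) ** 2 for p in preds) / len(preds)
    return (mean - true_value) ** 2, var

print(bias_and_variance(constant_preds))  # squared bias 9.0, variance 0.0
print(bias_and_variance(interp_preds))    # squared bias ≈ 0, variance ≈ 0.25
```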

Bias-variance tradeoff. As model complexity grows, training error keeps decreasing while test error first decreases and then rises again: low complexity means high bias and low variance (underfitting); high complexity means low bias and high variance (overfitting). Slide credit: D. Hoiem
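The training-vs-test gap can be reproduced on synthetic data. Everything here is invented for illustration: the "complex" model is a 1-nearest-neighbor regressor that memorizes the training set (zero training error), and the "simple" model predicts the training mean; on data that is mostly noise, the memorizer does worse on the test set.

```python
# Overfitting demo: zero training error does not mean low test error.
import random

random.seed(1)
true_value, noise_sd, n = 5.0, 2.0, 200
train = [(random.uniform(0, 10), true_value + random.gauss(0, noise_sd))
         for _ in range(n)]
test = [(random.uniform(0, 10), true_value + random.gauss(0, noise_sd))
        for _ in range(n)]

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

def one_nn(x):            # "complex": memorizes every training point
    return min(train, key=lambda p: abs(p[0] - x))[1]

mean_y = sum(y for _, y in train) / n
def predict_mean(x):      # "simple": a single parameter
    return mean_y

print(mse(one_nn, train), mse(one_nn, test))            # 0.0 vs a much larger value
print(mse(predict_mean, train), mse(predict_mean, test))  # both moderate
```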

Toolkit: R, Python, Stata, VBA and SQL, Git and GitHub.

R. Advantages: fast and free; state of the art (statistical researchers provide their methods as R packages; SPSS and SAS are years behind R!); highly customizable; active user community; excellent for simulation, programming, computer-intensive analyses, etc. Disadvantages: not user-friendly at the start; steep learning curve, minimal GUI; easy to make mistakes and not know it; working with large datasets is limited by RAM. Credit: A very brief introduction to R, M. Keller

Python. Ease of use; interpreted. AI processing: statistical. A language with strong similarities to Perl and C, but with powerful typing and object-oriented features. Commonly used for producing HTML content on websites; great for text files. Useful built-in types (lists, dictionaries). Clean syntax, powerful extensions. Strong numeric processing capabilities (matrix operations, etc.), making it suitable for probability and machine learning code. Based on a presentation from www.cis.upenn.edu/~cse391/cse391_2004/PythonIntro1.ppt
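The built-in types the slide mentions, lists and dictionaries, in a few lines of the "clean syntax" it refers to (the word-count task is just an example):

```python
# Dictionary: count word occurrences in a string.
word_counts = {}
for word in "spam ham spam eggs spam".split():
    word_counts[word] = word_counts.get(word, 0) + 1
print(word_counts)                    # → {'spam': 3, 'ham': 1, 'eggs': 1}

# List comprehension: build a list in one expression.
squares = [n * n for n in range(5)]
print(squares)                        # → [0, 1, 4, 9, 16]
```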

Stata. Typically used in economics and politics. User-friendly environment; pretty easy to learn. Ado files are available for extensions. Credit: Impact Evaluation in Practice

VBA and SQL. Visual Basic for Applications and Structured Query Language are both used extensively in BA. They provide the ability to retrieve data stored in SQL format. Connections with R and Python are possible and available, and working with R and Python is easier once SQL is mastered.
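One concrete way Python connects to SQL-stored data is the standard-library sqlite3 module. The in-memory database and the sales table below are invented for the sketch; a real deployment would connect to an actual database server instead.

```python
# Retrieve SQL-stored data from Python with the stdlib sqlite3 module.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 120.0), ("south", 80.0), ("north", 50.0)])
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)   # → [('north', 170.0), ('south', 80.0)]
conn.close()
```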

Git-GitHub. Git: version control system. GitHub: repository hosting site.

Git-GitHub. Credit: Git-GitHub, Safa

References
http://cs.stackexchange.com/questions/2907/what-exactly-is-the-difference-between-supervised-and-unsupervised-learning
http://www.kdnuggets.com/2015/05/7-methods-data-dimensionality-reduction.html
http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/
https://discuss.analyticsvidhya.com/t/difference-between-supervised-and-unsupervised-learning/1196
https://www.quora.com/Which-is-better-for-data-analysis-R-or-Python
Git-GitHub, Safa
A very brief introduction to R, Matthew Keller & Steven Boker
Slide credit: D. Hoiem