1
Machine Learning
Slides: Isabelle Guyon, Erik Sudderth, Mark Johnson, Derek Hoiem, Lana Lazebnik
Photo: CMU Machine Learning Department protests G20
2
Machine Learning is… A branch of artificial intelligence concerned with the construction and study of systems that can learn from data. It studies how to automatically learn to make accurate predictions based on past observations.
3
Machine Learning is…
4
Machine Learning aka…
- data mining: machine learning applied to databases, i.e. collections of data
- inference and/or estimation in statistics
- pattern recognition in engineering
- signal processing in electrical engineering
- optimization
5
Supervised vs Unsupervised Learning
In supervised learning, the categories are known. In unsupervised learning they are not, and the learning process attempts to discover the appropriate categories.
6
Supervised vs Unsupervised Learning
7
Classification…
- Spam Detection: given the messages in an inbox, identify those that are spam and those that are not.
- Credit Card Fraud Detection: given a customer's credit card transactions for a month, identify those that were made by the customer and those that were not.
Both are binary decisions: good vs. evil…
8
Classification framework
f(image) = “apple”   f(image) = “tomato”   f(image) = “cow”
Other prediction tasks: text message prediction, Facebook friend suggestion, Netflix film prediction.
Slide credit: L. Lazebnik
9
Classification, cont.
y = f(x), where y is the output, f the prediction function, and x the input (image) feature.
- Training: given a training set of labeled examples {(x1, y1), …, (xN, yN)}, estimate the prediction function f by minimizing the prediction error on the training set.
- Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x). A sketch of this workflow follows.
Slide credit: L. Lazebnik
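The slides do not prescribe a learner or a library; here is a minimal sketch of the train/test workflow in Python with scikit-learn. The iris dataset, logistic-regression model, and train/test split are illustrative assumptions, not from the slide.

    # Training: estimate f on labeled examples; testing: apply f to unseen x.
    # Dataset and model choice are illustrative assumptions.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)                    # features x, labels y
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    f = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # training
    y_pred = f.predict(X_test)                           # testing: y = f(x)
    print("test accuracy:", f.score(X_test, y_test))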
10
The process
- Training: training images → image features, which together with the training labels are used to learn a model.
- Testing: test image → image features → learned model → prediction.
Slide credit: D. Hoiem and L. Lazebnik
11
Classifiers: Nearest neighbor
[Figure: training examples from class 1 and class 2, with a test example between them]
f(x) = label of the training example nearest to x
- All we need is a distance function for our inputs.
- No training required! (See the sketch below.)
Slide credit: L. Lazebnik
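A minimal sketch of this rule in Python; the toy points and the choice of Euclidean distance are illustrative assumptions.

    # 1-nearest-neighbor: label a test point with the label of the closest
    # training example. Toy data; Euclidean distance is the assumed metric.
    import numpy as np

    X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
    y_train = np.array([1, 1, 2, 2])                  # class labels

    def nearest_neighbor(x):
        dists = np.linalg.norm(X_train - x, axis=1)   # distance to every example
        return y_train[np.argmin(dists)]              # label of the nearest one

    print(nearest_neighbor(np.array([0.1, 0.0])))     # -> 1
    print(nearest_neighbor(np.array([1.1, 0.9])))     # -> 2

True to the slide, nothing is learned up front: all the work happens at query time, which is why only a distance function is needed.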
12
Classifiers: Linear Find a linear function to separate the classes:
f(x) = sgn(w · x + b)
Slide credit: L. Lazebnik
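The slide does not say how w and b are found. As one classic illustration (an assumption, not the slide's method), a minimal perceptron sketch in Python on toy data:

    # Linear classifier f(x) = sgn(w·x + b), with w and b found by the
    # classic perceptron update; data and labels are illustrative.
    import numpy as np

    X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
    y = np.array([-1, -1, 1, 1])                 # labels in {-1, +1}

    w, b = np.zeros(2), 0.0
    for _ in range(100):                         # nudge w, b on each mistake
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:
                w, b = w + yi * xi, b + yi

    f = lambda x: np.sign(np.dot(w, x) + b)
    print(f(np.array([0.1, 0.1])), f(np.array([1.0, 0.9])))   # -> -1.0 1.0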
13
Regression
- Data is labelled with a real value (numeric) rather than a discrete label.
- Useful for predicting time-series data, such as the price of a stock over time.
- The decision being modelled is what value to predict for new, unseen data.
- Learning a linear regression model means estimating the values of the coefficients used in the representation from the data we have available (see the sketch below).
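A minimal sketch of estimating the coefficients by ordinary least squares in Python; the synthetic data (y ≈ 2x + 1 plus noise) is an illustrative assumption.

    # Fit y = slope * x + intercept by minimizing squared prediction error.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=50)
    y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=50)   # synthetic data

    A = np.column_stack([x, np.ones_like(x)])            # design matrix [x, 1]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)         # minimize ||A·coef - y||²
    slope, intercept = coef
    print(f"slope={slope:.2f}, intercept={intercept:.2f}")   # close to 2 and 1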
14
Clustering (Data Mining)
- Data is not labelled.
- It can, however, be divided into groups based on similarity and other measures of natural structure in the data.
- Market segmentation is one of the best-known applications of cluster analysis (see the sketch below).
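The slide names no algorithm; k-means is one common choice for discovering such groups. A minimal sketch with scikit-learn (the toy points and k = 2 are illustrative assumptions):

    # Group unlabelled points into k clusters by proximity to cluster centres.
    import numpy as np
    from sklearn.cluster import KMeans

    X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
                  [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])   # e.g. customer features

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(km.labels_)            # group index assigned to each point
    print(km.cluster_centers_)   # the discovered "segment" centres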
15
Dimensionality Reduction
- Most algorithms work on columns (as variables).
- Datasets with thousands of variables make the algorithms run slower.
- It is important to reduce the number of columns in the data set while losing as little information as possible.
- Common techniques: Missing Values Ratio, Low Variance Filter, High Correlation Filter, PCA, Random Forests / Ensemble Trees, etc. (A PCA sketch follows.)
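A minimal sketch of one listed technique, PCA, with scikit-learn; the random data and the choice of five components are illustrative assumptions.

    # PCA: project the columns onto the few directions of highest variance,
    # shrinking the number of variables while keeping most of the information.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 50))              # 100 rows, 50 columns (variables)

    pca = PCA(n_components=5).fit(X)
    X_reduced = pca.transform(X)                # still 100 rows, only 5 columns
    print(X_reduced.shape)                      # (100, 5)
    print(pca.explained_variance_ratio_.sum())  # fraction of variance retained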
16
Generalization
[Figure: training set (labels known) vs. test set (labels unknown)]
How well does a learned model generalize from the data it was trained on to a new test set?
Slide credit: L. Lazebnik
17
Generalization: components of generalization error
- Bias: how much the average model over all training sets differs from the true model. Error due to inaccurate assumptions/simplifications made by the model.
- Variance: how much models estimated from different training sets differ from each other.
- Underfitting: the model is too “simple” to represent all the relevant class characteristics. High bias and low variance; high training error and high test error.
- Overfitting: the model is too “complex” and fits irrelevant characteristics (noise) in the data. Low bias and high variance; low training error and high test error.
Slide credit: L. Lazebnik
18
Bias-Variance Trade-off
Models with too few parameters are inaccurate because of a large bias (not enough flexibility). Models with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample). Slide credit: D. Hoiem
19
Bias-Variance Trade-off
E(MSE) = noise² + bias² + variance
- noise²: unavoidable error
- bias²: error due to incorrect assumptions
- variance: error due to variance of training samples
See the following for explanations of bias-variance (also Bishop's “Neural Networks” book). From the link above: you can only get generalization through assumptions. Thus, in order to minimize the MSE, we need to minimize both the bias and the variance. However, this is not trivial to do. For instance, just neglecting the input data and predicting the output somehow (e.g., just a constant) would definitely minimize the variance of our predictions: they would always be the same, thus the variance would be zero, but the bias of our estimate (i.e., the amount we are off from the real function) would be tremendously large. On the other hand, the neural network could perfectly interpolate the training data, i.e., predict y = t for every data point. This would make the bias term vanish entirely, since E(y) = f (insert this into the squared bias term above to verify), but the variance term would become equal to the variance of the noise, which may be significant (see also Bishop, Chapter 9, and the Geman et al. paper). In general, finding an optimal bias-variance tradeoff is hard, but acceptable solutions can be found, e.g., by means of cross-validation or regularization.
Slide credit: D. Hoiem
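Written out in standard notation (an addition, not from the slide: targets t = f(x) + ε with noise variance σ², prediction ŷ, expectations over training sets and noise), the decomposition reads:

    \mathbb{E}\big[(t - \hat{y})^2\big]
      = \underbrace{\sigma^2}_{\text{noise}^2}
      + \underbrace{\big(\mathbb{E}[\hat{y}] - f\big)^2}_{\text{bias}^2}
      + \underbrace{\mathbb{E}\big[(\hat{y} - \mathbb{E}[\hat{y}])^2\big]}_{\text{variance}}

The constant predictor and the interpolating network discussed above correspond to driving the third and second terms to zero, respectively.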
20
Bias-variance tradeoff
[Figure: error vs. model complexity. Training error falls steadily as complexity grows; test error falls, then rises again. Left side: underfitting (high bias, low variance). Right side: overfitting (low bias, high variance).]
Slide credit: D. Hoiem
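A minimal sketch reproducing the shape of this curve in Python; the noisy-sine data and the polynomial model family are illustrative assumptions. Training error keeps falling as the degree (complexity) grows, while test error typically rises again at the highest degrees.

    # Fit polynomials of increasing degree; compare training and test error.
    import numpy as np

    rng = np.random.default_rng(0)
    x_train = np.sort(rng.uniform(0, 3, 20))
    y_train = np.sin(x_train) + rng.normal(scale=0.2, size=20)
    x_test = np.sort(rng.uniform(0, 3, 200))
    y_test = np.sin(x_test) + rng.normal(scale=0.2, size=200)

    for degree in (1, 3, 6, 9):
        coeffs = np.polyfit(x_train, y_train, degree)      # least-squares fit
        train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
        test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
        print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")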
21
Toolkit
- R
- Python
- Stata
- VBA and SQL
- Git and GitHub
22
R
Advantages:
- Fast and free.
- State of the art: statistical researchers provide their methods as R packages. SPSS and SAS are years behind R!
- Highly customizable.
- Active user community.
- Excellent for simulation, programming, computer-intensive analyses, etc.
Disadvantages:
- Not user friendly at the start; steep learning curve, minimal GUI.
- Easy to make mistakes and not know it.
- Working with large datasets is limited by RAM.
(A very brief introduction to R, M. Keller)
23
Python
- Language with strong similarities to Perl and C, but with powerful typing and object-oriented features.
- Commonly used for producing HTML content on websites; great for text files.
- Useful built-in types (lists, dictionaries).
- Clean syntax, powerful extensions.
- Ease of use; interpreter.
- AI processing: Python has strong numeric processing capabilities (matrix operations, etc.), making it suitable for probability and machine learning code (see the sketch below).
Based on presentation from
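A tiny illustration of these points, assuming numpy for the matrix operations; all values are made up.

    # Built-in lists and dictionaries, plus numeric (matrix) processing.
    import numpy as np

    counts = {"spam": 120, "ham": 480}                             # dictionary
    labels = ["spam" if i % 5 == 0 else "ham" for i in range(10)]  # list

    A = np.array([[1.0, 2.0], [3.0, 4.0]])
    b = np.array([1.0, 1.0])
    print(A @ b)                                  # matrix-vector product: [3. 7.]
    print(counts["spam"] / sum(counts.values()))  # estimated P(spam) = 0.2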
24
Stata
- Typically used in the areas of economics and politics.
- User-friendly environment; pretty easy to learn.
- Ado files are available for extensions.
(Impact Evaluation in Practice)
25
VBA and SQL
- Visual Basic for Applications and Structured Query Language are both extensively used in BA.
- Ability to retrieve data stored in SQL format.
- Connections with R and Python are possible and available (see the sketch below).
- It is easier to work with R and Python once SQL is mastered.
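One minimal sketch of the Python side of such a connection, using the standard-library sqlite3 module; the table and query are illustrative assumptions, and R offers analogous database packages.

    # Retrieve data stored in SQL format from Python.
    import sqlite3

    con = sqlite3.connect(":memory:")           # throwaway in-memory database
    con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?)",
                    [("north", 120.0), ("south", 75.5), ("north", 60.25)])

    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    for row in con.execute(query):
        print(row)          # e.g. ('north', 180.25) then ('south', 75.5)
    con.close()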
26
Git-GitHub
- Git: version control system
- GitHub: repository hosting site
27
Git-GitHub
(Git-GitHub, Safa)
28
References
- Git-GitHub, Safa
- A very brief introduction to R, Matthew Keller & Steven Boker
- Slide credit: D. Hoiem