Foundational Issues Machine Learning 726 Simon Fraser University
2 Outline Functions vs. Probabilities The Curse of Dimensionality Bishop: Ch. 1.
3 Learning Functions Much learning is about predicting the value of a function from a list of features. Classification: discrete function values. Regression: continuous function values. Mathematical Representation: map a feature vector x to a target value y i.e. f(x)=y. Oliver’s heel pain example. Often most intuitive to think in terms of function learning.
4 Why Probabilities Another view: the goal is learning the probability of an outcome. Advantages: Rank outcomes, quantify uncertainty. Deals with noisy data. Helps with combining predictions and pipelining. Can incorporate base rate information (e.g., only 10% of heel pain is caused by tumor). Can incorporate knowledge about inverse function, e.g., from diagnosis to symptom. Bayes’ theorem: single formula with base rates and inverse probabilities.
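A minimal sketch of how Bayes' theorem combines a base rate with inverse probabilities. The 10% base rate comes from the slide. The two likelihood values, the finding E, and all function and variable names are hypothetical, chosen only for illustration.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H | E) by Bayes' theorem for a binary hypothesis H and evidence E."""
    numerator = p_e_given_h * prior
    evidence = numerator + p_e_given_not_h * (1.0 - prior)
    return numerator / evidence

prior_tumor = 0.10            # base rate from the slide
p_finding_given_tumor = 0.8   # hypothetical inverse probability P(E | tumor)
p_finding_given_other = 0.3   # hypothetical inverse probability P(E | no tumor)

p = posterior(prior_tumor, p_finding_given_tumor, p_finding_given_other)
print(round(p, 3))  # 0.08 / 0.35, about 0.229
```

Even with a finding that is much more likely under the tumor hypothesis, the low base rate keeps the posterior well below one half under these assumed numbers.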
5 Why not probabilities Disadvantage: exact probability values may be hard to obtain, and may be more than the task needs.
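6 From Functions to Probabilities Function + noise = probability: a deterministic function plus random noise defines a probability distribution over outcomes. See the scatterplot and logistic regression.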
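One way to read "function + noise = probability" is sketched below. It assumes Gaussian noise for the regression case and a sigmoid link for logistic regression. The function f, the noise level sigma, and the weights w and b are hypothetical placeholders, not values from the course.

```python
import math

def f(x):
    # Hypothetical deterministic function, chosen only for illustration.
    return 2.0 * x + 1.0

def gaussian_density(y, x, sigma=0.5):
    """p(y | x) when y = f(x) + Gaussian noise with standard deviation sigma."""
    z = (y - f(x)) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def logistic_prob(x, w=1.5, b=-1.5):
    """P(y = 1 | x): a linear score w*x + b passed through a sigmoid."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

# The density is highest at the noise-free prediction f(x) = 3.0 for x = 1.0.
print(gaussian_density(3.0, 1.0) > gaussian_density(4.0, 1.0))  # True
# At x = 1.0 the linear score is 0, so both classes are equally likely.
print(logistic_prob(1.0))  # 0.5
```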
7 From Probabilities to Functions Learning the probability of y given x can itself be cast as function learning: f(x,y) = P(y|x). E.g., neural nets that output probabilities.
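A small sketch of treating P(y|x) as an ordinary function f(x,y). It uses a single linear layer followed by a softmax, which is one common way for a network to output probabilities. The weights, biases, and the three-class setup are hypothetical.

```python
import math

# Hypothetical parameters: 2 input features, 3 class labels.
WEIGHTS = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]
BIASES = [0.0, 0.1, 0.0]

def class_probabilities(x):
    """Return [P(y=0|x), P(y=1|x), P(y=2|x)] via a softmax over linear scores."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
              for w, b in zip(WEIGHTS, BIASES)]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def f(x, y):
    """f(x, y) = P(y | x): the probability the model assigns to label y."""
    return class_probabilities(x)[y]

probs = class_probabilities([2.0, 0.5])
print(probs, sum(probs))
```

Learning then amounts to adjusting the parameters so that f(x,y) is high for the labels observed in the data.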
8 The curse of dimensionality In many applications, we have an abundance of features, e.g., a 20x20 image = 400 pixel values. Scaling standard ML methods to high-dimensional feature spaces is hard, both computationally and statistically. Statistics: the data do not cover the space; typically only a few of the possible data settings occur. Ways to cope: manifold learning; learning aggregate, global, or high-level features; unsupervised learning of feature hierarchies (deep learning). Discussion Question: does the brain do deep learning?
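A rough illustration of why data cannot cover a high-dimensional space, using the 20x20 image from the slide. It assumes each pixel is binary, which understates the problem for grayscale images. The dataset size of one million is a hypothetical figure.

```python
def num_settings(num_features, values_per_feature=2):
    """Number of distinct feature vectors when every feature is discrete."""
    return values_per_feature ** num_features

pixels = 20 * 20                  # 400 features, as on the slide
settings = num_settings(pixels)   # binary pixels (assumption)
dataset_size = 10**6              # hypothetical number of training images

# Even a million distinct images cover a vanishing fraction of the space.
fraction = dataset_size / settings
print(len(str(settings)))  # 121 digits in the count of possible images
print(fraction)
```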