Machine Learning
Motivation for machine learning
How to set up a problem
How to design a learner
Introduce one class of learners (ANN)
–Perceptrons
–Feed-forward networks (back-propagation)
–Other types of networks
Components of a “Well-Posed” Learning Problem
Task: the domain of the problem
Experience: information about the domain
Performance measure: a metric to judge how well the trained system solves the task
Learner: a computer program whose performance on the task improves (according to the metric) with more experience
Example: Classification
Task: predict whether the user will like a movie or not
Experience: a database of movies the user has seen, with the user’s ratings for them
Performance measure: percentage of times the system correctly predicts the user’s preference
Example: Speech Recognition
Task: take dictation from the user
Experience: a collection of recordings of acoustic utterances with their transcriptions
Performance measure: percentage of words correctly identified
Example: Function Modeling
Task: approximate an unknown function f(x)
Experience: a set of data points {x_i, f(x_i)}
Performance measure: average error between f(x), the target function, and h(x), the function the system learned, over m test points, e.g. E = (1/m) Σ_i |f(x_i) − h(x_i)|
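This measure fits in a few lines of Python; the sketch below assumes the mean-absolute-error variant written above (the name average_error is illustrative, and mean squared error would serve equally well):

```python
def average_error(f, h, test_points):
    """Mean absolute error between the target f and the learned h
    over m test points."""
    return sum(abs(f(x) - h(x)) for x in test_points) / len(test_points)
```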
Designing a Learner
Training experience
–What kind of feedback is available?
–Is the sample representative?
–Does the learner have control over the examples?
Target function
–Specify the expected behavior
Function representation
–Specify the form and parameters
Learning algorithm
Artificial Neural Networks
Inspired by neurobiology
A network is made up of massively interconnected “neurons”
Good for some learning problems:
–Noisy training examples (contain errors)
–Target function input is best described by a vector (e.g., robot sensor data)
–Target function is continuous (differentiable)
Perceptron
[Figure: a single unit with a fixed input 1 weighted by w_0 and inputs x_1 … x_n weighted by w_1 … w_n, producing output O ∈ {−1, +1}]
n weighted inputs: In = w_0 + x_1·w_1 + x_2·w_2 + … + x_n·w_n = x·w
An activation function g: O = g(In) = +1 if In > 0, −1 otherwise
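As a concrete illustration, here is a minimal Python sketch of the unit above (the function name perceptron_output is ours, not from the slides):

```python
def perceptron_output(weights, x):
    """weights = [w0, w1, ..., wn]; x = [x1, ..., xn].
    Computes In = w0 + x1*w1 + ... + xn*wn, then applies the
    threshold activation g: +1 if In > 0, -1 otherwise."""
    weighted_sum = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if weighted_sum > 0 else -1
```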
Training a Perceptron
Quantify the error: compare the output o with the correct answer t
Update the weights to minimize the error: w_i ← w_i + α (t − o) x_i, where α is a constant, the learning rate
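A sketch of this training loop, assuming a fixed learning rate and the perceptron_output unit sketched earlier; it sweeps the examples until none are misclassified or an epoch limit is reached:

```python
def train_perceptron(examples, n_inputs, alpha=0.1, max_epochs=100):
    """examples: list of (x, t) pairs, x a list of n_inputs values,
    t the correct answer in {-1, +1}. Returns the learned weights."""
    weights = [0.0] * (n_inputs + 1)        # w0 (bias) plus w1..wn
    for _ in range(max_epochs):
        converged = True
        for x, t in examples:
            o = perceptron_output(weights, x)   # unit from earlier sketch
            if o != t:
                converged = False
                weights[0] += alpha * (t - o)           # bias input is 1
                for i, xi in enumerate(x, start=1):
                    weights[i] += alpha * (t - o) * xi  # w_i += a(t-o)x_i
        if converged:
            break
    return weights
```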
How Powerful Are Perceptrons?
A perceptron can represent simple Boolean functions
–AND, OR, NOT (one workable set of weights is sketched below)
A network of perceptrons can represent any Boolean function
A perceptron cannot represent XOR
–Why? (see the note in the sketch below and the following slide)
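These weights are one illustrative choice (not from the slides) under which the earlier perceptron_output unit computes AND, OR, and NOT; no such choice exists for XOR:

```python
# One workable set of weights per function (illustrative choice).
and_weights = [-1.5, 1.0, 1.0]   # In = -1.5 + x1 + x2 > 0 only for (1, 1)
or_weights  = [-0.5, 1.0, 1.0]   # In = -0.5 + x1 + x2 > 0 unless (0, 0)
not_weights = [0.5, -1.0]        # In =  0.5 - x1      > 0 only for x1 = 0

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x,
          perceptron_output(and_weights, list(x)),   # AND: -1 -1 -1 +1
          perceptron_output(or_weights, list(x)))    # OR:  -1 +1 +1 +1
# XOR would need (0,1) and (1,0) above the threshold but (0,0) and
# (1,1) below it -- no single line through the input plane does that.
```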
Linearly Separable
A perceptron can represent exactly the functions whose positive and negative examples a line (hyperplane) can separate; XOR is not one of them
Refer to the pictures in R&N Fig. 19.9
Gradient Descent
Define the error as a continuous (differentiable) function of the weights
Search through the weight space by following the error gradient downhill
Guarantees convergence (toward a minimum-error hypothesis)
Can approximate functions that are not linearly separable
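A minimal sketch of gradient descent for a single unthresholded linear unit, assuming the usual sum-of-squared-errors E(w) = ½ Σ (t − w·x)²; the function name and the batch-update style are our illustrative choices:

```python
def gradient_descent(examples, n_inputs, alpha=0.01, epochs=1000):
    """Minimize E(w) = 1/2 * sum((t - w.x)^2) over the training
    examples for a linear unit. examples: list of (x, t) pairs."""
    w = [0.0] * (n_inputs + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)                  # dE/dw_i, accumulated
        for x, t in examples:
            xs = [1.0] + list(x)               # prepend the bias input
            o = sum(wi * xi for wi, xi in zip(w, xs))
            for i, xi in enumerate(xs):
                grad[i] += -(t - o) * xi
        w = [wi - alpha * gi for wi, gi in zip(w, grad)]
    return w
```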
Multilayer Network
[Figure: input units x_1 … x_n connect to hidden units u_i through weights w_ni; hidden units connect to output units u_j through weights w_ij, producing outputs O_j]
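A sketch of the forward pass through such a network, assuming sigmoid activations for the hidden and output layers (the slides do not fix the activation; sigmoid is the common choice that the back-propagation rules below rely on):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, hidden_w, output_w):
    """hidden_w[i] and output_w[j] are weight lists [w0, w1, ...] for
    hidden unit u_i and output unit u_j. Returns (hidden, outputs)."""
    hidden = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
              for w in hidden_w]
    outputs = [sigmoid(w[0] + sum(wi * h for wi, h in zip(w[1:], hidden)))
               for w in output_w]
    return hidden, outputs
```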
Training a Multilayer Network
Need to update the weights to minimize the error, but…
–How do we fairly assign a portion of the “blame” to each weight?
–In a multilayer network, a weight may (eventually) contribute to multiple outputs
–We need to back-propagate the error
Back-Propagation (standard rules for sigmoid units, with learning rate α)
Between a hidden unit u_i and an output unit u_j:
Δw_ij = α · δ_j · u_i, where δ_j = O_j (1 − O_j)(t_j − O_j)
Between an input unit x_n and a hidden unit u_i:
Δw_ni = α · δ_i · x_n, where δ_i = u_i (1 − u_i) Σ_j w_ij δ_j
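Put together with the forward pass above, this is a sketch of one back-propagation update for a single training example (the helper name backprop_step and the in-place update style are our illustrative choices):

```python
def backprop_step(x, targets, hidden_w, output_w, alpha=0.5):
    """One weight update for a single example (x, targets), using the
    forward() sketch above; modifies the weight lists in place."""
    hidden, outputs = forward(x, hidden_w, output_w)

    # Output error terms: delta_j = O_j(1 - O_j)(t_j - O_j)
    delta_out = [o * (1 - o) * (t - o) for o, t in zip(outputs, targets)]

    # Hidden error terms: delta_i = u_i(1 - u_i) * sum_j w_ij * delta_j
    delta_hid = [h * (1 - h) *
                 sum(output_w[j][i + 1] * delta_out[j]
                     for j in range(len(output_w)))
                 for i, h in enumerate(hidden)]

    # Hidden -> output weights: w_ij += alpha * delta_j * u_i
    for j, w in enumerate(output_w):
        w[0] += alpha * delta_out[j]               # bias input is 1
        for i, h in enumerate(hidden):
            w[i + 1] += alpha * delta_out[j] * h

    # Input -> hidden weights: w_ni += alpha * delta_i * x_n
    for i, w in enumerate(hidden_w):
        w[0] += alpha * delta_hid[i]
        for n, xn in enumerate(x):
            w[n + 1] += alpha * delta_hid[i] * xn
```

Repeating this step over all training examples for many epochs drives the squared output error downhill, one weight at a time.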
Artificial Neural Network Summary
Expressiveness: can approximate any function of a set of attributes
Computational efficiency: may take a long time to train to convergence
Generalization: generalizes well
Sensitivity to noise: very tolerant
Transparency: poor; a trained network can only be used as a black box (the learned weights are hard to interpret)
Prior knowledge: difficult to incorporate