Machine Learning Introduction.

Quotes
“If you were a current computer science student what area would you start studying heavily?” Answer: Machine Learning. “The ultimate is computers that learn” Bill Gates, Reddit AMA
“Machine learning is the next Internet” Tony Tether, Director, DARPA
“Machine learning is today’s discontinuity” Jerry Yang, CEO, Yahoo

Comparison
Traditional Programming: Data + Program -> Computer -> Output
Machine Learning: Data + Output -> Computer -> Program
Compare with Sorting: [diagram with the same four boxes: Computer, Data, Output, Program]
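The contrast can be sketched in a few lines of Python. This is only an illustration: the temperature task, the data, and the names `is_hot_handwritten` and `learn_threshold` are hypothetical and do not come from the slides. In the first function a person supplies the program; in the second, data plus desired outputs go in and a program (here, a threshold rule) comes out.

```python
# Traditional programming: a person writes the rule by hand.
def is_hot_handwritten(temp_c):
    return temp_c >= 25.0  # threshold picked by the programmer (hypothetical)


# Machine learning: data and desired outputs go in, a program comes out.
def learn_threshold(temps, labels):
    """Return a rule whose threshold makes the fewest mistakes on the examples."""
    best_t, best_errors = None, None
    for t in sorted(temps):
        # Count examples where the rule "temp >= t" disagrees with the label.
        errors = sum((x >= t) != y for x, y in zip(temps, labels))
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return lambda temp_c: temp_c >= best_t


# Hypothetical training data: temperatures and whether they count as "hot".
temps = [12.0, 18.0, 21.0, 27.0, 30.0, 33.0]
labels = [False, False, False, True, True, True]
is_hot_learned = learn_threshold(temps, labels)
```

Calling `is_hot_learned(28.0)` uses the learned program exactly like the handwritten one; the difference lies in where the rule came from.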

Where does ML fit in?

Learning
It is often hard to articulate the knowledge we need to build AI systems.
Often, we don’t even know it.
Frequently, we can arrange to build systems that learn it themselves.

What is Learning
The word "learning" has many different meanings. It is used, at least, to describe:
memorizing something
learning facts through observation and exploration
development of motor and/or cognitive skills through practice
organization of new knowledge into general, effective representations

Learning
Study of processes that lead to self-improvement of machine performance.
It implies the ability to use knowledge to create new knowledge, or to integrate new facts into an existing knowledge structure.
Learning typically requires repetition and practice to reduce differences between observed and desired performance.

What is Learning? Herbert Simon: “Learning is any process by which a system improves performance from experience.”

Learning Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience.

Learning & Adaptation
“Modification of a behavioral tendency by experience.” (Webster)
“A learning machine, broadly defined, is any device whose actions are influenced by past experiences.” (Nilsson)
“Any change in a system that allows it to perform better the second time on repetition of the same task or on another task drawn from the same population.” (Simon)

Negative Features of Human Learning
It's slow (5-6 years for motor skills, years for abstract reasoning)
Inefficient
Expensive
There is no copy process
Learning strategy is often a function of the knowledge available to the learner

Applications of ML
Learning to recognize spoken words: SPHINX
Learning to drive an autonomous vehicle: ALVINN
Learning to classify celestial objects
Learning to play world-class backgammon: TD-GAMMON
Designing the morphology and control structure of electro-mechanical artefacts: GOLEM

Motivating Problems Handwritten Character Recognition

Machine Translation

Speech Recognition

Motivating Problems Fingerprint Recognition (e.g., border control)

Example: object recognition
[Figure: example images X labeled f(x) = giraffe (three images) and f(x) = llama (three images); for a new image X, f(x) = ?]

Motivating Problems Face Recognition (security access to buildings, etc.)

Application: social network analysis
HP Labs data: 500 users, 20k connections, evolving over time

Spam vs. Regular (C) Dhruv Batra

Stock market

Weather prediction: temperature

Pose Estimation

Induction
If asked why we believe the sun will rise tomorrow, we shall naturally answer, 'Because it has always risen every day.' We have a firm belief that it will rise in the future, because it has risen in the past.

Induction
It has been argued that we have reason to know the future will resemble the past, because what was the future has constantly become the past, and has always been found to resemble the past, so that we really have experience of the future, namely of times which were formerly future, which we may call past futures. But such an argument really begs the very question at issue.

Different kinds of learning…
Supervised learning: Someone gives us examples and the right answer for those examples. We have to predict the right answer for unseen examples.
Unsupervised learning: We see examples but get no feedback. We need to find patterns in the data.
Reinforcement learning: We take actions and get rewards. We have to learn how to get high rewards.
Weakly or semi-supervised learning: Training data includes a few desired outputs.

Tasks
Supervised Learning
Classification: x -> y, with y discrete
Regression: x -> y, with y continuous
Unsupervised Learning
Clustering: x -> y, with y a discrete ID
Dimensionality Reduction: x -> y, with y continuous

Learning with a Teacher
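Supervised learning
knowledge represented by a set of input-output examples (xi, yi)
minimize the error between the actual response of the learner and the desired response
[Diagram: the Environment supplies state x to both the Teacher and the Learning system; the Teacher's desired response (+) and the Learning system's actual response (-) are summed into an error signal that is fed back to the Learning system]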

Kinds of learning
Supervised learning: Given a set of example input/output pairs, find a rule that does a good job of predicting the output associated with a new input. Let's say you are given the weights and lengths of a bunch of individual salmon, and the weights and lengths of a bunch of individual tuna. The job of a supervised learning system would be to find a predictive rule that, given the weight and length of a fish, would predict whether it was a salmon or a tuna.

Example of supervised learning: classification
We lend money to people. We have to predict whether they will pay us back or not.
People have various (say, binary) features: do we know their Address? do they have a Criminal record? high Income? Educated? Old? Unemployed?
We see examples (Y = paid back, N = not):
+a, -c, +i, +e, +o, +u: Y
-a, +c, -i, +e, -o, -u: N
+a, -c, +i, -e, -o, -u: Y
-a, -c, +i, +e, -o, -u: Y
-a, +c, +i, -e, -o, -u: N
-a, -c, +i, -e, -o, +u: Y
+a, -c, -i, -e, +o, -u: N
+a, +c, +i, -e, +o, -u: N
Next person is +a, -c, +i, -e, +o, -u. Will we get paid back?
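The eight lending examples are small enough to try out in code. The sketch below encodes them exactly as listed (1 for "+", 0 for "-") and compares two hypotheses that the slide itself does not propose: a hand-picked candidate rule ("paid back if high income and no criminal record") and a 3-nearest-neighbour vote using Hamming distance. Both the rule and the choice of k = 3 are assumptions made for illustration.

```python
from collections import Counter

# Feature order: address, criminal record, income, educated, old, unemployed.
# The eight training examples from the slide: 1 means "+", 0 means "-".
EXAMPLES = [
    ((1, 0, 1, 1, 1, 1), "Y"),
    ((0, 1, 0, 1, 0, 0), "N"),
    ((1, 0, 1, 0, 0, 0), "Y"),
    ((0, 0, 1, 1, 0, 0), "Y"),
    ((0, 1, 1, 0, 0, 0), "N"),
    ((0, 0, 1, 0, 0, 1), "Y"),
    ((1, 0, 0, 0, 1, 0), "N"),
    ((1, 1, 1, 0, 1, 0), "N"),
]
NEXT_PERSON = (1, 0, 1, 0, 1, 0)  # +a, -c, +i, -e, +o, -u


def rule(x):
    """Candidate hypothesis (an assumption): Y iff high income and no criminal record."""
    a, c, i, e, o, u = x
    return "Y" if i == 1 and c == 0 else "N"


def knn(x, k=3):
    """Majority vote among the k training examples closest in Hamming distance."""
    def distance(example):
        return sum(p != q for p, q in zip(example[0], x))
    nearest = sorted(EXAMPLES, key=distance)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Does the candidate rule agree with every training example?
rule_is_consistent = all(rule(x) == y for x, y in EXAMPLES)
rule_prediction = rule(NEXT_PERSON)
knn_prediction = knn(NEXT_PERSON)
```

The candidate rule fits all eight examples and predicts Y for the next person, while the 3-nearest-neighbour vote predicts N. The training data alone does not settle the answer; the choice of hypothesis class does.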

Learning by Examples
Concept: “days on which my friend Aldo enjoys his favourite water sports”
Task: predict the value of “Enjoy Sport” for an arbitrary day based on the values of the other attributes
[Table: example days described by the attributes Sky, Temp, Humid, Wind, Water and Forecast, with the target Enjoy Sport; values shown include Sunny, Rainy, Warm, Cold, Normal, High, Strong, Cool, Same, Change, Yes, No]

Unsupervised Learning
self-organized learning
no teacher
task-independent quality measure
identify regularities in the data and discover classes automatically
[Diagram: the Environment supplies its state to the Learning system]

Clustering Data: Group similar things

Face Clustering: iPhoto, Picasa (C) Dhruv Batra

Kinds of learning
Another, somewhat less well-specified, learning problem is clustering. Now you're given the descriptions of a bunch of different individual animals (or stars, or documents) in terms of a set of features (weight, number of legs, presence of hair, etc.), and the job is to divide them into groups that "make sense". What makes this different from supervised learning is that we are not told in advance what groups the animals should be put into; just that we should find a natural grouping.
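One common way to look for such a grouping is k-means, sketched below. The slides do not name a clustering algorithm, so k-means is a choice made here for illustration, and the animal data (weight in kg, number of legs) and the starting centers are hypothetical. No labels are given: the groups emerge from the feature values alone.

```python
def kmeans(points, centers, iterations=10):
    """Plain k-means: assign each point to its nearest center, then move the centers."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            # Squared Euclidean distance from p to every current center.
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[dists.index(min(dists))].append(p)
        # Each center moves to the mean of its group (unchanged if the group is empty).
        centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, groups


# Hypothetical animals described by (weight in kg, number of legs).
animals = [(4.0, 4), (5.0, 4), (6.0, 4), (0.02, 6), (0.01, 6), (0.03, 8)]
centers, groups = kmeans(animals, centers=[animals[0], animals[3]])
```

Here the two groups separate the heavier four-legged animals from the light six- and eight-legged ones. With features on very different scales, or a different number of centers, the "natural" grouping can change, which is part of why the problem is less well specified than supervised learning.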

Reinforcement Learning
Learning from feedback
[Diagram: x -> Reinforcement Learning -> y, where y is an action]

Reinforcement Learning: Learning to act
There is only one “supervised” signal at the end of the game.
But you need to make a move at every step.
RL deals with “credit assignment”.

Reinforcement learning
Another learning problem, familiar to most of us, is learning motor skills, like riding a bike. We call this reinforcement learning. It's different from supervised learning because no-one explicitly tells you the right thing to do; you just have to try things and see what makes you fall over and what keeps you upright.
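A heavily simplified sketch of this try-and-see loop follows. It is a one-step, bandit-style learner, so it does not address credit assignment over a sequence of moves. The three actions, the reward function, and the epsilon-greedy strategy are all assumptions made for illustration. The learner is never told the right action; it only observes a reward after acting.

```python
import random

ACTIONS = ["lean_left", "lean_right", "steer_into_fall"]


def environment(action):
    """Hypothetical bike: reward 1 for staying upright, 0 for falling over."""
    return 1.0 if action == "steer_into_fall" else 0.0


def learn(trials=200, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    totals = {a: 0.0 for a in ACTIONS}
    counts = {a: 0 for a in ACTIONS}
    for t in range(trials):
        if t < len(ACTIONS):
            action = ACTIONS[t]              # try every action once first
        elif rng.random() < epsilon:
            action = rng.choice(ACTIONS)     # explore: pick a random action
        else:                                # exploit: best average reward so far
            action = max(ACTIONS, key=lambda a: totals[a] / counts[a])
        reward = environment(action)         # the only feedback the learner gets
        totals[action] += reward
        counts[action] += 1
    # Report the action with the highest average reward.
    return max(ACTIONS, key=lambda a: totals[a] / counts[a])


best_action = learn()
```

Because every action is tried at least once and the rewards here are deterministic, the learner ends up preferring `steer_into_fall`.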

Learning a function
One way to think about learning is that we are trying to find the definition of a function, given a bunch of examples of its input and output. Learning how to pronounce words can be thought of as finding a function from letters to sounds. Learning to recognize handwritten characters can be thought of as finding a function from collections of image pixels to letters. Learning to diagnose diseases can be thought of as finding a function from lab test results to disease categories. We can think of at least three different problems being involved: memory, averaging, and generalization.

The red and the black
Imagine that we were given all these points, and we needed to guess a function of their x, y coordinates that would have one output for the red ones and a different output for the black ones.

What’s the right hypothesis?
In this case, it seems like we could do pretty well by defining a line that separates the two classes.

Now, what’s the right hypothesis?
Now, what if we have a slightly different configuration of points? We can't divide them conveniently with a line.

Now, what’s the right hypothesis?
But this parabola-like curve seems like it might be a reasonable separator.
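The difference between the two kinds of hypothesis can be checked on a small made-up point set. The coordinates below are hypothetical (the slide's actual points are not recoverable from the transcript), and the particular line y = 2 and parabola y = x**2 are chosen only for illustration.

```python
# Hypothetical points: (x, y, colour). Red points lie above the curve y = x**2.
POINTS = [
    (-2.0, 5.0, "red"), (0.0, 1.0, "red"), (2.0, 5.0, "red"),
    (-2.0, 3.0, "black"), (0.0, -1.0, "black"), (2.0, 3.0, "black"),
]


def line_hypothesis(x, y):
    """Linear separator: red above the horizontal line y = 2."""
    return "red" if y > 2.0 else "black"


def parabola_hypothesis(x, y):
    """Quadratic separator: red above the parabola y = x**2."""
    return "red" if y > x ** 2 else "black"


def accuracy(hypothesis):
    """Fraction of the points whose colour the hypothesis predicts correctly."""
    correct = sum(hypothesis(x, y) == colour for x, y, colour in POINTS)
    return correct / len(POINTS)


line_accuracy = accuracy(line_hypothesis)
parabola_accuracy = accuracy(parabola_hypothesis)
```

On these points the line gets half of the labels wrong while the parabola separates them perfectly, so a richer hypothesis space is needed for this configuration.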

Design a Learning System
We shall use handwritten character recognition as an example to illustrate the design issues and approaches.

Step 0: Let's treat the learning system as a black box.
[Diagram: a black box labeled Learning System, shown with the character Z]

Step 1: Collect Training Examples (Experience).
Without examples, our system will not learn (so-called learning from examples).
[Figure: sample handwritten digits 2, 3, 6, 7, 8, 9]

Step 2: Representing Experience
Choose a representation scheme for the experience/examples.
The sensor input is represented by an n-d vector, called the feature vector, X = (x1, x2, x3, …, xn)
(1,1,0,1,1,1,1,1,1,1,0,0,0,0,1,1,1, 1,1,0, …, 1) 64-d Vector
(1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1, 1,1,0, …, 1) 64-d Vector

Step 2: Representing Experience
Choose a representation scheme for the experience/examples.
The sensor input is represented by an n-d vector, called the feature vector, X = (x1, x2, x3, …, xn)
To represent the experience, we need to know what X is. So we need a corresponding vector D, which will record our knowledge (experience) about X.
The experience E is a pair of vectors E = (X, D)

Step 2: Representing Experience
So, what would D be like? There are many possibilities.
Assuming our system is to recognise 10 digits only, then D can be a 10-d binary vector, with each component corresponding to one of the digits:
D = (d0, d1, d2, d3, d4, d5, d6, d7, d8, d9)
e.g., if X is digit 5, then d5 = 1 and all others = 0
If X is digit 9, then d9 = 1 and all others = 0

Step 2: Representing Experience
So, what would D be like? There are many possibilities.
Assuming our system is to recognise 10 digits only, then D can be a 10-d binary vector, with each component corresponding to one of the digits:
D = (d0, d1, d2, d3, d4, d5, d6, d7, d8, d9)
X = (1,1,0,1,1,1,1,1,1,1,0,0,0,0,1,1,1, 1,1,0, …, 1); 64-d Vector
D = (0,0,0,0,0,1,0,0,0,0)
X = (1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1, 1,1,0, …, 1); 64-d Vector
D = (0,0,0,0,0,0,0,0,1,0)
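The 10-d binary vector D is what is usually called a one-hot encoding. A minimal sketch follows; the function names `one_hot` and `decode` are illustrative and not taken from the slides.

```python
def one_hot(digit, num_classes=10):
    """Return the desired-output vector D with a single 1 at position `digit`."""
    if not 0 <= digit < num_classes:
        raise ValueError("digit out of range")
    return tuple(1 if k == digit else 0 for k in range(num_classes))


def decode(d):
    """Recover the digit from D as the index of its largest component."""
    return max(range(len(d)), key=lambda k: d[k])


d_five = one_hot(5)   # the D vector of the first example pair
d_eight = one_hot(8)  # the D vector of the second example pair
```

Decoding by the largest component also works when a trained system outputs real-valued scores rather than exact 0s and 1s.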

Step 3: Choose a Representation for the Black Box
We need to choose a function F to approximate the black box. For a given X, the value of F will give the classification of X. There is considerable flexibility in choosing F.
[Diagram: X -> Learning System F -> F(X)]

Step 4: Learning/Adjusting the Weights
We need a learning algorithm to adjust the weights W, the adjustable parameters of F, so that the experience/prior knowledge in the training data is learned into the system: for each example E = (X, D), we want F(W, X) = D.

Step 4: Learning/Adjusting the Weights
[Diagram: for each example E = (X, D), X is fed to the Learning System F(W), which outputs F(W, X); the error signal Error = D - F(W, X) is used to adjust W]
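One concrete way to realise this loop is the perceptron learning rule, sketched below. The slides leave F unspecified, so the single threshold unit, the update W <- W + rate * Error * X, the scalar target D, and the toy logical-AND data are all assumptions made for illustration.

```python
def F(W, X):
    """Threshold unit: output 1 if the weighted sum of the inputs is positive, else 0."""
    return 1 if sum(w * x for w, x in zip(W, X)) > 0 else 0


def train(examples, n_inputs, rate=1.0, max_epochs=100):
    """Adjust W with the perceptron rule until no training example is misclassified."""
    W = [0.0] * n_inputs
    for _ in range(max_epochs):
        mistakes = 0
        for X, D in examples:
            error = D - F(W, X)  # Error = D - F(W, X)
            if error != 0:
                mistakes += 1
                # Move each weight in the direction that reduces the error.
                W = [w + rate * error * x for w, x in zip(W, X)]
        if mistakes == 0:
            break
    return W


# Hypothetical toy task: logical AND. The last input is a constant 1 (bias term).
EXAMPLES = [((0, 0, 1), 0), ((0, 1, 1), 0), ((1, 0, 1), 0), ((1, 1, 1), 1)]
W = train(EXAMPLES, n_inputs=3)
```

After training, `F(W, X)` reproduces D for every example. For the digit task, the same idea would use one such output per component of the 10-d vector D.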

Step 5: Use/Test the System
Once learning is completed, all parameters are fixed. When an unknown input X is presented, the system computes its answer according to F(W, X).
[Diagram: X -> Learning System F(W) -> F(W, X) = Answer]

Learning methods
Decision rules: If income < $ then reject
Bayesian network: P(good | income, credit history, …)
Neural Network: [figure]
Nearest Neighbor: Take the same decision as for the customer in the database that is most similar to the applicant

Learning Methods
One of the most popular learning algorithms makes hypotheses in the form of decision trees. In a decision tree, each node represents a question, and the arcs represent possible answers. We use all the data to build such a tree.

Decision Trees
Hypotheses like this are nice because they're relatively easily interpretable by humans. So, in some cases, we run a learning algorithm on some data and then show the results to experts in the area (astronomers, physicians), and they find that the learning algorithm has found some regularities in their data that are of real interest to them.

Neural Networks
They can represent complicated hypotheses in high-dimensional continuous spaces. They are attractive as a computational model because they are composed of many small computing units. They were motivated by the structure of neural systems in parts of the brain. Now it is understood that they are not an exact model of neural function, but they have proved to be useful from a purely practical perspective.

If…then rules
If tear production rate = reduced then recommendation = none
If age = young and astigmatic = no then recommendation = soft
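Rules of this form map directly onto code. In the sketch below, the function name `recommend`, the decision to apply the rules in the order listed, and the `None` result when no rule fires are assumptions; the attribute value "normal" in the usage lines is also hypothetical, since the slide shows only two rules.

```python
def recommend(tear_production_rate, age, astigmatic):
    """Apply the two rules in order; return None when neither rule fires."""
    if tear_production_rate == "reduced":
        return "none"
    if age == "young" and astigmatic == "no":
        return "soft"
    return None  # no rule covers this case


first = recommend("reduced", "young", "no")    # the first rule fires
second = recommend("normal", "young", "no")    # the second rule fires
uncovered = recommend("normal", "old", "yes")  # neither rule applies
```

A full rule set would need either more rules or a default recommendation so that every case is covered.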

Evaluating Inductive Hypotheses
Accuracy of hypotheses on training data is obviously biased since the hypothesis was constructed to fit this data. Accuracy must be evaluated on an independent (usually disjoint) test set. The larger the test set is, the more accurate the measured accuracy and the lower the variance observed across different test sets.

Variance in Test Accuracy
Let error_S(h) denote the fraction of examples in an independently sampled test set S of size n that are incorrectly classified by hypothesis h.
Let error_D(h) denote the true error rate for the overall data distribution D.
When n is at least 30, the central limit theorem ensures that the distribution of error_S(h) for different random samples will be closely approximated by a normal (Gaussian) distribution.
[Plot: the distribution P(error_S(h)) over error_S(h), centered at error_D(h)]
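Under this normal approximation, a confidence interval for the true error can be estimated from a single test set. The formula used below, error_S(h) ± z * sqrt(error_S(h) * (1 - error_S(h)) / n), is the standard normal approximation to the binomial and is not stated on the slide, so treat it as an assumption. The counts (12 errors in 40 test examples) are hypothetical.

```python
import math


def error_confidence_interval(num_errors, n, z=1.96):
    """Approximate confidence interval for the true error; z = 1.96 gives about 95%."""
    error_s = num_errors / n                              # sample error error_S(h)
    std = math.sqrt(error_s * (1.0 - error_s) / n)        # estimated standard deviation
    return error_s - z * std, error_s + z * std


# Hypothetical test result: 12 misclassified examples out of n = 40.
low, high = error_confidence_interval(num_errors=12, n=40)
```

With these numbers the interval is roughly 0.16 to 0.44 around the sample error of 0.30. Quadrupling n would halve its width, which is the sense in which larger test sets give more accurate estimates.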

Projects Gesture Activated Interactive Assistant
