CS 790 Machine Learning: Introduction
Ali Borji, UWM
Motivating Problems: Handwritten Character Recognition
Motivating Problems: Fingerprint Recognition (e.g., border control)
Motivating Problems: Face Recognition (e.g., security access to buildings)
Can Machines Learn to Solve These Problems?
Or, to be more precise: can we program machines to learn to do these tasks?
Definition of Learning
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E (Mitchell, Machine Learning, McGraw-Hill, 1997)
Definition of Learning
What does this mean exactly? For the handwriting recognition problem:
Task T: recognizing handwritten characters
Performance measure P: percentage of characters correctly classified
Training experience E: a database of handwritten characters with given classifications
Design a Learning System
We shall use handwritten character recognition as an example to illustrate the design issues and approaches.
Design a Learning System
Step 0: Let's treat the learning system as a black box (a "Learning System" producing an output Z).
Design a Learning System
Step 1: Collect Training Examples (Experience). Without examples, our system will not learn (so-called learning from examples).
Design a Learning System
Step 2: Representing Experience
Choose a representation scheme for the experience/examples. The sensor input is represented by an n-d vector, called the feature vector, X = (x1, x2, x3, ..., xn). For example, a scanned digit can be encoded as a 64-d binary vector such as (1,1,0,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,0, ..., 1) or (1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0, ..., 1).
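As a concrete illustration, here is a minimal sketch of building such a feature vector, assuming an 8x8 binary digit image and NumPy (both the image size and the library are assumptions, not part of the original slides):

```python
import numpy as np

# Hypothetical 8x8 binary image of a handwritten digit (1 = ink, 0 = background);
# the 8x8 size is an assumption chosen so the flattened vector is 64-d.
image = np.random.default_rng(0).integers(0, 2, size=(8, 8))

# Flatten the 2-D image into the feature vector X = (x1, x2, ..., x64).
X = image.reshape(-1)
print(X.shape)  # (64,)
```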
Design a Learning System
Step 2: Representing Experience
Choose a representation scheme for the experience/examples. The sensor input is represented by an n-d vector, called the feature vector, X = (x1, x2, x3, ..., xn). To represent the experience, we also need to know what X is, so we need a corresponding vector D, which will record our knowledge (experience) about X. The experience E is a pair of vectors, E = (X, D).
Design a Learning System
Step 2: Representing Experience
Choose a representation scheme for the experience/examples. The experience E is a pair of vectors, E = (X, D). So, what would D be like? There are many possibilities.
Design a Learning System
Step 2: Representing Experience
So, what would D be like? There are many possibilities. Assuming our system is to recognise 10 digits only, D can be a 10-d binary vector, each component corresponding to one of the digits: D = (d0, d1, d2, d3, d4, d5, d6, d7, d8, d9). E.g., if X is the digit 5, then d5 = 1 and all others are 0; if X is the digit 9, then d9 = 1 and all others are 0.
Design a Learning System
Step 2: Representing Experience
So, what would D be like? There are many possibilities. Assuming our system is to recognise 10 digits only, D can be a 10-d binary vector, each component corresponding to one of the digits: D = (d0, d1, d2, d3, d4, d5, d6, d7, d8, d9). For example:
X = (1,1,0,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,0, ..., 1); 64-d vector, D = (0,0,0,0,0,1,0,0,0,0)
X = (1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0, ..., 1); 64-d vector, D = (0,0,0,0,0,0,0,0,1,0)
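A minimal sketch of building the target vector D (the 10-class one-hot encoding described above; the helper name is made up for illustration):

```python
import numpy as np

def one_hot(digit, num_classes=10):
    """Return the target vector D = (d0, ..., d9) with a 1 in the digit's position."""
    D = np.zeros(num_classes)
    D[digit] = 1.0
    return D

print(one_hot(5))  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
print(one_hot(9))  # [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
```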
Design a Learning System
Step 3: Choose a Representation for the Black Box
We need to choose a function F to approximate the black box. For a given X, the value of F will give the classification of X. There is considerable flexibility in choosing F.
Design a Learning System
Step 3: Choose a Representation for the Black Box
F will be a function of some adjustable parameters, or weights, W = (w1, w2, w3, ..., wN), which the learning algorithm can modify or learn; the system's output for input X is F(W, X).
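The slides leave the choice of F open. As one hedged example, a linear map from the 64-d input to 10 outputs could serve as F (the shapes and the use of NumPy are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# One possible choice of F: a linear map from the 64-d input to 10 outputs,
# parameterized by a weight matrix W of shape (10, 64).
W = rng.normal(scale=0.01, size=(10, 64))

def F(W, X):
    # Returns one score per digit class.
    return W @ X

X = rng.integers(0, 2, size=64).astype(float)
print(F(W, X).shape)  # (10,)
```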
Design a Learning System
Step 4: Learning/Adjusting the Weights
We need a learning algorithm to adjust the weights so that the experience/prior knowledge in the training data is learned into the system: for each training pair E = (X, D), we want F(W, X) = D.
Design a Learning System
Step 4: Learning/Adjusting the Weights
For each training pair E = (X, D), present X to the system, compare the output F(W, X) with the desired output D, and adjust W to reduce the error: Error = D - F(W, X).
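A minimal sketch of one such weight update, assuming the linear F from the earlier sketch (this is ordinary gradient descent on the squared error, not necessarily the algorithm the course will use):

```python
import numpy as np

def learn_step(W, X, D, eta=0.01):
    """One weight update: move W so that F(W, X) = W @ X gets closer to D.

    Gradient descent on the squared error ||D - F(W, X)||^2, which for a
    linear F reduces to the delta rule.
    """
    error = D - W @ X                       # Error = D - F(W, X)
    return W + eta * np.outer(error, X)     # adjust W to reduce the error
```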
Design a Learning System
Step 5: Use/Test the System
Once learning is completed, all parameters are fixed. When an unknown input X is presented, the system computes its answer according to F(W, X).
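Continuing the same sketch, the answer for an unknown X could be read off as the digit with the largest output (again an illustrative choice, not prescribed by the slides):

```python
import numpy as np

def classify(W, X):
    """With W fixed after learning, return the digit whose score F(W, X) is largest."""
    scores = W @ X
    return int(np.argmax(scores))
```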
Revision of Some Basic Maths
Vectors and matrices: row vector, column vector, vector transposition; vector length/norm; inner/dot product; matrix (and matrix-vector) multiplication.
Linear algebra: Euclidean space.
Basic calculus: partial derivatives, gradient, chain rule.
Revision of Some Basic Maths
Inner/dot product: for x = [x1, x2, ..., xn]^T and y = [y1, y2, ..., yn]^T, the inner/dot product of x and y is x^T y = x1*y1 + x2*y2 + ... + xn*yn. Matrix/vector multiplication applies this dot product row by row.
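A small numerical sketch of both operations (NumPy is an assumption; the vectors are arbitrary examples):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# Inner/dot product: x^T y = x1*y1 + x2*y2 + ... + xn*yn
print(x @ y)  # 32.0

# Matrix-vector multiplication: each output element is the dot product
# of one row of A with x.
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])
print(A @ x)  # [7. 5.]
```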
Revision of Some Basic Maths
Vector space/Euclidean space
A vector space V is a set that is closed under finite vector addition and scalar multiplication. The basic example is n-dimensional Euclidean space, where every element is represented by a list of n real numbers. An n-dimensional real vector corresponds to a point in Euclidean space: [1, 3] is a point in 2-dimensional space, and [2, 4, 6] is a point in 3-dimensional space.
Revision of Some Basic Maths
Vector space/Euclidean space
Dot/inner product and Euclidean distance: let x and y be two normalized n-d vectors, ||x|| = 1 and ||y|| = 1. Then (see the relation below) minimizing the Euclidean distance between the two vectors corresponds to maximizing their inner product, so both the Euclidean distance and the inner product can serve as similarity measures.
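The relation referred to above appeared as an image in the original slide; in standard form it is:

```latex
\|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2\,x^{\mathsf T}y = 2 - 2\,x^{\mathsf T}y
\qquad \text{when } \|x\| = \|y\| = 1
```

Since the first two terms are constant for unit-norm vectors, the distance is smallest exactly when the inner product x^T y is largest.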
Revision of Some Basic Maths
Basic Calculus
Multivariable function y = f(x1, x2, ..., xn). Partial derivative: gives the direction and speed of change of y with respect to xi. Gradient: the vector of all partial derivatives. Chain rule: for y = f(g(x)) with u = g(x), and for z = f(x, y) with x = g(t) and y = h(t), the derivatives compose as shown below.
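The corresponding formulas (images in the original slide) are, in standard form:

```latex
% Gradient of a multivariable function y = f(x_1, \dots, x_n):
\nabla f = \Bigl(\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}\Bigr)

% Chain rule, scalar case: y = f(g(x)), u = g(x)
\frac{dy}{dx} = \frac{df}{du}\,\frac{du}{dx}

% Chain rule with two intermediate variables: z = f(x, y), x = g(t), y = h(t)
\frac{dz}{dt} = \frac{\partial z}{\partial x}\,\frac{dx}{dt}
              + \frac{\partial z}{\partial y}\,\frac{dy}{dt}
```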
Feature Space
Representing real-world objects (here, elliptical blobs) using feature vectors: each object i is measured to give a feature vector X(i) = [x1(i), x2(i)], a point in the feature space with axes x1 and x2.
From objects to feature vectors to points in the feature space: the 16 example objects map to the points X(1), ..., X(16).
Representing General Objects
Feature vectors can be built for faces, cars, fingerprints, gestures, emotions (a smiling face, a sad expression, etc.), and more.
Further Reading: T. M. Mitchell, Machine Learning, McGraw-Hill International Edition, 1997, Chapter 1.
Tutorial/Exercise Questions
1. Describe informally, in one paragraph of English, the task of learning to recognize handwritten numerical digits.
2. Describe the steps involved in designing a learning system for the task in question 1, giving as much detail as possible about what has to be done in each step.
3. Redo questions 1 and 2 for the tasks of learning to recognize human faces and fingerprints, respectively.
4. In the lecture we used a very long binary vector to represent handwritten digits. Can you think of other representation methods?
What is Machine Learning?
Machine learning adapts to / learns from data in order to optimize a performance function. It can be used to extract knowledge from data, learn tasks that are difficult to formalise, and create software that improves over time.
When to Learn
Learning is useful when human expertise does not exist (navigating on Mars), when humans are unable to explain their expertise (speech recognition), when the solution changes over time (routing on a computer network), or when the solution needs to be adapted to particular cases (user biometrics).
Learning involves building general models from data. Data is cheap and abundant, while knowledge is expensive and scarce: from customer transactions to computer behaviour, the aim is to build a model that is a good and useful approximation to the data.
Applications
Speech and handwriting recognition, autonomous robot control, data mining and bioinformatics (motifs, alignment, ...), playing games, fault detection, clinical diagnosis, spam detection, credit scoring and fraud detection, web mining (search engines), market basket analysis, and more. Applications are diverse, but the methods are generic.
Generic Methods
Learning from labelled data (supervised learning), e.g. classification, regression, prediction, function approximation.
Learning from unlabelled data (unsupervised learning), e.g. clustering, visualisation, dimensionality reduction.
Learning from sequential data, e.g. speech recognition, DNA data analysis.
Learning associations.
Reinforcement learning.
Statistical Learning Machine learning methods can be unified within the framework of statistical learning: Data is considered to be a sample from a probability distribution. Typically, we don’t expect perfect learning but only “probably correct” learning. Statistical concepts are the key to measuring our expected performance on novel problem instances.
Induction and inference
Induction: Generalizing from specific examples. Inference: Drawing conclusions from possibly incomplete knowledge. Learning machines need to do both.
Inductive learning
Data is produced by a "target". A hypothesis is learned from the data in order to "explain", "predict", "model" or "control" the target. Generalisation ability is essential. Inductive learning hypothesis: "If the hypothesis works for enough data then it will work on new examples."
Example 1: Hand-written digits
Data representation: greyscale images.
Task: classification (0, 1, 2, ..., 9).
Problem features: highly variable inputs from the same class, including some "weird" inputs; imperfect human classification; high cost associated with errors, so a "don't know" output may be useful.
Example 2: Speech recognition
Data representation: features from spectral analysis of speech signals (two in this simple example).
Task: classification of vowel sounds in words of the form "h-?-d".
Problem features: highly variable data with the same classification; good feature selection is very important. Speech recognition is often broken into a number of smaller tasks like this.
Example 3: DNA microarrays
DNA from ~10000 genes attached to a glass slide (the microarray). Green and red labels attached to mRNA from two different samples. mRNA is hybridized (stuck) to the DNA on the chip and green/red ratio is used to measure relative abundance of gene products.
DNA microarrays
Data representation: ~10000 green/red intensity levels.
Tasks: sample classification, gene classification, visualisation and clustering of genes/samples.
Problem features: high-dimensional data but a relatively small number of examples; extremely noisy data (noise ~ signal); lack of good domain knowledge.
Projection of the 10000-dimensional data onto 2D using PCA effectively separates cancer subtypes.
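A minimal sketch of such a 2D PCA projection, assuming NumPy and a random toy matrix in place of the real microarray data (both are assumptions; this is not the analysis behind the original figure):

```python
import numpy as np

def pca_2d(X):
    """Project rows of X (n_samples x n_features) onto the first two principal components."""
    Xc = X - X.mean(axis=0)                        # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                           # n_samples x 2 coordinates

# Toy stand-in for a microarray matrix: 40 samples x 10000 genes.
X = np.random.default_rng(0).normal(size=(40, 10000))
print(pca_2d(X).shape)  # (40, 2)
```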
Probabilistic models
A large part of the module will deal with methods that have an explicit probabilistic interpretation. Such methods are good for dealing with uncertainty (e.g. is a handwritten digit a three or an eight?), provide interpretable results, and unify methods from different fields.
Textbooks: E. Alpaydin's "Introduction to Machine Learning"
T. Mitchell’s “Machine Learning”
Supervised Learning: Uses
Prediction of future cases, knowledge extraction, compression, outlier detection.
Unsupervised Learning
Clustering: grouping similar instances. Example applications: customer segmentation in CRM, learning motifs in bioinformatics, clustering items based on similarity, clustering users based on interests.
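As an illustrative sketch of clustering, here is a minimal k-means example using scikit-learn on toy two-feature data (the library choice and the toy features are assumptions, not part of the original slides):

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: two feature columns (e.g., spending and visit frequency per customer),
# drawn from two well-separated groups.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(50, 2)),
               rng.normal(5, 1, size=(50, 2))])

# Group the instances into 2 clusters by similarity.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels[:5], labels[-5:])
```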
Reinforcement Learning
Learning a policy: a sequence of outputs. There is no supervised output, only a delayed reward, which leads to the credit assignment problem. Examples: game playing, a robot in a maze, multiple agents, partial observability.