Machine Learning Introduction.

Slides from: Doug Gray, David Poole


Quotes
"If you were a current computer science student, what area would you start studying heavily?" Answer: Machine learning. "The ultimate is computers that learn." (Bill Gates, Reddit AMA)
"Machine learning is the next Internet." (Tony Tether, Director, DARPA)
"Machine learning is today's discontinuity." (Jerry Yang, CEO, Yahoo)

Comparison
Traditional programming: Data + Program → Computer → Output
Machine learning: Data + Output → Computer → Program
Compare with sorting: in traditional programming we write the sorting program by hand and the computer applies it to data; in machine learning, the computer is given data together with the desired output and produces the program itself.

Where does ML fit in?

Learning It is often hard to articulate the knowledge we need to build AI systems; often, we don't even know it. Frequently, we can arrange to build systems that learn it themselves.

What is Learning The word "learning" has many different meanings. It is used, at least, to describe:
- memorizing something
- learning facts through observation and exploration
- development of motor and/or cognitive skills through practice
- organization of new knowledge into general, effective representations

Learning Study of processes that lead to self-improvement of machine performance. It implies the ability to use knowledge to create new knowledge, or to integrate new facts into an existing knowledge structure. Learning typically requires repetition and practice to reduce differences between observed and desired performance.

What is Learning? Herbert Simon: “Learning is any process by which a system improves performance from experience.”

Learning Definition (Tom Mitchell): A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

Learning & Adaptation
"Modification of a behavioral tendency by expertise." (Webster)
"A learning machine, broadly defined, is any device whose actions are influenced by past experiences." (Nilsson)
"Any change in a system that allows it to perform better the second time on repetition of the same task or on another task drawn from the same population." (Simon)

Negative Features of Human Learning
- It's slow (5-6 years for motor skills, 12-20 years for abstract reasoning)
- Inefficient
- Expensive
- There is no copy process
- Learning strategy is often a function of the knowledge available to the learner

Applications of ML
- Learning to recognize spoken words (SPHINX)
- Learning to drive an autonomous vehicle (ALVINN)
- Learning to classify celestial objects
- Learning to play world-class backgammon (TD-GAMMON)
- Designing the morphology and control structure of electro-mechanical artefacts (GOLEM)

Motivating Problems Handwritten Character Recognition

Machine Translation

Speech Recognition

Motivating Problems Fingerprint Recognition (e.g., border control)

Example: object recognition. Given example images labeled giraffe or llama, learn a function f(x) that predicts the label for a new image X.

Motivating Problems Face Recognition (security access to buildings etc)

Application: social network analysis. HP Labs email data: 500 users and 20k connections, evolving over time.

Spam vs. Regular Email (C) Dhruv Batra

Stock market

Weather prediction (e.g., temperature)

Pose Estimation

Induction If asked why we believe the sun will rise tomorrow, we shall naturally answer, 'Because it has always risen every day.' We have a firm belief that it will rise in the future, because it has risen in the past.

Induction It has been argued that we have reason to know the future will resemble the past, because what was the future has constantly become the past, and has always been found to resemble the past, so that we really have experience of the future, namely of times which were formerly future, which we may call past futures. But such an argument really begs the very question at issue. (Bertrand Russell, The Problems of Philosophy)

Different kinds of learning…
- Supervised learning: someone gives us examples and the right answer for those examples; we have to predict the right answer for unseen examples.
- Unsupervised learning: we see examples but get no feedback; we need to find patterns in the data.
- Reinforcement learning: we take actions and get rewards; we have to learn how to get high rewards.
- Weakly or semi-supervised learning: training data includes only a few desired outputs.

Tasks
Supervised learning: classification (x → y, with y discrete), regression (x → y, with y continuous).
Unsupervised learning: clustering (x → cluster ID, discrete), dimensionality reduction (x → y, with y continuous).

Learning with a Teacher Supervised learning: knowledge is represented by a set of input-output examples (xi, yi); the goal is to minimize the error between the actual response of the learner and the desired response. [Diagram: the Environment presents a state x to both a Teacher, which produces the desired response, and the Learning System, which produces the actual response; their difference is the error signal fed back to the learner.]

Kinds of learning Supervised learning: Given a set of example input/output pairs, find a rule that does a good job of predicting the output associated with a new input. Let's say you are given the weights and lengths of a bunch of individual salmon fish, and the weights and lengths of a bunch of individual tuna fish. The job of a supervised learning system would be to find a predictive rule that, given the weight and length of a fish, would predict whether it was a salmon or a tuna.

Example of supervised learning: classification. We lend money to people, and we have to predict whether they will pay us back or not. People have various (say, binary) features: do we know their Address? do they have a Criminal record? high Income? Educated? Old? Unemployed? We see examples (Y = paid back, N = not):
+a, -c, +i, +e, +o, +u: Y
-a, +c, -i, +e, -o, -u: N
+a, -c, +i, -e, -o, -u: Y
-a, -c, +i, +e, -o, -u: Y
-a, +c, +i, -e, -o, -u: N
-a, -c, +i, -e, -o, +u: Y
+a, -c, -i, -e, +o, -u: N
+a, +c, +i, -e, +o, -u: N
Next person is +a, -c, +i, -e, +o, -u. Will we get paid back?
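One simple way to attack this question (a sketch, not necessarily the lecture's intended method): encode each person as a binary vector and take a majority vote among the k nearest training examples under Hamming distance. The encoding below mirrors the examples in the slide.

```python
from collections import Counter

# Features: (address, criminal record, income, educated, old, unemployed)
examples = [
    ((1, 0, 1, 1, 1, 1), "Y"),
    ((0, 1, 0, 1, 0, 0), "N"),
    ((1, 0, 1, 0, 0, 0), "Y"),
    ((0, 0, 1, 1, 0, 0), "Y"),
    ((0, 1, 1, 0, 0, 0), "N"),
    ((0, 0, 1, 0, 0, 1), "Y"),
    ((1, 0, 0, 0, 1, 0), "N"),
    ((1, 1, 1, 0, 1, 0), "N"),
]

def hamming(a, b):
    # Number of features on which two people differ.
    return sum(x != y for x, y in zip(a, b))

def knn_predict(query, examples, k=3):
    # Sort training examples by distance to the query and vote among the top k.
    nearest = sorted(examples, key=lambda ex: hamming(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

query = (1, 0, 1, 0, 1, 0)  # +a, -c, +i, -e, +o, -u
print(knn_predict(query, examples))  # three neighbors at distance 1 vote Y, N, N → "N"
```

With k = 3 the answer is "N": the three closest past customers differ from the new applicant in only one feature each, and two of the three did not pay back.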

Learning by Examples Concept: "days on which my friend Aldo enjoys his favourite water sports". Task: predict the value of "Enjoy Sport" for an arbitrary day based on the values of the other attributes. Attributes and their values:
Sky: Sunny, Rainy; Temp: Warm, Cold; Humid: Normal, High; Wind: Strong; Water: Warm, Cool; Forecast: Same, Change; Enjoy Sport: Yes, No

Unsupervised Learning Self-organized learning: no teacher; a task-independent quality measure; identify regularities in the data and discover classes automatically. [Diagram: the Environment presents a state to the Learning System, with no teacher in the loop.]

Clustering Data: Group similar things

Face Clustering (as in iPhoto, Picasa) (C) Dhruv Batra

Kinds of learning Another, somewhat less well-specified, learning problem is clustering. Now you're given the descriptions of a bunch of different individual animals (or stars, or documents) in terms of a set of features (weight, number of legs, presence of hair, etc), and the job is to divide them into groups that "make sense". What makes this different from supervised learning is that we are not told in advance what groups the animals should be put into; just that we should find a natural grouping.

Reinforcement Learning Learning from feedback. [Diagram: inputs x enter the Reinforcement Learning system, which outputs actions y.]

Reinforcement Learning: Learning to act There is only one "supervised" signal, at the end of the game, but you need to make a move at every step. RL deals with the "credit assignment" problem.

Reinforcement learning Another learning problem, familiar to most of us, is learning motor skills, like riding a bike. We call this reinforcement learning. It's different from supervised learning because no-one explicitly tells you the right thing to do; you just have to try things and see what makes you fall over and what keeps you upright.

Learning a function One way to think about learning is that we are trying to find the definition of a function, given a bunch of examples of its input and output. Learning how to pronounce words can be thought of as finding a function from letters to sounds. Learning to recognize handwritten characters can be thought of as finding a function from collections of image pixels to letters. Learning to diagnose diseases can be thought of as finding a function from lab test results to disease categories. We can think of at least three different problems being involved: memory, averaging, and generalization.
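The three sub-problems named above can be sketched on a toy 1-D dataset (the numbers are hypothetical): memory recalls exactly what was seen for a past input, averaging pools repeated observations of the same input, and generalization predicts for inputs never seen at all.

```python
from statistics import mean

# Toy observations of a noisy function y ≈ 2x (made-up data).
data = [(1.0, 2.1), (1.0, 1.9), (2.0, 4.2), (3.0, 5.9)]

# Memory: recall every output ever observed for a given input.
def recall(x):
    return [y for xi, y in data if xi == x]

# Averaging: pool repeated observations of the same input.
def average(x):
    return mean(recall(x))

# Generalization: fit a least-squares line through the origin,
# then predict for inputs that never occurred in the data.
slope = sum(x * y for x, y in data) / sum(x * x for x, _ in data)
def predict(x):
    return slope * x

print(average(1.0))   # 2.0 (mean of the two observations at x = 1.0)
print(predict(4.0))   # extrapolates beyond any observed input
```

Only the generalization step lets the learner answer for x = 4.0, an input that appears nowhere in the data.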

The red and the black Imagine that we were given all these points, and we needed to guess a function of their x, y coordinates that would have one output for the red ones and a different output for the black ones.

What’s the right hypothesis? In this case, it seems like we could do pretty well by defining a line that separates the two classes.

Now, what’s the right hypothesis Now, what if we have a slightly different configuration of points? We can't divide them conveniently with a line.

Now, what’s the right hypothesis But this parabola-like curve seems like it might be a reasonable separator.
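To see why a curve can succeed where a line cannot: adding a squared feature turns a parabola-shaped boundary into a *linear* one in the expanded feature space. A minimal sketch with made-up points:

```python
# Made-up points labeled by whether they lie above the parabola y = x^2.
points = [(0.0, 1.0, "red"), (1.0, 2.0, "red"), (-1.0, 2.0, "red"),
          (0.0, -1.0, "black"), (2.0, 1.0, "black"), (-2.0, 1.0, "black")]

# In the expanded feature space (x, x^2, y), the separator "y - x^2 > 0"
# is a linear rule, even though it draws a parabola in the original plane.
def classify(x, y):
    return "red" if y - x * x > 0 else "black"

print(all(classify(x, y) == label for x, y, label in points))  # True
```

This trick, mapping inputs into a richer feature space so a linear separator suffices, is the idea behind polynomial features and kernel methods.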

Design a Learning System We shall use handwritten character recognition as an example to illustrate the design issues and approaches.

Design a Learning System Step 0: Let's treat the learning system as a black box. [Diagram: input goes into the Learning System, which produces an output Z.]

Design a Learning System Step 1: Collect Training Examples (Experience). Without examples, our system will not learn (so-called learning from examples). [Examples: images of handwritten digits such as 2, 3, 6, 7, 8, 9.]

Design a Learning System Step 2: Representing Experience. Choose a representation scheme for the experience/examples. The sensor input is represented by an n-d vector, called the feature vector, X = (x1, x2, x3, …, xn). For example, two digit images as 64-d vectors: (1,1,0,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,0, …, 1) and (1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0, …, 1).

Design a Learning System Step 2: Representing Experience. Choose a representation scheme for the experience/examples. The sensor input is represented by an n-d vector, called the feature vector, X = (x1, x2, x3, …, xn). To represent the experience, we need to know what X is, so we need a corresponding vector D which records our knowledge (experience) about X. The experience E is a pair of vectors, E = (X, D).

Design a Learning System Step 2: Representing Experience. So, what would D be like? There are many possibilities. Assuming our system is to recognise 10 digits only, D can be a 10-d binary vector, each component corresponding to one of the digits: D = (d0, d1, d2, d3, d4, d5, d6, d7, d8, d9). E.g., if X is the digit 5, then d5 = 1 and all others are 0; if X is the digit 9, then d9 = 1 and all others are 0.

Design a Learning System Step 2: Representing Experience. With D = (d0, d1, …, d9), the two example digits become: X = (1,1,0,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,0, …, 1) (64-d vector), D = (0,0,0,0,0,1,0,0,0,0); X = (1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0, …, 1) (64-d vector), D = (0,0,0,0,0,0,0,0,1,0).
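The label vector D described above can be built mechanically; this is the standard "one-hot" encoding (a minimal sketch):

```python
# One-hot encoding of a digit label into the 10-d vector D.
def one_hot(digit, n_classes=10):
    d = [0] * n_classes
    d[digit] = 1  # the component for this digit is 1; all others stay 0
    return d

print(one_hot(5))  # [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
print(one_hot(9))  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```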

Design a Learning System Step 3: Choose a Representation for the Black Box. We need to choose a function F to approximate the black box: for a given X, the value of F gives the classification of X. There is considerable flexibility in choosing F. [Diagram: X enters the Learning System F, which outputs F(X).]

Design a Learning System Step 4: Learning/Adjusting the Weights. We need a learning algorithm to adjust the weights W such that the experience/prior knowledge E = (X, D) in the training data is learned into the system: F(W, X) ≈ D.

Design a Learning System Step 4: Learning/Adjusting the Weights. [Diagram: given E = (X, D), the input X enters the Learning System F(W), which outputs F(W, X); this is compared with D to form the error signal Error = D - F(W, X), which is used to adjust W.]
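One concrete way to realize this loop, as a sketch only: assume a linear model F(W, X) = W·X and a delta-rule update (neither of which the slides commit to), and repeatedly nudge W in proportion to the error.

```python
# Delta-rule training of a linear model F(W, X) = sum_i w_i * x_i.
# Hypothetical toy task: learn to output D = 1.0 for one input pattern.
X = [1.0, 0.0, 1.0, 1.0]
D = 1.0
W = [0.0, 0.0, 0.0, 0.0]
lr = 0.1  # learning rate

def F(W, X):
    return sum(w * x for w, x in zip(W, X))

for step in range(50):
    error = D - F(W, X)  # Error = D - F(W, X), exactly as in the diagram
    # Move each weight a small step in the direction that reduces the error.
    W = [w + lr * error * x for w, x in zip(W, X)]

print(abs(D - F(W, X)) < 1e-3)  # True: the error has shrunk toward zero
```

Each pass multiplies the error by a constant factor less than one, so the output converges to the desired D.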

Design a Learning System Step 5: Use/Test the System. Once learning is completed, all parameters are fixed. When an unknown input X is presented, the system computes its answer according to F(W, X). [Diagram: X enters the Learning System F(W), which outputs F(W, X) as the answer.]

Learning methods
- Decision rules: if income < $30,000 then reject.
- Bayesian network: P(good | income, credit history, …)
- Neural network
- Nearest neighbor: take the same decision as for the customer in the database that is most similar to the applicant.

Learning Methods One of the most popular learning algorithms makes hypotheses in the form of decision trees. In a decision tree, each node represents a question, and the arcs represent possible answers. We use all the data to build such a tree.

Decision Trees Hypotheses like this are nice because they're relatively easily interpretable by humans. So, in some cases, we run a learning algorithm on some data and then show the results to experts in the area (astronomers, physicians), and they find that the learning algorithm has found some regularities in their data that are of real interest to them.

Neural Networks They can represent complicated hypotheses in high-dimensional continuous spaces. They are attractive as a computational model because they are composed of many small computing units. They were motivated by the structure of neural systems in parts of the brain. Now it is understood that they are not an exact model of neural function, but they have proved to be useful from a purely practical perspective.

If…then rules
- If tear production rate = reduced then recommendation = none.
- If age = young and astigmatic = no then recommendation = soft.
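Rules like these translate directly into code; a minimal sketch (the attribute names as plain strings, and the fallback value, are my own assumptions):

```python
def recommend(tear_production_rate, age, astigmatic):
    # If tear production rate = reduced then recommendation = none.
    if tear_production_rate == "reduced":
        return "none"
    # If age = young and astigmatic = no then recommendation = soft.
    if age == "young" and astigmatic == "no":
        return "soft"
    return "unknown"  # no rule fires (the slide lists only two rules)

print(recommend("reduced", "young", "no"))  # none
print(recommend("normal", "young", "no"))   # soft
```

Note the rules fire in order, so the first rule acts as an exception that overrides the second.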

Evaluating Inductive Hypotheses Accuracy of hypotheses on training data is obviously biased since the hypothesis was constructed to fit this data. Accuracy must be evaluated on an independent (usually disjoint) test set. The larger the test set is, the more accurate the measured accuracy and the lower the variance observed across different test sets.
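The disjoint-test-set idea in miniature (a sketch with made-up data and a deliberately simple threshold hypothesis; nothing here is the lecture's experiment):

```python
import random

# Made-up dataset: 100 labeled examples where the true rule is y = 1 iff x > 0.5.
random.seed(0)
data = [(x, int(x > 0.5)) for x in (random.random() for _ in range(100))]

# Accuracy must be measured on a test set disjoint from the training set.
random.shuffle(data)
train, test = data[:70], data[70:]

def accuracy(h, dataset):
    return sum(h(x) == y for x, y in dataset) / len(dataset)

# "Fit" a threshold hypothesis by maximizing accuracy on the training set only.
best_t = max((t / 20 for t in range(21)),
             key=lambda t: accuracy(lambda x: int(x > t), train))
h = lambda x: int(x > best_t)

# Training accuracy is optimistically biased; report the held-out estimate too.
print("train accuracy:", accuracy(h, train))
print("test accuracy: ", accuracy(h, test))
```

Because the threshold was chosen to fit the training set, its training accuracy is the biased number the slide warns about; only the test-set figure estimates true performance.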

Variance in Test Accuracy Let errorS(h) denote the percentage of examples in an independently sampled test set S of size n that are incorrectly classified by hypothesis h. Let errorD(h) denote the true error rate for the overall data distribution D. When n is at least 30, the central limit theorem ensures that the distribution of errorS(h) for different random samples will be closely approximated by a normal (Gaussian) distribution. [Figure: the sampling distribution P(errorS(h)), a bell curve centered at errorD(h).]
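The normal approximation yields the familiar interval errorS(h) ± z·sqrt(errorS(h)·(1 − errorS(h))/n). A quick sketch with hypothetical numbers:

```python
import math

def error_confidence_interval(error_s, n, z=1.96):
    """95% CI for errorD(h) via the normal approximation (reasonable for n >= 30)."""
    # Standard error of a binomial proportion.
    se = math.sqrt(error_s * (1 - error_s) / n)
    return error_s - z * se, error_s + z * se

# Hypothetical: h misclassifies 12 of 40 independently drawn test examples.
low, high = error_confidence_interval(12 / 40, n=40)
print(round(low, 3), round(high, 3))  # 0.158 0.442
```

The interval is wide because n = 40 is small; quadrupling the test-set size halves the standard error, matching the slide's point that larger test sets give lower variance.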

Projects Gesture Activated Interactive Assistant