Part I: Classification and Bayesian Learning


Machine Learning, Part I: Classification and Bayesian Learning
Ref: E. Alpaydin, Introduction to Machine Learning, MIT Press, 2004

Machine Learning
Machine learning is programming computers to optimize a performance criterion using example data or past experience: inference from samples. There is a process that explains the data we observe (Internet requests, failure events, etc.), but we do not know the details of how the data are generated. Although it is hard to identify (model) the process completely, we can construct a good and useful approximation that detects certain patterns. Such patterns help us understand the process and make predictions about the future.

Types of Machine Learning
Supervised learning: create a function from training data, which consist of pairs of input objects (typically vectors) and desired outputs.
- Classification: given an input, the output is a Boolean (yes/no) prediction of the input object's class label.
- Regression: if the label is a numerical value, learn the function f(x) that best explains the input instances.
Unsupervised learning: manual labels of inputs are not used.
- Clustering: partition a data set into subsets (clusters) so that the data in each subset share some common trait.
Semi-supervised learning: make use of both labeled and unlabeled data for training.
Reinforcement learning: learn a policy, i.e., a sequence of outputs; there is no supervised output, only delayed reward. Examples: game playing, robot navigation.

Supervised Learning
- Use of supervised learning
- Classification
- Regression
- Evaluation methodology
- Bayesian learning for classification

Why Supervised Learning?
- Prediction of future cases: use the rule to predict the output for future inputs.
- Knowledge extraction: the rule is easy to understand.
- Compression: the rule is simpler than the data it explains.
- Outlier detection: exceptions that are not covered by the rule, e.g., fraud.

Classification
Example: credit scoring. Differentiate between low-risk and high-risk customers from their income and savings. Rule-based prediction with a discriminant:
IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
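As a sketch, this discriminant can be written directly as code; the threshold values and the sample inputs below are hypothetical, chosen only for illustration:

```python
def credit_risk(income: float, savings: float,
                theta1: float = 30_000, theta2: float = 10_000) -> str:
    """Two-threshold discriminant: IF income > theta1 AND savings > theta2
    THEN low-risk ELSE high-risk."""
    if income > theta1 and savings > theta2:
        return "low-risk"
    return "high-risk"

print(credit_risk(45_000, 15_000))  # low-risk
print(credit_risk(45_000, 5_000))   # high-risk
```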

Learning a Class from Examples
Given a set of examples of cars, each labeled "family car" or not according to a survey, class learning is to find a description that is shared by all positive examples. Uses of the class information:
- Prediction: is car x a family car?
- Knowledge extraction: what do people expect from a family car?

Training Set X
Input representation: attributes price and engine power, x = [x1, x2]. Each instance carries a label, r = 1 if it is a positive example (a family car) and r = 0 otherwise, giving the training set X = {(xᵗ, rᵗ)}, t = 1, …, N.

Hypothesis Class H
Learning is to find a particular hypothesis h ∈ H that approximates the true class C. Candidates range from the most specific hypothesis S to the most general hypothesis G.

Hypothesis h and Empirical Error
Empirical error of h on the training set X: E(h | X) = Σₜ 1(h(xᵗ) ≠ rᵗ), i.e., the number of training instances on which the prediction of h disagrees with the label.
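A toy illustration of this count (all numbers made up): an axis-aligned rectangle hypothesis over (price, engine power) pairs, scored by its empirical error on a tiny labeled sample:

```python
# Rectangle hypothesis: positive iff p1 <= price <= p2 and e1 <= power <= e2.
def h(x, p1=10_000, p2=20_000, e1=60, e2=120):
    price, power = x
    return 1 if (p1 <= price <= p2 and e1 <= power <= e2) else 0

# Training set X = {(x^t, r^t)}: (price, engine power) -> family car or not.
X = [((15_000, 100), 1), ((12_000, 80), 1),
     ((30_000, 200), 0), ((8_000, 50), 0)]

# Empirical error E(h | X): number of instances with h(x^t) != r^t.
error = sum(1 for x, r in X if h(x) != r)
print(error)  # 0 on this toy sample
```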

Model Selection & Generalization
Learning is an ill-posed problem: the data alone are not sufficient to find a unique solution.
- Only a limited number of sample data are available.
- Some data might be noisy due to imprecision in recording or labeling, or due to hidden (latent, unobservable) attributes that affect the labels of instances.
Hence the need for an inductive bias: assumptions about the class structure built into the hypothesis class H. Why a rectangle rather than a circle or an irregular shape? What degree of tightness of fit? Generalization: how well a model performs on new data.

Noise and Model Complexity
Noise is any anomaly in the data that makes it infeasible to reach a zero-error classification with a simple hypothesis class. Even so, a simple model is preferred:
- Easy to use and check (lower time complexity)
- Easy to train (lower space complexity)
- Easy to explain (more interpretable)
- Easy to generalize (lower variance)

Probably Approximately Correct (PAC) Learning
How many training examples N do we need so that, with probability at least 1 − δ, h has error at most ε? For the tightest rectangle, the error region is covered by four strips, each of probability mass at most ε/4:
- Pr that a random instance misses one strip: 1 − ε/4
- Pr that N instances all miss that strip: (1 − ε/4)^N
- Pr that N instances miss any of the 4 strips: at most 4(1 − ε/4)^N
Require 4(1 − ε/4)^N ≤ δ. Using (1 − x) ≤ exp(−x), it suffices that 4 exp(−εN/4) ≤ δ, i.e., N ≥ (4/ε) log(4/δ).
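A quick numeric check of the bound (a minimal sketch; the ε and δ values are just examples):

```python
import math

def pac_sample_size(eps: float, delta: float) -> int:
    """N >= (4/eps) * log(4/delta) for the axis-aligned rectangle class."""
    return math.ceil((4 / eps) * math.log(4 / delta))

# Error at most 0.1 with probability at least 0.95:
print(pac_sample_size(eps=0.1, delta=0.05))  # 176
```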

2-Class vs K-Class
A K-class problem can be viewed as K 2-class problems: train hypotheses hᵢ(x), i = 1, …, K, where hᵢ(x) = 1 if x ∈ Cᵢ and 0 otherwise.
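A minimal one-vs-rest sketch of this idea. The binary hypotheses here are stand-in nearest-centroid scores (not a method from the slides); prediction picks the class whose hᵢ gives the highest score:

```python
# One-vs-rest: K scoring hypotheses h_i(x), one per class C_i.
def train_one_vs_rest(X, y, K):
    hs = []
    for i in range(K):
        pos = [x for x, yt in zip(X, y) if yt == i]
        cx = sum(p[0] for p in pos) / len(pos)
        cy = sum(p[1] for p in pos) / len(pos)
        # Higher score = closer to the centroid of class i.
        hs.append(lambda x, cx=cx, cy=cy: -((x[0] - cx)**2 + (x[1] - cy)**2))
    return hs

def predict(hs, x):
    scores = [h(x) for h in hs]
    return scores.index(max(scores))   # class with the highest h_i(x)

X = [(0, 0), (1, 0), (5, 5), (6, 5), (0, 9), (1, 9)]   # made-up 2-D data
y = [0, 0, 1, 1, 2, 2]
hs = train_one_vs_rest(X, y, K=3)
print(predict(hs, (5.5, 4.5)))  # 1
```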

Regression
Examples: price of a used car, speed of Top500 machines.
x: car attributes; y: price; y = g(x | θ), where g(·) is the model and θ its parameters. Linear regression: y = wx + w₀.

Basic Concepts
- Interpolation: find a function that best fits a training set with no presence of noise: r = f(x).
- Extrapolation: predict the output for any x that is NOT in the training set.
- Regression: a noise factor must be considered, r = f(x) + ε; alternatively, there are hidden variables we cannot observe: r = f(x, z).

Regression
For a given training set, find g(·) that minimizes the empirical error E(g | X) = (1/N) Σₜ [rᵗ − g(xᵗ)]².
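A self-contained sketch for the linear model y = wx + w₀, using the closed-form least-squares solution to minimize this empirical error; the (x, r) pairs are made up:

```python
# Least-squares fit of g(x) = w*x + w0, minimizing
# E(w, w0 | X) = (1/N) * sum((r_t - (w*x_t + w0))**2).
def fit_linear(xs, rs):
    n = len(xs)
    mx, mr = sum(xs) / n, sum(rs) / n
    w = sum((x - mx) * (r - mr) for x, r in zip(xs, rs)) / \
        sum((x - mx) ** 2 for x in xs)
    return w, mr - w * mx  # slope, intercept

# Hypothetical (mileage, price) pairs for a used-car example.
xs = [10, 20, 30, 40]
rs = [9.0, 7.1, 5.2, 2.9]
w, w0 = fit_linear(xs, rs)
print(w, w0)  # roughly -0.2 and 11.1
```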

Underfitting vs Overfitting
- Underfitting: the hypothesis class H is less complex than the actual model C, e.g., using a line to fit data sampled from a 3rd-order polynomial.
- Overfitting: H is more complex than C. Accuracy increases with more sample data, but the data may not be enough if the hypothesis is too complex; having more training data helps, but only up to a certain point.
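A small demonstration of this effect, assuming NumPy is available: fit polynomials of degree 1 (underfit), 3 (matched), and 9 (overfit) to noisy samples of a 3rd-order process and compare training vs. test error:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: x**3 - 2*x                        # true 3rd-order process
x_train = rng.uniform(-2, 2, 15)
r_train = f(x_train) + rng.normal(0, 0.5, 15)   # noisy labels
x_test = rng.uniform(-2, 2, 100)
r_test = f(x_test)

for degree in (1, 3, 9):                        # underfit, matched, overfit
    g = np.poly1d(np.polyfit(x_train, r_train, degree))
    e_train = np.mean((r_train - g(x_train))**2)
    e_test = np.mean((r_test - g(x_test))**2)
    print(degree, round(e_train, 3), round(e_test, 3))
# Training error keeps falling with degree; test error is lowest near degree 3.
```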

Triple Trade-Off
There is a trade-off between three factors:
- the complexity (capacity) c(H) of the hypothesis class H,
- the training set size N,
- the generalization error E on new examples.
As N increases, E decreases. As c(H) increases, E first decreases and then increases: the error of an over-complex hypothesis can be kept in check by increasing the amount of training data, but only up to a point.

Cross-Validation
To estimate generalization error, we need data unseen during training. Split the data into three parts:
- Training set (50%)
- Validation set (25%)
- Test (publication) set (25%)
Use resampling when there is little data.
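A minimal sketch of the 50/25/25 split described above (the seed and data are arbitrary):

```python
import random

def three_way_split(data, seed=0):
    """Shuffle, then split 50/25/25 into training, validation, test sets."""
    data = data[:]                       # copy so the caller's list is untouched
    random.Random(seed).shuffle(data)
    n = len(data)
    a, b = n // 2, (3 * n) // 4
    return data[:a], data[a:b], data[b:]

train, val, test = three_way_split(list(range(100)))
print(len(train), len(val), len(test))   # 50 25 25
```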

Dimensions of a Supervised Learner: Summary
1. Model g(x | θ) with parameters θ.
2. Loss function L(·): the difference between the desired output and the approximation, E(θ | X) = Σₜ L(rᵗ, g(xᵗ | θ)).
3. Optimization procedure: return the argument that minimizes the error, θ* = arg minθ E(θ | X).
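A toy end-to-end sketch of these three dimensions, with a made-up one-parameter model, squared loss, and a crude grid search standing in for the optimization procedure:

```python
# (x^t, r^t) pairs, made up for illustration.
X = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

g = lambda x, theta: theta * x                                 # 1. model
E = lambda theta: sum((r - g(x, theta))**2 for x, r in X)      # 2. loss
theta_star = min((t / 100 for t in range(-500, 500)), key=E)   # 3. optimizer
print(theta_star)  # close to 2.0
```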