Introduction to Machine Learning, Course 67577, Fall 2007. Lecturer: Amnon Shashua. Teaching Assistant: Yevgeny Seldin. School of Computer Science and Engineering, Hebrew University.

What is Machine Learning?
An inference engine (computer program) that, given sufficient data (examples), computes a function that matches as closely as possible the process generating the data.
Make accurate predictions based on observed data.
Algorithms to optimize a performance criterion based on observed data.
Learning to do better in the future based on what was experienced in the past.
Programming by examples: instead of writing a program to solve a task directly, machine learning seeks methods by which the computer will come up with its own program based on training examples.

Why Machine Learning?
Data-driven algorithms are able to examine large amounts of data. A human expert, on the other hand, is likely to be guided by subjective impressions or by examining a relatively small number of examples.
Humans often have trouble expressing what they know, but have no difficulty labeling data.
Machine learning is effective in domains where declarative (rule-based) knowledge is difficult to obtain, yet generating training data is easy.

Typical Examples
Visual recognition (say, detecting faces in an image): the amount of variability in appearance introduces challenges that are beyond the capacity of direct programming.
Spam filtering: data-driven programming can adapt to changing tactics by spammers.
Extracting topics from documents: categorize news articles as being about politics, sports, science, etc.
Natural language understanding: from spoken words to text; categorize the meaning of spoken sentences.
Optical character recognition (OCR).
Medical diagnosis: from symptoms to diagnosis.
Credit card transaction fraud detection.
Wealth prediction.

Fundamental Issues
Over-fitting: doing well on a training set does not guarantee accuracy on new examples.
What is the resource we wish to optimize? For a given accuracy, use the smallest possible training set.
Examples are drawn from some (fixed) distribution D over X x Y (instance space x output space). Does the learner actually need to recover D during the learning process?
How does the learning process depend on the complexity of the family of learning functions (the concept class C)? How does one define the complexity of C?
When the goal is to learn the joint distribution D, the problem is computationally unwieldy because the joint distribution table is exponentially large. What assumptions can be made to simplify the task?
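To make the over-fitting point concrete, here is a minimal Python sketch (not from the slides; the data and polynomial degrees are invented for illustration) that fits a low-degree and a high-degree polynomial to the same noisy sample and compares training error with error on fresh examples drawn from the same distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(m):
    # examples drawn i.i.d. from a fixed distribution D over X x Y
    x = rng.uniform(-1, 1, m)
    y = np.sin(3 * x) + rng.normal(0, 0.2, m)   # noisy target
    return x, y

x_train, y_train = sample(20)
x_test, y_test = sample(1000)

for degree in (3, 15):
    coeffs = np.polyfit(x_train, y_train, degree)          # fit hypothesis
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")

# Typically the degree-15 fit has near-zero training error but a much larger
# test error than the degree-3 fit: doing well on the training set does not
# guarantee accuracy on new examples.
```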

Supervised vs. Un-supervised
Supervised learning models: learn a mapping h: X -> Y, where X is the instance (data) space and Y is the output space.
Classification: multiclass classification with K classes; K = 2 is normally of most interest.
Regression: predict the price of a used car given brand, year, mileage; kinematics of a robot arm; navigate by determining the steering angle from image input.
Un-supervised learning models: find regularities in the input data, assuming there is some structure in the input space.
Density estimation.
Clustering (non-parametric density estimation): divide customers into groups which have similar attributes.
Latent class models: extract topics from documents.
Compression: represent the input space with fewer parameters; projection to lower-dimensional spaces.
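As a rough illustration of the two settings (a sketch, not from the slides; the toy data is invented), the snippet below trains a supervised classifier on labeled pairs and, separately, clusters the same inputs without ever seeing the labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Two Gaussian blobs in the plane; y in {0, 1} is only used by the supervised model.
X = np.vstack([rng.normal([-2, 0], 1.0, (50, 2)),
               rng.normal([+2, 0], 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Supervised: learn h: X -> Y from labeled examples, then predict on a new input.
clf = LogisticRegression().fit(X, y)
print("predicted label for a new point:", clf.predict([[1.5, 0.3]])[0])

# Un-supervised: find structure (two groups) in the inputs alone, ignoring y.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster assignments of first 5 points:", km.labels_[:5])
```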

Notations
X is the instance space: the space from which observations are drawn.
x in X is an input instance, a single observation.
Y is the output space: the set of possible outcomes that can be associated with a measurement.
An example is an instance-label pair (x, y). If |Y| = 2 one typically uses Y = {0, 1} or Y = {-1, +1}. We say that an example (x, y) is positive if y = 1, and otherwise we call it a negative example.
A training set Z consists of m instance-label pairs: Z = {(x_1, y_1), ..., (x_m, y_m)}.
In some cases we refer to the training set without labels: S = {x_1, ..., x_m}.
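A tiny Python sketch of this notation (purely illustrative; the instances and labels are made up): an example is just an (x, y) pair, and a training set is a list of m such pairs.

```python
# Instance space X = R^2 (an illustrative choice), output space Y = {-1, +1}.
# An example is an instance-label pair (x, y); a training set Z is a list of m pairs.
Z = [
    ((0.5, 1.2), +1),    # positive example (y = +1)
    ((1.1, 0.3), +1),
    ((-0.7, 0.8), -1),   # negative example
    ((-1.3, -0.2), -1),
]

m = len(Z)                     # number of training examples
S = [x for x, y in Z]          # the training set without labels
positives = [x for x, y in Z if y == +1]
print(m, len(S), len(positives))
```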

Notations
A concept (hypothesis) class C is a set (not necessarily finite) of functions of the form h: X -> Y. Each h is called a concept, hypothesis, or classifier.
Example, separating hyperplanes: a concept h(x) is specified by a vector w and a scalar b such that h(x) = sign(w . x + b); C is then the set of all such hyperplanes.
Other examples:
Conjunction learning: a conjunction is a special case of a Boolean formula. A literal is a variable or its negation, and a term is a conjunction of literals, e.g. x1 AND (NOT x3) AND x7. A target function is a term which consists of a subset of the literals. In this case X = {0,1}^n and Y = {0,1}, and C is the set of all such terms.
Decision trees: when X = {0,1}^n, any Boolean function can be described by a binary tree. Thus C consists of decision trees.
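A minimal Python sketch of two of these hypothesis classes (an illustration, not from the slides; the weights and literals below are arbitrary): a separating-hyperplane classifier and a conjunction over Boolean variables.

```python
import numpy as np

def hyperplane(w, b):
    """Concept from the class of separating hyperplanes: h(x) = sign(w . x + b)."""
    return lambda x: 1 if np.dot(w, x) + b >= 0 else -1

def conjunction(literals):
    """Concept over X = {0,1}^n: a term, i.e. a conjunction of literals.
    `literals` maps a variable index to True (x_i) or False (NOT x_i)."""
    return lambda x: 1 if all(x[i] == int(v) for i, v in literals.items()) else 0

h1 = hyperplane(w=np.array([1.0, -2.0]), b=0.5)    # arbitrary parameters
h2 = conjunction({0: True, 2: False})              # the term x_0 AND (NOT x_2)

print(h1(np.array([3.0, 1.0])))    # -> 1   (w . x + b = 1.5 >= 0)
print(h2([1, 1, 0, 1]))            # -> 1   (x_0 = 1 and x_2 = 0)
```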

The Formal Learning Model
Probably Approximately Correct (PAC) learning.
Distribution invariant: the learner does not need to estimate the joint distribution D over X x Y. The assumptions are that examples arrive i.i.d. and that D exists and is fixed.
The training sample complexity (the size of the training set Z) depends only on the desired accuracy and confidence parameters - it does not depend on D.
Not all concept classes C are PAC-learnable, but some interesting classes are.

PAC Model Definitions
Unrealizable case: when no target concept in C is assumed to generate the labels; the training set is Z = {(x_1, y_1), ..., (x_m, y_m)} and D is a distribution over X x Y.
Realizable case: when a target concept c is known to lie inside C. In this case the training set S = {x_1, ..., x_m} is sampled randomly and independently (i.i.d.) according to some (unknown) distribution D over X, i.e. S is distributed according to the product distribution D^m.
Given a concept function h, err(h) is the probability that an instance x sampled according to D will be labeled incorrectly by h(x), i.e. err(h) = Pr_{x ~ D}[h(x) != c(x)].
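Since D is unknown, err(h) cannot be computed directly, but it can be approximated by sampling. The sketch below (illustrative only; the distribution, target concept, and hypothesis are invented) estimates err(h) by drawing fresh instances from D and counting disagreements with the target concept c.

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented setup: D is uniform on [-1, 1]^2, the target concept c is a
# halfplane, and h is a slightly mis-specified hypothesis.
c = lambda x: 1 if x[0] + x[1] >= 0.0 else -1          # target concept
h = lambda x: 1 if x[0] + 0.8 * x[1] >= 0.1 else -1    # learned hypothesis

def estimate_error(h, c, n=50_000):
    """Monte Carlo estimate of err(h) = Pr_{x ~ D}[h(x) != c(x)]."""
    xs = rng.uniform(-1, 1, (n, 2))
    disagreements = sum(h(x) != c(x) for x in xs)
    return disagreements / n

print(f"estimated err(h) ~ {estimate_error(h, c):.3f}")
```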

PAC Model Definitions
An accuracy parameter epsilon given to the learner specifies the desired accuracy, i.e. err(h) <= epsilon. Note: in the realizable case the best achievable error is zero, because the target concept lies inside C.
A confidence parameter delta given to the learner specifies the desired confidence, i.e. Pr[err(h) <= epsilon] >= 1 - delta.
The learner is allowed to deviate occasionally from the desired accuracy, but only rarely so.

PAC Model Definitions
We will say that an algorithm L learns C if, for every epsilon, delta > 0 and for every D over X x Y, L generates a concept function h in C such that the probability that err(h) <= epsilon is at least 1 - delta.
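As a concrete illustration of such an algorithm L, here is a sketch of the classic elimination algorithm for learning conjunctions, a standard textbook example of a PAC learner (not taken from these slides; the training set below is invented). It starts from the conjunction of all literals and drops every literal contradicted by a positive training example.

```python
def learn_conjunction(Z, n):
    """Elimination algorithm: a consistent learner for conjunctions over X = {0,1}^n.

    Z is a list of (x, y) pairs with x a 0/1 tuple of length n and y in {0, 1}.
    Returns the learned term as a dict {variable index: required value}.
    """
    # Start with the most specific term: every variable and its negation.
    hypothesis = {i: {0, 1} for i in range(n)}
    for x, y in Z:
        if y == 1:                        # positive examples prune literals
            for i in range(n):
                hypothesis[i] &= {x[i]}   # keep only values consistent with x
    # Keep the indices still constrained to a single value.
    return {i: vals.pop() for i, vals in hypothesis.items() if len(vals) == 1}

# Toy training set labeled by the target term  x_0 AND (NOT x_2):
Z = [((1, 0, 0), 1), ((1, 1, 0), 1), ((0, 1, 0), 0), ((1, 1, 1), 0)]
print(learn_conjunction(Z, n=3))    # -> {0: 1, 2: 0}
```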

Formal Definition of PAC Learning
A learning algorithm L is a function from the set of all training examples to C, with the following property: given any epsilon, delta in (0, 1), there is an integer m_0(epsilon, delta) such that if m >= m_0(epsilon, delta) then, for any probability distribution D on X x Y, if Z is a training set of length m drawn randomly according to D^m, then with probability of at least 1 - delta the hypothesis h = L(Z) is such that err(h) <= epsilon.
We say that C is learnable (or PAC-learnable) if there is a learning algorithm for C.

Formal Definition of PAC Learning
Notes:
The sample complexity m_0(epsilon, delta) does not depend on D, i.e., the PAC model is distribution invariant.
The class C determines the sample complexity: for "simple" classes m_0(epsilon, delta) will be small compared to more "complex" classes.
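To give a feel for how C enters the sample complexity, the standard bound for a finite concept class in the realizable case (a well-known result, not quoted from the slides) takes m_0(epsilon, delta) = (1/epsilon)(ln|C| + ln(1/delta)); the helper below just evaluates it.

```python
import math

def sample_complexity(class_size, eps, delta):
    """m_0(eps, delta) = ceil((ln|C| + ln(1/delta)) / eps) -- the standard bound
    for a finite concept class C in the realizable PAC setting (an assumption
    here, not a formula taken from the slides)."""
    return math.ceil((math.log(class_size) + math.log(1.0 / delta)) / eps)

# Conjunctions over n = 20 Boolean variables: |C| = 3^20 (each variable appears
# positively, negatively, or not at all), so the class is "simple" and the
# required sample size stays modest.
print(sample_complexity(3 ** 20, eps=0.1, delta=0.05))   # ~250 examples
```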

Course Syllabus
3 x PAC:
2 x Separating Hyperplanes: Support Vector Machine, Kernels, Linear Discriminant Analysis
3 x Unsupervised Learning: Dimensionality Reduction (PCA), Density Estimation, Non-parametric Clustering (spectral methods)
5 x Statistical Inference: Maximum Likelihood, Conditional Independence, Latent Class Models, Expectation-Maximization Algorithm, Graphical Models