
CSE 5331/7331 Fall 2007: Machine Learning
Margaret H. Dunham
Department of Computer Science and Engineering, Southern Methodist University
Some slides extracted from Data Mining, Introductory and Advanced Topics, Prentice Hall; other slides from CS 545 at Colorado State University, Chuck Anderson.

Table of Contents
Introduction (Chuck Anderson)
Statistical Machine Learning Examples
– Estimation
– EM
– Bayes Theorem
Decision Tree Learning
Neural Network Learning

The slides in this introductory section are from CS545: Machine Learning, by Chuck Anderson, Department of Computer Science, Colorado State University, Fall 2006.

What is Machine Learning?
Statistics ≈ the science of inference from data.
Machine learning ≈ multivariate statistics + computational statistics.
Multivariate statistics ≈ prediction of values of a function assumed to underlie a multivariate dataset.
Computational statistics ≈ computational methods for statistical problems (aka statistical computation) + statistical methods which happen to be computationally intensive.
Data mining ≈ exploratory data analysis, particularly with massive/complex datasets.

Kinds of Learning
Learning algorithms are often categorized according to the amount of information provided:
Least information: unsupervised learning is the most exploratory. It requires only samples of inputs and must find regularities on its own.
More information: reinforcement learning is the most recent. It requires samples of inputs, actions, and rewards or punishments.
Most information: supervised learning is the most common. It requires samples of inputs and desired outputs.

Examples of Algorithms
Supervised learning
– Regression: multivariate regression; neural networks and kernel methods
– Classification: linear and quadratic discriminant analysis; k-nearest neighbors; neural networks and kernel methods
Reinforcement learning
– multivariate regression
– neural networks
Unsupervised learning
– principal components analysis
– k-means clustering
– self-organizing networks


Table of Contents
Introduction (Chuck Anderson)
Statistical Machine Learning Examples
– Estimation
– EM
– Bayes Theorem
Decision Tree Learning
Neural Network Learning

Point Estimation
Point estimate: an estimate of a population parameter.
May be made by calculating the parameter for a sample.
May be used to predict a value for missing data.
Example:
– R contains 100 employees
– 99 have salary information
– The mean salary of these is $50,000
– Use $50,000 as the value of the remaining employee's salary. Is this a good idea?
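A minimal sketch of this example in Python, using hypothetical stand-in salaries (the relation R itself is not given in the slides):

```python
# Use the sample mean of the known salaries as a point estimate
# for the one missing salary.
import statistics

known_salaries = [48_000, 52_000, 50_500, 49_500]   # stand-in for the 99 known values
point_estimate = statistics.mean(known_salaries)     # sample mean as the point estimate

print(f"Estimated missing salary: ${point_estimate:,.0f}")   # $50,000
```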

Estimation Error
Bias: the difference between the expected value and the actual value.
Mean Squared Error (MSE): the expected value of the squared difference between the estimate and the actual value. Why square?
Root Mean Square Error (RMSE): the square root of the MSE.
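The formula images on this slide are not reproduced in the transcript; the standard definitions they refer to, for an estimator of a parameter Θ, are:

```latex
\mathrm{Bias}(\hat{\Theta}) = E[\hat{\Theta}] - \Theta, \qquad
\mathrm{MSE}(\hat{\Theta}) = E\big[(\hat{\Theta} - \Theta)^2\big], \qquad
\mathrm{RMSE}(\hat{\Theta}) = \sqrt{\mathrm{MSE}(\hat{\Theta})}
```

Squaring keeps positive and negative errors from cancelling and penalizes large deviations, which is one answer to "Why square?".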

Jackknife Estimate
Jackknife estimate: an estimate of a parameter obtained by omitting one value from the set of observed values.
Example: the estimate of the mean for X = {x_1, …, x_n}.
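For the mean, the jackknife estimate obtained by leaving out observation x_i takes the standard form (the slide's own formula is not reproduced):

```latex
\hat{\mu}_{(i)} \;=\; \frac{1}{\,n-1\,}\sum_{j \ne i} x_j, \qquad i = 1, \dots, n
```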

Maximum Likelihood Estimate (MLE)
Obtain parameter estimates that maximize the probability that the sample data occurs under the specific model.
The likelihood function is the joint probability of observing the sample data, obtained by multiplying the individual probabilities.
Maximize L.
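The likelihood function referred to above, in its standard form for independent observations (the slide's formula image is not reproduced):

```latex
L(\Theta \mid x_1, \dots, x_n) \;=\; \prod_{i=1}^{n} f(x_i \mid \Theta)
```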

MLE Example
A coin is tossed five times: {H, H, H, H, T}.
Assuming a fair coin with H and T equally likely, the likelihood of this sequence is small; if instead the probability of an H is 0.8, the likelihood is higher. Both values are worked out below.
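Reconstructed from the standard Bernoulli likelihood, since the slide's formula images are not reproduced:

```latex
P(H) = 0.5: \quad L = (0.5)^5 = 0.03125
\qquad\qquad
P(H) = 0.8: \quad L = (0.8)^4 (0.2) \approx 0.0819
```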

MLE Example (cont'd)
The general likelihood formula for this experiment is sketched below; maximizing it gives the estimate p = 4/5 = 0.8.
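A sketch of that formula under the usual Bernoulli model, with k heads in n tosses (the slide's own image is not reproduced):

```latex
L(p) = p^{k}(1-p)^{\,n-k}, \qquad
\frac{d}{dp}\log L(p) = \frac{k}{p} - \frac{n-k}{1-p} = 0
\;\Longrightarrow\; \hat{p} = \frac{k}{n} = \frac{4}{5} = 0.8
```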

Expectation-Maximization (EM)
Solves estimation with incomplete data.
Obtain initial estimates for the parameters.
Iteratively use the estimates for the missing data and continue until convergence.

EM Example (slide content not captured in the transcript)

EM Algorithm (slide content not captured in the transcript)
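Since the worked example and algorithm on these two slides are not captured in the transcript, here is a minimal EM sketch in Python for a different but standard problem: fitting the two means of an equal-weight, unit-variance Gaussian mixture, alternating E- and M-steps until the estimates settle.

```python
import math

def em_two_means(data, mu=(0.0, 1.0), n_iter=50):
    """EM for the means of an equal-weight, unit-variance two-Gaussian mixture."""
    mu1, mu2 = mu
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each point (normalizing
        # constants cancel because both components have the same variance).
        resp = []
        for x in data:
            p1 = math.exp(-0.5 * (x - mu1) ** 2)
            p2 = math.exp(-0.5 * (x - mu2) ** 2)
            resp.append(p1 / (p1 + p2))
        # M-step: re-estimate each mean as a responsibility-weighted average.
        mu1 = sum(r * x for r, x in zip(resp, data)) / sum(resp)
        mu2 = sum((1 - r) * x for r, x in zip(resp, data)) / sum(1 - r for r in resp)
    return mu1, mu2

print(em_two_means([0.1, -0.2, 0.3, 4.9, 5.2, 5.1]))   # roughly (0.07, 5.07)
```

The structure mirrors the slide's description: obtain initial estimates, then repeatedly use the current estimates to complete the data (E-step) and re-estimate the parameters (M-step) until convergence.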

Bayes Theorem
Posterior probability: P(h_1 | x_i).
Prior probability: P(h_1).
Bayes theorem (below) assigns probabilities to hypotheses given a data value.
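The theorem itself, in its standard form (the slide's formula image is not reproduced):

```latex
P(h_1 \mid x_i) \;=\; \frac{P(x_i \mid h_1)\, P(h_1)}{P(x_i)}
```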

Bayes Theorem Example
Credit authorizations (hypotheses): h_1 = authorize purchase, h_2 = authorize after further identification, h_3 = do not authorize, h_4 = do not authorize but contact police.
Assign twelve data values for all combinations of credit and income.
From the training data: P(h_1) = 60%; P(h_2) = 20%; P(h_3) = 10%; P(h_4) = 10%.

Bayes Example (cont'd)
Training data (the table on this slide is not captured in the transcript).

Bayes Example (cont'd)
Calculate P(x_i | h_j) and P(x_i).
Example: P(x_7 | h_1) = 2/6; P(x_4 | h_1) = 1/6; P(x_2 | h_1) = 2/6; P(x_8 | h_1) = 1/6; P(x_i | h_1) = 0 for all other x_i.
Predict the class for x_4:
– Calculate P(h_j | x_4) for all h_j.
– Place x_4 in the class with the largest value.
– Example: P(h_1 | x_4) = P(x_4 | h_1) P(h_1) / P(x_4) = (1/6)(0.6)/0.1 = 1.
– So x_4 is placed in class h_1.
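A minimal sketch of that calculation in Python; the value P(x_4) = 0.1 comes from the slide's training-data table, which is not reproduced here:

```python
p_x4_given_h1 = 1 / 6   # P(x4 | h1), from the slide
p_h1 = 0.6              # prior P(h1)
p_x4 = 0.1              # evidence P(x4), from the training-data table

posterior = p_x4_given_h1 * p_h1 / p_x4
print(round(posterior, 6))   # 1.0, so x4 is placed in class h1
```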

Table of Contents
Introduction (Chuck Anderson)
Statistical Machine Learning Examples
– Estimation
– EM
– Bayes Theorem
Decision Tree Learning
Neural Network Learning

Twenty Questions Game

Decision Trees
Decision Tree (DT):
– A tree where the root and each internal node is labeled with a question.
– The arcs represent each possible answer to the associated question.
– Each leaf node represents a prediction of a solution to the problem.
A popular technique for classification; the leaf node indicates the class to which the corresponding tuple belongs.
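A minimal structural sketch in Python of the tree just described; the attribute and class names are hypothetical, not from the slides:

```python
class DTNode:
    """Internal nodes hold a question, arcs hold answers, leaves hold a class."""
    def __init__(self, question=None, branches=None, prediction=None):
        self.question = question        # function mapping a tuple to an answer
        self.branches = branches or {}  # answer -> child DTNode (one per arc)
        self.prediction = prediction    # class label; set only at leaf nodes

    def classify(self, record):
        if self.prediction is not None:                  # leaf: return the class
            return self.prediction
        answer = self.question(record)                   # ask the node's question
        return self.branches[answer].classify(record)    # follow the matching arc

# Hypothetical example: classify by a single attribute "married".
leaf_a = DTNode(prediction="class A")
leaf_b = DTNode(prediction="class B")
root = DTNode(question=lambda r: r["married"], branches={True: leaf_a, False: leaf_b})
print(root.classify({"married": True}))   # class A
```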

Decision Tree Example

Decision Trees
How do you build a good DT?
What is a good DT?
Answer: supervised learning.

Comparing DTs: Balanced vs. Deep

Decision tree induction is often based on information theory. So…

Information

DT Induction
When all the marbles in the bowl are mixed up, little information is given.
When the marbles in the bowl are all from one class, and those in the other two classes are on either side, more information is given.
Use this approach with DT induction!

Information/Entropy
Given probabilities p_1, p_2, …, p_s whose sum is 1, entropy is defined as shown below.
Entropy measures the amount of randomness, surprise, or uncertainty.
Goal in classification: no surprise, entropy = 0.
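The entropy formula on this slide is not reproduced; the standard definition it refers to is H(p_1, …, p_s) = Σ_i p_i log(1/p_i) = −Σ_i p_i log p_i. A minimal sketch in Python (using log base 2; the slides do not specify the base):

```python
import math

def entropy(probs):
    """H(p1, ..., ps) = sum_i p_i * log(1/p_i), with terms for p_i = 0 taken as 0."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

print(entropy([1.0, 0.0, 0.0]))      # 0.0   -> all marbles from one class, no surprise
print(entropy([1/3, 1/3, 1/3]))      # ~1.585 -> completely mixed, maximum surprise
```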

Table of Contents
Introduction (Chuck Anderson)
Statistical Machine Learning Examples
– Estimation
– EM
– Bayes Theorem
Decision Tree Learning
Neural Network Learning

Neural Networks
Based on the observed functioning of the human brain (Artificial Neural Networks, ANN).
Our view of neural networks is very simplistic: we view a neural network (NN) from a graphical viewpoint.
Alternatively, a NN may be viewed from the perspective of matrices.
Used in pattern recognition, speech recognition, computer vision, and classification.

Neural Networks
A Neural Network (NN) is a directed graph F = <V, A> with vertices V = {1, 2, …, n} and arcs A = {<i, j> | 1 <= i, j <= n}, with the following restrictions:
– V is partitioned into a set of input nodes V_I, hidden nodes V_H, and output nodes V_O.
– The vertices are also partitioned into layers.
– Any arc <i, j> must have node i in layer h-1 and node j in layer h.
– Arc <i, j> is labeled with a numeric value w_ij.
– Node i is labeled with a function f_i.
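A minimal sketch in Python of this graph structure, with hypothetical layer sizes (three input, two hidden, one output node), not taken from the slides:

```python
# Hypothetical layer sizes: V_I = {1, 2, 3}, V_H = {4, 5}, V_O = {6}.
layers = [[1, 2, 3], [4, 5], [6]]

# Each arc <i, j> connects a node in one layer to a node in the next layer
# and is labeled with a weight w_ij (initialized to 0.0 here, to be learned).
weights = {}
for h in range(1, len(layers)):
    for i in layers[h - 1]:
        for j in layers[h]:
            weights[(i, j)] = 0.0

print(sorted(weights))   # the arcs: (1,4), (1,5), ..., (5,6)
```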

Neural Network Example

NN Node

NN Activation Functions
Functions associated with the nodes in the graph.
Output may be in the range [-1, 1] or [0, 1].

NN Activation Functions (cont'd)
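The activation-function plots on this slide are not reproduced; as a minimal sketch, two common choices with the output ranges mentioned above (assumed examples, not necessarily the ones plotted):

```python
import math

def tanh(s):
    """Hyperbolic tangent activation: output in [-1, 1]."""
    return math.tanh(s)

def logistic(s):
    """Logistic (sigmoid) activation: output in [0, 1]."""
    return 1.0 / (1.0 + math.exp(-s))

print(tanh(0.5), logistic(0.5))
```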

NN Learning
Propagate input values through the graph.
Compare the output to the desired output.
Adjust the weights in the graph accordingly.
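A minimal sketch of this loop in Python for a single sigmoid node trained by gradient descent on squared error; the learning rate, number of epochs, and the OR example are assumptions, not from the slides:

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_node(samples, lr=0.5, epochs=1000):
    """samples: list of (inputs, desired_output) pairs; weights start at zero."""
    w = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for x, d in samples:
            y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))   # propagate the input
            delta = (d - y) * y * (1 - y)                        # error scaled by sigmoid slope
            w = [wi + lr * delta * xi for wi, xi in zip(w, x)]   # adjust weights to reduce error
    return w

# Hypothetical example: learn logical OR, with the first input fixed at 1 as a bias.
print(train_node([([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]))
```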