
1 CSE 5331/7331 Fall 2007: Machine Learning
Margaret H. Dunham, Department of Computer Science and Engineering, Southern Methodist University
Some slides extracted from Data Mining, Introductory and Advanced Topics, Prentice Hall, 2002; other slides from CS 545 at Colorado State University, by Chuck Anderson.

2 Table of Contents
Introduction (Chuck Anderson)
Statistical Machine Learning Examples
–Estimation
–EM
–Bayes Theorem
Decision Tree Learning
Neural Network Learning

3 The slides in this introductory section are from CS545: Machine Learning, by Chuck Anderson, Department of Computer Science, Colorado State University, Fall 2006.

4 What is Machine Learning?
Statistics ≈ the science of inference from data.
Machine learning ≈ multivariate statistics + computational statistics.
Multivariate statistics ≈ prediction of the values of a function assumed to underlie a multivariate dataset.
Computational statistics ≈ computational methods for statistical problems (aka statistical computation) + statistical methods which happen to be computationally intensive.
Data mining ≈ exploratory data analysis, particularly with massive/complex datasets.

5 Kinds of Learning
Learning algorithms are often categorized according to the amount of information provided:
Least information: unsupervised learning is the most exploratory. Requires samples of inputs only; must find regularities on its own.
More information: reinforcement learning is the most recent. Requires samples of inputs, actions, and rewards or punishments.
Most information: supervised learning is the most common. Requires samples of inputs and desired outputs.

6 Examples of Algorithms
Supervised learning:
–Regression: multivariate regression; neural networks and kernel methods.
–Classification: linear and quadratic discriminant analysis; k-nearest neighbors (sketched below); neural networks and kernel methods.
Reinforcement learning: multivariate regression; neural networks.
Unsupervised learning: principal components analysis; k-means clustering; self-organizing networks.
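
To make the classification entry concrete, here is a minimal k-nearest-neighbors sketch in Python/NumPy. It is not from the slides; the data values and the choice k=3 are made up for illustration.

    import numpy as np

    def knn_predict(X_train, y_train, x, k=3):
        """Classify x by majority vote among its k nearest training points."""
        dists = np.linalg.norm(X_train - x, axis=1)    # Euclidean distance to each sample
        nearest = np.argsort(dists)[:k]                # indices of the k closest samples
        return np.bincount(y_train[nearest]).argmax()  # majority class label

    # Toy data: two classes in 2-D.
    X = np.array([[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [3.8, 4.0]])
    y = np.array([0, 0, 1, 1])
    print(knn_predict(X, y, np.array([1.1, 0.9])))   # -> 0
    print(knn_predict(X, y, np.array([4.1, 4.1])))   # -> 1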


11 Table of Contents
Introduction (Chuck Anderson)
Statistical Machine Learning Examples
–Estimation
–EM
–Bayes Theorem
Decision Tree Learning
Neural Network Learning

12 Point Estimation
Point estimate: estimates a population parameter, and may be made by calculating the parameter for a sample.
May be used to predict a value for missing data.
Ex: relation R contains 100 employees; 99 have salary information, and their mean salary is $50,000. Use $50,000 as the value of the remaining employee's salary. Is this a good idea?
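
As a sketch of the slide's example: the point estimate of the missing salary is simply the sample mean of the 99 known salaries. The individual salary figures below are hypothetical, since the slide states only their mean.

    import numpy as np

    rng = np.random.default_rng(0)
    known_salaries = rng.normal(50_000, 8_000, size=99)  # 99 employees with known salary

    point_estimate = known_salaries.mean()  # sample mean as the point estimate
    print(f"Impute the missing salary as ${point_estimate:,.0f}")  # roughly $50,000

Whether this is a good idea depends on how typical the missing employee is; an outlier (say, the CEO) would make the mean a poor guess.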

13 Estimation Error
Bias: the difference between the expected value of the estimator and the actual parameter value.
Mean Squared Error (MSE): the expected value of the squared difference between the estimate and the actual value: MSE(θ̂) = E[(θ̂ − θ)²].
Why square? So that positive and negative errors do not cancel, and larger errors are penalized more heavily.
Root Mean Square Error (RMSE): the square root of the MSE, expressed in the same units as the parameter.
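
A small simulation, assuming we estimate a known population mean from repeated samples, illustrates the three quantities (the distribution and sample sizes are made up):

    import numpy as np

    def bias(estimates, true_value):
        """Bias: mean of the estimates minus the true value."""
        return np.mean(estimates) - true_value

    def mse(estimates, true_value):
        """MSE: mean squared difference between the estimates and the true value."""
        return np.mean((estimates - true_value) ** 2)

    rng = np.random.default_rng(1)
    true_mean = 50.0
    # 1,000 sample means, each from a sample of 25 values with std. dev. 10.
    estimates = np.array([rng.normal(true_mean, 10, size=25).mean()
                          for _ in range(1_000)])

    print("bias:", bias(estimates, true_mean))          # near 0: the sample mean is unbiased
    print("MSE: ", mse(estimates, true_mean))           # near 10**2 / 25 = 4
    print("RMSE:", np.sqrt(mse(estimates, true_mean)))  # near 2, in the original units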

14 Jackknife Estimate
Jackknife estimate: an estimate of a parameter obtained by omitting one value at a time from the set of observed values.
Ex: for X = {x1, …, xn}, the i-th jackknife estimate of the mean omits xi: θ̂(i) = (Σ j≠i xj) / (n − 1).
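
A minimal sketch of the leave-one-out computation, using made-up observations:

    import numpy as np

    def jackknife_means(x):
        """i-th leave-one-out estimate of the mean: omit x[i], average the rest."""
        return (x.sum() - x) / (len(x) - 1)  # vectorized over all i

    x = np.array([5.0, 7.0, 8.0, 10.0])  # hypothetical observations
    print(jackknife_means(x))            # [8.33 7.67 7.33 6.67]
    print(jackknife_means(x).mean())     # 7.5, the full-sample mean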

15 Maximum Likelihood Estimate (MLE)
Obtain the parameter estimates that maximize the probability that the sample data occurs under the specific model.
Likelihood function: the joint probability of observing the sample data, obtained by multiplying the individual probabilities: L(Θ | x1, …, xn) = Π i p(xi | Θ).
Maximize L.

16 MLE Example
A coin is tossed five times: {H, H, H, H, T}.
Assuming a fair coin with H and T equally likely, the likelihood of this sequence is (1/2)^5 = 0.03125.
However, if the probability of an H is 0.8, then the likelihood is (0.8)^4 (0.2) ≈ 0.082.

17 MLE Example (cont'd)
General likelihood formula: L(p | x1, …, x5) = p^(Σ xi) (1 − p)^(5 − Σ xi), where xi = 1 for heads and xi = 0 for tails.
The estimate for p is then the proportion of heads: 4/5 = 0.8.
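
The coin-toss likelihoods above are easy to verify numerically; this sketch assumes the encoding H = 1 and T = 0:

    import numpy as np

    tosses = np.array([1, 1, 1, 1, 0])  # the slide's sequence {H,H,H,H,T}

    def likelihood(p, x):
        """L(p) = p^(#heads) * (1-p)^(#tails) for Bernoulli tosses."""
        heads = x.sum()
        return p ** heads * (1 - p) ** (len(x) - heads)

    print(likelihood(0.5, tosses))  # 0.03125 (fair coin)
    print(likelihood(0.8, tosses))  # 0.08192
    print(tosses.mean())            # 0.8: the MLE is the observed proportion of heads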

18 Expectation-Maximization (EM)
Solves estimation problems with incomplete data.
Obtain initial estimates for the parameters.
Iteratively use the estimates for the missing data, re-estimate the parameters, and continue until convergence.
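
A minimal sketch of this idea: estimate a mean when some observations are missing. The E step fills the missing values with the current estimate, and the M step recomputes the mean; the data values below are made up.

    import numpy as np

    observed = np.array([10.0, 12.0, 14.0])  # hypothetical known values
    n_missing = 2                            # two values are unobserved

    mu = 0.0  # crude initial estimate
    for _ in range(100):
        filled_total = observed.sum() + n_missing * mu       # E step: impute with mu
        new_mu = filled_total / (len(observed) + n_missing)  # M step: re-estimate
        if abs(new_mu - mu) < 1e-9:  # converged
            break
        mu = new_mu

    print(mu)  # converges to 12.0, the mean of the observed values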

19 EM Example (figure only; no transcript text)

20 EM Algorithm (figure only; no transcript text)

21 Bayes Theorem
Posterior probability: P(h1 | xi).
Prior probability: P(h1).
Bayes theorem: P(hj | xi) = P(xi | hj) P(hj) / P(xi).
Assigns probabilities to hypotheses given a data value.

22 Bayes Theorem Example
Credit authorizations (hypotheses): h1 = authorize purchase; h2 = authorize after further identification; h3 = do not authorize; h4 = do not authorize but contact police.
Assign twelve data values for all combinations of credit and income.
From training data: P(h1) = 60%; P(h2) = 20%; P(h3) = 10%; P(h4) = 10%.

23 Bayes Example (cont'd): training data (table on slide)

24 Bayes Example (cont'd)
Calculate P(xi | hj) and P(xi).
Ex: P(x7 | h1) = 2/6; P(x4 | h1) = 1/6; P(x2 | h1) = 2/6; P(x8 | h1) = 1/6; P(xi | h1) = 0 for all other xi.
Predict the class for x4:
–Calculate P(hj | x4) for all hj.
–Place x4 in the class with the largest value.
–Ex: P(h1 | x4) = P(x4 | h1) P(h1) / P(x4) = (1/6)(0.6)/0.1 = 1, so x4 is in class h1.
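
The same computation in a few lines of Python, using only the numbers quoted on the slide. The slide gives P(x4 | h1) explicitly; since P(h1 | x4) works out to 1, the likelihoods under the other hypotheses must be 0, which is assumed here.

    priors = {"h1": 0.6, "h2": 0.2, "h3": 0.1, "h4": 0.1}  # P(hj) from training data
    lik_x4 = {"h1": 1/6, "h2": 0.0, "h3": 0.0, "h4": 0.0}  # P(x4|hj); zeros assumed
    p_x4 = 0.1                                             # P(x4)

    posteriors = {h: lik_x4[h] * priors[h] / p_x4 for h in priors}
    print(posteriors)                           # h1 -> 1.0, the rest 0.0
    print(max(posteriors, key=posteriors.get))  # predicted class: h1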

25 Table of Contents
Introduction (Chuck Anderson)
Statistical Machine Learning Examples
–Estimation
–EM
–Bayes Theorem
Decision Tree Learning
Neural Network Learning

26 Twenty Questions Game (figure only; no transcript text)

27 Decision Trees
Decision Tree (DT):
–A tree where the root and each internal node is labeled with a question.
–The arcs represent each possible answer to the associated question.
–Each leaf node represents a prediction of a solution to the problem.
Popular technique for classification: the leaf node indicates the class to which the corresponding tuple belongs (see the sketch after this list).
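
As a sketch of this definition (not the slides' example tree), a DT can be represented directly as nested nodes: internal nodes hold a question, arcs are the answers, and leaves hold the predicted class. The attributes and classes below are hypothetical, loosely echoing the earlier credit example.

    # Internal nodes: a yes/no question plus one subtree per answer; leaves: a class.
    tree = {
        "question": lambda t: t["income"] > 40_000,
        "yes": {"leaf": "authorize"},
        "no": {
            "question": lambda t: t["credit"] == "good",
            "yes": {"leaf": "authorize"},
            "no": {"leaf": "do not authorize"},
        },
    }

    def classify(node, tuple_):
        """Follow the arcs (answers) from the root until a leaf names a class."""
        while "leaf" not in node:
            node = node["yes"] if node["question"](tuple_) else node["no"]
        return node["leaf"]

    print(classify(tree, {"income": 55_000, "credit": "bad"}))  # authorize
    print(classify(tree, {"income": 20_000, "credit": "bad"}))  # do not authorize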

28 Decision Tree Example (figure only; no transcript text)

29 Decision Trees
How do you build a good DT? What is a good DT?
Ans: supervised learning.

30 Comparing DTs (figure only: a balanced tree vs. a deep tree)

31 Decision Tree Induction is often based on Information Theory. So…

32 Information (figure only; no transcript text)

33 DT Induction
When all the marbles in the bowl are mixed up, little information is given.
When the marbles in the bowl are all from one class and those in the other two classes are on either side, more information is given.
Use this approach with DT induction!

34 Information/Entropy
Given probabilities p1, p2, …, ps whose sum is 1, entropy is defined as: H(p1, …, ps) = Σ i pi log(1/pi).
Entropy measures the amount of randomness, surprise, or uncertainty.
Goal in classification: no surprise, i.e., entropy = 0.
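
A direct transcription of the definition, with the usual convention that terms with pi = 0 contribute nothing (base-2 logs, so entropy is measured in bits):

    import numpy as np

    def entropy(probs, base=2.0):
        """H = sum_i p_i * log(1/p_i), skipping p_i = 0 terms."""
        p = np.asarray(probs, dtype=float)
        p = p[p > 0]  # 0 * log(1/0) is taken as 0
        return float(np.sum(p * np.log(1.0 / p) / np.log(base)))

    print(entropy([1.0, 0.0, 0.0]))  # 0.0: a pure class, no surprise
    print(entropy([1/3, 1/3, 1/3]))  # ~1.585: maximal uncertainty over 3 classes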

35 Table of Contents
Introduction (Chuck Anderson)
Statistical Machine Learning Examples
–Estimation
–EM
–Bayes Theorem
Decision Tree Learning
Neural Network Learning

36 Neural Networks
Based on the observed functioning of the human brain (Artificial Neural Networks, ANN).
Our view of neural networks is very simplistic: we view a neural network (NN) from a graphical viewpoint.
Alternatively, a NN may be viewed from the perspective of matrices.
Used in pattern recognition, speech recognition, computer vision, and classification.

37 Neural Networks
A Neural Network (NN) is a directed graph F = <V, A> with vertices V = {1, 2, …, n} and arcs A = {<i, j> | 1 <= i, j <= n}, with the following restrictions:
–V is partitioned into a set of input nodes, VI, hidden nodes, VH, and output nodes, VO.
–The vertices are also partitioned into layers.
–Any arc <i, j> must have node i in layer h − 1 and node j in layer h.
–Arc <i, j> is labeled with a numeric value wij.
–Node i is labeled with a function fi.
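
A minimal forward-propagation sketch of this graph view: each weight matrix carries the arc labels wij between consecutive layers, and for simplicity every node applies the same activation function here. The layer sizes and weights are arbitrary, for illustration only.

    import numpy as np

    def forward(x, weights, f=np.tanh):
        """Propagate an input vector layer by layer: weights[h] maps layer h-1 to layer h."""
        a = x
        for W in weights:
            a = f(W @ a)  # weighted sum over arcs w_ij, then the node function f_i
        return a

    rng = np.random.default_rng(2)
    # A 3-input, 4-hidden, 2-output network with random arc weights.
    weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
    print(forward(np.array([0.5, -1.0, 2.0]), weights))  # two output values in (-1, 1)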

38 Neural Network Example (figure only; no transcript text)

39 NN Node (figure only; no transcript text)

40 NN Activation Functions
Functions associated with the nodes in the graph.
Output may be in the range [−1, 1] or [0, 1].
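
Two common choices, sketched below: the logistic sigmoid squashes a node's weighted sum into (0, 1), and tanh squashes it into (-1, 1):

    import numpy as np

    def sigmoid(s):
        """Logistic activation: output in (0, 1)."""
        return 1.0 / (1.0 + np.exp(-s))

    s = np.array([-2.0, 0.0, 2.0])  # example weighted sums arriving at a node
    print(sigmoid(s))   # [0.119 0.5   0.881]
    print(np.tanh(s))   # [-0.964  0.     0.964]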

41 NN Activation Functions (figure only; no transcript text)

42 NN Learning
Propagate input values through the graph.
Compare the output to the desired output.
Adjust the weights in the graph accordingly.
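
For a single linear node, one pass of this loop is the classic delta rule; a minimal sketch with made-up numbers:

    import numpy as np

    rng = np.random.default_rng(3)
    w = rng.normal(size=3)          # current arc weights into the node
    x = np.array([1.0, 0.5, -0.5])  # input values
    target = 1.0                    # desired output
    lr = 0.1                        # learning rate

    output = w @ x           # propagate the input through the node
    error = target - output  # compare to the desired output
    w = w + lr * error * x   # adjust the weights toward the target
    print(error, w)

Repeating this update over many examples (and, for multi-layer networks, propagating the error backwards) is the essence of NN training.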

