Presentation on theme: "Mehdi Ghayoumi MSB rm 132 Ofc hr: Thur, 11-12 a Machine Learning."— Presentation transcript:

1 Machine Learning
Mehdi Ghayoumi, MSB rm 132, mghayoum@kent.edu
Ofc hr: Thur, 11-12a

2 Overfitting: the model is too "complex" and fits irrelevant characteristics (noise) in the data.
– Low bias and high variance
– Low training error and high test error

3 Models with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample).
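To make the overfitting signature concrete, here is a minimal Python sketch (NumPy assumed; the sine target, noise level, and degrees are illustrative choices, not from the slides). The high-degree fit drives training error down while test error grows:

    # Hypothetical demo: fit polynomials of increasing degree to noisy
    # samples of a sine curve and compare training vs. held-out error.
    import numpy as np

    rng = np.random.default_rng(0)
    x_train = np.sort(rng.uniform(0, 1, 15))
    y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 15)
    x_test = np.sort(rng.uniform(0, 1, 100))
    y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, 100)

    for degree in (1, 3, 9):
        coefs = np.polyfit(x_train, y_train, degree)   # least-squares fit
        train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
        test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
        print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")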

4 Bias-Variance Trade-off

5 Bias-Variance Trade-off
E(MSE) = noise² + bias² + variance
– noise²: unavoidable error
– bias²: error due to incorrect assumptions
– variance: error due to the variance of training samples
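A small simulation sketch of this decomposition (again NumPy, with an illustrative target function, not from the slides): refit a simple and a flexible model on many fresh training samples and estimate the bias² and variance of the prediction at a single query point x0.

    # Hypothetical simulation: estimate bias^2 and variance empirically.
    import numpy as np

    rng = np.random.default_rng(1)
    true_f = lambda x: np.sin(2 * np.pi * x)   # assumed "true" function
    x0, noise_sd, n_trials = 0.3, 0.2, 500

    for degree in (1, 9):
        preds = []
        for _ in range(n_trials):
            x = rng.uniform(0, 1, 20)                    # fresh training set
            y = true_f(x) + rng.normal(0, noise_sd, 20)
            preds.append(np.polyval(np.polyfit(x, y, degree), x0))
        preds = np.array(preds)
        bias2 = (preds.mean() - true_f(x0)) ** 2
        var = preds.var()
        print(f"degree {degree}: bias^2 {bias2:.4f}, variance {var:.4f}, "
              f"noise^2 {noise_sd**2:.4f}")

The simple model typically shows the larger bias², the flexible one the larger variance, matching the trade-off above.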

6

7

8

9 Probabilities
We write P(A) as "the fraction of possible worlds in which A is true."
(Figure: the event space of all possible worlds has area 1; worlds in which A is true form a red rectangle, the rest are worlds in which A is false. P(A) = area of the red rectangle.)

10 Axioms of Probability Theory
1. All probabilities are between 0 and 1: 0 <= P(A) <= 1.
2. True has probability 1, false has probability 0: P(true) = 1, P(false) = 0; hence P(not A) = P(~A) = 1 - P(A).
3. The probability of a disjunction is: P(A or B) = P(A) + P(B) - P(A and B). Sometimes this is written in set notation: P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
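A tiny sanity check of axioms 2 and 3, modelling events as sets of outcomes of a fair six-sided die (the die example is an illustration, not from the slide):

    # Events as subsets of the event space; probabilities as size fractions.
    omega = set(range(1, 7))   # all outcomes of a fair die
    A = {2, 4, 6}              # "roll is even"
    B = {4, 5, 6}              # "roll is at least 4"

    p = lambda event: len(event) / len(omega)

    # Axiom 3: P(A or B) = P(A) + P(B) - P(A and B)
    assert abs(p(A | B) - (p(A) + p(B) - p(A & B))) < 1e-12
    # Consequence of axiom 2: P(~A) = 1 - P(A)
    assert abs(p(omega - A) - (1 - p(A))) < 1e-12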

11 Interpretation of the Axioms
(Figure: Venn diagrams of A, B, "A or B", and "A and B"; the disjunction axiom is simple addition and subtraction of areas.)

12 Definition of Conditional Probability
P(A|B) = P(A ^ B) / P(B)
The Chain Rule: P(A ^ B) = P(A|B) P(B)

13 Conditional Probability
P(A|B) = fraction of worlds in which B is true that also have A true.
H = "Have a headache", F = "Coming down with flu"
P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2
P(H|F) = fraction of flu-inflicted worlds in which you have a headache
= (# worlds with flu and headache) / (# worlds with flu)
= (area of "H and F" region) / (area of "F" region)
= P(H ^ F) / P(F)

14 Probabilistic Inference
H = "Have a headache", F = "Coming down with flu"
P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2
Area-wise we have:
P(F) = 1/40
P(H) = 1/10
P(H|F) = 1/2, so P(H ^ F) = P(H|F) P(F) = 1/80
P(F|H) = P(H ^ F) / P(H) = (1/80) / (1/10) = 1/8
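A short Python check of these values, using only the three numbers given on the slide:

    # Chain rule and definition of conditional probability.
    p_h, p_f, p_h_given_f = 1/10, 1/40, 1/2

    p_h_and_f = p_h_given_f * p_f    # P(H ^ F) = P(H|F) P(F) = 0.0125 (1/80)
    p_f_given_h = p_h_and_f / p_h    # P(F|H) = P(H ^ F) / P(H) = 0.125 (1/8)
    print(p_h_and_f, p_f_given_h)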

15 Independence
2 blue and 3 red marbles are in a bag. What are the chances of getting a blue marble? The chance is 2 in 5. But after taking one out (without putting it back), the chances change! So what about the next draw? The draws are not independent.

16 Independence
A and B are independent iff: P(A|B) = P(A) (equivalently, P(B|A) = P(B)).
Therefore, if A and B are independent: P(A ^ B) = P(A|B) P(B) = P(A) P(B).
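A small sketch contrasting the marble draws above (dependent, since there is no replacement) with draws with replacement, where the product rule for independent events applies:

    # Marble counts from slide 15.
    blue, red = 2, 3
    p_first_blue = blue / (blue + red)                        # 2/5

    # Without replacement the second draw depends on the first:
    p_second_blue_given_first = (blue - 1) / (blue + red - 1) # 1/4 != 2/5

    # With replacement the draws are independent, so P(A ^ B) = P(A) P(B):
    p_both_blue = p_first_blue * p_first_blue                 # 4/25
    print(p_first_blue, p_second_blue_given_first, p_both_blue)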

17 Example: Ice Cream
70% of your friends like Chocolate, and 35% like Chocolate AND like Strawberry. What percent of those who like Chocolate also like Strawberry?

18 P(Strawberry | Chocolate) = P(Chocolate and Strawberry) / P(Chocolate) = 0.35 / 0.7 = 50%
It means: 50% of your friends who like Chocolate also like Strawberry.
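The same definition as a two-line check, with the numbers from slide 17:

    p_choc, p_choc_and_straw = 0.70, 0.35
    print(p_choc_and_straw / p_choc)   # 0.5, i.e. 50%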

19

20 The joint probability distribution for a set of random variables X1,…,Xn gives the probability of every combination of values (an n-dimensional array with v^n entries if all variables are discrete with v values; all v^n entries must sum to 1): P(X1,…,Xn)

positive:
         circle  square
red      0.20    0.02
blue     0.02    0.01

negative:
         circle  square
red      0.05    0.30
blue     0.20    0.20

21 The probability of any conjunction (an assignment of values to some subset of the variables) can be calculated by summing the appropriate subset of entries from the joint distribution. Therefore, all conditional probabilities can also be calculated, as the sketch below shows.
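A minimal sketch of this, built on the joint table from slide 20 (the variable names label, color, and shape are an assumption for the three dimensions):

    # Joint distribution as a dict from (label, color, shape) to probability.
    joint = {
        ("positive", "red",  "circle"): 0.20, ("positive", "red",  "square"): 0.02,
        ("positive", "blue", "circle"): 0.02, ("positive", "blue", "square"): 0.01,
        ("negative", "red",  "circle"): 0.05, ("negative", "red",  "square"): 0.30,
        ("negative", "blue", "circle"): 0.20, ("negative", "blue", "square"): 0.20,
    }
    assert abs(sum(joint.values()) - 1.0) < 1e-9   # all cells sum to 1

    def prob(**fixed):
        """Marginal of a partial assignment: sum the matching cells."""
        idx = {"label": 0, "color": 1, "shape": 2}
        return sum(p for cell, p in joint.items()
                   if all(cell[idx[name]] == val for name, val in fixed.items()))

    p_red = prob(color="red")                                      # 0.57
    p_pos_given_red = prob(label="positive", color="red") / p_red  # ~0.386
    print(p_red, p_pos_given_red)

Any conditional P(X|Y) then falls out as a ratio of two such sums, as in the last line.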

22 Bayes Rule
Thomas Bayes (c. 1701 – 7 April 1761) was an English statistician, philosopher and Presbyterian minister, known for having formulated a specific case of the theorem that bears his name: Bayes' theorem. Bayes never published what would eventually become his most famous accomplishment; his notes were edited and published after his death by Richard Price.

23 Bayesian Learning
Bayes' theorem relates the posterior probability of a hypothesis h given data D to the likelihood and the prior:
P(h|D) = P(D|h) P(h) / P(D)

24 An Illustrating Example
A patient takes a lab test and the result comes back positive. It is known that the test returns a correct positive result in only 98% of the cases and a correct negative result in only 97% of the cases. Furthermore, only 0.008 of the entire population has this disease.
1. What is the probability that this patient has cancer?
2. What is the probability that he does not have cancer?
3. What is the diagnosis?

25 An Illustrating Example
The test has two possible outcomes: positive (+) and negative (-). The various probabilities are:
P(cancer) = 0.008     P(~cancer) = 0.992
P(+|cancer) = 0.98    P(-|cancer) = 0.02
P(+|~cancer) = 0.03   P(-|~cancer) = 0.97
Now a new patient's test result is positive. Should we diagnose the patient as having cancer or not?

26 Choosing Hypotheses
Generally, we want the most probable hypothesis h (from a hypothesis space H) given the observed data D:
– Maximum a posteriori (MAP) hypothesis: h_MAP = argmax_{h in H} P(h|D)
– Maximum likelihood (ML) hypothesis: h_ML = argmax_{h in H} P(D|h)
Definition: argmax stands for the argument of the maximum, that is to say, the set of points of the given argument for which the given function attains its maximum value.

27 Maximum a posteriori (MAP)
Maximum a posteriori (MAP) hypothesis:
h_MAP = argmax_{h in H} P(h|D) = argmax_{h in H} P(D|h) P(h) / P(D) = argmax_{h in H} P(D|h) P(h)
Note that P(D) is independent of h, hence it can be ignored.

28 Does the patient have cancer or not?
P(+|cancer) P(cancer) = 0.98 * 0.008 = 0.00784
P(+|~cancer) P(~cancer) = 0.03 * 0.992 = 0.02976
MAP: P(+|cancer) P(cancer) < P(+|~cancer) P(~cancer)
Diagnosis: ~cancer
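The same MAP computation in Python; normalizing the two scores (an addition for illustration, not on the slide) gives the actual posterior P(cancer|+):

    # Numbers from slide 25.
    p_cancer, p_not_cancer = 0.008, 0.992
    p_pos_given_cancer, p_pos_given_not = 0.98, 0.03

    score_cancer = p_pos_given_cancer * p_cancer   # 0.00784
    score_not = p_pos_given_not * p_not_cancer     # 0.02976

    h_map = "cancer" if score_cancer > score_not else "~cancer"
    posterior = score_cancer / (score_cancer + score_not)
    print(h_map, round(posterior, 3))   # ~cancer 0.209

So even with a positive test, the posterior probability of cancer is only about 21%, because the disease is rare.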

29 Thank you!

