Presentation on theme: "Mehdi Ghayoumi MSB rm 132 Ofc hr: Thur, 11-12 a Machine Learning."— Presentation transcript:

1 Machine Learning
Mehdi Ghayoumi, MSB rm 132, mghayoum@kent.edu
Ofc hr: Thur, 11-12a

2 Overfitting: the model is too "complex" and fits irrelevant characteristics (noise) in the data.
– Low bias and high variance
– Low training error and high test error

3 Models with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample).
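To make the overfitting signature concrete, here is a minimal Python sketch (NumPy assumed; the sine target, noise level, and degrees are illustrative choices, not from the slides). The high-degree fit drives training error down while test error grows:

    # Hypothetical demo: fit polynomials of increasing degree to noisy
    # samples of a sine curve and compare training vs. held-out error.
    import numpy as np

    rng = np.random.default_rng(0)
    x_train = np.sort(rng.uniform(0, 1, 15))
    y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 15)
    x_test = np.sort(rng.uniform(0, 1, 100))
    y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, 100)

    for degree in (1, 3, 9):
        coefs = np.polyfit(x_train, y_train, degree)   # least-squares fit
        train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
        test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
        print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")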

4 Bias-Variance Trade-off

5 Bias-Variance Trade-off
E(MSE) = noise² + bias² + variance
– noise²: unavoidable error
– bias²: error due to incorrect assumptions
– variance: error due to the variance of training samples
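A small simulation sketch of this decomposition (again NumPy, with an illustrative target function, not from the slides): refit a simple and a flexible model on many fresh training samples and estimate the bias² and variance of the prediction at a single query point x0.

    # Hypothetical simulation: estimate bias^2 and variance empirically.
    import numpy as np

    rng = np.random.default_rng(1)
    true_f = lambda x: np.sin(2 * np.pi * x)   # assumed "true" function
    x0, noise_sd, n_trials = 0.3, 0.2, 500

    for degree in (1, 9):
        preds = []
        for _ in range(n_trials):
            x = rng.uniform(0, 1, 20)                    # fresh training set
            y = true_f(x) + rng.normal(0, noise_sd, 20)
            preds.append(np.polyval(np.polyfit(x, y, degree), x0))
        preds = np.array(preds)
        bias2 = (preds.mean() - true_f(x0)) ** 2
        var = preds.var()
        print(f"degree {degree}: bias^2 {bias2:.4f}, variance {var:.4f}, "
              f"noise^2 {noise_sd**2:.4f}")

The simple model typically shows the larger bias², the flexible one the larger variance, matching the trade-off above.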

6

7

8

9 Probabilities
We write P(A) as "the fraction of possible worlds in which A is true."
(Figure: the event space of all possible worlds has area 1; worlds in which A is true form a red rectangle, the rest are worlds in which A is false. P(A) = area of the red rectangle.)

10 Axioms of Probability Theory
1. All probabilities are between 0 and 1: 0 <= P(A) <= 1.
2. True has probability 1, false has probability 0: P(true) = 1, P(false) = 0; hence P(not A) = P(~A) = 1 - P(A).
3. The probability of a disjunction is: P(A or B) = P(A) + P(B) - P(A and B). Sometimes this is written in set notation: P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
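A tiny sanity check of axioms 2 and 3, modelling events as sets of outcomes of a fair six-sided die (the die example is an illustration, not from the slide):

    # Events as subsets of the event space; probabilities as size fractions.
    omega = set(range(1, 7))   # all outcomes of a fair die
    A = {2, 4, 6}              # "roll is even"
    B = {4, 5, 6}              # "roll is at least 4"

    p = lambda event: len(event) / len(omega)

    # Axiom 3: P(A or B) = P(A) + P(B) - P(A and B)
    assert abs(p(A | B) - (p(A) + p(B) - p(A & B))) < 1e-12
    # Consequence of axiom 2: P(~A) = 1 - P(A)
    assert abs(p(omega - A) - (1 - p(A))) < 1e-12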

11 Interpretation of the Axioms
(Figure: Venn diagrams of A, B, "A or B", and "A and B"; the disjunction axiom is simple addition and subtraction of areas.)

12 Definition of Conditional Probability
P(A|B) = P(A ^ B) / P(B)
The Chain Rule: P(A ^ B) = P(A|B) P(B)

13 Conditional Probability
P(A|B) = fraction of worlds in which B is true that also have A true.
H = "Have a headache", F = "Coming down with flu"
P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2
P(H|F) = fraction of flu-inflicted worlds in which you have a headache
= (# worlds with flu and headache) / (# worlds with flu)
= (area of "H and F" region) / (area of "F" region)
= P(H ^ F) / P(F)

14 Probabilistic Inference
H = "Have a headache", F = "Coming down with flu"
P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2
Area-wise we have:
P(F) = 1/40
P(H) = 1/10
P(H|F) = 1/2, so P(H ^ F) = P(H|F) P(F) = 1/80
P(F|H) = P(H ^ F) / P(H) = (1/80) / (1/10) = 1/8
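A short Python check of these values, using only the three numbers given on the slide:

    # Chain rule and definition of conditional probability.
    p_h, p_f, p_h_given_f = 1/10, 1/40, 1/2

    p_h_and_f = p_h_given_f * p_f    # P(H ^ F) = P(H|F) P(F) = 0.0125 (1/80)
    p_f_given_h = p_h_and_f / p_h    # P(F|H) = P(H ^ F) / P(H) = 0.125 (1/8)
    print(p_h_and_f, p_f_given_h)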

15 Independence
2 blue and 3 red marbles are in a bag. What are the chances of getting a blue marble? The chance is 2 in 5. But after taking one out (without putting it back), the chances change! So what about the next draw? The draws are not independent.

16 Independence
A and B are independent iff: P(A|B) = P(A) (equivalently, P(B|A) = P(B)).
Therefore, if A and B are independent: P(A ^ B) = P(A|B) P(B) = P(A) P(B).
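A small sketch contrasting the marble draws above (dependent, since there is no replacement) with draws with replacement, where the product rule for independent events applies:

    # Marble counts from slide 15.
    blue, red = 2, 3
    p_first_blue = blue / (blue + red)                        # 2/5

    # Without replacement the second draw depends on the first:
    p_second_blue_given_first = (blue - 1) / (blue + red - 1) # 1/4 != 2/5

    # With replacement the draws are independent, so P(A ^ B) = P(A) P(B):
    p_both_blue = p_first_blue * p_first_blue                 # 4/25
    print(p_first_blue, p_second_blue_given_first, p_both_blue)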

17 Example: Ice Cream
70% of your friends like Chocolate, and 35% like Chocolate AND like Strawberry. What percent of those who like Chocolate also like Strawberry?

18 P(Strawberry | Chocolate) = P(Chocolate and Strawberry) / P(Chocolate) = 0.35 / 0.7 = 50%
It means: 50% of your friends who like Chocolate also like Strawberry.
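The same definition as a two-line check, with the numbers from slide 17:

    p_choc, p_choc_and_straw = 0.70, 0.35
    print(p_choc_and_straw / p_choc)   # 0.5, i.e. 50%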

19

20 The joint probability distribution for a set of random variables X1,…,Xn gives the probability of every combination of values (an n-dimensional array with v^n entries if all variables are discrete with v values; all v^n entries must sum to 1): P(X1,…,Xn)

positive:
         circle  square
red      0.20    0.02
blue     0.02    0.01

negative:
         circle  square
red      0.05    0.30
blue     0.20    0.20

21 The probability of any conjunction (an assignment of values to some subset of the variables) can be calculated by summing the appropriate subset of entries from the joint distribution. Therefore, all conditional probabilities can also be calculated, as the sketch below shows.
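A minimal sketch of this, built on the joint table from slide 20 (the variable names label, color, and shape are an assumption for the three dimensions):

    # Joint distribution as a dict from (label, color, shape) to probability.
    joint = {
        ("positive", "red",  "circle"): 0.20, ("positive", "red",  "square"): 0.02,
        ("positive", "blue", "circle"): 0.02, ("positive", "blue", "square"): 0.01,
        ("negative", "red",  "circle"): 0.05, ("negative", "red",  "square"): 0.30,
        ("negative", "blue", "circle"): 0.20, ("negative", "blue", "square"): 0.20,
    }
    assert abs(sum(joint.values()) - 1.0) < 1e-9   # all cells sum to 1

    def prob(**fixed):
        """Marginal of a partial assignment: sum the matching cells."""
        idx = {"label": 0, "color": 1, "shape": 2}
        return sum(p for cell, p in joint.items()
                   if all(cell[idx[name]] == val for name, val in fixed.items()))

    p_red = prob(color="red")                                      # 0.57
    p_pos_given_red = prob(label="positive", color="red") / p_red  # ~0.386
    print(p_red, p_pos_given_red)

Any conditional P(X|Y) then falls out as a ratio of two such sums, as in the last line.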

22 Bayes Rule
Thomas Bayes (c. 1701 – 7 April 1761) was an English statistician, philosopher and Presbyterian minister, known for having formulated a specific case of the theorem that bears his name: Bayes' theorem. Bayes never published what would eventually become his most famous accomplishment; his notes were edited and published after his death by Richard Price.

23 Bayesian Learning
Bayes' theorem relates the posterior probability of a hypothesis h given data D to the likelihood and the prior:
P(h|D) = P(D|h) P(h) / P(D)

24 An Illustrating Example
A patient takes a lab test and the result comes back positive. It is known that the test returns a correct positive result in only 98% of the cases and a correct negative result in only 97% of the cases. Furthermore, only 0.008 of the entire population has this disease.
1. What is the probability that this patient has cancer?
2. What is the probability that he does not have cancer?
3. What is the diagnosis?

25 An Illustrating Example
The test has two possible outcomes: positive (+) and negative (-). The various probabilities are:
P(cancer) = 0.008     P(~cancer) = 0.992
P(+|cancer) = 0.98    P(-|cancer) = 0.02
P(+|~cancer) = 0.03   P(-|~cancer) = 0.97
Now a new patient's test result is positive. Should we diagnose the patient as having cancer or not?

26 Choosing Hypotheses
Generally, we want the most probable hypothesis h (from a hypothesis space H) given the observed data D:
– Maximum a posteriori (MAP) hypothesis: h_MAP = argmax_{h in H} P(h|D)
– Maximum likelihood (ML) hypothesis: h_ML = argmax_{h in H} P(D|h)
Definition: argmax stands for the argument of the maximum, that is to say, the set of points of the given argument for which the given function attains its maximum value.

27 Maximum a posteriori (MAP)
Maximum a posteriori (MAP) hypothesis:
h_MAP = argmax_{h in H} P(h|D) = argmax_{h in H} P(D|h) P(h) / P(D) = argmax_{h in H} P(D|h) P(h)
Note that P(D) is independent of h, hence it can be ignored.

28 Does the patient have cancer or not?
P(+|cancer) P(cancer) = 0.98 * 0.008 = 0.00784
P(+|~cancer) P(~cancer) = 0.03 * 0.992 = 0.02976
MAP: P(+|cancer) P(cancer) < P(+|~cancer) P(~cancer)
Diagnosis: ~cancer
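The same MAP computation in Python; normalizing the two scores (an addition for illustration, not on the slide) gives the actual posterior P(cancer|+):

    # Numbers from slide 25.
    p_cancer, p_not_cancer = 0.008, 0.992
    p_pos_given_cancer, p_pos_given_not = 0.98, 0.03

    score_cancer = p_pos_given_cancer * p_cancer   # 0.00784
    score_not = p_pos_given_not * p_not_cancer     # 0.02976

    h_map = "cancer" if score_cancer > score_not else "~cancer"
    posterior = score_cancer / (score_cancer + score_not)
    print(h_map, round(posterior, 3))   # ~cancer 0.209

So even with a positive test, the posterior probability of cancer is only about 21%, because the disease is rare.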

29 Thank you!

