Machine Learning
Saarland University, SS 2007
Holger Bast [with input from Ingmar Weber]
Max-Planck-Institut für Informatik, Saarbrücken, Germany
Lecture 10, Friday June 22nd, 2007
(Everything you always wanted to know about statistics … but were afraid to ask)
Overview of this lecture
Maximum likelihood vs. unbiased estimators
–example: normal distribution
–example: drawing numbers from a box
Things you keep on reading in the ML literature
–marginal distribution
–prior
–posterior
Statistical tests
–hypothesis testing
–discussion of its (non)sense
Maximum likelihood vs. unbiased estimators
Example: maximum likelihood estimators from Lecture 8, Example 2
–μ(x1,…,xn) = 1/n ∙ Σi xi
–σ²(x1,…,xn) = 1/n ∙ Σi (xi – μ)²
–X1,…,Xn independent identically distributed random variables with mean μ and variance σ²
–E μ(X1,…,Xn) = μ [blackboard]
–E σ²(X1,…,Xn) = (n–1)/n ∙ σ² ≠ σ² [blackboard]
–unbiased variance estimator: 1/(n–1) ∙ Σi (xi – μ)²
Example: number x drawn from a box with numbers 1..n, for unknown n
–maximum likelihood estimator: n = x [blackboard]
–unbiased estimator: n = 2x – 1 [blackboard]
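The bias of the 1/n variance estimator can be checked empirically. The following sketch (names and constants chosen here for illustration) averages both estimators over many small samples from a standard normal, where the true variance is σ² = 1; the maximum-likelihood version should come out near (n–1)/n = 0.8 for n = 5, the Bessel-corrected one near 1.

```python
import random

def biased_var(xs):
    """ML estimator: divides by n (underestimates sigma^2 on average)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def unbiased_var(xs):
    """Bessel-corrected estimator: divides by n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Average both estimators over many samples of size n = 5
# drawn from a distribution with known variance sigma^2 = 1.
rng = random.Random(42)
n, trials = 5, 20000
b = u = 0.0
for _ in range(trials):
    xs = [rng.gauss(0, 1) for _ in range(n)]
    b += biased_var(xs)
    u += unbiased_var(xs)
print(b / trials)  # close to (n-1)/n * 1 = 0.8
print(u / trials)  # close to 1
```

Note that the bias is a statement about the average over repeated samples; for any single sample, either estimate can be far from σ².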
Marginal distribution
Joint probability distribution, for example
–pick a random MPII staff member
–random variables X = department, Y = gender
–for example, Pr(X = D3, Y = female)

           D1     D2     D3     D4     D5
  male     0.24   0.09   0.13   0.25   0.11  | 0.82
  female   0.03   0.03   0.04   0.04   0.04  | 0.18
           0.27   0.12   0.17   0.29   0.15

–column sums give the marginals Pr(X = x), e.g. Pr(D3) = 0.17
–row sums give the marginals Pr(Y = y), e.g. Pr(female) = 0.18

Note:
–matrix entries sum to 1
–in general, Pr(X = x, Y = y) ≠ Pr(X = x) ∙ Pr(Y = y) [holds if and only if X and Y are independent]
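A minimal sketch of marginalization, using the joint distribution from the table above (the dictionary layout is just one convenient encoding): each marginal is obtained by summing the joint probabilities over the other variable.

```python
# Joint distribution Pr(X = dept, Y = gender) from the table above.
joint = {
    ("D1", "male"): 0.24, ("D2", "male"): 0.09, ("D3", "male"): 0.13,
    ("D4", "male"): 0.25, ("D5", "male"): 0.11,
    ("D1", "female"): 0.03, ("D2", "female"): 0.03, ("D3", "female"): 0.04,
    ("D4", "female"): 0.04, ("D5", "female"): 0.04,
}

# Marginalize: sum the joint over the other variable.
pr_dept, pr_gender = {}, {}
for (dept, gender), p in joint.items():
    pr_dept[dept] = pr_dept.get(dept, 0.0) + p
    pr_gender[gender] = pr_gender.get(gender, 0.0) + p

print(round(pr_dept["D3"], 2))       # 0.17
print(round(pr_gender["female"], 2)) # 0.18
# Independence would require Pr(x, y) == Pr(x) * Pr(y) for all x, y;
# here Pr(D3, female) = 0.04, but Pr(D3) * Pr(female) = 0.17 * 0.18 ≈ 0.031.
```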
Frequentism vs. Bayesianism
Frequentism
–probability = relative frequency in a large number of trials
–associated with a random (physical) system
–only applied to well-defined events in a well-defined space
–for example: probability of a die showing 6
Bayesianism
–probability = degree of belief
–no random process needs to be involved at all
–applied to arbitrary statements
–for example: probability that I will like a new movie
Prior / posterior probability
Prior
–a guess about the data, with no random experiment behind it
–one goes on computing with the guess as if it were a probability
–for example: Z1,…,Zn from the E-step of the EM algorithm
Posterior
–probability related to an event that has already happened
–for example: all our likelihoods from Lectures 8 and 9
Note: these are not well-defined technical terms
–but they are often used as if they were, which is confusing
–the Bayesian way …
Hypothesis testing
Example: do two samples have the same mean?
–e.g., two groups of patients in a medical experiment, one group with medication and one without
–for example, 8.6 4.3 3.2 5.1 and 2.1 4.2 7.6 3.2 2.9
Test
–formulate a null hypothesis, e.g. equal means
–compute the probability p of the given (or more extreme) data, assuming that the null hypothesis is true [blackboard]
Outcome
–p ≤ α = 0.05: the null hypothesis is rejected at significance level 95%; one says: the difference of the means is statistically significant
–p > α = 0.05: the null hypothesis cannot be rejected; one says: the difference of the means is statistically insignificant
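The p-value above can be computed directly, without distributional formulas, via a permutation test (a different technique than whatever was derived on the blackboard, but it implements exactly the definition just given): under the null hypothesis of equal means, the group labels are exchangeable, so p is estimated as the fraction of random relabelings whose mean difference is at least as extreme as the observed one. The sketch below uses the two samples from this slide.

```python
import random

def permutation_test(a, b, n_perm=10000, seed=0):
    """Estimate the two-sided p-value for equal means: the fraction of
    random relabelings of the pooled data whose absolute difference of
    means is at least as extreme as the observed one."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            count += 1
    return count / n_perm

a = [8.6, 4.3, 3.2, 5.1]
b = [2.1, 4.2, 7.6, 3.2, 2.9]
p = permutation_test(a, b)
print(p)  # well above 0.05: the difference of means is not significant
```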
Hypothesis testing — BEWARE!
What one would ideally like
–given this data, what is the probability that my hypothesis is true?
–formally: Pr(H | D)
What one gets from hypothesis testing
–given that my hypothesis is true, what is the probability of this (or more extreme) data?
–formally: Pr(D | H)
–but Pr(D | H) could be low for reasons other than the hypothesis being false!! [blackboard example]
Useful at all?
–OK: challenge a theory by attempting to reject it
–NO: confirm a theory by rejecting the corresponding null hypothesis
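Bayes' rule makes the gap between Pr(D | H) and Pr(H | D) concrete. The numbers below are hypothetical, chosen only to illustrate the point: even when Pr(D | H) is "significant" (below α = 0.05), the posterior Pr(H | D) can remain near 1 if H is a priori likely and the data is even rarer under the alternative.

```python
# Hypothetical numbers, assumed for illustration only.
prior_H = 0.99          # prior belief in the hypothesis
p_D_given_H = 0.04      # "significant" at alpha = 0.05
p_D_given_notH = 0.001  # data is even rarer if H is false

# Bayes' rule: Pr(H | D) = Pr(D | H) Pr(H) / Pr(D)
p_D = prior_H * p_D_given_H + (1 - prior_H) * p_D_given_notH
posterior_H = prior_H * p_D_given_H / p_D
print(round(posterior_H, 4))  # 0.9997
```

So rejecting H "at the 95% level" says nothing by itself about Pr(H | D); that always depends on the prior and on the alternatives, which is precisely Cohen's point in the articles below.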
Literature
Read the wonderful articles by Jacob Cohen
–Things I have learned (so far). American Psychologist, 45(12):1304–1312, 1990
–The earth is round (p < .05). American Psychologist, 49(12):997–1003, 1994