Machine Learning
Saarland University, SS 2007
Holger Bast [with input from Ingmar Weber]
Max-Planck-Institut für Informatik, Saarbrücken, Germany
Lecture 10, Friday, June 22nd, 2007
(Everything you always wanted to know about statistics … but were afraid to ask)
Overview of this lecture

Maximum likelihood vs. unbiased estimators
–example: normal distribution
–example: drawing numbers from a box

Things you keep on reading in the ML literature
–marginal distribution
–prior
–posterior

Statistical tests
–hypothesis testing
–discussion of its (non)sense
Maximum likelihood vs. unbiased estimators

Example: maximum likelihood estimators from Lecture 8, Example 2
–μ(x_1,…,x_n) = 1/n ∙ Σ_i x_i
–σ²(x_1,…,x_n) = 1/n ∙ Σ_i (x_i – μ)²
–X_1,…,X_n independent identically distributed random variables with mean μ and variance σ²
–E μ(X_1,…,X_n) = μ [blackboard]
–E σ²(X_1,…,X_n) = (n–1)/n ∙ σ² ≠ σ² [blackboard]
–unbiased variance estimator: 1/(n–1) ∙ Σ_i (x_i – μ)²

Example: number x drawn from a box with numbers 1..n, for unknown n
–maximum likelihood estimator: n = x [blackboard]
–unbiased estimator: n = 2x – 1 [blackboard]
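The bias of the 1/n variance estimator, and the effect of the 1/(n–1) correction, can be checked numerically. A minimal sketch (the sample size, trial count, and standard normal data are illustrative choices, not from the lecture):

```python
import random

def mle_var(xs):
    # maximum likelihood estimator: divides by n
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def unbiased_var(xs):
    # corrected estimator: divides by n - 1
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

random.seed(0)
n, trials = 5, 100_000
mle_avg = unb_avg = 0.0
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(n)]  # true variance = 1
    mle_avg += mle_var(xs) / trials
    unb_avg += unbiased_var(xs) / trials

print(mle_avg)  # close to (n - 1) / n = 0.8, i.e. biased
print(unb_avg)  # close to 1.0, i.e. unbiased

# Box example: X uniform on {1, ..., n}, so E[X] = (n + 1) / 2 and
# hence E[2X - 1] = n, which is why 2x - 1 is an unbiased estimator of n.
n_box = 10
e_x = sum(range(1, n_box + 1)) / n_box
print(2 * e_x - 1)  # exactly n_box = 10
```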
Marginal distribution

Joint probability distribution, for example:
–pick a random MPII staff member
–random variables X = department, Y = gender
–for example, Pr(X = D3, Y = female)

[table: rows = gender (male, female), columns = departments D1–D5;
 column sums give the marginal Pr(X = x), e.g. Pr(D3);
 row sums give the marginal Pr(Y = y), e.g. Pr(female)]

Note:
–matrix entries sum to 1
–in general, Pr(X = x, Y = y) ≠ Pr(X = x) ∙ Pr(Y = y)
 [equality holds if and only if X and Y are independent]
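A marginal distribution is obtained by summing the joint table over the other variable. A minimal sketch in Python; the numbers are made up for illustration, not actual MPII staff counts:

```python
# Toy joint distribution Pr(X = dept, Y = gender); invented numbers.
joint = {
    ("D1", "male"): 0.12, ("D1", "female"): 0.08,
    ("D2", "male"): 0.10, ("D2", "female"): 0.10,
    ("D3", "male"): 0.09, ("D3", "female"): 0.11,
    ("D4", "male"): 0.14, ("D4", "female"): 0.06,
    ("D5", "male"): 0.11, ("D5", "female"): 0.09,
}
assert abs(sum(joint.values()) - 1.0) < 1e-9  # entries sum to 1

# Marginalize by summing out the other variable
pr_dept, pr_gender = {}, {}
for (dept, gender), p in joint.items():
    pr_dept[dept] = pr_dept.get(dept, 0.0) + p
    pr_gender[gender] = pr_gender.get(gender, 0.0) + p

print(pr_dept["D3"])       # Pr(X = D3) = 0.20 (column sum)
print(pr_gender["female"])  # Pr(Y = female) = 0.44 (row sum)

# Independence would require joint = product of marginals; here it fails:
print(joint[("D3", "female")], pr_dept["D3"] * pr_gender["female"])
```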
Frequentism vs. Bayesianism

Frequentism
–probability = relative frequency in a large number of trials
–associated with a random (physical) system
–only applied to well-defined events in a well-defined space
 for example: the probability of a die showing a 6

Bayesianism
–probability = degree of belief
–no random process needs to be involved at all
–applied to arbitrary statements
 for example: the probability that I will like a new movie
Prior / Posterior probability

Prior
–a guess about the data, with no random experiment behind it
–one goes on computing with the guess as if it were a probability
–for example: the Z_1,…,Z_n from the E-step of the EM algorithm

Posterior
–probability related to an event that has already happened
–for example: all our likelihoods from Lectures 8 and 9

Note: these are not well-defined technical terms
–but they are often used as if they were, which is confusing
–the Bayesian way …
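How a prior belief turns into a posterior can be seen in a tiny Bayesian update. The setup below (a coin that is either fair or heads-biased, and the 50/50 prior) is an illustrative assumption, not part of the lecture:

```python
# Degree of belief before seeing any data (the prior) -- a guess.
prior = {"fair": 0.5, "biased": 0.5}
pr_heads = {"fair": 0.5, "biased": 0.8}  # likelihood of heads per model

data = ["H", "H", "T", "H"]  # observed coin flips

# Posterior is proportional to prior times likelihood of the data.
post = {}
for model, p in prior.items():
    lik = 1.0
    for flip in data:
        lik *= pr_heads[model] if flip == "H" else 1 - pr_heads[model]
    post[model] = p * lik
z = sum(post.values())
post = {m: v / z for m, v in post.items()}

print(post)  # belief shifts toward "biased" after seeing mostly heads
```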
Hypothesis testing

Example: do two samples have the same mean?
–e.g., two groups of patients in a medical experiment, one group with medication and one group without

Test
–formulate a null hypothesis, e.g. that the means are equal
–compute the probability p of the given (or more extreme) data, assuming that the null hypothesis is true [blackboard]

Outcome
–p ≤ α = 0.05: the hypothesis is rejected at significance level 95%
 one says: the difference of the means is statistically significant
–p > α = 0.05: the hypothesis cannot be rejected
 one says: the difference of the means is statistically insignificant
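One way to sketch the test above is a permutation test, which directly implements "probability of the given (or more extreme) data under the null hypothesis": if both groups have the same distribution, every relabeling of the pooled data is equally likely. The two patient-group samples below are made up for illustration:

```python
import random

medication = [5.1, 4.8, 6.0, 5.6, 5.9, 6.2, 5.4]  # invented measurements
control    = [4.2, 4.9, 4.4, 4.0, 4.7, 4.3, 4.5]

def mean(xs):
    return sum(xs) / len(xs)

observed = abs(mean(medication) - mean(control))

# Under the null hypothesis, group labels are arbitrary: shuffle the
# pooled data and count how often a random split is at least as extreme.
random.seed(0)
pooled = medication + control
n, extreme, trials = len(medication), 0, 10_000
for _ in range(trials):
    random.shuffle(pooled)
    if abs(mean(pooled[:n]) - mean(pooled[n:])) >= observed:
        extreme += 1
p = extreme / trials

print(p)          # tiny p here: the data is very unlikely under the null
print(p <= 0.05)  # "statistically significant" at alpha = 0.05
```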
Hypothesis testing — BEWARE!

What one would ideally like:
–given this data, what is the probability that my hypothesis is true?
–formally: Pr(H | D)

What one gets from hypothesis testing:
–given that my hypothesis is true, what is the probability of this (or more extreme) data?
–formally: Pr(D | H)
–but Pr(D | H) could be low for reasons other than the hypothesis!! [blackboard example]

Useful at all?
–OK: challenge a theory by attempting to reject it
–NO: confirm a theory by rejecting the corresponding null hypothesis
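The gap between Pr(D | H) and Pr(H | D) can be made concrete with Bayes' rule. The numbers below are invented purely to show that a "significant" p-value can coexist with a high posterior probability of the null hypothesis:

```python
# Suppose the null hypothesis H is true in 90% of comparable cases,
# and the observed data D is unlikely under H -- but also not much
# more likely under the alternative. All numbers are made up.
pr_H = 0.9              # prior Pr(H)
pr_D_given_H = 0.04     # p-value-like quantity: data unlikely under H
pr_D_given_notH = 0.10  # data not very likely otherwise either

# Bayes' rule: Pr(H | D) = Pr(D | H) Pr(H) / Pr(D)
pr_D = pr_D_given_H * pr_H + pr_D_given_notH * (1 - pr_H)
pr_H_given_D = pr_D_given_H * pr_H / pr_D

print(pr_D_given_H)  # 0.04 <= 0.05, so the test "rejects" H ...
print(pr_H_given_D)  # ... yet Pr(H | D) is about 0.78: H remains probable
```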
Literature

Read the wonderful articles by Jacob Cohen:
–Things I have learned (so far). American Psychologist, 45(12):1304–1312, 1990
–The earth is round (p < .05). American Psychologist, 49(12):997–1003, 1994