Machine Learning Saarland University, SS 2007 Holger Bast [with input from Ingmar Weber] Max-Planck-Institut für Informatik Saarbrücken, Germany Lecture 10, Friday June 22nd, 2007 (Everything you always wanted to know about statistics … but were afraid to ask)

Overview of this lecture

Maximum likelihood vs. unbiased estimators
– example: normal distribution
– example: drawing numbers from a box

Things you keep on reading in the ML literature
– marginal distribution
– prior
– posterior

Statistical tests
– hypothesis testing
– discussion of its (non)sense

Maximum likelihood vs. unbiased estimators

Example: maximum likelihood estimators from Lecture 8, Example 2
– μ(x_1, …, x_n) = 1/n ∙ Σ_i x_i,  σ²(x_1, …, x_n) = 1/n ∙ Σ_i (x_i – μ)²
– X_1, …, X_n independent, identically distributed random variables with mean μ and variance σ²
– E μ(X_1, …, X_n) = μ [blackboard]
– E σ²(X_1, …, X_n) = (n – 1)/n ∙ σ² ≠ σ² [blackboard]
– unbiased variance estimator: 1/(n – 1) ∙ Σ_i (x_i – μ)²

Example: number x drawn from a box with numbers 1, …, n, for unknown n
– maximum likelihood estimator: n = x [blackboard]
– unbiased estimator: n = 2x – 1 [blackboard]
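Both biases above can be checked empirically. The following simulation sketch is not part of the lecture; the sample size, true variance, and box size are chosen only to make the effect visible:

```python
import random

def ml_variance(xs):
    # maximum likelihood estimator: divides by n (biased)
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def unbiased_variance(xs):
    # unbiased estimator: divides by n - 1
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

random.seed(0)
n, trials = 5, 20000                 # small n makes the bias clearly visible
ml_avg = unb_avg = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, 2.0) for _ in range(n)]  # true variance sigma^2 = 4
    ml_avg += ml_variance(xs) / trials
    unb_avg += unbiased_variance(xs) / trials
# ml_avg approaches (n - 1)/n * sigma^2 = 3.2, unb_avg approaches 4.0

# box example: x drawn uniformly from 1..N for unknown N
N = 10
draws = [random.randint(1, N) for _ in range(trials)]
ml_box = sum(draws) / trials                       # E[X] = (N + 1)/2, far below N
unb_box = sum(2 * x - 1 for x in draws) / trials   # E[2X - 1] = N
```

Averaging over many repetitions shows the ML variance estimate settling near (n – 1)/n ∙ σ² rather than σ², while the n – 1 version, and likewise 2x – 1 in the box example, centers on the true value.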

Marginal distribution

Joint probability distribution, for example:
– pick a random MPII staff member
– random variables X = department, Y = gender
– for example, Pr(X = D3, Y = female)

[table: rows male/female, columns D1–D5; the column sums give Pr(D1), …, Pr(D5), the row sums give Pr(male) and Pr(female)]

Note:
– the matrix entries sum to 1
– in general, Pr(X = x, Y = y) ≠ Pr(X = x) ∙ Pr(Y = y) [equality holds if and only if X and Y are independent]
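Computing a marginal just means summing the joint distribution over the other variable. A sketch with made-up numbers (the actual MPII table is not reproduced here, so the probabilities below are purely illustrative):

```python
# Hypothetical joint distribution Pr(X = dept, Y = gender);
# the numbers are invented for illustration.
joint = {
    ("D1", "male"): 0.12, ("D1", "female"): 0.08,
    ("D2", "male"): 0.10, ("D2", "female"): 0.10,
    ("D3", "male"): 0.15, ("D3", "female"): 0.05,
    ("D4", "male"): 0.11, ("D4", "female"): 0.09,
    ("D5", "male"): 0.12, ("D5", "female"): 0.08,
}
assert abs(sum(joint.values()) - 1.0) < 1e-9  # matrix entries sum to 1

# marginal of X: sum the joint over all values of Y (a column sum),
# and symmetrically for Y (a row sum)
pr_d3 = sum(p for (x, y), p in joint.items() if x == "D3")
pr_female = sum(p for (x, y), p in joint.items() if y == "female")

# independence check: does Pr(x, y) = Pr(x) * Pr(y) hold for every cell?
independent = all(
    abs(joint[(x, y)]
        - sum(p for (a, b), p in joint.items() if a == x)
        * sum(p for (a, b), p in joint.items() if b == y)) < 1e-9
    for (x, y) in joint
)
```

With these numbers Pr(D3) = 0.20 and Pr(female) = 0.40, but Pr(D3, female) = 0.05 ≠ 0.20 ∙ 0.40, so the independence check fails, matching the note on the slide.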

Frequentism vs. Bayesianism

Frequentism
– probability = relative frequency in a large number of trials
– associated with a random (physical) system
– only applied to well-defined events in a well-defined space
– for example: the probability of a die showing 6

Bayesianism
– probability = degree of belief
– no random process at all needs to be involved
– applied to arbitrary statements
– for example: the probability that I will like a new movie

Prior / posterior probability

Prior
– a guess about the data, with no random experiment behind it
– one goes on computing with the guess as if it were a probability
– for example: Z_1, …, Z_n from the E-step of the EM algorithm

Posterior
– a probability related to an event that has already happened
– for example: all our likelihoods from Lectures 8 and 9

Note: these are not well-defined technical terms
– but they are often used as if they were, which is confusing
– the Bayesian way …
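As a toy illustration of how a prior (a guessed degree of belief) is turned into a posterior once data has been observed, consider two hypotheses about a coin. The hypotheses and the prior below are made up; exact fractions keep the arithmetic transparent:

```python
from fractions import Fraction

# Two hypotheses: the coin is fair, or biased with Pr(heads) = 3/4.
# The prior is a guessed degree of belief, not a measured frequency.
prior = {"fair": Fraction(3, 4), "biased": Fraction(1, 4)}
heads_prob = {"fair": Fraction(1, 2), "biased": Fraction(3, 4)}

data = "HHTH"  # observed tosses (the event has already happened)

# likelihood of the data under each hypothesis
likelihood = {}
for h in prior:
    p = Fraction(1)
    for toss in data:
        p *= heads_prob[h] if toss == "H" else 1 - heads_prob[h]
    likelihood[h] = p

# Bayes' rule: posterior proportional to likelihood times prior
evidence = sum(likelihood[h] * prior[h] for h in prior)
posterior = {h: likelihood[h] * prior[h] / evidence for h in prior}
```

Here the posterior for "fair" works out to 16/25 = 0.64: the data has shifted belief away from the prior's 3/4, but not by much after only four tosses.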

Hypothesis testing

Example: do two samples have the same mean?
– e.g., two groups of patients in a medical experiment, one group with medication and one group without

Test
– formulate a null hypothesis, e.g., that the means are equal
– compute the probability p of the given (or more extreme) data, assuming that the null hypothesis is true [blackboard]

Outcome
– p ≤ α = 0.05 → the hypothesis is rejected at significance level 95%; one says: the difference of the means is statistically significant
– p > α = 0.05 → the hypothesis cannot be rejected; one says: the difference of the means is statistically insignificant
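One simple way to compute such a p-value is a permutation test: under the null hypothesis the group labels are arbitrary, so we shuffle them and count how often the shuffled mean difference is at least as extreme as the observed one. The patient measurements below are invented for illustration; this is a sketch, not necessarily the test used on the blackboard:

```python
import random

def permutation_test(a, b, resamples=10000, seed=0):
    """Two-sided permutation test for equality of means: the p-value is the
    fraction of random relabelings whose absolute mean difference is at
    least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(resamples):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            hits += 1
    return hits / resamples

with_medication = [4.1, 5.0, 6.2, 5.5, 4.8, 5.9]   # hypothetical measurements
without_medication = [6.8, 7.1, 6.0, 7.5, 6.4, 7.9]
p = permutation_test(with_medication, without_medication)
# here p comes out well below alpha = 0.05, so the null hypothesis of
# equal means would be rejected: the difference is "statistically significant"
```

The same decision rule as on the slide applies: compare p against α = 0.05 and reject, or fail to reject, the null hypothesis of equal means.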

Hypothesis testing — BEWARE!

What one would ideally like:
– given this data, what is the probability that my hypothesis is true?
– formally: Pr(H | D)

What one gets from hypothesis testing:
– given that my hypothesis is true, what is the probability of this (or more extreme) data?
– formally: Pr(D | H)
– but Pr(D | H) could be low for reasons other than the hypothesis!! [blackboard example]

Useful at all?
– OK: challenge a theory by attempting to reject it
– NO: confirm a theory by rejecting the corresponding null hypothesis
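The gap between Pr(D | H) and Pr(H | D) can be made concrete with a simulation. Assume (these numbers are invented for illustration, not from the lecture) that only 10% of tested hypotheses correspond to a real effect, and that the test detects a real effect with probability 0.8:

```python
import random

rng = random.Random(1)
trials = 100_000
alpha = 0.05          # Pr(reject | null true), the significance threshold
power = 0.8           # assumed Pr(reject | real effect), for illustration
prior_effect = 0.1    # assumed fraction of hypotheses with a real effect

rejections = false_alarms = 0
for _ in range(trials):
    effect_real = rng.random() < prior_effect
    if effect_real:
        rejected = rng.random() < power    # test detects the real effect
    else:
        rejected = rng.random() < alpha    # false rejection of a true null
    if rejected:
        rejections += 1
        false_alarms += not effect_real

false_alarm_fraction = false_alarms / rejections
# around 0.36 with these numbers: even at alpha = 0.05, over a third of
# the "significant" results come from hypotheses where the null is true
```

So a rejection at the 95% level says nothing like "the null is true with probability 5%"; how often a significant result is a false alarm depends on the unknown prior, which is exactly the Pr(D | H) vs. Pr(H | D) confusion the slide warns about.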

Literature

Read the wonderful articles by Jacob Cohen:
– Things I have learned (so far). American Psychologist, 45(12):1304–1312, 1990
– The earth is round (p < .05). American Psychologist, 49(12):997–1003, 1994