2-9-061 Maximum Likelihood - "Frequentist" inference: x_1, x_2, ..., x_n ~ iid N(μ, σ²). Joint pdf for the whole random sample. Maximum likelihood estimates.

Presentation transcript:

Maximum Likelihood - "Frequentist" inference
x_1, x_2, ..., x_n ~ iid N(μ, σ²)
Joint pdf for the whole random sample:
f(x_1, ..., x_n; μ, σ²) = ∏_{i=1}^{n} (2πσ²)^(-1/2) exp(-(x_i - μ)² / (2σ²))
The maximum likelihood estimates of the model parameters μ and σ² are the values that maximize the joint pdf for the fixed observed sample. Viewed as a function of the parameters with the sample held fixed, the joint pdf is called the likelihood function:
L(μ, σ² | x_1, ..., x_n) = f(x_1, ..., x_n; μ, σ²)
The likelihood function is simply the joint pdf evaluated at the fixed sample.
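
As a minimal sketch of this idea (the data below are simulated, not taken from the slides), the MLEs for a normal sample can be obtained either in closed form or by numerically maximizing the log-likelihood; the starting values are arbitrary choices for illustration:

# Minimal sketch: maximum likelihood for an iid normal sample (simulated data)
set.seed(1)
x <- rnorm(20, mean = 2, sd = 1.5)
n <- length(x)

# Closed-form MLEs: the sample mean and the variance with divisor n (not n - 1)
mu.hat     <- mean(x)
sigma2.hat <- sum((x - mu.hat)^2) / n

# The same estimates by numerically maximizing the log-likelihood
# (variance parametrized on the log scale to keep it positive)
negloglik <- function(par) -sum(dnorm(x, mean = par[1], sd = exp(par[2]), log = TRUE))
fit <- optim(c(0, 0), negloglik)
c(mu = fit$par[1], sigma2 = exp(2 * fit$par[2]))   # close to mu.hat, sigma2.hat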

Sampling Distributions
x_1, x_2, ..., x_n ~ iid N(μ, σ²)
A "sample statistic" is a numerical summary of a random sample (e.g. the sample mean x̄ = (1/n) Σ x_i). As a function of random variables, a sample statistic is itself a random variable.
The "sampling distribution" is the probability distribution (statistical model) of a sample statistic; it can be derived from the probability distribution of the experimental outcomes.
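
A small simulation sketch (the values of μ, σ, and n are arbitrary, chosen only for illustration) shows how the sampling distribution of the sample mean can be examined empirically:

# Approximate the sampling distribution of the sample mean by repeated sampling
set.seed(2)
mu <- 0; sigma <- 1; n <- 10
xbar <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))
mean(xbar)   # close to mu
sd(xbar)     # close to sigma / sqrt(n), the theoretical standard error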

"Frequentist" inference Assume that parameters in the model describing the probability of experimental outcome are unknown, but fixed values Given a random sample of experimental outcome (data), we make inference (i.e. make probabilistic statements) about the values of the underlying parameters based on the sampling distributions of parameter estimates and other "sample statistics" Since model parameters are not random variables, these statements are somewhat contrived. For example we don't talk about the p(  >0), but about p(t>t * |  =0). However, for simple situations this works just fine and arguments are mostly philosophical

Bayesian Inference
Assumes that parameters are random variables - this is the key difference.
Inference is based on the posterior distribution of the parameters given the data.
Prior distribution: defines prior knowledge or ignorance about the parameter.
Posterior distribution: prior belief modified by the data.

Bayesian Inference
Prior distribution of μ: π(μ)
Data model given μ: f(x_1, ..., x_n | μ)
Posterior distribution of μ given the data (Bayes theorem):
p(μ | x_1, ..., x_n) = f(x_1, ..., x_n | μ) π(μ) / ∫ f(x_1, ..., x_n | μ) π(μ) dμ
Inference can then be summarized by probabilities such as P(μ > 0 | data).
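
A minimal numeric sketch, assuming normal data with a known variance and a conjugate normal prior on μ (the prior and data values below are hypothetical, chosen only for illustration):

# Conjugate normal-normal model: mu ~ N(mu0, tau0^2), x_i | mu ~ N(mu, sigma^2)
set.seed(4)
sigma <- 1                                   # assumed known
x <- rnorm(12, mean = 0.5, sd = sigma)
n <- length(x); xbar <- mean(x)

mu0 <- 0; tau0 <- 2                          # prior mean and prior standard deviation
post.var  <- 1 / (1 / tau0^2 + n / sigma^2)
post.mean <- post.var * (mu0 / tau0^2 + n * xbar / sigma^2)

pnorm(0, post.mean, sqrt(post.var), lower.tail = FALSE)   # P(mu > 0 | data)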

Bayesian Estimation
The Bayesian point estimate is the expected value of the parameter under its posterior distribution given the data.
In some cases the expectation of the posterior distribution can be difficult to compute; it is easier to find the value of the parameter that maximizes the posterior distribution given the data - the Maximum a Posteriori (MAP) estimate.
Since the denominator of the posterior distribution in Bayes' theorem does not depend on the parameter, this is equivalent to maximizing the product of the likelihood and the prior pdf.
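
A small sketch of the MAP idea, maximizing the log-likelihood plus the log-prior numerically (the constant denominator is dropped); the prior and data are the same hypothetical choices as in the previous sketch:

# MAP estimate: maximize log-likelihood + log-prior
set.seed(4)
sigma <- 1
x <- rnorm(12, mean = 0.5, sd = sigma)
mu0 <- 0; tau0 <- 2

log.post <- function(mu)
  sum(dnorm(x, mean = mu, sd = sigma, log = TRUE)) + dnorm(mu, mu0, tau0, log = TRUE)

optimize(log.post, interval = c(-10, 10), maximum = TRUE)$maximum
# For this normal-normal model the MAP coincides with the posterior mean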

Alternative prior for the normal model
A degenerate (flat) uniform prior for μ assumes that any value is a priori equally likely - this is clearly unrealistic, since we usually know more than that.
Under this prior, the MAP estimate of μ is identical to the maximum likelihood estimate.
Bayesian point estimation and maximum likelihood are therefore very closely related.

Hierarchical Bayesian Models and Empirical Bayes Inference - MOTIVATION
x_ij ~ ind N(μ_j, σ_j²), where i = 1, ..., n indexes the replicated observations and j = 1, ..., T indexes the genes.
Each gene has its own mean and variance.
Usually n is small in comparison to T.
We want to use information from all genes to estimate the variance of the individual gene measurements.

Hierarchical Bayesian Models and Empirical Bayes Inference - SOLUTION
Postulate a "hierarchical" Bayesian model in which the individual variances of the different genes are assumed to be generated by a single distribution.
Estimate the parameters of this distribution using the Empirical Bayes approach.
Estimate each individual gene's variance using Bayesian estimation, with the prior parameters obtained from the Empirical Bayes step.
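
A minimal simulation sketch of this idea using limma's squeezeVar(), which estimates the prior from all genes and shrinks each per-gene variance towards it; the number of genes, replicates, and the variance-generating distribution below are arbitrary illustrative choices:

library(limma)
set.seed(6)
T.genes <- 1000; n <- 4
true.var <- 1 / rgamma(T.genes, shape = 4, rate = 4)   # gene variances from a single distribution
s2 <- sapply(true.var, function(v) var(rnorm(n, mean = 0, sd = sqrt(v))))

sq <- squeezeVar(s2, df = n - 1)
sq$var.prior        # Empirical Bayes estimate of the prior variance s0^2
sq$df.prior         # Empirical Bayes estimate of the prior degrees of freedom d0
head(sq$var.post)   # per-gene variances shrunk ("squeezed") towards the prior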

Hierarchical Bayesian Models and Empirical Bayes Inference
We test the hypothesis β_i = 0 by calculating moderated ("modified") t-statistics: the per-gene variance s_j² is replaced by the posterior ("squeezed") variance s̃_j² = (d_0 s_0² + d_j s_j²) / (d_0 + d_j), where s_0² and d_0 are the prior variance and prior degrees of freedom estimated from all genes.
Limma operates on linear models y_j = X β_j + ε_j, with ε_1j, ..., ε_nj ~ N(0, σ_j²), and Empirical Bayes estimation is applied to estimate σ_j² for each gene.
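
A minimal sketch of the limma workflow that would produce fit objects like FitLMAD and EFitLMAD shown below; the expression matrix and the two-group design here are hypothetical placeholders, not the course data:

library(limma)
set.seed(7)
y <- matrix(rnorm(1000 * 6), nrow = 1000)                   # 1000 genes x 6 arrays
colnames(y) <- paste0(rep(c("Ctl", "Nic"), each = 3), 1:3)
design <- model.matrix(~ factor(rep(c("Ctl", "Nic"), each = 3)))

FitLMAD  <- lmFit(y, design)     # per-gene least-squares fits of y_j = X beta_j + e_j
EFitLMAD <- eBayes(FitLMAD)      # Empirical Bayes moderation of the gene-wise variances
topTable(EFitLMAD, coef = 2, number = 5)   # moderated t-statistics, p-values, etc.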

Effects of using Empirical Bayes modifications
> attributes(FitLMAD)
$names
 [1] "coefficients"     "stdev.unscaled"   "sigma"            "df.residual"      "cov.coefficients"
 [6] "pivot"            "method"           "design"           "genes"            "Amean"

$class
[1] "MArrayLM"
attr(,"package")
[1] "limma"

> attributes(EFitLMAD)
$names
 [1] "coefficients"     "stdev.unscaled"   "sigma"            "df.residual"      "cov.coefficients"
 [6] "pivot"            "method"           "design"           "genes"            "Amean"
[11] "df.prior"         "s2.prior"         "var.prior"        "proportion"       "s2.post"
[16] "t"                "p.value"          "lods"             "F"                "F.p.value"

$class
[1] "MArrayLM"
attr(,"package")
[1] "limma"

Effects of using Empirical Bayes modifications
> EFitLMAD$s2.prior
[1]
> EFitLMAD$df.prior
[1]

Effects of using Empirical Bayes modifications
> AnovadB$s2.prior
[1]
> AnovadB$df.prior
[1]
Empirical Bayes "inflates" the variances of the low-variability genes.
This reduces the proportion of "false positives" arising from genes with low observed variance.
It biases the chance of being called differentially expressed towards genes with larger observed differential expression.
It has been shown to improve the overall proportion of true positives among the genes declared significant.
"Stein effect": for an individual gene we cannot improve over the simple t-test, but by looking at all genes at the same time, this method turns out to work better.
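
To see this effect, one can compare the ordinary and moderated t-statistics gene by gene; this sketch reuses the hypothetical FitLMAD and EFitLMAD objects from the lmFit/eBayes example earlier:

# Ordinary t (per-gene residual SD) versus moderated t (posterior SD)
ordinary.t  <- FitLMAD$coefficients[, 2] / (FitLMAD$stdev.unscaled[, 2] * FitLMAD$sigma)
moderated.t <- EFitLMAD$t[, 2]
# Genes with very small residual SD lose significance (their variance is "inflated"),
# while genes with large residual SD can gain it
plot(FitLMAD$sigma, abs(moderated.t) - abs(ordinary.t),
     xlab = "per-gene residual SD", ylab = "|moderated t| - |ordinary t|")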
