Sample variance and sample error We learned recently how to determine the sample variance S² using the sample mean. How do we translate this into an unbiased estimate of the error on a single point? We can't just take the square root of S²: because Y = sqrt(X) is a concave function, E[sqrt(S²)] < sqrt(E[S²]) = σ, so sqrt(S²) is biased low. [Figure: the curve Y = sqrt(X), showing how the spread of S² about σ² maps to values of sqrt(S²) that average below σ.]
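As an illustration (not part of the original slides), a short simulation makes this bias visible: draw many small Gaussian samples, take sqrt(S²) for each, and compare the average to the true σ. The sample size, seed, and trial count below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_true = 2.0      # true population standard deviation
N = 5                 # small sample size, where the bias is most visible
trials = 200_000

# Draw Gaussian samples and compute the unbiased sample variance S^2 (ddof=1).
samples = rng.normal(0.0, sigma_true, size=(trials, N))
s2 = samples.var(axis=1, ddof=1)

print("mean of S^2      :", s2.mean())           # close to sigma_true**2 (unbiased)
print("mean of sqrt(S^2):", np.sqrt(s2).mean())  # noticeably below sigma_true
print("true sigma       :", sigma_true)
```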

Mean and variance of S² Like any other statistic, S² has its own mean and variance. We need to know these to compute the bias in S:
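The formulas on this slide did not survive extraction. Assuming the usual context of a Gaussian sample of size N, the standard results are:

\[
E[S^2] = \sigma^2, \qquad \operatorname{Var}[S^2] = \frac{2\sigma^4}{N-1}.
\]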

Bias in sqrt(S²) Define the square-root function g(X) = sqrt(X) and its derivatives, then expand about E[S²] to compute the bias (the expansion is sketched below):
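A sketch of the expansion this slide appears to carry out (a second-order Taylor / delta-method argument, using the Gaussian Var[S²] quoted above):

\[
g(X) = \sqrt{X}, \qquad g'(X) = \frac{1}{2\sqrt{X}}, \qquad g''(X) = -\frac{1}{4X^{3/2}},
\]
\[
E[\sqrt{S^2}] \;\approx\; g(\sigma^2) + \tfrac{1}{2}\, g''(\sigma^2)\operatorname{Var}[S^2]
\;=\; \sigma - \frac{\sigma}{4(N-1)}.
\]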

Unbiased estimator for σ Re-define the bias-corrected estimator for σ:
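The corrected estimator implied by the bias above (the original formula image is lost; this is the form that follows from the expansion, valid to first order in 1/N for Gaussian data):

\[
\hat{\sigma} \;\approx\; \sqrt{S^2}\,\left(1 + \frac{1}{4(N-1)}\right).
\]

An exact correction also exists for Gaussian samples, \(E[S] = c_4(N)\,\sigma\) with \(c_4(N) = \sqrt{2/(N-1)}\;\Gamma(N/2)/\Gamma((N-1)/2)\), but the approximate form above is the one that matches the Taylor argument on the previous slide.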

Conditional probabilities Consider two random variables X and Y with a joint p.d.f. P(X,Y) (shown as a 2-D density on the original slide). To get P(X) or P(Y), project P(X,Y) onto the X or Y axis and normalise. We can also determine P(X|Y) ("the probability of X given Y"), which is a normalised slice through P(X,Y) at a fixed value of Y, or vice versa. At any point along each slice, P(X,Y) can be recovered from the relations below.
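The relations this slide points to (reconstructed; these are the standard definitions):

\[
P(X) = \int P(X,Y)\,dY, \qquad
P(X\,|\,Y) = \frac{P(X,Y)}{P(Y)}, \qquad
P(X,Y) = P(X\,|\,Y)\,P(Y) = P(Y\,|\,X)\,P(X).
\]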

Bayes’ Theorem and Bayesian inference Bayes’ Theorem (written out below) leads to the method of Bayesian inference: We can determine the evidence P(data|model) using goodness-of-fit statistics. We can often determine P(model) using prior knowledge about the models. This allows us to make inferences about the relative probabilities of different models, given the data.
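In the slide's notation, Bayes' Theorem applied to models and data reads (equation reconstructed, since the original image is lost):

\[
P(\text{model}\,|\,\text{data}) = \frac{P(\text{data}\,|\,\text{model})\,P(\text{model})}{P(\text{data})}.
\]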

Choice of prior Suppose our model of a set of data X is controlled by a parameter θ. Our knowledge about θ before X is measured is quantified by the prior p.d.f. P(θ). The choice of P(θ) is arbitrary, subject to common sense! After measuring X we get the posterior p.d.f. P(θ|X) ∝ P(X|θ)·P(θ). Different priors P(θ) lead to different inferences P(θ|X)! [Figure: the same likelihood P(X|θ) combined with a uniform prior and with a logarithmic prior P(θ) ∝ 1/θ, giving two different posteriors P(θ|X).]

Examples Suppose θ is the Doppler shift of a star. Adopting a search range −200 < θ < 200 km/s in uniform velocity increments implicitly assumes a uniform prior. Alternatively, suppose θ is the scaling of an emission-line profile of known shape. If you know θ ≥ 0, you can force θ > 0 by constructing the pdf in uniform increments of log θ, i.e. P(θ) ∝ 1/θ. The posterior distributions are skewed differently according to the choice of prior, as the numerical sketch below illustrates. [Figure: posteriors P(θ|X) from the same likelihood P(X|θ) under a uniform prior and under the logarithmic prior.]
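A minimal numerical sketch of this effect (not from the original slides): the same Gaussian likelihood for a non-negative scale parameter θ is combined with a uniform prior and with a 1/θ prior, and the resulting posteriors differ. All numbers, the single-measurement setup, and the variable names are invented for illustration.

```python
import numpy as np

# Illustrative only: one measurement x_obs of a non-negative parameter theta,
# with Gaussian measurement error err.
x_obs, err = 3.0, 2.0
theta = np.linspace(0.01, 20.0, 2000)          # grid over the allowed range theta > 0

likelihood = np.exp(-0.5 * ((x_obs - theta) / err) ** 2)

def posterior(prior):
    """Normalise likelihood * prior on the theta grid."""
    p = likelihood * prior
    return p / np.trapz(p, theta)

post_uniform = posterior(np.ones_like(theta))   # uniform prior on theta
post_log     = posterior(1.0 / theta)           # uniform in log(theta): P(theta) ~ 1/theta

# The two posteriors peak and skew differently, purely because of the prior.
print("posterior mean, uniform prior:", np.trapz(theta * post_uniform, theta))
print("posterior mean, 1/theta prior:", np.trapz(theta * post_log, theta))
```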

Relative probabilities of models For two models m1 and m2, the relative posterior probabilities depend on: the ratio of their prior probabilities, and their relative ability to fit the data. Note that P(data) cancels.
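The ratio this slide refers to follows directly from Bayes' Theorem above (reconstructed equation):

\[
\frac{P(m_1\,|\,\text{data})}{P(m_2\,|\,\text{data})}
= \frac{P(\text{data}\,|\,m_1)}{P(\text{data}\,|\,m_2)} \times \frac{P(m_1)}{P(m_2)},
\]

where the common factor P(data) has cancelled.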

Maximum likelihood fits Suppose we try to fit a spectral line plus continuum using a set of data points X_i, i = 1...N, measured at wavelengths λ_i. Suppose our model is a constant continuum C plus a line of amplitude A centred at λ0 (a reconstructed form is sketched below). The parameters to fit are C, A and λ0 (and the line width, if it is not fixed); the λ_i and the measurement errors σ_i are assumed known.
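A plausible reconstruction of the missing model formula, assuming a Gaussian line profile of width w on top of a flat continuum (the exact profile used on the original slide is not recoverable):

\[
F(\lambda_i) = C + A\,\exp\!\left[-\frac{(\lambda_i - \lambda_0)^2}{2w^2}\right].
\]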

Likelihood of a model The likelihood of a particular set θ of model parameters (i.e. the probability of getting this set of data given model θ) is the product of the probabilities of the individual data points. If the errors are Gaussian, the likelihood takes the form reconstructed below.
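Reconstructed from the standard definitions (the slide's equation images are lost): for independent data points,

\[
L(\theta) = \prod_{i=1}^{N} P(X_i\,|\,\theta),
\]

and, if each X_i has a Gaussian error σ_i about the model value μ_i(θ),

\[
L(\theta) \propto \prod_{i=1}^{N} \exp\!\left[-\frac{\bigl(X_i - \mu_i(\theta)\bigr)^2}{2\sigma_i^2}\right],
\qquad -2\ln L = \chi^2 + \text{const}.
\]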

Estimating σ Data points X_i with no quoted errors: assume a common, unknown error σ. To find A (and the other model parameters), minimise χ². But we can't use χ² minimisation to estimate σ itself, because χ² = Σ_i (X_i − μ_i)²/σ² decreases monotonically towards zero as σ grows. Instead, minimise −2 ln L = Σ_i (X_i − μ_i)²/σ² + 2N ln σ (plus a constant) with respect to σ.
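A minimal sketch of this procedure (not from the original slides), assuming for simplicity that the mean model is already known so that only σ is being estimated; the data and all settings below are invented. Minimising −2 ln L in σ reproduces the closed-form result σ̂² = Σ(X_i − μ_i)²/N.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
mu_true, sigma_true = 5.0, 1.5
x = rng.normal(mu_true, sigma_true, size=50)   # data points with no quoted errors
mu_model = mu_true                             # pretend the mean model is already known

def neg2lnL(sigma):
    # -2 ln L for Gaussian errors of common, unknown width sigma (constants dropped)
    return np.sum((x - mu_model) ** 2) / sigma**2 + 2 * len(x) * np.log(sigma)

res = minimize_scalar(neg2lnL, bounds=(1e-3, 10.0), method="bounded")
print("ML estimate of sigma:", res.x)
print("closed-form result  :", np.sqrt(np.mean((x - mu_model) ** 2)))
```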