
Lecture 1: Basic Statistical Tools

Discrete and Continuous Random Variables

A random variable (RV): its outcome (realization) is not a set value, but rather is drawn from some probability distribution.

A discrete RV x takes on values X_1, X_2, ..., X_k. Its probability distribution is P_i = Pr(x = X_i), with P_i > 0 and Σ_i P_i = 1.

A continuous RV x can take on any possible value in some interval (or set of intervals). Its probability distribution is defined by the probability density function p(x), with p(x) ≥ 0, ∫ p(x) dx = 1, and Pr(x_1 ≤ x ≤ x_2) = ∫_{x_1}^{x_2} p(x) dx.
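
A minimal Python sketch of the two cases (numpy and scipy are illustration choices, not part of the lecture; the support values and probabilities below are made up):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Discrete RV: support X_1..X_k and probabilities P_i with sum(P_i) = 1
values = np.array([0, 1, 2])            # hypothetical support
probs = np.array([0.25, 0.50, 0.25])    # P_i > 0, sum to 1
assert np.isclose(probs.sum(), 1.0)

x_discrete = rng.choice(values, size=100_000, p=probs)
print("empirical Pr(x = 1):", np.mean(x_discrete == 1))   # ~0.50

# Continuous RV: probabilities come from integrating the density p(x)
# over an interval, not from point masses
x_cont = rng.normal(loc=0.0, scale=1.0, size=100_000)
print("empirical Pr(-1 < x < 1):", np.mean((x_cont > -1) & (x_cont < 1)))
print("exact     Pr(-1 < x < 1):", norm.cdf(1) - norm.cdf(-1))   # ~0.683
```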

Joint and Conditional Probabilities

The probability for a pair (x, y) of random variables is specified by the joint probability density function, p(x,y).

p(y|x), the conditional density of y given x, satisfies p(y|x) = p(x,y) / p(x).

The marginal density of x is obtained by integrating out y: p(x) = ∫ p(x,y) dy.

Relationship among p(x), p(x,y), and p(y|x): p(x,y) = p(y|x) p(x).

x and y are said to be independent if p(x,y) = p(x) p(y). Note that p(y|x) = p(y) if x and y are independent.
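
These relationships can be checked numerically for a small discrete joint distribution. The 2x3 table below is a made-up example, deliberately chosen so that x and y come out independent:

```python
import numpy as np

# Hypothetical joint probability table p(x, y): rows index x, columns index y
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.15, 0.30, 0.15]])
assert np.isclose(p_xy.sum(), 1.0)

# Marginal densities: sum (integrate) out the other variable
p_x = p_xy.sum(axis=1)          # p(x)
p_y = p_xy.sum(axis=0)          # p(y)

# Conditional density of y given x: p(y|x) = p(x,y) / p(x)
p_y_given_x = p_xy / p_x[:, None]
print("p(y | x=0):", p_y_given_x[0])

# Independence check: does p(x,y) = p(x) p(y) hold in every cell?
print("independent:", np.allclose(p_xy, np.outer(p_x, p_y)))   # True here
```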

Bayes' Theorem

Suppose an unobservable RV takes on values b_1, ..., b_n. Suppose that we observe the outcome A of an RV correlated with b. What can we say about b given A?

Bayes' theorem:  Pr(b_j | A) = Pr(b_j) Pr(A | b_j) / Pr(A) = Pr(b_j) Pr(A | b_j) / Σ_i Pr(b_i) Pr(A | b_i)

A typical application in genetics is that A is some phenotype and b indexes some underlying (but unknown) genotype.

Genotype                       QQ     Qq     qq
Freq(genotype)                 0.5    0.3    0.2
Pr(height > 70 | genotype)     0.3    0.6    0.9

Pr(height > 70) = 0.3*0.5 + 0.6*0.3 + 0.9*0.2 = 0.51

Pr(QQ | height > 70) = Pr(QQ) * Pr(height > 70 | QQ) / Pr(height > 70) = 0.5*0.3 / 0.51 = 0.294
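
A quick check of the calculation above in plain Python, using the genotype frequencies and conditional probabilities from the table:

```python
# Bayes' theorem applied to the genotype/height example
genotypes = ["QQ", "Qq", "qq"]
freq   = [0.5, 0.3, 0.2]   # Pr(genotype): the prior
p_tall = [0.3, 0.6, 0.9]   # Pr(height > 70 | genotype)

# Total probability of the observed event (the denominator of Bayes' theorem)
pr_tall = sum(f * p for f, p in zip(freq, p_tall))
print("Pr(height > 70) =", pr_tall)               # 0.51

# Posterior Pr(genotype | height > 70) for each genotype
for g, f, p in zip(genotypes, freq, p_tall):
    print(f"Pr({g} | height > 70) = {f * p / pr_tall:.3f}")
# Pr(QQ | height > 70) = 0.294, matching the slide
```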

Expectations of Random Variables

The expected value, E[f(x)], of some function f(x) of the random variable x is just the average value of that function:

E[f(x)] = Σ_i f(X_i) Pr(x = X_i)    (x discrete)
E[f(x)] = ∫ f(x) p(x) dx            (x continuous)

E[x] = the (arithmetic) mean, μ, of the random variable x.
E[(x − μ)^2] = σ^2, the variance of x.

More generally, the rth moment about the mean is given by E[(x − μ)^r]:
r = 2: variance;  r = 3: skew;  r = 4: (scaled) kurtosis.

Useful properties of expectations: for constants a and b, E[a + b x] = a + b E[x], and E[x + y] = E[x] + E[y] (linearity).
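
A sketch of estimating these moments from a sample. The exponential distribution and the scipy comparison functions are illustration choices, not part of the lecture:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=200_000)   # a skewed RV, just for illustration

mu = x.mean()                                   # E[x]
var = np.mean((x - mu) ** 2)                    # 2nd central moment: variance
skew = np.mean((x - mu) ** 3) / var ** 1.5      # standardized 3rd moment
kurt = np.mean((x - mu) ** 4) / var ** 2        # standardized (scaled) 4th moment

print(mu, var)                                  # ~2 and ~4 for exponential(scale=2)
print(skew, stats.skew(x))                      # ~2 for this distribution
print(kurt, stats.kurtosis(x, fisher=False))    # ~9 (non-excess kurtosis)
```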

The Normal (or Gaussian) Distribution

A normal RV with mean μ and variance σ^2 has density:

p(x) = (2πσ^2)^(-1/2) exp[ −(x − μ)^2 / (2σ^2) ]

Mean (μ) = peak of the distribution.
The variance is a measure of spread about the mean. The smaller σ^2, the narrower the distribution about the mean.

The Truncated Normal

Only consider values of T or above in a normal distribution. The density function of the truncated variable is p(x | x ≥ T).

Let α_T = Pr(z ≥ T), the fraction of the original distribution kept, so that p(x | x ≥ T) = p(x) / α_T for x ≥ T.

Mean of the truncated distribution:      μ_T = μ + σ^2 p_T / α_T
Variance of the truncated distribution:  σ_T^2 = σ^2 [ 1 + (T − μ) p_T / α_T − (σ p_T / α_T)^2 ]

Here p_T is the height of the normal density at the truncation point T.
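
A sketch checking these formulas against brute-force simulation; the mean, SD, and truncation point are hypothetical, and scipy.stats.norm supplies the normal density and CDF:

```python
import numpy as np
from scipy.stats import norm

mu, sigma, T = 50.0, 10.0, 60.0                 # hypothetical mean, SD, truncation point

alpha_T = 1 - norm.cdf((T - mu) / sigma)        # Pr(x >= T), fraction saved
p_T = norm.pdf(T, loc=mu, scale=sigma)          # height of the normal density at T

# Mean and variance of the truncated distribution (formulas above)
mean_trunc = mu + sigma**2 * p_T / alpha_T
var_trunc = sigma**2 * (1 + (T - mu) * p_T / alpha_T - (sigma * p_T / alpha_T)**2)

# Compare with a direct simulation of the truncated normal
rng = np.random.default_rng(3)
x = rng.normal(mu, sigma, size=2_000_000)
x = x[x >= T]
print(mean_trunc, x.mean())   # should agree closely
print(var_trunc, x.var())
```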

Covariances

Cov(x,y) = E[(x − μ_x)(y − μ_y)] = E[x*y] − E[x]*E[y]

Cov(x,y) > 0: positive (linear) association between x & y
Cov(x,y) < 0: negative (linear) association between x & y
Cov(x,y) = 0: no linear association between x & y
Cov(x,y) = 0 DOES NOT imply no association
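
A numerical illustration, including the warning above: y = x^2 is completely determined by x yet has essentially zero covariance with it (numpy is used only for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

# Positive linear association: y = 2x plus noise
x = rng.normal(size=100_000)
y = 2.0 * x + rng.normal(size=100_000)
cov_xy = np.mean(x * y) - x.mean() * y.mean()              # E[xy] - E[x]E[y]
print("Cov(x, y):", cov_xy, np.cov(x, y, ddof=0)[0, 1])    # ~2, same value both ways

# Cov = 0 does NOT imply no association
y2 = x ** 2
print("Cov(x, x^2):", np.mean(x * y2) - x.mean() * y2.mean())   # ~0, yet fully dependent
```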

Correlation

Cov = 10 tells us nothing about the strength of an association. What is needed is a scale-free (standardized) measure of association. This is provided by the correlation, r(x,y):

r(x,y) = Cov(x,y) / sqrt[ Var(x) Var(y) ]

r = 1 implies a perfect (positive) linear association
r = −1 implies a perfect (negative) linear association
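
A sketch of why covariance alone is not enough: the two constructed variables below have the same covariance with x but very different correlations (all numbers are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=50_000)
y = 3.0 * x + rng.normal(size=50_000)           # strong positive association
z = 3.0 * x + 10.0 * rng.normal(size=50_000)    # same Cov with x, much weaker association

def corr(a, b):
    # r(a, b) = Cov(a, b) / sqrt(Var(a) Var(b))
    return np.cov(a, b, ddof=0)[0, 1] / np.sqrt(a.var() * b.var())

print(corr(x, y), np.corrcoef(x, y)[0, 1])   # ~0.95
print(corr(x, z))                            # ~0.29, despite the same covariance
```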

Useful Properties of Variances and Covariances

Symmetry: Cov(x,y) = Cov(y,x)
The covariance of a variable with itself is the variance: Cov(x,x) = Var(x)
If a is a constant, then Cov(ax, y) = a Cov(x,y)
Var(ax) = Cov(ax, ax) = a^2 Cov(x,x) = a^2 Var(x)
Cov(x+y, z) = Cov(x,z) + Cov(y,z)

Var(x + y) = Var(x) + Var(y) + 2 Cov(x,y)

Hence, the variance of a sum equals the sum of the variances ONLY when the elements are uncorrelated.

More generally, Var(Σ_i x_i) = Σ_i Var(x_i) + 2 Σ_{i<j} Cov(x_i, x_j).
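
A quick numerical check of the variance-of-a-sum rule for a correlated pair and an uncorrelated pair (numpy used only for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)

# Correlated pair: Var(x + y) = Var(x) + Var(y) + 2 Cov(x, y)
x = rng.normal(size=200_000)
y = 0.5 * x + rng.normal(size=200_000)
lhs = (x + y).var()
rhs = x.var() + y.var() + 2 * np.cov(x, y, ddof=0)[0, 1]
print(lhs, rhs)                            # agree; larger than Var(x) + Var(y) alone

# Uncorrelated pair: the covariance term (essentially) vanishes
z = rng.normal(size=200_000)
print((x + z).var(), x.var() + z.var())    # approximately equal
```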

Regressions

Consider the best (linear) predictor of y given that we know x:

ŷ = μ_y + b_{y|x} (x − μ_x)

The slope of this linear regression is a function of Cov:  b_{y|x} = Cov(x,y) / Var(x)

The fraction of the variation in y accounted for by knowing x, i.e., Var(ŷ)/Var(y), is r^2.
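
A sketch estimating the regression from simulated data; the "true" slope and intercept (0.8 and 3.0) are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(loc=10, scale=2, size=10_000)
y = 3.0 + 0.8 * x + rng.normal(scale=1.5, size=10_000)   # hypothetical true line

# Best linear predictor of y from x: slope = Cov(x,y)/Var(x); line through the means
b = np.cov(x, y, ddof=0)[0, 1] / x.var()
a = y.mean() - b * x.mean()
y_hat = a + b * x
print("slope, intercept:", b, a)            # ~0.8, ~3.0

# Fraction of variation in y explained by the regression equals r^2
r = np.corrcoef(x, y)[0, 1]
print(y_hat.var() / y.var(), r**2)          # the two agree
```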

Relationship between the correlation and the regression slope:

r(x,y) = b_{y|x} sqrt[ Var(x) / Var(y) ] = b_{y|x} σ_x / σ_y

If Var(x) = Var(y), then b_{y|x} = b_{x|y} = r(x,y).

In this case, the fraction of variation accounted for by the regression is b^2.

Properties of Least-squares Regressions

The slope and intercept obtained by least squares minimize the sum of squared residuals, Σ_i (y_i − ŷ_i)^2.

The regression line passes through the means of both x and y.
The average value of the residual is zero.
The LS solution maximizes the amount of variation in y that can be explained by a linear regression on x.
The residual errors around the least-squares regression are uncorrelated with the predictor variable x.
The fraction of the variance in y accounted for by the regression is r^2.
Homoscedastic vs. heteroscedastic residual variances: least squares assumes the residual variance is the same at every value of x (homoscedastic); if it changes with x, the residuals are heteroscedastic.
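
A sketch verifying these properties on simulated data (the data-generating line and noise level are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.uniform(0, 10, size=5_000)
y = 2.0 + 1.5 * x + rng.normal(size=5_000)

# Least-squares fit via the Cov/Var formulas above
b = np.cov(x, y, ddof=0)[0, 1] / x.var()
a = y.mean() - b * x.mean()
resid = y - (a + b * x)

print("mean residual:", resid.mean())                        # ~0
print("Cov(resid, x):", np.cov(resid, x, ddof=0)[0, 1])      # ~0: residuals uncorrelated with x
print("line through means:", np.isclose(y.mean(), a + b * x.mean()))
print("explained fraction vs r^2:",
      1 - resid.var() / y.var(), np.corrcoef(x, y)[0, 1] ** 2)
```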

Maximum Likelihood

p(x_1, ..., x_n | θ) = density of the observed data (x_1, ..., x_n) given the (unknown) distribution parameter(s) θ.

Fisher (yup, the same one) suggested the method of maximum likelihood: given the data (x_1, ..., x_n), find the value(s) of θ that maximize p(x_1, ..., x_n | θ).

We usually express p(x_1, ..., x_n | θ) as a likelihood function ℓ(θ | x_1, ..., x_n) to remind us that it depends on the observed data.

The Maximum Likelihood Estimator (MLE) of θ is the value (or values) that maximizes the likelihood function ℓ given the observed data x_1, ..., x_n.

(Figure: the likelihood curve ℓ(θ | x), with its peak at the MLE of θ.)

The curvature of the likelihood surface in the neighborhood of the MLE informs us as to the precision of the estimator: a narrow peak = high precision; a broad peak = lower precision.

This is formalized by looking at the log-likelihood surface, L = ln[ℓ(θ | x)]. Since ln is a monotonic function, the value of θ that maximizes ℓ also maximizes L.

Var(MLE) ≈ −1 / (∂^2 L / ∂θ^2), with the second derivative (the curvature) evaluated at the MLE. The curvature is negative at a maximum, and the larger the curvature, the smaller the variance.
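
A minimal sketch of both ideas using a Poisson sample (the Poisson model and the sample size are illustration choices, not taken from the slides): the log-likelihood is maximized on a grid, and the curvature at the peak gives the standard error of the MLE.

```python
import numpy as np

rng = np.random.default_rng(9)
lam_true = 4.0
x = rng.poisson(lam_true, size=500)            # observed data

# Poisson log-likelihood (dropping the constant sum of log(x_i!)):
# L(lambda) = sum_i [ x_i * ln(lambda) - lambda ]
def log_lik(lam):
    return np.sum(x * np.log(lam) - lam)

# Evaluate L on a grid and take the maximizer (analytically the MLE is x-bar)
grid = np.linspace(2.0, 6.0, 4001)
L = np.array([log_lik(g) for g in grid])
mle = grid[np.argmax(L)]
print("grid MLE:", mle, "  analytic MLE (sample mean):", x.mean())

# Curvature at the MLE: d2L/dlambda2 = -sum(x)/lambda^2,
# so Var(MLE) ~ -1/(d2L/dlambda2) = lambda^2/sum(x) = x-bar/n
curvature = -np.sum(x) / mle**2
print("SE of MLE from curvature:", np.sqrt(-1.0 / curvature))
print("analytic SE sqrt(x-bar/n):", np.sqrt(x.mean() / len(x)))
```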

Likelihood Ratio Tests

Hypothesis testing in the ML framework occurs through likelihood-ratio (LR) tests:

LR = −2 ln [ ℓ(θ̂_r | x) / ℓ(θ̂ | x) ]

where ℓ(θ̂_r | x) is the maximum value of the likelihood function under the null hypothesis (typically with r parameters assigned fixed values), ℓ(θ̂ | x) is the maximum value under the alternative, and their ratio compares the null fit to the alternative fit.

For large sample sizes, the LR statistic (generally) approaches a chi-square distribution with r degrees of freedom (r = number of parameters assigned fixed values under the null).
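
A minimal LR-test sketch, assuming normal data with known variance and a null hypothesis that fixes the mean at zero (so r = 1); the data-generating values are made up:

```python
import numpy as np
from scipy.stats import chi2, norm

rng = np.random.default_rng(10)
x = rng.normal(loc=0.3, scale=1.0, size=100)    # data; sigma = 1 treated as known

# H0: mu = 0 (one parameter fixed, r = 1)  vs  H1: mu unrestricted (MLE = x-bar)
def log_lik(mu):
    return np.sum(norm.logpdf(x, loc=mu, scale=1.0))

LR = -2.0 * (log_lik(0.0) - log_lik(x.mean()))   # -2 ln[ l(null) / l(alternative) ]
p_value = chi2.sf(LR, df=1)                      # compare to chi-square with r = 1 df
print("LR statistic:", LR, "  p-value:", p_value)
```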

Bayesian Statistics

An extension of likelihood is Bayesian statistics. Instead of simply obtaining a point estimate (e.g., the MLE), the goal is to estimate the entire distribution of the unknown parameter θ given the data x:

p(θ | x) = C * ℓ(x | θ) * p(θ)

where p(θ | x) is the posterior distribution of θ given x, ℓ(x | θ) is the likelihood function for θ, p(θ) is the prior distribution for θ, and C is the appropriate constant so that the posterior integrates to one.

Why Bayesian?
Exact for any sample size
Marginal posteriors
Efficient use of any prior information
MCMC (such as Gibbs sampling) methods
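
A minimal posterior-distribution sketch for one unknown mean with known variance and a normal prior; the prior, data values, and grid are all made up for illustration, and the grid answer is checked against the standard conjugate normal-normal result:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(11)
x = rng.normal(loc=2.0, scale=1.0, size=20)     # data; sigma = 1 treated as known

# Prior for the unknown mean theta: N(0, 5^2); grid over plausible theta values
prior_mu, prior_sd, sigma = 0.0, 5.0, 1.0
theta = np.linspace(-5, 8, 5001)
dt = theta[1] - theta[0]

# posterior(theta | x) = C * likelihood(x | theta) * prior(theta)
log_post = np.array([np.sum(norm.logpdf(x, t, sigma)) for t in theta]) \
           + norm.logpdf(theta, prior_mu, prior_sd)
post = np.exp(log_post - log_post.max())
post /= post.sum() * dt                         # C chosen so the posterior integrates to one

post_mean_grid = np.sum(theta * post) * dt

# Conjugate (normal prior + normal likelihood) answer, for comparison
n = len(x)
post_var = 1.0 / (n / sigma**2 + 1.0 / prior_sd**2)
post_mean = post_var * (n * x.mean() / sigma**2 + prior_mu / prior_sd**2)
print("posterior mean (grid):", post_mean_grid, "  (analytic):", post_mean)
print("posterior sd (analytic):", np.sqrt(post_var))
```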