Regression
Eric Feigelson
Lecture and R tutorial, Arcetri Observatory, April 2014

Classical regression model

``The expectation (mean) of the dependent (response) variable Y for a given value of the independent variable X (or vector of variables X) is equal to a specified mathematical function f, which depends on both X and a vector of parameters θ, plus a random error (scatter).”

In symbols, Y = f(X; θ) + ε. All of the randomness in the model resides in the error term; the functional relationship is deterministic with a known mathematical form. The error ε is commonly assumed to be a normal (Gaussian) i.i.d. random variable with zero mean, ε ~ N(μ, σ²) = N(0, σ²).
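A minimal sketch (not from the lecture) of what this model looks like in practice: data are simulated from a linear f(X; θ) = θ₀ + θ₁X with Gaussian scatter, and θ is recovered by ordinary least squares. The parameter values, sample size, and use of numpy are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative "true" parameters theta = (intercept, slope); values are made up.
theta_true = np.array([1.0, 2.5])
sigma = 0.5                      # standard deviation of the i.i.d. Gaussian error

x = rng.uniform(0.0, 10.0, size=100)
y = theta_true[0] + theta_true[1] * x + rng.normal(0.0, sigma, size=100)

# Least-squares estimate of theta from the design matrix [1, x]
X = np.column_stack([np.ones_like(x), x])
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated intercept and slope:", theta_hat)
```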

Parameter estimation & model selection

Once a mathematical model is chosen and a dataset is provided, the `best fit’ parameters θ are estimated by one (or more) of the following estimation techniques:
- Method of moments
- Least squares (L2)
- Least absolute deviation (L1)
- Maximum likelihood estimation
- Bayesian inference

Choices must be made regarding model complexity and parsimony (Occam’s Razor): Does the ΛCDM model have a w-dot term? Are three or four planets orbiting the star? Model selection methods are often needed: BIC, AIC, … The final model should be validated against the dataset (or other datasets) using goodness-of-fit tests (e.g. the Anderson-Darling test) with bootstrap resamples for significance levels.
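As a hedged illustration of penalized model selection (not an example from the lecture), the sketch below fits polynomial models of increasing degree by least squares and compares them with AIC and BIC computed from the Gaussian log-likelihood; the simulated data and candidate models are assumptions made for this example.

```python
import numpy as np

def gaussian_aic_bic(y, y_model, n_params):
    """AIC and BIC for a least-squares fit assuming Gaussian errors."""
    n = len(y)
    rss = np.sum((y - y_model) ** 2)
    # Maximized Gaussian log-likelihood, up to an additive constant
    loglike = -0.5 * n * np.log(rss / n)
    aic = 2 * n_params - 2 * loglike
    bic = n_params * np.log(n) - 2 * loglike
    return aic, bic

rng = np.random.default_rng(0)
x = np.linspace(0, 5, 60)
y = 1.0 + 0.8 * x + rng.normal(0, 0.3, x.size)    # data are really linear

for degree in (1, 2, 3):                          # candidate models of growing complexity
    coeffs = np.polyfit(x, y, degree)
    aic, bic = gaussian_aic_bic(y, np.polyval(coeffs, x), n_params=degree + 1)
    print(f"degree {degree}: AIC={aic:.1f}  BIC={bic:.1f}")
```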

Robust regression

In the 1970s-80s, a major effort was made to design regression methods that are `robust’ against the effects of non-Gaussianity and outliers. Today, a common approach is based on M estimation, which reduces the influence of data points with large residuals. Some common methods:

1. Huber’s estimator: minimize a function of the residuals that depends less on outliers than the sum of squared residuals.
2. Rousseeuw’s least trimmed squares: minimize the sum of the smallest squared residuals, discarding the largest ones.
3. MM estimators: more complicated versions with scaled residuals.
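A possible sketch of M estimation in practice: scipy.optimize.least_squares accepts a Huber-type loss that down-weights large residuals. The simulated dataset, the injected outliers, and the tuning constant f_scale are illustrative assumptions, not values from the lecture.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 80)
y = 2.0 + 0.5 * x + rng.normal(0, 0.3, x.size)
y[::10] += 8.0                           # contaminate a few points with large outliers

def residuals(theta, x, y):
    return y - (theta[0] + theta[1] * x)

# Ordinary least squares (loss="linear") vs a Huber-type robust loss
ols = least_squares(residuals, x0=[0.0, 0.0], loss="linear", args=(x, y))
rob = least_squares(residuals, x0=[0.0, 0.0], loss="huber", f_scale=1.0, args=(x, y))
print("OLS fit:   ", ols.x)
print("Robust fit:", rob.x)
```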

Quantile regression

When residuals are non-Gaussian and the sample is large, regressions can be performed for different quantiles of the residuals, not just the mean of the residuals. The methodology is complex but often effective.

[Figure: linear fits to the 10%, 50%, and 90% quantiles of the response.]
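A minimal sketch, assuming a linear model: fits to a chosen quantile q can be obtained by minimizing the pinball (check) loss. The heavy-tailed simulated data and the direct use of scipy.optimize.minimize are assumptions for illustration; a dedicated quantile-regression package would normally be preferred.

```python
import numpy as np
from scipy.optimize import minimize

def pinball_loss(theta, x, y, q):
    """Check (pinball) loss for the q-th quantile of a linear model."""
    resid = y - (theta[0] + theta[1] * x)
    return np.sum(np.where(resid >= 0, q * resid, (q - 1) * resid))

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 200)
y = 1.0 + 0.7 * x + rng.standard_t(df=3, size=x.size)   # heavy-tailed (non-Gaussian) scatter

for q in (0.10, 0.50, 0.90):
    fit = minimize(pinball_loss, x0=[0.0, 0.0], args=(x, y, q), method="Nelder-Mead")
    print(f"quantile {q:.2f}: intercept={fit.x[0]:.2f}, slope={fit.x[1]:.2f}")
```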

Nonlinear astrophysical models

A long-studied example is the radial profile of surface brightness in an elliptical galaxy (or spiral galaxy bulge) as a function of distance r from the center. Models include Hubble’s isothermal sphere (1930), de Vaucouleurs’ r^(1/4) law (1948), and Sersic’s law (1967).
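A hedged sketch of fitting a Sersic-type profile I(r) = I_e exp{-b_n [(r/r_e)^(1/n) - 1]} with nonlinear least squares; the approximation b_n ≈ 2n - 1/3, the simulated profile, and the noise model are assumptions made for this example.

```python
import numpy as np
from scipy.optimize import curve_fit

def sersic(r, I_e, r_e, n):
    """Sersic surface-brightness profile; b_n ~ 2n - 1/3 is a common approximation."""
    b_n = 2.0 * n - 1.0 / 3.0
    return I_e * np.exp(-b_n * ((r / r_e) ** (1.0 / n) - 1.0))

rng = np.random.default_rng(3)
r = np.linspace(0.1, 20.0, 100)                       # radius in arbitrary units
true = dict(I_e=10.0, r_e=5.0, n=4.0)                 # made-up de Vaucouleurs-like profile
I_obs = sersic(r, **true) * (1 + rng.normal(0, 0.05, r.size))  # 5% multiplicative noise

popt, pcov = curve_fit(sersic, r, I_obs, p0=[5.0, 3.0, 2.0])
print("fitted I_e, r_e, n:", popt)
print("1-sigma uncertainties:", np.sqrt(np.diag(pcov)))
```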

Nonlinear regression: count and categorical data

Some nonlinear regressions treat problems where the values of the response variable Y are not real numbers.

Poisson regression: Y is integer-valued. This occurs often in astronomy (and other fields) when one counts objects, photons, or other events as a function of some other variables. This includes the `luminosity functions’ and `mass functions’ of galaxies, stars, planets, etc. However, Poisson regression has never been used by astronomers! The simplest Poisson regression model is loglinear: Y ~ Poisson(μ(x)) with log μ(x) = β₀ + β₁x.
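A minimal sketch of the loglinear Poisson model fit by maximum likelihood, minimizing the negative log-likelihood directly; the simulated counts and the coefficients (β₀, β₁) = (0.5, 1.2) are assumptions, and a GLM routine would normally be used instead of hand-coded optimization.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def neg_loglike(beta, x, y):
    """Negative Poisson log-likelihood for the loglinear model log mu = b0 + b1*x."""
    mu = np.exp(beta[0] + beta[1] * x)
    return -np.sum(y * np.log(mu) - mu - gammaln(y + 1))

rng = np.random.default_rng(4)
x = rng.uniform(0, 2, 300)
y = rng.poisson(np.exp(0.5 + 1.2 * x))     # integer-valued counts from the "true" model

fit = minimize(neg_loglike, x0=[0.0, 0.0], args=(x, y), method="BFGS")
print("estimated (b0, b1):", fit.x)        # should be close to (0.5, 1.2)
```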

Approaches to regression with measurement error

- `Minimum chi-squared’ regression
- York’s (1966) analytical solution for bivariate linear regression
- Modified weighted least squares (Akritas & Bershady 1996)
- `Errors-in-variables’ or `latent variable’ hierarchical regression models (volumes by Fuller 1987; Carroll et al. 2006)
- SIMEX: Simulation-Extrapolation Algorithm (Carroll et al.)
- Normal mixture model: MLE and Bayesian (Kelly 2007)

Unfortunately, no consensus has emerged. I think Kelly’s likelihood approach is most promising.

Problems with `minimum chi-squared’ regression, a very common astronomical regression procedure

Dataset of the form (x_i, y_i), i = 1, ..., n: bivariate data with heteroscedastic measurement errors of known variances σ_i². A linear (or other) regression model y = f(x; θ) is assumed, and the best-fit parameters are obtained by minimizing the function

χ²(θ) = Σ_i [y_i - f(x_i; θ)]² / σ_i².

The distributions of the parameters are estimated from tables of the χ² distribution (e.g. Δχ² contours around the best-fit parameter values give a 90% confidence interval for 1 degree of freedom; Numerical Recipes). This procedure is often called `minimum chi-square regression’ or `chi-square fitting’ because a random variable that is the sum of squared normal random variables follows a χ² distribution. If all of the variance in Y is attributable to the known measurement errors and these errors are normally distributed, then the model is valid.
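A short sketch of the procedure just described, with invented data: the weighted sum of squared residuals is minimized using the known heteroscedastic measurement variances, and χ²/dof at the minimum is reported.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
x = np.linspace(0, 10, 50)
sigma = rng.uniform(0.2, 1.0, x.size)              # known heteroscedastic measurement errors
y = 1.5 + 0.8 * x + rng.normal(0, sigma)           # all scatter comes from measurement error

def chi2(theta, x, y, sigma):
    model = theta[0] + theta[1] * x
    return np.sum(((y - model) / sigma) ** 2)

fit = minimize(chi2, x0=[0.0, 0.0], args=(x, y, sigma))
dof = x.size - 2
print("best-fit parameters:", fit.x)
print("chi2/dof at minimum:", fit.fun / dof)       # ~1 if the model and errors are valid
```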

However, from a statistical viewpoint … this is a non-standard procedure!

Pearson’s (1900) chi-square statistic was developed for a very specific problem: hypothesis testing for the multinomial experiment, a contingency table of counts in a pre-determined number of categories,

X²_P = Σ_i (O_i - M_i)² / M_i,

where O_i are the observed counts and M_i are the model counts dependent on the p model parameters θ. The weights (denominator) are completely different than in the astronomers’ procedure. R. A. Fisher showed that X²_P is asymptotically (for large n) distributed as χ² with k - p - 1 degrees of freedom. When the astronomers’ measurement errors are estimated from the square root of counts in the multinomial experiment, their procedure is equivalent to Neyman’s version of the X² statistic, with the observed counts in the denominator.
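For contrast, a minimal sketch of Pearson’s statistic for a multinomial experiment, where the model counts (not measurement-error variances) appear in the denominator; the observed counts and hypothesized probabilities are made up for illustration.

```python
import numpy as np
from scipy.stats import chi2

observed = np.array([18, 25, 30, 17, 10])            # counts in k = 5 pre-determined categories
model_probs = np.array([0.2, 0.25, 0.3, 0.15, 0.1])  # hypothesized multinomial probabilities
expected = observed.sum() * model_probs               # model counts M_i

X2_pearson = np.sum((observed - expected) ** 2 / expected)  # model counts in the denominator
dof = len(observed) - 1                                     # k - 1 (no parameters estimated here)
print("Pearson X^2 =", X2_pearson, " p-value =", chi2.sf(X2_pearson, dof))
```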

Why the astronomers’ procedure can be very uncertain!

When the experiment does not clearly specify the number of `bins’, the degrees of freedom of the resulting χ² distribution are ill-determined (Chernoff & Lehmann 1954). Astronomers often take real-valued (X, Y) data, `bin’ them into arbitrary ordered classes in X (e.g. a histogram), average the Y values in each bin, estimate the measurement error as sqrt(n_bin), and apply χ² fitting. This violates the Chernoff-Lehmann condition.

The astronomer often derives the denominator of the χ²-like statistic in complicated and arbitrary fashions. There is no guarantee that the resulting `measurement errors’ account for the scatter in Y about the fitted function. The values may be estimated from the noise in the astronomical image or spectrum after complicated data reduction. Sometimes the astronomer adds an estimated systematic error to the noise error term, or adjusts the errors in some other fashion. The result is arbitrary changes in the minimization procedure with no guarantee that the statistic is χ² distributed.