

Practical Statistical Analysis
Larry Baxter, ChEn 477
Objectives: Conceptually understand the following for both linear and nonlinear models:
1. Best fit to model parameters
2. Experimental error and its estimate
3. Prediction confidence intervals
4. Single-point confidence intervals
5. Parameter confidence intervals
6. Experimental design

Linear vs. Nonlinear Models Linear and nonlinear refer to the coefficients, not the forms of the independent variable. The derivative of a linear model with respect to a parameter does not depend on any parameters. The derivative of a nonlinear model with respect to a parameter depends on one or more of the parameters.
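As a concrete check of this definition, consider one model of each kind (illustrative examples, not taken from the slides):

```latex
% Linear in the parameters a, b, c (even though x enters nonlinearly):
y = a + b x + c x^2,
\qquad
\frac{\partial y}{\partial a} = 1, \quad
\frac{\partial y}{\partial b} = x, \quad
\frac{\partial y}{\partial c} = x^2
% -- none of these derivatives contains a parameter.

% Nonlinear in the parameters A and E (the Arrhenius rate law):
k = A e^{-E/RT},
\qquad
\frac{\partial k}{\partial A} = e^{-E/RT}, \quad
\frac{\partial k}{\partial E} = -\frac{A}{RT}\, e^{-E/RT}
% -- both derivatives depend on E, and the second also on A.
```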

Linear vs. Nonlinear Models

Typical Data

Kinetic Data Analysis

Graphical Summary The linear and nonlinear analyses are compared to the original data both as k vs. T and as ln(k) vs. 1/T. As seen in the upper graph, the linearized analysis fits the low-temperature data well, at the cost of a poorer fit to the high-temperature results. The nonlinear analysis does a more uniform job of distributing the lack of fit. As seen in the lower graph, the linearized analysis evenly distributes errors in log space.
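The comparison described above can be reproduced in outline (a sketch with invented rate data: the true parameters, temperatures, and noise level are all made up, and scipy's general-purpose curve_fit stands in for whatever solver produced the slides):

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
R = 8.314  # gas constant, J/(mol K)
A_true, E_true = 1.0e8, 8.0e4  # invented: 1/s and J/mol

def arrhenius(T, A, E):
    """Rate constant k = A exp(-E / (R T))."""
    return A * np.exp(-E / (R * T))

# Synthetic kinetic data with additive (constant-variance) noise on k:
T = np.linspace(600.0, 1000.0, 12)
k_obs = arrhenius(T, A_true, E_true) + rng.normal(0.0, 0.5, T.size)

# Linearized analysis: ln k = ln A - (E/R)(1/T), ordinary least squares.
slope, intercept = np.polyfit(1.0 / T, np.log(k_obs), 1)
A_lin, E_lin = np.exp(intercept), -slope * R

# Nonlinear analysis: fit k vs. T directly, seeded with the linearized result.
(A_nl, E_nl), _ = curve_fit(arrhenius, T, k_obs, p0=(A_lin, E_lin))
```

The linearized fit minimizes squared errors in ln k, so it implicitly weights the small low-temperature rate constants most heavily; the nonlinear fit minimizes squared errors in k itself, matching the behavior described above.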

A Practical Illustration

Extension

For Parameter Estimates In all cases, fit what you measure, or more specifically the data that have normally distributed errors, rather than some transformation of them. Any nonlinear transformation (anything other than adding or multiplying by a constant) changes the error distribution and invalidates much of the statistical theory behind the analysis. Standard packages are widely available for linear equations. Nonlinear analyses should be done on the raw data (or data with normally distributed errors) and require iteration, which Excel and other programs can handle.
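A quick simulation illustrates the point (a sketch; the noise level is invented): additive, normally distributed errors on y stop being normal after a log transform, acquiring both a bias and a skew.

```python
import numpy as np

rng = np.random.default_rng(1)
y_true = 1.0
# Additive, normally distributed errors on the measured quantity itself:
y_obs = y_true + rng.normal(0.0, 0.1, 100_000)

# Errors after the transform: biased (about -sigma^2/2) and left-skewed,
# so least squares on ln(y) no longer matches the statistical theory.
log_err = np.log(y_obs) - np.log(y_true)
bias = log_err.mean()
skew = ((log_err - bias) ** 3).mean() / log_err.std() ** 3
```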

Two typical datasets

Straight-line Regression
[Tables: least-squares summaries for the two datasets, listing Estimate, Std Error, t-Statistic, and P-Value for the intercept and slope of each; the slope estimates are approximately -3.264e... and -3.214e..., with p-values on the order of 1e-10 and 1e-8.]

Mean Prediction Bands

Single-Point Prediction Bands

Point and Mean Prediction Bands
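For straight-line regression the two bands come from the standard formulas, differing only by the extra 1 under the square root; a sketch with invented data:

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 15)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.3, x.size)

n = x.size
b, a = np.polyfit(x, y, 1)              # slope, intercept
resid = y - (a + b * x)
s = np.sqrt(resid @ resid / (n - 2))    # estimate of experimental error
Sxx = ((x - x.mean()) ** 2).sum()
tcrit = t.ppf(0.975, n - 2)             # two-sided 95%

x0 = 5.0
# Half-width of the band on the MEAN prediction at x0:
half_mean = tcrit * s * np.sqrt(1.0 / n + (x0 - x.mean()) ** 2 / Sxx)
# Half-width of the band on a SINGLE new observation at x0 (always wider,
# because it also carries one full unit of experimental error):
half_point = tcrit * s * np.sqrt(1.0 + 1.0 / n + (x0 - x.mean()) ** 2 / Sxx)
```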

Rigorous vs. Propagation of Error

Propagation of Error

Single-Point Prediction Band

Residual SP Prediction Band (95%)

Prediction Band Characteristics

Nonlinear Equations

Parameter Characteristics
Linear models
- Parameters are explicit functions of the data; they do not depend on themselves.
- Parameters require no iteration to compute.
- The normal equations are independent of the parameters.
Nonlinear models
- Parameters depend on themselves; an initial estimate is needed to begin the iterative computation.
- Parameters are generally determined by converging an optimization problem, not by explicit computation.
- The optimization problem is commonly quite difficult to converge.
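The contrast can be made concrete (a sketch; both models and all numbers are invented): the linear parameters come from one explicit solve of the normal equations, while the nonlinear fit needs an initial guess and iterates.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(3)
x = np.linspace(0.0, 5.0, 20)

# Linear model y = a + b x: explicit solution of the normal equations.
y_lin = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, x.size)
X = np.column_stack([np.ones_like(x), x])
a, b = np.linalg.solve(X.T @ X, X.T @ y_lin)  # no iteration, no initial guess

# Nonlinear model y = a exp(b x): requires p0 and iterative convergence.
y_nl = 1.0 * np.exp(0.6 * x) + rng.normal(0.0, 0.1, x.size)
popt, _ = curve_fit(lambda x, a, b: a * np.exp(b * x), x, y_nl, p0=(1.0, 0.5))
```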

ANOVA and Conf. Int. Tables
[Tables: ANOVA summaries (DF, SS, MS, F-Statistic, P-Value for x, Error, and Total) and parameter tables (Estimate, Standard Error, Confidence Interval for intercept and slope) for the two datasets.]

Confidence Interval Ranges

Some Interpretation Traps It would be easy, but incorrect, to conclude:
- that reasonable estimates of the line can, with 95% probability, be computed from any combination of parameters within the 95% confidence interval for each parameter;
- that experiments whose intervals overlap represent the same experimental results, within 95% confidence;
- that parameters with completely overlapping confidence intervals represent essentially indistinguishable results.
If you consider the first graph in this case, all of these conclusions seem intuitively incorrect, but experimenters commonly draw these types of conclusions. Joint or simultaneous confidence regions (SCR) address these problems.

Simultaneous Confidence Region

Regions and Intervals Compared

Nonlinear SCR More Complex The Cauchy (Lorentz) equation is a probability density function and describes some laser line widths. In this case, there are 4 noncontiguous regions.

Sim. or Joint Conf. Regions The region is defined by the set of parameters θ for which
SSE(θ) ≤ SSE_min [1 + p/(n − p) F(1 − α; p, n − p)]
where SSE_min is the sum of squared errors at the best-fit parameters, p is the number of parameters, n is the number of data points, and F is the upper quantile of the F distribution. For linear models this region is an exact ellipsoid; for nonlinear models it is approximate.

Simultaneous Confidence Regions Overlapping confidence intervals are a poor test for differences between data sets. Data that appear similar based on confidence intervals may in fact be quite different, and certainly distinct from one another. Parameters in the lonely corners of interval unions are exceptionally poor estimates.
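One way to see the first trap numerically (a sketch with invented straight-line data): evaluate the sum-of-squares criterion SSE(θ) ≤ SSE_min [1 + p/(n − p) F] at a corner of the box formed by the two individual 95% intervals; because the intercept and slope estimates are correlated, the corner falls outside the joint 95% region.

```python
import numpy as np
from scipy.stats import t, f

rng = np.random.default_rng(4)
x = np.linspace(0.0, 10.0, 12)
y = 1.0 + 0.3 * x + rng.normal(0.0, 0.4, x.size)

n, p = x.size, 2
X = np.column_stack([np.ones_like(x), x])
theta = np.linalg.solve(X.T @ X, X.T @ y)       # least-squares estimates
sse_min = ((y - X @ theta) ** 2).sum()
s2 = sse_min / (n - p)

# Individual 95% confidence intervals for intercept and slope:
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
half = t.ppf(0.975, n - p) * se

# Joint 95% region: SSE(theta) <= SSE_min * (1 + p/(n-p) * F(0.95; p, n-p))
bound = sse_min * (1.0 + p / (n - p) * f.ppf(0.95, p, n - p))

corner = theta + half                            # a corner of the interval box
sse_corner = ((y - X @ corner) ** 2).sum()
print(sse_corner > bound)                        # True: outside the joint region
```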

Confidence Region Characteristics

Nonlinear Equations Regions assume many shapes and may not be contiguous. Parameters are generally correlated, but not linearly correlated. At given values of the other parameters, the uncertainty in a parameter is usually much smaller than its confidence interval; the overall parameter uncertainty range, however, exceeds the confidence-interval range and may not be bounded.

Experimental Design In this context, experimental design means selecting the conditions that will maximize the accuracy of your model for a fixed number of experiments. There are many experimental designs, depending on whether you want to maximize the accuracy of the prediction band, of all parameters simultaneously, of a subset of the parameters, etc. The D-optimal design maximizes the accuracy of all parameters and is quite close to the best designs for other criteria. It is, therefore, by far the most widely used.
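For the straight-line case this can be checked directly (a sketch; the 16-run budget and the [0, 1] range are invented): the D-optimal criterion maximizes det(XᵀX), and splitting the runs between the two extremes beats spreading them out.

```python
import numpy as np

def det_info(x):
    """det(X^T X) for the straight-line model y = a + b x."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.det(X.T @ X)

n = 16                                     # fixed experimental budget
extremes = np.repeat([0.0, 1.0], n // 2)   # D-optimal: half at each end
spread = np.linspace(0.0, 1.0, n)          # equally spaced alternative

print(det_info(extremes), det_info(spread))  # 64.0 vs. about 24.2
```

A larger det(XᵀX) means a smaller joint parameter-confidence region; as noted below under fit vs. parameter precision, a purely extreme-point design leaves no way to check the model, so a few interior points are usually added.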

Fit vs. Parameter Precision Generally there is a compromise between minimizing parameter variance and validating the model.
[Table: four ways of allocating 16 experiments (Designs 1-4), comparing lack-of-fit degrees of freedom, pure-error degrees of freedom, and the number of design sites.]

Typical (nonlinear) Application

Example Compare 3 different designs, each with the same number of data points and the same errors.

SCR Results for 3 Cases
1. 10 points equally spaced where Y changes fastest (0 to 1)
2. 10 points equally spaced between 0 and 4.6
3. Optimal design (5 points at 0.95 and 5 points at 4.6)

Prediction and SP Conf. Intervals

Improved Prediction & SP Intervals Optimal design (blue) improves predictions everywhere, including in extrapolated regions. Recall that the data for the optimal-design regression were all at two points, 0.95 and 4.6.

Linear Design Summary For linear systems with a possible experimental range from x1 to x2:
- Straight line: an equal number of points at each of the two extreme points.
- Quadratic: the extreme points plus the middle.
- Cubic: the extreme points plus two points at x = (x1 + x2)/2 ± (x2 − x1)/(2√5), the interior extrema of the cubic Legendre polynomial mapped to the interval.
- In general, the optimal points are the extreme points plus the interior extrema of the Legendre polynomial of the model's degree, mapped to the interval.
Generally, a few points should be added between two of these points to assure goodness of fit.

Nonlinear Experimental Design

Nonlinear Design Summary

Analytical and Graphical Solutions

Conclusions Statistics is the primary means of inductive logic in the technical world. With proper statistics, we can move from specific results to general statements with known accuracy ranges on those general statements. Many aspects of linear statistics are commonly misunderstood or misinterpreted. Nonlinear statistics is a generalization of linear statistics (the two become identical as the model becomes linear), but most of the results and the mathematics are more complex. Statistics is highly useful in both designing and analyzing experiments.