CORRELATION-REGRESSION ANALYSIS. Tomsk Polytechnic University.



2 When modeling components of a complex system, it is often necessary to establish the qualitative and quantitative relationship between the inputs and outputs of a functional unit. Such a component can be represented as a box that connects the input stimuli to the output signals through the set of its internal parameters.

[Figure: functional unit of a complex system, with inputs x1, x2, …, unit parameters s1, …, sk, and outputs y1, …, ym.]

3 If the mathematical expressions describing the behavior of the box are known, its output signals for a given input stimulus are easy to find by solving the direct problem. This is the easiest case for system modeling. It occurs when the object's behavior is uniquely described by known laws of physics (for example, the dependence of current on voltage in a circuit), or when the equations relating the inputs and outputs of the functional units have been obtained from previous studies of similar systems.

4 The simplest way to identify a relationship between quantitative variables visually is to draw a scatterplot: a graph with one variable plotted along the horizontal axis (x) and the other along the vertical axis (y). Each object corresponds to a point whose coordinates are the values of the pair of variables selected for analysis (fig. 4.2). In total, the graph contains n experimental points, corresponding to n observations. The scatterplot is a "cloud" of points in the coordinate plane. If the cloud resembles a line, it can be assumed that the scatterplot shows the form of the dependence, distorted by factors that make the points deviate from the theoretical form.

5 Graphical view of observation results

[Figure: actual dependence of energy consumption (thousand kWh) on the number of residents.]

6 In this example, a linear relationship can be assumed between the population and the amount of electricity consumed, that is, a one-dimensional linear regression model. However, many different lines can be drawn through the cloud of points, and the eye cannot determine which of them best describes the desired function.

7 In general, the equation of a straight line is:

$$Y = A_0 + A_1 X \qquad (4.1)$$

Hence, to obtain the regression equation it is necessary to determine the values of the coefficients $A_0$ and $A_1$. One of the most popular methods for calculating these coefficients, i.e. for determining the position of the line that best passes through the cloud of given points, is the method of least squares. Its main idea is to minimize the sum of squared errors (vertical distances) between the experimental points and the corresponding points on the theoretical straight line.

8 To obtain the regression equation by least squares, perform the following calculations in order:

1. For each of the n experimental points, calculate the error $E_i$ between the experimental value $Y_i^{\text{exp}}$ and the theoretical value $Y_i^{\text{theor}}$ lying on the straight line given by equation (4.1):

$$E_i = Y_i^{\text{exp}} - Y_i^{\text{theor}}, \quad i = 1, \dots, n$$

or

$$E_i = Y_i - A_0 - A_1 X_i, \quad i = 1, \dots, n \qquad (4.2)$$

2. Add up the errors $E_i$ over all n points. So that positive errors do not cancel negative ones in the sum, each error is squared before being added to the total error S:

$$S = \sum_{i=1}^{n} E_i^2 = \sum_{i=1}^{n} (Y_i - A_0 - A_1 X_i)^2 \qquad (4.3)$$
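Steps 1 and 2 can be illustrated with a short Python sketch. The function name `total_squared_error` and the toy data are assumptions made for illustration only; they do not come from the slides.

```python
def total_squared_error(xs, ys, a0, a1):
    """Return S, the sum of squared errors E_i = Y_i - A0 - A1*X_i, as in (4.2)-(4.3)."""
    errors = [y - a0 - a1 * x for x, y in zip(xs, ys)]  # E_i from (4.2)
    return sum(e * e for e in errors)                    # S from (4.3)


# Hypothetical toy observations (not taken from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

s_guess = total_squared_error(xs, ys, a0=0.0, a1=1.0)   # candidate line Y = X
s_better = total_squared_error(xs, ys, a0=2.2, a1=0.6)  # candidate line Y = 2.2 + 0.6 X
print(s_guess, s_better)  # the second line gives a smaller total error
```

Comparing the two candidate lines shows why S is a useful criterion: the line with the smaller S lies closer to the points overall.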

9 The total error S is a function of two variables, $A_0$ and $A_1$; by changing them we can influence its magnitude. The principle of the least squares method is to select the coefficients $A_0$, $A_1$ of the linear function $Y = A_0 + A_1 X$ so that its graph passes as close as possible to all experimental points simultaneously:

$$S(A_0, A_1) = \sum_{i=1}^{n} (Y_i - A_0 - A_1 X_i)^2 \to \min \qquad (4.4)$$

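10 3. A necessary condition for a minimum of a function of several variables is that all its partial derivatives equal zero. We find the partial derivatives of S with respect to $A_0$ and $A_1$ and set them to zero:

$$\begin{cases} \dfrac{\partial S}{\partial A_0} = -2 \sum_{i=1}^{n} (Y_i - A_0 - A_1 X_i) = 0 \\ \dfrac{\partial S}{\partial A_1} = -2 \sum_{i=1}^{n} (Y_i - A_0 - A_1 X_i)\, X_i = 0 \end{cases} \qquad (4.5)$$

After transformation, system (4.5) can be written as:

$$\begin{cases} n A_0 + A_1 \sum_{i=1}^{n} X_i = \sum_{i=1}^{n} Y_i \\ A_0 \sum_{i=1}^{n} X_i + A_1 \sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{n} X_i Y_i \end{cases} \qquad (4.6)$$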
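The zero-derivative condition can be checked numerically. The sketch below differentiates S from (4.3) by hand and evaluates the two partial derivatives at two candidate coefficient pairs. The function name `gradient_of_s` and the toy data are illustrative assumptions, not part of the original slides.

```python
def gradient_of_s(xs, ys, a0, a1):
    """Partial derivatives of S = sum((Y_i - A0 - A1*X_i)^2) with respect to A0 and A1."""
    residuals = [y - a0 - a1 * x for x, y in zip(xs, ys)]
    ds_da0 = -2 * sum(residuals)
    ds_da1 = -2 * sum(r * x for r, x in zip(residuals, xs))
    return ds_da0, ds_da1


# Hypothetical toy observations (not taken from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

grad_guess = gradient_of_s(xs, ys, a0=0.0, a1=1.0)  # arbitrary line: derivatives are nonzero
grad_best = gradient_of_s(xs, ys, a0=2.2, a1=0.6)   # least squares line: both are close to zero
print(grad_guess, grad_best)
```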

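11 From the system of linear equations (4.6), formulas can be derived for directly determining the coefficients $A_0$, $A_1$ of the desired linear function:

$$A_1 = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - \left(\sum X_i\right)^2}, \qquad A_0 = \frac{\sum Y_i - A_1 \sum X_i}{n} \qquad (4.7)$$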
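As a minimal sketch, the two equations of system (4.6) can be solved directly for the coefficients. The closed form used here is one common way to write that solution and is an assumption about how (4.7) is expressed on the slide. The name `fit_line` and the toy data are hypothetical.

```python
def fit_line(xs, ys):
    """Solve the two normal equations for A0 and A1 of the line Y = A0 + A1*X."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)

    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all X values are identical; the slope is undefined")

    a1 = (n * sum_xy - sum_x * sum_y) / denom  # slope
    a0 = (sum_y - a1 * sum_x) / n              # intercept
    return a0, a1


# Hypothetical toy observations (not taken from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a0, a1 = fit_line(xs, ys)
print(a0, a1)
```

The guard against a zero denominator covers the degenerate case in which every observation has the same X, so no unique line exists.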

12 To quantify the closeness of the relationship between variables and determine its direction, it is necessary to carry out a correlation analysis of the available experimental data. Thus, a good-quality mathematical model of an object can be built from the available statistical (experimental) data only on the basis of correlation-regression analysis. Correlation and regression analysis is a branch of statistics, the science that studies the general problems of measuring and analyzing mass quantitative relationships and interactions.

13 In statistical terminology, input variables are called factor characteristics, i.e. characteristics that directly cause a change in other characteristics or create the conditions for such a change. Output variables are called resultant characteristics, i.e. characteristics whose magnitude depends on the factor characteristics. For example, electricity consumption is a resultant characteristic whose value depends on factor characteristics such as the amount and range of products.

14 Correlation and regression analysis makes it possible to quantify the closeness and direction of a statistical relationship and to establish an analytical expression for the dependence of the result on specific factors while the other factor characteristics affecting the resultant characteristic are held constant. The following conditions are necessary for correlation and regression analysis:
- a sufficiently large sample: the number of observations should exceed the number of factors influencing the result by more than a factor of 10;
- a qualitatively homogeneous sample;
- distributions of the resultant and factor characteristics that follow the normal law or are close to it.

15 Correlation and regression analysis solves the following problems:
- identifying whether a relationship exists between the resultant and factor characteristics;
- identifying the form of the relationship;
- identifying the strength (closeness) and direction of the relationship;
- predicting possible values of the resultant characteristic from specified values of the factor characteristics.

16 In statistics, regression is the dependence of the mean value of a quantity y on another quantity x or on several quantities $x_i$. Pair regression is a model that expresses the dependence of the mean value $\hat{y}$ of the dependent variable y on a single independent variable x:

$$\hat{y} = f(x) \qquad (4.10)$$

where y is the dependent variable (resultant characteristic) and x is the independent variable (factor characteristic). Pair regression is used when there is a dominant factor that accounts for a large proportion of the change in the dependent variable.

17 Multiple regression is a model that expresses the dependence of the mean value $\hat{y}$ of the dependent variable y on several independent variables $x_1, x_2, \dots, x_n$:

$$\hat{y} = f(x_1, x_2, \dots, x_n) \qquad (4.11)$$

Multiple regression is used when no dominant factor can be singled out from the many factors influencing the resultant characteristic, and the simultaneous influence of several factors must be taken into account.

18 Using the pair regression equation (4.10), the relationship between the variables y and x can be modeled as:

$$y = f(x) + \varepsilon \qquad (4.12)$$

where the first term f(x) is the part of y explained by the regression equation (4.10), and the second term ε is the unexplained part of y. The relationship between these parts characterizes the quality of the regression equation, i.e. its ability to represent the actual relationship between x and y. The component ε arises from additional factors that influence y, an incorrectly chosen form of the functional dependence f(x), measurement error, and the sampled nature of the input data. When building the regression equation, ε is treated as the model error, a random variable that satisfies certain conditions.

19 The main types of pair regression equations: linear, hyperbolic, polynomial, and power.
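The equation column of this slide did not survive in the transcript. As a hedged illustration, the sketch below writes the four regression types in their commonly used textbook forms. The function names, the parameter names a, b, and the choice of a general coefficient list for the polynomial are assumptions, not the slide's own notation.

```python
def linear(x, a, b):
    """Linear form: y = a + b*x."""
    return a + b * x


def hyperbolic(x, a, b):
    """Hyperbolic form: y = a + b/x (x must be nonzero)."""
    return a + b / x


def polynomial(x, coeffs):
    """Polynomial form: y = a0 + a1*x + a2*x^2 + ... for coeffs = [a0, a1, a2, ...]."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def power(x, a, b):
    """Power form: y = a * x^b."""
    return a * x ** b


# Evaluate each form at one hypothetical point to show how they differ.
print(linear(2, 1, 3), hyperbolic(2, 1, 3), polynomial(2, [1, 3, 0.5]), power(2, 3, 2))
```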

20 The parameters of a pair regression equation are estimated by the method of least squares. The method finds the coefficients $a_0$, $a_1$, $a_2$ for which the sum of squared deviations of the actual values $y_i$ from the theoretical values is minimal. The equation of pair linear regression is often written as:

$$\hat{y} = a + b x \qquad (4.13)$$

To determine the parameters a, b by least squares, it is necessary to solve the following system of normal equations:

$$\begin{cases} n a + b \sum x_i = \sum y_i \\ a \sum x_i + b \sum x_i^2 = \sum x_i y_i \end{cases}$$

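21 Solving the system of normal equations for (4.13) gives:

$$b = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}, \qquad a = \bar{y} - b\,\bar{x} \qquad (4.14)$$

where $\bar{x}$ is the mean value of the factor x; $\bar{y}$ is the mean value of the resultant variable y; $\overline{x^2}$ is the mean of the squares of x; $\overline{xy}$ is the mean of the products of x and y.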
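The four means named on this slide are enough to compute both parameters. The sketch below assumes the usual form of the solution, b = (mean(xy) - mean(x)·mean(y)) / (mean(x²) - mean(x)²) and a = mean(y) - b·mean(x). The helper names and the toy data are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def pair_linear_regression(xs, ys):
    """Return (a, b) of y = a + b*x computed from the four means."""
    mean_x = mean(xs)
    mean_y = mean(ys)
    mean_xx = mean([x * x for x in xs])               # mean of squares of x
    mean_xy = mean([x * y for x, y in zip(xs, ys)])   # mean of products of x and y

    b = (mean_xy - mean_x * mean_y) / (mean_xx - mean_x ** 2)
    a = mean_y - b * mean_x
    return a, b


# Hypothetical toy observations (not taken from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = pair_linear_regression(xs, ys)
print(a, b)
```

This form gives the same line as solving the normal equations with raw sums, because dividing every sum by n turns it into the corresponding mean.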

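22 The closeness and direction of a pair linear correlation are measured by the linear correlation coefficient $r_{xy}$:

$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n\,\sigma_x \sigma_y} \qquad (4.15)$$

where n is the number of observations; $x_i$, $y_i$ are the observation data; $\bar{x}$, $\bar{y}$ are the mean values of the variables x and y; $\sigma_x = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$ is the mean-square deviation of x; $\sigma_y = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2}$ is the mean-square deviation of y.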
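A minimal sketch of the linear correlation coefficient follows. It assumes the population form of the mean-square deviation (division by n); any other normalization gives the same coefficient as long as it is used consistently in the numerator and denominator. The function name and toy data are hypothetical.

```python
import math


def linear_correlation(xs, ys):
    """Pearson linear correlation coefficient r_xy of two equally long samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return covariance / (sigma_x * sigma_y)


# Hypothetical toy observations (not taken from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_xy = linear_correlation(xs, ys)
r_reversed = linear_correlation(xs, list(reversed(ys)))
print(r_xy, r_reversed)  # positive for the rising data, negative once y is reversed
```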

23 Positive values of the correlation coefficient indicate a positive relationship between the characteristics; negative values indicate a negative one.

[Figure: correlation relationships between variables: a) positive; b) negative.]

24 Correlation coefficients for various relationships

25 Once the regression equation has been obtained, its significance must be assessed. Checking the significance of a regression equation means answering two important questions:
- Does the mathematical model that expresses the relationship between the variables correspond to the experimental data?
- Does the equation include enough explanatory variables to describe the dependent variable?

26 The accuracy of the model can be estimated by the regression mean square error (4.21). To assess the quality of the model, the average error of approximation is used, which is the mean relative deviation of the calculated values from the observed ones.
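The formulas for these two measures did not survive in the transcript, so the sketch below relies on commonly used definitions, which are assumptions: the mean square error divides the residual sum of squares by n - 2 (the residual degrees of freedom for a pair linear regression), and the approximation error is expressed as a percentage. Function names, the fitted line, and the toy data are hypothetical.

```python
import math


def regression_mean_square_error(ys, y_hat, n_params=2):
    """Square root of the residual sum of squares per residual degree of freedom."""
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))
    return math.sqrt(sse / (len(ys) - n_params))


def mean_approximation_error(ys, y_hat):
    """Mean relative deviation of calculated values from observed ones, in percent."""
    return 100.0 * sum(abs((y - yh) / y) for y, yh in zip(ys, y_hat)) / len(ys)


# Hypothetical toy observations and an assumed fitted line y = 2.2 + 0.6 x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * x for x in xs]

rmse = regression_mean_square_error(ys, y_hat)
approx_error = mean_approximation_error(ys, y_hat)
print(rmse, approx_error)
```

The approximation error is only defined when no observed value is zero, since each deviation is divided by the observed value.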

27 Checking the significance of a regression equation is based on analysis of variance (dispersion). Central to it are three sums:
- the total sum of squares, $\sum (y_i - \bar{y})^2$: the total sum of squared deviations of the studied parameter y from its mean value;
- the regression sum of squares, $\sum (\hat{y}_i - \bar{y})^2$: the sum of squared deviations of y explained by the regression;
- the error sum of squares, $\sum (y_i - \hat{y}_i)^2$: the residual sum of squared deviations of y, due to factors not accounted for in the model.

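28 The quality of the regression model is assessed using the coefficient of determination:

$$R^2 = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2} = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}$$

where $\hat{y}_i$ are the values calculated from the regression equation and $\bar{y}$ is the mean of y. By definition, $0 \le R^2 \le 1$. The closer $R^2$ is to unity, the better the regression equation fits the observation data. When $R^2 = 1$, the relation $y_i = \hat{y}_i$ holds for all observations, i.e. the dependence is functional.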
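As a sketch under stated assumptions, the coefficient of determination can be computed from the total, regression, and error sums of squares. The function name, the fitted line, and the toy data are hypothetical. The identity total = regression + error holds here because the line is the least squares line for these data.

```python
def sums_of_squares(ys, y_hat):
    """Return (total, regression, error) sums of squared deviations."""
    mean_y = sum(ys) / len(ys)
    total = sum((y - mean_y) ** 2 for y in ys)
    regression = sum((yh - mean_y) ** 2 for yh in y_hat)
    error = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))
    return total, regression, error


# Hypothetical toy observations and their least squares line y = 2.2 + 0.6 x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * x for x in xs]

sst, ssr, sse = sums_of_squares(ys, y_hat)
r_squared = ssr / sst  # share of the total variation explained by the regression
print(sst, ssr, sse, r_squared)
```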

29 The value of $R^2$ shows what proportion of the total variance (dispersion) of the resultant characteristic y is explained by the regression equation. For example, $R^2 = 0.8$ means that the regression equation explains 80% of the total variance of y. Thus, the value of $R^2$ indicates how well the model fits the original data. Since $R^2$ is defined through sums of squared deviations, it is necessary to know the number of degrees of freedom k, which depends on the number of observations and the number of constants determined from them.

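30 Dispersion per degree of freedom

| Source of variance (dispersion) | Sum of squared deviations | Number of degrees of freedom | Dispersion per degree of freedom |
|---|---|---|---|
| Total | $\sum (y_i - \bar{y})^2$ | n - 1 | $\sum (y_i - \bar{y})^2 / (n - 1)$ |
| Explained (regression) | $\sum (\hat{y}_i - \bar{y})^2$ | 1 | $\sum (\hat{y}_i - \bar{y})^2 / 1$ |
| Residual | $\sum (y_i - \hat{y}_i)^2$ | n - 2 | $\sum (y_i - \hat{y}_i)^2 / (n - 2)$ |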
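The table can be filled in numerically by dividing each sum of squared deviations by its number of degrees of freedom (n - 1, 1, and n - 2 for a pair linear regression). The sketch below is illustrative only: the function name, the dictionary layout, the fitted line, and the toy data are assumptions.

```python
def dispersion_table(ys, y_hat):
    """Return {source: (sum of squares, degrees of freedom, dispersion per degree of freedom)}."""
    n = len(ys)
    mean_y = sum(ys) / n
    rows = {
        "total": (sum((y - mean_y) ** 2 for y in ys), n - 1),
        "explained": (sum((yh - mean_y) ** 2 for yh in y_hat), 1),
        "residual": (sum((y - yh) ** 2 for y, yh in zip(ys, y_hat)), n - 2),
    }
    return {name: (ss, df, ss / df) for name, (ss, df) in rows.items()}


# Hypothetical toy observations and their least squares line y = 2.2 + 0.6 x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * x for x in xs]

table = dispersion_table(ys, y_hat)
for source, (ss, df, per_df) in table.items():
    print(source, round(ss, 3), df, round(per_df, 3))
```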