Fundamentals of Data Analysis Lecture 10 Correlation and regression.



Program for today: Basic concepts; Correlation diagram and correlation table; Linear correlation; Linear regression; The correlation of multiple variables; Regression curves

Basic concepts Correlation is the statistical interdependence of measurements of different phenomena that either share a common cause or are directly causally related to each other. Note, however, that the concept of correlation differs both from a causal relationship and from the notion of stochastic dependence between random variables. An extreme case is the correlation of collinear random variables. The correlation is said to be direct, or positive, when an increase in one variable is accompanied by an increase in the other. When an increase in one variable is accompanied by a decrease in the other, the correlation is inverse, or negative.

Basic concepts Regression, in mathematical statistics, is an empirically determined functional relationship between correlated random variables. Having established that the correlation between the studied traits is not too weak, we proceed to find a regression function that predicts the value of one trait given that the other takes a specified value. In practice the most important case is linear regression, which corresponds to a linear relationship between the random variables under consideration. Although a purely linear relationship is rare in practice, linear regression is a convenient tool for obtaining approximate relationships.

Basic concepts For more complex interdependencies non-linear regression is used, for example quadratic regression. Two models of the data are distinguished: model I, in which the values of the independent variable are known exactly (well defined); model II, in which the independent variable is itself random or subject to error.

Correlation table and correlation diagram Suppose a general population has two measurable characteristics X and Y, both of which are random variables. If certain parameters of the distribution of the two-dimensional variable (X, Y) are unknown, they must be estimated from a random sample of n pairs of numbers $(x_i, y_i)$. Treating $x_i$ and $y_i$ as the coordinates of a point in the plane, the sample can be represented graphically as a correlation diagram.

Correlation table and correlation diagram To build the table, construct a distribution series for each feature. First calculate the ranges $R_x = x_{\max} - x_{\min}$ and $R_y = y_{\max} - y_{\min}$. Then, based on the sample size n, choose an appropriate number of classes k and calculate the class widths $d_x = R_x / k$ and $d_y = R_y / k$. As the lower limit of the first class take a value slightly below the minimum value, and as the upper limit of the last class a value slightly above the maximum value.
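The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `class_limits` and `correlation_table` and the `margin` parameter (how far the outer limits are pushed beyond the minimum and maximum) are hypothetical and do not come from the lecture.

```python
def class_limits(values, k, margin):
    """Return k + 1 class limits covering all values.

    The first limit lies `margin` below the minimum and the last limit
    `margin` above the maximum (`margin` is an illustrative assumption).
    """
    lower = min(values) - margin
    width = (max(values) - min(values) + 2 * margin) / k
    return [lower + j * width for j in range(k + 1)]


def class_index(value, limits):
    """Index of the class [limits[j], limits[j + 1]) that contains value."""
    for j in range(len(limits) - 1):
        if limits[j] <= value < limits[j + 1]:
            return j
    raise ValueError(f"{value} lies outside the class limits")


def correlation_table(xs, ys, x_limits, y_limits):
    """Count the pairs (x, y) that fall into each (x-class, y-class) cell."""
    table = [[0] * (len(y_limits) - 1) for _ in range(len(x_limits) - 1)]
    for x, y in zip(xs, ys):
        table[class_index(x, x_limits)][class_index(y, y_limits)] += 1
    return table


# Made-up data, only to show the call sequence.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.0, 3.0, 4.0]
x_limits = class_limits(xs, k=2, margin=0.5)
y_limits = class_limits(ys, k=2, margin=0.5)
table = correlation_table(xs, ys, x_limits, y_limits)
```

Each row of `table` corresponds to one class of X and each column to one class of Y; plotting the raw pairs as points gives the correlation diagram.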

Correlation table and correlation diagram

Linear correlation The strength of the interdependence of two variables can be expressed numerically by many measures, the most popular of which is the Pearson correlation coefficient: $\rho = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$, where the covariance is given by $\operatorname{cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$. The estimator of the correlation coefficient $\rho$ between the two studied features X and Y in the population is the sample correlation coefficient, calculated from n pairs $(x_i, y_i)$ of results using the equation: $r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$

Linear correlation The quantity $r^2$, called the coefficient of determination, with (n - 1) degrees of freedom, can also serve as an estimator of the correlation.

Linear correlation The correlation coefficient takes values in the interval [-1, 1]. Its magnitude describes the strength of the relationship: the closer it is to zero, the weaker the relationship; the closer to 1 or -1, the stronger. A value of 1 or -1 indicates a perfect linear relationship. The sign of the correlation coefficient describes the direction of the relationship. "+" indicates a positive relationship, i.e. an increase (decrease) in the value of one trait is accompanied by an increase (decrease) in the other. "-" indicates a negative relationship, i.e. an increase (decrease) in the value of one trait is accompanied by a decrease (increase) in the other.
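As a minimal sketch, the sample correlation coefficient can be computed as the sum of products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the data are illustrative assumptions, not part of the lecture.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Made-up data: a perfectly increasing and a perfectly decreasing relation.
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
```

For exactly linear data the result is 1 or -1, matching the interpretation of sign and magnitude given above.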

Linear correlation Assume the following assessment of the strength of correlation, based on |r| (keeping in mind the appropriate sample size): below 0.1, negligible; from 0.1 to 0.3, weak; from 0.3 to 0.5, moderate; from 0.5 to 0.7, high; from 0.7 to 0.9, very high; above 0.9, almost full. This scale is arbitrary.
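The scale can be written as a small lookup, sketched below. The function name `correlation_strength` is hypothetical, and the class boundaries 0.1, 0.3, 0.5, 0.7 and 0.9 are read off the scale above under the assumption that adjacent intervals meet; since the scale itself is arbitrary, so is this mapping.

```python
def correlation_strength(r):
    """Verbal label for |r| on the (arbitrary) scale from the lecture."""
    magnitude = abs(r)
    if magnitude > 1:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if magnitude < 0.1:
        return "negligible"
    if magnitude < 0.3:
        return "weak"
    if magnitude < 0.5:
        return "moderate"
    if magnitude < 0.7:
        return "high"
    if magnitude < 0.9:
        return "very high"
    return "almost full"


label = correlation_strength(-0.25)
```

The sign is ignored on purpose: it carries the direction of the relationship, not its strength.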

Correlation table and correlation diagram Example n = 50 measurements of casting dimensions were made; the results are shown in the table. At the 95% confidence level, verify the hypothesis that the dimensions of the castings are correlated.
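The transcript does not show which test the lecture applies. One common choice, assumed here, is the t statistic $t = r \sqrt{n - 2} / \sqrt{1 - r^2}$ with n - 2 degrees of freedom, compared against a two-sided critical value. In the sketch below the function names, the value r = 0.5 and the critical value 2.011 (approximately the 0.975 quantile of Student's t with 48 degrees of freedom, taken from a table) are all illustrative assumptions.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing H0: rho = 0, given sample r and sample size n."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def is_significant(r, n, critical_value):
    """True if |t| exceeds the two-sided critical value supplied by the caller."""
    return abs(correlation_t_statistic(r, n)) > critical_value


# Hypothetical r; n = 50 as in the example.
t_value = correlation_t_statistic(0.5, 50)
significant = is_significant(0.5, 50, critical_value=2.011)
```

The critical value is passed in rather than computed, so it can be taken from whichever table or library is at hand.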

Correlation table and correlation diagram Example [Table of the 50 measurements; columns: i, $x_i$, $y_i$]

Correlation table and correlation diagram Example We calculate the ranges: $R_x = x_{\max} - x_{\min} = 13.4$ and $R_y = y_{\max} - y_{\min} = 3.2$. As the number of measurements is n = 50, we take the number of classes k equal to 7. Thus the class widths are: for characteristic X (dimension), $d_x = R_x / k = 13.4 / 7 \approx 2$, and for characteristic Y, $d_y = R_y / k = 3.2 / 7 \approx 0.5$. As the lower limit for characteristic X we assume x = 31.0, and for characteristic Y the value y = 3.25.
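The arithmetic of this step can be checked with a short sketch. It uses only the numbers quoted in the example (ranges 13.4 and 3.2, k = 7, rounded widths 2 and 0.5, lower limits 31.0 and 3.25); the helper name `limits_from` is an illustrative assumption.

```python
k = 7
range_x, range_y = 13.4, 3.2

# Exact class widths before rounding to the convenient values 2 and 0.5.
width_x = range_x / k
width_y = range_y / k


def limits_from(lower, width, k):
    """k + 1 equally spaced class limits starting at `lower`."""
    return [lower + j * width for j in range(k + 1)]


# Class limits with the rounded widths used in the example.
x_limits = limits_from(31.0, 2.0, k)
y_limits = limits_from(3.25, 0.5, k)
```

With the rounded widths the seven classes span 14 units of X and 3.5 units of Y, slightly more than the ranges 13.4 and 3.2, so every measurement falls inside some class.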

Correlation table and correlation diagram Example

Correlation table and correlation diagram Example The mean values are $\bar{x}$ = [...] and $\bar{y}$ = 5.19, and the standard deviations are $s_x$ = [...] and $s_y$ = [...], respectively, thus

Correlation table and correlation diagram Example

Correlation table and correlation diagram Exercise Prepare a correlation table and a correlation diagram for the data presented below (columns: number of pods, mean number of seeds, mean weight of seeds).

Linear regression Consider a general population in which the characteristics (X, Y) have a two-dimensional distribution. The regression line of the second kind of Y on X is given by the equation $y = a + b x$, where $b = \frac{\operatorname{cov}(X, Y)}{\sigma_X^2} = \rho \frac{\sigma_Y}{\sigma_X}$ is called the linear regression coefficient of Y on X, and $a = E[Y] - b E[X]$ is the offset (intercept).
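A sample-based sketch of this line is the ordinary least-squares fit: the slope is the sum of products of deviations divided by the sum of squared deviations of x, and the offset follows from the means. The function name `fit_line`, the symbols a (offset) and b (slope), and the data are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) of the line y = a + b * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical")
    b = s_xy / s_xx          # regression coefficient of Y on X
    a = mean_y - b * mean_x  # offset (intercept)
    return a, b


# Made-up points lying exactly on y = 1 + 2x.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

The fitted line always passes through the point of means, which is what the formula for the offset expresses.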

Linear regression

The correlation of multiple variables When more than two variables are correlated, the following additional terms should be defined. Simple (total) correlation is the correlation between two variables, without taking other variables into account. Partial correlation is the correlation between two variables when the other variables are held constant. Multiple correlation is the correlation among several related variables that change simultaneously.
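For three variables, the partial correlation of X and Y with Z held constant can be obtained from the three simple correlations by the standard first-order formula $r_{xy \cdot z} = (r_{xy} - r_{xz} r_{yz}) / \sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}$. The lecture does not state this formula; the sketch below, with the hypothetical name `partial_correlation` and made-up input values, is offered only as an illustration.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y with Z held constant."""
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denominator == 0:
        raise ValueError("X or Y is perfectly correlated with Z")
    return (r_xy - r_xz * r_yz) / denominator


# Made-up simple correlations.
r_partial = partial_correlation(0.6, 0.8, 0.5)
```

When Z is uncorrelated with both X and Y, the partial correlation reduces to the simple correlation of X and Y.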

Regression curves Regression curves have the general form $y = a + b_1 x_1 + b_2 x_2 + \dots$, where $b_i$ is the partial regression coefficient of the i-th order.
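For two explanatory variables, the coefficients a, b_1 and b_2 can be estimated by least squares. The sketch below solves the normal equations with plain Gaussian elimination; the function names `solve_linear_system` and `fit_two_variables` and the data are illustrative assumptions, and in practice a numerical library would normally be used instead.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: variables are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def fit_two_variables(x1, x2, y):
    """Least-squares estimates [a, b1, b2] for y = a + b1*x1 + b2*x2."""
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2.
coefficients = fit_two_variables([0, 1, 0, 1, 2], [0, 0, 1, 1, 1], [1, 3, 4, 6, 8])
```

Each b_i describes how y changes with x_i while the other variable is held fixed, which is why it is called a partial regression coefficient.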

Regression curves Surface chart

Thank you for your attention!