
Correlation Forensic Statistics CIS205

Introduction. Chi-squared shows the strength of the relationship between variables when the data are in the form of counts. However, many variables measured in a lab are on a continuous scale, such as concentrations of chemicals, time, and most machine responses. The term for the strength of the relationship between continuous variables is correlation. Any continuous variables that have some sort of systematic relationship are said to covary, and any variable that covaries with another is said to be a covariate. A basic tool for investigating correlation is the scatterplot. Usually only two variables are plotted, but three can be accommodated.

Correlation Coefficient. A statistical measure of correlation is the correlation coefficient, which can only take values between -1 and 1. Both 1 and -1 mean that the variables are perfectly related: 1 means that as one variable increases, so does the other; -1 means that as one variable increases, the other decreases. 0 means that the variables have no linear relationship (they may still be related in some non-linear way). The strength of a relationship is a separate matter from its form. Most commonly relationships are linear (plotting one variable against another yields a straight line), and next most commonly log-linear (a graph of one variable against the logarithm of the other is linear).
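The limiting values of the coefficient can be checked with a short sketch. The function name `pearson_r` and all the numbers below are illustrative assumptions, not material from the slides. The third case is a reminder that the linear correlation coefficient only captures straight-line association: an exact parabola that is symmetric about zero gives a coefficient of 0.

```python
import math

def pearson_r(xs, ys):
    """Linear (Pearson) correlation coefficient of paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustrative values, not data from the slides.
x = [-3, -2, -1, 0, 1, 2, 3]
r_up = pearson_r(x, [2 * v + 1 for v in x])    # perfectly increasing straight line
r_down = pearson_r(x, [5 - 4 * v for v in x])  # perfectly decreasing straight line
r_curve = pearson_r(x, [v * v for v in x])     # exact parabola, symmetric about 0

print(r_up, r_down, r_curve)
```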

Ageing properties of the dye methyl violet (Grim et al., 2002). This example demonstrates the calculation of a linear correlation coefficient. Laser desorption mass spectrometry was used to examine the ageing properties of methyl violet, a dye used in inks from the 1950s. Documents written in methyl violet ink were artificially aged with ultraviolet radiation. After various times the average molecular weight of the methyl violet compound was measured. The raw data are shown in Table 6.1 and plotted in Figure 6.2.

Table 6.1. Average molecular weight of the dye methyl violet and UV irradiation time from an accelerated ageing experiment. Columns: Time (min); Weight (Da).

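Correlation coefficient r. Visual inspection of Fig. 6.2 suggests that there is a negative linear correlation between time and mean molecular weight. A suitable measure of this linear correlation, r, is: r = Σ (x – mean x)(y – mean y) / √( Σ (x – mean x)² × Σ (y – mean y)² ), where x is the irradiation time, y is the molecular weight, and each sum runs over all x,y pairs.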

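Table 6.2. Calculation of the linear correlation coefficient. Columns: Time (min) x; x – mean x; (x – mean x)²; Weight (Da) y; y – mean y; (y – mean y)²; (x – mean x)(y – mean y). Summary rows: mean x, mean y, and the column sums Σ (x – mean x)², Σ (y – mean y)² and Σ (x – mean x)(y – mean y).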

Substituting these values into the equation for r we have r = -0.89. This means that as the irradiation time increases the average molecular weight of methyl violet ions decreases, and as -0.89 is close to -1, the negative linear relationship is quite strong.
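The column-by-column calculation can be sketched as follows. The numbers are made-up placeholders chosen only to show the mechanics; they are not the Grim et al. measurements, and the variable names are assumptions for this illustration.

```python
import math

# Made-up placeholder values for illustration only (NOT the Grim et al. data).
time_min = [0, 10, 20, 30, 40, 50, 60, 70]
weight_da = [370, 368, 369, 365, 364, 365, 361, 360]

n = len(time_min)
mean_x = sum(time_min) / n
mean_y = sum(weight_da) / n

# Running totals for the three sums that appear in the formula for r.
sum_dx2 = sum_dy2 = sum_dxdy = 0.0
print("x      x-mx   (x-mx)^2   y      y-my   (y-my)^2   (x-mx)(y-my)")
for x, y in zip(time_min, weight_da):
    dx = x - mean_x
    dy = y - mean_y
    sum_dx2 += dx * dx
    sum_dy2 += dy * dy
    sum_dxdy += dx * dy
    print(f"{x:<6} {dx:<6.2f} {dx * dx:<10.2f} {y:<6} {dy:<6.2f} {dy * dy:<10.2f} {dx * dy:.2f}")

# r = sum of cross products / sqrt(product of the two sums of squares)
r = sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)
print(f"mean x = {mean_x}, mean y = {mean_y}, r = {r:.4f}")
```

Each printed row corresponds to one row of a table laid out like the one above, and the three running totals are the sums that go into the formula for r.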

Significance tests for correlation coefficients. A linear correlation coefficient of -0.89 sounds quite high, but is it significantly high? Is it possible that such a coefficient would occur by chance in data drawn randomly from a bivariate normal distribution with no underlying correlation? Also, what about the effect of sample size? It makes sense that a high coefficient based on many x,y pairs is more significant than an equal coefficient based on only a few observations. For the null hypothesis that the correlation coefficient is 0, a suitable test statistic is: t = r * √df / √ (1 - r²).

Substituting for the methyl violet example. t = r * √df / √ (1 - r²), where t is the abscissa (horizontal axis) of the t-distribution and df is the degrees of freedom, equal to n – 2 (here df = 6 because we have 8 x,y pairs). The linear correlation coefficient was -0.89, so: t = -0.89 * √6 / √ (1 - 0.89²) ≈ -4.78. If we look at the t-distribution table for df = 6 we see that 95% of the area is within ±2.447. Our value of -4.78 is beyond -2.447, so we can say that the correlation coefficient is significant at the 95% confidence level.
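The arithmetic of this test can be reproduced with a few lines of code. This is a minimal sketch: the function name `t_statistic` is an assumption, and the critical value 2.447 is the standard two-tailed 5% table entry for 6 degrees of freedom, entered by hand rather than computed.

```python
import math

def t_statistic(r, n_pairs):
    """Test statistic for H0: correlation = 0, with df = n_pairs - 2."""
    df = n_pairs - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df

r = -0.89      # linear correlation coefficient quoted in the text
n_pairs = 8    # number of x,y pairs quoted in the text

t, df = t_statistic(r, n_pairs)

# Two-tailed 5% critical value for df = 6, taken from a standard t table.
T_CRIT_95_DF6 = 2.447
significant = abs(t) > T_CRIT_95_DF6

print(f"t = {t:.2f}, df = {df}, significant at 95%: {significant}")
```

Comparing the absolute value of t with the tabulated value is equivalent to asking whether t falls outside the central 95% of the t-distribution.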

Correlation coefficients for non-linear data. Andrasko and Ståhling measured three compounds associated with the discharge of firearms (naphthalene, TEAC-2 and nitroglycerin) over a period of time by solid phase microextraction (SPME) of the gaseous residue from the expended cartridge. They found that the concentrations of these compounds decreased with time, and that this property would be of use in estimating the time since discharge for this type of cartridge. Table 6.3 gives the nitroglycerin peak height and the time elapsed since discharge for a Winchester SKEET 100 cartridge stored at 7°C; the data are shown as scatterplots in Figure 6.3.

Table 6.3. Columns: Time since discharge (days); Nitroglycerin (peak height).

Log-linear relationships. A common model for loss in chemistry (e.g. radioactive decay) is inverse exponential decay, which entails a log-linear relationship between the two variables. The right-hand scatterplot of Figure 6.3 shows the log to the base e (natural logarithm) of the nitroglycerin peak height against time. Here we can see that the data look much more linear. The linear correlation coefficient is -0.95, which is quite high, and suggests that this may be a reasonable transformation of the variables. The calculations for the log-linear correlation coefficient are exactly the same as in Table 6.2, except that the log to the base e of the y variable is used rather than the untransformed y.
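The effect of the transformation can be illustrated with synthetic data. Everything in this sketch is assumed for illustration: the decay constant, the starting value and the small multiplicative perturbations are invented, and none of it is the Andrasko and Ståhling data.

```python
import math

def pearson_r(xs, ys):
    """Linear (Pearson) correlation coefficient of paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Synthetic exponential decay (hypothetical values, not measured data).
days = [1, 2, 4, 7, 10, 14, 21, 28]
wobble = [1.05, 0.96, 1.02, 0.97, 1.04, 0.95, 1.03, 0.98]  # fixed "noise" factors
peak = [1000 * math.exp(-0.15 * t) * w for t, w in zip(days, wobble)]

r_raw = pearson_r(days, peak)                          # untransformed y
r_log = pearson_r(days, [math.log(p) for p in peak])   # natural log of y

print(f"r (raw y) = {r_raw:.3f}, r (ln y) = {r_log:.3f}")
```

For data that follow an exponential decay, the coefficient computed on ln y is closer to -1 than the one computed on the raw values, which is the pattern the text describes for the nitroglycerin example.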

The coefficient of determination. The coefficient of determination is a direct measure of how much of the variance in one of the covariates can be attributed to the other. We can imagine that the total variance in the nitroglycerin peak height is made up of two parts: that which is attributable to the relationship with x (time), and that which can be seen as random noise. The coefficient of determination describes what proportion of the variance is attributable to the relationship with time. The coefficient of determination is simply the square of the correlation coefficient. If r = -0.95, r² ≈ 0.90. Often the coefficient of determination is described as a percentage, which in the example above would mean that 90% of the variance in the (log-transformed) nitroglycerin peak height is attributable to time.
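The two-part view of the variance can be checked numerically. The sketch below uses invented data and assumed names: it fits an ordinary least-squares line, splits the total sum of squares into an explained part and a residual (noise) part, and compares the explained proportion with r².

```python
import math

# Hypothetical paired values for illustration only.
x = [0, 10, 20, 30, 40, 50, 60, 70]
y = [370, 368, 369, 365, 364, 365, 361, 360]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)   # total sum of squares of y
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)

# Ordinary least-squares line y = intercept + slope * x.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual (noise) sum of squares around the fitted line.
ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
explained = 1 - ss_resid / syy   # proportion of variance attributable to x

print(f"r^2 = {r ** 2:.4f}, explained proportion = {explained:.4f}")
print(f"For r = -0.95: r^2 = {(-0.95) ** 2:.4f}")
```

For a straight-line fit the two quantities agree, which is why squaring the correlation coefficient gives the proportion of variance attributable to the other variable.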