1
Numerical Analysis EE, NCKU Tien-Hao Chang (Darby Chang)
2
Correlation coefficient
What we need is a single summary number that answers the following questions:
  Does a relationship exist?
  If so, is it a positive or a negative relationship?
  Is it a strong or a weak relationship?
Correlation coefficient: a single summary number that indicates how closely one variable is related to another variable.
3
Correlation coefficient Two-way scatter plot
Suppose that we are interested in a pair of continuous random variables.
For example, consider the relationship between the percentage of children who have been immunized against the infectious diseases diphtheria, pertussis, and tetanus (DPT) and the mortality rate.
Data for a random sample of 20 countries are shown in the next slide.
  X: the percentage of children immunized by age one year
  Y: the under-five mortality rate
Before we do any analysis, we should create a two-way scatter plot of the data.
Does a relationship exist between x and y? The mortality rate tends to decrease as the percentage of children immunized increases.
5
Pearson’s CC
In the underlying population from which the sample of points (xi, yi) is selected, the population correlation between the variables X and Y is denoted ρ.
It quantifies the strength of the linear relationship between the outcomes x and y.
The estimator of ρ, denoted r, is known as Pearson’s coefficient of correlation, or simply the correlation coefficient.
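For reference, the standard textbook definitions of the population correlation and its sample estimator can be written as below. The symbols μ_X, μ_Y (population means), σ_X, σ_Y (population standard deviations), and x̄, ȳ (sample means) are notation assumed here, not taken from the slide.

```latex
\rho = \frac{E\left[(X - \mu_X)(Y - \mu_Y)\right]}{\sigma_X \, \sigma_Y},
\qquad
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```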
6
The correlation coefficient is a dimensionless number; it has no units of measurement.
The values r = 1 and r = -1 occur when there is an exact linear relationship between x and y.
If y tends to increase in magnitude as x increases, r is greater than 0; x and y are said to be positively correlated.
If y decreases as x increases, r is less than 0 and the two variables are negatively correlated.
If r = 0, there is no linear relationship between x and y and the variables are uncorrelated.
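The properties above can be checked numerically with a minimal sketch. The function name `pearson_r` and the small data sets are hypothetical, chosen only for illustration; the formula is the usual sample correlation (sum of cross-deviations divided by the square root of the product of the sums of squared deviations).

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Illustrative (made-up) data.
x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2 * v + 1 for v in x])    # exact increasing line
r_neg = pearson_r(x, [10 - 3 * v for v in x])   # exact decreasing line

# y = x^2 on a symmetric range: y depends on x, but not linearly.
x_sym = [-2, -1, 0, 1, 2]
r_quad = pearson_r(x_sym, [v * v for v in x_sym])

print(r_pos, r_neg, r_quad)
```

The last case is a reminder that r = 0 rules out only a linear relationship: here y is completely determined by x, yet the two are uncorrelated.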
7
http://upload. wikimedia
8
CC is not a percent
In addition to telling you whether two variables are related to one another, whether the relationship is positive or negative, and how large the relationship is, the correlation coefficient carries one more important bit of information: how much of the variation in one variable is related to changes in the other variable. It does not state that amount directly, however.
A correlation coefficient is a “ratio”, not a percent.
Many students tend to think that r = 0.90 means that 90% of the changes in one variable are accounted for by, or related to, the other variable.
Even worse, some think that this means that any predictions you make will be 90% accurate.
Neither is correct!
9
Correlation Coefficient Coefficient of determination
However, it is very easy to translate the correlation coefficient into a percentage.
All you have to do is “square the correlation coefficient”, which means that you multiply it by itself.
So, if the symbol for a correlation coefficient is “r”, then the symbol for this new statistic is simply “r²”, which can be called “r squared”.
r², also called the “Coefficient of Determination”, tells you how much of the variation in one variable is directly related to (or accounted for by) the variation in the other variable.
10
The correlation coefficient is r = 0.80
By squaring r to get r² = 0.64, you can see that fully 64% of the variation in scores on Variable B is directly related to how they scored on Variable A.
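One way to see why the square of r can be read as a percentage: for a least-squares straight-line fit of one variable on the other, the fraction of variance accounted for by the line equals r squared. The sketch below checks this on made-up scores; the helper names `pearson_r` and `explained_fraction` and the data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def explained_fraction(xs, ys):
    """Fraction of the variation in ys accounted for by a least-squares line on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up scores on "Variable A" and "Variable B".
variable_a = [1, 2, 3, 4, 5, 6]
variable_b = [2, 1, 4, 3, 6, 5]

r = pearson_r(variable_a, variable_b)
r_squared = r ** 2
fraction = explained_fraction(variable_a, variable_b)
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}, explained fraction = {fraction:.4f}")

# The slide's own numbers: r = 0.80 gives r^2 = 0.64, i.e. 64%.
slide_r_squared = 0.80 ** 2
```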
11
Statistical test
12
Correlation coefficient Statistical inference
To test whether the correlation between two variables is significant:
  H0: ρ = 0
  H1: ρ ≠ 0
The test statistic (under H0) is t = r √((n − 2) / (1 − r²)), with n − 2 degrees of freedom (pp. 9-14).
13
Test the significance of the correlation coefficient for the age and blood pressure data.
Suppose that n = 6, r = 0.897 and α = 0.05.
Step 1: State the hypotheses. H0: ρ = 0, H1: ρ ≠ 0.
Step 2: Find the critical values. Since α = 0.05 and there are 6 − 2 = 4 degrees of freedom, the critical values are t = +2.776 and t = −2.776.
Step 3: Compute the test value. t = 4.059.
Step 4: Make the decision. Reject the null hypothesis, since the test value falls in the critical region (4.059 > 2.776).
Step 5: Summarize the results. There is a significant relationship between the variables of age and blood pressure.
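The five steps can be reproduced with a short sketch. It assumes the usual t statistic for testing zero correlation, t = r·sqrt((n − 2)/(1 − r²)), which reproduces the slide's value of 4.059. The function name `correlation_t_statistic` is hypothetical, and the critical value 2.776 is copied from the slide rather than computed.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Values taken from the worked example.
n = 6
r = 0.897
t_critical = 2.776  # two-tailed, alpha = 0.05, 4 degrees of freedom (from the slide)

t_value = correlation_t_statistic(r, n)
reject_h0 = abs(t_value) > t_critical  # two-sided test: compare |t| to the critical value

print(f"t = {t_value:.3f}, df = {n - 2}, reject H0: {reject_h0}")
```

Comparing |t| with the positive critical value covers both tails at once, which is equivalent to checking t against +2.776 and −2.776 separately.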
14
Correlation coefficient Limitations
It quantifies only the strength of the linear relationship between two variables.
Care must be taken when the data contain any outliers, or pairs of observations that lie considerably outside the range of the other data points.
A high correlation between two variables does not imply a cause-and-effect relationship.
15
Four sets of data with the same correlation of 0.816
16
Spearman’s Rank CC
Pearson’s correlation coefficient is very sensitive to outlying values.
We may be interested in calculating a measure of association that is more robust.
One approach is to rank the two sets of outcomes x and y separately and calculate the correlation coefficient of the ranks. This is known as Spearman’s rank correlation coefficient:
  rs = Σ (xri − x̄r)(yri − ȳr) / √( Σ (xri − x̄r)² · Σ (yri − ȳr)² )
where xri and yri are the ranks associated with the ith subject rather than the actual observations, and x̄r and ȳr are the mean ranks.
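A minimal sketch of the rank-based approach: rank each variable separately, then apply Pearson's formula to the ranks. The helper names (`average_ranks`, `pearson_r`, `spearman_r`) and the data are hypothetical, and giving tied values the average of their ranks is an assumption, since the slide does not say how ties are handled.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_r(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Made-up data: y rises steadily with x, except for one extreme final value.
x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 3, 4, 5, 100]

r_pearson = pearson_r(x, y)
r_spearman = spearman_r(x, y)
tied_ranks = average_ranks([10, 20, 20, 30])
print(f"Pearson r = {r_pearson:.3f}, Spearman rs = {r_spearman:.3f}")
```

Because the ranks of y are still 1 through 6, the outlying value 100 has no more influence than any other largest value, so the rank correlation stays at 1 while Pearson's r drops well below it.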
17
About Correlation Coefficient
18
Statistical inference
Basic tests
  tests about proportions
  tests about one mean
  tests of the equality of two means
  tests for variances
  references (pp )
More advanced tests
  ANOVA (analysis of variance)
  goodness of fit (Wilcoxon test, Kolmogorov-Smirnov test, …)
19
Multivariate analysis
Statistics
  ANOVA
  Multiple linear regression
  PCA (principal component analysis)
  ICA (independent component analysis)
  LDA (linear discriminant analysis)
So far, all of these techniques belong to statistics. You can find them in most statistical software, such as MATLAB, R, SPSS…
Machine learning
  Naïve Bayes (pp )
  LIBSVM
  RVKDE