

Slide 1 SOLVING THE HOMEWORK PROBLEMS Pearson's r correlation coefficient measures the strength of the linear relationship between the distributions of two quantitative variables. If the relationship is not linear, the application of statistics that assume linearity may give questionable results. Determining whether a relationship should be characterized as linear or non-linear is challenging. One indicator of non-linearity is the difference between the rank-order correlation coefficient (Spearman's rho) and Pearson's r. When the absolute value of Spearman's rho is larger than that of Pearson's r, the relationship is likely to be non-linear, and Pearson's r may understate the strength of the relationship. However, if one or both variables are badly skewed due to outliers, we may be able to improve the linearity of the relationship by re-expressing the data, which justifies the use of statistics that assume linearity.
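The contrast between the two coefficients can be illustrated with a minimal sketch in plain Python. This is not the SPSS procedure used in the slides; the function names and the exponential example data are assumptions chosen only to show a relationship that is perfectly monotonic but not linear, so that Spearman's rho exceeds Pearson's r.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson's r: covariation divided by the product of the spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of the data."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: y rises with x, but along a curve rather than a line.
x = list(range(1, 11))
y = [math.exp(v) for v in x]
r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(round(r, 3), round(rho, 3))
```

Because the ranks of x and y are identical here, rho reaches its maximum while r stays well below it, which is the pattern the slides treat as a sign of non-linearity.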

Slide 2 The introductory statement in the question indicates: the data set to use (2001WorldFactBook); the task to accomplish (association between variables); and the variables to use in the analysis: percent of the total population who was literate [literacy] and number of people living with HIV-AIDS [hivlive].

Slide 3 This problem also contains a second paragraph of instructions that provides the formulas to use if our examination of the association between quantitative variables requires us to re-express or transform one or both of the variables.

Slide 4 The first statement concerns the number of valid cases. To answer this question, we produce the statistics using the SPSS Correlate procedure.

Slide 5 To compute correlations, select Correlate > Bivariate from the Analyze menu.

Slide 6 First, move the variables hivlive and literacy to the Variables list box. Second, mark the check box for Spearman and leave the check box for Pearson marked. Third, click on the OK button to produce the output.

Slide 7 The Correlations table shows us the number of cases available for computing the correlation – 131.

Slide 8 The number of cases with valid data to analyze the relationship between "percent of the total population who was literate" and "number of people living with HIV-AIDS" was 131, out of the total of 218 cases in the data set. Click on the check box to mark the statement as correct.
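The idea of a valid case can be sketched in a few lines of Python: a case counts toward the correlation only when both variables have a value. The data below are hypothetical and use None to mark a missing observation; the function name is an assumption for illustration and is not part of SPSS.

```python
def count_valid_pairs(x, y):
    """Count cases where both variables have a non-missing value."""
    return sum(1 for a, b in zip(x, y) if a is not None and b is not None)

# Hypothetical values; None marks a missing observation.
literacy = [99.0, None, 45.2, 80.1, None, 61.3]
hivlive = [1200, 5000, None, 300, None, 88000]

n_valid = count_valid_pairs(literacy, hivlive)
print(n_valid, "of", len(literacy), "cases have valid data on both variables")
```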

Slide 9 The next statement asks us to extract the correlation coefficients from the SPSS output, and compare the two.

Slide 10 The Correlations table shows us the Pearson r correlation, which has an absolute value of 0.203.

Slide 11 The Nonparametric Correlations table shows us the Spearman rho correlation, which has an absolute value of 0.548. The comparison of the strength of the relationship indicated by each measure is based on the relative size of the absolute values of the coefficients. Since the absolute value of Spearman's rho (0.548) is larger than the absolute value of Pearson's r (0.203), Spearman's rho indicates a stronger relationship.

Slide 12 Pearson's r and Spearman's rho were correctly identified. The comparison of the strength of the relationship indicated by each measure is based on the relative size of the absolute values of the coefficients. Since the absolute value of Spearman's rho (0.548) is larger than the absolute value of Pearson's r (0.203), Spearman's rho indicates a stronger relationship. Click on the check box to mark the statement as correct.

Slide 13 The next statement asks us to identify the outliers in the distribution. If there are outliers, we can re-express one or both variables to see if the linear correlation between the variables increases. In these problems, outliers are defined as cases with scores that are three or more standard deviations from the mean. We use the SPSS Descriptives procedure to identify them.
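The outlier rule above (three or more standard deviations from the mean) can be sketched in Python as follows. This is an illustration, not the SPSS Descriptives procedure: it assumes the sample standard deviation (n - 1 in the denominator), and the case labels and values are hypothetical.

```python
import statistics

def z_scores(values):
    """Standard scores: distance from the mean in standard deviation units."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1), an assumption
    return [(v - mean) / sd for v in values]

def find_outliers(labels, values, cutoff=3.0):
    """Return (label, z) for cases whose z-score is at least `cutoff` from zero."""
    return [
        (label, round(z, 2))
        for label, z in zip(labels, z_scores(values))
        if abs(z) >= cutoff
    ]

# Hypothetical data: nineteen typical cases and one extreme case.
labels = [f"case{i}" for i in range(1, 21)]
values = [10] * 19 + [100]

outliers = find_outliers(labels, values)
print(outliers)
```

Sorting a z-score column in ascending and then descending order, as the following slides do, is a manual way of applying the same cutoff to each tail of the distribution.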

Slide 14 To compute the standard scores, select the Descriptive Statistics > Descriptives command from the Analyze menu.

Slide 15 First, move the variables for the analysis, hivlive and literacy, to the Variable(s) list box. Second, click on the Options button to request skewness in case we decide to re-express the variables.

Slide 16 First, click the check box for Skewness so that we can decide which transformation to use should we decide to re-express the variable. Second, click on the Continue button to return to the prior dialog box.

Slide 17 First, mark the check box Save standardized values as variables. Second, click on the OK button to produce the output.

Slide 18 If we scroll the Data View all the way to the right, we see that SPSS has created the standard scores. To name the standard score variables, SPSS prepends the letter “Z” to the variable name.

Slide 19 Click the right mouse button on the column header for Zhivlive, and select Sort Ascending from the pop-up menu. This will show any negative outliers at the top of the column.

Slide 20 After we scroll down past the cases with missing values, we do not see any negative values less than or equal to -3.0.

Slide 21 Click the right mouse button again on the column header for Zhivlive, and select Sort Descending from the pop-up menu. This will show any positive outliers at the top of the column.

Slide 22 At the top of the column, we see four positive outliers with values greater than or equal to +3.0.

Slide 23 If we scroll back to the country column, we see the names for the four outliers: Ethiopia, Kenya, Nigeria, and South Africa.

Slide 24 Scroll right to the Zliteracy column, click the right mouse button on the column header for Zliteracy, and select Sort Ascending from the pop-up menu. This will show any negative outliers at the top of the column.

Slide 25 After we scroll down past the cases with missing values, we see that we have one negative value less than or equal to -3.0.

Slide 26 Click the right mouse button again on the column header for Zliteracy, and select Sort Descending from the pop-up menu. This will show any positive outliers at the top of the column.

Slide 27 At the top of the column, we see that there are not any standard scores equal to or greater than +3.0.

Slide 28 If we select the row with the negative outlier and scroll back to the country column, we see that Niger was an outlier on the variable literacy.

Slide 29 The distribution of "number of people living with HIV-AIDS" [hivlive] contained four outliers that were three or more standard deviations from the mean: Ethiopia with a value of 3,000,000 (z=4.81), Kenya with a value of 2,100,000 (z=3.25), Nigeria with a value of 2,700,000 (z=4.29), and South Africa with a value of 4,200,000 (z=6.88). The distribution of "percent of the total population who was literate" [literacy] contained one outlier that was three or more standard deviations from the mean: Niger with a value of 13.6 (z=-3.02). We mark the check box for correct.

Slide 30 Since Spearman’s rho indicated a stronger relationship than Pearson’s r in distributions that had outliers, we can re-express the variables to see if the strength of the linear relationship increases.

Slide 31 In the table of Descriptive Statistics, we see that the skewness for number of people living with HIV-AIDS [hivlive] was positive. Since the variable was positively skewed, the data will be re-expressed as logarithms, creating the log transformation of number of people living with HIV-AIDS [LG_hivlive].

Slide 32 We type the formula identified in the second paragraph of the problem. Click on the OK button to produce the output.

Slide 33 The skewness for "percent of the total population who was literate" [literacy] was negative. Since the variable was negatively skewed, the data will be re-expressed as squares. The independent variable percent of the total population who was literate [literacy] was transformed to the square transformation of percent of the total population who was literate [SQ_literacy].

Slide 34 We type the formula identified in the second paragraph of the problem. Click on the OK button to produce the output.
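The choice of transformation described above (logarithms for positive skew, squares for negative skew) can be sketched in Python. The exact formulas from the problem's second paragraph are not reproduced in this transcript, so the base-10 logarithm and the plain square used here are assumptions. The skewness function is the simple moment-based version; SPSS reports a sample-adjusted statistic, but the sign, which is all the rule needs, is the same. The data are hypothetical.

```python
import math

def skewness(values):
    """Moment-based skewness: positive for a long right tail, negative for a long left tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def re_express(values):
    """Pick a transformation from the sign of the skewness.

    Assumes strictly positive values when the log is applied.
    """
    skew = skewness(values)
    if skew > 0:
        return "log", [math.log10(v) for v in values]
    if skew < 0:
        return "square", [v ** 2 for v in values]
    return "none", list(values)

# Hypothetical data: counts with a long right tail, percents with a long left tail.
counts = [100, 1000, 10000, 1000000]
percents = [20.0, 90.0, 95.0, 99.0]

count_label, count_values = re_express(counts)
percent_label, percent_values = re_express(percents)
print(count_label, count_values)
print(percent_label, percent_values)
```

The log compresses large values more than small ones, pulling in a right tail, while squaring stretches large values, pulling in a left tail.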

Slide 35 Next, we compute the correlation coefficients for the transformed variables. To compute correlations, select Correlate > Bivariate from the Analyze menu.

Slide 36 First, move the variables LG_hivlive and SQ_literacy to the Variables list box. Second, click on the OK button to produce the output.

Slide 37 The linear fit of the relationship improved, and we report the Pearson's r for the relationship using the transformed variables. Before re-expressing the data, the absolute value of Pearson's r was 0.203. After re-expressing the dependent variable number of people living with HIV-AIDS [hivlive] as the log transformation of number of people living with HIV-AIDS [LG_hivlive] and the independent variable percent of the total population who was literate [literacy] as the square transformation of percent of the total population who was literate [SQ_literacy], Pearson's r was -.487, an increase in absolute value to 0.487.

Slide 38 Since the strength of the linear relationship increased when we used the transformed variables, we mark the check box for the question.

Slide 39 The next statement asks us to interpret the Pearson correlation coefficient using the guidelines attributed to Tukey.

Slide 40 The rule of thumb attributed to Tukey classifies a correlation between 0.0 and ±0.20 as very weak; ±0.20 to ±0.40 as weak; ±0.40 to ±0.60 as moderate; ±0.60 to ±0.80 as strong; and greater than ±0.80 as very strong. Using this rule, the relationship between the square transformation of "percent of the total population who was literate" and the log transformation of "number of people living with HIV-AIDS" was correctly characterized as a moderate relationship (Pearson's r = -.487). Click on the check box to mark the statement as correct.
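The rule of thumb above can be written as a small lookup, sketched here in Python. The function name is hypothetical, and because the slide's ranges share their endpoints, assigning a value exactly on a boundary (for example 0.40) to the lower category is an assumption.

```python
def tukey_strength(r):
    """Classify a correlation by its absolute value using the Tukey rule of thumb."""
    size = abs(r)
    if size <= 0.20:
        return "very weak"
    if size <= 0.40:
        return "weak"
    if size <= 0.60:
        return "moderate"
    if size <= 0.80:
        return "strong"
    return "very strong"

print(tukey_strength(-0.487))  # the transformed-variable correlation from the slides
print(tukey_strength(0.203))   # the absolute value of the original Pearson's r
```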

Slide 41 The next statement asks us to interpret the Pearson correlation coefficient using the guidelines attributed to Cohen.

Slide 42 Applying Cohen's criteria for effect size (less than ±0.10 = trivial; ±0.10 up to ±0.30 = weak or small; ±0.30 up to ±0.50 = moderate; ±0.50 or greater = strong or large), the relationship was correctly characterized as a moderate relationship (Pearson's r = -.487). Click on the check box to mark the statement as correct.
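Cohen's criteria as stated on this slide can be sketched the same way in Python. The function name is hypothetical; the cut points follow the slide's wording, with each range including its lower bound ("up to" is read as excluding the upper bound).

```python
def cohen_strength(r):
    """Classify a correlation by its absolute value using Cohen's effect size criteria."""
    size = abs(r)
    if size < 0.10:
        return "trivial"
    if size < 0.30:
        return "weak or small"
    if size < 0.50:
        return "moderate"
    return "strong or large"

print(cohen_strength(-0.487))  # the transformed-variable correlation from the slides
```

Only the absolute value is used for strength; the sign is interpreted separately as the direction of the relationship.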

Slide 43 The next statement asks us to interpret the direction of the relationship between the variables. A direct or positive relationship means that the values of the variables change in the same direction, i.e., when one goes up, the other goes up, and when one goes down, the other goes down. An inverse or negative relationship means that the values of the variables move in opposite directions.

Slide 44 Since the sign of the correlation coefficient was negative (Pearson's r = -.487), the relationship is inverse and the values for the variables move in opposite directions. The statement that "higher scores on the square transformation of percent of the total population who was literate were associated with lower scores on the log transformation of number of people living with HIV-AIDS" is correct. Click on the check box to mark the statement as correct.