URBDP 591 A Lecture 9: Cross-Sectional and Longitudinal Design

Objectives
- Experimental vs. Non-Experimental Design
- Cross-Sectional Designs
- Longitudinal Design
- Multiple Regression

Research Designs/Approaches

Experimental
- Purpose: Test for cause/effect relationships
- Time frame: Current
- Degree of control: High
- Example: Comparing two types of treatments on plant growth

Quasi-experimental
- Purpose: Test for cause/effect relationships without full control
- Time frame: Current or past
- Degree of control: Moderate to high
- Example: Comparing the effect of a curriculum on children's ability to read

Research Designs/Approaches

Non-experimental - correlational
- Purpose: Examine the relationship between two variables
- Time frame: Current (cross-sectional) or past
- Degree of control: Low to medium
- Example: Relationship between patterns of urban development and bird diversity

Ex post facto
- Purpose: Examine the effect of a past event on current functioning
- Time frame: Past and current
- Degree of control: Low to medium
- Example: Relationship between change in population density and bird diversity

Research Designs/Approaches

Non-experimental - longitudinal
- Purpose: Repeated measurements of the same subjects over time
- Time frame: Future (predictive)
- Degree of control: Low to moderate
- Example: Relationship between urban development and stream quality

Cohort-sequential
- Purpose: Examine change in a variable over time in overlapping groups
- Time frame: Future
- Degree of control: Low to moderate
- Example: Relationship between urban development and stream quality across various types of basins

Research Designs/Approaches

Survey
- Purpose: Assess opinions or characteristics that exist at a given time
- Time frame: Current
- Degree of control: None or low
- Example: People's preferences for different landscapes

Qualitative
- Purpose: Discover potential relationships; descriptive
- Time frame: Past or current
- Degree of control: None or low
- Example: People's experiences of driving through a park

Experimental vs. Correlational Research
- Experimental research determines whether one variable causes changes in another variable.
- Correlational research measures the relationship between two variables.
- The difference: variables can be related without being causally related.

Correlational Research
Main interest is to determine whether two variables co-vary and to determine the direction of the relationship.

Characteristics of correlational research:
- Differs from experimental research:
  1. No manipulation of IVs
  2. Subjects not randomly assigned
- Measures two variables and determines whether a correlational relationship exists between them.
- If a correlational relationship exists between two variables, the value of one variable can be predicted from the value of the other.

Correlational Studies
A type of descriptive research design.
- Advantage: it can examine variables that cannot be experimentally manipulated (e.g., population growth).
- Disadvantage: it cannot determine causality.
  - A third variable may account for the association.
  - Directionality is unclear.

Non-experimental Research Designs
- Describe a particular situation or phenomenon.
- Hypothesis generating.
- Can describe the effect of implementing actions based on experimental research and help refine the implementation of these actions.

Cross-Sectional Study Designs
Compares groups at one point in time (e.g., landscape patterns).
- Advantage: it is an efficient way to identify possible group differences because you can study them at one point in time.
- Disadvantage: you cannot rule out cohort effects.

Non-Experimental Research Design
- Longitudinal method: measurement of the same subjects over time.
- Cross-sectional method: measurement of several groups at a single point in time.
- Sequential methods: methods that combine the cross-sectional and longitudinal methods.

Longitudinal Design
Gathers data on a factor (e.g., bird diversity) over time.
- Advantage: you can see the time course of the development or change in the variables.
  - Bird diversity decreasing with urbanization.
  - Bird diversity decreasing at a faster rate within the UGB.
- Disadvantage: it is costly and still subject to bias.

Cohort-Sequential Design
Combines elements of the cross-sectional design and the longitudinal design (e.g., different bird species are compared on a variable over time).
- Advantage: very efficient, and reduces some of the biases in the cross-sectional design, since you can see the evolution of change over time.
- Disadvantage: cannot rule out cohort bias or the problem of an 'unidentified' third variable accounting for the change.

Correlational Research
- Correlation refers to a meaningful relationship between two variables: values of both variables change together somehow.
- Positive correlation: a high score on the first variable is associated with a high score on the second variable.
- Negative correlation: a high score on the first variable is associated with a low score on the second variable.
- No correlation: the score on the first variable is not associated with the score on the second variable.

Correlation vs. Regression

Correlation coefficient: correlation tells us about the strength (and direction) of the relationship between two variables. The square of the correlation tells us the proportion of the variables' variance that is shared between them.

Simple regression: regression tells us about the nature of the function relating the two variables. For linear regression, which is what we consider here, regression fits a straight line, called the regression line, to the data so that the line best describes their relationship.

Multiple regression: multiple regression tells us about the nature of the function that relates a set of predictor variables to a single response variable. OLS (ordinary least squares) multiple regression assumes the function is a linear function.

Covariance
When there is a relation between two variables, they covary. The Pearson correlation coefficient is a unit-free measure of the degree of covariance.
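As a minimal sketch of the "unit-free" point, the Python snippet below (with made-up numbers, not lecture data) computes the sample covariance and divides by the two standard deviations to recover Pearson's r:

```python
# Minimal sketch: Pearson r as standardized (unit-free) covariance.
# The data and variable meanings are invented for illustration.
import numpy as np

x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])    # e.g., an urban development index
y = np.array([14.0, 11.0, 9.0, 6.0, 4.0])  # e.g., bird diversity

cov_xy = np.cov(x, y, ddof=1)[0, 1]         # sample covariance
r = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

print(f"cov(x, y) = {cov_xy:.3f}, Pearson r = {r:.3f}")
print("check:", np.corrcoef(x, y)[0, 1])    # should match r
```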

Covariance
Now consider a third variable. The possible cases:
- A and B do not covary, but C covaries with both A and B.
- A, B, and C all covary.
- None covary: they are orthogonal.
The r^2 is the amount of shared variation between the variables.

Measuring Correlations
- Scatterplots are used to provide a descriptive analysis of correlation:
  - evaluate the degree of relationship
  - assess the linearity of the relationship
- Pearson's r measures correlations between two interval/ratio-level variables:
  - magnitude measured from 0.0 to 1.0
  - direction indicated by + or -
  - statistical significance of the correlation provided by a p value
- Spearman's rho measures correlations between two ordinal-level variables.
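A short illustration of the two coefficients with scipy; the data are invented (any monotone-but-nonlinear pair would make the same point):

```python
# Hedged sketch: Pearson r (interval/ratio data) vs. Spearman rho
# (rank/ordinal data), each with its significance p value.
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = x ** 3 + np.array([0.5, -0.3, 0.2, -0.1, 0.4, -0.2, 0.1, -0.4])

r, p_r = stats.pearsonr(x, y)        # linear association
rho, p_rho = stats.spearmanr(x, y)   # monotone (rank) association

print(f"Pearson  r   = {r:.3f} (p = {p_r:.4f})")
print(f"Spearman rho = {rho:.3f} (p = {p_rho:.4f})")
# rho is 1 here because the relation is perfectly monotone; r is
# high but below 1 because the relation is not linear.
```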

Interpreting Correlations
- Correlation is not causation.
- Directionality problem.
- Third-variable problem.
- Partial correlation.

Regression Analysis
- Regression allows prediction of a new observation based on what is known about the correlation.
- Regression involves drawing a line that best describes a correlation:

    Y = a + bX + e

- X is the predictor variable; Y is the criterion variable.
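The sketch below fits Y = a + bX to a small invented data set with scipy's linregress and predicts a new observation; this is one standard tool, not necessarily the one used in the course:

```python
# Sketch of fitting the regression line Y = a + bX and predicting
# a new observation. Data are made up for illustration.
import numpy as np
from scipy import stats

x = np.array([10, 20, 30, 40, 50], dtype=float)  # predictor (X)
y = np.array([8.2, 7.1, 6.3, 4.9, 4.1])          # criterion (Y)

fit = stats.linregress(x, y)
a, b = fit.intercept, fit.slope

x_new = 35.0
y_pred = a + b * x_new    # prediction from the regression line
print(f"Y = {a:.2f} + {b:.3f}*X, r^2 = {fit.rvalue**2:.3f}")
print(f"predicted Y at X = {x_new}: {y_pred:.2f}")
```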

The Multiple Regression Model
A multiple regression equation expresses a linear relationship between a dependent variable Y and two or more independent variables (X1, X2, ..., Xk):

    Y = α + β1·X1 + β2·X2 + ... + βk·Xk + ε

Each β is called a partial regression coefficient. For example, β1 denotes the change in Y per unit change in X1 that one would expect if all the other X variables in the equation were held constant.
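A minimal sketch of fitting such a model by OLS with statsmodels; the two predictors and the response are simulated stand-ins, not lecture data:

```python
# Hedged sketch: OLS multiple regression with two predictors.
# Variable meanings in the comments are illustrative assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)             # e.g., population density
x2 = rng.normal(size=n)             # e.g., % impervious surface
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(scale=0.5, size=n)

X = sm.add_constant(np.column_stack([x1, x2]))  # adds the intercept column
model = sm.OLS(y, X).fit()

# model.params = [alpha, beta1, beta2]; each beta is a partial
# coefficient: change in Y per unit change in that X, others held fixed.
print(model.params)
print(model.summary())
```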

Meaning of parameter estimates
- Slope: change in Y per unit change in X; the marginal contribution of X to Y, holding all other variables in the regression constant.
- Intercept: meaningful only if X = 0 is in the sample range; otherwise, it is merely an extrapolation of the linear approximation.

Coefficient of Determination (R^2)
- Expresses the amount of variance on the criterion explained by a predictor or set of predictors.
- R^2 increment: indicates the increase in the total variance on the criterion accounted for by each new predictor added to the regression model.
- Two tests of significance are typically computed: (i) is R different from 0? (ii) is the R^2 increment statistically significant?
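The snippet below illustrates an R^2 increment on simulated data, using statsmodels' nested-model F test as one way to test whether the increment is significant (my choice of test, not necessarily the lecture's):

```python
# Sketch: R^2 increment when adding a second predictor, with a
# nested-model F test for the increment. Data are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 80
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 0.9 * x1 + 0.4 * x2 + rng.normal(scale=1.0, size=n)

m1 = sm.OLS(y, sm.add_constant(x1)).fit()                         # reduced
m2 = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()  # full

increment = m2.rsquared - m1.rsquared
f_val, p_val, df_diff = m2.compare_f_test(m1)   # F test of the increment
print(f"R^2: {m1.rsquared:.3f} -> {m2.rsquared:.3f} (increment {increment:.3f})")
print(f"F = {f_val:.2f}, p = {p_val:.4f}")
```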

Regression Equation for a Linear Relationship
A linear relationship of n predictor variables, denoted X1, X2, ..., Xn, to a single response variable, denoted Y, is described by a linear equation involving several variables. The general linear equation is:

    Y = a + b1·X1 + b2·X2 + ... + bn·Xn

This equation shows that any linear relationship can be described by its:
- Intercept: the value of Y when the linear combination of the X's is zero.
- Slopes: each slope specifies how much Y will change when the particular X changes by one unit.

Regression Assumptions
1. The independent variable should be accurately measured, with negligible error.
2. The values of the dependent variable are normally distributed.
3. Variation in the dependent variable (i.e., the spread around the line) is constant over values of the independent variable. This is known as homoscedasticity.
4. The values of the residuals (the difference between the observed and the predicted values) have a normal distribution; that is, there are relatively few extremely small or extremely large residuals.
5. The values of the residuals are independent of each other, i.e., they are randomly distributed along the regression line (there is no systematic pattern).
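As a hedged sketch, assumptions 3 and 4 can be probed numerically; the snippet below uses Shapiro-Wilk and Breusch-Pagan tests on simulated data, which is one common approach rather than the only one:

```python
# Sketch of checking residual normality (assumption 4) and constant
# variance / homoscedasticity (assumption 3) on a fitted OLS model.
# Data are simulated for illustration.
import numpy as np
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=60)
y = 3.0 + 0.7 * x + rng.normal(scale=1.0, size=60)

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()
resid = fit.resid

w, p_norm = stats.shapiro(resid)             # normality of residuals
lm, p_bp, _, _ = het_breuschpagan(resid, X)  # constant variance
print(f"Shapiro-Wilk p = {p_norm:.3f}  (small p -> non-normal residuals)")
print(f"Breusch-Pagan p = {p_bp:.3f}  (small p -> heteroscedasticity)")
```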

Multiple regression problems
- Outliers. As with SLR, a single outlying point can greatly distort the results of MLR, but it is more difficult to detect outliers visually.
- Too few subjects. A general rule of thumb is that you need many more data points than X variables; otherwise it is too easy to be misled by spurious results.
- Inappropriate model. Although complicated, MLR is too simple for some data. MLR assumes that each X variable contributes independently towards the value of Y, but often X variables contribute to Y by interacting with each other.
- Unfocused studies. If you give the computer enough variables, some significant relationships are bound to turn up by chance, and these may mean nothing.

Criteria for Developing an MLR Model
The overriding criterion is that any potential set of predictors must be scientifically defensible. It is neither good science nor proper use of statistics to put predictors in a model just because the data were available or to see "what happens."

Other criteria:
- A statistically significant overall model.
- A large R^2: the model explains a large amount of the variation in Y.
- A small standard error (SQRT(MSE)) of the model: is the regression precise enough that the findings have practical utility?
- Significant partial t tests: does each X variable explain significant additional variation in Y, given the other predictors in the model?
- Choose the smallest number of predictors that adequately characterizes the variation in Y.

Model Selection and Model Adequacy
The model we can think of as having given rise to the observations is usually too complex to be described in every detail from the information available. We have to rely on simpler models: approximations.

Question: What is sufficient? The approximation should be sufficient for our purposes!

Note: "more realistic" models might be closer to "the true model". However, we are NOT aiming at finding the true model! We are trying to find THE BEST APPROXIMATING MODEL.

How to Select the Best Model
Trade-off between bias and variance when considering model complexity (number of parameters).

[Figure: bias and variance plotted against the number of parameters; the "best model" lies at the trade-off point between the two curves.]

Model Selection: The Likelihood Ratio Test
Basic idea: add parameters only if they provide a significant improvement in the fit of the model to the data.

    delta = 2 (ln L2 - ln L1)

where L1 is the likelihood under Model 1 (the simpler model) and L2 is the likelihood under Model 2 (the more complex model).
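A sketch of the test on two nested OLS models, using the delta statistic above with a chi-square reference distribution (degrees of freedom = number of extra parameters); data are simulated for illustration:

```python
# Likelihood ratio test between two nested OLS models, following
# delta = 2 (lnL2 - lnL1). Data are simulated for illustration.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(3)
n = 120
x1, x2 = rng.normal(size=(2, n))
y = 0.5 + 1.2 * x1 + 0.6 * x2 + rng.normal(size=n)

m1 = sm.OLS(y, sm.add_constant(x1)).fit()                         # Model 1
m2 = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()  # Model 2

delta = 2.0 * (m2.llf - m1.llf)        # 2 * (lnL2 - lnL1)
df = int(m2.df_model - m1.df_model)    # extra parameters in Model 2
p = stats.chi2.sf(delta, df)
print(f"delta = {delta:.2f}, df = {df}, p = {p:.4f}")
```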

Other Alternatives for Ranking Models: Akaike Information Criterion (AIC)
An approximation of the Kullback-Leibler discrepancy:

    AIC = -2 ln L + 2N

    L = likelihood
    N = number of parameters

Choose the model with the smallest AIC. AIC penalizes the model for additional parameters.

Other Alternatives for Ranking Models: Bayesian Information Criterion (BIC)
An approximation of the log of the Bayes factor:

    BIC = -2 ln L + N ln n

    L = likelihood
    N = number of parameters
    n = sample size (number of observations)

Choose the model with the smallest BIC. For larger data sets, BIC should tend to choose simpler models than AIC (since the natural log of n is usually > 2).
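To illustrate both criteria, the sketch below computes AIC and BIC from the formulas above for two candidate models on simulated data (the predictors, sample size, and parameter count convention are my illustrative choices):

```python
# Ranking two candidate OLS models by AIC and BIC, computed from the
# formulas above. Data and predictors are invented for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 200
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 0.8 * x1 + rng.normal(size=n)   # x2 is actually irrelevant

candidates = {
    "x1 only": sm.add_constant(x1),
    "x1 + x2": sm.add_constant(np.column_stack([x1, x2])),
}
for name, X in candidates.items():
    fit = sm.OLS(y, X).fit()
    N = fit.df_model + 1                   # parameters incl. intercept
    aic = -2 * fit.llf + 2 * N
    bic = -2 * fit.llf + N * np.log(n)
    print(f"{name}: AIC = {aic:.1f}, BIC = {bic:.1f}")
# The smaller value wins; BIC penalizes the extra parameter more
# heavily here because ln(200) > 2.
```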