GS/PPAL Section N Research Methods and Information Systems
A QUANTITATIVE RESEARCH PROJECT - (1) DATA COLLECTION (2) DATA DESCRIPTION (3) DATA ANALYSIS II
Agenda
– Correlations
– Correlation Coefficient: a quantitative measure of linear correlations
– Correlation Strength versus Statistical Significance
– Simple Regression Analyses
– Quantitative Research Project – Recap
– Course Conclusion
Correlations
– Is CGPA related in some way to total hours studied (H)?
– Statistically, does the mean value of CGPA vary in some way with H?
– Remember, we need to account for the fact that each variable tends to deviate randomly from its true mean.
– The “correlation coefficient” for a set of observations is a function of how much each observed value deviates from its sample mean, adjusted for (i.e., not explained by) random deviation
Correlation Images: "Correlation examples2" by Denis Boigelot, original uploader was Imagecreator - Own work. Licensed under CC0 via Wikimedia Commons
Representing Linear Correlation
1. For a population of size N, the typical notation is: $\rho(H,C) = \mathrm{corr}(H,C) = \frac{\mathrm{cov}(H,C)}{\sigma_H \sigma_C} = \frac{1}{N} \sum \frac{(H-\mu_H)(C-\mu_C)}{\sigma_H \sigma_C}$
2. For a sample of size n from that same population (sample estimates replace the population parameters): $r(H,C) = \frac{1}{n-1} \sum \frac{(H-\bar{H})(C-\bar{C})}{s_H s_C}$
Excel functions to calculate (2) above: =CORREL(data array (H), data array (CGPA)), or =PEARSON(data array (H), data array (CGPA))
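The sample formula in (2) can be checked by hand. The following is a minimal Python sketch, not the course's Excel workflow; the function name `pearson_r` and the five data pairs are hypothetical and chosen only for illustration (they are not the 10-case study data).

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation, following the sample formula (2)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    avg_x = sum(xs) / n
    avg_y = sum(ys) / n
    # Sample covariance: 1/(n-1) times the sum of cross-deviations
    cov = sum((x - avg_x) * (y - avg_y) for x, y in zip(xs, ys)) / (n - 1)
    # Sample standard deviations (s_H and s_C in the slide's notation)
    s_x = math.sqrt(sum((x - avg_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - avg_y) ** 2 for y in ys) / (n - 1))
    return cov / (s_x * s_y)

# Hypothetical illustration data (NOT the study's raw data)
hours = [10, 20, 30, 40, 50]
cgpa = [5.0, 6.0, 6.5, 7.5, 8.0]

r = pearson_r(hours, cgpa)
print(round(r, 3))
```

Applied to the same two columns, Excel's CORREL and PEARSON should agree with this value, since both compute the sample Pearson coefficient.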
Population Correlation Coefficient The Pearson correlation coefficient (numbers above images) measures only the linear relationship between two variables
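A small sketch of why "only the linear relationship" matters, assuming made-up points rather than the pictured data sets: Y below is completely determined by X, yet the Pearson coefficient is zero because the dependence is a symmetric curve, not a line. The helper name `pearson_r` is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from sums of deviations (the n-1 factors cancel)."""
    n = len(xs)
    avg_x, avg_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - avg_x) * (y - avg_y) for x, y in zip(xs, ys))
    sxx = sum((x - avg_x) ** 2 for x in xs)
    syy = sum((y - avg_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys_curve = [x ** 2 for x in xs]    # perfect but non-linear dependence
ys_line = [2 * x + 1 for x in xs]  # perfect linear dependence

r_curve = pearson_r(xs, ys_curve)
r_line = pearson_r(xs, ys_line)
print(r_curve, r_line)
```

A coefficient near zero therefore rules out a linear relationship only; it says nothing about other kinds of dependence.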
Correlation Coefficient (= 0.816) versus Visual Inspection of Data: "Anscombe's quartet 3" by Anscombe.svg: Schutz; derivative work (label using subscripts): Avenue (talk) - Anscombe.svg. Licensed under CC BY-SA 3.0 via Wikimedia Commons - artet_3.svg
Correlations and Predictions
– Presence of a (linear) correlation may offer useful predictive information
– It may (but may not) suggest causality to be examined further: “correlation does not imply causation” (when there is no control group)
– It may suggest policy considerations (policy action, spillover effects, consequences)
10-case Study: Raw Data (table columns: Case, CGPA, Total Hours Studied); Scatter Plot with Linear Trend
Correlation for 10-case Study: r = CORREL(CGPA, HOURS) = PEARSON(CGPA, HOURS) = 0.79 (rounded); R-squared = r * r = 0.63. If CGPA is a linear function of HOURS and CGPA is normally distributed, then R-squared gives the “explained variance”: 63% of the variation in CGPA can be “explained” by variation in HOURS.
Strength versus Significance
– A “strong” correlation may or may not be significant
– A “weak” correlation may or may not be significant
– The key is the size of the sample: for small samples a strong correlation may still arise by chance; for large samples it is easy to achieve significance for weak correlations
T-test for Significance
– Null Hypothesis: $H_0: \rho = 0$ (no correlation in the population)
– Alternative Hypothesis: $H_a: \rho \neq 0$ (i.e., there is a positive or negative correlation that is significant)
– Take the sample correlation coefficient (r) and adjust it by weighting (dividing) it by its standard error: $se(r) = \sqrt{(1-r^2)/(n-2)}$
– $t^* = r / se(r)$
– Compare t* to the critical t-value for (n-2) degrees of freedom and the chosen significance level
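To see how sample size separates strength from significance, here is a minimal sketch that applies the t-statistic above to two made-up cases. The function name `t_stat_for_r` and both (r, n) pairs are hypothetical; the critical values are hard-coded from a standard two-tailed 5% t-table (what Excel's T.INV.2T(0.05, df) returns) to avoid extra libraries.

```python
import math

def t_stat_for_r(r, n):
    """t-statistic for H0: no correlation, with n - 2 degrees of freedom."""
    se = math.sqrt((1 - r ** 2) / (n - 2))
    return r / se

# Two-tailed 5% critical values from a standard t-table (hard-coded assumption)
CRIT_DF_4 = 2.776    # df = 4   (n = 6)
CRIT_DF_998 = 1.962  # df = 998 (n = 1000)

strong_small = t_stat_for_r(0.60, 6)   # "strong" r, tiny sample
weak_large = t_stat_for_r(0.10, 1000)  # "weak" r, large sample

print("strong r, n=6:   significant?", abs(strong_small) > CRIT_DF_4)
print("weak r, n=1000:  significant?", abs(weak_large) > CRIT_DF_998)
```

In this illustration the strong correlation from six cases fails the test, while the weak correlation from a thousand cases passes it.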
10-case Study
– Correlation coefficient (r) for our study = 0.79
– $se(r) = \sqrt{(1-0.63)/(10-2)} = \sqrt{0.046} = 0.214$
– t-stat = r/se(r) = 0.79/0.214 = 3.69
– For 8 df, two-tailed, 95% confidence, critical t-value = T.INV.2T(.05,8) = 2.306
– 3.69 > 2.306: if the true correlation were zero, a sample correlation this large would arise by chance less than 5% of the time; therefore reject the null hypothesis and conclude that hours studied is (positively) correlated with CGPA
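The arithmetic for the 10-case study can be re-run as a quick check. This sketch uses only the rounded figures reported above (r = 0.79, R-squared = 0.63, n = 10) and the tabulated two-tailed 5% critical value for 8 df; because it does not round intermediate steps, its t-statistic can differ from the reported 3.69 in the second decimal.

```python
import math

r = 0.79          # correlation coefficient as reported (rounded)
r_squared = 0.63  # R-squared as reported (rounded)
n = 10            # number of cases

se_r = math.sqrt((1 - r_squared) / (n - 2))  # standard error of r
t_stat = r / se_r
critical_t = 2.306  # two-tailed 5% critical value for 8 df, i.e. T.INV.2T(0.05, 8)

reject_null = abs(t_stat) > critical_t
print(f"se(r) = {se_r:.3f}, t = {t_stat:.2f}, reject H0: {reject_null}")
```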
Representing Linear Relationships: CGPA and HOURS appear to be strongly positively correlated (though the strength may be an artifact of the small sample size), and the correlation is statistically significant (despite the small sample), so we examine the relationship more closely. General linear relationship: Y = mX + b, where Y is the dependent variable, X the independent or explanatory variable, m the slope, and b a constant (the intercept)
Graphically
– Locate coordinates (2, 4), that is, X = 2, Y = 4
– Locate coordinates (3, 5)
– When X increases by +1 (from 2 to 3), by how much does Y increase? (= m)
– When X = 0, what does Y equal? (= b)
– Therefore the model is Y = 1*X + 2
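The same two questions (rise per unit of X, and the value of Y at X = 0) can be answered in a few lines. This is an illustrative sketch; the function name `line_through` is hypothetical.

```python
def line_through(p1, p2):
    """Return slope m and intercept b of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    m = (y2 - y1) / (x2 - x1)  # change in Y per +1 change in X
    b = y1 - m * x1            # value of Y when X = 0
    return m, b

m, b = line_through((2, 4), (3, 5))
print(f"Y = {m}*X + {b}")
```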
CGPA and HOURS: For the linear trend line, CGPA = intercept (b) + coefficient (m) * HOURS. For every +1 hour studied per month, by how much does CGPA increase? How did we obtain the linear trend line?
Regression Analysis - Intuition
– The estimated linear trend line specifies the linear relationship that “best fits” the data
– A “best fit” model is one that minimizes the amount by which the observations deviate from the hypothesized model
– “Best fit” here means minimizing the sum of the squared deviations between the data points and the linear trend line (model)
– Hence the name “Linear Least Squares Regression Model”
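A minimal sketch of the least-squares idea, using the standard closed-form solution for one explanatory variable (slope = sum of cross-deviations / sum of squared X deviations). The function names and the five data pairs are hypothetical; they are not the study's data.

```python
def least_squares_fit(xs, ys):
    """Slope m and intercept b that minimize the sum of squared deviations."""
    n = len(xs)
    avg_x, avg_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - avg_x) ** 2 for x in xs)
    sxy = sum((x - avg_x) * (y - avg_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = avg_y - m * avg_x
    return m, b

def sum_squared_deviations(xs, ys, m, b):
    """Sum of squared vertical distances between the points and Y = mX + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# Hypothetical illustration data (NOT the study's raw data)
hours = [10, 20, 30, 40, 50]
cgpa = [5.0, 6.0, 6.5, 7.5, 8.0]

m, b = least_squares_fit(hours, cgpa)
best = sum_squared_deviations(hours, cgpa, m, b)
# Any other line, e.g. with a nudged slope or intercept, fits worse
nudged_slope = sum_squared_deviations(hours, cgpa, m + 0.01, b)
nudged_intercept = sum_squared_deviations(hours, cgpa, m, b + 0.1)
print(m, b, best, nudged_slope, nudged_intercept)
```

Nudging either the slope or the intercept away from the fitted values increases the sum of squared deviations, which is what “best fit” means here.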
Regression Analysis - Mechanics
– In Excel: “Data Analysis”, then “Regression”
– Coefficients: values of “b” (intercept) and “m” (coefficient on the explanatory variable)
– Standard Error, t-stat, P-value and CI (95%) for each estimate
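For readers who want to see where the coefficient, standard error and t-stat columns come from, here is a hedged sketch using the textbook formulas for a regression with one explanatory variable. It is not a description of Excel's internals; the function name, the dictionary keys and the data are hypothetical. P-values and confidence intervals are left out because they need the t-distribution.

```python
import math

def simple_regression(xs, ys):
    """Intercept, slope, their standard errors and t-stats (one regressor)."""
    n = len(xs)
    avg_x, avg_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - avg_x) ** 2 for x in xs)
    sxy = sum((x - avg_x) * (y - avg_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = avg_y - m * avg_x
    # Residual variance uses n - 2 degrees of freedom (two estimated parameters)
    sse = sum((y - (b + m * x)) ** 2 for x, y in zip(xs, ys))
    s2 = sse / (n - 2)
    se_m = math.sqrt(s2 / sxx)
    se_b = math.sqrt(s2 * (1 / n + avg_x ** 2 / sxx))
    return {
        "intercept": b, "slope": m,
        "se_intercept": se_b, "se_slope": se_m,
        "t_intercept": b / se_b, "t_slope": m / se_m,
        "df": n - 2,
    }

# Hypothetical illustration data (NOT the study's raw data)
hours = [10, 20, 30, 40, 50]
cgpa = [5.0, 6.0, 6.5, 7.5, 8.0]

out = simple_regression(hours, cgpa)
for name, value in out.items():
    print(f"{name}: {value}")
```

With one explanatory variable, the t-stat on the slope coincides with the t-stat for the correlation coefficient, so the two tests lead to the same decision.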
Data Interpretation (again): From the Regression Output we know: CGPA = b + 0.105 * HOURS, where b is the estimated intercept. For every +1 hour studied, CGPA on graduation increases by 0.105. Graduating students with a CGPA one grade point higher than other graduating students studied, on average, 9.52 more hours per month (9.52 = 1 / 0.105)
P-Value Approach to Statistical Significance of Total Hours Studied: H0: coefficient on HOURS = 0; HA: coefficient on HOURS ≠ 0. P-value approach: the reported P-value < .05, i.e., if the true coefficient were zero, the probability of obtaining an estimate at least this far from zero purely by chance is less than 5%; reject H0, the data support HA. Note: for a 1-sided test (e.g., coefficient > 0), divide the reported P-value by 2
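Critical Value Approach to Statistical Significance of Total Hours Studied: H0: coefficient on HOURS = 0; HA: coefficient on HOURS ≠ 0. Critical value approach: critical value = T.INV.2T(0.05, 8) = 2.306 (n - 2 = 8 degrees of freedom for 10 cases and one explanatory variable); the t-stat on HOURS exceeds the critical value, so reject H0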
A Quantitative Research Project: Recapitulation
– Research Topic: Academic Performance
– Research Questions: How well do graduating students perform academically? What explains that performance?
– Measure “academic performance” by graduating CGPA
– Research Design: Cross-sectional analysis of graduating students in a given year
– Data Collection: Survey a random sample of 10 students graduating in 2014
– Data Description: Describe the data with basic statistics
– Data Analysis: Reasons for attending university and performance; Total hours studied and CGPA
Research in Public Policy (excerpted from Morçöl and Ivanova (2010))
Categories of Methods | Quantitative Orientation | Qualitative Orientation
Empirical Inquiry - Design Methods | Experimental, Cross-sectional, Longitudinal | Case study
Empirical Inquiry - Data Collection Methods | Surveys, Secondary Data | Qualitative (long, in-depth, or semi-structured) Interviews
Empirical Inquiry - Data Analysis Methods | Statistical, Regression, or Time-series Analyses | (Computer-assisted) Qualitative Data Analyses
Empirical Inquiry - Combined Methods | Game Theory, Simulations, Systems Analysis, Meta-Analyses, Network Analyses | Case study, Legal Analyses, Archival, Ethnography, Grounded Theory, Textual Analyses
Methods of Decision Making and Planning | Cost-benefit, Decision Analyses, Linear Programming | Brainstorming, Delphi
Quality of Quantitative (Qualitative) Research: Reliability, Relevance, Validity
– Reliability: can we replicate the research results? (are the results dependable?)
– Relevance: are results of practical significance? (are results trustworthy or authentic?)
– Construct Validity: do quantities observed reflect research variables of interest? (is there objectivity?)
– Internal Validity: is there a causal relationship between the independent and dependent variables? (is there credibility?)
– External Validity: can we generalize beyond the one study? (are results transferable?)
Achieving Learning Outcomes
Basic user familiarity requires familiarity with
– research ethics
– existing data sets
– the collection of qualitative and quantitative data
– data measurement
– sampling
– advantages and disadvantages of different research methods
– descriptive and inferential statistics
Learning Outcomes?
– Understand key concepts in research
– Apply critical analytical skills to published research
– Understand the application, value and limits of quantitative and qualitative research methodologies and techniques / tools
– Develop skills in devising and designing research methods suitable for different policy contexts and for rigorous analysis
– Provide a grounding in ethical issues related to:
  – academic research
  – the role of the public servant as a custodian of data and information, balancing the public’s right to know against the personal data and information that an individual citizen has a right to have kept confidential
Good Luck! And THANK YOU… …for the journey, …for your patience, …your curiosity, …your humour!