Overview of Major Statistical Tools
UAPP 702 Research Methods for Urban & Public Policy
Based on notes by Steven W. Peuquet, Ph.D.



Topics to be covered
- The Normal Distribution
- Parametric vs. nonparametric statistics
- Correlation
- Correlational vs. experimental research
- Analysis of variance
- Regression analysis
- Factor analysis

Check this out! Electronic Statistics Textbook
Much of the content of this lecture is drawn from this source: StatSoft, Inc. (2008). Electronic Statistics Textbook. Tulsa, OK: StatSoft.

The Normal Distribution

Parametric Versus Nonparametric Statistics
Parametric statistical tests require a basic knowledge of the underlying distribution of a variable. With that knowledge, we can make predictions about how, in repeated samples of equal size, a particular statistic will "behave," that is, how it is distributed.

Parametric Versus Nonparametric Statistics
For example, if we draw 100 random samples of 100 adults each from the general population and compute the mean height in each sample, then the distribution of the standardized means across samples will likely approximate the normal distribution (to be precise, Student's t distribution with 99 degrees of freedom). Now imagine that we take an additional sample in a particular city where we suspect that people are taller than the average population. If the standardized mean height in that sample falls in the upper 5% tail of the t distribution (beyond the 95th percentile), then we conclude that, indeed, the people of this city are taller than the average population.
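The height example can be sketched as a small simulation. This is only an illustration: the population mean (170 cm), the standard deviation (10 cm), and the city mean (173 cm) are assumed values, not figures from the lecture. The cutoff 1.660 is the one-sided 5% critical value of Student's t with 99 degrees of freedom.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Assumed population: adult heights ~ Normal(170 cm, 10 cm). Illustrative values only.
POP_MEAN, POP_SD, N = 170.0, 10.0, 100

def standardized_mean(sample, mu):
    """t statistic: (sample mean - mu) / (sample sd / sqrt(n))."""
    n = len(sample)
    return (statistics.mean(sample) - mu) / (statistics.stdev(sample) / n ** 0.5)

# 100 random samples of 100 adults each, as in the slide
t_values = [
    standardized_mean([random.gauss(POP_MEAN, POP_SD) for _ in range(N)], POP_MEAN)
    for _ in range(100)
]

# One-sided 5% critical value of Student's t with 99 degrees of freedom
T_CRIT = 1.660
share_beyond = sum(t > T_CRIT for t in t_values) / len(t_values)
print(f"share of ordinary samples beyond the cutoff: {share_beyond:.2f}")

# A hypothetical city whose residents average 173 cm
city = [random.gauss(173.0, POP_SD) for _ in range(N)]
t_city = standardized_mean(city, POP_MEAN)
print(f"city t = {t_city:.2f}, beyond the cutoff: {t_city > T_CRIT}")
```

Roughly 5% of the ordinary samples should land beyond the cutoff by chance alone, which is what makes a city sample beyond it noteworthy.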

Parametric Versus Nonparametric Statistics
Are most variables normally distributed?
In the example just given we relied on our knowledge that, in repeated samples of equal size, the standardized means (for height) will be distributed following the t distribution (with a particular mean and variance). However, this will only be true if in the population the variable of interest (height in our example) is normally distributed, that is, if the distribution of people of particular heights follows the normal distribution (the bell-shaped distribution).

Parametric Versus Nonparametric Statistics
For many variables of interest, we simply do not know for sure that this is the case. For example, is income distributed normally in the population? Probably not. The incidence rates of rare diseases are not normally distributed in the population, the number of car accidents is not normally distributed, and neither are many other variables in which a researcher might be interested.

Parametric Versus Nonparametric Statistics
The Issue of Sample Size
Another factor that often limits the applicability of tests based on the assumption that the sampling distribution is normal is the size of the sample of data available for the analysis (sample size; n). We can assume that the sampling distribution is normal even if we are not sure that the distribution of the variable in the population is normal, as long as our sample is large enough (e.g., 100 or more observations).

Parametric Versus Nonparametric Statistics
The Issue of Sample Size (continued)
However, if the sample is very small, then those tests can be used only if we are sure that the variable is normally distributed, and there is no reliable way to test this assumption when the sample is small.
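The sample-size point can be checked with a rough simulation. The sketch below assumes a right-skewed exponential population (a stand-in for something like income, not data from the lecture) and compares how skewed the distribution of sample means is for small and large samples. A skewness near 0 indicates a symmetric, normal-looking sampling distribution.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

def skewness(values):
    """Population skewness: 0 for a symmetric (e.g. normal) distribution."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def sample_means(n, reps=2000):
    """Means of `reps` samples of size n from a right-skewed exponential population."""
    return [
        statistics.mean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

skew_small = skewness(sample_means(5))
skew_large = skewness(sample_means(100))
print(f"skewness of sample means, n = 5:   {skew_small:.2f}")
print(f"skewness of sample means, n = 100: {skew_large:.2f}")
```

With n = 100 the sample means are much closer to symmetric than with n = 5, even though the population itself is far from normal.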

Parametric Versus Nonparametric Statistics
Problems in Measurement
Applications of tests that are based on the normality assumptions are further limited by a lack of precise measurement. For example, let us consider a study where grade point average (GPA) is measured as the major variable of interest. Is an A average twice as good as a C average? Is the difference between a B and an A average comparable to the difference between a D and a C average? In many respects, the GPA is a crude measure of scholastic accomplishment that only allows us to establish a rank ordering of students from "good" students to "poor" students.

Parametric Versus Nonparametric Statistics
Problems in Measurement (continued)
Most common statistical techniques, such as analysis of variance (and t-tests), regression, etc., assume that the underlying measurements are at least at the interval level, meaning that equally spaced intervals on the scale can be compared in a meaningful manner (e.g., B minus A is equal to D minus C). However, as in our example, this assumption is very often not tenable, and the data represent a rank ordering of observations (ordinal) rather than precise measurements.

Parametric Versus Nonparametric Statistics
The need is evident for statistical procedures that allow us to process data of "low quality," from small samples, on variables about which nothing is known (concerning their distribution).

Parametric Versus Nonparametric Statistics
Specifically, nonparametric methods were developed to be used in cases when the researcher knows nothing about the parameters of the variable of interest in the population (hence the name nonparametric). In more technical terms, nonparametric methods do not rely on the estimation of parameters (such as the mean or the standard deviation) describing the distribution of the variable of interest in the population.

Parametric and nonparametric tests by type of question

Type of question: Differences between independent groups
  Parametric tests: t-test for independent samples; analysis of variance
  Nonparametric tests: Wald-Wolfowitz runs test; Mann-Whitney U test; Kolmogorov-Smirnov two-sample test; Kruskal-Wallis analysis of ranks; Median test

Type of question: Differences between dependent groups
  Parametric tests: t-test for dependent samples
  Nonparametric tests: Sign test; Wilcoxon's matched pairs test

Type of question: Relationships between variables
  Parametric tests: correlation coefficient
  Nonparametric tests: Spearman R; Kendall Tau; Coefficient Gamma; Chi-square
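To show what "working with ranks instead of parameters" looks like, here is a minimal sketch of the Mann-Whitney U statistic computed by hand. The two groups are made-up numbers for illustration, and the helper names (average_ranks, mann_whitney_u) are hypothetical. The sketch computes only the statistic, not its p-value.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Return (U_a, U_b); the test statistic is the smaller of the two."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[: len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return u_a, u_b

# Made-up scores for two independent groups
group_a = [12, 15, 11, 30]
group_b = [14, 18, 21, 25, 27]

u_a, u_b = mann_whitney_u(group_a, group_b)
print(f"U_a = {u_a}, U_b = {u_b}, U = {min(u_a, u_b)}")
```

Here U_a counts the pairs in which a group_a value exceeds a group_b value (6 of the 20 possible pairs). Only the ordering of the observations matters, so the extreme value 30 has no more influence than any other top-ranked value would.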

Correlation
Correlation is a measure of the relation between two or more variables. The measurement scales used should be at least interval scales, but other correlation coefficients are available to handle other types of data.
Correlation coefficients can range from -1.00 to +1.00:
- A value of -1.00 represents a perfect negative correlation.
- A value of +1.00 represents a perfect positive correlation.
- A value of 0.00 represents a lack of correlation.
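A short sketch of two correlation coefficients, using made-up data: Pearson's r for interval-scale data, and Spearman's R, which is simply Pearson's r applied to ranks and so suits ordinal data. The function names are illustrative, not from the lecture.

```python
def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman R: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up data: y rises with x, but not in a straight line
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

print(f"Pearson r  = {pearson(x, y):.3f}")   # below 1: relation is not linear
print(f"Spearman R = {spearman(x, y):.3f}")  # 1: the rank order agrees perfectly
print(f"Perfect negative: {pearson(x, [-v for v in x]):.2f}")
```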


Correlational Versus Experimental Research
In correlational research we do not (or at least try not to) influence any variables but only measure them and look for relationships (correlations) between some set of variables, such as blood pressure and cholesterol level.
In experimental research, we manipulate some variables and then measure the effects of this manipulation on other variables; for example, a researcher might artificially increase blood pressure and then record cholesterol level.

Correlational Versus Experimental Research
Data analysis in experimental research also comes down to calculating "correlations" between variables, specifically, between those manipulated and those affected by the manipulation. However, experimental data may potentially provide qualitatively better information: only experimental data can demonstrate causal relations between variables.

Correlational Versus Experimental Research
Example: If we find that whenever we change variable A, variable B changes, then we can conclude that "A influences B."
Data from correlational research can only be "interpreted" in causal terms based on some theories that we have; correlational data cannot conclusively prove causality.
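One way to see why correlational data cannot prove causality is a simulation with a hidden third variable. The setup below is an assumption made for illustration, not part of the lecture: a hypothetical variable Z drives both A and B, while A has no effect on B at all.

```python
import random

random.seed(2)  # fixed seed so the sketch is repeatable

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

N = 5000

# Correlational study: we only measure A and B. A hidden variable Z
# (hypothetical) influences both, so they move together.
z = [random.gauss(0, 1) for _ in range(N)]
a_measured = [v + random.gauss(0, 1) for v in z]
b_measured = [v + random.gauss(0, 1) for v in z]
r_observed = pearson(a_measured, b_measured)

# Experimental study: the researcher sets A by random assignment, which
# cuts its link to Z. B still depends only on Z, never on A.
a_assigned = [random.gauss(0, 1) for _ in range(N)]
b_after = [v + random.gauss(0, 1) for v in z]
r_experiment = pearson(a_assigned, b_after)

print(f"correlational data: r = {r_observed:.2f}")
print(f"experimental data:  r = {r_experiment:.2f}")
```

The measured data show a clear correlation even though A does not influence B. Once A is manipulated directly, the correlation disappears, which is the information only an experiment can supply.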

Analysis of Variance (ANOVA)
The purpose of ANOVA is to test for significant differences between means. If we are comparing only two means, ANOVA will give the same results as:
- the t test for independent samples (if we are comparing two different groups of cases or observations), or
- the t test for dependent samples (if we are comparing two variables in one set of cases or observations).

Why Analysis of Variance?
It may seem odd that a procedure that compares means is called analysis of variance. The name derives from the fact that, in order to test for statistical significance between means, we are actually comparing variances: the variance between the group means against the variance within the groups.
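A minimal sketch of the idea, using made-up scores: the F statistic is the ratio of between-group variance to within-group variance, and with exactly two independent groups it equals the square of the pooled-variance t statistic. Only the independent-samples case is shown; the function names are illustrative.

```python
import statistics

def one_way_f(groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def pooled_t(g1, g2):
    """t statistic for two independent samples, assuming equal variances."""
    n1, n2 = len(g1), len(g2)
    pooled_var = (
        (n1 - 1) * statistics.variance(g1) + (n2 - 1) * statistics.variance(g2)
    ) / (n1 + n2 - 2)
    return (statistics.mean(g1) - statistics.mean(g2)) / (
        pooled_var * (1 / n1 + 1 / n2)
    ) ** 0.5

# Made-up scores for two groups
group_1 = [4.0, 5.0, 6.0, 5.0]
group_2 = [7.0, 8.0, 9.0, 8.0]

f_stat = one_way_f([group_1, group_2])
t_stat = pooled_t(group_1, group_2)
print(f"F = {f_stat:.3f}, t = {t_stat:.3f}, t squared = {t_stat ** 2:.3f}")
```

For these numbers the between-group mean square is 18 and the within-group mean square is 4/6, so F is 27, matching t squared.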

Regression Analysis
- The purpose is to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable.
- Widely used in the social sciences.
- Allows the researcher to ask the general question "what is the best predictor of...?"

Regression Analysis
Examples:
- Educational researchers might want to learn what the best predictors of success in high school are.
- Psychologists may want to determine which personality variable best predicts social adjustment.
- Sociologists may want to find out which of the multiple social indicators best predict whether or not a new immigrant group will adapt and be absorbed into society.
- Bail "failure to appear" example.

Regression Analysis Example
A real estate agent might record for each listing (1) the size of the house (in square feet), (2) the number of bedrooms, (3) the average income in the neighborhood, and (4) a subjective rating of the appeal of the house.
Use regression to see whether and how these measures relate to the price for which a house is sold. Maybe the number of bedrooms is a better predictor of the price for which a house sells in a particular neighborhood than how "pretty" the house is (subjective rating).
One may also detect "outliers," that is, houses that should really sell for more, given their location and characteristics.
This is an example of a "hedonic price index."
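The real estate example can be sketched with ordinary least squares. Everything below is an assumption for illustration: the eight listings and prices are made up, only two of the four predictors are used, and the helper names (solve, fit_ols, predict) are hypothetical. The fit solves the normal equations directly, which is fine for a toy problem of this size.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(coefs, row):
    """Predicted value for one listing."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

# Made-up listings: (size in 1000 sq ft, bedrooms) and sale price in $1000s
listings = [(1.40, 3), (1.60, 3), (1.70, 3), (1.875, 4),
            (1.10, 2), (1.55, 3), (2.35, 4), (2.45, 4)]
prices = [245, 312, 279, 308, 199, 219, 405, 324]

coefs = fit_ols(listings, prices)
residuals = [p - predict(coefs, row) for row, p in zip(listings, prices)]

print("intercept, size, bedrooms:", [round(c, 1) for c in coefs])
# The most negative residual marks the house that sold furthest below
# what its size and bedroom count would predict.
lowest = min(range(len(prices)), key=lambda i: residuals[i])
print(f"listing {lowest} sold {-residuals[lowest]:.0f} (in $1000s) below its predicted price")
```

The coefficients describe how price changes with each measure while holding the other fixed, and the residuals are one simple way to flag the "outliers" mentioned above.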

Factor Analysis
The main applications of factor analytic techniques are:
- to reduce the number of variables, and
- to detect structure in the relationships between variables, that is, to classify variables.
Factor analysis is a data reduction or structure detection method.
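A rough sketch of the data reduction idea follows. It does not implement full factor analysis. Instead it extracts the first principal component of a correlation matrix by power iteration, a simpler related technique swapped in for brevity. The three survey "items" are simulated under an assumed structure: items 1 and 2 share one underlying factor, and item 3 is unrelated.

```python
import random

random.seed(3)  # fixed seed so the sketch is repeatable

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def dominant_component(matrix, iterations=200):
    """Largest eigenvalue and its unit eigenvector, found by power iteration."""
    n = len(matrix)
    v = [1.0 / n ** 0.5] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)
    )
    return eigenvalue, v

# Simulated data: items 1 and 2 reflect one hidden factor, item 3 does not
N = 2000
factor = [random.gauss(0, 1) for _ in range(N)]
item1 = [f + random.gauss(0, 0.5) for f in factor]
item2 = [f + random.gauss(0, 0.5) for f in factor]
item3 = [random.gauss(0, 1) for _ in range(N)]

columns = [item1, item2, item3]
corr = [[pearson(ci, cj) for cj in columns] for ci in columns]
eigenvalue, weights = dominant_component(corr)

print(f"variance captured by first component: {eigenvalue:.2f} of {len(columns)}")
print("weights:", [round(w, 2) for w in weights])
```

Items 1 and 2 receive large, similar weights while item 3 receives a weight near zero, so the two related items could be summarized by a single combined variable. That is the sense in which the method both reduces the number of variables and detects structure among them.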