1 Data Analysis I Statistics for Description Gail Johnson January 2008.

2 Statistics Over-Rated? While statistics are often used in public administration research, research on even the most complex policy areas might use very few numbers. GAO Testimony: Securing, Stabilizing and Rebuilding Iraq –What type of question? –What statistics? –What were the likely arguments about this report?

3 Statistics "Many a statistic is false on its face. It gets by only because the magic of numbers brings out a suspension of common sense." (Huff, p. 138). State of the state Descriptive vs Inferential

4 Descriptive Statistics Frequency Distributions –Numbers and percentages for a single variable Parts of a Whole –Percentages (75%) and proportions (.75) Ratio: numbers presented in relation to each other –Student-to-teacher ratio: 15:1
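A minimal Python sketch of the three "parts of a whole" calculations. The function names and all counts are hypothetical, chosen only to reproduce the 75% / .75 / 15:1 examples on the slide.

```python
def proportion(part, whole):
    """Share of the whole as a decimal (e.g., 0.75)."""
    return part / whole

def percent(part, whole):
    """Share of the whole as a percent (e.g., 75.0)."""
    return 100 * part / whole

def ratio(a, b):
    """How many units of a there are for each unit of b (e.g., 15 for 15:1)."""
    return a / b

# Hypothetical: 75 of 100 respondents agree; 450 students and 30 teachers.
print(proportion(75, 100))         # 0.75
print(percent(75, 100))            # 75.0
print(f"{ratio(450, 30):.0f}:1")   # 15:1
```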

5 Simple Data Handout: Budget Surplus and Deficit –What does this analysis tell you? –What questions come to mind?

6 Analysis Rates: numbers of occurrences standardized to a common base –Infant deaths per 100,000 births –Crime rates: crimes per 100,000 population Teen birth rates –2006: Handout –Controversy: Public policy choices
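Standardizing is what makes places of different sizes comparable. A minimal sketch, assuming hypothetical city counts (not data from the handout); `rate_per` is an illustrative helper name.

```python
def rate_per(events, population, base=100_000):
    """Standardize a count of events to a common population base."""
    # Multiply before dividing to keep the arithmetic exact for whole numbers.
    return events * base / population

# Hypothetical cities: City B has twice the crimes but a lower crime rate.
city_a = rate_per(events=450, population=300_000)    # 150.0 per 100,000
city_b = rate_per(events=900, population=1_200_000)  # 75.0 per 100,000
print(city_a, city_b)
```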

7 Analysis Rates of change –Percentage change from one time period to another The budget increased 23% from FY 2002 to FY Workbook: Table 9.6
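Percentage change is (new - old) / old x 100. A minimal sketch with hypothetical budget figures (the workbook's actual amounts are not reproduced here). A negative result indicates a decrease.

```python
def percent_change(old, new):
    """Percentage change from an earlier period (old) to a later one (new)."""
    return (new - old) * 100 / old

# Hypothetical budgets (in millions): 100 -> 123 is a 23% increase.
print(percent_change(100, 123))  # 23.0
print(percent_change(200, 150))  # -25.0 (a decrease)
```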

8 Frequency Distributions How many men and women are in the program? Table: Distribution of Respondents by Gender, with Number and Percent columns for Male, Female, and Total (Total: 300 respondents, 100%)

9 Frequency Distributions How many men and women are in the program? Write-up: Of the 300 people in this program, 67% are women and 33% are men.
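A frequency distribution is just counts plus percents for one variable. A minimal sketch using Python's `collections.Counter`; the roster of 100 men and 200 women is hypothetical, picked so the rounded percents match the 33% / 67% write-up.

```python
from collections import Counter

def frequency_distribution(values):
    """Return {category: (count, rounded percent)} for a single variable."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, round(100 * n / total)) for cat, n in counts.items()}

# Hypothetical program roster: 100 men and 200 women (300 people).
roster = ["male"] * 100 + ["female"] * 200
for category, (n, pct) in frequency_distribution(roster).items():
    print(f"{category}: {n} ({pct}%)")
```

Because each percent is rounded separately, the percents in a real table will not always sum to exactly 100.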

10 Percent Distributions Survey Data Workbook: Table 9.2 What is the story? What would you conclude from this data? Why is it important to use a 5-point scale rather than just ask Satisfied or Dissatisfied?

11 Telling the Story Exercise: Workbook: Tables 9.3 and 9.4

12 Measures of Central Tendency Central tendency: How similar are the characteristics? –Example: How similar are the ages of this group of people? The 3 Ms: Mode, Median, Mean. Mode: the most frequent response. Median: the midpoint of the distribution. Mean: the arithmetic average.
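The 3 Ms can be computed with Python's standard `statistics` module. A minimal sketch with a hypothetical set of ages:

```python
import statistics

# Hypothetical ages for a small group of people.
ages = [22, 25, 25, 31, 34, 40, 47]

print("Mode:  ", statistics.mode(ages))    # most frequent value
print("Median:", statistics.median(ages))  # middle value once sorted
print("Mean:  ", statistics.mean(ages))    # arithmetic average
```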

13 Levels of Data Which measures you can use depends on the type of data you have: Nominal data: mode Ordinal data: mode and median Interval/ratio data: mode, median, and mean

14 Central Tendency: Interval/Ratio Data Normal Curve –Bell-shaped –Mean, median, and mode should align

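15 Skewed Data Skewed data: –Positive –Negative –Tests: a skewness statistic of 0 indicates a normal (symmetric) distribution »values beyond +1 or -1 indicate skewed data Use the mean if the distribution is normal; use the median if the distribution is not normal.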
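The +1/-1 rule of thumb can be sketched in code. This assumes one common sample-skewness formula (the adjusted Fisher-Pearson coefficient); statistical packages may report slightly different versions. `skewness` and `best_center` are hypothetical helper names, and the data are invented.

```python
import statistics

def skewness(values):
    """Adjusted Fisher-Pearson sample skewness (needs at least 3 values)."""
    n = len(values)
    m = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

def best_center(values, cutoff=1.0):
    """Report the mean if skewness is within +/-cutoff, otherwise the median."""
    if abs(skewness(values)) <= cutoff:
        return "mean", statistics.mean(values)
    return "median", statistics.median(values)

symmetric = [10, 20, 30, 40, 50]
right_skewed = [10, 11, 12, 12, 13, 14, 15, 90]  # one very high value
print(best_center(symmetric))
print(best_center(right_skewed))
```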

16 What to Use? Workbook: Table 9.8 Which would be the best single description of the central tendencies of this distribution? Why?

17 Trade-offs You have more options when you collect interval/ratio level data –But sometimes it is better, from a data collection perspective, to collect ordinal data: income categories, age categories Open for debate: Can you use “means” with ordinal data?

18 No Debate with Nominal Data Means should never be used for nominal level variables. Gender: coded 1 = men, 2 = women. The computer will calculate a mean if you ask it to. It gives you 1.6. What does that mean? Nothing.
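A minimal sketch of the point above, assuming a hypothetical group of 40 men and 60 women coded 1 and 2: the software returns a mean without complaint, but only the mode (or the percents) says anything about the group.

```python
import statistics

# Gender coded 1 = men, 2 = women (hypothetical group: 40 men, 60 women).
gender_codes = [1] * 40 + [2] * 60

# The computer will happily compute a mean, but 1.6 describes no one.
print(statistics.mean(gender_codes))  # 1.6

# For nominal data, the meaningful summary is the mode.
labels = {1: "men", 2: "women"}
print(labels[statistics.mode(gender_codes)])  # women
```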

19 Measure of Dispersion: Standard Deviation Dispersion: How dissimilar are the characteristics? –Example: How much variation is there in the ages of this group of people? The standard deviation measures how far values typically fall from the mean –Small standard deviation: not much dispersion –Large standard deviation: lots of dispersion
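Two groups can share a mean yet differ sharply in dispersion. A minimal sketch with hypothetical ages; `statistics.stdev` gives the sample standard deviation (divide by n - 1), while `statistics.pstdev` would give the population version.

```python
import statistics

# Two hypothetical groups with the same mean age (40) but different spread.
similar_ages = [38, 39, 40, 41, 42]
varied_ages = [20, 30, 40, 50, 60]

for name, ages in [("similar", similar_ages), ("varied", varied_ages)]:
    print(name, statistics.mean(ages), round(statistics.stdev(ages), 2))
```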

20 Standard Deviation Normal Distribution: Bell-shaped curve –About 68% of the values fall within 1 standard deviation of the mean –About 95% of the values fall within 2 standard deviations of the mean

21 Standard Deviation Distribution 1: Mean 74, Median 71, Mode 68; standard deviation: small Distribution 2: Mean 69, Median 71, Mode 68; standard deviation: large A few very low scores skew the data.
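A few extreme scores move the mean much more than the median. A minimal sketch using hypothetical test scores (not the distributions on the slide):

```python
import statistics

# Hypothetical test scores, then the same scores plus two very low ones.
scores = [66, 68, 68, 70, 71, 72, 74, 75, 78]
with_low_scores = scores + [20, 25]

for label, data in [("original", scores), ("with low scores", with_low_scores)]:
    print(label, round(statistics.mean(data), 1), statistics.median(data))
```

The mean drops by about 9 points while the median drops by only 1, which is why the median is the safer summary for skewed data.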

22 Applying the Standard Deviation Average test score = 60. The standard deviation is 10. Therefore, if the scores are normally distributed, about 95% of them fall between 40 and 80. Calculation: 2 standard deviations = 2 × 10 = 20; 60 + 20 = 80 and 60 - 20 = 40.
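The same calculation in Python, plus a check of the "about 95%" figure using the standard library's `statistics.NormalDist` (assuming the scores really are normally distributed):

```python
from statistics import NormalDist

mean, sd = 60, 10
low, high = mean - 2 * sd, mean + 2 * sd
print(low, high)  # 40 80

# Share of a normal distribution that lies between those two points.
dist = NormalDist(mu=mean, sigma=sd)
share = dist.cdf(high) - dist.cdf(low)
print(f"{share:.1%}")  # 95.4%
```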

23 Working With 2 Variables Descriptive Statistics Cross-tabs Comparison of Means

24 Crosstabs: Description Describe Respondents’ Race and Gender (percentages are of all 233 respondents)
White: Men 50 (21%), Women 72 (31%)
Black: Men 35 (15%), Women 25 (11%)
Hispanic: Men 15 (6%), Women 14 (6%)
Other: Men 18 (8%), Women 4 (2%)
Workbook 9.10

25 Description Analysis: 21% are white men and 31% are white women; 15% are black men and 11% are black women. White women are the largest group.
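In a descriptive crosstab like Workbook 9.10, each cell is a percent of all respondents. A minimal sketch using the counts from slide 24:

```python
# Counts from the race-by-gender table (Workbook 9.10).
counts = {
    ("White", "Men"): 50, ("White", "Women"): 72,
    ("Black", "Men"): 35, ("Black", "Women"): 25,
    ("Hispanic", "Men"): 15, ("Hispanic", "Women"): 14,
    ("Other", "Men"): 18, ("Other", "Women"): 4,
}

total = sum(counts.values())  # every respondent in the table
# Each cell as a rounded percent of all respondents.
cell_percent = {cell: round(100 * n / total) for cell, n in counts.items()}

print(total)
for (race, gender), pct in cell_percent.items():
    print(f"{race} {gender}: {pct}%")
```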

26 Describing Two Variables Workbook, Table 9.11 Participation in Classes by Gender In each class, what % are boys and what % are girls?

27 Description Polling Data Whom did men and women vote for in the New Hampshire Primary (exit polls)? Handout:

28 Relationships The most interesting analysis often involves looking at relationships. Associations, Explanations, Causal, Correlated

29 Relationships Cause-Effect Questions –Time order –Co-variation –Logical Theory –Elimination of Rival Explanations

30 Independent and Dependent Variables Independent: –The variable that occurs first and that you believe explains a change in the dependent variable –Program evaluation: the program Dependent: –The variable you want to explain –Program evaluation: the outcome measure

31 Identify IV and DV As age increases, sickness increases. There are more accident fatalities in areas of low population density than in areas of high population density. There are differences in salary based on gender. States with severe penalties for DUI had fewer accident fatalities than states with weak penalties. More guns reduce murder rates.

32 Crosstabs to Explore Relationships Used when working with nominal and ordinal data. Can also be used with interval/ratio data that have been grouped into ordinal categories.
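Grouping interval/ratio values into ordinal categories is a simple lookup against cut points. A minimal sketch using Python's `bisect` module; the age cut points and category labels are hypothetical.

```python
import bisect

def categorize(value, cutoffs, labels):
    """Place an interval/ratio value into an ordinal category.
    cutoffs holds the lower bound of every category after the first."""
    return labels[bisect.bisect_right(cutoffs, value)]

# Hypothetical age categories: under 30, 30-49, 50 and over.
cutoffs = [30, 50]
labels = ["under 30", "30-49", "50+"]
print([categorize(age, cutoffs, labels) for age in [22, 30, 49, 50, 67]])
```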

33 Setting up the Analysis Are boys more likely than girls to take hands-on classes? We are testing whether there is a difference based on gender. The independent variable is gender (boys and girls). The dependent variable is the type of class (hands-on or traditional).

34 Setting up the Analysis The analysis looks at each category of the independent variable (boys and girls) and compares the percentages in the two types of classes. For each category of the independent variable, the percent distribution across the dependent variable always totals 100%. –The percent of boys in each of the two classes. –The percent of girls in each of the two classes.

35 Crosstabs Type of class: Hands-on / Traditional / Total
Boys: 45% / 55% / 100%
Girls: 35% / 65% / 100%
Note: this is percentaged differently from Table 9.11

36 Crosstabs Interpretation: Boys are somewhat more likely (45%) than girls (35%) to take the hands-on classes.
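Percentaging across each row (each category of the independent variable) can be sketched as below. The raw counts are hypothetical, chosen only so the output reproduces the 45%/55% and 35%/65% splits on slide 35.

```python
def row_percentages(table):
    """Percentage each row (independent-variable category) across the
    columns (dependent-variable categories); each row totals 100%."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: round(100 * n / row_total) for col, n in cells.items()}
    return result

# Hypothetical counts: 200 boys and 200 girls.
classes = {
    "Boys": {"Hands-on": 90, "Traditional": 110},
    "Girls": {"Hands-on": 70, "Traditional": 130},
}
print(row_percentages(classes))
```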

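37 Is There a Relationship Between Income and Job Satisfaction? Job satisfaction (rows: Low, Medium, High) by income (columns: Low, Medium, High) Low satisfaction: 50% of low-income, 20% of medium-income, and 13% of high-income respondents Total: 100% (n=200), 100% (n=150), 100% (n=75)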

38 Format Independent variable: Income Dependent variable: Job satisfaction Are people who earn high salaries more satisfied than those who earn low salaries? For each category of the independent variable, what is the percent distribution across the dependent variable? Independent variable: Income –Percent distribution down the column

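39 Income (rows) by Job Satisfaction (columns: Low, Medium, High, Total)
Low income: 50% low satisfaction (n=200)
Medium income: 20% low satisfaction (n=150)
High income: 13% low satisfaction (n=75)
Same data, different format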

40 Format Independent variable: Income Dependent variable: Job satisfaction Are people who earn high salaries more satisfied than those who earn low salaries? For each category of the independent variable, what is the percent distribution across the dependent variable? Independent variable: Income –Percent distribution across the row

41 Education Level and Test Scores Performance test score by education (columns: High school or less; More than high school)
Low score: 40%, 20%
High score: 60%, 80%
Total: 100% (n=550), 100% (n=1,000)

42 Format Independent variable: Level of education Dependent variable: Test score If I know the level of education, can I predict the test score? For each category of the independent variable, what is the percent distribution across the dependent variable? Independent variable: Education –Percent distribution down the column
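When the independent variable runs across the top, each column is percentaged down to 100%. A minimal sketch; the counts are back-calculated from the slide's percentages and n's (for example, 40% of 550 = 220), assuming those percentages are exact.

```python
def column_percentages(table):
    """Percentage each column (independent-variable category) down the rows
    (dependent-variable categories); each column totals 100%."""
    columns = list(next(iter(table.values())).keys())
    col_totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        r: {c: round(100 * row[c] / col_totals[c]) for c in columns}
        for r, row in table.items()
    }

scores = {
    "Low score":  {"HS or less": 220, "More than HS": 200},
    "High score": {"HS or less": 330, "More than HS": 800},
}
print(column_percentages(scores))
```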

43 Percentage is Key Workbook: Table 10.2 and Table 10.3:

44 Relationships: Statistical Controls If the relationship remains unchanged when controls are introduced, this suggests that the independent variable is associated with the dependent variable. If the relationship disappears, this suggests that the initial relationship is spurious: the control variable is related to the dependent variable.

45 Relationships: Statistical Controls If the relationship changes depending on the control variable, there is interaction: the relationship among the three variables is interactive. If the relationship between the independent and dependent variables is attenuated but persists, both the IV and the CV are related to the DV.

46 Control Variables Statistical controls are used to eliminate possible rival explanations. Are there differences in satisfaction with city services based on race? Workbook: 104 –What would you conclude?

47 Control Variables Is there anything else that might explain those differences? Maybe it is not race; maybe it is really about whether you live in a poor or non-poor neighborhood. So we can run the same cross-tab between race and satisfaction, but this time controlling for neighborhood wealth (Poor and Not-Poor).

48 Control Variables This gives us two cross-tab tables (Workbook Table 10.5): –We see the relationship between race and satisfaction for those in poor neighborhoods –We see the relationship between race and satisfaction for those in non-poor neighborhoods –When the relationship disappears as dramatically as this, we conclude that race is not a factor –Technical: the initial relationship is spurious
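The logic of a control variable can be sketched by computing the same percentages with and without splitting on the control. The counts and the "Group A"/"Group B" labels below are entirely hypothetical (not the Workbook 10.5 data), built so that Group A lives mostly in not-poor neighborhoods and the pooled gap vanishes within each neighborhood type.

```python
# Hypothetical counts: (group, neighborhood) -> (number satisfied, number surveyed).
data = {
    ("Group A", "Poor"): (8, 20),
    ("Group B", "Poor"): (32, 80),
    ("Group A", "Not-poor"): (64, 80),
    ("Group B", "Not-poor"): (16, 20),
}

def percent_satisfied(group, neighborhood=None):
    """Percent satisfied for a group, optionally within one neighborhood type."""
    cells = [counts for (g, hood), counts in data.items()
             if g == group and (neighborhood is None or hood == neighborhood)]
    satisfied = sum(s for s, _ in cells)
    surveyed = sum(n for _, n in cells)
    return round(100 * satisfied / surveyed)

groups = ("Group A", "Group B")
# Original cross-tab (no control): a large gap between the groups.
print({g: percent_satisfied(g) for g in groups})
# Partial tables, controlling for neighborhood wealth: the gap disappears.
for hood in ("Poor", "Not-poor"):
    print(hood, {g: percent_satisfied(g, hood) for g in groups})
```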

49 One more Example Workbook Table 10.6 Views on Social Welfare Bills based on Location IV: Location (Urban/Rural) DV: Position (Support/Oppose) Of the people who live in Urban areas, what % support? Of the people who live in Rural areas, what % support?

50 One More Example Now, what happens when we control for political party? Look at Table 10.7. What do you conclude? Of course, I am using fake data; you are unlikely to find anything so extreme in real data. Look at Table 10.8. What do you conclude?

51 Comparison of Means When we are working with a nominal level variable (like gender) and a ratio level variable, we can compare the means. What are the average salaries of men and women? Men: $45,260 Women: $39,995 Is it gender that causes the difference?

52 Controlling for a 3rd Variable What if it is education that explains differences in salary? Workbook: Look at two possible scenarios: Table 10.9 and Table What does each table tell you?
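A comparison of means, first overall and then within each level of a control variable, can be sketched as follows. The salary records and the `mean_salary` helper are hypothetical; in this invented scenario the gender gap persists inside each education level.

```python
import statistics

# Hypothetical salary records: (gender, education, salary).
records = [
    ("Men", "High school", 38_000), ("Men", "High school", 40_000),
    ("Men", "College", 52_000), ("Men", "College", 54_000),
    ("Women", "High school", 36_000), ("Women", "High school", 38_000),
    ("Women", "College", 50_000), ("Women", "College", 52_000),
]

def mean_salary(gender, education=None):
    """Mean salary for one gender, optionally within one education level."""
    salaries = [s for g, e, s in records
                if g == gender and (education is None or e == education)]
    return statistics.mean(salaries)

# Comparison of means without a control...
print({g: mean_salary(g) for g in ("Men", "Women")})
# ...and within each category of the control variable (education).
for level in ("High school", "College"):
    print(level, {g: mean_salary(g, level) for g in ("Men", "Women")})
```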

53 How strong are these relationships? There is certainly a story that can be told just by looking at the data. We can say, for example, in Table 10.9, that women tend to earn less than men in each category of education. Gender would appear to be related to salary. Is there another way to measure this?

54 Measure of Association What exactly do Measures of Association really mean? Conceptual meaning: –Height and Weight Exercise –10 Volunteers

55 Direction of the Relationship Plus sign: Direct Relationship –both variables change in the same direction –example: As driving speed increases, death rate goes up

56 Direction of the Relationship Minus sign: Inverse Relationship –both variables change, but in opposite directions –Example: As age increases, health status decreases

57 Measures of Association How strong is the association? –Several different measures of association –Many (such as Pearson’s r) assume a linear relationship Some measures of association range from 0 to 1 Others range from -1 to +1

58 Measures of Association At the extremes, they are interpreted in the same way: Perfect relationship = 1 or -1 No relationship = 0 Handbook: Table 10.11: Patterns –What perfect direct and perfect inverse relationships look like –What no relationship looks like

59 Measures of Association General agreement: –Closer to 1 (or -1): the stronger the relationship –Closer to 0: the weaker the relationship However, there is no agreement on the verbal labels: .5 strong (maybe as good as it gets) .3-.4 moderate relationship .2 weak/slight relationship
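The rough verbal guide above can be written as a small lookup. This is a sketch only: the cut points between the slide's labels, and the "very weak" label below .2, are assumptions, and the direct/inverse wording only applies to measures that range from -1 to +1.

```python
def describe_association(coef):
    """Put a -1..+1 measure of association into words.
    Cut points follow the slide's rough guide; the boundaries between
    labels and the 'very weak' label are assumptions."""
    strength = abs(coef)
    if strength == 0:
        return "no relationship"
    if strength >= 0.5:
        label = "strong"
    elif strength >= 0.3:
        label = "moderate"
    elif strength >= 0.2:
        label = "weak/slight"
    else:
        label = "very weak"  # below .2: not covered by the slide (assumed label)
    direction = "direct" if coef > 0 else "inverse"
    return f"{label} {direction} relationship"

print(describe_association(0.55))
print(describe_association(-0.35))
```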

60 Measures of Association Nominal data –Lambda, Cramér’s V, and Phi Ordinal –Kendall’s Tau, Gamma, Spearman’s Rho Interval –Eta, Pearson’s r
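For nominal data, Cramér's V rescales the chi-square statistic to a 0-to-1 range: V = sqrt(chi-square / (n x (k - 1))), where k is the smaller of the number of rows and columns. For a 2x2 table it equals the absolute value of Phi. A minimal sketch; the counts are hypothetical, matching the 45%/55% vs. 35%/65% split on slide 35.

```python
from math import sqrt

def cramers_v(table):
    """Cramér's V for a table of counts given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return sqrt(chi2 / (n * k))

# Hypothetical counts: rows = boys, girls; columns = hands-on, traditional.
print(round(cramers_v([[90, 110], [70, 130]]), 3))
```

By the rough guide on slide 59, a value near .10 would count as a very weak association.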

61 Measures of Association Are there differences in views based on gender? Workbook: Table Look at the pattern: What do you see? Now look at the measures of association: –How would you interpret it?

62 Measures of Association Ordinal data: Does education make a difference in attitudes about spanking? Workbook: Table Tau-b = .109 Tau-c = .095 Gamma = .167
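Gamma compares concordant pairs (both variables move in the same direction) with discordant pairs (they move in opposite directions): Gamma = (C - D) / (C + D). Because gamma ignores tied pairs while tau-b does not, gamma is typically the larger of the two in absolute value, as in the figures above. A minimal sketch with hypothetical ordinal codes:

```python
from itertools import combinations

def goodman_kruskal_gamma(x, y):
    """Gamma = (C - D) / (C + D), where C and D count concordant and
    discordant pairs; pairs tied on either variable are ignored."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical codes: education (1 = low ... 3 = high) and
# agreement with a statement (1 = disagree ... 3 = agree).
education = [1, 1, 2, 2, 3, 3]
attitude = [1, 2, 2, 3, 2, 3]
print(goodman_kruskal_gamma(education, attitude))  # 0.75
```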

63 Measures of Association Is high school GPA associated with college GPA? Can use Spearman’s rho if the data are ranked. Can use Pearson’s r if the data are interval.
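A minimal sketch of both measures: Pearson's r on the raw values, and Spearman's rho computed as Pearson's r on the ranks (tied values get their average rank). The GPA pairs are hypothetical. On Python 3.10+, `statistics.correlation` also computes Pearson's r.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical high school and college GPAs for five students.
hs_gpa = [2.8, 3.0, 3.2, 3.5, 3.9]
college_gpa = [2.6, 3.1, 2.9, 3.4, 3.8]
print(round(pearson_r(hs_gpa, college_gpa), 2), round(spearman_rho(hs_gpa, college_gpa), 2))
```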

64 Working with Ranked Data Workbook: Table Does education make a difference in terms of salary? –If you have more education, will you earn more income? –What do you think from looking at the data? –What does the measure of association tell you?

65 More Options Let’s take the 1st break.