Quantitative Data Analysis
–Definitions
–Examples of a data set
–Creating a data set
–Displaying and presenting data: frequency distributions
–Grouping and recoding
–Visual presentations
–Summary statistics: central tendency, variability
What do we analyze?
Variable – a characteristic that varies across cases
Data – information on variables (their values)
Data set – a list of variables, cases, and the value each case has on each variable
Qualitative variable – discrete values (categories)
–Summarized with frequencies, percentages, proportions
Quantitative variable – a range of numerical values
–Summarized with mean, median, range, standard deviation, etc.
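As a minimal sketch in Python (the cases, variable names and values are invented for illustration), a data set can be held as one record per case, each listing values on the same variables. The qualitative variable `major` is summarized with proportions; the quantitative variable `age` is summarized with a mean.

```python
from collections import Counter
from statistics import mean

# Hypothetical data set: each dict is one case; keys are variables, entries are values.
cases = [
    {"id": 1, "major": "sociology", "age": 19},
    {"id": 2, "major": "history", "age": 22},
    {"id": 3, "major": "sociology", "age": 20},
    {"id": 4, "major": "biology", "age": 21},
]

# Qualitative variable: proportion of cases falling in each category.
major_counts = Counter(case["major"] for case in cases)
major_proportions = {cat: n / len(cases) for cat, n in major_counts.items()}

# Quantitative variable: numerical summaries such as the mean are meaningful.
mean_age = mean(case["age"] for case in cases)

print(major_proportions)  # {'sociology': 0.5, 'history': 0.25, 'biology': 0.25}
print(mean_age)           # 20.5
```

Averaging the categories of `major` would make no sense, which is why the two kinds of variables get different summaries.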
Creating a data set
Enter the data into a statistical package (program)
The program does the calculations and displays the results
Examples: census data; data on CD (GSS 2004); SPSS exercise (%20SPSS%20exercise.htm)
Creating a data set
May involve coding and data entry
Coding = assigning a numerical value to each value (category) of a variable
–Gender: 1 = male, 2 = female
–Year in school: 1 = freshman, 2 = sophomore, etc.
–May need codes for missing data (no response, not applicable)
–Large data sets come with codebooks that list each variable and its codes
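A small sketch of coding and data entry in Python. The codebook follows the slide's gender and year-in-school codes; the missing-data codes 8 (not applicable) and 9 (no response) and the helper names `encode` and `valid_codes` are hypothetical choices for illustration, not a standard.

```python
# Hypothetical codebook: numeric code for each value of each variable.
# Missing-data codes 8 and 9 are illustrative assumptions.
CODEBOOK = {
    "gender": {1: "male", 2: "female", 9: "no response"},
    "year": {1: "freshman", 2: "sophomore", 3: "junior", 4: "senior",
             8: "not applicable", 9: "no response"},
}
MISSING = {8, 9}

def encode(variable, label):
    """Return the numeric code for a label, e.g. encode('year', 'junior') -> 3."""
    reverse = {text: code for code, text in CODEBOOK[variable].items()}
    return reverse[label]

def valid_codes(values):
    """Drop missing-data codes before calculating statistics."""
    return [v for v in values if v not in MISSING]

# Data entry: coded responses for five hypothetical respondents.
year_responses = [encode("year", s) for s in
                  ["freshman", "junior", "no response", "senior", "junior"]]
print(year_responses)               # [1, 3, 9, 4, 3]
print(valid_codes(year_responses))  # [1, 3, 4, 3]
```

Keeping missing-data codes separate matters: left in, a code of 9 would be treated as a real "year" and distort any average.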
Displaying and Presenting Data
Frequency distribution – list of all possible values of a variable and the number of times each occurs
–May require grouping into categories
–May include percentages, cumulative frequencies, cumulative percentages
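A minimal Python sketch of an ungrouped frequency distribution with all four columns: frequency, percent, cumulative frequency and cumulative percent. The responses are invented, and `frequency_table` is a hypothetical helper; statistical packages produce the same table directly.

```python
from collections import Counter

def frequency_table(values, categories):
    """Return (category, f, percent, cumulative f, cumulative percent)
    for each category, in the given (ordered) category order."""
    counts = Counter(values)
    n = len(values)
    rows, cum = [], 0
    for cat in categories:
        f = counts.get(cat, 0)   # categories with no cases still get a row
        cum += f
        rows.append((cat, f, 100 * f / n, cum, 100 * cum / n))
    return rows

# Hypothetical responses on an ordinal variable (year in school).
years = ["freshman", "sophomore", "freshman", "senior", "junior",
         "sophomore", "freshman", "junior", "senior", "freshman"]
order = ["freshman", "sophomore", "junior", "senior"]

for cat, f, pct, cf, cpct in frequency_table(years, order):
    print(f"{cat:<10}{f:>3}{pct:>7.1f}{cf:>4}{cpct:>7.1f}")
```

The category order is passed in explicitly because cumulative columns only mean something when the categories have a natural order.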
Displaying and Presenting Data
Ungrouped frequency distribution
–Usually for qualitative variables
Grouped frequency distribution
–Values are combined (grouped) into categories
–Used for quantitative variables with many separate values
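Grouping into categories
May use meaningful groupings
May use equal intervals (more common); the intervals should be:
–Equal width
–Mutually exclusive (each value fits in only one interval)
–Exhaustive (every value fits in some interval)
Class interval = category, a range of values (e.g., 15–19)
Midpoint = exact middle of the interval (e.g., 17 for 15–19)
Real limits = boundaries halfway between adjacent intervals (e.g., 19.5 between 15–19 and 20–24)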
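One way to build equal-width class intervals in Python, sketched under the assumption of whole-number scores (so the real limits sit 0.5 beyond the stated limits). The ages and the helper name `group_equal_width` are hypothetical.

```python
def group_equal_width(scores, start, width, n_intervals):
    """Group whole-number scores into equal-width, mutually exclusive intervals.
    Returns (lower, upper, midpoint, real_lower, real_upper, frequency) per interval."""
    table = []
    for i in range(n_intervals):
        lower = start + i * width
        upper = lower + width - 1            # e.g. 15-19 when width is 5
        midpoint = (lower + upper) / 2       # exact middle of the interval
        freq = sum(lower <= s <= upper for s in scores)
        table.append((lower, upper, midpoint, lower - 0.5, upper + 0.5, freq))
    # Exhaustive check: every score must land in some interval.
    if sum(row[-1] for row in table) != len(scores):
        raise ValueError("intervals are not exhaustive")
    return table

# Hypothetical ages of 10 respondents, grouped into 15-19, 20-24, 25-29.
ages = [18, 19, 21, 22, 22, 24, 25, 27, 19, 20]
for row in group_equal_width(ages, start=15, width=5, n_intervals=3):
    print(row)
```

Because each interval starts one unit after the previous one ends, the intervals cannot overlap; the final check catches scores that fall outside every interval.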
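Summary statistics
Percent = relative frequency (frequency divided by the total number of cases, times 100); puts categories on a standardized scale
Cumulative frequency or percent = frequency (or percent) of cases at or below a given category (requires at least ordinal data)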
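Written as formulas (the symbols are chosen here for illustration): with $f_i$ the frequency of category $i$, $N$ the total number of cases, and categories ordered $1, \dots, k$:

```latex
p_i = \frac{f_i}{N} \times 100, \qquad
F_j = \sum_{i=1}^{j} f_i, \qquad
P_j = \frac{F_j}{N} \times 100
```

For example, with ordered frequencies 4, 2, 2, 2 ($N = 10$), the cumulative percent for the second category is $P_2 = \frac{4 + 2}{10} \times 100 = 60\%$.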
Visual Presentation of Data
Bar graph (column chart): best with fewer categories; a histogram is the bar-graph form used for grouped quantitative data
Pie chart: good for displaying percentages; easily understood by a general audience
Line graph: good for numerical variables with many values or for trend data
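Statistical packages draw these charts directly. As a dependency-free sketch, the hypothetical helper `text_bar_chart` below renders a horizontal bar graph as plain text, one bar per category with length proportional to frequency; the frequencies are invented.

```python
def text_bar_chart(freqs, symbol="#", scale=1):
    """Render a horizontal bar graph as text lines: one bar per category,
    bar length = frequency * scale symbols, followed by the frequency."""
    width = max(len(cat) for cat in freqs)
    return [f"{cat:<{width}} | {symbol * (n * scale)} {n}" for cat, n in freqs.items()]

# Hypothetical frequencies for a variable with few categories.
freqs = {"freshman": 4, "sophomore": 2, "junior": 2, "senior": 2}
print("\n".join(text_bar_chart(freqs)))
```

With only a handful of categories each bar stays readable, which is the slide's point about bar graphs; a variable with dozens of values would be better grouped first or shown as a line graph.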
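Summary statistics: central tendency
“Where is the center of the distribution?”
Mode = category or score with the highest frequency
Median = middle category or score when cases are put in order (the average of the two middle scores when the number of cases is even)
Mean = arithmetic average: sum of the scores divided by the number of cases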
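Python's standard `statistics` module computes all three measures; the quiz scores below are invented for illustration.

```python
from statistics import mean, median, mode

# Hypothetical quiz scores for 7 students (odd N, so the median is the 4th ordered score).
scores = [6, 8, 7, 9, 7, 10, 5]

print(mode(scores))    # 7: the most frequent score
print(median(scores))  # 7: middle of 5, 6, 7, 7, 8, 9, 10
print(mean(scores))    # about 7.43 (52 / 7)

# With an even number of cases the median averages the two middle scores.
print(median([1, 2, 3, 4]))  # 2.5

# The mode also works for a qualitative variable.
print(mode(["male", "female", "female"]))  # female
```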
Summary Statistics: Variability
“Where are the ends of the distribution? How are cases distributed around the middle?”
Range = difference between the highest and lowest scores
Standard deviation = measure of variability based on the squared deviations of scores from the mean; in a roughly normal distribution, about 68% of scores fall within one standard deviation above or below the mean
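A short Python sketch of the range and standard deviation, reusing invented quiz scores. The slide does not say which standard deviation is meant: `statistics.stdev` divides the sum of squared deviations by N − 1 (sample), while `statistics.pstdev` divides by N (population); both are shown.

```python
from statistics import mean, pstdev, stdev

# Hypothetical quiz scores for 7 students.
scores = [6, 8, 7, 9, 7, 10, 5]

score_range = max(scores) - min(scores)   # 10 - 5 = 5
sd_sample = stdev(scores)                 # sqrt(sum of squared deviations / (N - 1))
sd_pop = pstdev(scores)                   # sqrt(sum of squared deviations / N)

# Share of scores within one (sample) standard deviation of the mean.
m = mean(scores)
within_one_sd = sum(m - sd_sample <= x <= m + sd_sample for x in scores) / len(scores)

print(score_range)             # 5
print(round(sd_sample, 2))     # 1.72
print(round(sd_pop, 2))        # 1.59
print(round(within_one_sd, 2)) # 0.71 (5 of 7 scores)
```

Here 5 of 7 scores lie within one standard deviation, close to the roughly 68% expected for a normal distribution; in small or skewed samples the share can differ noticeably.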