PA330 FEB 28, 2000
General information
- 2nd half of semester: switching gears to statistics
- Goals: develop the ability to
  - Calculate statistics commonly used in public administration
  - Choose the appropriate statistical tool for various problems/research questions
  - Interpret statistical applications used in other research (academic and applied)
How to accomplish these goals
- Read before class: it will enhance your understanding of the lectures
- Meier/Brudney is a great “cookbook”. Use it!
- Keep a formula sheet and a list of notations
- Do repeated problems: the more you practice, the more comfortable and quicker you become
  - class exercises, homework, lab homework
  - don’t rush through homework
TYPES OF STATISTICS
- Descriptive statistics - describe a characteristic: the aggregated measurement of a single variable for a number of cases
- Inferential statistics - use a sample of cases to infer characteristics of a population. In other words, use data from the cases studied to conclude something about an entire population that includes cases not studied.
- Inferential statistics are built upon descriptive statistics
Univariate vs multivariate statistics
- Univariate statistics - tell us about the distribution of the values of one variable
- Multivariate statistics - measure the joint distribution of two or more variables and assess the relationships between and among the variables
Prelude to data analysis
- Data coding
- Array - present the measurements associated with each case
- ALWAYS “eyeball” your data before analyzing
- Overhead - Low Income Uninsured Children
Visual Presentation
- Tables
- Graphs
  - bar graphs
  - circle graphs
  - line graphs
- A:/Dagedata#5.wb3
  - Page 1 - data
  - Page 2 - charts
Bar graph (histogram)
- shows quantitative differences, but not portions of the whole
- shows both numbers and magnitude
Circle (pie) graph parts of a whole: relationships, percentages
Line graph Used to display trends, change over time
Statistical Tools for Describing the World - Distributions
- Intuitive Definition
  - A bunch of numbers that measure a characteristic for a group of cases.
  - May be represented by a set of numbers, a graph or picture, or even a mathematical equation.
Frequency Distribution A frequency distribution is a graph or chart that shows the number of observations of a given value or class interval.
Frequency distribution
- Frequency distribution - lists the values or categories of each variable and the number of cases with each of the values
- Categories must be exhaustive and mutually exclusive (each case will fit into one and only one category)
- Can be displayed both numerically and graphically
- O’Sullivan, Table 11.5
The Frequency Histogram
To create a frequency histogram:
- Determine the class interval width
- Determine the number of intervals desired
- Tally the number of observations in each range
- Create a bar chart from the class totals
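The tallying steps above can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table`, the ages, and the choice of four 10-year intervals starting at 20 are all assumptions made for the example, not values from the course materials.

```python
from collections import Counter

def frequency_table(values, start, width, n_intervals):
    """Tally observations into equal-width class intervals [lower, upper)."""
    counts = Counter()
    for v in values:
        idx = int((v - start) // width)   # which interval the value falls in
        if 0 <= idx < n_intervals:        # ignore values outside the table
            counts[idx] += 1
    return [(start + i * width, start + (i + 1) * width, counts[i])
            for i in range(n_intervals)]

# Hypothetical ages of ten survey respondents
ages = [21, 25, 27, 32, 34, 35, 38, 41, 44, 52]
table = frequency_table(ages, start=20, width=10, n_intervals=4)

# A crude text "bar chart" built from the class totals
for lower, upper, count in table:
    print(f"{lower}-{upper - 1}: {'#' * count} ({count})")
```

Each tuple holds the lower limit, the upper limit (exclusive), and the tally for that class, which is all a spreadsheet or charting tool needs to draw the bars.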
Frequency polygon
- Same as a frequency histogram, except the midpoints of the class intervals are used
- Points are connected with a line graph
- With a large sample size, a large number of classes will make the distribution a smooth curve
Stem and leaf plot Preserves data and creates histogram
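A minimal sketch of how a stem and leaf plot keeps every data value while still showing the shape of the distribution. The scores are hypothetical, the helper name `stem_and_leaf` is made up for this example, and the code assumes two-digit values so that the tens digit is the stem and the ones digit is the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group values by tens digit (stem); the ones digits are the leaves."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

# Hypothetical exam scores
scores = [52, 67, 61, 75, 78, 71, 79, 83, 88, 94]
plot = stem_and_leaf(scores)

for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```

Turned on its side, the rows of leaves form the bars of a histogram, and each original score can still be read back from its stem and leaf.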
Frequency Distributions: Shape
- Modality - the number of peaks in the curve
- Skewness - an asymmetry in a distribution where values are shifted to one extreme or the other
- Kurtosis - the degree of peakedness in the curve
Frequency Distributions: Modality - Unimodal, Bimodal, Multi-modal
Frequency Distributions: Skewness - Right Skew (Positive Skew), Left Skew (Negative Skew)
Frequency Distributions: Kurtosis - Platykurtic, Leptokurtic, Mesokurtic
Characteristics of a distribution
When summarizing cases or the distribution of their values, we usually want to know two things:
- how similar are the individual values to each other (measures of central tendency)
- how different are the values from one another (measures of dispersion)
Measures of Central Tendency Measures which provide some indication of the typical value or the 'middle' of the distribution
Measures of Central Tendency
Listed in order of least to most useful:
- mode - most frequent category
- median - the middle value
- mean - arithmetic average
Mode
- The category or value that most commonly occurs among all cases
- In a frequency distribution, it is the value with the highest frequency
- Can be determined for all levels of measurement
- Used primarily for nominal data
Mode The peak (or tallest) value of a frequency distribution histogram is also referred to as the mode. The mode is the category or value, not the number of cases containing that value
Median
- The value or category that is the center of the distribution
- Requires ordinal or interval/ratio level of measurement (can be determined only if the cases can be ordered)
- Best measure for skewed distributions
- One half of the cases have a value less than the median and one half have a value more than the median
Median Place cases in order, then select middle value. If even number of cases, average the two middle values
Mean
- Arithmetic average
- Most commonly known measure of central tendency
- Is the “balance point” or center of the distribution
- Most appropriate for symmetric distributions
- Influenced by extreme values
Mean Created by summing values of each case and dividing by number of cases
Calculate mean, median, mode
Mean = 5 Median = 5 Mode = 6
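The data set used for this exercise is not reproduced in these notes. As an illustration only, the sketch below uses hypothetical values chosen so that the results match the answers given above; it relies on Python's standard `statistics` module.

```python
import statistics

# Hypothetical data, NOT the original exercise data
values = [2, 3, 4, 5, 6, 6, 9]

mean = statistics.mean(values)      # sum of values divided by number of cases
median = statistics.median(values)  # middle value after ordering the cases
mode = statistics.mode(values)      # most frequently occurring value

print("Mean =", mean, "Median =", median, "Mode =", mode)
```

With seven cases the median is the fourth ordered value; with an even number of cases `statistics.median` averages the two middle values, as described earlier.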
Grouped data - median If data is grouped (as in a frequency distribution) the exact value of the median can’t be found. It can be estimated by finding the class interval of the middle case and taking the mid-point of the interval. Overhead - O’S, Table 11.11
Grouped data - mean
- Again, the result will not be exact
- Find the midpoint of each class interval
- Multiply the frequency for that interval by the midpoint
- Sum the products and divide by the number of cases
- O’Sullivan, Table 11.12
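Both grouped-data estimates can be sketched together. The frequency distribution below is hypothetical (it is not O’Sullivan's table), and the class limits are treated as continuous boundaries, which is an assumption of this example. The median follows the simple midpoint rule described for grouped data, not an interpolation formula.

```python
# Hypothetical frequency distribution: (lower limit, upper limit, frequency)
classes = [(0, 10, 4), (10, 20, 10), (20, 30, 6)]

n = sum(freq for _, _, freq in classes)  # total number of cases

# Grouped mean: sum of (midpoint * frequency), divided by number of cases
grouped_mean = sum(((lower + upper) / 2) * freq
                   for lower, upper, freq in classes) / n

# Grouped median: midpoint of the class interval containing the middle case
middle_case = (n + 1) / 2
cumulative = 0
grouped_median = None
for lower, upper, freq in classes:
    cumulative += freq
    if cumulative >= middle_case:
        grouped_median = (lower + upper) / 2
        break

print("Estimated mean:", grouped_mean)
print("Estimated median:", grouped_median)
```

Both figures are estimates, because every case in a class is treated as if it sat at the class midpoint.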
How to select the appropriate measure of central tendency
- Level of measurement
  - median requires ordinal
  - mean requires interval
- Shape of the distribution
  - unimodal vs bimodal
  - skewness
- If unimodal and symmetrical (or almost):
  - mean is the preferred measure
  - mean, median, and mode will be the same (or almost)
- If there are extreme values, the mean is distorted and the median should be used
- Overhead, O’Sullivan, pg 341
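A short illustration of why the median is preferred when extreme values are present. The salaries are hypothetical; the point is only to show how far a single extreme case pulls the mean compared with the median.

```python
import statistics

# Hypothetical salaries for a small office
salaries = [30_000, 32_000, 34_000, 36_000, 38_000]
# The same office after adding one very highly paid case
with_outlier = salaries + [250_000]

mean_before = statistics.mean(salaries)
median_before = statistics.median(salaries)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print("Without extreme value: mean", mean_before, "median", median_before)
print("With extreme value:    mean", mean_after, "median", median_after)
```

In this example the mean roughly doubles while the median moves by only $1,000, so the median remains a better description of the typical salary.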
Mathematical notation
Important mathematical notation the student needs to know.
- Σ Summation
  - The Σ is a symbolic representation of the process of adding up a specified series or collection of numbers.
  - For instance, the sum of all X_i from i = 1 to n means: beginning with the first number in your data set, add together all n numbers.
  - $\sum_{i=1}^{n} X_i$
Mean The sum of all of the cases (numbers) in a set, divided by the number of cases in the set Sample mean Population mean
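The formulas for this slide did not survive in these notes. In standard notation, which is assumed here and not taken from the original slide, the two means are usually written as follows, with n the number of cases in the sample and N the number of cases in the population:

```latex
\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \qquad \text{(sample mean)}

\mu = \frac{\sum_{i=1}^{N} X_i}{N} \qquad \text{(population mean)}
```

The calculation is the same in both cases; only the symbols differ, to signal whether the cases are a sample or the whole population.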
Measures of dispersion Two distributions can have very similar means but different overall values. These measures tell us how much individual values vary from one another. Small values on the measure of dispersion imply more uniformity; larger values imply more diversity.
Measures of Dispersion: Range
- Highest value minus the lowest value
- Uses only two pieces of information, so it is strongly influenced by these two values
- $58,000 - $12,000 = $46,000 (mean = $35,000)
- $36,000 - $34,000 = $2,000 (mean = $35,000)
Measures of Dispersion: standard deviation and variance
- Standard deviation - measures the average distance of values in a distribution from the mean of the distribution
- Variance - the square of the standard deviation
- Important statistics used in many other statistical measures and tests
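The two salary examples above can be reproduced with a small sketch. Only the highest value, lowest value, and mean come from the example; the middle value in each list and the names `office_a` and `office_b` are hypothetical, added so that each group has the stated mean.

```python
import statistics

def value_range(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)

# Hypothetical groups built to match the extremes and mean given above
office_a = [12_000, 35_000, 58_000]
office_b = [34_000, 35_000, 36_000]

for name, group in (("A", office_a), ("B", office_b)):
    print(f"Office {name}: mean = {statistics.mean(group)}, "
          f"range = {value_range(group)}")
```

Both groups have the same mean, yet the ranges differ by a factor of 23, which is why a measure of central tendency alone does not describe a distribution.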
Measures of Dispersion variance The mean of the squared deviations
Or more simply
Standard deviation
- Square the deviations to remove minus signs
- Take the square root to return to the original scale
Standard deviation Again, more simply
Common notation if standard deviation of a sample, rather than a population
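The formulas that accompanied the variance and standard deviation slides are not preserved in these notes. The standard textbook definitions, offered here as an assumption about what was shown, are given below. The n - 1 denominator for the sample statistic is the common convention and may differ from the original slide.

```latex
\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \qquad \text{(population variance)}

\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \qquad \text{(population standard deviation)}

s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \qquad \text{(sample standard deviation)}
```

Here \sigma and \mu refer to the population, while s and \bar{X} refer to a sample.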
Calculating the Standard Deviation The easiest way to calculate the standard deviation is to use a computer. Can be done in SPSS and spreadsheets such as Excel and Quattro Pro
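The same calculation can also be done in Python, shown here as one more option alongside SPSS and spreadsheets. The data are hypothetical. The sketch uses the standard `statistics` module and repeats the population calculation by hand so the steps (deviations, squares, mean, square root) are visible.

```python
import statistics

# Hypothetical data
values = [2, 4, 4, 4, 5, 5, 7, 9]

mu = statistics.mean(values)

# Population versions: divide by the number of cases
population_variance = statistics.pvariance(values)
population_sd = statistics.pstdev(values)

# Sample version: divides by (number of cases - 1)
sample_sd = statistics.stdev(values)

# By hand: square the deviations, take their mean, then the square root
squared_deviations = [(x - mu) ** 2 for x in values]
by_hand_sd = (sum(squared_deviations) / len(values)) ** 0.5

print("Mean:", mu)
print("Population variance:", population_variance)
print("Population standard deviation:", population_sd)
print("Sample standard deviation:", round(sample_sd, 3))
```

The sample standard deviation is slightly larger than the population figure because it divides by one fewer case.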
Interpretation of standard deviation Standard deviation can only be interpreted in conjunction with mean and range