Statistics: the art and science of collecting and understanding data (recorded information)
• Looks at the “big picture” through data reduction
• Can pay appropriate attention to individuals
Useful for:
• Financial Statements
• Personnel Records
• Prices/Rates/Quantity
• Production Quality
• Sales/Market Reports
• Opinion Polls
• Economic/Demographic Conditions & Trends
Population: described by a Parameter
Sample: described by a Statistic
Basic Activities of Statistics
• Design a Plan for Data Collection
• Explore the Data
• Estimate an Unknown Quantity
• Test Hypotheses
• Model the Data
Four Functions of Data Reduction
• Summarization
• Communication
• Conceptualization
• Interpolation
Types of Statistical Analyses
• Descriptive Analysis
• Inferential Analysis
• Differences Analysis
• Associative Analysis
• Predictive Analysis
Sampling Methods
Probability Samples
• Simple Random Sampling
• Systematic Sampling
Nonprobability Samples
• Convenience Sampling
• Judgment Sampling
• Referral Sampling
Sample Bias
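The two probability methods above can be sketched with Python's standard `random` module. This is a minimal illustration, assuming a hypothetical sampling frame of 100 member IDs; the function names `simple_random_sample` and `systematic_sample` are made up for this example. Systematic sampling here uses a random start within the first interval and then takes every k-th unit.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Simple random sampling: every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def systematic_sample(frame, n, seed=None):
    """Systematic sampling: random start within the first interval, then every k-th unit."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the frame size")
    k = len(frame) // n                       # sampling (skip) interval
    start = random.Random(seed).randrange(k)  # random start in [0, k)
    return frame[start::k][:n]

frame = list(range(1, 101))   # hypothetical sampling frame of 100 member IDs
print(simple_random_sample(frame, 10, seed=1))
print(systematic_sample(frame, 10, seed=1))
```

The nonprobability methods (convenience, judgment, referral) depend on the researcher's or respondents' choices rather than a random mechanism, which is why they are more exposed to sample bias.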
Variables: characteristics of a population
Values: the observations of a variable
• Discrete variables: a specific (countable) set or list of values
• Continuous variables: an “infinite” set of values within a range
Data Coding: assigning values (codes) to observations
Independent variable: the influencing variable
Dependent variable: the variable being influenced
Cross-Sectional data: one point in time
Time-Series data: the same population over time (longitudinal study)
Qualitative data versus Quantitative data
Univariate / Bivariate / Multivariate data
Primary versus Secondary data
Levels of Measurement
• Nominal: name / classification only
• Ordinal: order / ranking possible
• Interval: equal distances apart / arbitrary zero
• Ratio: equal distances apart / true zero
Measures of Central Tendency
• Mode: the most frequently occurring value
• Median: the middle value in an ordered set
• Mean: the average of all observations (their sum divided by n)
Appropriateness of these measures? Limitations of these measures?
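A small sketch using Python's `statistics` module shows the three measures side by side on a hypothetical data set that contains one outlier (40). It illustrates one limitation raised above: the mean is pulled toward extreme values, while the median is not.

```python
from statistics import mean, median, multimode

data = [2, 3, 3, 5, 7, 10, 40]   # hypothetical data with one outlier (40)

print("mode:", multimode(data))   # every most-frequent value, as a list
print("median:", median(data))    # middle value of the ordered set
print("mean:", mean(data))        # sum / n = 70 / 7
```

Here the mean (10) is twice the median (5) because of the single outlier, which is why the median is often preferred for skewed data such as incomes or prices.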
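Measures of Dispersion
• Range: highest value minus lowest value
• Percentiles: 25th, 50th, 75th (the quartiles)
• Interquartile Range: the middle 50% of values (75th percentile minus 25th percentile)
• Deviations: distances of observations from the mean
• Variance: sum of squared deviations divided by (n - 1)
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$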
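The dispersion measures above can be computed directly; this sketch uses a hypothetical sample and Python's `statistics` module. The variance is built by hand from the deviations, following the (n - 1) formula, and then checked against `statistics.variance`. Note that software packages use slightly different quartile conventions; `statistics.quantiles` defaults to its "exclusive" method, so other tools may report somewhat different quartiles for the same data.

```python
from statistics import quantiles, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample

data_range = max(data) - min(data)        # highest minus lowest
q1, q2, q3 = quantiles(data, n=4)         # 25th, 50th, 75th percentiles
iqr = q3 - q1                             # spread of the middle 50%

xbar = sum(data) / len(data)
deviations = [x - xbar for x in data]     # distances from the mean
s2 = sum(d ** 2 for d in deviations) / (len(data) - 1)   # sample variance

assert abs(s2 - variance(data)) < 1e-12   # agrees with the library's n - 1 formula
print(data_range, (q1, q2, q3), iqr, round(s2, 3))
```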
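Measures of Dispersion
• Standard Deviation: the square root of the variance; roughly the typical distance of observations from the mean, expressed in the original units
$s = \sqrt{s^2}$
• Coefficient of Variation: the standard deviation relative to the mean, used to compare variability across samples or variables measured on different scales
$cv = \frac{s}{\bar{x}}$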
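As a sketch of why the coefficient of variation is useful, the example below compares two hypothetical samples measured in different units (kilograms and centimetres). The helper name `coefficient_of_variation` is made up for illustration, and it reports cv as a percentage; cv is only meaningful for ratio-scale data with a positive mean.

```python
from math import sqrt
from statistics import mean, variance

def coefficient_of_variation(sample):
    """cv = s / x-bar, reported as a percentage."""
    s = sqrt(variance(sample))        # standard deviation = square root of the variance
    return s / mean(sample) * 100

weights_kg = [60, 72, 68, 80, 75]     # hypothetical sample
heights_cm = [160, 175, 170, 182, 178]  # hypothetical sample

cv_w = coefficient_of_variation(weights_kg)
cv_h = coefficient_of_variation(heights_cm)
print(round(cv_w, 1), round(cv_h, 1))
```

The heights have the larger standard deviation in absolute terms, yet the weights vary more relative to their mean, which is the comparison the cv is designed to make.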
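Normal Distribution: an idealized, symmetric, bell-shaped curve
Empirical Rule (for normally distributed data)
• About 68% of values lie within 1 standard deviation of the mean
• About 95% of values lie within 2 standard deviations
• About 99.7% of values lie within 3 standard deviations
Sampling Distribution: the distribution of a statistic (such as the sample mean) across all possible samples of the same size; for sufficiently large samples, the distribution of sample means is approximately normal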
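The empirical-rule percentages can be checked against the exact normal curve with `statistics.NormalDist` from the Python standard library. This sketch computes the area between -k and +k standard deviations of a standard normal distribution; the helper name `share_within_k_sd` is hypothetical.

```python
from statistics import NormalDist

def share_within_k_sd(k):
    """P(-k < Z < k) for a standard normal variable Z."""
    z = NormalDist(mu=0, sigma=1)
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within_k_sd(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```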
Central Limit Theorem: if we take many random samples of sufficient size (rule of thumb: n > 30), the sampling distribution of the means of these samples will be approximately normal, centered on the population mean
Z-Values / Z-Scores: an individual value's deviation from the mean, expressed in standard deviations
$z = \frac{x - \bar{x}}{s}$
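A short simulation can make the Central Limit Theorem concrete. This sketch assumes a hypothetical Uniform(0, 1) population, which is flat rather than bell-shaped, with mean 0.5 and standard deviation √(1/12) ≈ 0.2887. Drawing many samples of size 36 should give sample means centered near 0.5 with spread near 0.2887 / √36 ≈ 0.0481. The z-score helper simply applies the formula above; all names are made up for illustration.

```python
import random
from statistics import mean, stdev

def z_score(x, mean_value, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean_value) / sd

def simulate_sample_means(n, reps, seed=0):
    """Means of `reps` random samples of size n drawn from a Uniform(0, 1) population."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n)) / n for _ in range(reps)]

means = simulate_sample_means(n=36, reps=2000)
print(round(mean(means), 3), round(stdev(means), 4))  # near 0.5 and 0.0481
print(z_score(85, 70, 10))                            # 85 is 1.5 SDs above a mean of 70
```

Plotting `means` as a histogram would show the approximately normal shape even though the underlying population is uniform.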
Graphical Representation of Data
• Pie Charts
• Bar Charts
• Histogram
• Boxplot (Box and Whisker Plot)
• Stem and Leaf Plot
• Scatterplot
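Most of these charts need a plotting library, but a stem and leaf plot is simple enough to sketch in plain Python. This minimal version assumes non-negative whole numbers, using the tens digit as the stem and the units digit as the leaf. Empty stems are kept so gaps in the data stay visible. The function name `stem_and_leaf` and the data are hypothetical.

```python
def stem_and_leaf(values):
    """Text stem and leaf plot for non-negative integers: stem = tens, leaf = units."""
    stems = {}
    for v in sorted(values):
        stems.setdefault(v // 10, []).append(v % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):   # include empty stems
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

for line in stem_and_leaf([12, 15, 21, 23, 23, 37, 41, 48]):
    print(line)
# 1 | 2 5
# 2 | 1 3 3
# 3 | 7
# 4 | 1 8
```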
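Types of Distributions
• Normal
• Uniform
• Symmetric
• Skewed (Positive / Negative)
• Binomial
• Bimodal
Tchebysheff’s Theorem: for any distribution, at least $1 - \frac{1}{k^2}$ of the values lie within k standard deviations of the mean (for k > 1)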
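Unlike the empirical rule, Tchebysheff's bound holds for any distribution, so it is much more conservative. This sketch compares the bound with the actual share of values within k standard deviations for a hypothetical right-skewed data set. The helper names are made up for illustration, and the population standard deviation (`pstdev`) is used.

```python
from statistics import mean, pstdev

def chebyshev_bound(k):
    """Minimum share of values within k SDs of the mean, for any distribution (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is informative only for k > 1")
    return 1 - 1 / k ** 2

def share_within(data, k):
    """Actual share of values within k standard deviations of the mean."""
    m, sd = mean(data), pstdev(data)
    return sum(abs(x - m) <= k * sd for x in data) / len(data)

skewed = [1, 1, 2, 2, 2, 3, 3, 4, 5, 8, 13, 30]   # hypothetical right-skewed data
for k in (2, 3):
    print(k, round(chebyshev_bound(k), 3), round(share_within(skewed, k), 3))
```

For k = 2 the theorem guarantees only 75%, compared with about 95% under the empirical rule, because it makes no assumption about the shape of the distribution.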
Types of Questions
• Open-Ended versus Closed-Ended
• Dichotomous versus Multiple Category
• Scaled-Response
  - Modified Likert Scale (Agree/Disagree)
  - Semantic-Differential Scale (Opposites)
  - Stapel Scale (one dimension measured; often 10 responses)
Sampling Error: the difference between the sample finding and the true population value
• It is the error that occurs from using a sample rather than the whole population
• It is caused by (1) the sampling method and (2) the size of the sample
If we know that a sample (and its statistics) will not perfectly reflect the population, how can we measure (account for) accuracy? In other words, how far off could our measures be?
The answer is based on (1) sample size and (2) variability
Standard Error: indicates approximately how far the observed value of a statistic is likely to be from the true population value
• The standard error indicates the amount of uncertainty in a sample statistic: how far it could be from the population parameter
• The standard deviation indicates the amount of variability among individual observations in a sample: how far they are from the average
Standard error = the standard deviation of the theoretical sampling distribution
STANDARD ERROR OF THE MEAN
$s_{\bar{x}} = \frac{s}{\sqrt{n}}$
Where…
• $s_{\bar{x}}$ = standard error of the mean
• s = standard deviation of the sample
• n = the sample size
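A direct translation of the formula into Python, assuming a hypothetical sample standard deviation of 20. The function name is made up for illustration. Because n sits under a square root, quadrupling the sample size only halves the standard error.

```python
from math import sqrt

def standard_error_of_mean(s, n):
    """s_xbar = s / sqrt(n)."""
    return s / sqrt(n)

print(standard_error_of_mean(s=20, n=100))   # 2.0
print(standard_error_of_mean(s=20, n=400))   # 1.0: four times the sample, half the error
```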
STANDARD ERROR OF A PERCENTAGE
$s_p = \sqrt{\frac{p \times q}{n}}$
Where…
• $s_p$ = standard error of the percentage
• p = the sample percentage
• q = (100 - p)
• n = the sample size
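The same idea for a percentage, as a minimal sketch with p expressed in percent (0 to 100) so that q = 100 - p. The function name and the inputs are hypothetical. For a fixed sample size, the standard error is largest when p = 50%, which is the point of greatest variability.

```python
from math import sqrt

def standard_error_of_percentage(p, n):
    """s_p = sqrt(p * q / n), with p in percent and q = 100 - p."""
    q = 100 - p
    return sqrt(p * q / n)

print(standard_error_of_percentage(p=50, n=100))   # 5.0
print(standard_error_of_percentage(p=20, n=100))   # 4.0
```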
Point Estimation: calculating and reporting a single value to estimate a parameter
So, how do we account for accuracy? Use a:
Confidence Interval: a range of values that has some known probability of containing the "true" population value
or, equivalently, a range of values that should include an unknown parameter a certain percentage of the time
The interval width is affected by the standard error and the confidence level desired (a larger standard error or a higher confidence level gives a wider interval)
CONFIDENCE INTERVALS
• To Estimate the Population Mean: $\bar{x} \pm z \, s_{\bar{x}}$
• To Estimate the Population Percentage: $p \pm z \, s_p$
For z, use 1.65 at 90% confidence (more precisely 1.645), 1.96 at 95% confidence, and 2.58 at 99% confidence (more precisely 2.576)
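Both intervals can be computed in a few lines. This sketch derives the z value from `statistics.NormalDist().inv_cdf`, which reproduces the table values above before rounding. The inputs (a mean of 50 with s = 20 and n = 100; a percentage of 40 with n = 400) are hypothetical, as are the function names.

```python
from math import sqrt
from statistics import NormalDist

def z_for(confidence):
    """Two-sided critical value, e.g. 0.95 gives about 1.96."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def ci_mean(xbar, s, n, confidence=0.95):
    se = s / sqrt(n)                      # standard error of the mean
    z = z_for(confidence)
    return xbar - z * se, xbar + z * se

def ci_percentage(p, n, confidence=0.95):
    se = sqrt(p * (100 - p) / n)          # standard error of the percentage
    z = z_for(confidence)
    return p - z * se, p + z * se

print([round(z_for(c), 3) for c in (0.90, 0.95, 0.99)])   # [1.645, 1.96, 2.576]
print(ci_mean(xbar=50, s=20, n=100))      # about (46.08, 53.92)
print(ci_percentage(p=40, n=400))         # about (35.2, 44.8)
```

These z-based intervals assume a reasonably large sample; with small samples, a t critical value is commonly used in place of z.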
Interpreting the Confidence Interval
• Correct: 95% of all such intervals will contain the true population parameter
• Correct: We are 95% confident that the interval covers the true population parameter
• DO NOT say there is a 95% probability that the population parameter is in this particular interval
Why? Because the parameter is fixed. The parameter does not vary from sample to sample; it is either in the interval or it is not.
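The first "correct" interpretation can be demonstrated by simulation: fix a population mean, draw many samples, build a 95% interval from each, and count how often the intervals cover the fixed mean. This sketch assumes a hypothetical normal population (mean 100, SD 15) and samples of size 50; `coverage_rate` is an illustrative name. Because it plugs the sample s into a z-based interval, the observed rate may land slightly below 95%.

```python
import random
from math import sqrt
from statistics import mean, stdev

def coverage_rate(mu=100.0, sigma=15.0, n=50, reps=2000, z=1.96, seed=42):
    """Share of intervals x-bar +/- z * s / sqrt(n) that contain the fixed mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar, se = mean(sample), stdev(sample) / sqrt(n)
        hits += (xbar - z * se) <= mu <= (xbar + z * se)   # each interval either covers mu or not
    return hits / reps

print(coverage_rate())   # close to 0.95
```

Each individual interval either contains mu or misses it; the 95% describes the long-run behaviour of the procedure, not any single interval.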