1
EMPA Statistical Analysis
Week 1
Eva Witesman, Ph.D.
Romney Institute of Public Management, Brigham Young University
2
Course format
Canvas; RSS feed for calendar; daily class schedule; readings; participation; application assignments (view first assignment); final project (examples, template).
3
Why statistics?
How do you do your job better? What information would be useful? What information do you collect? Three main roles for public managers: consumers of data, producers of data, and analysts.
4
Types of Questions for Analysis
Descriptive analysis: describing the sample. Examples?
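Descriptive analysis summarizes the data you actually have, without claiming anything beyond the sample. The following is a minimal Python sketch using only the standard library `statistics` module. The commute-time values are made up for illustration.

```python
import statistics

# Hypothetical sample: commute times (minutes) reported by 8 survey respondents
commute_minutes = [12, 25, 18, 30, 22, 15, 40, 18]

# Common descriptive statistics for the sample itself
summary = {
    "n": len(commute_minutes),
    "mean": statistics.mean(commute_minutes),
    "median": statistics.median(commute_minutes),
    "mode": statistics.mode(commute_minutes),
    "stdev": statistics.stdev(commute_minutes),  # sample standard deviation
    "min": min(commute_minutes),
    "max": max(commute_minutes),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

These numbers describe only these eight respondents. Any claim about all commuters would be an inferential question.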
5
Types of Questions for Analysis
Inferential analysis: describing the population based on the sample. This includes extrapolating from a sample to a population and benchmarking. Examples?
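One common inferential step is to estimate a population mean from a sample and attach a margin of error. The sketch below is a minimal Python version that assumes a normal approximation (z of about 1.96 for 95%). With a sample this small, a t-based interval would usually be more appropriate. The household sizes are hypothetical.

```python
import math
import statistics

# Hypothetical random sample of 10 household sizes
sample = [2, 3, 4, 1, 5, 3, 2, 4, 3, 3]

n = len(sample)
mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
# Normal approximation (assumption); small samples would normally use a t critical value
z = statistics.NormalDist().inv_cdf(0.975)     # about 1.96 for a 95% interval

lower, upper = mean - z * se, mean + z * se
print(f"Sample mean: {mean:.2f}")
print(f"Approximate 95% CI for the population mean: ({lower:.2f}, {upper:.2f})")
```

The interval is a statement about the population the sample was drawn from, and it is only as trustworthy as the sampling process behind it.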
6
Types of Questions for Analysis
Bivariate and multivariate analysis: testing relationships between variables. Does X cause Y even in the presence of Z? Which of several X factors has the greatest impact on Y? Predicting Y using a variety of X variables. Comparing pre-test/post-test values. Examples?
7
Sources/types of data
Data you (or others) already collect as part of your administrative processes (administrative data); data you collect on purpose (survey data, observations, measurements); data others have collected on purpose (public or purchased data).
8
Data Generation Exercise
Let’s create some data based on the three principles/one outcome model. Take the survey here:
9
Construct operationalization
Construct: The broad idea or concept to be measured (e.g., “academic success”).
Definition: The specific manner in which the concept is to be defined (e.g., “cumulative grade point average from undergraduate education”).
Operational definition: The specific data source for observing the construct (e.g., “self-reported undergraduate GPA on question 4 of the intake survey”).
10
Levels of measurement
Text (string): The data is text only, and each entry is completely unique.
Nominal: Each entry is completely unique and without common groupings or shared meaning (usually ID variables, addresses, phone numbers, etc.). Can be text or numeric.
11
Levels of measurement
Categorical: Unordered but finite categories (e.g., MPA emphasis, shirt color, race, region, country of origin, etc.).
Binary (dummy/dichotomous): A categorical variable with exactly two categories.
12
Levels of measurement
Ordinal: Ordered categories, but with nonstandard units or “bins” of data (e.g., age or income ranges, Likert scales).
13
Levels of measurement
Interval: Data measured in ordered, standard units (e.g., height in inches, age in years, weight in pounds, costs in USD, etc.).
Continuous, ratio, count, etc.: Other, more specific subsets of interval-level data.
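The level of measurement determines which summaries are meaningful. For example, a mean of shirt colors makes no sense, and a mean of Likert codes is debatable. The following minimal Python sketch follows the levels defined above. The `summarize` helper is hypothetical, and the level-to-statistic mapping is a common convention rather than a fixed rule. Binary variables are assumed to be coded 0/1, and ordinal variables are assumed to be coded as ordered integers.

```python
import statistics
from collections import Counter


def summarize(values, level):
    """Return a summary suited to the given level of measurement.

    Hypothetical helper; the mapping is an illustrative convention.
    """
    if level in ("text", "nominal"):
        # Entries are (nearly) unique, so counting them is about all that makes sense
        return {"n": len(values), "unique": len(set(values))}
    if level == "binary":
        # Share of entries equal to 1 (assumes 0/1 coding)
        return {"proportion": sum(values) / len(values)}
    if level == "categorical":
        # Frequencies and the most common category
        counts = Counter(values)
        return {"counts": dict(counts), "mode": counts.most_common(1)[0][0]}
    if level == "ordinal":
        # Order matters but distances do not: use a median that is an actual category
        return {"median": statistics.median_low(values)}
    if level == "interval":
        # Standard units make arithmetic meaningful
        return {"mean": statistics.mean(values), "median": statistics.median(values)}
    raise ValueError(f"unknown level of measurement: {level}")


print(summarize(["Provo", "Boise", "Provo"], "categorical"))
print(summarize([1, 0, 1, 1], "binary"))       # {'proportion': 0.75}
print(summarize([4, 5, 2, 4, 3], "ordinal"))   # Likert codes 1-5 -> {'median': 4}
```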
14
Dataset characteristics
Unit of analysis: Defines the way in which each row in the database is unique; identifies the type of unit or object being studied. Examples include individuals, households, states, cities, months, state/year, programs, etc. This is the level at which data is collected. What is the unit of analysis for our example data?
15
Dataset characteristics
Observations (observational units): Each row in the database should be a unique observation or observational unit. For example, if you are studying households, each row would contain data about a single household, and each household is an “observation” or “observational unit.” Where/who are the observations in our practice data?
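If each row is supposed to be a unique observation, the combination of fields that defines the unit of analysis should never repeat. The following minimal Python sketch assumes a state/year unit of analysis and represents the dataset as a list of dictionaries. The rows, the `programs` column, and the `duplicate_keys` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical dataset whose unit of analysis is state/year (values are made up)
rows = [
    {"state": "UT", "year": 2022, "programs": 14},
    {"state": "UT", "year": 2023, "programs": 15},
    {"state": "ID", "year": 2023, "programs": 9},
    {"state": "ID", "year": 2023, "programs": 9},  # accidental duplicate row
]


def duplicate_keys(rows, key_fields):
    """Return key combinations that appear in more than one row."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return [key for key, n in counts.items() if n > 1]


dups = duplicate_keys(rows, ["state", "year"])
print(dups)  # [('ID', 2023)]
```

An empty result means every row is a distinct state/year observation.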
16
Dataset characteristics
Variables: Each column identifies a specific characteristic of, or piece of information about, the observations. Variables have values that are not the same for all observations in the dataset (otherwise they would be “constants” and should be excluded from the analysis). What are the variables in our practice data? What constructs are they operationalizing?
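A column whose value is identical for every observation is a constant rather than a variable. The following minimal Python sketch flags such columns. It assumes every row has the same keys and that values are hashable. The survey rows and the `constant_columns` helper are hypothetical.

```python
def constant_columns(rows):
    """Return names of columns whose value is identical in every row."""
    if not rows:
        return []
    columns = rows[0].keys()  # assumes all rows share the same columns
    return [col for col in columns if len({row[col] for row in rows}) == 1]


# Hypothetical practice data: every respondent is in the same cohort
survey = [
    {"respondent_id": 1, "cohort": "EMPA 2025", "gpa": 3.4},
    {"respondent_id": 2, "cohort": "EMPA 2025", "gpa": 3.8},
    {"respondent_id": 3, "cohort": "EMPA 2025", "gpa": 3.1},
]

print(constant_columns(survey))  # ['cohort']
```

Here `cohort` carries no variation within this dataset, so it cannot help explain differences between observations.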
17
Sampling terminology
Population of interest: The population is the large group of observations about which you want to make claims or inferences. This is the group you are ultimately trying to study.
18
Sampling terminology
Sampling frame: This is the representation of the population from which you will draw a sample. The sampling frame is usually a list of contact information or another method of accessing people in the population.
19
Sampling terminology
Sample: This is the subset of the population you have selected out of the sampling frame to examine or gather data about.
20
Sampling terminology
Response rate: This is the percentage of sampled people, in a survey, who provided responses to the survey. It indicates the level of participation in the study, and a low rate can signal potential nonresponse bias.
Effective response rate: This is the percentage of sampled people, in a survey, who provided complete and usable responses (as determined after data cleaning).
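Both rates share the same denominator, the number of people sampled, and differ only in what counts as a response. The following is a minimal Python sketch of the arithmetic. The counts are hypothetical.

```python
def rate(numerator, denominator):
    """Percentage of the denominator represented by the numerator."""
    return 100 * numerator / denominator


sampled = 400    # hypothetical: people drawn from the sampling frame and invited
responded = 130  # hypothetical: returned any survey at all
usable = 112     # hypothetical: complete and usable after data cleaning

response_rate = rate(responded, sampled)
effective_rate = rate(usable, sampled)

print(f"Response rate: {response_rate:.1f}%")             # 32.5%
print(f"Effective response rate: {effective_rate:.1f}%")  # 28.0%
```

The effective response rate can never exceed the response rate, because usable responses are a subset of all responses.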
21
Sampling terminology
Random sample: A sample drawn by randomly (true randomness!) selecting units from the sampling frame. This is generally acknowledged to be the least biased method, though stratified random sampling may yield more representative data.
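A simple random sample gives every member of the sampling frame the same chance of selection. The following minimal Python sketch uses `random.Random.sample`, which draws without replacement. Note that Python's default generator is pseudo-random. It is seeded here only so that the draw can be reproduced. The client-ID sampling frame is hypothetical.

```python
import random

# Hypothetical sampling frame: 500 client IDs
sampling_frame = [f"client-{i:03d}" for i in range(1, 501)]

rng = random.Random(2025)  # seeded pseudo-random generator, for reproducibility
sample = rng.sample(sampling_frame, k=50)  # 50 distinct clients, without replacement

print(len(sample), len(set(sample)))  # 50 50
```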
22
Sampling terminology
Nonrandom sample: A sample taken in any way other than random selection.
Convenience sampling: Selecting based on convenience to the researcher.
Snowball sampling: Having study participants suggest other individuals to participate in the study.
Purposive sampling: Identifying specific targets to be sampled into the study.
23
Sampling terminology
Stratified random sampling: Dividing the sampling frame into subsets (strata) based on key characteristics and then randomly sampling within each subset.
Census: Attempting to sample every member of the sampling frame.
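Stratified random sampling first groups the frame by a key characteristic and then draws a random sample inside each group. The following minimal Python sketch uses proportional allocation, taking the same fraction from every stratum. That allocation rule is an assumption, and other allocation schemes exist. The urban/rural frame and the `stratified_sample` helper are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(frame, stratum_of, fraction, rng):
    """Randomly sample the same fraction of units from each stratum."""
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)  # very small strata may round down to 0
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical frame of 300 residents: 200 urban, 100 rural
frame = [{"id": i, "region": "urban" if i < 200 else "rural"} for i in range(300)]

rng = random.Random(7)  # seeded for reproducibility
sample = stratified_sample(frame, lambda unit: unit["region"], 0.10, rng)

urban = sum(1 for unit in sample if unit["region"] == "urban")
print(len(sample), urban)  # 30 20
```

With proportional allocation, each region's share of the sample matches its share of the frame, which is one reason stratification can produce more representative data.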
24
Sampling terminology Bias: Anything that may cause your study to fail at being accurately representative of the population of interest.
25
Installing R
Install R, RStudio, and R Commander.
Install the MPA plugin (version 3).
26
Application
Identify a data source; clean the data; generate a codebook.
Begin the application assignment.