EMPA Statistical Analysis

1 EMPA Statistical Analysis
Week 2

2 Review: Levels of Measurement
Nominal, Categorical, Binary, Ordinal, Interval

3 Measures of Central Tendency
Mean (sum of all values divided by number of values)
Median (center of ordered list of values)
Mode (most frequent value)
Proportion (number of observations with this value divided by number of values)
Percent (number of observations with this value divided by number of values, times 100%)
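The definitions above can be checked by hand on a small list. The following is a minimal sketch in Python rather than R, using only the standard library; the data values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import mean, median, mode

# Hypothetical responses on a 1-5 scale (illustrative data, not from the course)
values = [2, 3, 3, 4, 5, 3, 1]

n = len(values)
avg = mean(values)           # sum of all values divided by number of values
mid = median(values)         # center of the ordered list of values
most_common = mode(values)   # most frequent value

# Proportion and percent of observations that take the value 3
count_threes = values.count(3)
proportion = count_threes / n
percent = proportion * 100

print(avg, mid, most_common, round(proportion, 3), round(percent, 1))
```

Note that the proportion and percent describe one chosen value at a time, while the mean, median, and mode describe the whole list.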

4 Measures of dispersion
Range: The spread between the minimum observed value and the maximum observed value.
Standard deviation: The square root of the average squared distance from the mean. For sample data the sum of squared distances is generally divided by n - 1 rather than n, which slightly inflates the result to adjust for sampling error.
Interquartile range: The spread between the first quartile and the third quartile.
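As a rough illustration of these three measures, here is a short Python sketch using the standard library. The data are hypothetical, and the quartile method ("inclusive") is an assumption: different programs interpolate quartiles slightly differently, so results from other software may not match exactly.

```python
from statistics import pstdev, quantiles, stdev

# Hypothetical interval-level observations (illustrative only)
values = [4, 8, 15, 16, 23, 42]

# Range: maximum observed value minus minimum observed value
value_range = max(values) - min(values)

# Sample standard deviation divides by n - 1 (the "inflated" version);
# the population version divides by n and is therefore slightly smaller.
sample_sd = stdev(values)
population_sd = pstdev(values)

# Quartiles via linear interpolation; the method choice is an assumption
q1, q2, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

print(value_range, round(sample_sd, 2), round(population_sd, 2), iqr)
```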

5 Which statistics to report
Levels of measurement determine what statistics are appropriate.
Nominal: None
Categorical: Mode, proportion/percent
Binary
Ordinal: Median, mode, proportion/percent (mean)
Interval: Mean, median, (mode, proportion/percent)

6 Introducing R
Downloading and installing the R suite: R, RStudio, MPA Stats Plugin
Operating R:
Open RStudio.
Check the box for “RcmdrPluginMPAStats”.
A new window should open. This is your primary interface with R.

7 Preparing your data for import
Clean your data and name your variables.
Format all values in the sheet as numbers by right-clicking, selecting “format cells” and selecting “number”.
Save as .xls, or save a version of the data-only tab as .csv.

8 Getting your data into R: xls
Open the MPA Rcommander plugin.
Select Data/Import data from excel file.
Enter a name for your dataset in R.
Navigate to your data file in .xls format.
Select the appropriate sheet (number-formatted data with variable names ONLY) and click “ok”.
Verify that the Dataset box is blue. If not, click on it and select your dataset.

9 Getting your data into R: csv
Open RCommander.
Select Data/Import data from text file, clipboard, or URL.
Enter a name for your dataset in R.
Check the box for “variable names in file”.
Select location as “local file system”.
Select field separator as “commas”.
Select decimal-point character as “period [.]”.
Click ok.
Navigate to your .csv file.
Verify that the dataset box is blue. If not, click on it and select your dataset.
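The import options above (variable names in the first row, commas as the field separator, a period as the decimal point) describe what a .csv file is expected to look like. The following Python sketch, which is not part of the R Commander workflow, shows how a file with that layout is read; the file contents and variable names are hypothetical.

```python
import csv
import io

# Hypothetical contents of a small .csv file: first row holds variable names,
# fields are separated by commas, decimals use a period.
csv_text = "id,age,income\n1,34,41.5\n2,51,38.0\n3,28,52.25\n"

# DictReader treats the first row as variable names
reader = csv.DictReader(io.StringIO(csv_text), delimiter=",")

# Convert every field to a number, as the sheet was formatted before export
rows = [{name: float(value) for name, value in row.items()} for row in reader]

variable_names = list(rows[0].keys())
print(variable_names)
print(len(rows), "observations")
```

If a value fails to convert to a number here, that is usually a sign the sheet still contains text or stray formatting that should be cleaned before import.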

10 Producing summary statistics
Click MPA Statistics/descriptive statistics/summarize data set.
Results appear in the RStudio console window.
If you expand the main console window before running the summary command, you can usually get all the values to show up in one row per variable.

11 Verifying that the data is “clean”
Are there any values outside those specified by your codebook?
Do all binary variables have a minimum of 0 and a maximum of 1?
Do all maxima and minima make sense?
Do the mean values make sense?
Do the standard deviations make sense?
Is the “N” for each variable the same (or is there a consistent and justifiable subset for only the applicable variables)?
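Several of these checks are mechanical and can be expressed as a short routine. The following is a minimal Python sketch of the idea, assuming a hypothetical dataset and codebook; the variable names, the allowed values, and the `check_variable` helper are all invented for illustration.

```python
# Hypothetical dataset; None marks a missing observation
data = {
    "employed": [1, 0, 1, 1, None],
    "satisfaction": [1, 5, 3, 7, 2],
}

# Hypothetical codebook listing the values each variable may take
codebook = {
    "employed": {0, 1},
    "satisfaction": {1, 2, 3, 4, 5},
}


def check_variable(name):
    """Return N, minimum, maximum, and any values the codebook does not allow."""
    observed = [v for v in data[name] if v is not None]
    out_of_range = [v for v in observed if v not in codebook[name]]
    return {
        "n": len(observed),
        "min": min(observed),
        "max": max(observed),
        "out_of_range": out_of_range,
    }


report = {name: check_variable(name) for name in data}
for name, result in report.items():
    print(name, result)
```

In this made-up example the report flags the value 7 for "satisfaction" and shows that "employed" has one fewer observation than "satisfaction", which are the kinds of discrepancies the questions above are meant to catch.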

12 Reading scientific notation
“e” followed by a number means “times ten to the power of [number].”
For negative powers, move the decimal to the left (making the number smaller).
For positive powers, move the decimal to the right (making the number larger).
6.81e+03 = 6.81 x 10^3 = 6810
3.24e-02 = 3.24 x 10^-2 = 0.0324
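The same notation is understood by most programming languages, which gives a quick way to check a conversion. A small Python sketch using the two values from the slide:

```python
# Parse the two values written in scientific notation
large = float("6.81e+03")   # decimal moves three places to the right
small = float("3.24e-02")   # decimal moves two places to the left

print(large)   # 6810.0
print(small)   # 0.0324

# Going the other way: format an ordinary number in scientific notation
formatted = f"{6810:.2e}"
print(formatted)   # 6.81e+03
```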

13 Introducing “factors”
R considers categorical variables to be “factors.” Sometimes it is useful to treat binary or ordinal variables as factors as well.
To create a “factor” from a numeric variable, select “data/manage variables in active data set/convert numeric variables to factors.”
Select one or more variables, select “supply level names” and enter either a new variable name (if you have only selected one variable to duplicate) or a prefix (such as f_).
Enter level names for each variable, according to the codebook.
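Conceptually, converting a numeric variable to a factor means attaching the codebook's level names to the numeric codes. The following Python sketch illustrates that idea only; it is not what R does internally, and the codes, level names, and the `f_rating` variable name are hypothetical.

```python
# Hypothetical numeric codes as they appear in the data sheet
rating = [1, 2, 2, 3, 1]

# Hypothetical codebook entry: level name for each numeric code
level_names = {1: "Low", 2: "Medium", 3: "High"}

# New variable with the f_ prefix, holding level names instead of codes
f_rating = [level_names[code] for code in rating]

print(f_rating)
```

Keeping the original numeric variable alongside the prefixed copy means numeric summaries remain available for the coded version.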

14 Summaries for factors
The summary data for factors in a dataset appears at the end of the summary for numeric data. R will report both the number of observations in a category (counts) and the percent of total observations in each category (percents).
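The counts and percents reported for a factor are simple tallies. A minimal Python sketch with a hypothetical factor (the `f_region` name and its levels are invented for illustration):

```python
from collections import Counter

# Hypothetical factor with three levels
f_region = ["North", "South", "North", "East", "North"]

# Counts: number of observations in each category
counts = Counter(f_region)

# Percents: share of total observations in each category
total = len(f_region)
percents = {level: 100 * count / total for level, count in counts.items()}

for level in counts:
    print(level, counts[level], percents[level])
```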

15 Additional summary statistics
You can generate additional summary statistics by selecting statistics/summaries/…
Active data set (min, max, median, mean, and 1st and 3rd quartiles for all variables)
Numerical summaries (select from mean, standard deviation, interquartile range, and other statistics for one or more variables; can be done by groups)
Frequency distributions (percent in each category for selected factors only)
Count missing observations (returns number of missing observations in each variable)
Table of statistics (can be used to generate tables of selected statistics for comparison between factor groups)
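Two of these options, summaries by group and counts of missing observations, can be illustrated with a short Python sketch. The records and group labels below are hypothetical, and this shows only the underlying arithmetic, not the R Commander output format.

```python
from statistics import mean

# Hypothetical records: (group label, value); None marks a missing observation
records = [("A", 10), ("B", 20), ("A", None), ("B", 26), ("A", 14)]

# Count missing observations in the value variable
missing = sum(1 for _, value in records if value is None)

# Collect the non-missing values for each group
groups = {}
for group, value in records:
    if value is not None:
        groups.setdefault(group, []).append(value)

# Mean of the variable within each factor group
group_means = {group: mean(values) for group, values in groups.items()}

print("missing:", missing)
print(group_means)
```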

16 Does mean=proportion?
No. But the mean can be used as a shortcut for identifying proportions for binary variables coded 1/0. Compare the formula for generating a mean with the formula for generating a proportion. They are NOT the same, but they generate the same result for properly coded binary variables.
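The comparison can be worked through on a small example. In the Python sketch below, the data are hypothetical; the second list shows the same answers miscoded as 1/2 to illustrate why the shortcut depends on 1/0 coding.

```python
from statistics import mean

# Hypothetical binary variable coded 1 = yes, 0 = no (five ones out of eight)
binary = [1, 0, 0, 1, 1, 0, 1, 1]

mean_value = mean(binary)                          # sum of values / number of values
proportion_ones = binary.count(1) / len(binary)    # number of ones / number of values

# The same answers miscoded as 2 = yes, 1 = no
miscoded = [2, 1, 1, 2, 2, 1, 2, 2]
miscoded_mean = mean(miscoded)
proportion_twos = miscoded.count(2) / len(miscoded)

print(mean_value, proportion_ones)       # equal under 1/0 coding
print(miscoded_mean, proportion_twos)    # no longer equal
```

With 1/0 coding the sum of the values is the same as the count of ones, which is why the two formulas agree in that case only.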

17 Generating boxplots in R
Select graphs/boxplot.
Select an ordinal or interval-level variable.
Select ok.
Results appear in an RStudio window.
Note that boxplots can be created by group.
The middle line represents the median; the box represents the range from the first to third quartiles (interquartile range); and the whiskers mark the cutoff points beyond which observations may be considered statistical outliers.
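The quantities a boxplot draws can be computed directly. The Python sketch below assumes the common convention of placing the outlier cutoffs 1.5 interquartile ranges beyond the quartiles; the slide does not state which rule is used, and the quartile method and the data are also assumptions, so the numbers may differ slightly from what R draws.

```python
from statistics import quantiles

# Hypothetical interval-level observations with one unusually large value
values = [2, 4, 5, 6, 7, 8, 9, 30]

# First quartile, median, and third quartile (interpolation method is an assumption)
q1, med, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Assumed cutoffs: 1.5 interquartile ranges beyond the box
lower_cutoff = q1 - 1.5 * iqr
upper_cutoff = q3 + 1.5 * iqr

outliers = [v for v in values if v < lower_cutoff or v > upper_cutoff]

print("median:", med)
print("box:", q1, "to", q3)
print("cutoffs:", lower_cutoff, "to", upper_cutoff)
print("possible outliers:", outliers)
```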

18 Appropriate graphics for reports
There are essentially three appropriate types of graphics for reports. These are:
2-D clustered column charts
2-D line graphs with markers
Boxplots comparing two or more groups
For particularly technical reports, histograms or scatterplots may also be appropriate, but these generally do not appear in professional reports. Note that pie charts are not appropriate.

19 Pie charts are bad. Courtesy of Michael Friendly at

20 Figure 1: I haven’t really eaten that much of the lemon pie

21 Presenting graphics
A table or chart should be completely self-contained.
Label every axis clearly and completely, including units of measure.
Title the graphic descriptively, clearly indicating the purpose of the graphic and what it represents.
Avoid any practices that may be misleading.
Avoid graphical distractions and manipulations, including 3-D effects.
If you plot multiple items simultaneously, label clearly and provide appropriate axes.

22 Presenting graphics, cont.
Match graphics to levels of measurement.
Column charts are usually the right choice.
Line graphs are only appropriate for interval data where the lines between points can be interpreted as meaningful estimates.
Where appropriate, include standard deviation or confidence interval data, particularly when the graphic is being used to illustrate predictions or forecasts.
Ensure readability. A good graphic is as effective in black and white as in color.

