EMPA Statistical Analysis


EMPA Statistical Analysis Week 2

Review: Levels of Measurement
- Nominal
- Categorical
- Binary
- Ordinal
- Interval

Measures of Central Tendency
- Mean (sum of all values divided by number of values)
- Median (center of ordered list of values)
- Mode (most frequent value)
- Proportion (number of observations with this value divided by number of values)
- Percent (number of observations with this value divided by number of values, times 100%)
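A minimal Python sketch of these five definitions. The course itself works through R Commander menus, so this is only an illustration; the data values and the variable names (`values`, `target`) are made up.

```python
from statistics import mean, median, multimode

# Hypothetical responses on a 1-5 scale (illustrative data only).
values = [2, 3, 3, 4, 5, 3, 1, 4]
n = len(values)

avg = mean(values)          # sum of all values divided by number of values
mid = median(values)        # center of the ordered list of values
modes = multimode(values)   # most frequent value(s), returned as a list

# Proportion and percent of observations equal to one chosen value.
target = 3
proportion = values.count(target) / n
percent = proportion * 100

print(avg, mid, modes, proportion, percent)
```

`multimode` is used rather than `mode` so that ties between equally frequent values are all reported.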

Measures of dispersion
- Range: The spread between the minimum observed value and the maximum observed value
- Standard deviation: The square root of the average squared distance from the mean. For sample data, the sum of squared distances is generally divided by n - 1 rather than n, which inflates the result slightly to adjust for sampling error
- Interquartile range: The spread between the first quartile and the third quartile
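The three measures can be sketched in Python as follows. The data are made up, and the quartile method is an assumption: software packages compute quartiles in slightly different ways, so the interquartile range reported by other tools may differ a little from this one.

```python
from statistics import pstdev, stdev, quantiles

# Hypothetical observations (illustrative data only).
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: maximum observed value minus minimum observed value.
value_range = max(values) - min(values)

# Standard deviation: pstdev divides by n, stdev divides by n - 1.
population_sd = pstdev(values)
sample_sd = stdev(values)   # the "inflated" version used for samples

# Interquartile range: third quartile minus first quartile.
q1, q2, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

print(value_range, population_sd, sample_sd, iqr)
```

The sample standard deviation is always at least as large as the population version computed on the same values, which is the adjustment the definition above describes.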

Which statistics to report
Levels of measurement determine what statistics are appropriate.
- Nominal: None
- Categorical, Binary: Mode, proportion/percent
- Ordinal: Median, mode, proportion/percent (mean)
- Interval: Mean, median, (mode, proportion/percent)

Introducing R
- Downloading and installing the R suite
  - R
  - RStudio
  - MPA Stats Plugin
- Operating R
  - Open RStudio
  - Check the box for “RcmdrPluginMPAStats”
  - A new window should open. This is your primary interface with R.

Preparing your data for import
- Clean your data and name your variables.
- Format all values in the sheet as numbers by right-clicking, selecting “format cells” and selecting “number.”
- Save as .xls, or save a version of the data-only tab as .csv.

Getting your data into R: xls
- Open the MPA Rcommander plugin
- Select Data/Import data from excel file
- Enter a name for your dataset in R
- Navigate to your data file in .xls format
- Select the appropriate sheet (number-formatted data with variable names ONLY) and click “ok”
- Verify that the Dataset box is blue. If not, click on it and select your dataset.

Getting your data into R: csv
- Open RCommander
- Select Data/Import data from text file, clipboard, or URL
- Enter a name for your dataset in R
- Check the box for “variable names in file”
- Select location as “local file system”
- Select field separator as “commas”
- Select decimal-point character as “period [.]”
- Click ok
- Navigate to your .csv file
- Verify that the dataset box is blue. If not, click on it and select your dataset.
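The import settings above (variable names in the first row, comma separators, period as the decimal point) describe the file layout, not just the menu. A small Python sketch of reading such a file, as an illustration only; the file contents and variable names are hypothetical, and an in-memory string stands in for the .csv file.

```python
import csv
import io

# Hypothetical contents of a cleaned .csv file: the first row holds the
# variable names, fields are separated by commas, decimals use a period.
csv_text = "age,employed,income\n34,1,52000.50\n41,0,0\n29,1,38250.75\n"

# DictReader takes the variable names from the first row of the file.
reader = csv.DictReader(io.StringIO(csv_text), delimiter=",")

# Every value is expected to be number-formatted, so convert to float.
dataset = [{name: float(value) for name, value in row.items()} for row in reader]
variable_names = list(dataset[0].keys())

print(variable_names)
print(dataset)
```

If a cell is not number-formatted, `float` raises an error, which is one way to catch a sheet that was not cleaned before export.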

Producing summary statistics
- Click MPA Statistics/descriptive statistics/summarize data set
- Results appear in the RStudio console window
- If you expand the main console window before running the summary command, you can usually get all the values to show up in one row per variable
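For a sense of what a one-row-per-variable summary contains, here is a rough Python sketch. It is not the plugin's output format, which the slides do not show; the dataset, the `summarize` helper, and the chosen statistics are assumptions for illustration.

```python
from statistics import mean, stdev

# Hypothetical dataset: one list of values per variable.
dataset = {
    "age": [34, 41, 29, 52],
    "employed": [1, 0, 1, 1],
}

def summarize(values):
    """Return basic descriptive statistics for one variable."""
    return {
        "n": len(values),
        "mean": mean(values),
        "sd": stdev(values),   # sample standard deviation (divides by n - 1)
        "min": min(values),
        "max": max(values),
    }

summary = {name: summarize(values) for name, values in dataset.items()}

# Print one row per variable.
for name, stats in summary.items():
    print(name, stats)
```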

Verifying that the data is “clean”
- Are there any values outside those specified by your codebook?
- Do all binary variables have a minimum of 0 and a maximum of 1?
- Do all maxima and minima make sense?
- Do the mean values make sense?
- Do the standard deviations make sense?
- Is the “N” for each variable the same (or is there a consistent and justifiable subset for only the applicable variables)?
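Two of these checks, values outside the codebook and a consistent N, are mechanical enough to sketch in Python. The codebook ranges, the dataset, and the `out_of_range` helper are all hypothetical; the remaining "does it make sense" questions still need a human reader.

```python
# Hypothetical codebook: allowed (minimum, maximum) for each variable.
codebook = {"age": (18, 99), "employed": (0, 1), "satisfaction": (1, 5)}

# Hypothetical dataset with two deliberate problems.
dataset = {
    "age": [34, 41, 29, 152],      # 152 is a data-entry error
    "employed": [1, 0, 1, 1],      # binary: minimum 0, maximum 1
    "satisfaction": [4, 5, 3],     # one observation short
}

def out_of_range(values, low, high):
    """Return the values that fall outside the codebook limits."""
    return [v for v in values if not low <= v <= high]

# Keep only the variables that have at least one out-of-range value.
checked = {name: out_of_range(dataset[name], *codebook[name]) for name in dataset}
problems = {name: bad for name, bad in checked.items() if bad}

# Compare the number of observations ("N") across variables.
n_by_variable = {name: len(values) for name, values in dataset.items()}
consistent_n = len(set(n_by_variable.values())) == 1

print(problems)
print(n_by_variable, consistent_n)
```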

Reading scientific notation
- “e” followed by a number means “times ten to the power of [number].”
- For negative powers, move the decimal point to the left (making the number smaller).
- For positive powers, move the decimal point to the right (making the number larger).
- 6.81e+03 = 6.81 × 10^3 = 6810
- 3.24e-02 = 3.24 × 10^-2 = 0.0324
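The same "e" notation is used by most programming languages, so the two examples can be checked directly. A short Python sketch (not part of the R workflow):

```python
# Convert text in "e" notation to ordinary numbers.
large = float("6.81e+03")   # decimal point moves three places to the right
small = float("3.24e-02")   # decimal point moves two places to the left

# Go the other way: format an ordinary number in "e" notation.
formatted = f"{6810:.2e}"

print(large, small, formatted)
```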

Introducing “factors”
R considers categorical variables to be “factors.” Sometimes it is useful to treat binary or ordinal variables as factors as well.
To create a “factor” from a numeric variable:
- Select “data/manage variables in active data set/convert numeric variables to factors.”
- Select one or more variables.
- Select “supply level names” and enter either a new variable name (if you have only selected one variable to duplicate) or a prefix (such as f_).
- Enter level names for each variable, according to the codebook.

Summaries for factors
The summary data for factors in a dataset appears at the end of the summary for numeric data. R will report both the number of observations in each category (counts) and the percent of total observations in each category (percents).
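As an illustration of what a factor and its summary amount to, the Python sketch below maps numeric codes to level names and then reports counts and percents. The codebook, the codes, and the `f_region` name (echoing the f_ prefix convention) are hypothetical.

```python
from collections import Counter

# Hypothetical codebook: numeric code -> level name.
level_names = {1: "Urban", 2: "Suburban", 3: "Rural"}

# Hypothetical numeric variable as stored in the data file.
region_codes = [1, 2, 2, 3, 1, 2, 2, 1]

# The "factor" version: each code replaced by its level name.
f_region = [level_names[code] for code in region_codes]

# Counts and percents per category.
counts = Counter(f_region)
percents = {level: 100 * count / len(f_region) for level, count in counts.items()}

print(dict(counts))
print(percents)
```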

Additional summary statistics
You can generate additional summary statistics by selecting statistics/summaries/…
- Active data set (min, max, median, mean, and 1st and 3rd quartiles for all variables)
- Numerical summaries (select from mean, standard deviation, interquartile range, and other statistics for one or more variables; can be done by groups)
- Frequency distributions (percent in each category for selected factors only)
- Count missing observations (returns number of missing observations in each variable)
- Table of statistics (can be used to generate tables of selected statistics for comparison between factor groups)
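Two of these options, counting missing observations and computing a statistic by group, can be sketched in Python as follows. The records, the group labels, and the use of `None` for a missing value are assumptions made for the example.

```python
from statistics import mean

# Hypothetical records; None marks a missing observation.
records = [
    {"group": "A", "score": 10},
    {"group": "A", "score": 14},
    {"group": "B", "score": 20},
    {"group": "B", "score": None},
    {"group": "B", "score": 24},
]

# Count missing observations for the "score" variable.
missing = sum(1 for r in records if r["score"] is None)

# Collect the non-missing scores for each factor group.
by_group = {}
for r in records:
    if r["score"] is not None:
        by_group.setdefault(r["group"], []).append(r["score"])

# Table of one statistic (the mean) for comparison between groups.
group_means = {group: mean(scores) for group, scores in by_group.items()}

print(missing, group_means)
```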

Does mean=proportion?
No. But the mean can be used as a shortcut for identifying proportions for binary variables coded 1/0. Compare the formula for generating a mean with the formula for generating a proportion. They are NOT the same, but they generate the same result for properly coded binary variables.
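The reason is that for a variable coded 1/0, the sum of the values equals the count of 1s, so both formulas divide the same number by n. A short Python sketch with made-up data, including a miscoded variable (1/2 instead of 0/1) where the shortcut fails:

```python
# Hypothetical binary variable, properly coded 1/0.
employed = [1, 0, 1, 1, 0, 1, 1, 0]
n = len(employed)

mean_value = sum(employed) / n           # formula for a mean
proportion_ones = employed.count(1) / n  # formula for a proportion

# The same variable miscoded as 2 = yes, 1 = no.
miscoded = [2, 1, 2, 2, 1, 2, 2, 1]
miscoded_mean = sum(miscoded) / n
miscoded_proportion = miscoded.count(2) / n

print(mean_value, proportion_ones)         # equal
print(miscoded_mean, miscoded_proportion)  # not equal
```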

Generating boxplots in R
- Select graphs/boxplot
- Select an ordinal or interval-level variable
- Select ok
- Results appear in an RStudio window
Note that boxplots can be created by group. The middle line represents the median; the box represents the range from the first to third quartiles (interquartile range); and the whiskers mark the cutoff points beyond which observations may be considered statistical outliers.
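The quantities a boxplot draws can be computed by hand. The Python sketch below assumes the common convention of placing the outlier cutoffs 1.5 interquartile ranges beyond the box; the data are made up, and because quartile methods vary between packages, a plotted box may differ slightly from these numbers.

```python
from statistics import quantiles

# Hypothetical observations with one unusually large value.
values = [12, 15, 15, 17, 18, 19, 21, 22, 24, 60]

# Box: first quartile, median (middle line), third quartile.
q1, med, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Assumed cutoff rule: 1.5 * IQR beyond each end of the box.
lower_cutoff = q1 - 1.5 * iqr
upper_cutoff = q3 + 1.5 * iqr

# Whiskers reach the most extreme observations inside the cutoffs.
inside = [v for v in values if lower_cutoff <= v <= upper_cutoff]
whisker_low, whisker_high = min(inside), max(inside)

# Observations outside the cutoffs are candidate outliers.
outliers = [v for v in values if v < lower_cutoff or v > upper_cutoff]

print(q1, med, q3, whisker_low, whisker_high, outliers)
```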

Appropriate graphics for reports
There are essentially three appropriate types of graphics for reports. These are:
- 2-D clustered column charts
- 2-D line graphs with markers
- Boxplots comparing two or more groups
For particularly technical reports, histograms or scatterplots may also be appropriate, but these generally do not appear in professional reports. Note that pie charts are not appropriate.

Pie charts are bad. Courtesy of Michael Friendly at http://www.datavis.ca/gallery/evil-pies.php

Figure 1: I haven’t really eaten that much of the lemon pie

Presenting graphics
- A table or chart should be completely self-contained
- Label every axis clearly and completely, including units of measure
- Title the graphic descriptively, clearly indicating the purpose of the graphic and what it represents
- Avoid any practices that may be misleading
- Avoid graphical distractions and manipulations, including 3-D effects
- If you plot multiple items simultaneously, label clearly and provide appropriate axes

Presenting graphics, cont.
- Match graphics to levels of measurement
  - Column charts are usually the right choice
  - Line graphs are only appropriate for interval data where the lines between points can be interpreted as meaningful estimates
- Where appropriate, include standard deviation or confidence interval data
  - Particularly when the graphic is being used to illustrate predictions or forecasts
- Ensure readability. A good graphic is as effective in black and white as in color.