Numerical Descriptives in R

Slides:



Advertisements
Similar presentations
Statistics for Decision Making Descriptive Statistics QM Fall 2003 Instructor: John Seydel, Ph.D.
Advertisements

Describing data with graphics and numbers. Types of Data Categorical Variables –also known as class variables, nominal variables Quantitative Variables.
Edpsy 511 Homework 1: Due 2/6.
Baburao Kamble (Ph.D) University of Nebraska-Lincoln Data Analysis Using R Week2: Data Structure, Types and Manipulation in R.
Statistics for Decision Making Bivariate Descriptive Statistics QM Fall 2003 Instructor: John Seydel, Ph.D.
STA Lecture 111 STA 291 Lecture 11 Describing Quantitative Data – Measures of Central Location Examples of mean and median –Review of Chapter 5.
Introduction to to R Emily Kalah Gade University of Washington Credit to Kristin Siebel for development of much of this PowerPoint.
Descriptive Statistics A Short Course in Statistics.
STA Lecture 131 STA 291 Lecture 13, Chap. 6 Describing Quantitative Data – Measures of Central Location – Measures of Variability (spread)
Descriptive Statistics1 LSSG Green Belt Training Descriptive Statistics.
Cereal Box Problem 6.SP - Develop understanding of statistical variability. 3. Recognize that a measure of center for a numerical data set summarizes all.
Chapter 2: Descriptive Statistics Adding MegaStat in Microsoft Excel Measures of Central Tendency Mode: The most.
1 Tutorial 2 GE 5 Tutorial 2  rules of engagement no computer or no power → no lesson no computer or no power → no lesson no SPSS → no lesson no SPSS.
T T03-01 Calculate Descriptive Statistics Purpose Allows the analyst to analyze quantitative data by summarizing it in sorted format, scattergram.
Summary Statistics and Mean Absolute Deviation MM1D3a. Compare summary statistics (mean, median, quartiles, and interquartile range) from one sample data.
Introduction to data analysis: Case studies with iSIKHNAS data Day 1 1/ 69.
Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/19.
The field of statistics deals with the collection,
Statistics with TI-Nspire™ Technology Module E Lesson 1: Elementary concepts.
Basics in R part 2. Variable types in R Common variable types: Numeric - numeric value: 3, 5.9, Logical - logical value: TRUE or FALSE (1 or 0)
Descriptive Statistics using R. Summary Commands An essential starting point with any set of data is to get an overview of what you are dealing with You.
Statistics and probability Dr. Khaled Ismael Almghari Phone No:
Lecture 8 Data Analysis: Univariate Analysis and Data Description Research Methods and Statistics 1.
S519: Evaluation of Information Systems Social Statistics Ch3: Difference.
Announcements Exams returned at end of class Average = 78 Standard Dev = 12 Key with explanations will be posted Don’t be discouraged: First test is often.
MAT 135 Introductory Statistics and Data Analysis Adjunct Instructor
Prof. Eric A. Suess Chapter 3
Lecture #3 Tuesday, August 30, 2016 Textbook: Sections 2.4 through 2.6
Tidy data, wrangling, and pipelines in R
Python for Data Analysis
Mean, Median, Mode and Standard Deviation (Section 11-1)
Applied Business Forecasting and Regression Analysis
Chapter 1: Exploring Data
Just the basics: Learning about the essential steps to do some simple things in SPSS Larkin Lamarche.
Descriptive Statistics (Part 2)
Module 6: Descriptive Statistics
Slides to accompany Weathington, Cunningham & Pittenger (2010), Statistics Review (Appendix A) Bring all three text books Bring index cards Chalk? White-board.
Measures of Central Tendency
Data manipulation in R: dplyr
Dplyr I EPID 799C Mon Sep
R Programming III: Real Things with Real Data!
ECONOMETRICS ii – spring 2018
R Data Manipulation Bootstrapping
Lab 2 Data Manipulation and Descriptive Stats in R
Theme 4 Describing Variables Numerically
Notes Over 7.7 Finding Measures of Central Tendency
IB Psychology Today’s Agenda: Turn in: Work Day… Take out:
Recoding III: Introducing apply()
Recoding II: Numerical & Graphical Descriptives
Quick Data Summaries in SAS
Recoding III: Introducing apply()
L07 Apply and purrr EPID 799C Fall 2018.
Tidy data, wrangling, and pipelines in R
An Introduction to Statistics
Mean, Median, and Range Obj. 5b.
CS3332(01) Course Description
Chapter 1: Exploring Data
Statistics 1: Introduction to Probability and Statistics
Producing Descriptive Statistics
SYMMETRIC SKEWED LEFT SKEWED RIGHT
Measures of Center and Spread
EECS3030(02) Course Description
Comparing Statistical Data
Chapter 1: Exploring Data
AP Statistics Section 1.2: Describing Distributions with Numbers
Chapter 1: Exploring Data
R-lab 2 -Dorji Pelzom.
Descriptive Statistics Frequency & Contingency Tables
R for Epi Workshop Module 2: Data Manipulation & Summary Statistics
Finding Statistics from Data
Presentation transcript:

Numerical Descriptives in R EPID 799C Monday, Sept. 11, 2017

Today’s Overview Lecture: Basic descriptive statistics in R Practice: Group exercises with the births dataset

Key Functions for Numerical Summary Measures Measures of Centrality mean() median() mode() Measures of Spread min() max() quantile() range() sd() var()

Key Functions for Numerical Summary Measures Measures of Centrality mean() median() mode() Measures of Spread min() max() quantile() range() sd() var() General syntax: function(dataset$variable) Example: mean(births$mage)

Key Functions for Numerical Summary Measures Measures of Centrality mean() median() mode() Measures of Spread min() max() quantile() range() sd() var() What happened?

Key Functions for Numerical Summary Measures Measures of Centrality mean() median() mode() Measures of Spread min() max() quantile() range() sd() var() General syntax: function(dataset$variable) If there is missing data: function(dataset$variable, na.rm = TRUE)

Key Functions for Numerical Summary Measures Measures of Centrality mean() median() mode() Measures of Spread min() max() quantile() range() sd() var()

Key Functions for Multiple Summary Measures summary() is a generic function that produces result summaries depending on the class of the first argument

Summary() is flexible & takes different classes of objects Looking ahead: summary() helps you look at and summarize your regression model output!

Summary() is flexible & takes different classes of objects Backing up: Factor variables are an important class in R and useful for producing summary statistics, plots, and regression models

Key Functions for Multiple Summary Measures describe() from the Hmisc package is another generic method for summarizing a vector, matrix, or dataframe

Key Functions for Multiple Summary Measures describe() from the Hmisc package is another generic method for summarizing a vector, matrix, or dataframe

Summary() and Describe() with Categorical Variables Use of these functions is not limited to numeric variables:

Summarizing Multiple Variables: The Apply() Family Using the sapply function to get summary statistics: Calculates means for all variables in births dataset (excludes missing values and returns NA for non-numeric variables) Using the lapply function to get summary statistics: Get mean, min, max, and sample size for specified variables, and bind together in a dataframe

Summary Statistics by Subset or Group Use indexing within the function Use tapply(Summary Variable, Group Variable, Function)

Summary Statistics by Subset or Group Many other ways to do this! My favorite: using the dplyr package (more on this soon)

Descriptive Statistics for Dates Use the lubridate package to work with dates/times In your homework: recode the dob string variable so that R recognizes it as a date Once R recognizes this variable as a date, you can use the same descriptive statistic functions to evaluate and summarize the date values in your data

Practice with Descriptive Statistics Practice exercises are posted to the course website All questions focus on the births dataset Work as a group on the descriptive statistics exercises Apply the functions we’ve introduced today For R veterans, try out the challenge questions Will require some pre-processing of the data