Exploring Data in R
Introduction to R, Part II
Anna Blackstock
Statistician, Biostatistics and Information Management Office (BIMO), NCEZID/DFWED
Exploratory Data Analysis
Exploratory Data Analysis (EDA) is a term used to describe the process of exploring general dataset characteristics.
Three realms of EDA*:
- Transformation
- Visualization
- Modelling
*From the book “R for Data Science”
Exploratory Data Analysis
There is not a specific formula for doing EDA “the right way.”
Goals of EDA:
- Checking data quality
  - Any mistakes?
  - Any unusual or unexpected values?
- Understanding distributions
- Investigating relationships between variables
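As a rough illustration of the data-quality goal, the sketch below counts missing values, looks for duplicated rows, and flags unusually large or small values. It uses the `airquality` dataset that ships with base R purely as a stand-in (an assumption; the slides do not name a dataset), and the 1.5 x IQR rule applied by `boxplot.stats()` is only one of several possible ways to flag unusual values.

```r
# Stand-in data: airquality ships with base R (not from the original slides)
df <- airquality

# Any mistakes? Count missing values in each column
na_counts <- colSums(is.na(df))
print(na_counts)

# Count rows that are exact duplicates of an earlier row
n_dup <- sum(duplicated(df))
print(n_dup)

# Any unusual or unexpected values? Check the minimum and maximum of each column
ranges <- sapply(df, range, na.rm = TRUE)
print(ranges)

# Flag values of one variable that fall outside the boxplot whiskers (1.5 x IQR rule)
ozone_outliers <- boxplot.stats(df$Ozone)$out
print(ozone_outliers)
```

Values flagged this way are candidates for a closer look, not automatically errors.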
EDA Today
- Will cover a few basic functions that you can use to get started with data exploration
- Will NOT provide in-depth instruction on EDA tools (especially data transformation; stay tuned for other courses)
Exploring Data
When you first read your data into R, you will want to do a few data checks. Functions you may consider:
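The function list from this slide did not survive in the transcript. The sketch below shows base R functions that are commonly used for a first look at a data frame; the selection is an assumption and may not match the presenter's list. The built-in `mtcars` dataset stands in for data you have just read in.

```r
# Stand-in for a freshly imported data frame (mtcars ships with base R)
dat <- mtcars

dim(dat)       # number of rows and columns
names(dat)     # column names
str(dat)       # type of each column plus the first few values
head(dat)      # first six rows
tail(dat, 3)   # last three rows
summary(dat)   # per-column summaries (min, quartiles, mean, max)
```

Together these calls show whether the data have the expected shape, column types, and value ranges.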
EDA for Categorical Variables
- Tables: univariate and multi-way tables
- Plots: bar plots, stacked bar plots
- Statistical tests: chi-square tests
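A minimal sketch of these three tools in base R follows. It assumes the built-in `mtcars` dataset, with `cyl` and `am` converted to factors so they behave as categorical variables; the dataset and variable choices are illustrative and are not taken from the slides.

```r
# Illustrative data: treat two mtcars columns as categorical variables
cars <- mtcars
cars$cyl <- factor(cars$cyl)
cars$am  <- factor(cars$am, labels = c("automatic", "manual"))

# Tables: univariate and two-way
cyl_tab <- table(cars$cyl)
two_way <- table(cars$cyl, cars$am)
print(cyl_tab)
print(two_way)
print(prop.table(two_way, margin = 1))   # row proportions

# Plots: bar plot and stacked bar plot
barplot(cyl_tab, xlab = "Cylinders", ylab = "Count")
barplot(two_way, legend.text = TRUE,
        xlab = "Transmission", ylab = "Count")   # bars stacked by cylinder count

# Statistical test: chi-square test of independence
# (R warns here because some expected counts are small;
#  fisher.test(two_way) is an alternative in that situation)
chi <- chisq.test(two_way)
print(chi)
```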
EDA for Continuous Variables
- Summary statistics: mean, variance, percentiles, ...
- Plots: scatterplots, box plots
- Statistical tests: t-tests
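These tools might look as follows in base R. The example again assumes the built-in `mtcars` dataset, and the reference value `mu = 20` in the one-sample t-test is arbitrary, chosen only to show the call.

```r
# Illustrative continuous variables from mtcars
mpg <- mtcars$mpg
wt  <- mtcars$wt

# Summary statistics
mpg_mean  <- mean(mpg)
mpg_var   <- var(mpg)
mpg_pctls <- quantile(mpg, probs = c(0.25, 0.50, 0.75))
print(c(mean = mpg_mean, variance = mpg_var))
print(mpg_pctls)
print(summary(mpg))

# Plots: scatterplot of two continuous variables, box plot of one
plot(wt, mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
boxplot(mpg, ylab = "Miles per gallon")

# Statistical test: one-sample t-test against an arbitrary reference mean
tt <- t.test(mpg, mu = 20)
print(tt)
```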
EDA for Categorical + Continuous Variables
- Summary statistics: mean, variance, etc. by category
- Plots: box plots, bubble plots
- Statistical tests: t-tests
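One way to sketch the by-category versions of these tools in base R is shown below. It assumes the built-in `ToothGrowth` dataset, where `len` is continuous and `supp` is a two-level factor; the dataset is a stand-in and is not taken from the slides. Bubble plots are omitted to keep the example short.

```r
# Illustrative data: len is continuous, supp is a factor with levels OJ and VC
tg <- ToothGrowth

# Summary statistics by category
group_means <- aggregate(len ~ supp, data = tg, FUN = mean)
group_vars  <- tapply(tg$len, tg$supp, var)
print(group_means)
print(group_vars)

# Plot: side-by-side box plots, one per category
boxplot(len ~ supp, data = tg,
        xlab = "Supplement", ylab = "Tooth length")

# Statistical test: two-sample t-test (Welch's version is R's default)
tt2 <- t.test(len ~ supp, data = tg)
print(tt2)
```

The formula interface (`len ~ supp`) reads as "length by supplement" and works the same way in `aggregate()`, `boxplot()`, and `t.test()`.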
Where to next?
- Take an online course
- Keep an eye out for future CDC courses
- See the book “R for Data Science” by Garrett Grolemund and Hadley Wickham
- Find an online guide to EDA