

Univariate EDA

Quantitative Univariate EDA, Slide #2: Exploratory Data Analysis
Univariate EDA
– Describe the distribution
– The distribution is concerned with what values a variable takes and how often it takes each value
Univariate EDA (for quantitative data)
– Graphically
– Numerically
– Model

Slide #3
What is this graph called?
How many lake trout were in the mm bin?
What is the most common range of lengths?
Which range of lengths has the fewest lake trout?
How many lake trout were exactly 108 mm?

Slide #4
What four things are described?
– Shape
– Outliers
– Center
– Dispersion

Slide #5: Shape
What are these three shapes?
– Symmetric
– Left-skewed
– Right-skewed

Slide #6: Outliers
What is an outlier?
– Individual(s) that is/are distinctly separate* from the main cluster of individuals
*at least one or two bars removed
*only one or two individuals
*on the margins of the distribution

Slide #7: Center
What are the two measures of center?
– Mean (arithmetic average)
– Median (value in the middle of ordered data)
μ = population mean
x̄ = sample mean
M = sample median

Slide #8
Compute x̄ and M for the values (faculty salaries) below, with and without the red value.
38, 46, 42, 44, 44, 43, 45, 45, 46, 44, 139
Examine the meanMedian() graphic.
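The exercise above can be checked in base R. This is only a minimal sketch: it assumes the red value is 139 (the one salary far from the rest), and the names `salaries` and `no_red` are chosen here for illustration.

```r
# Faculty salaries from the slide; 139 is ASSUMED to be the red value
salaries <- c(38, 46, 42, 44, 44, 43, 45, 45, 46, 44, 139)
no_red   <- salaries[salaries != 139]

# Mean and median with the red value included
mean(salaries)    # about 52.4
median(salaries)  # 44

# Mean and median with the red value removed
mean(no_red)      # 43.7
median(no_red)    # 44
```

Under that assumption, dropping the one extreme salary moves the mean by almost 9 units while the median stays at 44.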

Slide #9: Adequacy of Mean?
18, 19, 20, 21, 22 → x̄ = 20
5, 15, 20, 25, 35 → x̄ = 20
Does the mean adequately relate all pertinent information for these samples? If not, what is missing?
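One way to see what the mean leaves out is to compare the spread of the two samples. The sketch below uses base R only; the names `a` and `b` are illustrative.

```r
# The two samples from the slide
a <- c(18, 19, 20, 21, 22)
b <- c(5, 15, 20, 25, 35)

# Identical centers
mean(a)          # 20
mean(b)          # 20

# Very different spreads
diff(range(a))   # 4  (maximum minus minimum)
diff(range(b))   # 30
sd(a)            # about 1.58
sd(b)            # about 11.18
```

Both samples share the same mean, so the mean alone cannot distinguish them; a measure of dispersion is needed as well.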

Slide #10: Dispersion
Dispersion: variability among individuals
What are the three measures of dispersion?
– Range (minimum, maximum)
– Inter-Quartile Range (IQR; Q1, Q3)
– Standard Deviation (average difference from mean)
σ = population standard deviation
s = sample standard deviation

Slide #11: Standard Deviation
Calculation Steps
1) Find the sample mean
2) Find each difference from the mean
3) Square each difference
4) Sum squared differences
5) Divide by n-1
6) Square root
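The six calculation steps can be traced one at a time in base R. This sketch uses the five values from the exercise on the next slide (5, 8, 9, 11, 12); the variable names are illustrative, and the last line compares the result with R's built-in `sd()`.

```r
x <- c(5, 8, 9, 11, 12)
n <- length(x)

xbar     <- mean(x)             # 1) sample mean: 9
diffs    <- x - xbar            # 2) differences: -4 -1 0 2 3
sq_diffs <- diffs^2             # 3) squared differences: 16 1 0 4 9
ss       <- sum(sq_diffs)       # 4) sum of squared differences: 30
variance <- ss / (n - 1)        # 5) divide by n-1: 7.5
s        <- sqrt(variance)      # 6) square root: about 2.74

# The built-in function should agree with the hand calculation
all.equal(s, sd(x))             # TRUE
```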

Slide #12
Compute s from the values below (use Table 3.4 in the book as a model).
5, 8, 9, 11, 12
Compute the IQR of the values (faculty salaries) below, with and without the red value.
38, 46, 42, 44, 44, 43, 45, 45, 46, 44, 139
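The IQR part of this exercise can be explored with base R's `quantile()` and `IQR()`. Two assumptions apply: the red value is taken to be 139, and R's default quartile rule (type 7 interpolation) is used, which may not match the hand method in the book, so hand-computed quartiles can differ slightly.

```r
# Faculty salaries from the slide; 139 is ASSUMED to be the red value
salaries <- c(38, 46, 42, 44, 44, 43, 45, 45, 46, 44, 139)
no_red   <- salaries[salaries != 139]

# Quartiles and IQR with the red value (R default, type 7)
quantile(salaries, c(0.25, 0.75))  # Q1 = 43.5, Q3 = 45.5
IQR(salaries)                      # 2

# Quartiles and IQR without the red value
quantile(no_red, c(0.25, 0.75))    # Q1 = 43.25, Q3 = 45
IQR(no_red)                        # 1.75

# For contrast, the range reacts strongly to the same value
diff(range(salaries))              # 101
diff(range(no_red))                # 8
```

Under these assumptions the IQR changes by only a quarter of a unit when the extreme salary is removed, while the range drops from 101 to 8.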

Slide #13: Quantitative Univariate EDA in R
Examine Handout
– hist()
– Summarize()
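The handout itself is not part of this transcript, so the following is only a rough sketch of the same kind of workflow. It reuses the faculty salaries from the earlier slides rather than the handout's data, and it substitutes base R's `summary()` for `Summarize()`, which is not a base R function and is assumed here to come from an add-on package used in the course.

```r
# Illustrative data: faculty salaries from the earlier slides
salaries <- c(38, 46, 42, 44, 44, 43, 45, 45, 46, 44, 139)

# Compute the histogram bins without drawing, to inspect them
h <- hist(salaries, plot = FALSE)
h$breaks   # bin boundaries chosen by R
h$counts   # number of individuals in each bin

# Draw the histogram (graphical summary)
hist(salaries, xlab = "Salary", main = "")

# Base R numerical summary: min, quartiles, median, mean, max
summary(salaries)
sd(salaries)
```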

Slide #14: Overall Numerical Summaries
– If outliers exist, then use the median and IQR
– If outliers do not exist, but the distribution is strongly skewed, then use the median and IQR
– If outliers do not exist and the distribution is symmetric or only slightly skewed, then use the mean and standard deviation
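The three rules above reduce to a single decision, sketched below as a small R function. `choose_summaries()` is a hypothetical helper written for illustration, not something from the course materials; judging whether outliers or strong skew are present is still left to the analyst, who passes those judgments in as `TRUE` or `FALSE`.

```r
# Hypothetical helper: encodes the slide's rule for picking summaries
choose_summaries <- function(has_outliers, strongly_skewed) {
  if (has_outliers || strongly_skewed) {
    # Outliers or strong skew: use the median and IQR
    c(center = "median", dispersion = "IQR")
  } else {
    # No outliers, symmetric or only slightly skewed: mean and SD
    c(center = "mean", dispersion = "standard deviation")
  }
}

choose_summaries(has_outliers = TRUE,  strongly_skewed = FALSE)
choose_summaries(has_outliers = FALSE, strongly_skewed = FALSE)
```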

Slide #15
What four items are described in a univariate EDA for quantitative data?
Describe a univariate EDA for the data in Figure 1 and Table 1.

Slide #16
Describe a univariate EDA for the data in Figure 2 and Table 2.

Slide #17
Describe a univariate EDA for the data in Figure 3.
Figure 3. Histogram of 1996 tuition for 30 public and 50 private colleges and universities.

Slide #18
Figure 4. Boxplot of 1996 tuition for 30 public and 50 private colleges and universities.
The distribution of tuition for private schools is left-skewed with no obvious outliers, centered on a median of 25430, with an IQR from to (Figure 4; Table 3). The distribution of tuition for public schools is right-skewed with one outlier at a tuition of 23460, centered on a median of 13590, with an IQR from to (Figure 4; Table 3). I chose to use the median and IQR as measures of center and dispersion because of the outlier and the skewness of the distributions.
Table 3. Summary statistics of 1996 tuition for 30 public and 50 private colleges and universities. Columns: Statistic, Public, Private. Rows: Mean, Std. Dev., Min, 1st Qu., Median, 3rd Qu., Max.