Class Session #2 Numerically Summarizing Data

Presentation transcript:

Class Session #2 Numerically Summarizing Data Measures of Central Tendency Measures of Dispersion Measures of Central Tendency and Dispersion from Grouped Data Measures of Position

Recall the Definitions Parameter – a descriptive measure of a population (p = parameter = population, usually in Greek letters) Statistic – a descriptive measure of a sample (s = statistic = sample, usually in Roman letters)

Common “descriptions” ? Average ? – “typical” as described in the news reports Give some of today’s examples Data distributions’ “characteristics” Shape – look at a picture (histogram) Center – mean, mode, median Spread – range, variance, std. dev.

Central Tendency Definitions Arithmetic mean – the sum of all the values of the variable in the data set, divided by the number of observations Population arithmetic mean – computed using all the individuals in the population (“mu” = μ, the Greek letter, not the micro sign µ) Sample arithmetic mean – computed using the sample data (“x-bar” = x̄) Note: x̄ is a statistic, μ is a parameter

More Central Tendency Defs Median – the value that lies in the middle of the data, when arranged in ascending order (think of the median strip of highway in the middle of the road) Mode – the most frequent observation of the variable in the data set (think “a la mode” in fashion /on top)
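
The three measures of center can be sketched with Python's standard statistics module. The data values below are hypothetical and are not taken from the slides.

```python
import statistics

# Hypothetical sample of seven observations (not from the slides).
data = [4, 8, 8, 10, 13, 20, 21]

mean = statistics.mean(data)        # sum of the values divided by the number of values
median = statistics.median(data)    # middle value once the data are sorted
modes = statistics.multimode(data)  # list of every most-frequent value

print("mean:", mean)
print("median:", median)
print("mode(s):", modes)
```

multimode returns a list because a data set may have no repeated value or several equally frequent values.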

Measures of Dispersion Definitions Range (R) – the difference between the largest data value (maximum) & the smallest data value (minimum) Deviation about the mean – how “spread out” the data is. ? for both population and sample variance, the sum of all deviations about the mean equals what ? ? the square of a non-zero number is ?
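
A minimal sketch of the range and the deviations about the mean, using a small hypothetical data set. It shows why raw deviations cannot measure spread on their own, and why squaring them helps.

```python
# Hypothetical data set (not from the slides).
data = [2, 4, 4, 6, 9]

data_range = max(data) - min(data)        # largest value minus smallest value
mean = sum(data) / len(data)

deviations = [x - mean for x in data]     # each value's distance from the mean
squared = [d ** 2 for d in deviations]    # squaring removes the sign

total_deviation = sum(deviations)         # positive and negative deviations cancel
total_squared = sum(squared)              # squared deviations do not cancel

print("range:", data_range)
print("sum of deviations:", total_deviation)
print("sum of squared deviations:", total_squared)
```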

More Measures of Dispersion Definitions Population Variance – sum of squared deviations about the population mean, divided by the number of observations in the population N (sigma squared) ? i.e. population variance is the mean of the ______ _________ ____ __ _________ ___ ? Answer: Population variance is the mean of the squared deviations about the population mean

More Measures of Dispersion Definitions Sample Variance – sum of the squared deviations about the sample mean, divided by the number of observations minus one (s squared) Degrees of freedom is the “n-1”

More Measures of Dispersion Definitions Population Standard Deviation – the square root of the population variance (sigma, written as “σ”) Sample Standard Deviation – the square root of the sample variance (s, written as “s”) BTW, later we discover “s” itself is a random variable
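
The population and sample versions differ only in the divisor (N versus n-1). The sketch below treats the same hypothetical values first as a whole population and then as a sample, using the standard statistics module.

```python
import statistics

# Hypothetical values (not from the slides).
data = [2, 4, 4, 6, 9]

# Treating the values as an entire population: divide by N.
pop_variance = statistics.pvariance(data)
pop_std_dev = statistics.pstdev(data)

# Treating the values as a sample: divide by n - 1 (the degrees of freedom).
sample_variance = statistics.variance(data)
sample_std_dev = statistics.stdev(data)

print("population variance:", pop_variance, " std dev:", pop_std_dev)
print("sample variance:    ", sample_variance, " std dev:", sample_std_dev)
```

The sample variance is always the larger of the two for the same values, because the same sum of squared deviations is divided by a smaller number.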

Empirical Rule for Symmetric Data If the distribution is bell shaped: approximately 68% of the data lie within 1 standard deviation of the mean, approximately 95% within 2 standard deviations of the mean, and approximately 99.7% within 3 standard deviations of the mean. Rule holds for both samples & populations
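
The three percentages can be checked against an idealized bell curve. This sketch assumes an exactly normal distribution, which real data only approximate.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
bell = NormalDist(mu=0, sigma=1)

# Share of the distribution lying within k standard deviations of the mean.
shares = {k: bell.cdf(k) - bell.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```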

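Supposing Grouped Data Approximate mean of a variable from a frequency distribution Use the midpoint of each class Use the frequency of each class Use the number of classes Population Mean μ = [sum of (midpoint times frequency)] / [sum of frequencies] Sample Mean x̄ = [sum of (midpoint times frequency)] / [sum of frequencies]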

Supposing Grouped Data Weighted Mean Good to use when certain data values have higher importance (or weight) [Sum of each value of variable times its weight] / [sum of weights] Examples of Grade Point Average (GPA) and mixed nuts pricing
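
A minimal sketch of the weighted mean using a GPA-style calculation. The grade points and credit hours are hypothetical, and the helper name weighted_mean is chosen here for illustration.

```python
def weighted_mean(values, weights):
    """Return [sum of each value times its weight] / [sum of weights]."""
    total = sum(v * w for v, w in zip(values, weights))
    return total / sum(weights)

# Hypothetical semester: grade points earned and credit hours per course.
grade_points = [4.0, 3.0, 2.0]
credit_hours = [3, 4, 2]

gpa = weighted_mean(grade_points, credit_hours)
print(f"GPA: {gpa:.2f}")
```

The credit hours act as the weights, so a four-credit course pulls the result more than a two-credit course does.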

Supposing Grouped Data Population Variance: sum of [(midpoint – mean)^2 times frequency] / [sum of frequencies] Sample Variance: as before, except the denominator is [(sum of frequencies) – 1] (the degrees of freedom thing again)

Supposing Grouped Data Population Standard Deviation take square root of population variance Sample Standard Deviation take square root of sample variance
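
The grouped-data approximations can be sketched as follows. The class midpoints and frequencies are hypothetical, and the results are approximations because every observation in a class is treated as if it sat at the class midpoint.

```python
import math

# Hypothetical frequency distribution: class midpoints and class frequencies.
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]

n = sum(frequencies)  # total number of observations

# Approximate mean: sum of (midpoint times frequency) / sum of frequencies.
mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n

# Sum of (midpoint - mean)^2 times frequency.
weighted_squares = sum((m - mean) ** 2 * f for m, f in zip(midpoints, frequencies))

pop_variance = weighted_squares / n           # population: divide by sum of frequencies
sample_variance = weighted_squares / (n - 1)  # sample: divide by (sum of frequencies) - 1

pop_std_dev = math.sqrt(pop_variance)
sample_std_dev = math.sqrt(sample_variance)

print("mean:", mean)
print("population variance:", pop_variance, " std dev:", pop_std_dev)
print("sample variance:", sample_variance, " std dev:", sample_std_dev)
```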

Measures of Position Definition z-Score – the distance that a data value is from the mean in terms of standard deviations. Equals [(data value minus mean) divided by standard deviation] Population z-score: z = (x – μ) / σ Sample z-score: z = (x – x̄) / s

Measures of Position Definitions z-score equals [(data value minus mean) divided by standard deviation] Is a "unitless" measure Can be “normalized” to get Mean of zero Standard Deviation of one

Measures of Position Definitions z-score purpose is to provide a way to "compare apples and oranges" by converting variables with different centers and/or spreads to variables with the same center (0) and spread (1).
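
A short sketch of the "apples and oranges" comparison. The two exams and their means and standard deviations are hypothetical, as is the helper name z_score.

```python
import statistics

def z_score(value, mean, std_dev):
    """Return how many standard deviations the value lies from the mean."""
    return (value - mean) / std_dev

# Hypothetical: two exams graded on very different scales.
z_exam_a = z_score(85, mean=75, std_dev=5)      # exam scored out of 100
z_exam_b = z_score(600, mean=500, std_dev=100)  # exam scored out of 800
print("exam A z-score:", z_exam_a)
print("exam B z-score:", z_exam_b)

# Converting every value of a (hypothetical) population to z-scores
# produces a new variable with mean 0 and standard deviation 1.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mu = statistics.mean(data)
sigma = statistics.pstdev(data)
standardized = [z_score(x, mu, sigma) for x in data]
print("standardized mean:", statistics.mean(standardized))
print("standardized std dev:", statistics.pstdev(standardized))
```

The score on exam A is relatively better even though 600 is the larger raw number, because it lies more standard deviations above its own mean.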

Measures of Position Definition Percentiles – the kth percentile of a set of data divides the lower k% from the upper (100 – k)% Divide into 100 parts, so 99 percentiles exist “P sub k” Use to give relative standing of the data
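
One way to obtain the 99 percentile cut points is the standard library's statistics.quantiles. Several interpolation conventions exist, so the values from a textbook method may differ slightly; the data here are hypothetical.

```python
import statistics

# Hypothetical data set: the integers 1 through 200.
data = list(range(1, 201))

# Dividing into 100 parts yields 99 cut points, P1 through P99.
percentiles = statistics.quantiles(data, n=100)

p25 = percentiles[24]  # index k - 1 holds the kth percentile
p50 = percentiles[49]
p75 = percentiles[74]

print("number of percentiles:", len(percentiles))
print("P25, P50, P75:", p25, p50, p75)
```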

Measures of Position Definition Quartiles – divide the data into four equal parts Four parts, so three quartiles exist “Q sub one, two, or three” Q2 is the median of the data Q1 is the median of the lower half Q3 is the median of the upper half

Numerical summary of data Five number summaries Interquartile range (Q3 – Q1) is resistant to extreme values Compute five number summary Min value | Q1 | M | Q3 | max value
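
The five number summary can be sketched with the median-of-halves definition of the quartiles. One assumption is made here that the slides do not state: when the number of observations is odd, the middle value is left out of both halves. The data and the function name are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, M, Q3, max) using medians of the lower and upper halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + (n % 2):]  # values above it; skips the middle value when n is odd
    return (
        data[0],
        statistics.median(lower),
        statistics.median(data),
        statistics.median(upper),
        data[-1],
    )

# Hypothetical data with one extreme value.
data = [1, 3, 4, 6, 7, 8, 10, 12, 30]

minimum, q1, m, q3, maximum = five_number_summary(data)
iqr = q3 - q1

print(minimum, "|", q1, "|", m, "|", q3, "|", maximum)
print("IQR:", iqr)
```

Replacing the 30 with an even larger value changes the maximum but leaves Q1, M, Q3, and the IQR untouched, which is what "resistant to extreme values" means.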

Building a Box Plot – part 1 1. Calculate interquartile range (IQR) 2. Compute lower & upper fence Lower fence = Q1 – 1.5 (IQR) Upper fence = Q3 + 1.5 (IQR) 3. Draw scale then mark Q1 and Q3 4. Box in Q1 to Q3 then mark M

Building a Box Plot – part 2 5. Temporarily mark fences with brackets 6. Draw line from Q1 to smallest value inside the lower fence and a line from Q3 to largest value inside the upper fence 7. Put * for all values outside of the fences 8. Erase brackets
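
The numerical part of the box plot steps (IQR, fences, line endpoints, and starred values) can be sketched as below. The data are hypothetical, and the quartiles use the median-of-halves convention with the middle value excluded for an odd count, which is an assumption.

```python
import statistics

# Hypothetical data, already sorted, with one extreme value.
data = [1, 3, 4, 6, 7, 8, 10, 12, 30]

half = len(data) // 2
q1 = statistics.median(data[:half])                     # median of the lower half
q3 = statistics.median(data[half + len(data) % 2:])     # median of the upper half

# Steps 1 and 2: interquartile range and fences.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Step 6: the lines end at the most extreme values still inside the fences.
inside = [x for x in data if lower_fence <= x <= upper_fence]
left_line_end = min(inside)
right_line_end = max(inside)

# Step 7: values outside the fences are marked with *.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("fences:", lower_fence, upper_fence)
print("lines reach:", left_line_end, "and", right_line_end)
print("marked with *:", outliers)
```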

Distribution based on Boxplot Symmetric median near center of box horizontal lines about same length Skewed Right / Positive Skew median towards left of box right line much longer than left line Skewed Left / Negative Skew median towards right of box left line much longer than right line

Which measure best to report? Symmetric distribution Mean Standard Deviation Skewed distribution Median Interquartile Range
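
A small illustration of why the median is preferred for skewed data. The income figures are hypothetical.

```python
import statistics

# Hypothetical household incomes in thousands of dollars, skewed right by one large value.
incomes = [30, 35, 40, 45, 50, 55, 400]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

print(f"mean:   {mean_income:.1f}")
print(f"median: {median_income}")
```

A single extreme value pulls the mean well above what most households in the list earn, while the median stays at the middle household.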

Self Quiz When can the mean and the median be about equal? In the 2000 census conducted by the U.S. Census Bureau, two average household incomes were reported: $41,349 and $55,263. One of these averages is the mean and the other is the median. Which is which and why?

Self Quiz The U.S. Department of Housing and Urban Development (HUD) uses the median to report the average price of a home in the United States. Why do they do that?

Self Quiz A histogram of a set of data indicates that the distribution of the data is skewed right. Which measure of central tendency will be larger, the mean or the median? Why?

Self Quiz If a data set contains 10,000 values arranged in increasing order, where is the median located? Matching (parameter; statistic): _____ is a descriptive measure of a population. _____ is a descriptive measure of a sample.

Self Quiz A data set will always have exactly one mode. (true or false) If the number of observations, n, is odd; then the median, M, is the value calculated by the formula M=(n+1)/2

Self Quiz Find the Sample Mean: 20, 13, 4, 8, 10 83, 65, 91, 87, 84 Find the Population Mean: 3, 6, 10, 12, 14

Self Quiz The median for the given list of six data values is 26.5. 7, 12, 21, _____, 41, 50 What is the missing value?

Self Quiz The following data represent the monthly cell phone bill for six randomly selected months. $35.34 $42.09 $39.43 $38.93 $43.39 $49.26 Compute the mean, median, and mode cell phone bill.

Self Quiz Heather and Bill go to the store to purchase nuts, but cannot decide among peanuts, cashews, or almonds. They agree to create a mix. They bought 2.5 pounds of peanuts for $1.30 per pound, 4 pounds of cashews for $4.50 per pound, and 2 pounds of almonds for $3.75 per pound. Determine the price per pound of the mix.