CHAPTER 3 Data Description. OUTLINE 3-1Introduction 3-2Measures of Central Tendency 3-3Measures of Variation 3-4Measures of Position 3-5Exploratory Data.

Slides:

Advertisements

Similar presentations

Chapter 3 Data Description

Advertisements

Measures of Dispersion

Numerically Summarizing Data

Descriptive Statistics

Calculating & Reporting Healthcare Statistics

Descriptive Statistics – Central Tendency & Variability Chapter 3 (Part 2) MSIS 111 Prof. Nick Dedeke.

B a c kn e x t h o m e Parameters and Statistics statistic A statistic is a descriptive measure computed from a sample of data. parameter A parameter is.

Chapters 3 & 4 Alan D. Smith Descriptive Statistics -

Data Summary Using Descriptive Measures Introduction to Business Statistics, 5e Kvanli/Guynes/Pavur (c)2000 South-Western College Publishing.

Slides by JOHN LOUCKS St. Edward’s University.

B a c kn e x t h o m e Classification of Variables Discrete Numerical Variable A variable that produces a response that comes from a counting process.

Chapter Two Descriptive Statistics McGraw-Hill/Irwin Copyright © 2004 by The McGraw-Hill Companies, Inc. All rights reserved.

Descriptive Statistics  Summarizing, Simplifying  Useful for comprehending data, and thus making meaningful interpretations, particularly in medium to.

1 Copyright © 2015 The McGraw-Hill Companies, Inc. Permission required for reproduction or display. C H A P T E R T H R E E DATA DESCRIPTION.

Chapter 3 Data Description 1 Copyright © 2012 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Descriptive Statistics  Summarizing, Simplifying  Useful for comprehending data, and thus making meaningful interpretations, particularly in medium to.

1 1 Slide © 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole.

Descriptive Statistics

CHAPTER 3 : DESCRIPTIVE STATISTIC : NUMERICAL MEASURES (STATISTICS)

Chapter 3 – Descriptive Statistics

 IWBAT summarize data, using measures of central tendency, such as the mean, median, mode, and midrange.

© Copyright McGraw-Hill CHAPTER 3 Data Description.

Copyright (C) 2002 Houghton Mifflin Company. All rights reserved. 1 Understandable Statistics Seventh Edition By Brase and Brase Prepared by: Lynn Smith.

© The McGraw-Hill Companies, Inc., Chapter 3 Data Description.

Review Measures of central tendency

STAT 280: Elementary Applied Statistics Describing Data Using Numerical Measures.

Chapter 2 Describing Data.

Describing distributions with numbers

McGraw-Hill/Irwin Copyright © 2010 by The McGraw-Hill Companies, Inc. All rights reserved. Chapter 3 Descriptive Statistics: Numerical Methods.

Applied Quantitative Analysis and Practices LECTURE#09 By Dr. Osman Sadiq Paracha.

1 CHAPTER 3 NUMERICAL DESCRIPTIVE MEASURES. 2 MEASURES OF CENTRAL TENDENCY FOR UNGROUPED DATA  In Chapter 2, we used tables and graphs to summarize a.

1 Elementary Statistics Larson Farber Descriptive Statistics Chapter 2.

Larson/Farber Ch 2 1 Elementary Statistics Larson Farber 2 Descriptive Statistics.

INVESTIGATION 1.

Dr. Serhat Eren 1 CHAPTER 6 NUMERICAL DESCRIPTORS OF DATA.

 IWBAT summarize data, using measures of central tendency, such as the mean, median, mode, and midrange.

© Copyright McGraw-Hill CHAPTER 3 Data Description.

©2003 Thomson/South-Western 1 Chapter 3 – Data Summary Using Descriptive Measures Slides prepared by Jeff Heyl, Lincoln University ©2003 South-Western/Thomson.

Chapter 3 Data Description Section 3-3 Measures of Variation.

Business Statistics Spring 2005 Summarizing and Describing Numerical Data.

LECTURE CENTRAL TENDENCIES & DISPERSION POSTGRADUATE METHODOLOGY COURSE.

1 Descriptive Statistics 2-1 Overview 2-2 Summarizing Data with Frequency Tables 2-3 Pictures of Data 2-4 Measures of Center 2-5 Measures of Variation.

Chapter 3, Part B Descriptive Statistics: Numerical Measures n Measures of Distribution Shape, Relative Location, and Detecting Outliers n Exploratory.

CHAPTER 3 : DESCRIPTIVE STATISTIC : NUMERICAL MEASURES (STATISTICS)

Chapter 3 Data Description Section 3-2 Measures of Central Tendency.

Data Summary Using Descriptive Measures Sections 3.1 – 3.6, 3.8

Business Statistics, 4e, by Ken Black. © 2003 John Wiley & Sons. 3-1 Business Statistics, 4e by Ken Black Chapter 3 Descriptive Statistics.

Chapter 3 Data Description 1 © McGraw-Hill, Bluman, 5 th ed, Chapter 3.

Larson/Farber Ch 2 1 Elementary Statistics Larson Farber 2 Descriptive Statistics.

Data Description Chapter 3. The Focus of Chapter 3  Chapter 2 showed you how to organize and present data.  Chapter 3 will show you how to summarize.

Copyright © 2016 Brooks/Cole Cengage Learning Intro to Statistics Part II Descriptive Statistics Intro to Statistics Part II Descriptive Statistics Ernesto.

Data Description Note: This PowerPoint is only a summary and your main source should be the book. Lecture (8) Lecturer : FATEN AL-HUSSAIN.

Describing Data: Summary Measures. Identifying the Scale of Measurement Before you analyze the data, identify the measurement scale for each variable.

Chapter 3 Section 3 Measures of variation. Measures of Variation Example 3 – 18 Suppose we wish to test two experimental brands of outdoor paint to see.

MM150 ~ Unit 9 Statistics ~ Part II. WHAT YOU WILL LEARN Mode, median, mean, and midrange Percentiles and quartiles Range and standard deviation z-scores.

Slide 1 Copyright © 2004 Pearson Education, Inc.  Descriptive Statistics summarize or describe the important characteristics of a known set of population.

Data Description Chapter(3) Lecture8)

© McGraw-Hill, Bluman, 5th ed, Chapter 3

Describing, Exploring and Comparing Data

CHAPTER 3 Data Description 9/17/2018 Kasturiarachi.

NUMERICAL DESCRIPTIVE MEASURES

Descriptive Statistics

Numerical Descriptive Measures

CHAPTET 3 Data Description.

STA 291 Spring 2008 Lecture 5 Dustin Lueker.

STA 291 Spring 2008 Lecture 5 Dustin Lueker.

Chapter 3 Data Description

MBA 510 Lecture 2 Spring 2013 Dr. Tonya Balan 4/20/2019.

Presentation transcript:

CHAPTER 3 Data Description

OUTLINE 3-1Introduction 3-2Measures of Central Tendency 3-3Measures of Variation 3-4Measures of Position 3-5Exploratory Data Analysis

OBJECTIVES Summarize data using the measures of central tendency, such as the mean, median, mode, and midrange. Describe data using the measures of variation, such as the range, variance, and standard deviation.

OBJECTIVES Identify the position of a data value in a data set using various measures of position, such as percentiles, deciles and quartiles. Use the techniques of exploratory data analysis, including stem and leaf plots, box plots, and five-number summaries to discover various aspects of data.

3-1 Introduction A statistic is a characteristic or measure obtained by using the data values from a sample. A parameter is a characteristic or measure obtained by using the data values from a specific population.

3-2 Measures of Central Tendency  Mean  Median  Mode  Mid-range

3-2 The mean (arithmetric average) meanThe mean is defined to be the sum of the data values divided by the total number of values.

3-2 The Sample Mean The symbol X represents the sample mean. X is read as “X - bar”. The Greek symbol Σ is read as “sigma” and means “to sum”.

The following data represent the annual chocolate sales (in millions of RM) for a sample of seven states in Malaysia. Find the mean. RM2.0, 4.9, 6.5, 2.1, 5.1, 3.2, 16.6

3-2 The Population Mean The Greek symbol µ represents the population mean. The symbol µ is read as “mu”. N is the size of the finite population.

A small company consists of the owner, the manager, the salesperson, and two technicians. Their salaries are listed as RM 50,000, 20,000, 12,000, 9,000 and 9,000 respectively. Assume this is the population, find the mean.

In a random sample of 7 ponds, the number of fishes were recorded as the following, find the mean

3-2 The Sample Mean for an Ungrouped Frequency Distribution The mean for an ungrouped frequency distribution is given by f = frequency of the corresponding value X n =  f

The scores of 25 students on a 4-point quiz are given in the table. Find the mean score. Score, XFrequency, f

Score, XFrequency, ff X

The number of balls in 17 bags were counted. Find the mean Number of ballsFrequency

3-2 The Sample Mean for a Grouped Frequency Distribution The mean for grouped frequency distribution is given by X m = class midpoint = (UCL + LCL) / 2

The lengths of 40 bean pods were showed in the table. Find the mean. Length (cm)Frequency, f

Length (cm) Frequency, fMidpoint, X m f X m

Time (in minutes) that needed by a group of students to complete a game are shown as below. Find the mean. Time (mins)Frequency

Mean =  (fX) / n = 23 / 12 = 1.92 Mean = (  f  X) / n = (12 x 10) / 12 = 10 Score, XFrequency, ffX  X = 10n =  f = 12  (f X) = 23

3-2 The Median When a data set is ordered, it is known as a data array. The median is defined to be the midpoint of the data array. The symbol used to denote the median is MD.

The ages of seven preschool children are 1, 3, 4, 2, 3, 5, and 1. Find the median. 1. Arrange the data in order. 2. Select the middle point.

In the previous example, there was an odd number of values in the data set. In this case it is easy to select the middle number in the data array. When there is an even number of values in the data set, the median is obtained by taking the average of the two middle numbers.

Six customers purchased these numbers of magazines: 1, 7, 3, 2, 3, 4. Find the median. 1. Arrange the data in order. 2. Select the middle point.

3-2 The Median - Ungrouped Frequency Distribution For an ungrouped frequency distribution, find the median by examining the cumulative frequencies to locate the middle value. If n is the sample size, compute n/2. Locate the data point where n/2 values fall below and n/2 values fall above.

LRJ Appliance recorded the number of VCRs sold per week over a one-year period. No. Sets SoldFrequency Total24

To locate the middle point, divide n by 2; 24/2 = 12. Locate the point where 12 values would fall below and 12 values will fall above. Consider the cumulative distribution. The 12th and 13th values fall in class 2. Hence MD = 2.

This class contains the 5 th through the 13 th values. No. Sets SoldFrequencyCumulative Frequency Median

3-2 The Median - Grouped Frequency Distribution For grouped frequency distribution, find the median by using the formula as shown below: Median, MD = L m + (W) n = sum of frequencies cf = cumulative frequency of the class immediately preceding the median class f = frequency of the median class w = class width of the median class L m = Lower class boundary of the median class

Find the median by using the following data. ClassFrequency, f

ClassFrequency, fCumulative Frequency

To locate the halfway point, divide n by 2; 17/2 = 8.5 ≈ 9. Find the class that contains the 9 th value. This will be the median class. Consider the cumulative distribution. The median class will then be

=17 = = = –25.5=5 () ()= (17/2)–8 4 = n cf f w L MD ncf f wL m m     ().

Find the median by using the following data. ClassFrequency, f

3-2 The Mode The mode is defined to be the value that occurs most often in a data set. A data set can have more than one mode. A data set is said to have no mode if all values occur with equal frequency.

Find the mode for the number of children per family for 10 selected families. Data set: 2, 3, 5, 2, 2, 1, 6, 4, 7, 3. Ordered set: 1, 2, 2, 2, 3, 3, 4, 5, 6, 7. Mode: 2.

Six strains of bacteria were tested to see how long they could remain alive outside their normal environment. The time, in minutes, is given below. Find the mode. Data set: 2, 3, 5, 7, 8, 10. There is no mode since each data value occurs equally with a frequency of one.

Eleven different automobiles were tested at a speed of 15 mph for stopping distances. The distance, in feet, is given below. Find the mode. Data set: 15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26. There are two modes (bimodal). The values are 18 and 24.

3-2 The Mode – Ungrouped Frequency Distribution ValuesFrequency, f Mode Highest frequency

The mode for grouped data is the modal class. The modal class is the class with the largest frequency.

ValuesFrequency, f Modal Class Highest frequency 3-2 The Mode – Grouped Frequency Distribution

3-2 The Midrange  The midrange is found by adding the lowest and highest values in the data set and dividing by 2.  The midrange is a rough estimate of the middle value of the data.  The symbol that is used to represent the midrange is MR.

3-2 Distribution Shapes  Frequency distributions can assume many shapes.  Three are three most important shapes: i) Positively skewed ii) Symmetrical iii) Negatively skewed

Positively Skewed X Y Mode < Median < Mean Positively Skewed

Symmetrical n Y X Symmetrical Mean = Media = Mode

Negatively Skewed < Median < Mode

3-3 Measures of Variation Measure the variation and dispersion of the data. Range Variance Standard Deviation Coefficient of Variation

3-3 Range  The range is defined to be the highest value minus the lowest value. The symbol R is used for the range.  R = highest value – lowest value.  Extremely large or extremely small data values can drastically affect the range.

3-3 Population Variance General Formula The symbol for population variance is. X = Individual value µ = Population mean N = Population size

3-3 Population Standard Deviation General Formula The population standard deviation is square root of the population variance.

 ean  = ( )/6 = 210/6 = 35.

XX-µ(X-µ) ∑X: 210 ∑(X-µ) 2 : 1750 The variance  2 = 1750/6 = The standard deviation  =  = 17.08

A class of 6 children sat a test; the resulting marks scored out of 10, were as follows: Calculate the mean, variance and standard deviation of this population.

3-3 Sample Variance General Formula The symbol for sample variance is. = Sample mean n = Sample size

3-3 Sample Standard Deviation General Formula The sample standard deviation is square root of the sample variance.

Mean, X = ( )/6 = 210/6 = 35

XX-X(X-X) ∑X: 210 ∑(X-X) 2 : 250 The variance  2 = 250/(6-1) = 50 The standard deviation  =  50 = 7.07

 Ungrouped frequency distribution  Grouped frequency distribution How to find the population variance and standard deviation for data with frequency provided?

For ungrouped data, use the actual observed X value in the different classes. For grouped data, use the same formula with the X value replaced by class midpoints, X m.

 Ungrouped frequency distribution  Grouped frequency distribution How to find the sample variance and standard deviation for data with frequency provided?

Example XfXmXm fX m (X m - µ)(X m - µ) 2 f(X m - µ) N = 17 ∑fX m = ∑f(Xm - µ) 2 =

Mean, µ = ∑fX m / N = 273.5/17 = Variance,  2 = / 17 = Standard deviation,  =  2 =  = 7.25

Question a) The scores in a statistics test for 60 candidates are shown in the table. Find the mean, variance, and standard deviation for this population. ScoreFrequency

b) The following table showed the number of cars passed by UM hospital in a random sample of days. Compute the sample mean, variance and standard deviation. Number of carsNumber of days

3-3 Coefficient of Variation The coefficient of variation is defined to be the standard deviation divided by the mean. The result is expressed as a percentage. It is used to compare the standard deviation of different units. CVar s X orCVar  100%100%. =  

3-3 Chebyshev’s Theorem 75% of the values will lie within 2 standard deviations of the mean. Approximately 89% will lie within 3 standard deviations.

3-3 Empirical (Normal) Rule For any bell shaped distribution: Approximately 68% of the data values will fall within one standard deviation of the mean. Approximately 95% will fall within two standard deviations of the mean. Approximately 99.7% will fall within three standard deviations of the mean.

 -- 95%   

Question The scores on a national achievement exam have a mean of 480 and standard deviation of 90. If the scores are normally distributed, find the scores for approximately 68% of the data values, 95% of the data values, and 99.7% of the data values.

3-4 Measures of Position Measure the position of particular data in a data set. Z - score Percentile Decile Quartile

3-4 Z-Score The standard score or z score for a value is obtained by subtracting the mean from the value and dividing the result by the standard deviation. The symbol z is used for the z score. Z = (value - mean)/ standard deviation

The z score represents the number of standard deviations a data value falls above or below the mean. Forsamples z XX s Forpopulations z X :. :.     

A student scored 65 on a statistics exam that had a mean of 50 and a standard deviation of 10. Compute the z-score. z = (65 – 50)/10 = 1.5. That is, the score of 65 is 1.5 standard deviations above the mean. Above - since the z-score is positive. How about if the z-score shows a negative value?

3-4 Percentile Percentiles divide the distribution into 100 equal groups. P 1 P 2 P 3 P 4 ……………… P 98 P 99 P 100

How to find the Value Corresponding to a Given Percentile? Step 1: Arrange the data in order. Step 2: Compute c = (n  p)/100. c = position value of the required percentile p = percentile n = sample size

Step 3: If c is not a whole number, round up to the next whole number and find the corresponding value. If c is a whole number, use the value halfway between c and c+1.

Find the value of the 25th percentile for the following data set: 18, 12, 3, 5, 15, 8, 10, 2, 6, 20.

Is the data arranged in order? Arrange data in order form: 2, 3, 5, 6, 8, 10, 12, 15, 18, 20. n = 10, p = 25, so c = (10  25)/100 = 2.5. Since 2.5 is not a whole number, round up to c = 3. Thus, the value of the 25th percentile is the value X = 5.

3-4 Decile Deciles divide the data set into 10 groups. D 1 D 2 D 3 D 4 D 5 D 6 D 7 D 8 D 9 D 10

What is the relationship between percentile and decile?  D 1 corresponding to P 10  D 2 corresponding to P 20  D 9 corresponding to P 90  D 10 corresponding to P 100

3-4 Quartile Quartiles divide the data set into 4 groups. Q 1 Q 2 Q 3 Q 4

What is the relationship between percentile and quartile?  Q 1 corresponding to P 25  Q 2 corresponding to P 50  Q 3 corresponding to P 75  Q 4 corresponding to P 100 Median

3-4 Outliers and the Interquartile Range (IQR) An outlier is an extremely high or an extremely low data value when compared with the rest of the data values. The Interquartile Range, IQR = Q 3 – Q 1.

Procedures to identify outliers To determine whether a data value can be considered as an outlier: Step 1: Arrange the data in order. Step 2: Compute Q 1 and Q 3. Step 3: Find the IQR = Q 3 – Q 1. Step 4: Compute (1.5)(IQR).

Step 5: Compute Q 1 – (1.5)(IQR) and Q 3 + (1.5)(IQR). Step 6: Compare the data value (say X) with Q 1 – (1.5)(IQR) and Q 3 + (1.5)(IQR). If X Q 3 + (1.5)(IQR), then X is considered an outlier.

Find Q 1. Q 1 is corresponding to P 25 c = (np)/100 = (825)/100 = 2 nd Since 2 is a whole number, take value between 2 nd and 3 rd. Q 1 = (6+12)/2 = 9

Find Q 3. Q 3 is corresponding to P 75 c = (np)/100 = (875)/100 = 6 th Since 6 is a whole number, take value between 6 th and 7 th. Q 3 = (18+22)/2 = 20

Find IQR. IQR = = 11. (1.5)(IQR) = (1.5)(11) = – 16.5 = – 7.5 and = The value of 50 is outside the range – 7.5 to 36.5, hence 50 is an outlier.

3-5 Exploratory Data Analysis - Stem and Leaf Plot A stem and leaf plot is a data plot that uses part of a data value as the stem and part of the data value as the leaf to form groups or classes. Leaf Stem

Example of Stem and Leaf Plot Stem Leaves

Stem Leading digit Leaf Trailing digit

At an outpatient testing center, a sample of 20 days showed the following number of cardiograms done each day: 25, 31, 20, 32, 13, 14, 43, 02, 57, 23, 36, 32, 33, 32, 44, 32, 52, 44, 51, 45. Construct a stem and leaf plot for the data.

Leading Digit (Stem) Trailing Digit (Leaf)

3-5 Exploratory Data Analysis - Box Plot When the data set contains a small number of values, a box plot is used to graphically represent the data set. These plots involve five values: 1. Minimum value 2. Q 1 3. Median 4. Q 3 5. Maximum value

Example of Box Plot Minimum valueMaximum value Q2Q2 Q1Q1 Q3Q3

Information Obtained from a Box Plot If the median is near the center of the box, the distribution is approximately symmetric. If the median falls to the left of the center of the box, the distribution is positively skewed. If the median falls to the right of the center of the box, the distribution is negatively skewed.

If the lines are about the same length, the distribution is approximately symmetric. If the right line is larger than the left line, the distribution is positively skewed. If the left line is larger than the right line, the distribution is negatively skewed.