Chapter 2 Descriptive Statistics

Chapter 2 Descriptive Statistics
2-1 Overview
2-2 Summarizing Data
2-3 Pictures of Data
2-4 Measures of Central Tendency
2-5 Measures of Variation
2-6 Measures of Position
2-7 Exploratory Data Analysis
Review and Projects

Overview (2-1)
Descriptive statistics: summarizes or describes the important characteristics of a known set of population data.
Inferential statistics: uses sample data to make inferences about a population.

Important Characteristics of Data
1. Nature or shape of the distribution, such as bell-shaped, uniform, or skewed
2. Representative score, such as an average
3. Measure of scattering or variation

Summarizing Data With Frequency Tables (2-2)
Frequency table: lists categories (or classes) of scores, along with counts (or frequencies) of the number of scores that fall into each category.

Table 2-1: Axial Loads of 0.0109-in. Aluminum Cans
270 278 250 290 274 242 269 257 272 265
263 234 273 277 294 279 268 230 262 273
201 275 260 286 272 284 282 278 268 263
285 289 208 292 279 276 242 258 264 281
262 278 265 241 267 295 283 209 276 273
263 218 271 289 223 217 225 292 270 204
265 271 273 283 275 276 282 270 256 268
259 272 269 251 208 290 220 277 293 254
223 263 274 262 200 272 268 206 280 287
257 284 279 252 215 281 291 276 285 297
290 228 274 277 286 251 278 289 269 267
276 206 284 268 291 293 280 282 230 275
236 295 289 283 261 262 252 277 204 286
270 278 272 281 288 248 266 256 292

Table 2-2: Frequency Table of Axial Loads of Aluminum Cans
Axial Load    Frequency
200 - 209     9
210 - 219     3
220 - 229     5
230 - 239     4
240 - 249     4
250 - 259     14
260 - 269     32
270 - 279     52
280 - 289     38
290 - 299     14

Frequency Table Definitions
Class: an interval.
Lower class limit: the left endpoint (smallest number) of a class.
Upper class limit: the right endpoint (largest number) of a class.
Class mark: the midpoint of a class.
Class width: the difference between two consecutive lower class limits.

Definition Values for the Example (Table 2-2)
Lower class limits: 200, 210, ..., 290
Upper class limits: 209, 219, ..., 299
Class marks: (200 + 209)/2 = 204.5, 214.5, ..., 294.5
Class width: 210 - 200 = 10
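The definition values can be derived mechanically from the list of classes. The Python sketch below is a minimal illustration, assuming each class is given as a (lower, upper) pair of whole numbers; the function name `class_definition_values` is hypothetical. Applied to the classes of Table 2-2, it reproduces the values listed for that example.

```python
def class_definition_values(classes):
    """Return lower limits, upper limits, class marks, and class width.

    classes: list of (lower, upper) tuples in increasing order
    (hypothetical helper, for illustration only).
    """
    lower_limits = [lo for lo, _ in classes]
    upper_limits = [hi for _, hi in classes]
    # Class mark = midpoint of each class.
    class_marks = [(lo + hi) / 2 for lo, hi in classes]
    # Class width = difference between two consecutive lower class limits.
    width = lower_limits[1] - lower_limits[0]
    return lower_limits, upper_limits, class_marks, width


# The ten classes of Table 2-2: 200-209, 210-219, ..., 290-299.
table_2_2 = [(200 + 10 * i, 209 + 10 * i) for i in range(10)]
lows, highs, marks, width = class_definition_values(table_2_2)
print(lows[:2], highs[:2], marks[:2], width)
# → [200, 210] [209, 219] [204.5, 214.5] 10
```

The same helper can be used to check answers to the quiz-score exercise that follows.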

Exercise: Determine the classes, lower class limits, upper class limits, class marks, and class width for this frequency table.
Quiz Scores   Frequency
0 - 4         2
5 - 9         5
10 - 14       8
15 - 19       11
20 - 24       7

Constructing a Frequency Table
1. Decide on the number of classes.
2. Determine the class width by dividing the range by the number of classes and rounding up (range = highest score - lowest score):
   $\text{class width} = \text{round up of } \frac{\text{range}}{\text{number of classes}}$
3. Select for the first lower limit either the lowest score or a convenient value slightly less than the lowest score.
4. Add the class width to the starting point to get the second lower class limit.
5. List the lower class limits in a vertical column and enter the upper class limits.
6. Represent each score by a tally mark in the appropriate class. Total the tally marks to find the total frequency for each class.
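Steps 2 through 6 can be sketched in Python. This is a minimal sketch, assuming whole-number scores, so a class with lower limit a and width w has upper limit a + w - 1 (as in 200 - 209); the helper names `class_width` and `frequency_table` are hypothetical. `math.ceil` implements "round up"; note that it leaves an exact whole-number quotient unchanged, which is one reading of the rule.

```python
import math


def class_width(data, num_classes):
    """Step 2: range divided by the number of classes, rounded up."""
    data_range = max(data) - min(data)
    return math.ceil(data_range / num_classes)


def frequency_table(data, first_lower, width, num_classes):
    """Steps 3-6: build the classes and tally the scores in each.

    Assumes whole-number scores, so each class runs from
    lower to lower + width - 1.
    Returns a list of ((lower, upper), frequency) pairs.
    """
    table = []
    for i in range(num_classes):
        lower = first_lower + i * width   # successive lower class limits
        upper = lower + width - 1         # matching upper class limit
        frequency = sum(1 for x in data if lower <= x <= upper)
        table.append(((lower, upper), frequency))
    return table


# Hypothetical quiz scores, used only to illustrate the steps.
scores = [3, 7, 12, 14, 18, 21, 5, 9, 10, 24]
width = class_width(scores, 5)            # range 21 / 5 = 4.2, rounded up to 5
for (lower, upper), f in frequency_table(scores, 0, width, 5):
    print(f"{lower} - {upper}: {f}")
```

With the 175 sorted axial loads, a first lower limit of 200, and 10 classes, the width works out to 10 (range 97 / 10 = 9.7, rounded up), and the counts should match Table 2-2.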

Guidelines for Frequency Tables
1. Classes should be mutually exclusive.
2. Include all classes, even if the frequency is zero.
3. Try to use the same width for all classes.
4. Select convenient numbers for class limits.
5. Use between 5 and 20 classes.
6. The sum of the class frequencies must equal the number of original data values.

Relative Frequency Table
$\text{relative frequency} = \frac{\text{class frequency}}{\text{sum of all frequencies}}$

Table 2-3: Relative Frequency Table of Axial Loads
Axial Load    Relative Frequency
200 - 209     0.051
210 - 219     0.017
220 - 229     0.029
230 - 239     0.023
240 - 249     0.023
250 - 259     0.080
260 - 269     0.183
270 - 279     0.297
280 - 289     0.217
290 - 299     0.080
Each entry is the class frequency from Table 2-2 divided by 175, for example 9/175 = 0.051, 3/175 = 0.017, and 5/175 = 0.029.
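The relative frequencies follow directly from the class frequencies. A minimal sketch, assuming the ten frequencies of Table 2-2 (n = 175); the helper name `relative_frequencies` is hypothetical.

```python
def relative_frequencies(frequencies):
    """Relative frequency = class frequency / sum of all frequencies."""
    total = sum(frequencies)
    return [f / total for f in frequencies]


# Class frequencies of Table 2-2 (axial loads, n = 175).
table_2_2 = [9, 3, 5, 4, 4, 14, 32, 52, 38, 14]
print([round(r, 3) for r in relative_frequencies(table_2_2)])
# → [0.051, 0.017, 0.029, 0.023, 0.023, 0.08, 0.183, 0.297, 0.217, 0.08]
```

Because every relative frequency has the same denominator, the unrounded values always sum to 1; the rounded three-decimal values may not.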

Table 2-4: Cumulative Frequency Table of Axial Loads
Axial Load       Cumulative Frequency
Less than 210    9
Less than 220    12
Less than 230    17
Less than 240    21
Less than 250    25
Less than 260    39
Less than 270    71
Less than 280    123
Less than 290    161
Less than 300    175
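Each cumulative frequency is a running total of the class frequencies. A minimal sketch using `itertools.accumulate` from the Python standard library, assuming the frequencies of Table 2-2 and class boundaries of 210, 220, ..., 300.

```python
from itertools import accumulate

# Class frequencies of Table 2-2 and the boundary used in each
# "less than" row of the cumulative table.
frequencies = [9, 3, 5, 4, 4, 14, 32, 52, 38, 14]
boundaries = [210 + 10 * i for i in range(len(frequencies))]

# Running total of the class frequencies.
cumulative = list(accumulate(frequencies))
for boundary, total in zip(boundaries, cumulative):
    print(f"Less than {boundary}: {total}")
```

The last cumulative frequency equals the total number of scores, which is a quick check on the table.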

Frequency Tables: Table 2-2 (frequency), Table 2-3 (relative frequency), and Table 2-4 (cumulative frequency)
Axial Load    Frequency    Relative Frequency    Cumulative Frequency
200 - 209     9            0.051                 less than 210: 9
210 - 219     3            0.017                 less than 220: 12
220 - 229     5            0.029                 less than 230: 17
230 - 239     4            0.023                 less than 240: 21
240 - 249     4            0.023                 less than 250: 25
250 - 259     14           0.080                 less than 260: 39
260 - 269     32           0.183                 less than 270: 71
270 - 279     52           0.297                 less than 280: 123
280 - 289     38           0.217                 less than 290: 161
290 - 299     14           0.080                 less than 300: 175

Figure 2-7: The mean as a balance point.

Notation
Σ denotes the summation of a set of values.
x is the variable usually used to represent the individual data values.
n represents the number of data values in a sample.
N represents the number of data values in a population.
$\bar{x}$ is pronounced 'x-bar' and denotes the mean of a set of sample values.
µ is pronounced 'mu' and denotes the mean of all values in a population.

Definitions
Mean: the value obtained by adding the scores and dividing the total by the number of scores.
Sample mean: $\bar{x} = \frac{\sum x}{n}$
Population mean: $\mu = \frac{\sum x}{N}$
Calculators can calculate the mean of data.
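The mean formula translates directly into code. A minimal sketch; the function name `mean` is illustrative, and the same arithmetic serves for a sample ($\bar{x}$, divide by n) and a population (µ, divide by N).

```python
def mean(values):
    """Sum of the scores divided by the number of scores.

    The arithmetic is the same for a sample (x-bar, divide by n) and a
    population (mu, divide by N); only the notation differs.
    """
    return sum(values) / len(values)


print(mean([2, 3, 4, 5, 5, 5]))   # → 4.0
```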

Definitions
Median: often denoted by $\tilde{x}$ (pronounced 'x-tilde'); the middle value when the scores are arranged in (ascending or descending) order. The median is not affected by an extreme value.

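Examples
a. Scores: 5 5 5 3 1 5 1 4 3 5 2
   In order: 1 1 2 3 3 4 5 5 5 5 5
   There is an exact middle (the 6th of 11 scores), so the median is 4.
b. Scores: 5 5 5 3 1 5 1 4 3 5
   In order: 1 1 3 3 4 5 5 5 5 5
   There is no exact middle; it is shared by two numbers (the 5th and 6th of 10 scores), so the median is (4 + 5)/2 = 4.5.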
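The two cases (odd and even number of scores) can be sketched as follows. This is a minimal illustration; the function name `median` is illustrative, and Python's standard `statistics.median` applies the same rule.

```python
def median(values):
    ordered = sorted(values)        # arrange the scores in order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]      # odd n: exact middle score
    # even n: no exact middle, so average the two middle scores
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2]))  # → 4
print(median([5, 5, 5, 3, 1, 5, 1, 4, 3, 5]))     # → 4.5
```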

Definitions
Mode: the score that occurs most frequently. A data set may be bimodal (two modes), multimodal (more than two modes), or have no mode. The mode is the only measure of central tendency that can be used with nominal data.

Examples
a. 5 5 5 3 1 5 1 4 3 5: the mode is 5.
b. 2 2 2 3 4 5 6 6 6 7 9: bimodal (2 and 6).
c. 2 3 6 7 8 9 10: no mode.
d. 2 2 3 3 3 4: the mode is 3.
e. 2 2 3 3 4 4 5 5: no mode.
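A minimal sketch of finding the mode(s) with `collections.Counter`. The convention for "no mode" is an assumption chosen to match the examples above: there is no mode when no score repeats (example c) or when every distinct score occurs equally often (example e). The function name `modes` is hypothetical.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of modes; an empty list means 'no mode'.

    Assumed convention (matching the examples above): no mode when no
    value repeats, or when every distinct value occurs equally often.
    """
    counts = Counter(values)
    top = max(counts.values())
    all_equal = len(counts) > 1 and len(set(counts.values())) == 1
    if top == 1 or all_equal:
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([5, 5, 5, 3, 1, 5, 1, 4, 3, 5]))       # → [5]
print(modes([2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9]))    # → [2, 6]
print(modes([2, 2, 3, 3, 4, 4, 5, 5]))             # → []
```

Returning a list rather than a single value lets the same function report one mode, two (bimodal), several (multimodal), or none.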

Definitions
Midrange: the value halfway between the highest and lowest scores.
$\text{midrange} = \frac{\text{highest score} + \text{lowest score}}{2}$
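A minimal sketch of the midrange formula; the function name `midrange` is illustrative.

```python
def midrange(values):
    """Value halfway between the highest and lowest scores."""
    return (max(values) + min(values)) / 2


print(midrange([5, 5, 5, 3, 1, 5, 1, 4, 3, 5]))   # → 3.0
```

Because it uses only the two most extreme scores, the midrange changes whenever either extreme changes, no matter what the other scores are.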

Round-off Rule for Measures of Central Tendency
Carry one more decimal place than is present in the original set of data.

An Example of Skewness
Dataset 1: 3, 4, 4, 5, 5, 5, 6, 6, 7. Mean = 5, median = 5. Symmetric.
Dataset 2: 3, 4, 4, 5, 5, 5, 7, 7, 9. Mean ≈ 5.444, median = 5. Skewed right.
Dataset 3: 2, 3, 3, 5, 5, 5, 6, 6, 7. Mean ≈ 4.667, median = 5. Skewed left.
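The pattern in these three datasets can be checked with Python's standard `statistics` module. Comparing the mean with the median is only a rough heuristic for the direction of skew (an assumption for illustration, not a formal test); the function name `skew_hint` is hypothetical.

```python
from statistics import mean, median

datasets = {
    "Dataset 1": [3, 4, 4, 5, 5, 5, 6, 6, 7],
    "Dataset 2": [3, 4, 4, 5, 5, 5, 7, 7, 9],
    "Dataset 3": [2, 3, 3, 5, 5, 5, 6, 6, 7],
}


def skew_hint(values):
    """Rough heuristic: mean above median suggests skewed right,
    mean below median suggests skewed left."""
    m, md = mean(values), median(values)
    if m > md:
        return "skewed right"
    if m < md:
        return "skewed left"
    return "symmetric"


for name, values in datasets.items():
    print(name, round(mean(values), 3), median(values), skew_hint(values))
```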

Figure 2-8: Skewness. (a) Skewed left (negatively); (b) symmetric, where mode = mean = median; (c) skewed right (positively).

Best Measure of Central Tendency
Table 2-6: Advantages and disadvantages of the measures of central tendency.

Mean from a Frequency Table
Use the class mark of each class for the variable x (Formula 2-2):
$\bar{x} = \frac{\sum (f \cdot x)}{\sum f}$
where x = class mark, f = frequency, and $\sum f = n$.

Quiz Scores   Frequency   Class Mark
0 - 4         2           2
5 - 9         5           7
10 - 14       8           12
15 - 19       11          17
20 - 24       7           22
Here $\sum (f \cdot x) = 476$ and $\sum f = 33$, so the mean of this frequency table is $476/33 \approx 14.4$.
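Formula 2-2 as a short Python sketch, using the quiz-score class marks and frequencies from the table; the function name `mean_from_frequency_table` is hypothetical.

```python
def mean_from_frequency_table(class_marks, frequencies):
    """Formula 2-2: x-bar = sum(f * x) / sum(f), with x = class mark."""
    total = sum(f * x for f, x in zip(frequencies, class_marks))
    return total / sum(frequencies)


marks = [2, 7, 12, 17, 22]
freqs = [2, 5, 8, 11, 7]
print(round(mean_from_frequency_table(marks, freqs), 1))   # → 14.4
```

The result is an approximation of the mean of the original scores, since every score in a class is treated as if it were equal to the class mark.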

Measure of Variation: Range
Range = highest score - lowest score

Measure of Variation: Standard Deviation
Standard deviation: a measure of variation of the scores about the mean (roughly, a kind of average deviation from the mean).

Sample Standard Deviation (Formula 2-4)
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Calculators can calculate the sample standard deviation of data.
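Formula 2-4 step by step: find the mean, sum the squared deviations, divide by n - 1, and take the square root. A minimal sketch; the function name `sample_std` is illustrative.

```python
import math


def sample_std(values):
    """Formula 2-4: s = sqrt(sum((x - x_bar)^2) / (n - 1))."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


print(round(sample_std([2, 3, 4, 5, 5, 5]), 2))   # → 1.26
```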

Example: Find the standard deviation of the sample data 2, 3, 4, 5, 5, 5.
Here $\bar{x} = 4$ and $\sum (x - \bar{x})^2 = 8$, so $s^2 = 8/5 = 1.6$ and $s \approx 1.26$.
Use the shortcut formula to find the standard deviation of the above data and of the waiting times at the two banks:
1) Data above: $\sum x^2 = 104$, $\sum x = 24$.
2) Jefferson Valley Bank: $\sum x^2 = 513.27$, $\sum x = 71.5$, s = 0.48.
3) Bank of Providence: $\sum x^2 = 541.09$, $\sum x = 71.5$, s = 1.82.
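The shortcut formula needs only n, $\sum x$, and $\sum x^2$. A minimal sketch; the function name `std_shortcut` is hypothetical. The slide does not state the bank sample sizes, so n = 10 waiting times per bank is an assumption here; with that value the formula reproduces the slide's s = 0.48 and s = 1.82.

```python
import math


def std_shortcut(n, sum_x, sum_x2):
    """Formula 2-6: s = sqrt((n * sum(x^2) - (sum x)^2) / (n * (n - 1)))."""
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


print(round(std_shortcut(6, 24, 104), 2))          # → 1.26
# Assumed n = 10 for each bank (not stated on the slide).
print(round(std_shortcut(10, 71.5, 513.27), 2))    # Jefferson Valley Bank → 0.48
print(round(std_shortcut(10, 71.5, 541.09), 2))    # Bank of Providence → 1.82
```

Both banks have the same mean waiting time ($\sum x = 71.5$), so the difference between them shows up only in the standard deviation.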

Population Standard Deviation
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Calculators can calculate the population standard deviation of data.

Symbols for Standard Deviation
Source                          Sample    Population
Textbook                        s         σ
Some graphics calculators       Sx        σx
Some nongraphics calculators    xσn-1     xσn

Measure of Variation: Variance
Variance: the standard deviation squared. Notation: $s^2$ (sample variance) and $\sigma^2$ (population variance). Use the square key on a calculator.

Variance
Sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Population variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
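Python's standard `statistics` module provides both versions: `variance` and `stdev` divide by n - 1 (sample), while `pvariance` and `pstdev` divide by N (population). The sketch below treats the same six scores first as a sample and then as a whole population, purely to show the difference.

```python
import statistics

data = [2, 3, 4, 5, 5, 5]
s2 = statistics.variance(data)        # sample variance: divides by n - 1
sigma2 = statistics.pvariance(data)   # population variance: divides by N
print(s2, sigma2)

# The standard deviation is the square root of the variance.
print(statistics.stdev(data), statistics.pstdev(data))
```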

Round-off Rule for Measures of Variation
Carry one more decimal place than was present in the original data.

Standard Deviation Shortcut Formula (Formula 2-6)
$s = \sqrt{\frac{n(\sum x^2) - (\sum x)^2}{n(n - 1)}}$

Figure 2-10: Same means ($\bar{x} = 4$), different standard deviations (s = 0, s = 0.8, s = 1.0, s = 3.0). The standard deviation gets larger as the spread of the data increases.

The Empirical Rule (applies to bell-shaped distributions), Figure 2-10
About 68% of the data lie within 1 standard deviation of the mean, between $\bar{x} - s$ and $\bar{x} + s$.
About 95% lie within 2 standard deviations, between $\bar{x} - 2s$ and $\bar{x} + 2s$.
About 99.7% lie within 3 standard deviations, between $\bar{x} - 3s$ and $\bar{x} + 3s$.
On each side of the mean, the figure shows an area of 0.340 within 1 standard deviation, 0.135 between 1 and 2, 0.024 between 2 and 3, and 0.001 beyond 3.

Range Rule of Thumb
$s \approx \frac{\text{range}}{4}$, or equivalently range $\approx 4s$, where range = (maximum) - (minimum).
The usual minimum is about $\bar{x} - 2s$ and the usual maximum is about $\bar{x} + 2s$.
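A minimal sketch of the range rule of thumb; the helper names are hypothetical. The first call uses the lowest (200) and highest (297) of the sorted axial loads; the mean and standard deviation in the second call are hypothetical values for illustration only.

```python
def estimate_std_from_range(minimum, maximum):
    """Range rule of thumb: s is roughly range / 4."""
    return (maximum - minimum) / 4


def usual_limits(mean, s):
    """Rough usual minimum and maximum: mean - 2s and mean + 2s."""
    return mean - 2 * s, mean + 2 * s


print(estimate_std_from_range(200, 297))   # → 24.25
# Hypothetical mean and standard deviation, for illustration only.
print(usual_limits(50, 5))                 # → (40, 60)
```

The estimate is only rough; it can be compared with the standard deviation computed from Formula 2-4 or 2-6.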

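Chebyshev's Theorem
Applies to distributions of any shape: the proportion (or fraction) of any set of data lying within k standard deviations of the mean is always at least $1 - 1/k^2$, where k is any positive number greater than 1. For example, with k = 2 at least 3/4 (75%) of the data lie within 2 standard deviations of the mean, and with k = 3 at least 8/9 (about 89%) lie within 3 standard deviations.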
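A minimal sketch comparing Chebyshev's guaranteed minimum with the empirical-rule percentages. The empirical values hold only for bell-shaped distributions, while Chebyshev's bound holds for any shape; the function name `chebyshev_minimum_fraction` is hypothetical.

```python
def chebyshev_minimum_fraction(k):
    """At least 1 - 1/k**2 of any data set lies within k standard
    deviations of the mean (requires k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


# Empirical-rule percentages, valid only for bell-shaped distributions.
EMPIRICAL_RULE = {2: 0.95, 3: 0.997}

for k in (2, 3):
    print(f"k = {k}: Chebyshev at least {chebyshev_minimum_fraction(k):.3f}, "
          f"empirical rule about {EMPIRICAL_RULE[k]}")
```

The Chebyshev bound is lower because it must cover every possible distribution shape.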

Measures of Variation Summary
For typical data sets, it is unusual for a score to differ from the mean by more than 2 or 3 standard deviations.

An Application of Measures of Variation
There are two brands of car tires, A and B. Both have a mean lifetime of 60,000 miles, but brand A has a standard deviation of 1,000 miles in lifetime and brand B has a standard deviation of 3,000 miles. Which brand would you prefer?

Quartiles
$Q_1$, $Q_2$, $Q_3$ divide the ranked scores into four equal parts, each containing 25% of the scores.

Percentiles
The 99 percentiles $P_1, P_2, \ldots, P_{99}$ divide the ranked scores into 100 groups.

Finding the Percentile of a Given Score
$\text{percentile of score } x = \frac{\text{number of scores less than } x}{\text{total number of scores}} \cdot 100$

Sorted Axial Loads of 175 Aluminum Cans
[1]   200 201 204 204 206 206 208 208 209 215 217 218 220 223 223
[16]  225 228 230 230 234 236 241 242 242 248 250 251 251 252 252
[31]  254 256 256 256 257 257 258 259 259 260 261 262 262 262 262
[46]  262 263 263 263 263 263 264 265 265 265 266 267 267 268 268
[61]  268 268 268 268 268 268 268 269 269 269 269 270 270 270 270
[76]  270 270 270 270 271 271 272 272 272 272 272 273 273 273 273
[91]  273 273 274 274 274 274 275 275 275 275 276 276 276 276 276
[106] 277 277 277 277 277 277 277 277 278 278 278 278 278 278 278
[121] 279 279 279 280 280 280 281 281 281 281 282 282 282 282 282
[136] 282 283 283 283 283 283 283 284 284 284 284 285 285 285 286
[151] 286 286 286 287 287 288 289 289 289 289 289 290 290 290 291
[166] 291 292 292 292 293 293 294 295 295 297
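A minimal sketch of the percentile-of-a-score formula; the function name `percentile_of_score` is hypothetical. The example uses a small data set; with the 175 sorted axial loads, 21 scores are less than 241, so 241 is at the 12th percentile (21/175 · 100 = 12).

```python
def percentile_of_score(data, x):
    """(number of scores less than x) / (total number of scores) * 100."""
    below = sum(1 for value in data if value < x)
    return below / len(data) * 100


print(percentile_of_score([2, 3, 4, 5, 5, 5], 5))   # → 50.0
```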

Finding the Value of the kth Percentile
1. Rank the data (arrange the data in order from lowest to highest).
2. Compute $L = \frac{k}{100} \cdot n$, where n = number of scores and k = percentile in question.
3. Is L a whole number?
   Yes: the value of the kth percentile is midway between the Lth score and the next higher score in the original set of data. Find $P_k$ by adding the Lth score and the next higher score and dividing the total by 2.
   No: change L by rounding it up to the next larger whole number. The value of $P_k$ is the Lth score, counting from the lowest.
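The procedure can be sketched in Python, assuming k is a whole number from 1 to 99; the function name `kth_percentile` is hypothetical. The whole-number test uses integer arithmetic on k · n, because computing k/100 in floating point first can make an exact whole number look fractional (0.07 · 100 evaluates to 7.000000000000001).

```python
import math


def kth_percentile(data, k):
    """Value of P_k following the steps above (k a whole number, 1-99)."""
    if not 1 <= k <= 99:
        raise ValueError("k must be between 1 and 99")
    ordered = sorted(data)                  # rank the data
    n = len(ordered)
    if (k * n) % 100 == 0:                  # L = (k/100) * n is a whole number
        L = k * n // 100
        # midway between the Lth score and the next higher score
        # (positions are 1-based, list indexes are 0-based)
        return (ordered[L - 1] + ordered[L]) / 2
    L = math.ceil(k * n / 100)              # round L up to the next whole number
    return ordered[L - 1]                   # the Lth score, counting from the lowest


print(kth_percentile([2, 3, 4, 5, 5, 5], 50))   # L = 3: (4 + 5) / 2 → 4.5
print(kth_percentile([2, 3, 4, 5, 5, 5], 25))   # L = 1.5, rounded up to 2 → 3
```

Other textbooks and software packages use different percentile conventions, so results may differ slightly from, for example, `statistics.quantiles`.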

Worked examples, using the sorted axial loads of 175 aluminum cans:
The 10th percentile: L = 175 · 10/100 = 17.5, rounded up to 18. So $P_{10}$ is the 18th score in the sorted data, 230.
The 25th percentile: L = 175 · 25/100 = 43.75, rounded up to 44. So $P_{25}$ is the 44th score in the sorted data, 262.

Interquartile range: $Q_3 - Q_1$
Semi-interquartile range: $\frac{Q_3 - Q_1}{2}$
Midquartile: $\frac{Q_1 + Q_3}{2}$
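A minimal sketch of the three quartile-based measures; the function name `quartile_measures` is hypothetical. For the sorted axial loads, $Q_1 = P_{25} = 262$ (the 44th score) and $Q_3 = P_{75} = 282$ (L = 131.25, rounded up to 132, and the 132nd score is 282), both found with the kth-percentile procedure.

```python
def quartile_measures(q1, q3):
    """Interquartile range, semi-interquartile range, and midquartile."""
    iqr = q3 - q1
    return {
        "interquartile range": iqr,
        "semi-interquartile range": iqr / 2,
        "midquartile": (q1 + q3) / 2,
    }


# Q1 = 262 and Q3 = 282 for the sorted axial loads.
print(quartile_measures(262, 282))
```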

Exploratory Data Analysis vs. Traditional Statistics
Exploratory data analysis:
- Used to explore data at a preliminary level
- Few or no assumptions are made about the data
- Tends to involve relatively simple calculations and graphs
Traditional statistics:
- Used to confirm final conclusions about data
- Typically requires some very important assumptions about the data
- Calculations are often complex, and graphs are often unnecessary

Boxplots (Box-and-Whisker Diagrams)
A boxplot is drawn from the 5-number summary: minimum, first quartile $Q_1$, median, third quartile $Q_3$, and maximum.
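A minimal sketch of the 5-number summary, assuming $Q_1$ and $Q_3$ are found as $P_{25}$ and $P_{75}$ with the kth-percentile procedure and the median with the usual median rule; other quartile conventions exist. The helper names are hypothetical, and `statistics.median` is from the Python standard library. By hand count, the 175 sorted axial loads give (200, 262, 273, 282, 297).

```python
import math
import statistics


def kth_percentile(data, k):
    """P_k by the locator method: L = (k/100) * n (k a whole number, 1-99)."""
    ordered = sorted(data)
    n = len(ordered)
    if (k * n) % 100 == 0:                   # L is a whole number
        L = k * n // 100
        return (ordered[L - 1] + ordered[L]) / 2
    return ordered[math.ceil(k * n / 100) - 1]   # round L up


def five_number_summary(data):
    ordered = sorted(data)
    return (
        ordered[0],                      # minimum
        kth_percentile(ordered, 25),     # first quartile Q1 = P25
        statistics.median(ordered),      # median
        kth_percentile(ordered, 75),     # third quartile Q3 = P75
        ordered[-1],                     # maximum
    )


print(five_number_summary([2, 3, 4, 5, 5, 5]))   # → (2, 3, 4.5, 5, 5)
```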

Figure 2-13: Boxplot of pulse rates (beats per minute) of smokers: minimum 52, $Q_1$ = 60, median 68.5, $Q_3$ = 78, maximum 90.

Figure 2-14: Boxplots of normal, uniform, and skewed distributions.

Outliers: values that are very far away from most of the data.

Class Survey Data Boxplots for the heights of those who never broke a bone and those who did

When comparing two or more boxplots, it is necessary to use the same scale.
Figure: Side-by-side boxplots of pulse rates (scale 40 to 100) for smokers (yes) and nonsmokers (no).