Published by Sharon Angelica Fields. Modified over 7 years ago.
MAT 135 Introductory Statistics and Data Analysis
Adjunct Instructor: Kenneth R. Martin
Lecture 7, October 12, 2016
Confidential - Kenneth R. Martin
Agenda
Housekeeping
Readings
Exam #1 review
Chapters 1, 14, 10, 2, and 3
Housekeeping
Read Chapter 1.1-1.4
Read Chapter 14.1-14.2
Read Chapter 10.1
Read Chapter 2
Read Chapter 3
Housekeeping
Exam #1 Review
Statistics – Application to Research
Statistics – Why collect samples?
It is often impractical to collect data from the entire population (e.g., the U.S. census).
Some test methods are destructive: we wouldn't have any products or services left to ship to a customer!
It is too expensive to measure the entire population.
We don't have to collect 100% of the population! We can use inferential statistics to draw sound conclusions about the population.
Population and Sample (sampling scheme): POPULATION -> SAMPLE -> measure data. Use data from the SAMPLE to make conclusions about the POPULATION.
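The idea of measuring a sample instead of the whole population can be sketched in a few lines of Python. The population below is simulated with made-up numbers (a mean of 10.0 minutes and a spread of 0.5 minutes are assumptions for illustration only), and the sample size of 50 is arbitrary.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 simulated process times in minutes.
population = [random.gauss(10.0, 0.5) for _ in range(10_000)]

# Measure only a small random sample instead of every unit.
sample = random.sample(population, 50)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
print(f"population mean = {population_mean:.2f}, sample mean = {sample_mean:.2f}")
```

The sample mean should land close to the population mean even though only 50 of the 10,000 values were measured.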
Statistics – Describing the Data
Two methods to summarize the data:
Graphical: Histogram
Analytical: Central Tendency
Statistics – Central Tendency
A statistical measure that describes the central value around which the data are distributed. It includes the Mean, Median, and Mode.
However, Central Tendency says nothing about data variation (spread).
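As a quick illustration, the three measures can be computed with Python's standard `statistics` module. The data set here is hypothetical and chosen only so that the three values are easy to check by hand.

```python
import statistics

# Hypothetical data set, not taken from the lecture.
data = [2, 3, 3, 5, 7]

mean = statistics.mean(data)      # sum of values divided by their count
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequently occurring value

print(f"mean = {mean}, median = {median}, mode = {mode}")
```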
Statistics – Relationship of Central Tendency
Normal distribution: Mean = Median = Mode
Statistics – Frequency Distributions
Statistics – Various curves (different data spreads, common means)
Statistics – Various curves (different means, common data spreads)
Statistics – Various Normal Curves
Statistics – Measures of Variability
How the data are spread from their central value. Central tendency does not indicate the level of variability (dispersion) around the mean.
A = {100, 200, 300, 400, 500}
B = {50, 150, 300, 450, 550}
C = {250, 300, 300, 300, 350}
The mean and median are the same for all three data sets, but the variability differs in each.
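The claim that A, B, and C share the same mean and median can be checked directly. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

data_sets = {
    "A": [100, 200, 300, 400, 500],
    "B": [50, 150, 300, 450, 550],
    "C": [250, 300, 300, 300, 350],
}

# Central tendency is identical across the sets even though the spreads differ.
means = {name: statistics.mean(values) for name, values in data_sets.items()}
medians = {name: statistics.median(values) for name, values in data_sets.items()}

for name in data_sets:
    print(f"{name}: mean = {means[name]}, median = {medians[name]}")
```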
Statistics – Measures of Variability
Values can range from 0 to ∞ (infinity).
0 means no variability in the data.
A large value indicates a lot of variability in the data.
Values can never be negative.
As soon as one value in a data set differs from another, variability exists.
Statistics – Measures of Variability (Dispersion): Range
Range (R) = Max. value - Min. value = X_H - X_L
As data set size increases, the accuracy of using range decreases.
Limit the use of Range to about 10 readings.
Statistics – Measures of Variability: Range Example
A = {100, 200, 300, 400, 500}
B = {50, 150, 300, 450, 550}
C = {250, 300, 300, 300, 350}
R_A = ? R_B = ? R_C = ?
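The three ranges can be worked out with the definition R = maximum value minus minimum value. A minimal sketch (the helper name `data_range` is an illustrative choice, not from the lecture):

```python
def data_range(values):
    """Range R = largest value minus smallest value."""
    return max(values) - min(values)

A = [100, 200, 300, 400, 500]
B = [50, 150, 300, 450, 550]
C = [250, 300, 300, 300, 350]

r_a, r_b, r_c = data_range(A), data_range(B), data_range(C)
print(f"R_A = {r_a}, R_B = {r_b}, R_C = {r_c}")  # R_A = 400, R_B = 500, R_C = 100
```

Note that each result depends on only two of the five values in its data set.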
Statistics – Measures of Variability
So what is the limitation of all three of these Range calculations?
Statistics – Measures of Variability (Dispersion): Variance
Variance: a measure of variability, the average squared distance that data points deviate from their mean.
Variance calculations include all data points.
Statistics – Measures of Variability (Dispersion): Variance
Sum of Squares (SS): the sum of the squared deviations of values from their mean. The SS is the numerator of the variance formula.
Variance, σ², for the population; μ is the population average.
Variance, S², for a sample; M is the sample average.
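The formulas from this slide did not survive in the transcript. A reconstruction under the standard definitions is given below; it assumes N is the population size and n is the sample size, consistent with the denominators N and n-1 named later in the lecture.

```latex
\begin{align*}
SS_{\text{population}} &= \sum_{i=1}^{N} (X_i - \mu)^2, &
\sigma^2 &= \frac{SS_{\text{population}}}{N} \\
SS_{\text{sample}} &= \sum_{i=1}^{n} (X_i - M)^2, &
S^2 &= \frac{SS_{\text{sample}}}{n - 1}
\end{align*}
```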
Statistics – Variance: Example
A = {100, 200, 300, 400, 500}
In this case, notice that the SS of both the population and the sample will be the same.
Remember: PREMDAS
What is σ²? What is S²?
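The example can be worked step by step. The sketch below treats A first as a whole population (divide SS by N) and then as a sample (divide SS by n - 1); the variable names are illustrative.

```python
A = [100, 200, 300, 400, 500]

mean = sum(A) / len(A)  # 300.0, used as both mu and M here

# Sum of Squares: squared deviations from the mean, added together.
ss = sum((x - mean) ** 2 for x in A)

population_variance = ss / len(A)    # sigma squared: divide by N
sample_variance = ss / (len(A) - 1)  # S squared: divide by n - 1

print(f"SS = {ss}, sigma^2 = {population_variance}, S^2 = {sample_variance}")
# SS = 100000.0, sigma^2 = 20000.0, S^2 = 25000.0
```

The SS is the same in both cases; only the denominator changes.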
Statistics – Measures of Variability (Dispersion): Variance
What is a big limitation of Variance?
What do you notice about the units of the mean and the units of Variance?
Statistics – Measures of Dispersion: Standard Deviation
Also called the root-mean-square deviation, it is a measure of the spread (variability) of the data: roughly the average distance that data points deviate from their mean.
It is calculated by taking the square root of the Variance.
Statistics – Measures of Dispersion: Standard Deviation
When the data come from the “population”, we shall use “σ” (sigma) to denote the Standard Deviation.
The mean value will be represented by the Greek symbol μ (mu).
The denominator does not have “uncertainty”, thus N.
When the data come from a “sample”, we shall use “SD” to denote the Standard Deviation.
The mean value will be represented by M or X̄ (X-bar).
The denominator shows “uncertainty”, thus n-1.
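The slide's formulas are missing from the transcript. Written out under the standard definitions, with N the population size and n the sample size, they would read:

```latex
\begin{align*}
\sigma &= \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}
  && \text{(population)} \\
SD &= \sqrt{\frac{\sum_{i=1}^{n} (X_i - M)^2}{n - 1}}
  && \text{(sample)}
\end{align*}
```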
Statistics – Measures of Dispersion: Standard Deviation
We typically want the standard deviation (and variance) to be as small as possible. We typically want to minimize variability!
Standard deviation is a better measure than range for describing the data distribution precisely.
Other formulas exist for Standard Deviation, but they will not be covered.
Statistics – Standard Deviation: Example
A = {100, 200, 300, 400, 500}
What do we notice about the units of Standard Deviation and the units of the mean?
The Mean and Standard Deviation are typically reported together.
Statistics – Standard Deviation: Example
B = {50, 150, 300, 450, 550}
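The worked results for these examples are not in the transcript. One way to compute them is with Python's standard `statistics` module, where `pstdev` divides by N (population) and `stdev` divides by n - 1 (sample). Data set A from the previous example is included for comparison.

```python
import statistics

A = [100, 200, 300, 400, 500]
B = [50, 150, 300, 450, 550]

# Population standard deviation (sigma): denominator N.
sigma_a = statistics.pstdev(A)
sigma_b = statistics.pstdev(B)

# Sample standard deviation (SD): denominator n - 1.
sd_a = statistics.stdev(A)
sd_b = statistics.stdev(B)

print(f"A: sigma = {sigma_a:.2f}, SD = {sd_a:.2f}")
print(f"B: sigma = {sigma_b:.2f}, SD = {sd_b:.2f}")
```

Both sets have a mean of 300, and the results are in the same units as the data, unlike the variance.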
Statistics – Measures of Dispersion: Coefficient of Variation
CVar: allows a comparison of standard deviations when the units of measure are not the same.
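The slide's formula is not in the transcript. The sketch below assumes the common definition, CVar = (standard deviation / mean) x 100%, using the sample standard deviation. The function name and the `lengths_mm` data set are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean (assumed definition)."""
    return statistics.stdev(values) / statistics.mean(values) * 100

A = [100, 200, 300, 400, 500]  # data set A from the lecture
lengths_mm = [10, 12, 11, 13, 14]  # hypothetical data in different units

cv_a = coefficient_of_variation(A)
cv_lengths = coefficient_of_variation(lengths_mm)
print(f"CVar(A) = {cv_a:.1f}%, CVar(lengths) = {cv_lengths:.1f}%")
```

Because the units cancel in the ratio, the two percentages can be compared directly.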
Statistics – Coefficient of Variation: Example
Statistics – Box and Whisker Plot (Boxplot)
A simple graphical tool to summarize data.
Five values (the five-number summary) must be determined from the data to generate a boxplot:
Median (2nd Quartile)
Maximum data value [whisker end]
Minimum data value [whisker end]
1st Quartile (value below which 1/4 of the observations fall) [box edge]
3rd Quartile (value below which 3/4 of the observations fall) [box edge]
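A five-number summary can be sketched with Python's standard `statistics` module (`statistics.quantiles` requires Python 3.8 or later). The nine data values are hypothetical, and the `inclusive` method is only one of several quartile conventions, so other tools may give slightly different quartiles.

```python
import statistics

# Hypothetical process times in minutes (not the lecture's 125 readings).
data = sorted([9.0, 9.4, 9.6, 9.7, 9.8, 9.9, 10.0, 10.3, 10.7])

# quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

five_number_summary = {
    "min": min(data),
    "Q1": q1,
    "median": q2,
    "Q3": q3,
    "max": max(data),
}
print(five_number_summary)
```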
Statistics – Box and Whisker Plot: Boxplot Example
Process aim = 9.0 minutes
Spec = + / minutes
n = 125
R = 1.7
Statistics – Box and Whisker Plot: Boxplot Example
The box contains the median value and approximately 50% of the observations.
Whiskers extend from the box to the extreme values.
Example (n = 125):
Median = 63rd value = 9.8
Max = 10.7
Min = 9.0
1st Quartile = X_(125 × 0.25) ≈ average of the 31st and 32nd values = 9.6
3rd Quartile = X_(125 × 0.75) ≈ average of the 94th and 95th values = 10.0
Statistics – Box and Whisker Plot: Boxplot Example
Long whiskers denote the existence of values much larger than other values.
For this example, mean median.
Other variants exist, e.g., whiskers ending at +/- 1.5*IQR, with all other points treated as “outliers” and depicted as asterisks.
IQR = Interquartile Range
Figure labels: Q1, Q2, Q3
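The 1.5*IQR variant can be sketched as follows. The quartiles 9.6 and 10.0 are assumed from the n = 125 example in this lecture, the three test values are its minimum, median, and maximum, and the function name is hypothetical.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) whisker limits for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

q1, q3 = 9.6, 10.0         # quartiles assumed from the boxplot example
values = [9.0, 9.8, 10.7]  # min, median, and max from the same example

lower, upper = iqr_fences(q1, q3)

# Points outside the fences would be drawn as individual outlier markers.
outliers = [x for x in values if x < lower or x > upper]
print(f"fences = ({lower:.1f}, {upper:.1f}), outliers = {outliers}")
```

Under this variant, a value is plotted separately rather than as a whisker end whenever it falls outside the fences.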
Statistics – Box and Whisker Plot: Boxplot Example
Statistics – Measures of Variability (Dispersion): IQR
IQR: Interquartile Range
IQR = Q3 - Q1
Statistics – Box and Whisker Plot: Boxplot Example
For this example, IQR = ?
Figure labels: Q1, Q2, Q3
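Assuming "this example" refers to the n = 125 boxplot with 1st Quartile = 9.6 and 3rd Quartile = 10.0 (process times in minutes), the calculation would be:

```latex
\[
IQR = Q_3 - Q_1 = 10.0 - 9.6 = 0.4 \text{ minutes}
\]
```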