MBA 510 Lecture 2 Spring 2013 Dr. Tonya Balan 4/20/2019
Data Summary
Descriptive Measures
A descriptive measure is a single number computed from the sample data that provides information about the data. The four types of descriptive measures are:
  Measures of central tendency – Where is the middle of my data?
  Measures of variation – How spread out are my data?
  Measures of position – Where does my value fall within the distribution?
  Measures of shape – Are my data symmetric?
Measures of Central Tendency
  Mean – the arithmetic average of the values
  Median – the center value of the sorted data
  Midrange – the halfway point between the smallest and largest values
  Mode – the most frequently occurring value
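The four measures above can be sketched in a few lines of Python using the standard library. The sample values are hypothetical and chosen only so that each measure gives a different answer; they do not come from the lecture.

```python
from statistics import mean, median, multimode

# Hypothetical sample, chosen only for illustration (not from the lecture).
sample = [2, 3, 3, 5, 7, 10]

sample_mean = mean(sample)                          # sum of the values / number of values
sample_median = median(sample)                      # middle of the sorted values
sample_midrange = (min(sample) + max(sample)) / 2   # halfway between smallest and largest
sample_modes = multimode(sample)                    # a list, because ties are possible

print(sample_mean, sample_median, sample_midrange, sample_modes)
```

`multimode` is used rather than `mode` because a sample can have more than one most frequent value.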
Measures of Variation
  A single statistic is generally not sufficient to completely describe the distribution of the data.
  Some data are more homogeneous than others.
  Variation is the tendency of the data values to scatter about the sample mean.
Measures of Variation
  Range – distance between the smallest and largest values
  Variance – describes the variation of the sample values around the sample mean
  Standard Deviation – square root of the variance
  Coefficient of Variation (CV) – measure of the standard deviation relative to the mean
Range
The range is the numerical distance between the smallest and largest values in the data set. As such, the range is highly influenced by outliers in the data.
Range = H – L, where H is the highest value and L is the lowest value
Range
Data: 37 40 28 35 36 42 30 31 33 39
H = 42, L = 28
Range = H – L = 42 – 28 = 14
Range
Data: 37 40 28 35 36 42 31 33 39 58
H = 58, L = 28
Range = H – L = 58 – 28 = 30
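The two range calculations above can be reproduced with a minimal Python sketch. The function name `value_range` is illustrative and not part of the lecture. The point of the comparison is that a single large value (58 in place of 30) more than doubles the range.

```python
def value_range(values):
    """Return H - L, the distance between the largest and smallest values."""
    return max(values) - min(values)

# The two data sets from the Range slides.
original = [37, 40, 28, 35, 36, 42, 30, 31, 33, 39]
with_outlier = [37, 40, 28, 35, 36, 42, 31, 33, 39, 58]

print(value_range(original))      # 42 - 28
print(value_range(with_outlier))  # 58 - 28
```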
Variance
The variance measures the spread (or variation) of the sample values about the sample mean. To calculate the variance:
  1. Compute the distance from each sample value to the sample mean.
  2. Square the distances.
  3. Sum the squared distances.
  4. Divide by (n – 1), where n is the sample size.
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
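The four steps can be followed literally in code. This is a sketch rather than the lecture's own worked example; the function name `sample_variance` is illustrative, and the data are the ten values from the first Range slide.

```python
from statistics import mean

def sample_variance(values):
    """Sample variance, following the four steps listed on the slide."""
    x_bar = mean(values)
    deviations = [x - x_bar for x in values]   # step 1: distance to the mean
    squared = [d ** 2 for d in deviations]     # step 2: square the distances
    total = sum(squared)                       # step 3: sum the squared distances
    return total / (len(values) - 1)           # step 4: divide by n - 1

data = [37, 40, 28, 35, 36, 42, 30, 31, 33, 39]  # data from the first Range slide
s_squared = sample_variance(data)
print(round(s_squared, 2))
```

The standard library function `statistics.variance` uses the same n – 1 divisor, so it can be used to check the result.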
Standard Deviation
The standard deviation also measures the spread of the sample values about the sample mean. However, the unit of measure for the standard deviation is the same as the unit of measure for the data, so it is easier to interpret than the variance. The standard deviation is simply the square root of the variance:
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Additional Comments
  Because the units of measure are the same for the mean and the standard deviation, we can use the information to ask questions such as: “How many of the sample values are within two standard deviations of the mean?”
  The magnitude of the variance (or standard deviation) is relative to the magnitude of the data.
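The question "How many of the sample values are within two standard deviations of the mean?" can be answered directly in Python. The sketch below uses the second data set from the Range slides (the one containing 58); the variable names are illustrative.

```python
from statistics import mean, stdev

data = [37, 40, 28, 35, 36, 42, 31, 33, 39, 58]  # second data set from the Range slides

x_bar = mean(data)
s = stdev(data)                        # sample standard deviation (n - 1 divisor)
lower, upper = x_bar - 2 * s, x_bar + 2 * s

within = [x for x in data if lower <= x <= upper]
print(f"{len(within)} of {len(data)} values lie within two standard deviations of the mean")
```

For these data only the value 58 falls outside the interval.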
Coefficient of Variation
To compare the variation across different samples, a relative measure of variation is needed. The coefficient of variation (CV) expresses the standard deviation relative to the mean; multiplying by 100 gives the standard deviation as a percentage of the mean.
$CV = \frac{s}{\bar{x}}$
Levi’s Example
                  PT1     PT2     PT3     PT4     PT5
Mean              4.52    8.83    4.83    7.49    10.37
Median            1.95    6.15    4.7     7.1     11.3
Std. Deviation    10.03   15.35   4.4     3.66    9.56
C.V.              2.22    1.74    0.91    0.49    0.92
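The C.V. row can be recomputed from the Mean and Std. Deviation rows. The following sketch copies those two rows from the table; the dictionary names are illustrative.

```python
# Means and standard deviations copied from the Levi's table.
means = {"PT1": 4.52, "PT2": 8.83, "PT3": 4.83, "PT4": 7.49, "PT5": 10.37}
std_devs = {"PT1": 10.03, "PT2": 15.35, "PT3": 4.4, "PT4": 3.66, "PT5": 9.56}

# CV = s / x-bar; multiply by 100 to express it as a percentage of the mean.
cv = {name: std_devs[name] / means[name] for name in means}

for name, value in cv.items():
    print(f"{name}: CV = {value:.2f} ({value * 100:.0f}% of the mean)")
```

PT2 has the largest standard deviation, but PT1 has the largest variation relative to its mean, which is the kind of comparison the CV is meant to support.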
Measures of Position
Measures of position are indicators of how a particular value fits in with all the other data values.
  Percentiles (and quartiles)
  Z-score
Percentiles
The pth percentile is the value such that p% of the data are less than that value.
“She scored in the 95th percentile of all students taking the test. In other words, 95% of the students taking the test had scores lower than hers. Only 5% of the students scored better than she did.”
The 50th percentile is also known as the median.
Quartiles
The 25th, 50th, and 75th percentiles are also referred to as the quartiles of the distribution.
  Q1 = 1st quartile = 25th percentile
  Q2 = 2nd quartile = 50th percentile
  Q3 = 3rd quartile = 75th percentile
The quartiles will be important for constructing box and whisker plots.
Interquartile range (IQR)
  IQR = Q3 – Q1
  The IQR is the range that contains the middle 50% of the data.
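As a rough illustration, the quartiles and the IQR can be computed with Python's `statistics.quantiles`. Software packages and textbooks interpolate between data values in slightly different ways, so the values printed here may differ a little from a hand calculation. The data are the ten values from the first Range slide.

```python
from statistics import quantiles

data = [37, 40, 28, 35, 36, 42, 30, 31, 33, 39]  # data from the first Range slide

# n=4 requests the three cut points that split the data into four groups.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

print(q1, q2, q3, iqr)
```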
Z-Score
The Z-score expresses the position of a data value, x, in terms of the number of standard deviations above or below the mean.
$z = \frac{x - \bar{x}}{s}$
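A small Python sketch of the z-score formula, applied to the ten values from the first Range slide. The function name `z_score` is illustrative.

```python
from statistics import mean, stdev

def z_score(x, values):
    """Number of sample standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean(values)) / stdev(values)

data = [37, 40, 28, 35, 36, 42, 30, 31, 33, 39]  # data from the first Range slide

print(round(z_score(42, data), 2))  # largest value: positive z
print(round(z_score(28, data), 2))  # smallest value: negative z
```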
Skewness
The skewness coefficient (Sk) measures the degree of skewness in your data.
  The value of Sk ranges from -3 to 3.
  Sk = 0 for perfectly symmetric distributions
  Sk < 0: the data are skewed left
  Sk > 0: the data are skewed right
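The slide does not say which formula defines Sk. One coefficient that is bounded by -3 and 3 is Pearson's second skewness coefficient, 3(mean – median)/s, and the sketch below assumes that definition. Other skewness formulas exist and give different numbers, so treat this only as an illustration of the sign convention.

```python
from statistics import mean, median, stdev

def pearson_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / s.

    Assumption: the lecture does not state which formula defines Sk.
    """
    return 3 * (mean(values) - median(values)) / stdev(values)

symmetric = [1, 2, 3, 4, 5]                            # hypothetical, symmetric about 3
right_tail = [37, 40, 28, 35, 36, 42, 31, 33, 39, 58]  # Range-slide data containing 58

print(pearson_skewness(symmetric))
print(round(pearson_skewness(right_tail), 2))
```

The symmetric sample gives zero, and the sample with one large value gives a positive coefficient, consistent with a right skew.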
Kurtosis
Kurtosis measures the “peakedness” of your distribution. The value of the kurtosis is large if there is a high frequency of observations near the mean and in the tails of the distribution.
Chebyshev’s Inequality
For any data set:
  At least 75% of the data values are between $\bar{x} - 2s$ and $\bar{x} + 2s$.
  At least 75% of the data values have a z-score between -2 and 2.
  At least 89% of the data values are between $\bar{x} - 3s$ and $\bar{x} + 3s$.
  At least 89% of the data values have a z-score between -3 and 3.
In general, at least $\left(1 - \frac{1}{k^2}\right) \times 100\%$ of your data values lie between $\bar{x} - ks$ and $\bar{x} + ks$ (have z-scores between -k and k) for any k > 1.
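The bound and the observed share of values can be compared with a short Python sketch. The function names are illustrative, and the data are the second data set from the Range slides (the one containing 58).

```python
from statistics import mean, stdev

def chebyshev_bound(k):
    """Minimum fraction of values within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def fraction_within(values, k):
    """Observed fraction of values with z-scores between -k and k."""
    x_bar, s = mean(values), stdev(values)
    return sum(1 for x in values if abs(x - x_bar) <= k * s) / len(values)

data = [37, 40, 28, 35, 36, 42, 31, 33, 39, 58]  # second data set from the Range slides

for k in (2, 3):
    print(k, round(chebyshev_bound(k), 3), fraction_within(data, k))
```

The observed fractions are at least as large as the bounds, as the inequality guarantees; the bound is a minimum, not an estimate.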
Empirical Rule
For data that have a normal distribution (bell-shaped distribution with skewness near 0):
  Approximately 68% of the data values are between $\bar{x} - s$ and $\bar{x} + s$.
  Approximately 95% of the data values are between $\bar{x} - 2s$ and $\bar{x} + 2s$.
  Approximately 99.7% of the data values are between $\bar{x} - 3s$ and $\bar{x} + 3s$.
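One way to see the empirical rule is to simulate normal data and count. The sketch below draws 10,000 hypothetical values; the mean of 50, the standard deviation of 10, and the seed are arbitrary assumptions, not values from the lecture.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical normal data: mean 50 and standard deviation 10 are arbitrary choices.
data = [rng.gauss(50, 10) for _ in range(10_000)]

x_bar, s = mean(data), stdev(data)

def share_within(k):
    """Fraction of simulated values between x_bar - k*s and x_bar + k*s."""
    return sum(1 for x in data if abs(x - x_bar) <= k * s) / len(data)

for k in (1, 2, 3):
    print(k, round(share_within(k), 3))
```

The printed shares should land close to 0.68, 0.95, and 0.997, with small differences due to sampling.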
Box Plots
A box plot is a graphical representation of a set of sample data that illustrates the lowest data value (L), the first quartile (Q1), the median (Q2), the third quartile (Q3), the interquartile range, and the highest value (H).
Describing Bivariate Data
[Figure: scatter plots of bivariate data]
Correlation
The Pearson coefficient of correlation (r) measures the strength of the linear relationship between two variables and has the following important properties:
  r ranges from -1 to 1.
  The larger |r| is, the stronger the linear relationship.
  r near zero indicates that there is little or no linear relationship.
  r = 1 or r = -1 indicates a perfect linear relationship.
  The sign of r tells you whether the relationship is positive or negative (e.g., as x increases, does y increase or decrease?).
http://www.people.vcu.edu/~rjohnson/regression/
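The properties of r can be checked on small examples. The sketch below computes r from the usual sums of squares and cross-products; the function name `pearson_r` and all of the data are hypothetical and are not taken from the lecture.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, not from the lecture.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]      # exactly y = 2x
y_down = [10, 8, 6, 4, 2]    # exactly y = 12 - 2x
y_noisy = [2, 3, 7, 6, 11]   # roughly increasing

print(pearson_r(x, y_up), pearson_r(x, y_down), round(pearson_r(x, y_noisy), 3))
```

The two exact lines give r = 1 and r = -1, and the noisy but increasing data give a positive r below 1.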
Understanding Stock Rates
Correlation does not imply causation!