Published by Walter Page. Modified over 9 years ago.
1
Descriptive Statistics Prerequisite Material MGS 8110 Regression & Forecasting
2
L00D MGS 8110 - Descriptive Statistics
Descriptive Analysis of Data
Given a set of data (simple data for only one variable), what is the best way to summarize the data?
3
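Measures of Central Tendency
Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Median: the middle value of the sorted data, $x_{((n+1)/2)}$, if n is odd; the average of the two middle values, $\frac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right)$, if n is even
Mode: the $x_i$ value that occurs most frequently
Mid Range: $(x_{max} + x_{min})/2$
Mid InterQuartile: $(Q_1 + Q_3)/2$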
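As a rough illustration of the five measures above, here is a minimal Python sketch. The `heights` list is a made-up sample (not the course data), and `method="inclusive"` is chosen on the assumption that it interpolates the same way as Excel's QUARTILE function.

```python
import statistics

# Hypothetical sample of n = 10 heights in inches (not the course data).
heights = [66, 67, 68, 68, 69, 69, 69, 70, 71, 73]

mean = statistics.mean(heights)        # sum of the values divided by n
median = statistics.median(heights)    # n is even here: average of the two middle values
mode = statistics.mode(heights)        # value that occurs most frequently
mid_range = (max(heights) + min(heights)) / 2

# Quartiles by linear interpolation between sorted values.
q1, q2, q3 = statistics.quantiles(heights, n=4, method="inclusive")
mid_interquartile = (q1 + q3) / 2

print(mean, median, mode, mid_range, mid_interquartile)
```

For symmetric data the five values land close together; a large gap between the mean and the median is a hint that the data is skewed.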
4
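Measures of Variability
Av Deviate: $\frac{1}{n}\sum (x_i - \bar{x})$, which is always zero
Av Absolute Deviate: $\frac{1}{n}\sum |x_i - \bar{x}|$
Variance: $s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$
Standard Deviation: $s = \sqrt{s^2}$
Range: $x_{max} - x_{min}$
Inter Quartile Range (IQR): $Q_3 - Q_1$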
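The measures of variability can be sketched the same way. The data below is a hypothetical sample, and the quartile method is an assumption made to mirror Excel's QUARTILE interpolation.

```python
import statistics

# Hypothetical sample of n = 10 heights in inches (not the course data).
heights = [66, 67, 68, 68, 69, 69, 69, 70, 71, 73]
n = len(heights)
mean = statistics.mean(heights)

av_deviate = sum(x - mean for x in heights) / n            # deviations cancel out: always 0
av_abs_deviate = sum(abs(x - mean) for x in heights) / n   # counterpart of AveDev
variance = statistics.variance(heights)                    # divides by n - 1
std_dev = statistics.stdev(heights)                        # square root of the variance
data_range = max(heights) - min(heights)

q1, _, q3 = statistics.quantiles(heights, n=4, method="inclusive")
iqr = q3 - q1

print(av_deviate, av_abs_deviate, variance, std_dev, data_range, iqr)
```

The average deviate is listed only to show why it is not useful: positive and negative deviations always cancel.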
5
Let's Review - Statistical Precepts
Use the Mean for quantitative, symmetric data.
Use the Median for quantitative, non-symmetric data.
Use the Mode for categorical data.
Use the Variance when doing calculations.
Use the Standard Deviation when presenting the results of the calculations.
Major Teaching Points are frequently shown in green boxes.
6
Appropriate Statistics (discussed in chapter 2)
7
More Review – Need to memorize formulas
Variance: $s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$. Used for calculation, but not for presentations. Units are squared (e.g., inches squared).
Standard Deviation: $s = \sqrt{s^2}$. Used for presentations. Common units (e.g., inches).
Divide by n-1 instead of n in order to get an unbiased estimate of the variance.
8
Interpretation of Standard Deviation (1 of 2)
Statistical Precepts, if the data is normally distributed:
Two-thirds of the data (about 68%) is within one sigma of the mean.
95% of the data is within two sigma of the mean.
Almost all of the data (about 99.7%) is within three sigma of the mean.
9
Interpretation of Standard Deviation (2 of 2)
If ever asked to explain what the Standard Deviation means, say “two-thirds of the data will be within plus or minus one Standard Deviation from the mean”.
If ever asked for the “worst case” or “best case” outcome, calculate “mean – (2)sigma” and/or “mean + (2)sigma”.
Statistical Precepts
Definition of Standard Deviation - two-thirds of the data is contained in a range of values that is two sigma wide, centered on the mean.
Worst case outcome is $\mu - 2\sigma$. Best case outcome is $\mu + 2\sigma$.
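The "two-thirds, 95%, almost all" precepts can be checked against the Normal distribution itself. This is a small sketch using Python's standard library; the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal: mean 0, standard deviation 1

def share_within(k):
    """Fraction of a Normal population within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sigma: {share_within(k):.4f}")
```

The shares come out near 0.68, 0.95 and 0.997, so "two-thirds" is a slightly rounded-down version of the one-sigma figure. These shares apply only when the data really is Normally distributed.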
10
Other Measures
Percentiles (measure of tails): $x_p$ is the $x_i$ such that p% of the $x_i < x_p$.
Quartiles (measure of tails): percentiles where p = 25, 50 or 75, i.e. the lower, middle or upper quartile.
11
Other Measures
Coefficient of Variation (percentage measure of variability): $CV = s / \bar{x}$
Correlation Coefficient (measure of linear association): $\rho = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$
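A minimal sketch of both measures, written out from their definitions rather than taken from any particular library. The `height` and `weight` lists are hypothetical paired observations, and the function names are made up for illustration.

```python
import math
import statistics

# Hypothetical paired observations (not the course data).
height = [66, 67, 68, 68, 69, 69, 69, 70, 71, 73]
weight = [140, 150, 155, 148, 160, 165, 158, 170, 172, 190]

def coef_of_variation(xs):
    """Standard deviation expressed as a fraction of the mean."""
    return statistics.stdev(xs) / statistics.mean(xs)

def correlation(xs, ys):
    """Sample correlation coefficient between two equally long lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(f"CV height = {coef_of_variation(height):.1%}")
print(f"CV weight = {coef_of_variation(weight):.1%}")
print(f"correlation = {correlation(height, weight):.3f}")
```

Because the coefficient of variation is unit-free, it lets inches be compared with pounds; the correlation is unit-free for the same reason.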
12
Interpretation of Correlation Coefficient
[Figure: scatter plots with n = 25 for $\rho = 0$, $\rho = .8$ and $\rho = 1$]
13
Interpretation of Correlation Coefficient
[Figure: scatter plots with n = 25 for $\rho = 0$, $\rho = -.8$ and $\rho = -1$]
14
Interpretation of Correlation Coefficient
15
Interpretation of Correlation Coefficient (1 of 2)
Statistical tests of correlation coefficients are relatively meaningless. These tests are based on the hypothesis that “$\rho = 0$”. Based on the previous graph, knowing that a correlation coefficient is greater than zero is not necessarily a valuable piece of information.
In terms of “Practical Significance” (compared to “Statistical Significance”), the correlation coefficient has to be at least greater than .5. From the previous graph it can be seen that $\rho = .5$ only explains 25% of the variability in the data.
Statistical Precept: $\rho$ must be greater than .5 to be of Practical Significance.
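The 25% figure comes from squaring the correlation coefficient. A short sketch, with a made-up helper name `explained_share`:

```python
def explained_share(rho):
    """Share of the variability explained by a linear relationship with correlation rho."""
    return rho ** 2

for rho in (0.3, 0.5, 0.8, 1.0):
    print(f"rho = {rho:.1f} explains {explained_share(rho):.0%} of the variability")
```

A correlation of .3 explains only about 9% of the variability, which is why a coefficient that is merely "greater than zero" says very little.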
16
Other Measures (page 3 of 3)
Skew (measure of symmetry)
Kurtosis (measure of peakedness)
17
Verifying Bell Shape (Normal Distribution)
Negative Skew if the distribution has a ‘long’ tail to the left, measured as skewness < -1.
Positive Skew if the distribution has a ‘long’ tail to the right (more common situation), measured as skewness > +1.
Symmetric if -1 < skewness < +1.
Peaked Distribution if Kurtosis is a large positive number (> +1).
Flat Distribution if Kurtosis is a large negative number (< -1).
Normal shape (proportionally S-shaped sides) if Kurtosis is near zero.
18
Verifying Bell Shape (Normal Distribution)
Statistical Precept: Bell shaped (Normally distributed) if -1 < skewness < +1 and if -1 < kurtosis < +1.
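The rule of thumb can be sketched in Python. The two functions below follow the bias-adjusted sample formulas that Excel documents for SKEW and KURT (an assumption about which formulas the course intends); the function names are made up for illustration.

```python
import statistics

def sample_skew(xs):
    """Bias-adjusted sample skewness (the formula documented for Excel's SKEW)."""
    n = len(xs)
    m, s = statistics.mean(xs), statistics.stdev(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)

def sample_kurt(xs):
    """Bias-adjusted excess kurtosis (the formula documented for Excel's KURT)."""
    n = len(xs)
    m, s = statistics.mean(xs), statistics.stdev(xs)
    fourth = sum(((x - m) / s) ** 4 for x in xs)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

def looks_bell_shaped(xs):
    """Rule of thumb: both skewness and kurtosis strictly between -1 and +1."""
    return -1 < sample_skew(xs) < 1 and -1 < sample_kurt(xs) < 1

flat = [1, 2, 3, 4, 5]  # evenly spread values: symmetric but flat
print(sample_skew(flat), sample_kurt(flat), looks_bell_shaped(flat))
```

The evenly spread sample has zero skew but a kurtosis of about -1.2, so it fails the rule on flatness alone.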
19
What is the best way to summarize data? (our original question)
Central Tendency (Mean, Median & Mode)
Variability (Variance & Standard Deviation)
Shape (Percentiles, Skewness & Kurtosis)
Association (Correlation)
20
Notations
21
Standard Deviation of Sample Mean
Called the “Standard Error” of the Mean: $s_{\bar{x}} = s / \sqrt{n}$
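A small sketch of the Standard Error and the "mean plus or minus two standard errors" range it leads to. The sample is hypothetical and the helper name is made up.

```python
import math
import statistics

def standard_error(xs):
    """Standard deviation of the sample mean: s divided by the square root of n."""
    return statistics.stdev(xs) / math.sqrt(len(xs))

# Hypothetical sample of n = 10 heights in inches (not the course data).
heights = [66, 67, 68, 68, 69, 69, 69, 70, 71, 73]

se = standard_error(heights)
low = statistics.mean(heights) - 2 * se
high = statistics.mean(heights) + 2 * se
print(f"standard error = {se:.3f}, plausible range for the true mean: {low:.1f} to {high:.1f}")
```

Because n sits under a square root, quadrupling the sample size only halves the Standard Error.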
22
Insert / Function examples (1 of 3)
Mean: Average(A1:A10)
Median: Median(A1:A10)
Mode: Mode(A1:A10)
Mid Range: ( MAX(A1:A10) + MIN(A1:A10) ) / 2
Mid InterQuartile: ( Quartile(A1:A10,1) + Quartile(A1:A10,3) ) / 2
23
Insert / Function examples (2 of 3)
Av Deviate: NA
Av Absolute Deviate: AveDev(A1:A10)
Variance: Var(A1:A10)
Standard Deviation: StDev(A1:A10)
Range: MAX(A1:A10) - MIN(A1:A10)
Inter Quartile Range (IQR): Quartile(A1:A10,3) - Quartile(A1:A10,1)
24
Insert / Function examples (3 of 3)
Percentiles: Percentile(A1:A10,.05)
Quartiles: Quartile(A1:A10,q) where q=0,1,2,3,4
Coef. of Variation: StDev(A1:A10)/Average(A1:A10)
Correlation: Correl(A1:A10,B1:B10)
Skew: Skew(A1:A10)
Kurtosis: Kurt(A1:A10)
25
Example Calculations
Q. Should I use the Mean or the Median to state the Central value of this data?
-0.31 =SKEW(Height)
-1.27 =KURT(Height)
0.17 =SKEW(Weight)
-1.03 =KURT(Weight)
Answer – Both variables have somewhat flat distributions (Kurtosis less than -1), but both variables have very symmetric distributions (non-skewed distributions); hence, use the Mean.
26
Example Calculations
Q. The Standard Deviation for Height is almost 2 inches; what is the practical interpretation of this value?
Answer – The heights of 2/3 of the population fall within a range less than 4 inches wide (3.87”), from 67.1” to 71.0”.
67.1 =B14-B15
71.0 =B14+B15
3.87 =F10-F9
27
Example Calculations
Q. What is the height of the shortest person and the tallest person that I may meet today (worst case and best case)?
Answer – The shortest person will be 5’-5” (65.2”) and the tallest person will be 6’-1” (72.9”).
65.2 =AvHt-2*StDevHt
72.9 =AvHt+2*StDevHt
28
Example Calculations
Q. What is the height of the shortest person that I may meet over the next year?
Answer – The shortest person that I am likely to meet in the foreseeable future will be 5’-6” (66.0”).
66.0 =PERCENTILE(Height,0.01)
29
Example Calculations
Q. The answers to the two previous questions are not consistent. The 5% value calculated as Mean – 2(Sigma) was 5’-5”, whereas the 1% value calculated as a Percentile was 5’-6”.
Answer – These types of inconsistencies (i.e., errors) will occur with small samples. The procedure used by the PERCENTILE function is based on an interpolated calculation with the two smallest values in the sample.
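The interpolation can be sketched as follows. This is a minimal version of the usual "inclusive" percentile rule, which is what PERCENTILE is understood to use; the function name and the sample are made up for illustration.

```python
def percentile_inclusive(xs, p):
    """Percentile by linear interpolation between sorted values; p is between 0 and 1."""
    data = sorted(xs)
    pos = p * (len(data) - 1)   # zero-based fractional position in the sorted data
    lower = int(pos)            # index of the value at or just below that position
    frac = pos - lower
    if lower + 1 >= len(data):  # p = 1 lands exactly on the largest value
        return float(data[-1])
    return data[lower] + frac * (data[lower + 1] - data[lower])

# Hypothetical sample of n = 10 heights in inches (not the course data).
heights = [66, 67, 68, 68, 69, 69, 69, 70, 71, 73]
print(percentile_inclusive(heights, 0.01))
```

With only ten values, the 1% point sits at position 0.09, so the result is just the smallest value nudged 9% of the way toward the second smallest. It can never fall below the sample minimum, which is why it disagrees with Mean – 2(Sigma).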
30
Example Calculations
Q. Which variable, Height or Weight, has the greatest relative variability?
Answer – In agreement with our intuition, Weight is 3 to 4 times more variable than Height (11/3 = 3.67).
Coef of Variation
3% =StDevHt/AvHt
11% =StDevWt/AvWt
31
Example Calculations
Q. Is there a relationship between Height and Weight and, if so, how large is the relationship?
Answer – The correlation between Height and Weight is .78, which means that about 60% (.61) of the variability in Weight is due to differences in Height.
0.783 =CORREL(Height,Weight)
0.61 =G23^2
32
Example Calculations
Q. Given that there is a relationship between Height and Weight, is the relationship linear or non-linear?
Answer – Simple statistics cannot be used to determine linear versus non-linear; you would need to plot the data. The correlation indicates that there is a relatively strong linear relationship, but a plot of the data (Weight vs. Height) may indicate that there is an even stronger non-linear relationship.
0.783 =CORREL(Height,Weight)
0.61 =G23^2
33
Example Calculations
Q. Are Height and Weight Normally distributed?
Answer – Based on our Rule-of-Thumb test (-1 < Skew < +1 and -1 < Kurt < +1), neither of these variables is Normally distributed.
-0.31 =SKEW(Height)
-1.27 =KURT(Height)
0.17 =SKEW(Weight)
-1.03 =KURT(Weight)
34
Example Calculations
Q. Given that the variables are NOT Normally distributed, why do I care?
Answer – Your previous interpretation of the Standard Deviation may be somewhat inaccurate (“The height of 2/3 of the population will vary by less than 4 inches”). Also, your previous interpretation of Worst Case and Best Case may be somewhat inaccurate (“The shortest person will be 5’-5” and the tallest person will be 6’-1””).
35
Example Calculations
Q. The average Height is estimated to be 69.1”; how good is that estimate?
Answer – The true average height could be anywhere between 67.8 inches and 70.3 inches. A better estimate could be obtained if a larger sample was available.
67.8 =AvHt-2*StErrorHt
70.3 =AvHt+2*StErrorHt
36
More about Variability
$s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$: use StDev (or Var) in Excel.
Alternative formulation, $\sigma^2 = \frac{1}{n}\sum (x_i - \mu)^2$: use StDevP (or VarP) in Excel. It applies 1) if every item in the Universe is included in the Sample or 2) if the Mean is known with certainty.
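Python's standard library draws the same distinction, which makes the size of the difference easy to see. The sample below is hypothetical.

```python
import statistics

# Hypothetical sample of n = 10 heights in inches (not the course data).
heights = [66, 67, 68, 68, 69, 69, 69, 70, 71, 73]

sample_sd = statistics.stdev(heights)       # divides by n - 1, the StDev convention
population_sd = statistics.pstdev(heights)  # divides by n, the StDevP convention

print(f"sample: {sample_sd:.3f}, population: {population_sd:.3f}")
```

The population version is always the smaller of the two, and the gap shrinks as n grows.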
37
Normal Calculations
NORMINV(0.95,68.8,2.6) = 73.08
38
Normal Calculations
NORMDIST(67,68.8,2.6,TRUE) = .244
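Both Normal calculations can be cross-checked outside Excel. A minimal sketch with Python's `statistics.NormalDist`, using the mean (68.8) and standard deviation (2.6) from the two function calls above; the variable names are made up.

```python
from statistics import NormalDist

height = NormalDist(mu=68.8, sigma=2.6)

p95 = height.inv_cdf(0.95)   # counterpart of NORMINV(0.95, 68.8, 2.6)
below_67 = height.cdf(67)    # counterpart of NORMDIST(67, 68.8, 2.6, TRUE)

print(f"95th percentile = {p95:.2f}, share below 67 = {below_67:.3f}")
```

`inv_cdf` goes from a probability to a value, and `cdf` goes from a value to a probability, which is the same pairing as NORMINV and NORMDIST.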
39
Standardized Normal Calculations
NORMSINV(0.95) = 1.645
NORMSINV(.05) = -1.645
X variable has mean and StDev of $\mu$ and $\sigma$, which are estimated by $\bar{x}$ and s.
Z variable, $Z = (X - \mu)/\sigma$, has mean = 0 and StDev = 1. Z is a “standardized normal”.
40
Standardized Normal Calculations
NORMSDIST(-1) = .159
NORMSDIST(+1) = .841
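The standardized values can be reproduced the same way. In this sketch the helper `z_score` is a made-up name, and the 68.8 / 2.6 figures are borrowed from the earlier Normal example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def z_score(x, mean, sd):
    """Convert a raw value to the number of standard deviations from the mean."""
    return (x - mean) / sd

upper_5pct = std_normal.inv_cdf(0.95)   # counterpart of NORMSINV(0.95)
below_minus_1 = std_normal.cdf(-1)      # counterpart of NORMSDIST(-1)
below_plus_1 = std_normal.cdf(1)        # counterpart of NORMSDIST(+1)

# Standardizing first gives the same probability as working in raw units.
same_as_raw = std_normal.cdf(z_score(67, 68.8, 2.6))
print(upper_5pct, below_minus_1, below_plus_1, same_as_raw)
```

The difference between the last two probabilities, about .68, is the "two-thirds within one sigma" precept again.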
41
t-distribution
The t-distribution is needed if $\sigma$ is not known and is estimated by s, and n < 30.
42
t-distribution Calculations, one-tail
TDIST(2,4,1) = .058
TDIST(X, d.f., # tails)
“t” with tails=1 sums from + infinity. “Z” and “Normal” sum from – infinity.
43
t-distribution Calculations, two-tail
TDIST(2,4,2) = .116
TDIST(X, d.f., # tails)
“t (2 tail)” sums simultaneously from both – infinity and + infinity. Undefined for negative values of t.
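Python's standard library has no t-distribution, so the sketch below integrates the t density numerically with Simpson's rule. It illustrates what the tail probabilities mean and is not how Excel computes them; the function names and the `steps` parameter are made up.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_tail(x, df, tails=1, steps=2000):
    """Tail probability beyond x (tails=1) or beyond +x and -x together (tails=2)."""
    if x < 0:
        raise ValueError("x must be non-negative, as with TDIST")
    # Simpson's rule for the area under the density between 0 and x (steps is even).
    h = x / steps
    area = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    one_tail = 0.5 - area  # half the distribution lies above 0
    return tails * one_tail

print(round(t_tail(2, 4, 1), 3), round(t_tail(2, 4, 2), 3))
```

The two-tail value is exactly double the one-tail value because the t-distribution is symmetric about zero.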
44
Loading “Data Analysis” in Office 2003
Tools / Add-Ins. Will need to have the original MS Office CD.
45
Example of Tools / Data Analysis / Descriptive Statistics
46
Example Output of Tools / Data Analysis / Descriptive Statistics
47
Loading “Data Analysis” in Office 2007
1) Click the Office button in the upper left-hand corner of Excel.
2) Click the “Excel Options” tab in the bottom right-hand corner of the drop-down menu from step #1.
48
Loading “Data Analysis” in Office 2007
3) Click Add-Ins in the left banner of the Excel Options menu.
4) Click “Analysis ToolPak” in the Add-ins menu. Then, select BOTH “Analysis ToolPak” and “Analysis ToolPak – VBA”.
5) Click the Go button at the bottom right-hand corner of the Excel Options menu. Don’t click the “OK” button.
49
Loading “Data Analysis” in Office 2007
50
Example of Tools / Data Analysis / Descriptive Statistics
51
Precision of numerical results – state “3 Significant Digits”
52
Precision of numerical results – state “3 Significant Digits” (continued)
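One way to apply the “3 Significant Digits” guideline to a computed result is sketched below. The helper `round_sig` is a made-up name, and the sample values are illustrative only.

```python
import math

def round_sig(x, digits=3):
    """Round x to the given number of significant digits."""
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))  # position of the leading digit
    return round(x, digits - 1 - exponent)

for value in (69.0526, 0.78312, 1234.5):
    print(value, "->", round_sig(value))
```

Significant digits count from the first non-zero digit, so 0.78312 becomes 0.783 while 1234.5 becomes 1230.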
53
Let’s Review
Some Great Rules of Thumb
Data is a potential outlier:
Symmetric distribution: $x_i > \bar{x} + 3s$
Skewed distribution: $x_i > Q_3 + (1.5)R_Q$, where $R_Q$ is the interquartile range
Data is Normally distributed (Bell shaped) if -1 < skewness < +1 and if -1 < kurtosis < +1
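The two outlier rules can be sketched as below. This version flags only the upper side, which is one reading of the rules as stated on the slide; the function names and the sample are made up, and the quartile method is assumed to mirror Excel's QUARTILE.

```python
import statistics

def outliers_symmetric(xs):
    """Values above the mean plus 3 standard deviations."""
    cutoff = statistics.mean(xs) + 3 * statistics.stdev(xs)
    return [x for x in xs if x > cutoff]

def outliers_skewed(xs):
    """Values above the upper quartile plus 1.5 times the interquartile range."""
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    cutoff = q3 + 1.5 * (q3 - q1)
    return [x for x in xs if x > cutoff]

# Hypothetical sample: eighteen ordinary values and one extreme value.
sample = [69, 70, 71] * 6 + [90]
print(outliers_symmetric(sample), outliers_skewed(sample))
```

The quartile rule is the more robust of the two, because an extreme value inflates the mean and the standard deviation it is being judged against. In a sample of ten or fewer values, no point can be more than three standard deviations above the mean, so the first rule needs a reasonably large sample.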
54
Prerequisite Spreadsheet Skills
Cut, Copy, Paste & Paste Special
Cell corner Copy
Add or delete Rows or Columns
Change width/height of row/column
Font, alignment, border & number format of cell
Referencing and calculations with cells
Data / Sort
Naming cell or range of cells
Insert / Function / Average, Sum, Max, Min, Count, Small and Large
(Tools / Add-ins / Data analysis) Tools / Data Analysis / Descriptive statistics
Single quote for equation statement
REPLACE command
DATA / Group & FORMAT: Column, Hide
Grab an entire column of data (CTRL+SHIFT, down arrow)
See also “L99A MBA7025.ppt” in folder “L00A MGS8110”