Skewness & Kurtosis: Reference


Skewness & Kurtosis: Reference Source: http://mathworld.wolfram.com/NormalDistribution.html

Further Moments – Skewness
Skewness measures the degree of asymmetry exhibited by the data.
If the histogram is symmetric about the mean, skewness equals zero (the converse does not always hold).
Positive skewness vs. negative skewness.
Skewness measured in this way is sometimes referred to as “Fisher’s skewness”.

Further Moments – Skewness Source: http://library.thinkquest.org/10030/3smodsas.htm

[Figure: distributions A and B, with mode, median, and mean marked]

[Figure: histogram with median and mean marked]
n = 26, mean = 4.23, median = 3.5, mode = 3 (occurs 8 times)

Value  Occurrences  Deviation             Cubed deviation        Occurrences * cubed
1      1            (1 – 4.23) = -3.23    (-3.23)^3 = -33.70     -33.70
2      4            (2 – 4.23) = -2.23    (-2.23)^3 = -11.09     -44.36
3      8            (3 – 4.23) = -1.23    (-1.23)^3 = -1.86      -14.89
4      4            (4 – 4.23) = -0.23    (-0.23)^3 = -0.01      -0.05
5      3            (5 – 4.23) = 0.77     (+0.77)^3 = 0.46       1.37
6      2            (6 – 4.23) = 1.77     (+1.77)^3 = 5.54       11.09
7      1            (7 – 4.23) = 2.77     (+2.77)^3 = 21.25      21.25
8      1            (8 – 4.23) = 3.77     (+3.77)^3 = 53.58      53.58
9      1            (9 – 4.23) = 4.77     (+4.77)^3 = 108.53     108.53
10     1            (10 – 4.23) = 5.77    (+5.77)^3 = 192.10     192.10

Sum = 294.94   Mean = 4.23   s = 2.27   Skewness = 0.97
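
The skewness in this table can be reproduced with a short Python sketch. It assumes the formula implied by the numbers on the slide: the sum of cubed deviations divided by n * s^3, where s is the sample standard deviation (n - 1 denominator). The variable names are illustrative and do not come from the slides.

```python
# Frequency table from the slide: value -> number of occurrences.
freq = {1: 1, 2: 4, 3: 8, 4: 4, 5: 3, 6: 2, 7: 1, 8: 1, 9: 1, 10: 1}

n = sum(freq.values())
mean = sum(v * c for v, c in freq.items()) / n

# Sample standard deviation (divide by n - 1).
s = (sum(c * (v - mean) ** 2 for v, c in freq.items()) / (n - 1)) ** 0.5

# Assumed formula: sum of cubed deviations divided by n * s^3.
cubed_sum = sum(c * (v - mean) ** 3 for v, c in freq.items())
skewness = cubed_sum / (n * s ** 3)

print(n, round(mean, 2), round(s, 2), round(skewness, 2))
```

Working from the unrounded mean gives a sum of cubed deviations of about 294.6 rather than 294.94, but the skewness still rounds to 0.97.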

Skewness > 0 (Positively skewed)
[Figure: positively skewed distribution with mode, median, and mean marked]

Skewness < 0 (Negatively skewed)
[Figure: negatively skewed distributions A and B with mode, median, and mean marked]

Skewness = 0 (symmetric distribution)
Source: http://mathworld.wolfram.com/NormalDistribution.html

Skewness – Review
Positive skewness: there are more observations below the mean than above it; the mean is greater than the median.
Negative skewness: there are a small number of low observations and a large number of high ones; the median is greater than the mean.

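Kurtosis – Review
Kurtosis measures how peaked the histogram is (Karl Pearson, 1905).
The kurtosis of a normal distribution is 0 when it is reported as excess kurtosis (the fourth standardized moment minus 3), which is the convention used in these slides.
Kurtosis characterizes the relative peakedness or flatness of a distribution compared to the normal distribution.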

Kurtosis – Review
Platykurtic: when the kurtosis < 0, the frequencies throughout the curve are closer to being equal (i.e., the curve is flatter and wider). Thus, negative kurtosis indicates a relatively flat distribution.
Leptokurtic: when the kurtosis > 0, there are high frequencies in only a small part of the curve (i.e., the curve is more peaked). Thus, positive kurtosis indicates a relatively peaked distribution.

Source: http://espse.ed.psu.edu/Statistics/Chapters/Chapter3/Chap3

Measures of central tendency – Review
Measures of the location of the middle or the center of a distribution:
Mean
Median
Mode

Mean, Median, Mode – Review
Mean: the average value of a distribution; the most commonly used measure of central tendency.
Median: the value of a variable such that half of the observations are above and half are below it, i.e., this value divides the distribution into two groups of equal size.
Mode: the most frequently occurring value in the distribution.

An Example Data Set
Daily low temperatures recorded in Chapel Hill (01/18-01/31, 2005, °F)

Date     Low (°F)    Date     Low (°F)
Jan. 18  11          Jan. 25  25
Jan. 19  11          Jan. 26  33
Jan. 20  25          Jan. 27  22
Jan. 21  29          Jan. 28  18
Jan. 22  27          Jan. 29  19
Jan. 23  14          Jan. 30  30
Jan. 24  11          Jan. 31  27

For these 14 values, we will calculate all three measures of central tendency: the mean, median, and mode.

Mean – Review
Mean: the most commonly used measure of central tendency.
Procedure:
(1) Sum all the values in the data set
(2) Divide the sum by the number of values in the data set
Watch for outliers, which can pull the mean away from the bulk of the data.

Mean – Review
(1) Sum all the values in the data set: 11 + 11 + 11 + 14 + 18 + 19 + 22 + 25 + 25 + 27 + 27 + 29 + 30 + 33 = 302
(2) Divide the sum by the number of values in the data set: Mean = 302/14 = 21.57
Is this a good measure of central tendency for this data set?
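
The two steps above can be sketched in Python. The list `temps` holds the 14 daily lows in date order; the name is illustrative only.

```python
# Daily low temperatures (°F), Jan. 18 through Jan. 31, in date order.
temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]

total = sum(temps)           # step (1): sum all the values
mean = total / len(temps)    # step (2): divide by the number of values

print(total, round(mean, 2))
```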

Median – Review
Median: 1/2 of the values are above it and 1/2 are below.
(1) Sort the data in ascending order
(2) Find the value with an equal number of values above and below it
(3) Odd number of observations: take the [(n-1)/2]+1 value from the lowest
(4) Even number of observations: average the (n/2) and [(n/2)+1] values
(5) Use the median with asymmetric distributions, particularly with outliers

Median – Review
(1) Sort the data in ascending order: 11, 11, 11, 14, 18, 19, 22, 25, 25, 27, 27, 29, 30, 33
(2) Find the value with an equal number of values above and below it
Even number of observations: average the (n/2) and [(n/2)+1] values
(14/2) = 7; [(14/2)+1] = 8, so the median is (22+25)/2 = 23.5 (°F)
Is this a good measure of central tendency for this data?
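
A minimal Python sketch of the odd/even rule described above. The function name `median` is an illustrative choice, and the positions from the slides are shifted by one because Python lists are 0-based.

```python
def median(values):
    """Median using the slide's rule for odd and even sample sizes."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the [(n-1)/2]+1-th value (1-based) is index (n-1)//2.
        return ordered[(n - 1) // 2]
    # Even n: average the (n/2)-th and [(n/2)+1]-th values (1-based).
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]
print(median(temps))
```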

Mode – Review
Mode: the most frequently occurring value in the distribution.
(1) Sort the data in ascending order
(2) Count the instances of each value
(3) Find the value that has the most occurrences
If more than one value occurs an equal number of times and these exceed all other counts, we have multiple modes.
Use the mode for multi-modal data.

Mode – Review
(1) Sort the data in ascending order: 11, 11, 11, 14, 18, 19, 22, 25, 25, 27, 27, 29, 30, 33
(2) Count the instances of each value: 11 (3x), 14 (1x), 18 (1x), 19 (1x), 22 (1x), 25 (2x), 27 (2x), 29 (1x), 30 (1x), 33 (1x)
(3) Find the value that has the most occurrences: mode = 11 (°F)
Is this a good measure of the central tendency of this data set?
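
The counting procedure can be sketched in Python with `collections.Counter`. The helper `modes` is a hypothetical name; it returns a list so that multi-modal data yields every tied value.

```python
from collections import Counter


def modes(values):
    """Return all values that share the highest count, in ascending order."""
    counts = Counter(values)          # step (2): count each value
    top = max(counts.values())        # the highest count
    return sorted(v for v, c in counts.items() if c == top)


temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]
print(modes(temps))
```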

Measures of Dispersion – Review
In addition to measures of central tendency, we can also summarize data by characterizing its variability.
Measures of dispersion are concerned with the distribution of values around the mean in data:
Range
Interquartile range
Variance
Standard deviation
z-scores
Coefficient of Variation (CV)

An Example Data Set
The same 14 daily low temperatures recorded in Chapel Hill (01/18-01/31, 2005, °F), sorted: 11, 11, 11, 14, 18, 19, 22, 25, 25, 27, 27, 29, 30, 33
For these 14 values, we will calculate all measures of dispersion.

Range – Review
Range: the difference between the largest and the smallest values.
(1) Sort the data in ascending order
(2) Find the largest value: max
(3) Find the smallest value: min
(4) Calculate the range: range = max - min
The range is vulnerable to the influence of outliers.

Range – Review
(1) Sort the data in ascending order: 11, 11, 11, 14, 18, 19, 22, 25, 25, 27, 27, 29, 30, 33
(2) Find the largest value: max = 33
(3) Find the smallest value: min = 11
(4) Calculate the range: range = 33 – 11 = 22

Interquartile Range – Review
Interquartile range: the difference between the 25th and 75th percentiles.
(1) Sort the data in ascending order
(2) Find the 25th percentile: the (n+1)/4 observation
(3) Find the 75th percentile: the 3(n+1)/4 observation
(4) Interquartile range is the difference between these two percentiles

Interquartile Range – Review
(1) Sort the data in ascending order: 11, 11, 11, 14, 18, 19, 22, 25, 25, 27, 27, 29, 30, 33
(2) Find the 25th percentile, the (n+1)/4 observation: (14+1)/4 = 3.75, so 11 + (14-11)*0.75 = 13.25
(3) Find the 75th percentile, the 3(n+1)/4 observation: 3(14+1)/4 = 11.25, so 27 + (29-27)*0.25 = 27.5
(4) Interquartile range is the difference between these two percentiles: 27.5 – 13.25 = 14.25
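
A sketch of the (n+1)p positioning rule with linear interpolation between neighbouring observations, as used in the steps above. The function name `percentile_n_plus_1` is hypothetical. Statistical packages offer several percentile conventions, so their defaults may give slightly different quartiles.

```python
def percentile_n_plus_1(values, p):
    """Percentile at 1-based position (n + 1) * p, interpolating linearly."""
    ordered = sorted(values)
    pos = (len(ordered) + 1) * p     # e.g. 3.75 for n = 14, p = 0.25
    lower = int(pos)                 # whole part: observation just below
    frac = pos - lower               # fractional part: how far to move up
    if lower < 1:
        return ordered[0]
    if lower >= len(ordered):
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + (above - below) * frac


temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]
q1 = percentile_n_plus_1(temps, 0.25)
q3 = percentile_n_plus_1(temps, 0.75)
iqr = q3 - q1
print(q1, q3, iqr)
```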

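Variance – Review
Variance is formulated as the sum of squares of statistical distances (or deviations) divided by the population size, or by the sample size minus one:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \quad \text{(population)}, \qquad s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \quad \text{(sample)}$$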

Variance – Review
(1) Calculate the mean
(2) Calculate the deviation for each value
(3) Square each of the deviations
(4) Sum the squared deviations
(5) Divide the sum of squares by (n-1) for a sample

Variance – Review
(1) Calculate the mean: 302/14 = 21.57
(2) Calculate the deviation for each value

Jan. 18  (11 – 21.57) = -10.57    Jan. 25  (25 – 21.57) = 3.43
Jan. 19  (11 – 21.57) = -10.57    Jan. 26  (33 – 21.57) = 11.43
Jan. 20  (25 – 21.57) = 3.43      Jan. 27  (22 – 21.57) = 0.43
Jan. 21  (29 – 21.57) = 7.43      Jan. 28  (18 – 21.57) = -3.57
Jan. 22  (27 – 21.57) = 5.43      Jan. 29  (19 – 21.57) = -2.57
Jan. 23  (14 – 21.57) = -7.57     Jan. 30  (30 – 21.57) = 8.43
Jan. 24  (11 – 21.57) = -10.57    Jan. 31  (27 – 21.57) = 5.43

Variance – Review
(3) Square each of the deviations

Jan. 18  (-10.57)^2 = 111.76    Jan. 25  (3.43)^2 = 11.76
Jan. 19  (-10.57)^2 = 111.76    Jan. 26  (11.43)^2 = 130.61
Jan. 20  (3.43)^2 = 11.76       Jan. 27  (0.43)^2 = 0.18
Jan. 21  (7.43)^2 = 55.18       Jan. 28  (-3.57)^2 = 12.76
Jan. 22  (5.43)^2 = 29.47       Jan. 29  (-2.57)^2 = 6.61
Jan. 23  (-7.57)^2 = 57.33      Jan. 30  (8.43)^2 = 71.04
Jan. 24  (-10.57)^2 = 111.76    Jan. 31  (5.43)^2 = 29.47

(4) Sum the squared deviations = 751.43

Variance – Review
(5) Divide the sum of squares by (n-1) for a sample: 751.43 / (14-1) = 57.8
The variance of the Tmin data set (Chapel Hill) is 57.8.
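
The five steps can be sketched in Python as follows. The variable names are illustrative; the last line cross-checks the result against the standard library's `statistics.variance`, which also uses the n - 1 denominator.

```python
import statistics

temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]

n = len(temps)
mean = sum(temps) / n                          # step (1)
deviations = [x - mean for x in temps]         # step (2)
squared = [d ** 2 for d in deviations]         # step (3)
sum_of_squares = sum(squared)                  # step (4)
variance = sum_of_squares / (n - 1)            # step (5), sample variance

print(round(sum_of_squares, 2), round(variance, 1))
print(round(statistics.variance(temps), 1))    # cross-check
```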

Standard Deviation – Review
Standard deviation is equal to the square root of the variance.
Compared with variance, standard deviation is expressed in the same units as the mean and the original data.

Standard Deviation – Review
(1) Calculate the mean
(2) Calculate the deviation for each value
(3) Square each of the deviations
(4) Sum the squared deviations
(5) Divide the sum of squares by (n-1) for a sample
(6) Take the square root of the resulting variance

Standard Deviation – Review
(1) – (5): s^2 = 57.8
(6) Take the square root of the variance: s = sqrt(57.8) = 7.6
The standard deviation (s) of the Tmin data set (Chapel Hill) is 7.6 (°F).
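
As a brief sketch, step (6) in Python, using the standard library for the sample variance. Variable names are illustrative.

```python
import math
import statistics

temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]

variance = statistics.variance(temps)   # sample variance (n - 1 denominator)
std_dev = math.sqrt(variance)           # step (6): square root of the variance

print(round(variance, 1), round(std_dev, 1))
```

`statistics.stdev(temps)` returns the same value in a single call.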

z-score – Review
Since data come from distributions with different means and different degrees of variability, it is common to standardize observations.
One way to do this is to transform each observation into a z-score.
A z-score may be interpreted as the number of standard deviations an observation is away from the mean.

z-scores – Review
A z-score is the number of standard deviations an observation is away from the mean.
(1) Calculate the mean
(2) Calculate the deviation
(3) Calculate the standard deviation
(4) Divide the deviation by the standard deviation

z-scores – Review
z-score for the maximum Tmin value (33 °F):
(1) Calculate the mean: 21.57
(2) Calculate the deviation: 33 – 21.57 = 11.43
(3) Calculate the standard deviation (SD): 7.6
(4) Divide the deviation by the standard deviation: 11.43 / 7.6 = 1.50
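
A minimal Python sketch of the z-score calculation, assuming the sample standard deviation from the earlier slides. The helper name `z_score` is illustrative.

```python
import statistics

temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]

mean = statistics.mean(temps)
s = statistics.stdev(temps)      # sample standard deviation


def z_score(x):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / s


z_max = z_score(max(temps))      # z-score of the warmest low, 33 °F
print(round(z_max, 2))
```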

Coefficient of Variation – Review
The coefficient of variation (CV) measures the spread of a set of data as a proportion of its mean.
It is the ratio of the sample standard deviation to the sample mean.
It is sometimes expressed as a percentage.
There is an equivalent definition for the coefficient of variation of a population.

Coefficient of Variation – Review
(1) Calculate the mean
(2) Calculate the standard deviation
(3) Divide the standard deviation by the mean: $CV = s / \bar{x}$

Coefficient of Variation – Review
(1) Calculate the mean: 21.57
(2) Calculate the standard deviation: 7.6
(3) Divide the standard deviation by the mean: CV = 7.6 / 21.57 = 0.35 (about 35%)
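
The same ratio sketched in Python, with illustrative variable names. The CV is only meaningful for data on a ratio scale with a positive mean, so the value for Fahrenheit temperatures is shown purely to mirror the arithmetic.

```python
import statistics

temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]

mean = statistics.mean(temps)
s = statistics.stdev(temps)      # sample standard deviation
cv = s / mean                    # coefficient of variation as a proportion

print(round(cv, 2), f"{cv:.0%}")
```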

Histograms – Review
We may also summarize our data by constructing histograms, which are vertical bar graphs.
A histogram is used to graphically summarize the distribution of a data set.
A histogram divides the range of values in a data set into intervals.
Over each interval is placed a bar whose height represents the percentage of data values in the interval.

Building a Histogram – Review
(1) Develop an ungrouped frequency table: 11, 11, 11, 14, 18, 19, 22, 25, 25, 27, 27, 29, 30, 33

Value  Frequency
11     3
14     1
18     1
19     1
22     1
25     2
27     2
29     1
30     1
33     1

Building a Histogram – Review
(2) Construct a grouped frequency table: select a set of classes

Class  Frequency
11-15  4
16-20  2
21-25  3
26-30  4
31-35  1

Building a Histogram – Review
(3) Plot the frequencies of each class
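
The grouping and plotting steps can be sketched in Python. This assumes integer data and equal-width classes starting at 11 with a width of 5, matching the classes above. The function name `grouped_frequencies` is hypothetical, and the "plot" is a plain-text bar per class.

```python
from collections import Counter


def grouped_frequencies(values, start, width):
    """Map each (low, high) class to its count, for integer data."""
    counts = Counter((v - start) // width for v in values)
    return {
        (start + k * width, start + k * width + width - 1): counts[k]
        for k in sorted(counts)
    }


temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]
table = grouped_frequencies(temps, start=11, width=5)

# Text histogram: one '#' per observation in the class.
for (low, high), count in table.items():
    print(f"{low}-{high}: {'#' * count}")
```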

Box Plots – Review
We can also use a box plot to graphically summarize a data set.
A box plot represents a graphical summary of what is sometimes called a “five-number summary” of the distribution:
Minimum
Maximum
25th percentile
75th percentile
Median
The box spans the interquartile range (IQR), from the 25th to the 75th percentile.
Source: Rogerson, p. 8.

Boxplot – Review
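
A sketch of the five-number summary for the example data, assuming the same (n+1)p percentile rule used on the interquartile range slides. The function names are hypothetical, and plotting libraries may compute quartiles with a different convention.

```python
def percentile_n_plus_1(ordered, p):
    """Percentile of sorted data at 1-based position (n + 1) * p."""
    pos = (len(ordered) + 1) * p
    lower = int(pos)
    frac = pos - lower
    if lower < 1:
        return ordered[0]
    if lower >= len(ordered):
        return ordered[-1]
    return ordered[lower - 1] + (ordered[lower] - ordered[lower - 1]) * frac


def five_number_summary(values):
    """Return (min, 25th percentile, median, 75th percentile, max)."""
    ordered = sorted(values)
    return (
        ordered[0],
        percentile_n_plus_1(ordered, 0.25),
        percentile_n_plus_1(ordered, 0.50),
        percentile_n_plus_1(ordered, 0.75),
        ordered[-1],
    )


temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]
summary = five_number_summary(temps)
print(summary)
```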

Further Moments of the Distribution
While measures of dispersion are useful for helping us describe the width of the distribution, they tell us nothing about the shape of the distribution.
Source: Earickson, RJ, and Harlin, JM. 1994. Geographic Measurement and Quantitative Analysis. USA: Macmillan College Publishing Co., p. 91.

Skewness – Review
Skewness measures the degree of asymmetry exhibited by the data.
Positive skewness: more observations below the mean than above it.
Negative skewness: a small number of low observations and a large number of high ones.
For the example data set: Skewness = -0.1851

Skewness = -0.1851 (Negatively skewed)
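
The value -0.1851 can be reproduced with the sketch below. It assumes skewness is computed as the sum of cubed deviations divided by n * s^3, with s the sample standard deviation. This formula is inferred from the numbers in the slides, and other software may use a different normalization.

```python
import statistics

temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]

n = len(temps)
mean = statistics.mean(temps)
s = statistics.stdev(temps)                       # n - 1 denominator

# Assumed formula: sum of cubed deviations / (n * s^3).
skewness = sum((x - mean) ** 3 for x in temps) / (n * s ** 3)

print(round(skewness, 4))
```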

Kurtosis – Review
Kurtosis measures how peaked the histogram is.
Leptokurtic: a high degree of peakedness; values of kurtosis over 0.
Platykurtic: flat histograms; values of kurtosis less than 0.
For the example data set: Kurtosis = -1.54 < 0

Kurtosis = -1.54 < 0 (Platykurtic)
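
A matching sketch for kurtosis. It assumes the excess form, the sum of fourth-power deviations divided by n * s^4, minus 3, with s the sample standard deviation. This is inferred from the slide's value of -1.54 rather than stated in the source.

```python
import statistics

temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]

n = len(temps)
mean = statistics.mean(temps)
s = statistics.stdev(temps)                       # n - 1 denominator

# Assumed formula: sum of fourth-power deviations / (n * s^4), minus 3.
kurtosis = sum((x - mean) ** 4 for x in temps) / (n * s ** 4) - 3

label = "platykurtic" if kurtosis < 0 else "leptokurtic"
print(round(kurtosis, 2), label)
```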