Numerical Measures: Skewness and Location
PSYSTA1 – Week 6
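Measure of Skewness
A statistical measure used to describe the distribution of the data relative to symmetry.
Goal: quantify the degree of asymmetry (e.g., location of tails, difference between "centers", etc.) in a data set.
Sample Skewness: $SK(x) = \dfrac{3\,[\bar{x} - \operatorname{med}(x)]}{s}$, where $\bar{x}$ is the sample mean, $\operatorname{med}(x)$ is the sample median, and $s$ is the sample standard deviation.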
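Some Properties
In relation to histograms (i.e., locating the centers):
$SK(x) > 0$: the mean lies to the right of the median (positively skewed).
$SK(x) = 0$: the mean and the median coincide (consistent with a symmetric distribution).
$SK(x) < 0$: the mean lies to the left of the median (negatively skewed).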
Example 1
Compute the coefficient of skewness for the data given below. Then, describe the skewness of the data based on the computed coefficient.
2.5 3.2 3.8 1.3 1.4 0.0 0.0 2.6 5.2 4.8 0.0 4.6 2.8 3.3
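One way to check the hand computation for Example 1 is the short sketch below. It assumes that $s$ is the sample standard deviation with an $n - 1$ denominator; the function name `pearson_skewness` and the variable `hours` are illustrative and do not come from the slides.

```python
import statistics

def pearson_skewness(data):
    """Return 3 * (mean - median) / s for the given sample."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    s = statistics.stdev(data)  # ASSUMPTION: sample standard deviation (n - 1 denominator)
    return 3 * (mean - median) / s

# Data from Example 1
hours = [2.5, 3.2, 3.8, 1.3, 1.4, 0.0, 0.0, 2.6, 5.2, 4.8, 0.0, 4.6, 2.8, 3.3]

sk = pearson_skewness(hours)
print(f"mean = {statistics.mean(hours):.3f}")
print(f"median = {statistics.median(hours):.3f}")
print(f"SK = {sk:.3f}")
```

Under that assumption the coefficient comes out at roughly -0.28, so the mean sits slightly below the median and the data are mildly negatively skewed.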
Measures of Location
Statistical measures used to describe the (relative) standing or location of an observation relative to the rest of the data.
Goal: locate the observation relative to the rest of the observations.
Most commonly used measures: percentiles (including deciles and quartiles) and z-scores.
Percentiles
Defined as the value on the measurement scale below which a specified percentage of the scores in the distribution fall.
Denoted by $P_k$, the percentiles divide the ranked data set into 100 equal parts.
A percentile $P_k$ indicates that at least $k\%$ of the data is less than or equal to the value of $P_k$ (thus, at most $(100 - k)\%$ of the data is greater than $P_k$).
Calculating Percentiles:
The (approximate) value of the $k$th percentile, denoted by $P_k$, is
$P_k \approx \text{value of the } \left(\dfrac{kn}{100}\right)\text{th term in a ranked set}$
where $k$ denotes the number of the percentile and $n$ represents the sample size.
Note: $p = \dfrac{kn}{100}$, i.e., $p$ is the position of $P_k$ in the ranked data set.
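The position rule $p = kn/100$ can be sketched in code as follows. The slides do not say what to do when $p$ is not a whole number, so this sketch assumes a common textbook convention: round a fractional $p$ up to the next position, and average the $p$th and $(p+1)$th terms when $p$ is whole. The function name `percentile` is illustrative, and other conventions give slightly different answers.

```python
def percentile(data, k):
    """Approximate the k-th percentile (k an integer, 0 < k < 100)
    as the (k*n/100)-th term of the ranked data."""
    if not 0 < k < 100:
        raise ValueError("k must be strictly between 0 and 100")
    ranked = sorted(data)
    n = len(ranked)
    if (k * n) % 100 == 0:
        # ASSUMPTION: p is whole, so average the p-th and (p+1)-th terms.
        p = (k * n) // 100
        return (ranked[p - 1] + ranked[p]) / 2
    # ASSUMPTION: p is fractional, so round up to the next whole position.
    p = (k * n + 99) // 100  # integer ceiling of k*n/100
    return ranked[p - 1]     # positions are 1-based, list indices 0-based

# Same 14 observations used in the examples
hours = [2.5, 3.2, 3.8, 1.3, 1.4, 0.0, 0.0, 2.6, 5.2, 4.8, 0.0, 4.6, 2.8, 3.3]

print(percentile(hours, 33))  # p = 4.62, so the 5th ranked term
print(percentile(hours, 85))  # p = 11.9, so the 12th ranked term
```

Integer arithmetic is used for the position so that the whole-number check does not depend on floating-point rounding.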
Some Special Percentiles:
Deciles: divide the data set into ten equal parts.
Quartiles: divide the data set into four equal parts.
Percentile Rank ($k$)
Defined as the percentage of scores with values lower than the score in question.
Finding the Percentile Rank of a Value $x$:
$k = \dfrac{\text{number of values less than } x}{\text{total number of values in the data set}} \times 100$
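The percentile rank formula translates almost directly into code. The sketch below is a minimal illustration; the function name `percentile_rank` is hypothetical, and "less than" is taken strictly, as in the definition.

```python
def percentile_rank(data, x):
    """Percentage of values in data that are strictly less than x."""
    below = sum(1 for value in data if value < x)
    return below / len(data) * 100

# Same 14 observations used in the examples
hours = [2.5, 3.2, 3.8, 1.3, 1.4, 0.0, 0.0, 2.6, 5.2, 4.8, 0.0, 4.6, 2.8, 3.3]

rank = percentile_rank(hours, 3.8)
print(f"percentile rank of 3.8 = {rank:.1f}%")
```

Ten of the 14 values lie below 3.8, so the rank is 10/14 × 100, or about 71.4%.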
Some Properties
$\operatorname{med}(x) = P_{50} = D_5 = Q_2$ [median]
$P_k$ in relation to probability (particularly cumulative %): $P(X \le P_k) \approx \dfrac{k}{100}$, i.e., the cumulative percentage of observations with values less than or equal to $P_k$ is approximately $k\%$.
A more (statistically) robust measure of variability can be defined using quartiles, i.e., the interquartile range (IQR), defined as $\mathrm{IQR} = Q_3 - Q_1$.
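To see why the IQR is described as robust, the sketch below compares it with the range when one value is distorted. It assumes the median-of-halves convention for quartiles ($Q_1$ is the median of the lower half of the ranked data and $Q_3$ the median of the upper half), which is only one of several conventions in use. The names `quartiles`, `iqr`, and `with_typo` are illustrative.

```python
import statistics

def quartiles(data):
    """Return (Q1, Q2, Q3). ASSUMPTION: median-of-halves convention;
    the middle value is left out of both halves when n is odd."""
    ranked = sorted(data)
    half = len(ranked) // 2
    lower = ranked[:half]   # values below the median position
    upper = ranked[-half:]  # values above the median position
    return (statistics.median(lower),
            statistics.median(ranked),
            statistics.median(upper))

def iqr(data):
    q1, _, q3 = quartiles(data)
    return q3 - q1

hours = [2.5, 3.2, 3.8, 1.3, 1.4, 0.0, 0.0, 2.6, 5.2, 4.8, 0.0, 4.6, 2.8, 3.3]
# Hypothetical variant: the largest value, 5.2, is mistyped as 52.0.
with_typo = [52.0 if value == 5.2 else value for value in hours]

for label, sample in (("original", hours), ("with typo", with_typo)):
    value_range = max(sample) - min(sample)
    print(f"{label}: range = {value_range:.1f}, IQR = {iqr(sample):.1f}")
```

The range jumps from 5.2 to 52.0, while the IQR stays at 2.5 because the quartiles ignore how far out the extreme value lies.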
Example 2
Consider again the following data set, which gives the number of hours a student studied each day over a 2-week period.
2.5 3.2 3.8 1.3 1.4 0.0 0.0 2.6 5.2 4.8 0.0 4.6 2.8 3.3
Compute, and interpret whenever appropriate, the following:
a.) $P_{33}$
b.) $P_{85}$
c.) $D_1$
d.) $Q_1$
e.) $Q_2$
f.) $Q_3$
g.) $\mathrm{IQR}$
h.) percentile rank of 3.8
BoxPlot (Box-and-Whiskers Plot)
A graphical representation of a summary of five important values: the minimum value, the first quartile, the median (or the second quartile), the third quartile, and the maximum value [i.e., the five-number summary].
Steps in Constructing a Boxplot:
Rank the data in increasing order and calculate the values of the median ($Q_2$), first quartile ($Q_1$), and third quartile ($Q_3$). Also find the interquartile range (IQR).
Find the lower and upper inner fences.
$\text{Lower Inner Fence} = Q_1 - 1.5 \times \mathrm{IQR}$
$\text{Upper Inner Fence} = Q_3 + 1.5 \times \mathrm{IQR}$
Determine the smallest and the largest values in the given data set within the two inner fences.
Draw a horizontal line and mark the levels on it such that all the values in the given data set are covered. Above or below the horizontal line, draw a box with its left side at the position of the first quartile and its right side at the position of the third quartile. Inside the box, draw a vertical line at the position of the median.
Draw two lines joining the box to the smallest and the largest values within the two inner fences. These two lines are called whiskers.
The observations that fall outside the two inner fences are called outliers. They are either mild or extreme outliers. To tell which, find the lower and upper outer fences.
$\text{Lower Outer Fence} = Q_1 - 3.0 \times \mathrm{IQR}$
$\text{Upper Outer Fence} = Q_3 + 3.0 \times \mathrm{IQR}$
Values outside the inner fences but inside the outer fences (yellow card zone) are referred to as mild outliers.
Values outside the outer fences (red card zone) are referred to as extreme outliers.
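The fence rules can be sketched as a small classifier. The quartiles are passed in directly, so the sketch does not depend on any particular quartile convention. The function names and the demonstration values below are hypothetical and chosen only to exercise each zone.

```python
def fences(q1, q3):
    """Return ((lower inner, upper inner), (lower outer, upper outer))."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)
    return inner, outer

def classify(value, q1, q3):
    """Label a value as an extreme outlier, a mild outlier, or neither."""
    inner, outer = fences(q1, q3)
    if value < outer[0] or value > outer[1]:
        return "extreme outlier"  # beyond the outer fences (red card zone)
    if value < inner[0] or value > inner[1]:
        return "mild outlier"     # between inner and outer fences (yellow card zone)
    return "not an outlier"

# Hypothetical quartiles: IQR = 2, inner fences (-1, 7), outer fences (-4, 10)
q1, q3 = 2.0, 4.0
for value in (3.0, 7.5, 11.0, -5.0):
    print(value, "->", classify(value, q1, q3))
```

The outer fences are checked first because any value beyond them is also beyond the inner fences. A value lying exactly on a fence is treated here as inside it.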
Some Properties
In relation to skewness (i.e., characterizing asymmetry):
Example 3
Construct a boxplot for the data given below.
2.5 3.2 3.8 1.3 1.4 0.0 0.0 2.6 5.2 4.8 0.0 4.6 2.8 3.3