BUS304 – Data Characterization
Other Numerical Measures: Median, Mode, Range, Percentiles, Quartiles, Interquartile Range
Median
The middle value: the value that divides the data in half, with equal numbers of values above and below it.
Steps:
1. Put your data in an ordered array (sort it).
2. If n (or N) is odd, the median is the middle number, i.e. the $\frac{n+1}{2}$th number.
3. If n (or N) is even, the median is the average of the two middle numbers, i.e. the average of the $\frac{n}{2}$th and the $\left(\frac{n}{2}+1\right)$th numbers.
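The three steps above translate directly into code. The Python sketch below is illustrative only: the function name `median` is our own choice, and in practice Python's standard `statistics.median` gives the same result.

```python
def median(data):
    """Return the median of a non-empty collection of numbers."""
    values = sorted(data)  # Step 1: put the data in an ordered array
    n = len(values)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2
    if n % 2 == 1:
        # Step 2 (n odd): the ((n + 1) / 2)-th value, i.e. 0-based index n // 2
        return values[mid]
    # Step 3 (n even): average of the (n / 2)-th and (n / 2 + 1)-th values,
    # i.e. 0-based indices mid - 1 and mid
    return (values[mid - 1] + values[mid]) / 2


print(median([7, 1, 3]))     # odd n  -> 3
print(median([4, 1, 3, 2]))  # even n -> 2.5
```

Note the shift between the 1-based positions used on the slide and Python's 0-based list indices.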
Sensitivity to outliers
Examples: Median = …; Median = …; Median = 3
The median is not affected by extreme values.
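A quick numeric check of this claim, using two small hypothetical data sets (not the ones on the slide) that differ only in their largest value. The sketch uses Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical data sets: identical except that the last value becomes an outlier
typical = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]

for name, data in [("typical", typical), ("with outlier", with_outlier)]:
    # The median stays put; the mean is dragged toward the extreme value
    print(f"{name}: median = {median(data)}, mean = {mean(data)}")
```

Both medians are 3, while the mean jumps from 3 to 22 once the outlier appears.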
Mode
The value that occurs most often.
Steps:
1. Put your data in an ordered array (sort it).
2. Find the data value(s) that repeat most frequently.
Examples: Mode = 5; Mode = 5 and 9 (two modes); No mode!; the mode also works for categorical data such as city names (Boston, Austin, San Diego, Los Angeles, …): Mode = San Diego.
The mode is not affected by extreme values either.
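A minimal Python sketch of the two steps, covering the one-mode, two-mode, no-mode, and categorical cases from the slide. The function name `modes` and all sample data are hypothetical, and the "no value repeats means no mode" rule is our reading of step 2. Python's `statistics.multimode` is similar but returns every value when nothing repeats, rather than reporting no mode.

```python
from collections import Counter


def modes(data):
    """Return every value that occurs most often.

    Assumed convention: if no value repeats, there is no mode and an
    empty list is returned.
    """
    counts = Counter(data)  # frequency of each distinct value
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:  # nothing repeats -> "No Mode!"
        return []
    # Keep every value tied for the highest frequency (first-seen order)
    return [value for value, count in counts.items() if count == top]


print(modes([1, 3, 5, 5, 7]))  # one mode
print(modes([2, 5, 5, 9, 9]))  # two modes
print(modes([1, 2, 3, 4]))     # no mode
# Hypothetical categorical data
print(modes(["Boston", "San Diego", "Austin", "San Diego", "Los Angeles"]))
```

Counting frequencies with `Counter` makes the sort in step 1 unnecessary in code, though sorting is still the easiest way to spot repeats by hand.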
Find the Mode and Median from a Frequency Table
Below is a frequency table showing the number of days the teams took to finish their projects. Find the mean, median, and mode. Create a histogram and locate the mean, median, and mode on it. Describe the shape of the histogram, and find the relationship between the mean, median, and mode.

Days to Complete | Frequency | Relative Frequency
5 | 4 | ?
6 | 12 | ?
7 | 8 | ?
8 | 6 | ?
9 | 4 | ?
10 | 2 | ?
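One way to check your answers: the Python sketch below fills in the relative frequencies and computes the mean, median, and mode directly from the (value, frequency) pairs, without expanding the table into a list of 36 values. The function name `summarize` is hypothetical.

```python
def summarize(table):
    """table: (value, frequency) pairs sorted by value."""
    n = sum(freq for _, freq in table)  # total number of observations
    relative = {value: freq / n for value, freq in table}
    mean = sum(value * freq for value, freq in table) / n

    def value_at(position):
        """Value at a 1-based position in the ordered data."""
        cumulative = 0
        for value, freq in table:
            cumulative += freq
            if position <= cumulative:
                return value
        raise IndexError(position)

    if n % 2 == 1:
        median = value_at((n + 1) // 2)
    else:
        median = (value_at(n // 2) + value_at(n // 2 + 1)) / 2

    top = max(freq for _, freq in table)
    modes = [value for value, freq in table if freq == top]
    return n, relative, mean, median, modes


# (days to complete, frequency) pairs from the table above
days = [(5, 4), (6, 12), (7, 8), (8, 6), (9, 4), (10, 2)]
n, relative, mean, median, modes = summarize(days)
print(n, mean, median, modes)
for value, share in relative.items():
    print(f"{value} days: {share:.3f}")
```

Each relative frequency is the frequency divided by the total, here 36. The median uses cumulative frequencies to find the 18th and 19th ordered values.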
Shape of a distribution
- Left-skewed (longer tail extends to the left): Mean < Median < Mode
- Symmetric: Mean = Median = Mode
- Right-skewed (longer tail extends to the right): Mode < Median < Mean
Note that the mean is the measure most affected by extreme values, so it is typically pulled toward the tail relative to the other two measures.
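Because the mean is pulled toward the tail, comparing the mean with the median gives a quick rule-of-thumb shape check. The sketch below is a heuristic only, not a formal skewness measure. The function name `skew_direction`, the tolerance `tol`, and the sample data are hypothetical.

```python
from statistics import mean, median


def skew_direction(data, tol=1e-9):
    """Rule-of-thumb shape check: the mean is pulled toward the longer tail."""
    m, med = mean(data), median(data)
    if m > med + tol:
        return "right-skewed"  # mean above median: tail extends to the right
    if m < med - tol:
        return "left-skewed"   # mean below median: tail extends to the left
    return "symmetric"         # mean and median (approximately) equal


print(skew_direction([1, 2, 3]))              # hypothetical data
print(skew_direction([1, 2, 3, 4, 100]))
print(skew_direction([1, 97, 98, 99, 100]))
```

Comparing only the mean and median can miss skew. The days-to-complete table has mean = median = 7, yet its mode is 6 and its tail extends to the right, so it is worth checking the mode as well.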
Measures of central location: Mean, Median, Mode
The mean is used most often, unless extreme values (outliers) exist. The next most common is the median, since it is not sensitive to extreme values. The mode is sometimes used when one value has a very large frequency.
Think of the example of house prices: a few very expensive houses can pull the mean well above the price of a typical house, so the median is often the better summary.
Range
The simplest measure of variation: it describes how widely the data are spread.
Formula: Range = Maximum value − Minimum value
Example: Range = … − … = 13
Disadvantages of the Range
1. It ignores the way in which the data are distributed. Example: Range = …; Range = … = 5
2. It is sensitive to outliers:
1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5: Range = 5 − 1 = 4
1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120: Range = 120 − 1 = 119
Of the measures covered so far, the range is the one most affected by outliers.
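The range formula is a one-liner. This Python sketch reproduces the two results on this slide. The function is named `data_range` (a hypothetical name) so that it does not shadow Python's built-in `range`.

```python
def data_range(data):
    """Range = maximum value - minimum value."""
    values = list(data)
    if not values:
        raise ValueError("data must not be empty")
    return max(values) - min(values)


# The two data sets shown on this slide (20 values each)
without_outlier = [1] * 6 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = [1] * 6 + [2] * 8 + [3] * 4 + [4, 120]
print(data_range(without_outlier), data_range(with_outlier))  # 4 119
```

Changing a single value out of twenty multiplies the range by about 30, because the range depends only on the two extreme values.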
Break
Other measures
1. Percentiles: the pth percentile is the value below which p% of the data fall. E.g., if the 60th percentile of SAT scores is 1240, then 60% of students scored less than 1240; correspondingly, 40% of students scored 1240 or higher.
How do we find a percentile? The pth percentile in an ordered array of n values is the value in the ith position, where $i = \frac{p}{100}(n+1)$.
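A Python sketch of the position formula $i = \frac{p}{100}(n+1)$. The slide does not say what to do when $i$ is not a whole number. We assume a common textbook convention: use position $i$ if it is whole, average the two neighbouring values if $i$ ends in exactly .5, and otherwise round $i$ to the nearest whole position. The function name `percentile` and the test data are hypothetical.

```python
import math
from fractions import Fraction


def percentile(data, p):
    """p-th percentile using the position i = (p / 100) * (n + 1).

    Assumed rounding convention (not stated on the slide):
    - i is a whole number  -> take the value at position i
    - i ends in exactly .5 -> average the values at positions i - 0.5 and i + 0.5
    - otherwise            -> round i to the nearest whole position
    Positions are clamped to the valid range 1..n.
    """
    values = sorted(data)  # percentiles need an ordered array
    n = len(values)
    if n == 0:
        raise ValueError("data must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    i = Fraction(p) * (n + 1) / 100  # exact arithmetic avoids float round-off

    def at(position):
        """Value at a 1-based position, clamped to 1..n."""
        return values[min(max(position, 1), n) - 1]

    if i - math.floor(i) == Fraction(1, 2):
        return (at(math.floor(i)) + at(math.floor(i) + 1)) / 2
    return at(math.floor(i + Fraction(1, 2)))  # whole numbers are unchanged


incomes = list(range(1, 101))  # hypothetical: 100 values, 1..100
print(percentile(incomes, 80))  # position 0.8 * 101 = 80.8 -> 81st value
```

With 100 values, the 80th percentile falls at position 80.8, which rounds to the 81st value. Other software (for example spreadsheet percentile functions) may use different position formulas and interpolation, so results can differ slightly.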
Example
Find the 80th percentile of the annual income data.
Steps:
1. Sort the data.
2. Find the position of the 80th percentile.
3. For this data set the position is 81, so find the 81st person's income.
Think: what does this income value mean?
Exercise 1: Find the income value such that 30% of people have that income or higher.
Exercise 2: Find the income value such that 30% of people have an income less than it.
Exercise 3: Find the income value such that 50% of people have an income less than it. What is this measure also called?
Quartiles
The 25th, 50th, and 75th percentiles are called the first, second, and third quartiles, respectively, and are written Q1, Q2, and Q3.
The quartiles split the ranked data into 4 equal groups of 25% each: below Q1, between Q1 and Q2, between Q2 and Q3, and above Q3.
Example
Find the first quartile (Q1) of the data sample.
Note: Median = the 50th percentile = the second quartile (Q2).
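Python's standard library can compute all three quartiles at once with `statistics.quantiles`. With `method="exclusive"`, the ith quartile sits at position $i(n+1)/4$, matching the $(n+1)$ position formula for percentiles. When that position is fractional, however, it interpolates linearly instead of rounding, so hand-calculated answers may occasionally differ. The sample below is hypothetical, not the data from the slide.

```python
import statistics

# Hypothetical ordered sample, not the data from the slide
sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]

# method="exclusive": quartile i sits at position i * (n + 1) / 4,
# with linear interpolation between neighbouring values when needed
q1, q2, q3 = statistics.quantiles(sample, n=4, method="exclusive")
print(q1, q2, q3)
print(q2 == statistics.median(sample))  # the second quartile is the median
```

Here n = 9, so Q1 sits at position 2.5 and is the average of the 2nd and 3rd values, (12 + 13) / 2 = 12.5.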
Interquartile Range
Recall: What is the range? What is its disadvantage?
Interquartile range = Q3 − Q1. It measures the spread of the middle 50% of the data, so it is not influenced by extreme values.
Example: Q1 = 13.5, Q3 = 19
Interquartile range = Q3 − Q1 = 19 − 13.5 = 5.5
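To see why the interquartile range answers the range's disadvantage, the Python sketch below reuses the two 20-value data sets from the "Disadvantage of Range" slide. It computes quartiles with `statistics.quantiles(..., method="exclusive")`, one of several quartile conventions. The helper name `interquartile_range` is hypothetical.

```python
import statistics


def interquartile_range(data):
    """Q3 - Q1, with quartiles from statistics.quantiles (exclusive method)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return q3 - q1


# The two data sets from the "Disadvantage of Range" slide
without_outlier = [1] * 6 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = [1] * 6 + [2] * 8 + [3] * 4 + [4, 120]

for data in (without_outlier, with_outlier):
    print("range =", max(data) - min(data), " IQR =", interquartile_range(data))
```

The range jumps from 4 to 119, while the IQR stays at 2.0 for both data sets, because the outlier lies outside the middle 50% of the data.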
Summary
Understand and compute the following two sets of data measures:
- Measures of central tendency: Mean, Median, and Mode
- Measures of variation: Range, Variance, and Standard deviation
Other ways to describe data: Percentiles, Quartiles, and Interquartile range