HISTOGRAMS: Representing Data
Why use a Histogram? When there is a lot of data. When the data is Continuous: mass, height, volume, time, etc. When the data is presented in a Grouped Frequency Distribution, often in groups or classes that are UNEQUAL in width.
Histograms look like this: the data is Continuous, so there are NO GAPS between Bars.
Bars may differ in width. The widths are determined by the Grouped Frequency Distribution.
AREA is proportional to FREQUENCY, NOT height, because the classes are UNEQUAL! So we use FREQUENCY DENSITY = Frequency ÷ Class width
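As a minimal sketch of the definition above, frequency density could be computed in Python like this. The function name `frequency_density` is an illustrative choice, not something from the slides; the check uses the 0 < v ≤ 40 speed class, whose width of 40 and frequency of 80 appear in the worked example later in the deck.

```python
def frequency_density(frequency, class_width):
    """Return frequency / class width, i.e. the height of the histogram bar."""
    if class_width <= 0:
        raise ValueError("class width must be positive")
    return frequency / class_width


# Speed class 0 < v <= 40 with frequency 80: height = 80 / 40
print(frequency_density(80, 40))  # 2.0
```

Dividing by the class width is what makes the bar's area, rather than its height, represent the frequency.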
Grouped Frequency Distribution
Speed, v (km/h) | 0 < v ≤ 40 | 40 < v ≤ 50 | 50 < v ≤ 60 | 60 < v ≤ 90 | 90 < v ≤ 110
Frequency
Classes: these classes are well defined, there are no gaps!
Drawing: use Sensible Scales. The bases of the rectangles must be correctly aligned: plot the Class Boundaries carefully. The heights of the rectangles need to be correct: they are the Frequency Densities.
Frequency Densities
Speed, v (km/h) | 0 < v ≤ 40 | 40 < v ≤ 50 | 50 < v ≤ 60 | 60 < v ≤ 90 | 90 < v ≤ 110
Frequency
Class width
Frequency Density
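The class widths in this table follow directly from the class boundaries. A short sketch, assuming the boundaries 0, 40, 50, 60, 90 and 110 km/h read off the class labels (the helper name `class_widths` is hypothetical):

```python
# Class boundaries of the speed table in km/h, read from the class labels
# 0 < v <= 40, 40 < v <= 50, 50 < v <= 60, 60 < v <= 90, 90 < v <= 110.
boundaries = [0, 40, 50, 60, 90, 110]


def class_widths(bounds):
    """Width of each class = upper boundary - lower boundary."""
    return [upper - lower for lower, upper in zip(bounds, bounds[1:])]


widths = class_widths(boundaries)
print(widths)  # [40, 10, 10, 30, 20]
```

Each frequency is then divided by the matching width to get the frequency density for that class.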
[Histogram: Frequency Density against Speed (km/h)] Frequency = Width × Height. For the first class: Frequency = 40 × 2.0 = 80
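Grouped Frequency Distribution: two kinds of classes
Time taken (nearest minute) | Freq: there are GAPS between the classes! Need to adjust to Continuous.
Speed, v (km/h) | 0 < v ≤ 40 | 40 < v ≤ 50 | 50 < v ≤ 60 | 60 < v ≤ 90 | 90 < v ≤ 110 | Frequency: No gaps between the classes. Ready to graph.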
Adjusting Classes and Class Widths. Time taken (nearest minute) | Freq. The class boundaries fall on the ½ values, e.g. 19½, 29½, 39½, 59½.
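The adjustment can be sketched as follows. This assumes a class is written with limits such as 20 to 29 for times rounded to the nearest minute; that particular class is an assumption chosen to match the 19½ and 29½ boundaries, and the function name `class_boundaries` is hypothetical.

```python
def class_boundaries(lower_limit, upper_limit, precision=1):
    """Boundaries of a class whose values were rounded to the nearest `precision`.

    A recorded 20 could be any true value from 19.5 up to 20.5, so each
    limit is pushed outwards by half a unit of precision.
    """
    half = precision / 2
    return lower_limit - half, upper_limit + half


# Assumed class "20 to 29" (nearest minute)
low, high = class_boundaries(20, 29)
print(low, high, high - low)  # 19.5 29.5 10.0
```

After the adjustment neighbouring classes share a boundary, so the bars touch and the class width is 10 rather than 9.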
Frequency Density
Time taken (nearest minute) | Freq | Class width | Frequency Density
Drawing: use Sensible Scales. Bases correctly aligned: plot the Class Boundaries. Heights correct: Frequency Density.
[Histogram: Frequency Density against Time (mins)]
Estimating a Frequency. Imagine we want to estimate the number of people with a time between 12 and 25 mins. Because the times in our classes were rounded to the nearest minute, we consider the interval from 11.5 to 25.5.
[Histogram: Frequency Density against Time (mins)] Frequency = FD × Width. Frequency = 0.9 × 8 = 7.2. Frequency = 1.8 × 6 = 10.8. Total Frequency = 18
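The same estimate can be sketched in code by adding up density × overlapping width for every class the interval touches. The frequency densities 0.9 and 1.8 and the shared boundary at 19.5 come from the calculation above; the lower boundary of 9.5 for the first class is an assumption, and `estimate_frequency` is a hypothetical helper name.

```python
def estimate_frequency(classes, start, end):
    """Estimate the frequency between start and end.

    classes: list of (lower_boundary, upper_boundary, frequency_density).
    Each class contributes density * (width of its overlap with the interval).
    """
    total = 0.0
    for lower, upper, density in classes:
        overlap = min(upper, end) - max(lower, start)
        if overlap > 0:
            total += density * overlap
    return total


classes = [
    (9.5, 19.5, 0.9),   # lower boundary 9.5 is assumed, not from the slides
    (19.5, 29.5, 1.8),
]

# 12 to 25 minutes, widened to 11.5 to 25.5 because of rounding
estimate = estimate_frequency(classes, 11.5, 25.5)
print(round(estimate, 1))  # 18.0
```

The method assumes the data is spread evenly within each class, which is why the result is only an estimate.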
We can estimate the Mode
Time taken (nearest minute) | Freq | CF
The Mode is therefore in this Class.
[Histogram: Frequency Density against Time (mins), with the Modal class marked]
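With unequal classes, the modal class is normally read off the histogram as the tallest bar, that is, the class with the greatest frequency density rather than the greatest frequency. A small sketch under that assumption; the table below is hypothetical and only illustrates the idea, it is not the data from the slides.

```python
# Hypothetical classes: (lower boundary, upper boundary, frequency)
classes = [
    (9.5, 19.5, 9),
    (19.5, 29.5, 18),
    (29.5, 39.5, 15),
    (39.5, 59.5, 20),  # largest frequency, but also the widest class
]


def density(cls):
    """Frequency density of one class = frequency / class width."""
    lower, upper, frequency = cls
    return frequency / (upper - lower)


def modal_class(table):
    """Return the class whose bar is tallest (greatest frequency density)."""
    return max(table, key=density)


print(modal_class(classes))  # (19.5, 29.5, 18)
```

In this made-up table the widest class has the largest frequency but a density of only 1.0, so it is not the modal class.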
…and the other one? Simpler to plot: no adjustments required, the class widths are friendly, and there are no ½ values. Estimation uses the EXACT values given, with no adjustment required: an estimate for 15 to 56 would use 15 and 56! These appear LESS OFTEN in the exam.
Speed, v (km/h) | 0 < v ≤ 40 | 40 < v ≤ 50 | 50 < v ≤ 60 | 60 < v ≤ 90 | 90 < v ≤ 110 | Frequency
Why use frequency density for the vertical axis of a Histogram? With unequal class sizes, plotting frequency as the height can give a misleading idea of the data distribution. The vertical axis is therefore Frequency Density.
Example: Misprediction of Grade Point Average (GPA). The following table displays the differences between predicted GPA and actual GPA. Positive differences result when predicted GPA > actual GPA.
Class Interval | Frequency | Class width (eight class intervals, the first starting at -2.0)
The frequency histogram considerably exaggerates the incidence of overpredicted and underpredicted values. The areas of the two most extreme rectangles are much too large! [Figure annotations: X % of data; 17.1% of data]
Example: Density Histogram of Misreporting GPA
Class Interval | Frequency | Class width | Frequency Density (eight class intervals, the first starting at -2.0)
Frequency = (rectangle height) × (class width) = area of rectangle. To avoid a misleading histogram like the one on the last slide, display the data with frequency density.
[Histogram: Frequency density (× 10^-3) against X]
Chap 2-24 Principles of Excellent Graphs The graph should not distort the data. The graph should not contain unnecessary things (sometimes referred to as chart junk). The scale on the vertical axis should begin at zero. All axes should be properly labelled. The graph should contain a title. The simplest possible graph should be used for a given set of data.
Graphical Errors: Chart Junk. Bad Presentation: Minimum Wage (1960: $ ... : $3.80). Good Presentation: Minimum Wage ($).
Graphical Errors: No Relative Basis. A’s received by students. Bad Presentation: Freq. for FD, UG, GR, SR. Good Presentation: % for FD, UG, GR, SR. FD = Foundation, UG = UG Dip, GR = Grad Dip, SR = Senior.
Graphical Errors: Compressing the Vertical Axis. Quarterly Sales ($) for Q1, Q2, Q3, Q4: Bad Presentation and Good Presentation.
Graphical Errors: No Zero Point on the Vertical Axis. Graphing the first six months of sales: Monthly Sales ($) for J, F, M, A, M, J. Bad Presentation and Good Presentations.