Frequency Distributions
To accompany Hawkes Lesson 2.1
Original content by D.R.S.

Is your data Qualitative or Quantitative?
Qualitative: it’s a category
– Blood type
– Model of car
– Favorite fast food restaurant
Quantitative: it’s a numerical measurement
– Heart rate, beats per minute
– Fuel efficiency, miles per gallon
– Dollars spent on meal
– My pain, on a scale from 1 to 10

Frequency Distribution for Categorical Data
Columns:
– Category (list the categories here in this column)
– Frequency (put the counts of how many in this column)
– Relative Frequency (this category is what percent of the total sample size?)
(What order? Highest frequency down to lowest? Lowest to highest? Alphabetical? It’s your design decision.)

Categorical Frequency Distributions are the fuel for “Family Feud”
(Photograph borrowed from some web site somewhere; I failed to record the exact source.)

Categorical (or, Qualitative) Frequency Distribution example
“What state did you visit most recently?”
State visited (the category)    How many (the frequency)
Alabama                          71
California                       18
Florida                         138
New York                          7
South Carolina                   48
Tennessee                        27
Texas                            53
Other states                     70
TOTAL                           432

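A relative frequency column for the “state visited” table can be computed by dividing each count by the total. The following is only a minimal Python sketch, not part of the original lesson; the variable names are made up for illustration, and sorting from highest to lowest frequency is just one of the possible ordering choices.

```python
# Counts copied from the "state visited" frequency distribution.
state_counts = {
    "Alabama": 71,
    "California": 18,
    "Florida": 138,
    "New York": 7,
    "South Carolina": 48,
    "Tennessee": 27,
    "Texas": 53,
    "Other states": 70,
}

total = sum(state_counts.values())  # the total sample size

# Relative frequency = this category's count / total sample size.
relative = {state: count / total for state, count in state_counts.items()}

# One possible ordering (a design decision): highest frequency first.
ordered = sorted(state_counts, key=state_counts.get, reverse=True)

for state in ordered:
    print(f"{state:<15} {state_counts[state]:>4} {relative[state]:>7.1%}")
print(f"{'TOTAL':<15} {total:>4}")
```

Swapping the `sorted` call for `sorted(state_counts)` would give alphabetical order instead.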
Things we do with Categorical Frequency Distributions
Sometimes we just leave them as tables of words and numbers for reference and interpretation.
We draw pictures of them (future lessons).
– Bar graphs
– Pie charts
– Cutesy repeated-icons variation of the bar graph

A famous categorical frequency distribution we will revisit later
Draw this 5-card poker hand                                Frequency
Royal Flush                                                        4
Straight Flush (not including Royal Flush)                        36
Four of a Kind                                                   624
Full House                                                     3,744
Flush (not including Royal Flush or Straight Flush)            5,108
Straight (not including Royal Flush or Straight Flush)        10,200
Three of a Kind                                               54,912
Two Pair                                                     123,552
One Pair                                                   1,098,240
Something that’s not special at all                        1,302,540
Total                                                      2,598,960

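The frequencies in the poker table should add up to the number of distinct 5-card hands, which is “52 choose 5” = 2,598,960. Here is a small Python check, offered as a sketch rather than anything from the original lesson; the shortened hand names and variable names are assumptions made for readability.

```python
import math

# Frequencies copied from the poker-hand frequency distribution.
hand_counts = {
    "Royal Flush": 4,
    "Straight Flush": 36,
    "Four of a Kind": 624,
    "Full House": 3_744,
    "Flush": 5_108,
    "Straight": 10_200,
    "Three of a Kind": 54_912,
    "Two Pair": 123_552,
    "One Pair": 1_098_240,
    "Nothing special": 1_302_540,
}

total_hands = sum(hand_counts.values())

# The total should match "52 choose 5", the count of distinct 5-card hands.
assert total_hands == math.comb(52, 5)

# Print each frequency next to its relative frequency.
for hand, count in hand_counts.items():
    print(f"{hand:<16} {count:>9,} {count / total_hands:.6%}")
print(f"{'Total':<16} {total_hands:>9,}")
```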
Quantitative Frequency Distribution (data are numerical measurements)
Columns:
– Classes: each class is a low-to-high range of values. The low and high values are called the “Class Limits”.
– Frequency: a count of how many data values fit in the class.

Quantitative Frequency Distribution (data are numerical measurements)
Placement Test Score    How many applicants
10-19                   57
20-29                   52
30-39                   71
40-49                   50
50 and above            28

About the Quantitative Frequency Distribution
Instead of individual test score values, we GROUPED data into CLASSES.
Other names for “classes”: “bins”, “buckets”
Each class is a low-to-high range of data values.
Each data value falls into exactly one class.
There may be one or two “open-ended” classes
– Like our “50 and higher”

About the classes
CLASS LIMITS are 10-19, 20-29, etc.
Classes do not overlap!
Classes are usually the same width.
CLASS MIDPOINTS are like 14.5, 24.5, etc. (High plus low, divided by 2)

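As a quick check on the midpoint arithmetic: adding a class’s lower and upper limits and dividing by 2 gives 14.5 for 10-19 and 24.5 for 20-29. The Python sketch below is an illustration only; the classes 30-39 and 40-49 are an assumed continuation of the “etc.”, and the variable names are made up.

```python
# Class limits from the placement test example; the classes after 20-29
# are an assumed continuation of the "etc." on the slide.
class_limits = [(10, 19), (20, 29), (30, 39), (40, 49)]

# Midpoint = (lower limit + upper limit) / 2
midpoints = [(low + high) / 2 for low, high in class_limits]

# Class width = distance from one lower limit to the next lower limit.
widths = [nxt[0] - cur[0] for cur, nxt in zip(class_limits, class_limits[1:])]

print(midpoints)  # [14.5, 24.5, 34.5, 44.5]
print(widths)     # [10, 10, 10]
```

Note that the width is 10, not 19 − 10 = 9: it is measured from lower limit to lower limit.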
Class LIMITS vs. Class BOUNDARIES
CLASS LIMITS are 10-19, 20-29, etc.
CLASS BOUNDARIES split the “gap” between class limits: 9.5-19.5, 19.5-29.5, etc.
“9.5-19.5” means 9.5 ≤ x < 19.5 (note ≤ vs. <)
– All values between 9.5 and 19.5
– Including the lower endpoint of 9.5
– But excluding the upper endpoint of 19.5

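The boundary rule can be sketched in Python as follows. This is an illustration under assumptions, not part of the original lesson: the classes 30-39 and 40-49 are an assumed continuation of the “etc.”, and `in_class` is a hypothetical helper name.

```python
# Class limits from the slide; 30-39 and 40-49 are an assumed
# continuation of the "etc."
class_limits = [(10, 19), (20, 29), (30, 39), (40, 49)]

# The gap between adjacent classes: 20 - 19 = 1.
gap = class_limits[1][0] - class_limits[0][1]

# Each boundary sits half a gap outside the class limits.
class_boundaries = [(low - gap / 2, high + gap / 2) for low, high in class_limits]

def in_class(x, boundaries):
    """True when lower boundary <= x < upper boundary."""
    lower, upper = boundaries
    return lower <= x < upper

print(class_boundaries[0])                  # (9.5, 19.5)
print(in_class(9.5, class_boundaries[0]))   # True: lower endpoint included
print(in_class(19.5, class_boundaries[0]))  # False: upper endpoint excluded
print(in_class(19.5, class_boundaries[1]))  # True: 19.5 belongs to the next class
```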
A Cumulative Frequency column
Placement Test Score    How many applicants    Cumulative frequency
10-19                   57                     57
20-29                   52                     57 + 52 = 109
30-39                   71                     109 + 71 = 180
40-49                   50                     180 + 50 = 230
50 and above            28                     230 + 28 = 258

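The running totals 109, 180, 230 come from adding each class’s frequency to the total so far. A minimal Python sketch of that idea (an illustration, not part of the original lesson; the variable names are made up):

```python
from itertools import accumulate

# Placement test frequencies, in class order.
frequencies = [57, 52, 71, 50, 28]

# Each cumulative frequency = previous cumulative frequency + this frequency.
cumulative = list(accumulate(frequencies))

print(cumulative)  # [57, 109, 180, 230, 258]
```

The last cumulative frequency always equals the total sample size.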
A Relative Frequency column
Placement Test Score    How many applicants    Relative frequency
10-19                   57                     22.1%
20-29                   52                     20.2%
30-39                   71                     27.5%
40-49                   50                     19.4%
50 and above            28                     10.9%
TOTAL                   258

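Each relative frequency is the class frequency divided by the total of 258. The Python sketch below uses the placement test frequencies 57, 52, 71, 50, 28; rounding to one decimal place is an assumption made here, not something the lesson specifies.

```python
# Placement test classes and frequencies.
labels = ["10-19", "20-29", "30-39", "40-49", "50 and above"]
frequencies = [57, 52, 71, 50, 28]

total = sum(frequencies)

# Relative frequency as a percent, rounded to one decimal place (assumed).
percents = [round(100 * f / total, 1) for f in frequencies]

for label, f, p in zip(labels, frequencies, percents):
    print(f"{label:<14} {f:>4} {p:>6}%")
print(f"{'TOTAL':<14} {total:>4}")
```

Because of rounding, the printed percents may add up to slightly more or less than 100%.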
Constructing a Frequency Distribution
1. How many classes should we have?
2. What class width should we use?
3. Find the class limits.
4. Sort your data, find the frequency of each class.
Adapted from textbook page 46 © HLS

Example of Construction
Using runners’ times from the Bunny Hop 5K in Cordele, March 31, 2012
– original data downloaded from a link at rungeorgia.com

1. How many classes?
Between 5 classes and 20 classes is good.
How many data values do you have?
One textbook suggests: if you have fewer than 125 data values, use the square root of the number of data values.
The Bunny Hop race had 103 finishers. The square root of 103 is about 10.1, so by that rule we would have 10 or 11 classes.
Let’s agree on 10 classes for this example.

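The square-root rule is easy to check for the 103 finishers. A minimal Python sketch (illustrative only; the variable names are made up):

```python
import math

num_values = 103  # number of finishers in the Bunny Hop race

# Rule of thumb for fewer than 125 data values: about sqrt(n) classes.
suggested = math.sqrt(num_values)

# The rule rarely lands on a whole number, so look at both neighbors.
low_choice = math.floor(suggested)
high_choice = math.ceil(suggested)

print(f"sqrt({num_values}) is about {suggested:.2f}")
print(f"so {low_choice} or {high_choice} classes")
```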
2. Choose a Class Width
The “range” is the highest data value minus the lowest data value.
Divide the range by the number of classes.
Then bump up to the next integer.
That’s just a starting point.

2. Choose a Class Width
High – Low = Range
Divide the range by the number of classes: Range ÷ 10
Then bump up to the next integer. Class width is 5.
That’s just a starting point.
We like it; it sounds good. Nice “round” kind of a number for our readers.

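The class-width arithmetic can be sketched in Python. The high and low times below are hypothetical stand-ins, not the actual Bunny Hop results; they were chosen only so that the result matches the class width of 5. Using `math.ceil` for “bump up to the next integer” is also an assumption: if the division came out to an exact whole number, you might still choose to bump it up by one.

```python
import math

# Hypothetical lowest and highest finishing times in minutes
# (NOT the actual race data).
lowest, highest = 18.25, 62.75
num_classes = 10

data_range = highest - lowest         # range = high - low
raw_width = data_range / num_classes  # range divided by number of classes
class_width = math.ceil(raw_width)    # bump up to the next integer

print(f"range = {data_range}, range / classes = {raw_width}, width = {class_width}")
```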
3. Find the Class Limits
Start at what value for the first class?
– Look at the lowest data value and pick a starting value at or below it.
– Use the same number of decimal places as the data.
That starting value is the lower class limit of the first class.
Lower limit of the previous class + class width of 5 = lower limit of the next class.

3. Find the Class Limits - Lower
Each lower class limit is the previous lower class limit plus the class width of 5 minutes.
Continue the same way for the rest of the 10 classes.

3. Find the Class Limits - Upper
The upper class limit of the first class is the largest value, written with the same number of decimal places as the data, that is still below the lower class limit of the second class.
The remaining classes follow the same pattern, all the way up through the last class.

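Here is one way to generate lower and upper class limits in Python. It is a sketch under stated assumptions: the starting value of 18.00 minutes and the two-decimal-place precision are hypothetical, not taken from the race data; only the class width of 5 and the count of 10 classes come from the example. `Decimal` is used so that the limits print exactly.

```python
from decimal import Decimal

first_lower = Decimal("18.00")  # hypothetical lower limit of the first class
class_width = Decimal("5")      # class width from step 2
num_classes = 10                # number of classes from step 1
unit = Decimal("0.01")          # assumed: data recorded to two decimal places

class_limits = []
for i in range(num_classes):
    # Lower limit = first lower limit + i class widths.
    lower = first_lower + i * class_width
    # Upper limit = one unit below the next class's lower limit.
    upper = lower + class_width - unit
    class_limits.append((lower, upper))

for lower, upper in class_limits:
    print(f"{lower} - {upper}")
```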
4. Count the frequency of each class
Columns: Time (minutes), Frequency
If tallying unsorted data by hand, hash marks are useful.

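Tallying can also be done by computer. The Python sketch below uses a short list of hypothetical finishing times and a hypothetical starting value of 18.00 minutes (neither is from the actual race data); `class_index` is a made-up helper name. It prints one row per class with hash marks next to the count.

```python
from collections import Counter

# Hypothetical finishing times in minutes (NOT the actual Bunny Hop data).
times = [18.40, 21.95, 23.10, 24.75, 27.30, 28.05, 31.60, 33.33, 36.20, 41.15]

first_lower_limit = 18.0  # assumed lower limit of the first class
class_width = 5           # class width from step 2
num_classes = 10          # number of classes from step 1

def class_index(value):
    """Return the 0-based index of the class that contains value."""
    return int((value - first_lower_limit) // class_width)

# Count how many data values fall in each class.
tally = Counter(class_index(t) for t in times)

for i in range(num_classes):
    lower = first_lower_limit + i * class_width
    upper = lower + class_width - 0.01
    print(f"{lower:5.2f} - {upper:5.2f}  {tally[i]:2d}  {'|' * tally[i]}")
```

Unlike tallying by hand, this does not need the data to be sorted first.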
Class Limits and Class Boundaries
Columns: Class Limits, Class Boundaries

Class Limits and Class Boundaries
What to do with the gap between the class limits of adjacent classes?
There’s a gap between the upper limit of one class and the lower limit of the next class.
Class Boundaries extend to the midpoint of that gap.

Class Boundaries
For any class: lower boundary ≤ x < upper boundary
Note: including the lower boundary (≤)
But not including the upper boundary (<)
Because classes must never overlap

Class Midpoints
(Upper Limit + Lower Limit) ÷ 2
Columns: Class Limits, Class Midpoints
Midpoints of adjacent classes are one class width apart.

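Class Limits, Boundaries, and Midpoints for the Placement Test
It’s easier with whole numbers as class limits
Class Limits    Frequency    Class Boundaries    Class Midpoint
10-19           57           9.5-19.5            14.5
20-29           52           19.5-29.5           24.5
30-39           71           29.5-39.5           34.5
40-49           50           39.5-49.5           44.5
50 and up       28           49.5 and up         None? Or 54.5?
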
Excel Tools
Link: The Excel FREQUENCY function.
Link: The Excel COUNTIF function.
– Need to add info about COUNTIFS function.
Also, Excel’s “Histogram” tool generates frequency distributions (discussed in the Histogram lesson).