1
Frequency Distributions
Lecture 2: Frequency Distributions
3
Exploratory Data Analysis
Emphasizes graphic representation.
Addresses the following questions:
1. What is going on here?
2. Are there patterns in these data?
3. What are the patterns in these data?
Allows for hypothesis generation and model building through iteration.
"EDA is an attitude toward data" - Tukey
5
Distributions
Distributions: the arrangement of data; how the cases fall (are distributed).
Information about distributions is paramount in statistics.
Distributions can be displayed in tabular and graphic form.
Frequency distribution: a tabulation of the number of events/occurrences for each category on the scale of measurement.
Frequency: a count of the number of occurrences.
6
Frequency Table
Pet type   Frequency   Proportion   %
dog        30          0.39         39
cat        25          0.32         32
fish       20          0.26         26
turtle     2           0.03         3
Total      77          1.0          100
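The proportion and percentage columns follow directly from the counts. A minimal Python sketch, assuming the counts are exactly those in the pet-type table (the function name `frequency_table` is only illustrative):

```python
# Counts taken from the pet-type frequency table.
counts = {"dog": 30, "cat": 25, "fish": 20, "turtle": 2}

def frequency_table(counts):
    """Return N and a list of (category, f, proportion, percent) rows."""
    n = sum(counts.values())  # total number of cases
    rows = [(cat, f, round(f / n, 2), round(100 * f / n))
            for cat, f in counts.items()]
    return n, rows

n, rows = frequency_table(counts)
print("N =", n)
for cat, f, p, pct in rows:
    print(f"{cat:<8}{f:>4}{p:>7.2f}{pct:>5}")
```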
7
Grouping Data
Arrange data from high to low.
Frequencies can be used to find the total number of scores: Σf = N
ΣX from a frequency distribution = ΣfX
Proportion = p = f/N
Percentage = p(100) = (f/N)(100)
Score X   Frequency   p      %
15        20          0.43   43
10        15          0.33   33
4         11          0.24   24
Total     46          1.0    100
8
X    f    p    %
10   2
9    5
8    4
7    3
6    2
5    2
Finish filling in the chart. Then find N and ΣX.
9
X     f    fX    Proportion   %
10    2    20    .11          11
9     5    45    .28          28
8     4    32    .22          22
7     3    21    .17          17
6     2    12    .11          11
5     2    10    .11          11
N = 18     ΣX = 140     1.0   100
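The chart totals can be checked with a few lines of Python. This is a sketch that assumes the frequencies 2, 5, 4, 3, 2, 2 for the scores 10 down to 5, which is the reading consistent with the stated N = 18 and ΣX = 140; the variable names are illustrative.

```python
# Score -> frequency (assumed reading of the chart; see the note above).
freq = {10: 2, 9: 5, 8: 4, 7: 3, 6: 2, 5: 2}

n = sum(freq.values())                        # Σf = N
sum_x = sum(x * f for x, f in freq.items())   # ΣX = ΣfX
proportions = {x: round(f / n, 2) for x, f in freq.items()}
percents = {x: round(100 * f / n) for x, f in freq.items()}

print("N =", n, " ΣX =", sum_x)
for x in freq:
    print(x, freq[x], x * freq[x], proportions[x], percents[x])
```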
10
Frequency Distributions
Ungrouped frequency distribution = a list of observations and their frequency of occurrence when observations are sorted by single values.
Grouped frequency distribution = a list of observations and their frequency of occurrence when observations are sorted into categories or intervals.
11
Grouped vs. ungrouped frequency distributions: how to decide
How many values?
If there are fewer than 10 different scores, an ungrouped distribution is fine.
If there are more than 10 different scores, use a grouped distribution.
Reminder: to figure out how many values there are, compute hi - lo + 1.
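The decision rule can be sketched in Python as follows. The function name and the choice to treat exactly 10 values as "ungrouped" are assumptions, since the slide only covers fewer than and more than 10; the example score ranges are hypothetical.

```python
def choose_distribution(lo, hi, cutoff=10):
    """Count the possible score values (hi - lo + 1) and pick a table type.

    Assumption: exactly `cutoff` values is treated as ungrouped.
    """
    n_values = hi - lo + 1
    kind = "ungrouped" if n_values <= cutoff else "grouped"
    return n_values, kind

print(choose_distribution(6, 10))    # small range of scores
print(choose_distribution(52, 95))   # wide range of scores
```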
12
Grouped Frequency Distributions
Crucial guidelines for constructing frequency distributions:
(1) Aim for groups
(2) Mutually exclusive (each observation should be represented only once)
(3) All classes should have equal intervals (even if the frequency for a particular class is zero)
(4) Don’t omit any intervals
(5) Make widths convenient (e.g., not 2.2); the bottom score should be a multiple of the width
Formula for finding the width in a grouped frequency distribution: interval = (hi - lo + 1) / # groups
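The width formula is easy to try out in Python. In this sketch the default of 10 groups is an assumption (the target number of groups is missing from the slide text, though the later example divides by 10), and the hi and lo scores are hypothetical.

```python
def interval_width(hi, lo, n_groups=10):
    """Return the raw width (hi - lo + 1) / n_groups and a convenient whole-number width."""
    raw = (hi - lo + 1) / n_groups
    return raw, max(1, round(raw))

raw, width = interval_width(hi=95, lo=52)   # hypothetical highest and lowest scores
print(raw, width)
```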
13
Example
i = ( ) / 10 = 3.2 ≈ 3
midpoint = (hi real + lo real) / 2
START at the bottom with the low number.
Columns: Interval, Real Limits, f, Midpoint
* Groups are called class intervals.
* Class intervals are apparent limits because they appear to form the upper and lower boundaries of the class interval, but the real limits must be taken into account.
NOTE: I violated the rule of making the bottom score a multiple of the width.
14
Example
* Include columns for interval, real limits, frequency, and midpoint
15
Interval   Real limits   f   Midpoint
92-95                    4   93.5
88-91                    5   89.5
84-87                    6   85.5
80-83                    1   81.5
76-79                    9   77.5
72-75                    5   73.5
68-71                    8   69.5
64-67                    7   65.5
60-63                    8   61.5
56-59                        57.5
52-55                        53.5
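The interval, real-limit, and midpoint columns can be generated mechanically. A minimal sketch, assuming whole-number scores (so real limits sit 0.5 below and above the apparent limits) and using the table's lowest interval start (52), highest score (95), and width (4); the function name is illustrative.

```python
def class_intervals(lowest, highest, width):
    """Build rows of (apparent_low, apparent_high, real_low, real_high, midpoint),
    starting at the bottom and returned with the highest interval first."""
    rows = []
    low = lowest
    while low <= highest:
        high = low + width - 1
        real_low, real_high = low - 0.5, high + 0.5   # real limits for integer scores
        midpoint = (real_low + real_high) / 2
        rows.append((low, high, real_low, real_high, midpoint))
        low += width
    return rows[::-1]

for low, high, rl, rh, mid in class_intervals(52, 95, 4):
    print(f"{low}-{high}   {rl}-{rh}   {mid}")
```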
16
Percentiles and Percentile Ranks: Get more out of your frequency distribution
Scores alone are meaningless. Use percentiles to compare a score to a standard.
Percentile: numbers that divide the distribution into 100 equal parts.
Percentile rank: the number that represents the percentage of cases in a comparison group that achieved scores at or below the one cited.
e.g., a PR of 95 on the SAT means that 95% of those taking the SAT performed equal to or worse than you and only 5% did better.
17
Example
Columns: Class grades, f, cf, cprop, cum % (N = 32)
First step: find the number of people located at or below each point in the distribution.
Note that the cumulative percentage is associated with the upper real limit of its interval.
What’s the 81st percentile? (Careful: remember the X values are not points on a scale, but rather intervals.)
What is the percentile rank for 70.5?
18
Add the following info to your in-group table:
cum f, cum %
19
Interval   Real limits   f   cf   c%    Midpoint
92-95      91.5-95.5     4   64   100   93.5
88-91                    5   60   94    89.5
84-87                    6   55   86    85.5
80-83                    1   49   77    81.5
76-79                    9   48   75    77.5
72-75                    5   39   61    73.5
68-71                    8   34   53    69.5
64-67                    7   26   41    65.5
60-63                    8   19   30    61.5
56-59                        11   17    57.5
52-55                                   53.5
How about the percentile rank for X = 59.5?
What’s the percentile rank for X = 91.5?
What’s the 86th percentile?
How about the 61st percentile?
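Because each cumulative percentage belongs to the upper real limit of its interval, questions about scores that fall exactly on a real limit can be answered by lookup. The sketch below copies the cf values from the table and assumes whole-number scores, so each upper real limit is the interval's top score plus 0.5; the bottom interval is left out because its cf is missing, and the function names are illustrative.

```python
N = 64  # total number of scores (the top cf value in the table)

# cf at the upper real limit of each interval, copied from the table.
cf_at_upper_limit = {
    59.5: 11, 63.5: 19, 67.5: 26, 71.5: 34, 75.5: 39,
    79.5: 48, 83.5: 49, 87.5: 55, 91.5: 60, 95.5: 64,
}

def percentile_rank(upper_limit):
    """Percentile rank of a score that is an upper real limit: c% = cf / N * 100."""
    return round(100 * cf_at_upper_limit[upper_limit] / N)

def score_at_percentile(pr):
    """Upper real limit whose rounded cumulative % equals pr, or None if no limit matches."""
    for limit, cf in sorted(cf_at_upper_limit.items()):
        if round(100 * cf / N) == pr:
            return limit
    return None

print(percentile_rank(59.5), percentile_rank(91.5))
print(score_at_percentile(86), score_at_percentile(61))
```

Scores that do not fall on a real limit need interpolation instead of a lookup.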
20
Obtaining PR or Interpolation: When values don’t appear in the table
Columns: Class grades, f, cum f, cum prop, c %
Some important symbols:
* cumf_ll = cf at lower real limit of X
* c% = (cf / N)(100%)
* X = score
* X_ll = score at lower real limit of X
* i = interval width
* f_i = # of cases in X’s group
* N = total # of scores
21
Obtaining PR or Interpolation
Columns: Class grades, f, cum f, cum prop, cum %
What is the PR of 88?
Getting PR from score (X):
PR = [cumf_ll + ((X - X_ll) / i)(f_i)] / N x 100
X = 88, i = 10, cumf_ll = 22, X_ll = 80.5, f_i = 4, N = 32
PR = [22 + ((88 - 80.5) / 10)(4)] / 32 x 100 = 78.13%
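The PR formula translates directly into a small function. This is a sketch using the slide's values, with N = 32 taken from the class-grades example; the function and parameter names are illustrative.

```python
def percentile_rank(x, x_ll, i, cumf_ll, f_i, n):
    """PR = [cumf_ll + ((X - X_ll) / i) * f_i] / N * 100."""
    return (cumf_ll + ((x - x_ll) / i) * f_i) / n * 100

pr = percentile_rank(x=88, x_ll=80.5, i=10, cumf_ll=22, f_i=4, n=32)
print(round(pr, 2))
```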
22
Obtaining PR or Interpolation
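Columns: Class grades, f, cum f, cum prop, cum %
What is the PR of 88? Getting PR from score (X) using interpolation:
Score:   90   88   81
Cum %:   81   X    69
The interval is 10 score units wide; 88 is 2.5 units down the interval.
The interval spans 81 - 69 = 12 percentage points.
2.5/10 = a/12, so a = 3
PR = 81 - 3 = 78%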
23
Obtaining the score (X) from PR
Columns: Class grades, f, cum f, cum prop, cum %
What is the score that corresponds to a PR of 72?
cumf = (PR x N) / 100 = (72 x 32) / 100 = 23.04
X = X_ll + [i (cumf - cumf_ll) / f_i] = 80.5 + [10 (23.04 - 22) / 4] = 83.1
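Going from a PR back to a score is the same calculation run in reverse. A minimal sketch, assuming the same interval as the PR-of-88 example (lower real limit 80.5, width 10, cf of 22 below the interval, 4 cases in it, N = 32); the names are illustrative.

```python
def score_at_percentile(pr, x_ll, i, cumf_ll, f_i, n):
    """X = X_ll + i * (cumf - cumf_ll) / f_i, where cumf = PR * N / 100."""
    cumf = pr * n / 100
    return x_ll + i * (cumf - cumf_ll) / f_i

x = score_at_percentile(pr=72, x_ll=80.5, i=10, cumf_ll=22, f_i=4, n=32)
print(round(x, 1))
```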
24
Obtaining the score (X) from PR
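Columns: Class grades, f, cum f, cum prop, cum %
What is the score that corresponds to a PR of 72?
Score:   90   X    81
Cum %:   81   72   69
The interval spans 81 - 69 = 12 percentage points; 72 is 9 down.
The interval is 10 score units wide.
9/12 = a/10, so a = 7.5
X = 90.5 - 7.5 = 83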
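Both interpolation examples (PR of 88, and the score at a PR of 72) are the same linear interpolation run in opposite directions. The sketch below assumes the interval's real limits are 80.5 and 90.5 and uses the rounded cumulative percentages 69 and 81 from the slides; the helper name `interpolate` is illustrative.

```python
def interpolate(value, from_low, from_high, to_low, to_high):
    """Map `value` from the scale [from_low, from_high] onto [to_low, to_high]."""
    fraction = (value - from_low) / (from_high - from_low)
    return to_low + fraction * (to_high - to_low)

# Score -> percentile rank: 88 within real limits 80.5 to 90.5
pr_of_88 = interpolate(88, 80.5, 90.5, 69, 81)

# Percentile rank -> score: PR of 72 within cumulative % 69 to 81
score_at_72 = interpolate(72, 69, 81, 80.5, 90.5)

print(pr_of_88, score_at_72)
```

The small differences from the formula results (78.13 and 83.1) come from using cumulative percentages rounded to whole numbers.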
25
Interval   Real limits   f   cf   c%    Midpoint
92-95      91.5-95.5     4   64   100   93.5
88-91                    5   60   94    89.5
84-87                    6   55   86    85.5
80-83                    1   49   77    81.5
76-79                    9   48   75    77.5
72-75                    5   39   61    73.5
68-71                    8   34   53    69.5
64-67                    7   26   41    65.5
60-63                    8   19   30    61.5
56-59                        11   17    57.5
52-55                                   53.5
How about the percentile rank for X = 73?
What score is at the 88th percentile?
26
Graphs
Visual methods to display data:
Figure: pictorial; photo; drawing
Table: organized numerical info
Graph: pictorial; axes, #s, etc.
Basics:
X-axis or abscissa: horizontal line in the graph; independent variable (IV)
Y-axis or ordinate: vertical line in the graph; dependent variable (DV)
Always label axes [graph’s height should be roughly 2/3 to 3/4 of its length (see Box 2.1, pg. 49)]
Y starts at 0; continuous, no breaks
X can change start; break; can be discrete
27
Bar Graphs
Bar graph = nominal or ordinal data (usually nonnumerical values)
Each bar = category
Height = frequency
Bars do NOT touch
If ordinal data are used, order must be preserved
Can be vertical or horizontal
28
Histogram
Histogram = interval and ratio data
Same rules as bar graph EXCEPT bars touch
Usually for discrete data
Width of the bar extends to the real limits of the score
29
Frequency Distribution Polygon or Line Graph
Line graph = interval, ratio data
Usually used for continuous data
A dot is centered above each score
Can also use this type of graph for relative frequencies (proportions) when there is a large amount of data. In this case each dot would be placed at the midpoint of the range.
30
Cumulative Frequency Graph
Cumulative frequency graph = can be a bar graph, histogram, or line graph
Uses the proportion or percentage
The line graph version is typically S-shaped (an ogive)
Never decreases as X increases
31
Stem and Leaf Displays
Data set: 54, 81, 82, 61, 97, 83, 74, 67, 86, 80, 68, 87, 75, 81
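A stem and leaf display for this data set can be built by splitting each score into a tens digit (stem) and a ones digit (leaf). This is a minimal sketch that assumes two-digit whole-number scores and uses the scores exactly as they appear in the slide text, which may not be the complete data set; the function name is illustrative.

```python
from collections import defaultdict

scores = [54, 81, 82, 61, 97, 83, 74, 67, 86, 80, 68, 87, 75, 81]

def stem_and_leaf(scores):
    """Group sorted scores by tens digit (stem); leaves are the ones digits."""
    stems = defaultdict(list)
    for s in sorted(scores):
        stems[s // 10].append(s % 10)
    return dict(sorted(stems.items()))

display = stem_and_leaf(scores)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Every individual score remains readable from the display, and the rows of leaves form a sideways histogram.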
32
Advantages to Stem and Leaf
Easy to construct
Allows you to identify each and every individual score (a frequency distribution just tells you the frequency)
Both a picture and a listing of scores (if you turn the stem and leaf display on its side, it looks like a histogram)
* Caveat: just seen as a preliminary means for organizing data
33
Distributions
3 characteristics that describe a distribution:
Shape
Central tendency: center of the distribution
Variability: spread of scores
Shape: technically, shape is defined by an equation that prescribes the exact relationship between each X and Y value on the graph
34
Characteristics of Distributions defined by shape
Skewness (Sk): measure of balance in a distribution
Evenly balanced distributions have no Sk; they are normal or symmetrical
Positive Sk (Sk+): tail trails to the right (positive direction)
Negative Sk (Sk-): tail trails to the left (negative direction)
Kurtosis (Ku): measure of how peaked a distribution is
Platykurtic: relatively flat
Leptokurtic: relatively peaked
Mesokurtic: neither flat nor peaked
35
Homework - Chapter 2
Problems 1, 2, 4-6, 7, 9, 11, 13, 17, 20, 21, 23, 26
For problems 20, 21, and 23, use either the method of finding PR from X and X from PR that I taught today or the method of interpolation in the book. Your choice!