Frequency Distributions & Graphing
Nomenclature. Frequency: the number of cases, subjects, or occurrences, represented by f. For example, f = 12 for a score of 25 means 12 occurrences of 25 in the sample.
Nomenclature. Percentage: the number of cases, subjects, or occurrences expressed per 100, represented by P or %. So, if f = 12 for a score of 25 when n = 25, then % = 12/25 * 100 = 48%.
Caveat (Warning). Report the f when presenting percentages. For example, "80% of the elementary students came from a family with an income < $25,000" has a different interpretation if n = 5 than if n = 100. Report in the literature as f = 4 (80%) OR 80% (f = 4) OR 80% (n = 4).
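As a minimal sketch of the two slides above, the percentage and the "f = 4 (80%)" reporting style could be computed as follows. The function names `percentage` and `report` are made up for this illustration and do not come from SPSS.

```python
def percentage(f, n):
    """Return the frequency f expressed per 100 cases."""
    if n <= 0:
        raise ValueError("n must be positive")
    return f / n * 100


def report(f, n):
    """Format a frequency together with its percentage, e.g. 'f = 4 (80%)'."""
    return f"f = {f} ({percentage(f, n):.0f}%)"


print(report(12, 25))   # the score-of-25 example: f = 12, n = 25
print(report(4, 5))     # 80% of a sample of 5
print(report(80, 100))  # 80% of a sample of 100
```

The last two lines print the same percentage, but the f makes it obvious that one rests on 4 cases and the other on 80.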
Frequency Distribution of Test Scores. 40 items on the exam. Most students scored > 34, so the distribution is skewed (more scores at one end of the scale). Cumulative Percentage: the percentage of subjects at or below a given score.
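A frequency distribution with cumulative percentages can be sketched in a few lines. The scores below are hypothetical (invented for this illustration, not the class data), and `frequency_table` is an assumed helper name.

```python
from collections import Counter

# Hypothetical scores on a 40-item exam (made up for illustration only)
scores = [38, 36, 35, 37, 39, 35, 36, 38, 40, 22,
          28, 36, 37, 35, 31, 38, 36, 39, 37, 36]


def frequency_table(values):
    """Return rows of (score, f, percent, cumulative percent), lowest score first."""
    counts = Counter(values)
    n = len(values)
    rows = []
    running = 0  # number of cases at or below the current score
    for score in sorted(counts):
        f = counts[score]
        running += f
        rows.append((score, f, f / n * 100, running / n * 100))
    return rows


print("score   f      %   cum %")
for score, f, pct, cum in frequency_table(scores):
    print(f"{score:>5} {f:>3} {pct:>6.1f} {cum:>7.1f}")
```

The cumulative percentage in the last row is always 100, since every case is at or below the highest score.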
Eyeball check of data: intro to graphing with SPSS. Stem and Leaf Plot: quick viewing of the data distribution. Boxplot: visual representation of many of the descriptive statistics discussed last week. Bar Chart: frequency of all cases. Histogram: malleable bar chart. Scatterplot: displays all cases based on two values of interest (X & Y). Note: compare to our previous discussion of distributions (normal, positively skewed, etc.).
Stem and Leaf (SPSS: Explore command). Fast look at the shape of the distribution; shows f numerically and graphically. The stem is the value, and the number of leaves gives f. [SPSS output: columns Frequency and Stem & Leaf; 2.00 Extremes (=<25.0); Stem width: 1; Each leaf: 1 case]
Stem and Leaf Plots. Another way of doing a stemplot: Babe Ruth's home runs in each of 14 seasons with the NY Yankees: 54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34
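The stemplot for these home run counts can be reproduced outside SPSS. This is a minimal sketch that assumes two-digit values, with the tens digit as the stem and the units digit as the leaf; `stemplot` is a name chosen for this example.

```python
from collections import defaultdict

# Babe Ruth's home runs, as listed on the slide
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34]


def stemplot(values):
    """Return stem-and-leaf lines: tens digit as stem, units digits as leaves."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    # Walk every stem in range so that empty stems still show up as gaps
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem} | {row}")
    return lines


print("\n".join(stemplot(ruth)))
```

The stem 4 row carries seven leaves (1166679), so the plot shows at a glance that seasons in the 40s were the most common.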
Stem and Leaf Plots. Back-to-back stemplots allow you to visualize two data sets at the same time: Babe Ruth vs. Roger Maris. [Figure: back-to-back stemplot with columns labeled Maris and Ruth]
Boxplots. [Figure: boxplot labeled, from top to bottom, Maximum, Q3, Median, Q1, Minimum] Note: we can also do side-by-side boxplots for a visual comparison of data sets.
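The five values a boxplot marks can be computed directly. The sketch below uses the Babe Ruth home run data from the stemplot slide and Python's `statistics.quantiles` with the "inclusive" method, which is only one of several quartile conventions; SPSS may report slightly different quartiles. `five_number_summary` is a name invented for this example.

```python
import statistics

# Babe Ruth's home runs, as listed on the stemplot slide
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34]


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # quantiles(n=4) returns the three cut points Q1, median, Q3
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


labels = ("Minimum", "Q1", "Median", "Q3", "Maximum")
for label, value in zip(labels, five_number_summary(ruth)):
    print(f"{label:<8} {value}")
```

Running the same function on a second data set gives the numbers behind a side-by-side boxplot comparison.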
Format of Bar Chart. X axis (abscissa): individual scores/categories. Y axis (ordinate): f.
Test score data as Bar Chart. Note that only scores with non-zero frequencies are included.
Bar chart in PASW. Using the height file on the web.
Bar chart in SPSS. Gives…
Bar chart in PASW. Note that you can use the same command for pie charts and histograms (next).
Format of Histogram. Can be manipulated. X axis (abscissa): groups of scores/categories. Y axis (ordinate): f. Now the X axis is groups of scores rather than individual scores, which gives a better idea of the distribution underlying the data.
Test score data as Histogram
Test score data as revised Histogram. With an altered number of groups, you might get a better idea of the distribution.
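To see how regrouping changes the picture, the sketch below bins the same scores with two different interval widths and prints a text histogram for each. The scores are hypothetical (invented for this illustration, not the class data), and `grouped_frequencies` is an assumed helper name for integer scores.

```python
from collections import Counter

# Hypothetical scores on a 40-item exam (made up for illustration only)
scores = [38, 36, 35, 37, 39, 35, 36, 38, 40, 22,
          28, 36, 37, 35, 31, 38, 36, 39, 37, 36]


def grouped_frequencies(values, width):
    """Return (low, high, f) for each interval of the given width, empty ones included."""
    # Map every integer score to the lower bound of its interval
    counts = Counter((v // width) * width for v in values)
    lows = range(min(counts), max(counts) + width, width)
    return [(low, low + width - 1, counts.get(low, 0)) for low in lows]


for width in (2, 5):
    print(f"Interval width {width}")
    for low, high, f in grouped_frequencies(scores, width):
        print(f"{low:>2}-{high:<2} {'*' * f}")
```

With a width of 5, nearly all cases land in the 35-39 interval; with a width of 2, the same data spread over more bars, which is the sense in which a histogram is malleable.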
Scatterplot. Quick way to visualize the data and see trends, patterns, etc. This plot shows the relationship between undergrad GPA and GRE scores for applicants to our program.
Scatterplot. Here's the relationship between undergrad GPA (admitgpa) and GPA in our program.
Scatterplot. Finally, here's the relationship between GRE scores and GPA in our program.
Scatterplot in PASW. Use graphs_scatter/Dot
Scatterplot in PASW. Choose "simple scatter".
Scatterplot in PASW. Choose the variables (here I've used a 3rd variable too; you'll see why in a moment).
Scatterplot in PASW. As you can see, there are rather different values for males and females.
Bottom line. The first step should always be to plot the data and eyeball it. Following is an example of what can happen when you do.
One use of Frequency Distribution & Skewness. Expected distribution of agent-paid claims (State Farm). [Figure: X axis is $$ amount, from low to high]
One use of Frequency Distribution & Skewness. Observed distribution of an agent's paid claims (hmmm…). [Figure: X axis is $$ amount, from low to high; Y axis is f]