Ogive, Stem and Leaf plot & Crosstabulation
Ogive An ogive is a graph of a cumulative distribution. The data values are shown on the horizontal axis. Shown on the vertical axis are the cumulative frequencies, cumulative relative frequencies, or cumulative percent frequencies.
Ogive The cumulative frequency (one of the above) of each class is plotted as a point at the upper limit of the class. The plotted points are connected by straight lines.
Example of an Ogive [Figure: ogive with cumulative percent frequencies; horizontal axis: Parts Cost ($); vertical axis: Cumulative Percent Frequency; labeled point (89.5, 76)]
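The points that an ogive connects can be computed directly from a frequency table. The sketch below is a minimal illustration, not the course's own procedure: the class limits and frequencies are illustrative values chosen so that the labeled point (89.5, 76) from the example figure appears, and the function name `ogive_points` is made up for this example.

```python
from itertools import accumulate

def ogive_points(upper_limits, frequencies):
    """Return (upper class limit, cumulative percent frequency) pairs."""
    total = sum(frequencies)
    cumulative = accumulate(frequencies)  # running totals: 2, 15, 31, ...
    return [(x, 100 * c / total) for x, c in zip(upper_limits, cumulative)]

# Illustrative data (assumed, not taken from the slides):
# classes 50-59, 60-69, ..., 100-109, plotted at their upper limits.
upper_limits = [59.5, 69.5, 79.5, 89.5, 99.5, 109.5]
frequencies = [2, 13, 16, 7, 7, 5]

points = ogive_points(upper_limits, frequencies)
for x, pct in points:
    print(f"{x:6.1f}  {pct:5.1f}%")
```

Plotting these pairs and joining them with straight lines gives the ogive; the last point always sits at 100%.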
Stem and Leaf Plots 1. Sort data *** 2. Round data (if necessary) 3. Create TWO new columns (stem and leaf) 4. Put “stem” in one column and “leaves” in another. 5. Format the leaves column to be left-aligned.
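The steps above describe a spreadsheet workflow; the same idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are made-up non-negative values, the last digit is used as the leaf, and the helper names `stem_and_leaf` and `render` are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens and above) to its sorted list of leaves (last digit).

    Assumes non-negative values. Python's round() is used for step 2.
    """
    plot = defaultdict(list)
    for v in sorted(round(x) for x in values):  # round, then sort
        stem, leaf = divmod(v, 10)              # split into stem and leaf
        plot[stem].append(leaf)
    return dict(plot)

def render(plot):
    """Format the plot with one left-aligned row of leaves per stem."""
    lines = []
    for stem in range(min(plot), max(plot) + 1):  # keep empty stems visible
        leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return "\n".join(lines)

# Hypothetical data for illustration only.
scores = [91, 78, 93, 57, 75, 52, 99, 80, 97, 62]
result = stem_and_leaf(scores)
print(render(result))
```

Stems with no leaves are still printed so that gaps in the distribution stay visible.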
What we have done Summary of variables Qualitative: Numeric: frequency, relative frequency, percentage frequency, cumulative frequency, cumulative relative frequency, cumulative percentage frequency Graphical: bar (column) chart, pie chart
What we have done II Quantitative: Numeric: frequency, relative frequency, percentage frequency, cumulative frequency, cumulative relative frequency, cumulative percentage frequency Graphical: histogram, stem and leaf plot, ogive, boxplot
Another thing of interest to statisticians: the relationship between variables. Variables: quantitative, qualitative.
Relationship between variables Qualitative vs. qualitative: Crosstabulation Qualitative vs. quantitative: ANOVA etc. Quantitative vs. quantitative: Regression etc.
Example of Crosstab [Table: sum of count by factor a (rows) and factor b (columns 1 to 5), with a Grand Total row and column; cell values not recoverable]
What does a crosstab tell us? Crosstab: a tabular summary of data for two variables. Marginal distributions/probabilities: the totals/probabilities in the margins of the crosstabulation.
An example that makes more sense
Sum of Count            Win
Ginobili Played      N     Y   Total
N                   16    22      38
Y                   12    32      44
Total               28    54      82
Marginal Distributions Ginobili’s game play distribution: Played: 44; Missed: 38. Spurs’ season breakdown: Win: 54; Lose: 28.
Marginal Probabilities Ginobili’s chance of playing: 44/82. Spurs’ winning percentage: 54/82. In general: row (column) total / grand total.
Some other Probabilities Conditional probability: cell count / row (column) total. E.g., the Spurs’ winning percentage when Ginobili played: 32/44. Joint probability: cell count / grand total. E.g., the percentage of games that the Spurs won and Ginobili played: 32/82.
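The three kinds of probability can be checked against the Spurs example with a short sketch. Only 32, 44, 54, 28 and 82 are stated in the slides; the other three cell counts below (12, 22, 16) are derived from those margins, and the dictionary layout and helper names are assumptions made for this illustration.

```python
from fractions import Fraction

# Cell counts keyed by (Ginobili played?, game result).
counts = {
    ("played", "win"): 32,
    ("played", "loss"): 12,   # 44 - 32
    ("missed", "win"): 22,    # 54 - 32
    ("missed", "loss"): 16,   # 28 - 12
}
grand_total = sum(counts.values())

def row_total(row):
    return sum(n for (r, _), n in counts.items() if r == row)

def col_total(col):
    return sum(n for (_, c), n in counts.items() if c == col)

# Marginal: row (column) total / grand total
p_played = Fraction(row_total("played"), grand_total)
p_win = Fraction(col_total("win"), grand_total)

# Joint: cell count / grand total
p_win_and_played = Fraction(counts[("played", "win")], grand_total)

# Conditional: cell count / row total
p_win_given_played = Fraction(counts[("played", "win")], row_total("played"))

print(float(p_played), float(p_win))
print(float(p_win_and_played), float(p_win_given_played))
```

Using `Fraction` keeps the ratios exact, which makes it easy to confirm that the conditional probability equals the joint probability divided by the marginal probability.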
Crosstab
Example cont.
Components of the table
         Column 1          Column 2   Column 3   Total
Row 1    Cell count                              Row 1 total
Row 2    Cell count                              Row 2 total
Row 3    Cell count                              Row 3 total
Total    Column 1 total                          Grand Total
Probabilities From Crosstab Marginal, joint and conditional. Marginal probability: row (column) total / grand total. Joint probability: cell count / grand total. Conditional probability: cell count / row (column) total.
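The three definitions can also be written compactly. The notation here is introduced only for this sketch and is not from the slides: n_{ij} is the cell count in row i and column j, n_{i.} and n_{.j} are the row and column totals, and n is the grand total.

```latex
\begin{align*}
\text{Marginal:}    \quad & P(R_i) = \frac{n_{i\cdot}}{n}, \qquad P(C_j) = \frac{n_{\cdot j}}{n} \\
\text{Joint:}       \quad & P(R_i \cap C_j) = \frac{n_{ij}}{n} \\
\text{Conditional:} \quad & P(C_j \mid R_i) = \frac{n_{ij}}{n_{i\cdot}}, \qquad
                            P(R_i \mid C_j) = \frac{n_{ij}}{n_{\cdot j}}
\end{align*}
```

Dividing the joint probability by the matching marginal probability gives the conditional probability, since (n_{ij}/n) / (n_{i.}/n) = n_{ij}/n_{i.}.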
What is the percentage of all patients who received a CHEAP positive test result? Is this a joint, marginal, or conditional percentage? Marginal: 37.0%
Out of all the patients given the CHEAP test, what is the percentage of false negatives? Is this a joint, marginal, or conditional percentage? Joint, 2% (this is where CHEAP is negative, but Actual SFI is positive)
What is the percentage of subjects diagnosed as positive by BOTH tests? Is this a joint, marginal, or conditional percentage? Joint: 30%.
What is the percentage of correct diagnoses? (30 + 61)/100 = 91%. That counts correct diagnoses of both positive AND negative cases.
If someone gets the test result and it is “positive”, what is the chance that this person really has the disease? 30/37 = 81% (conditional). That means there is still a 19% chance that this person does not have the disease.
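The answers above determine the four cells of the 2x2 table, assuming 100 patients in total as the 91% calculation implies. The sketch below rebuilds that table and recomputes each percentage; the cell values are inferred from the stated answers rather than copied from an original table, and the key names are made up for this illustration.

```python
# Rows: CHEAP test result; columns: actual SFI status.
# Inferred cells: 30 both positive, 37 CHEAP positive in total,
# 2 false negatives, 61 both negative.
table = {
    "cheap_pos": {"actual_pos": 30, "actual_neg": 7},
    "cheap_neg": {"actual_pos": 2, "actual_neg": 61},
}
n = sum(sum(row.values()) for row in table.values())
cheap_pos_total = sum(table["cheap_pos"].values())

def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)

marginal_cheap_pos = pct(cheap_pos_total, n)                    # marginal
joint_false_neg = pct(table["cheap_neg"]["actual_pos"], n)      # joint
joint_both_pos = pct(table["cheap_pos"]["actual_pos"], n)       # joint
correct = pct(table["cheap_pos"]["actual_pos"]
              + table["cheap_neg"]["actual_neg"], n)            # sum of two joints
p_disease_given_pos = pct(table["cheap_pos"]["actual_pos"],
                          cheap_pos_total)                      # conditional

print(marginal_cheap_pos, joint_false_neg, joint_both_pos)
print(correct, p_disease_given_pos)
```

The only calculation whose denominator is not the grand total is the conditional one, which divides by the number of CHEAP-positive patients.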
Check this one out! Homicide convictions in the state of Florida between 1976 and [end year missing]. Did the convicted person get a death sentence? Is there a racial bias? [Table: rows White, Black, Total; columns YES, NO, Total, % YES; cell values not recoverable]
The other side of the story ii. Table for those cases involving white victims [Table: rows White, Black, Total; columns YES, NO, Total, % YES; cell values not recoverable]
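The other side of the story i. Table for those cases involving black victims [Table: rows White, Black, Total; columns YES, NO, Total, % YES; only the White row survives in part (0, 29, 0%); other cell values not recoverable]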
This is what we call Simpson’s Paradox in statistics Simpson’s paradox refers to the reversal in the direction of an X versus Y relationship when controlling for a third variable Z.
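A small numeric sketch makes the reversal concrete. The counts below are entirely hypothetical and are not the Florida data: X is a group (A or B), Y is a yes/no outcome, and Z is a stratum (z1 or z2). Group A has the higher "yes" rate within each stratum, yet group B has the higher rate once the strata are pooled.

```python
from fractions import Fraction

# Hypothetical (yes count, total count) for each group within each stratum.
data = {
    "A": {"z1": (8, 10), "z2": (30, 100)},
    "B": {"z1": (70, 100), "z2": (2, 10)},
}

def stratum_rate(group, stratum):
    """Yes rate for one group within one stratum (controlling for Z)."""
    yes, total = data[group][stratum]
    return Fraction(yes, total)

def pooled_rate(group):
    """Yes rate for one group with the strata combined (ignoring Z)."""
    yes = sum(y for y, _ in data[group].values())
    total = sum(t for _, t in data[group].values())
    return Fraction(yes, total)

a_higher_in_every_stratum = all(
    stratum_rate("A", z) > stratum_rate("B", z) for z in ("z1", "z2")
)
b_higher_overall = pooled_rate("B") > pooled_rate("A")

for z in ("z1", "z2"):
    print(z, float(stratum_rate("A", z)), float(stratum_rate("B", z)))
print("pooled", float(pooled_rate("A")), float(pooled_rate("B")))
```

The reversal arises because the two groups are spread very unevenly across the strata: most of A's cases fall in the low-rate stratum, while most of B's fall in the high-rate one.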
Another Example Numbers of flights on time and delayed for two airlines at five airports in June [Table: columns On Time, Delayed, Delay % for Alaska Airlines and for America West Airlines; overall values not recoverable]
Another Example (contd) [Table: rows L.A., Phoenix, San Diego, San Francisco, Seattle; columns On Time, Delayed, Delay % for Alaska Airlines and for America West Airlines; cell values not recoverable]