Univariate Visualization CMSC 120: Visualizing Information 2/21/08
Types of Data Qualitative: pertaining to fundamental or distinctive characteristics Nominal: unordered (e.g., names, types) Ordinal: ordered (e.g., cold, warm, hot) Quantitative: pertaining to an amount of anything Discrete: isolated intervals Continuous: unbroken, immediate connection
Univariate Data A single attribute
Weather Conditions: 2/17/08
Univariate Data A single attribute Characterize Observations Temperature: quantitative Condition: qualitative Number Type Similarity
The Raw Data: A Dot Plot n ≤ 20 Distance between individual points Emphasize clusters, gaps, outliers Reveal frequency of each observation
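A dot plot for a small sample can be sketched in plain text. The snippet below is only an illustration: the function name `dot_plot` is made up here, the values are the eleven rounded temperatures from stems 2 and 3 of the deck's stem and leaf plot, and it assumes whole-number observations. Every value between the minimum and maximum gets a row, so gaps and clusters stay visible.

```python
from collections import Counter

def dot_plot(values):
    """Return one text row per integer from min to max; each '*' is one observation."""
    counts = Counter(values)
    rows = []
    for v in range(min(values), max(values) + 1):
        # Counter returns 0 for values that never occur, which leaves the row empty (a gap).
        rows.append(f"{v:>3} | {'*' * counts[v]}".rstrip())
    return rows

# Rounded temperatures from stems 2 and 3 of the stem and leaf plot (n <= 20).
temps = [25, 25, 26, 26, 27, 28, 29, 31, 35, 36, 37]
rows = dot_plot(temps)
print("\n".join(rows))
```

Empty rows (30, 32, 33, 34) show up as gaps, and stacked marks show the frequency of each observation.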
Frequency Table Groups observations by class Quantitative: an interval or part of the range of the sample Qualitative: a potential value Frequency: number of observations that fall into a class Relative Frequency: frequency / sample size
Frequency Table
Condition        Frequency   Relative Frequency
Clear            5           17%
Mostly Cloudy    1           3%
Partly Cloudy    3           10%
Overcast         16          55%
Light Rain       4           14%
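The condition table can be reproduced with a few lines of code. This is a minimal sketch: the helper name `frequency_table` is hypothetical, and the list of observations is rebuilt from the counts in the table rather than from the raw weather log, which is not reproduced here.

```python
from collections import Counter

def frequency_table(observations):
    """Map each class to (frequency, relative frequency)."""
    counts = Counter(observations)
    n = len(observations)
    return {cls: (freq, freq / n) for cls, freq in counts.items()}

# Observations rebuilt from the counts in the condition table (sample size 29).
counts_from_table = {"Clear": 5, "Mostly Cloudy": 1, "Partly Cloudy": 3,
                     "Overcast": 16, "Light Rain": 4}
observations = [cls for cls, n in counts_from_table.items() for _ in range(n)]

table = frequency_table(observations)
for cls, (freq, rel) in table.items():
    # Relative frequency = frequency / sample size, shown as a whole percent.
    print(f"{cls:<15}{freq:>3}{rel:>6.0%}")
```

The rounded percentages printed here need not add up to exactly 100, even though the unrounded relative frequencies sum to 1.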
Frequency Table
Temperature   Frequency   Relative Frequency
25-30         7           24%
30-35         1           3%
35-40         3           10%
40-45
45-50         13          45%
50-55                     0%
55-60                     3%
60-65
Stem and Leaf Plots
Separate each number into a stem (class) and a leaf
Group numbers with the same stems
Temperature 25 26 27 31 36 40 42
Stem   Leaf
2      5566789
3      1567
4      02555566677788999
5      8
6
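The two steps (split each number, then group by stem) can be sketched as follows. The function name `stem_and_leaf` is made up for this illustration, and it assumes non-negative whole numbers with the tens digit as the stem and the ones digit as the leaf. The input is the short list of temperatures shown on the slide.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group whole numbers by tens digit (stem); the ones digits are the leaves."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # e.g. 36 -> stem 3, leaf 6
        groups[stem].append(str(leaf))
    return {stem: "".join(leaves) for stem, leaves in sorted(groups.items())}

temps = [25, 26, 27, 31, 36, 40, 42]
plot = stem_and_leaf(temps)
for stem, leaves in plot.items():
    print(stem, "|", leaves)
```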
Pie Charts Useful for qualitative data Must sum to 100%
Histograms Pictorial representation of a Frequency Table Set of boxes whose area represents relative frequency of observations per class Total Area of all boxes = 100% Shape of histogram determined by box Number = number of classes Width = class interval Height = relative frequency / class interval (so that area = relative frequency)
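The relationship between box area and relative frequency can be checked numerically. This is a rough sketch, not the method used to draw the deck's figures: `histogram_boxes` is a hypothetical helper, the classes are assumed to be half-open intervals of width 5 starting at 25, and the data are the 19 temperatures legible on the five number summary slide (a subset of the full sample).

```python
def histogram_boxes(values, start, width, n_classes):
    """Return (low, high, height) per class, with height = relative frequency / width."""
    n = len(values)
    boxes = []
    for i in range(n_classes):
        lo = start + i * width
        hi = lo + width
        freq = sum(1 for v in values if lo <= v < hi)   # class is [lo, hi)
        boxes.append((lo, hi, (freq / n) / width))
    return boxes

temps = [26.1, 27, 28, 28.9, 30.9, 35.6, 37, 39.9, 42.1, 44.6,
         45, 46, 46.4, 46.9, 48, 48.2, 48.9, 57.9, 60.1]
boxes = histogram_boxes(temps, start=25, width=5, n_classes=8)

# Area of each box is width * height; the areas add up to 1 (100%).
total_area = sum((hi - lo) * height for lo, hi, height in boxes)
for lo, hi, height in boxes:
    print(f"{lo}-{hi}: height {height:.3f}")
print("total area:", round(total_area, 6))
```

When all classes have the same width, height is proportional to frequency, which is why many histograms simply plot counts on the vertical axis.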
Histogram
Patterns Outliers: observations well away from main body of data Number of peaks (modes): most popular values Abrupt Changes
Shape Central Values: where data appear to be centered (Mode, Mean) Spread: how spread out the points are Symmetry (Skew)
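These shape descriptors can be computed with Python's standard `statistics` module. The sketch below makes several assumptions: it uses the 19 temperatures legible on the five number summary slide (a subset of the full sample), it reports the mode as a modal class of width 5 because the individual readings are all distinct, and it uses the sample standard deviation and the range as two possible measures of spread. Comparing mean and median is only a rough indicator of skew.

```python
import statistics
from collections import Counter

temps = [26.1, 27, 28, 28.9, 30.9, 35.6, 37, 39.9, 42.1, 44.6,
         45, 46, 46.4, 46.9, 48, 48.2, 48.9, 57.9, 60.1]

# Central values
mean = statistics.mean(temps)
median = statistics.median(temps)

# Modal class: bucket each reading by the lower bound of its 5-degree class.
width = 5
class_counts = Counter(int(t // width) * width for t in temps)
modal_start, modal_freq = class_counts.most_common(1)[0]

# Spread
spread = statistics.stdev(temps)          # sample standard deviation
value_range = max(temps) - min(temps)

print(f"mean {mean:.1f}, median {median}, modal class {modal_start}-{modal_start + width}")
print(f"standard deviation {spread:.1f}, range {value_range:.1f}")
# A mean below the median hints at a longer tail toward low values.
print("mean < median:", mean < median)
```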
How to Lie: Aggregation Process of putting data into groups Allows user to compare among groups Hides differences between groups Too little: noise of individual data overwhelms overall pattern Too much: important patterns are hidden within groups
Interval Size = 7 Degrees
Interval Size = 14 Degrees
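The effect of interval size can be reproduced on a small scale. This is an illustrative sketch only: `class_counts` is a made-up helper, the classes are assumed to start at 25 and to be half-open, and the data are the 19 temperatures legible on the five number summary slide, so the counts will not match the deck's figures exactly.

```python
def class_counts(values, start, width):
    """Count observations per class [lo, lo + width), from start until the maximum is covered."""
    counts = {}
    lo = start
    while lo <= max(values):
        hi = lo + width
        counts[(lo, hi)] = sum(1 for v in values if lo <= v < hi)
        lo = hi
    return counts

temps = [26.1, 27, 28, 28.9, 30.9, 35.6, 37, 39.9, 42.1, 44.6,
         45, 46, 46.4, 46.9, 48, 48.2, 48.9, 57.9, 60.1]

for width in (7, 14):
    print(f"Interval size = {width} degrees")
    for (lo, hi), freq in class_counts(temps, start=25, width=width).items():
        print(f"  {lo:>2}-{hi:<2} {'#' * freq}")
```

With the 7 degree interval the counts show two separate clusters. With the 14 degree interval those clusters merge into neighboring classes and the dip between them is no longer visible.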
Shape of Shell Aperture
The 5 Number Summary Continuous, Quantitative Data Order data from lowest value to highest Minimum: lowest value Lower Quartile: cuts off ¼ of the data Median: middle value Upper Quartile: cuts off ¾ of the data Maximum: highest value
26.1 27 28 28.9 30.9 35.6 37 39.9 42.1 44.6 45 46 46.4 46.9 48 48.2 48.9 57.9 60.1
Minimum = 25
Lower Quartile = 30.9
Median = 45
Upper Quartile = 46.9
Maximum = 60.1
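A five number summary can be computed along these lines. Two assumptions matter here. First, quartile conventions differ between textbooks and software; this sketch takes each quartile as the median of the lower or upper half of the ordered data, leaving out the middle value when the sample size is odd. Second, the input is only the 19 values legible on the slide, not the full sample, so the results are not expected to match the slide's figures. The function name `five_number_summary` is hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)                  # order from lowest to highest
    n = len(data)
    half = n // 2
    lower = data[:half]
    # Skip the middle value when n is odd so both halves have the same size.
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0],
            statistics.median(lower),
            statistics.median(data),
            statistics.median(upper),
            data[-1])

temps = [26.1, 27, 28, 28.9, 30.9, 35.6, 37, 39.9, 42.1, 44.6,
         45, 46, 46.4, 46.9, 48, 48.2, 48.9, 57.9, 60.1]
summary = five_number_summary(temps)
print(dict(zip(["min", "lower quartile", "median", "upper quartile", "max"], summary)))
```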
Box and Whisker Plot (labels, top to bottom): Maximum = 60.1; Outlier; Largest Non-Outlier; Upper Quartile = 46.9; Median = 45; 50% of Data; Lower Quartile = 30.9; Smallest Non-Outlier; Minimum = 25
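The slides do not say which rule separates outliers from non-outliers. One common convention, assumed here, flags any point more than 1.5 times the interquartile range beyond a quartile; the figure may use a different rule. The sketch below applies that convention using the quartiles stated on the slide. The helper name `box_plot_parts`, the multiplier `k`, and the extra reading of 75.0 in the second call are all hypothetical.

```python
def box_plot_parts(values, q1, q3, k=1.5):
    """Return (smallest non-outlier, largest non-outlier, outliers) under a k * IQR rule."""
    iqr = q3 - q1                          # the box holds the middle 50% of the data
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return min(inside), max(inside), outliers

temps = [26.1, 27, 28, 28.9, 30.9, 35.6, 37, 39.9, 42.1, 44.6,
         45, 46, 46.4, 46.9, 48, 48.2, 48.9, 57.9, 60.1]

# Quartiles as stated on the slide; whiskers end at the extreme non-outliers.
print(box_plot_parts(temps, q1=30.9, q3=46.9))

# Hypothetical extra reading, added only to show a point being flagged.
print(box_plot_parts(temps + [75.0], q1=30.9, q3=46.9))
```

With these quartiles the fences fall near 6.9 and 70.9 under the assumed rule.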
Shell Shape