Chapter 3: Frequency Distributions

1 Chapter 3: Frequency Distributions
4/17/2017 Chapter 3: Frequency Distributions April 17 Basic Biostatistics

2 In Chapter 3:
3.1 Stemplots
3.2 Frequency Tables
3.3 Additional Frequency Charts

3 Stemplots
“You can observe a lot by looking” – Yogi Berra
Start by exploring the data with Exploratory Data Analysis (EDA). A popular univariate EDA technique is the stem-and-leaf plot (stemplot). The stem of the stemplot is a number line (axis). Each leaf represents a data point.

4 Stemplot: Illustration
10 ages (data sequenced as an ordered array)
Draw the stem to cover the range 5 to 52:
0|
1|
2|
3|
4|
5|
×10 (axis multiplier)
Divide each data point into a stem-value (in this example, the tens place) and a leaf-value (in this example, the ones place). Place leaves next to their stem-value.
Example of a leaf: 21 is plotted as leaf 1 on stem 2 (2|1)

5 Stemplot illustration continued …
Plot all data points in rank order:
0|5
1|1
2|1478
3|0
4|2
5|02
×10
Here is the plot horizontally: [figure: rotated stemplot]
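The plotting steps on the last two slides can be sketched in Python as follows. This is only an illustration: the function name `stemplot` is made up here, and the ten ages are an assumption, since the transcript does not list them. They were chosen to agree with the summaries stated on later slides (sum 290, median between 27 and 28, range 5 to 52).

```python
def stemplot(values):
    """Return stemplot rows for whole numbers: stem = tens place, leaf = ones place."""
    rows = {}
    for v in sorted(values):                 # rank order, so leaves come out sorted
        stem, leaf = divmod(v, 10)           # e.g. 21 -> stem 2, leaf 1
        rows.setdefault(stem, []).append(str(leaf))
    # Keep empty stems so that gaps in the data remain visible on the axis.
    return [f"{s}|{''.join(rows.get(s, []))}" for s in range(min(rows), max(rows) + 1)]

# Assumed values (not listed in the transcript), consistent with the slide summaries.
ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]

for row in stemplot(ages):
    print(row)
print("x10")  # axis multiplier
```

Sorting before splitting is what puts the leaves on each stem in rank order.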

6 Interpreting Distributions
Shape
Central location
Spread

7 Shape
“Shape” refers to the distributional pattern. Here’s the silhouette of our data: [figure: one X per data point, stacked along the stem]
Mound-shaped, symmetrical, no outliers. Do not “over-interpret” plots when n is small.

8 Shape (cont.)
Consider this large data set of IQ scores. A density curve is superimposed on the graph.

9 Examples of Symmetrical Shapes

10 Examples of Asymmetrical shapes

11 Modality (no. of peaks)

12 Kurtosis (steepness)
Mesokurtic (medium)
Platykurtic (flat): skinny tails
Leptokurtic (steep): fat tails
Kurtosis is not easily judged by eye.

13 Gravitational Center (Mean)
Gravitational center ≡ arithmetic mean
“Eye-ball method” → visualize where the plot would balance on a see-saw: around 30 (takes practice)
Arithmetic method = sum the values and divide by n: sum = 290, n = 10, mean = 290 / 10 = 29
[figure: stemplot with ^ marking the gravitational center]

14 Central location: Median
In the ordered array, the median has depth (n + 1) ÷ 2. With n = 10, the median’s depth = (10 + 1) ÷ 2 = 5.5 → it falls between 27 and 28. When n is even, average the two adjacent values → Median = 27.5

15 Spread: Range
For now, report the range (minimum and maximum values). The current data range is “5 to 52”. The range is the easiest but not the best way to describe spread (better methods are described later).
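The three summaries on the preceding slides (mean, median by depth, and range) can be checked with a short Python sketch. The helper name `median_by_depth` is illustrative. The ten ages are an assumption, since the transcript does not list them, and were picked to match the stated sum of 290, the median neighbours 27 and 28, and the range 5 to 52.

```python
# Assumed values (not listed in the transcript), consistent with the slide summaries.
ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]

def median_by_depth(ordered):
    """Locate the median at depth (n + 1) / 2 in an ordered array (1-based depth)."""
    n = len(ordered)
    depth = (n + 1) / 2
    lower = ordered[int(depth) - 1]        # value at the whole-number part of the depth
    if depth.is_integer():                 # odd n: the depth lands on a single value
        return depth, lower
    upper = ordered[int(depth)]            # even n: average the two adjacent values
    return depth, (lower + upper) / 2

ordered = sorted(ages)
mean = sum(ordered) / len(ordered)         # arithmetic method: sum divided by n
depth, median = median_by_depth(ordered)
data_range = (ordered[0], ordered[-1])     # report the minimum and the maximum
print(mean, depth, median, data_range)     # 29.0 5.5 27.5 (5, 52)
```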

16 Stemplot – Second Example
Data: 1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94, 4.42
Stem = ones place; leaves = tenths place
Truncate the extra digit (e.g., 1.47 → 1.4)
|1|4
|2|03
|3|4779
|4|4
(×1)
Center: median between 3.4 & 3.7 (underlined)
Spread: 1.4 to 4.4
Shape: mound, no outliers
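The truncation step in this second example can be sketched in Python. The function name `truncated_stemplot` is illustrative. Using `decimal` is a design choice made here, not something the slides prescribe: it avoids binary floating-point surprises when the extra digit is dropped.

```python
from decimal import Decimal, ROUND_DOWN

def truncated_stemplot(values):
    """Stem = ones place, leaf = tenths place; extra digits are truncated, not rounded."""
    rows = {}
    for v in sorted(values):
        t = Decimal(str(v)).quantize(Decimal("0.1"), rounding=ROUND_DOWN)  # 1.47 -> 1.4
        stem = int(t)                   # ones place (positive values assumed)
        leaf = int(t * 10) % 10         # tenths place
        rows.setdefault(stem, []).append(str(leaf))
    return [f"{s}|{''.join(rows.get(s, []))}" for s in range(min(rows), max(rows) + 1)]

data = [1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94, 4.42]
for row in truncated_stemplot(data):
    print(row)
print("(x1)")  # axis multiplier
```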

17 Third Illustrative Example (n = 25)
Data: 14, 17, 18, 19, 22, 22, 23, 24, 26, 26, 27, 28, 29, 30, 30, 30, 31, 32, 33, 34, 34, 35, 36, 37, 38
Regular stemplot:
|1|4789
|2|223466789
|3|000123445678
×10
Too squished to see shape

18 Third Illustration; Split Stem
Split stem-values into two ranges, e.g., the first “1” holds leaves 0 to 4 and the second “1” holds leaves 5 to 9.
Split-stem:
|1|4
|1|789
|2|2234
|2|66789
|3|00012344
|3|5678
×10
Negative skew is now evident.
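A split stem can be sketched in Python by keying each row on the stem together with the half of the leaf range. The function name `split_stemplot` is illustrative. The data list is an assumption: it is the third example taken as the 25 values implied by n = 25 and by the split-stem rows.

```python
def split_stemplot(values):
    """Two rows per stem: leaves 0-4 on the first row, leaves 5-9 on the second."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows.setdefault((stem, leaf // 5), []).append(str(leaf))  # half 0 or half 1
    first, last = min(rows), max(rows)
    out = []
    for stem in range(first[0], last[0] + 1):
        for half in (0, 1):
            if first <= (stem, half) <= last:   # skip unused rows at either end
                out.append(f"{stem}|{''.join(rows.get((stem, half), []))}")
    return out

# Third example, taken here as the 25 values consistent with the split-stem rows.
data = [14, 17, 18, 19, 22, 22, 23, 24, 26, 26, 27, 28, 29,
        30, 30, 30, 31, 32, 33, 34, 34, 35, 36, 37, 38]
for row in split_stemplot(data):
    print(row)
print("x10")
```

With six rows instead of three, the longer rows toward the high end make the negative skew visible.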

19 How many stem-values?
Start with between 4 and 12 stem-values. Then use trial and error with different stem multipliers and splits → use the plot that shows the shape most clearly.

20 Fourth Example: n = 53 body weights
Data range from 100 to 260 lbs:

21 Data range from 100 to 260 lbs:
×100 axis multiplier → only two stem-values (1×100 and 2×100) → too few
×100 axis multiplier w/ split stem → 4 stem-values → might be OK(?)
×10 axis multiplier → 17 stem-values (10 through 26) → next slide

22 Fourth Stemplot Example (n = 53)
10|0166
11|009
12|
13|00359
14|08
15|00257
16|555
17|000255
18|
19|245
20|3
21|025
22|0
23|
24|
25|
26|0
(×10)
Shape: positive skew, high outlier (260)
Central location: L(M) = (53 + 1) / 2 = 27; median = 165 (underlined)
Spread: from 100 to 260
The student should construct a stem & leaf plot here using the first two digits as the stem and the last digit as the leaf. The shape of the stem & leaf plot should look similar to the bar graph shown on an upcoming slide.

23 Quintuple-Split Stem Values
1*|
1t|
1f|
1s|
1.|
2*|0111
2t|2
2f|
2s|6
(×100)
Codes for stem values:
* for leaves zero and one
t for leaves two and three
f for leaves four and five
s for leaves six and seven
. for leaves eight and nine
For example, 120 is: 1t|2 (×100)
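The stem codes can be expressed as a small lookup. The sketch below is illustrative only; `quintuple_label` and `CODES` are names invented here, and the function assumes whole-number weights plotted with a ×100 axis multiplier, with the ones digit truncated.

```python
CODES = "*tfs."   # position = leaf // 2: '*' 0-1, 't' 2-3, 'f' 4-5, 's' 6-7, '.' 8-9

def quintuple_label(value):
    """Return 'stem+code|leaf' for a whole number plotted with a x100 axis multiplier."""
    stem, rest = divmod(value, 100)    # hundreds place is the stem
    leaf = rest // 10                  # tens place is the leaf; the ones digit is dropped
    return f"{stem}{CODES[leaf // 2]}|{leaf}"

print(quintuple_label(120))   # 1t|2
print(quintuple_label(260))   # 2s|6
```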

24 SPSS Stemplot, n = 654
[figure: SPSS stem-and-leaf output with columns Frequency and Stem & Leaf; Extremes (>=18); Stem width: 1]
The first column gives frequency counts. 3 . 0 means 3.0 years. Because n is large, each leaf represents 2 observations.

25 Frequency Table
AGE   |  Freq  Rel.Freq  Cum.Freq.
 3    |     2    0.3%     0.3%
 4    |     9    1.4%     1.7%
 5    |    28    4.3%     6.0%
 6    |    37    5.7%    11.6%
 7    |    54    8.3%    19.9%
 8    |    85   13.0%    32.9%
 9    |    94   14.4%    47.2%
10    |    81   12.4%    59.6%
11    |    90   13.8%    73.4%
12    |    57    8.7%    82.1%
13    |    43    6.6%    88.7%
14    |    25    3.8%    92.5%
15    |    19    2.9%    95.4%
16    |    13    2.0%    97.4%
17    |     8    1.2%    98.6%
18    |     6    0.9%    99.5%
19    |     3    0.5%   100.0%
Total |   654  100.0%
Frequency ≡ count
Relative frequency ≡ proportion
Cumulative [relative] frequency ≡ proportion less than or equal to current value
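The three columns can be recomputed from the counts alone. The Python sketch below uses the frequencies from the table; the variable names are illustrative.

```python
from itertools import accumulate

# Frequency (count) for each age, copied from the table.
counts = {3: 2, 4: 9, 5: 28, 6: 37, 7: 54, 8: 85, 9: 94, 10: 81, 11: 90,
          12: 57, 13: 43, 14: 25, 15: 19, 16: 13, 17: 8, 18: 6, 19: 3}

n = sum(counts.values())                       # total number of observations
running = list(accumulate(counts.values()))    # count at or below each age

# One row per age: (age, frequency, relative frequency %, cumulative frequency %)
table = [(age, f, round(100 * f / n, 1), round(100 * c / n, 1))
         for (age, f), c in zip(counts.items(), running)]

for age, f, rel, cum in table:
    print(f"{age:>5} | {f:>5} {rel:>7.1f}% {cum:>7.1f}%")
print(f"Total | {n:>5}")
```

Computing the cumulative percentage from the running count, rather than by adding rounded relative frequencies, keeps rounding error from building up down the column.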

26 Class Intervals
When data are sparse, group them into class intervals. Class intervals can be uniform or non-uniform. Use an end-point convention so that each data point falls into a unique interval: include the lower boundary, exclude the upper boundary (next slide).

27 Class Intervals Freq Table
Class   |  Freq  Relative Freq. (%)  Cumulative Freq. (%)
 0 – 9  |     1        10%                 10%
10 – 19 |     1        10                  20
20 – 29 |     4        40                  60
30 – 39 |     1        10                  70
40 – 49 |     1        10                  80
50 – 59 |     2        20                 100%
Total   |    10       100%                 --
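The end-point convention (include the lower boundary, exclude the upper boundary) can be sketched in Python with uniform classes of width 10. The function name `class_frequencies` is illustrative. The ten ages are an assumption, since the transcript does not list them; they were chosen to be consistent with the class frequencies and the other slide summaries.

```python
def class_frequencies(values, width=10):
    """Rows of (lower, upper, freq, rel %, cum %) for uniform classes [lower, upper)."""
    lo = (min(values) // width) * width          # lower boundary of the first class
    hi = (max(values) // width + 1) * width      # upper boundary of the last class
    n = len(values)
    rows, running = [], 0
    for lower in range(lo, hi, width):
        upper = lower + width
        freq = sum(lower <= v < upper for v in values)   # include lower, exclude upper
        running += freq
        rows.append((lower, upper, freq, 100 * freq / n, 100 * running / n))
    return rows

# Assumed values (not listed in the transcript), consistent with the slide summaries.
ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
for lower, upper, freq, rel, cum in class_frequencies(ages):
    print(f"{lower:>2} to <{upper:<3} | {freq:>2} {rel:>5.0f}% {cum:>5.0f}%")
```

For whole-number ages, the class "30 to <40" holds the same values as the label "30 – 39"; the half-open form matters once values such as 39.5 are possible.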

28 Histogram
For a quantitative measurement only. Bars touch.

29 Bar Chart
For categorical and ordinal measurements, and for continuous data in non-uniform class intervals → bars do not touch.

