Published by Albert Marsh. Modified over 8 years ago.
1
Data Presentation
Numerical Summary Measures
Chung-Yi Li, PhD
Dept. of Public Health, College of Medicine, NCKU
2
Outline for Data Presentation
Types of Numerical Data
Tables
– Frequency Distributions
– Relative Frequency
Graphs
– Bar/Pie Charts
– Histograms
– Frequency Polygons
– Stem-and-Leaf Plots
– One-Way Scatter Plots
– Box Plots
– Two-Way Scatter Plots
– Line Graphs
3
Outline for Numerical Summary Measures
Measures of Central Tendency
– Mean / Median / Mode
Measures of Dispersion
– Range
– Interquartile Range
– Variance and Standard Deviation
– Coefficient of Variation
Grouped Data
– Grouped Mean / Grouped Variance
4
Types of Numerical Data
Nominal
– Dichotomous/binary: gender (1 = female, 0 = male)
– Categorical: blood type (1 = O, 2 = A, 3 = B, 4 = AB) or race/ethnicity
Ordinal
– Level of severity: 1 = fatal, 2 = severe, 3 = moderate, 4 = minor
– Likert scale: level of agreement, from 1 = least agree to 5 = most agree
Ranked
– Leading causes of death/cancer in Taiwan
5
Interval scale
– Temperature (°C)
Ratio scale
– Body height, weight, white blood cell concentration
6
7
Tables for Continuous Data
8
Guidelines
– Closed-ended intervals are better than open-ended ones when constructing a frequency table, because they provide more information.
– Intervals should be exhaustive (together covering every possible value) and mutually exclusive (no value falls into two intervals).
– Frequency tables for continuous data can be somewhat misleading.
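A minimal Python sketch of these guidelines, assuming integer-valued data and equal-width intervals. `frequency_table` is a hypothetical helper and the ages are made-up illustrative values, not data from the slides. Every interval has a stated lower and upper limit (e.g., 20-29 rather than an open-ended "60+"), each value falls into exactly one interval, and a value outside the covered range raises an error instead of being silently dropped.

```python
from collections import Counter

def frequency_table(values, start, width, n_bins):
    """Frequency and relative frequency for n_bins equal-width intervals.

    Internally each interval is [start + k*width, start + (k+1)*width), so the
    intervals are mutually exclusive; values outside the covered range raise an
    error, which enforces the "exhaustive" requirement.
    """
    end = start + width * n_bins
    counts = Counter()
    for v in values:
        if not (start <= v < end):
            raise ValueError(f"{v} lies outside [{start}, {end})")
        counts[int((v - start) // width)] += 1
    n = len(values)
    rows = []
    for k in range(n_bins):
        lo = start + k * width
        # Labels like "20-29" assume whole-number values (an assumption)
        rows.append((f"{lo}-{lo + width - 1}", counts[k], counts[k] / n))
    return rows

# Hypothetical ages in years, for illustration only
ages = [23, 27, 31, 34, 35, 38, 41, 42, 47, 52, 56, 63]
for label, freq, rel in frequency_table(ages, start=20, width=10, n_bins=5):
    print(f"{label:>6}  {freq:2d}  {rel:6.1%}")
```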
9
10
Comment
Grouping a continuous variable may not be biologically plausible. For example, in maternal and child health (MCH) studies, maternal age is normally categorized into groups (…, ≥35 years). Yet a woman aged 29 is physiologically more similar to a woman aged 30 than to one aged 25, even though a cut-point at 30 places her with the 25-year-old.
11
No Concern for Tabulating Categorical Data
12
Bar Chart
13
Pie Chart
14
Histogram
15
How About This One?
16
Frequency Polygons
17
18
19
Stem-and-Leaf Plots
20
Comment
– Preserves the individual measurements, so it is not practical for large data sets
– The stem is the first digit(s) of each measurement; the leaf is the last digit
– Most useful for two-digit numbers; more cumbersome for three or more digits
Example: tally of the data by decade
20: X
30: XXX
40: XXXX
50: XX
60: X
The same data as a stem-and-leaf plot (Stem | Leaf)
2* | 1
3* | 244
4* | 2468
5* | 26
6* | 4
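The construction rule above (stem = leading digit(s), leaf = last digit) can be sketched in Python as follows. The function name `stem_and_leaf` is hypothetical, it assumes non-negative integers, and the input values are read back from the plot above (21, 32, 34, 34, 42, 44, 46, 48, 52, 56, 64); the `*` after each stem simply copies the slide's notation.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Stem-and-leaf lines for non-negative integers.

    Stem = all digits except the last (value // 10); leaf = last digit (value % 10).
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(str(v % 10))
    # Include empty stems between min and max so gaps in the data stay visible
    stems = range(min(leaves), max(leaves) + 1)
    return [f"{s}* | {''.join(leaves[s])}" for s in stems]

data = [21, 32, 34, 34, 42, 44, 46, 48, 52, 56, 64]
print("\n".join(stem_and_leaf(data)))
```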
21
One-Way Scatter Plots
22
Two-Way Scatter Plots
23
Box Plots
24
Comment
– A descriptive method for conveying information about measures of location and dispersion
– Also called box-and-whisker plots
Construction of a box plot
– The box spans the IQR (from the first quartile to the third quartile)
– A line inside the box marks the median
– The whiskers extend to the smallest and largest observations
– Other conventions can be used, especially to represent extreme values
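A sketch of the quantities a box plot displays, using the standard library's `statistics.quantiles` with `method="inclusive"` (Python 3.8+). Quartile conventions differ between packages, so other software may draw a slightly different box. The data are the eleven values from the stem-and-leaf plot on slide 20, plus a hypothetical extreme value of 120 in the second call. `min` and `max` follow this slide's whisker convention; the `outliers` entry applies the common 1.5 × IQR rule as one example of the "other conventions". `box_plot_summary` is a hypothetical name.

```python
import statistics

def box_plot_summary(data):
    """Five-number summary for a box plot, plus values flagged by the 1.5 x IQR rule."""
    xs = sorted(data)
    # 'inclusive' is one of several quartile conventions; other methods differ slightly
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "min": xs[0],        # lower whisker end under this slide's convention
        "q1": q1,            # bottom of the box
        "median": median,    # line inside the box
        "q3": q3,            # top of the box
        "max": xs[-1],       # upper whisker end under this slide's convention
        "iqr": iqr,
        # Alternative convention: values beyond the fences are marked separately
        "outliers": [x for x in xs if x < low_fence or x > high_fence],
    }

print(box_plot_summary([21, 32, 34, 34, 42, 44, 46, 48, 52, 56, 64]))
print(box_plot_summary([21, 32, 34, 34, 42, 44, 46, 48, 52, 56, 64, 120]))
```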
25
Good for Making Comparisons
26
Line Graphs
27
28
Summary
In practice, descriptive statistics play a major role
– They are always the first 1–2 tables/figures in a paper
– The statistician needs to know about each variable before deciding how to analyze the data to answer the research questions
In any analysis, 90% of the effort goes into setting up the data
– Descriptive statistics are part of that 90%
29
Measures of Central Tendency
Mean
– Arithmetic mean
– Geometric mean
Median
Mode
30
Arithmetic Mean (population)
Suppose we have N measurements of a particular variable in a population. We denote these N measurements as $X_1, X_2, X_3, \ldots, X_N$, where $X_1$ is the first measurement, $X_2$ is the second, and so on.
Definition
More accurately called the arithmetic mean, the mean is the sum of the observed measurements divided by the number of observations:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$$
31
Arithmetic Mean
Probably the most common measure of central tendency
– A.K.A. the "average"
Definition
– Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
– Best suited to a normal distribution, although we tend to use it regardless of distribution
– $\mu$ denotes the population mean
32
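Comment
Weakness
– Influenced by extreme values
Translations
– Additive: adding a constant $c$ to every observation adds $c$ to the mean, $\overline{x + c} = \bar{x} + c$
– Multiplicative: multiplying every observation by $c$ multiplies the mean by $c$, $\overline{cx} = c\,\bar{x}$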
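A short Python illustration of the weakness and the two translation rules, using made-up values and the standard library's `statistics.mean`:

```python
import math
from statistics import mean

x = [2.0, 3.0, 4.0, 5.0, 6.0]      # hypothetical, symmetric values
print(mean(x))                      # 4.0

# Weakness: replacing the largest value with an extreme one drags the mean upward
x_extreme = x[:-1] + [60.0]
print(mean(x_extreme))              # 14.8

c = 10.0
# Additive translation: adding c to every value adds c to the mean
assert math.isclose(mean([v + c for v in x]), mean(x) + c)
# Multiplicative translation: multiplying every value by c multiplies the mean by c
assert math.isclose(mean([v * c for v in x]), mean(x) * c)
```

One extreme value moves the mean from 4.0 to 14.8, even though four of the five observations are unchanged.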
33
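Geometric Mean
Used to describe data with strong right skewness
– E.g., laboratory data such as lipid measurements
Definition
– Antilog of the mean of the $\log x_i$ (all $x_i > 0$):
$$GM = \operatorname{antilog}\left(\frac{1}{n}\sum_{i=1}^{n}\log x_i\right) = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$$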
34
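Used to summarize a log-normally distributed variable: the mean is calculated on the log scale and then back-transformed. For a log-normal distribution, the geometric mean equals the median, which lies below the arithmetic mean.
Definition
– Antilog of the mean of the $\log x_i$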
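A minimal Python sketch of the definition "antilog of the mean of the $\log x_i$", cross-checked against the standard library's `statistics.geometric_mean` (Python 3.8+). The three values are hypothetical right-skewed data, and `geometric_mean` defined here is a hypothetical helper. The base of the logarithm does not matter as long as the antilog uses the same base; the sketch uses natural logs.

```python
import math
import statistics

def geometric_mean(values):
    """Antilog of the mean of the logs; defined only for positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    mean_log = statistics.mean([math.log(v) for v in values])  # mean of natural logs
    return math.exp(mean_log)                                  # antilog (base e)

skewed = [1.0, 10.0, 100.0]                # hypothetical right-skewed values
print(statistics.mean(skewed))             # arithmetic mean: 37.0
print(round(geometric_mean(skewed), 6))    # geometric mean: 10.0
print(round(statistics.geometric_mean(skewed), 6))   # standard-library check: 10.0
```

The single large value pulls the arithmetic mean far from the bulk of the data, while the geometric mean stays near the middle value.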
35
36
Median
Frequently used if there are extreme values in a distribution or if the distribution is non-normal
Definition
– The value that divides the ordered array into two equal parts
If there is an odd number of observations, the median is the (n+1)/2-th observation
– E.g., the median of 11 observations is the 6th observation
If there is an even number of observations, the median is the midpoint between the middle two observations
– E.g., the median of 12 observations is the midpoint between the 6th and 7th
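The odd/even rule above, written as a small Python function. The name `median` here is a hypothetical helper; the standard library's `statistics.median` applies the same rule and is used as a check.

```python
import statistics

def median(values):
    """Middle value of the ordered array (odd n) or midpoint of the two middle values (even n)."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                    # the (n + 1)/2-th observation, counting from 1
    return (xs[mid - 1] + xs[mid]) / 2    # midpoint of the (n/2)-th and (n/2 + 1)-th

eleven = list(range(1, 12))   # 1..11: median is the 6th value
twelve = list(range(1, 13))   # 1..12: midpoint of the 6th and 7th values
print(median(eleven), median(twelve))    # 6 6.5
assert median(twelve) == statistics.median(twelve)
```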
37
Mode
Not used very frequently in practice
Definition
– The value that occurs most frequently in a data set
If all values are different, there is no mode
There may be more than one mode
– Bimodal or multimodal
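A sketch of the mode as defined above, returning every most frequent value so that bimodal and multimodal data are handled. One assumption is built in: following the slide, data in which all values differ return an empty list ("no mode"), whereas Python's `statistics.multimode` would return every value in that case. `modes` is a hypothetical helper name.

```python
from collections import Counter

def modes(values):
    """All most frequent values; an empty list means 'no mode'."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:        # every value occurs once: no mode (the slide's rule)
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 2, 2, 3]))       # [2]
print(modes([1, 1, 2, 2, 3]))    # [1, 2]  (bimodal)
print(modes([1, 2, 3]))          # []      (all values different)
```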
38
39
Why Measures of Dispersion?
40
Range
41
Inter-Quartile Range
42
Percentiles and Quartiles
Definition of percentiles
– Given a set of n observations $x_1, x_2, \ldots, x_n$, the pth percentile P is the value of X such that p percent or less of the observations are less than P and (100 − p) percent or less are greater than P
– $P_{10}$ indicates the 10th percentile, etc.
Definition of quartiles
– The first quartile is $P_{25}$
– The second quartile is the median, or $P_{50}$
– The third quartile is $P_{75}$
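The percentile definition above does not always single out one value, so any implementation needs a tie-breaking rule. The Python sketch below (the function name `percentile` is hypothetical) returns the unique qualifying observation when there is one and otherwise the midpoint of the qualifying interval; this midpoint rule is an assumption, chosen because it reproduces the median rule on slide 36. It then derives the quartiles, the interquartile range ($P_{75} - P_{25}$) and the range (largest minus smallest observation). Statistical packages use a variety of interpolation rules, so their quartiles may differ slightly from these.

```python
import math

def percentile(values, p):
    """p-th percentile (0 < p < 100) following the definition above.

    P qualifies when at most p% of the observations are below P and at most
    (100 - p)% are above P. If a whole interval qualifies, its midpoint is
    returned; this tie-breaking rule is an assumption, not part of the definition.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("no observations")
    k = p * n / 100
    if k.is_integer():
        k = int(k)
        # every value from the k-th to the (k+1)-th observation qualifies
        return (xs[k - 1] + xs[k]) / 2
    # otherwise only the ceil(k)-th observation qualifies
    return xs[math.ceil(k) - 1]

data = list(range(1, 13))                   # 1, 2, ..., 12
q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))
print(q1, q2, q3)                           # 3.5 6.5 9.5
print("IQR =", q3 - q1)                     # IQR = 6.0
print("Range =", max(data) - min(data))     # Range = 11
```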
43
Variance and Standard Deviation (population)
Suppose we have N measurements of a particular variable in a population, $X_1, X_2, X_3, \ldots, X_N$, with mean $\mu$. We define
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2$$
as the variance and
$$\sigma = \sqrt{\sigma^2}$$
as the standard deviation.
44
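Variance and Standard Deviation (sample)
Suppose we have n measurements of a particular variable in a sample, $x_1, x_2, x_3, \ldots, x_n$, with mean $\bar{x}$. We define
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
as the sample variance and
$$s = \sqrt{s^2}$$
as the standard deviation.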
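A direct Python translation of the population and sample definitions, cross-checked against the standard library's `statistics.pvariance` (divisor N) and `statistics.variance` (divisor n − 1). The data are the toy population [1, 2, 3] used on the next slide; the function names are hypothetical.

```python
import math
import statistics

def population_variance(xs):
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)            # divisor N

def sample_variance(xs):
    if len(xs) < 2:
        raise ValueError("sample variance needs at least two observations")
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)    # divisor n - 1

data = [1, 2, 3]
sigma2, s2 = population_variance(data), sample_variance(data)
print(sigma2, math.sqrt(sigma2))   # population variance (about 0.667) and SD
print(s2, math.sqrt(s2))           # 1.0 1.0 (sample variance and SD)

# Cross-check against the standard library
assert math.isclose(sigma2, statistics.pvariance(data))
assert math.isclose(s2, statistics.variance(data))
```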
45
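Why n − 1 for the Sample Variance and SD?
Population = [1, 2, 3], so $\mu = 2$ and $\sigma^2 = 0.667$
Repeated sampling: all samples of size n = 2, drawn with replacement

Sample   Values   Divide by n − 1 ($s^2$)   Divide by n
1        [1, 1]   0                         0
2        [1, 2]   0.5                       0.25
3        [1, 3]   2                         1
4        [2, 1]   0.5                       0.25
5        [2, 2]   0                         0
6        [2, 3]   0.5                       0.25
7        [3, 1]   2                         1
8        [3, 2]   0.5                       0.25
9        [3, 3]   0                         0
Average           0.667                     0.333

Dividing by n − 1 gives an average equal to $\sigma^2$ (0.667); dividing by n underestimates it (0.333).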
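The table can be reproduced by brute force: enumerate every ordered sample of size 2 drawn with replacement from [1, 2, 3] and average the two versions of the variance. This is an illustrative sketch, not part of the original slides; exact fractions (`fractions.Fraction`) avoid rounding, so the averages come out as exactly 2/3 and 1/3.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]
n = 2

def sum_sq(sample):
    """Sum of squared deviations from the sample mean, in exact arithmetic."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample)

mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)   # population variance

samples = list(product(population, repeat=n))   # all 9 ordered samples, with replacement
avg_div_n_minus_1 = sum(sum_sq(s) / (n - 1) for s in samples) / len(samples)
avg_div_n = sum(sum_sq(s) / n for s in samples) / len(samples)

print(sigma2, avg_div_n_minus_1, avg_div_n)     # 2/3 2/3 1/3
```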
46
$s^2$ is an unbiased estimate of $\sigma^2$, i.e., $E(s^2) = \sigma^2$ (its square root $s$ is, however, a slightly biased estimate of $\sigma$)
47
Coefficient of Variation
Measures relative variation, rather than absolute variation such as the standard deviation
Definition of C.V.
$$CV = \frac{s}{\bar{x}} \times 100\%$$
48
Comment
Useful in comparing the variation of two distributions
– Used particularly in comparing laboratory measures, to identify the determinations with more variation
– Also used in QC analyses for comparing observers
49
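A Class of Students
Body weight: mean = 60 kg; SD = 5 kg
Body height: mean = 170 cm; SD = 10 cm
Which variable has greater variation, weight or height?
– SD: 10 cm > 5 kg ??? (the units differ, so the SDs cannot be compared directly)
– CV: 10 cm/170 cm ≈ 5.9% < 5 kg/60 kg ≈ 8.3%, so weight varies more relative to its mean
The CV has no unit, which is what makes this comparison possible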
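The comparison can be checked with a one-line CV function (`coefficient_of_variation` is a hypothetical helper name), using the means and SDs given above:

```python
def coefficient_of_variation(sd, mean):
    """CV = SD / mean, expressed as a percentage; the units cancel."""
    return sd / mean * 100

cv_weight = coefficient_of_variation(5, 60)     # kg / kg
cv_height = coefficient_of_variation(10, 170)   # cm / cm
print(f"weight CV = {cv_weight:.1f}%, height CV = {cv_height:.1f}%")
# weight CV = 8.3%, height CV = 5.9%
```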
50
Software
Statistical software
– SAS
– SPSS
– Stata
– Minitab
Graphical software
– SigmaPlot
– PowerPoint
– Excel