Data Presentation Numerical Summary Measures Chung-Yi Li, PhD Dept. of Public Health, College of Med. NCKU.


1 Data Presentation Numerical Summary Measures Chung-Yi Li, PhD Dept. of Public Health, College of Med. NCKU

2 Outline for Data Presentation
Types of Numerical Data
Tables
–Frequency Distributions
–Relative Frequency
Graphs
–Bar/Pie Charts
–Histograms
–Frequency Polygons
–Stem & Leaf Plot
–One-Way Scatter Plots
–Box Plots
–Two-Way Scatter Plots
–Line Graphs

3 Outline for Numerical Summary Measures
Measures of Central Tendency
–Mean / Median / Mode
Measures of Dispersion
–Range
–Interquartile Range
–Variance and Standard Deviation
–Coefficient of Variation
Grouped Data
–Grouped Mean / Grouped Variance

4 Types of Numerical Data
Nominal
–Dichotomous/binary: gender (1=female and 0=male)
–Categorical: blood type (1=O, 2=A, 3=B, and 4=AB) or race/ethnicity
Ordinal
–Level of severity: 1=fatal, 2=severe, 3=moderate, and 4=minor
–Likert scale: level of agreement, from 1=least agree to 5=most agree
Ranked
–Leading causes of death/cancer in Taiwan

5 Interval scale
–Temperature (°C)
Ratio scale
–Body height, weight, white blood cell concentration

7 Tables for Continuous Data

8 Guidelines
Closed-ended intervals are better than open-ended ones when constructing a frequency table, as they provide more information.
Intervals should be exhaustive and must be mutually exclusive.
Frequency tables for continuous data are somewhat misleading…
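The guidelines on intervals can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table`, the half-open convention [lo, hi), and the data values are assumptions, not part of the original slides.

```python
from collections import Counter

def frequency_table(values, edges):
    """Count values in closed-ended, mutually exclusive intervals [lo, hi)."""
    intervals = list(zip(edges[:-1], edges[1:]))
    counts = Counter()
    for v in values:
        for lo, hi in intervals:
            if lo <= v < hi:          # half-open, so no value fits two intervals
                counts[(lo, hi)] += 1
                break
        else:
            # Reached only when no interval contains v: the set is not exhaustive.
            raise ValueError(f"{v} falls outside every interval")
    total = len(values)
    # Each row: lower edge, upper edge, frequency, relative frequency.
    return [(lo, hi, counts[(lo, hi)], counts[(lo, hi)] / total)
            for lo, hi in intervals]

# Illustrative integer measurements (assumed for this sketch).
values = [21, 32, 34, 34, 42, 44, 46, 48, 52, 56, 64]
table = frequency_table(values, edges=[20, 30, 40, 50, 60, 70])
for lo, hi, n, rel in table:
    print(f"{lo}-{hi - 1}: n={n}, relative frequency={rel:.3f}")
```

The `hi - 1` label is only sensible for integer data; for truly continuous data the interval would be printed as "lo to <hi".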


10 Comment
Grouping a continuous variable might not be biologically plausible. For example, in maternal and child health (MCH) studies, maternal ages are normally categorized into age groups, with a cut-point at 35. Women aged 29 are more similar to women aged 30 in physiological aspects than to those aged 25.

11 No Concern for Tabulating Categorical Data

12 Bar Chart

13 Pie Chart

14 Histogram

15 How About This One?

16 Frequency Polygons

19 Stem-and-Leaf Plots

20 Comment
Does preserve individual measure information, so not useful for large data sets
Stem is the first digit(s) of each measurement; leaves are the last digit
Most useful for two-digit numbers, more cumbersome for three or more digits

20: X
30: XXX
40: XXXX
50: XX
60: X

Stem | Leaf
2*   | 1
3*   | 244
4*   | 2468
5*   | 26
6*   | 4
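As a rough sketch, the stem-and-leaf display on this slide can be rebuilt in Python. The eleven values below are read off the slide's plot; the function name `stem_and_leaf` is an assumption, and the code handles only non-negative integers with a one-digit leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]} where the leaf is the last digit of each value."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # e.g. 34 -> stem 3, leaf 4
        plot[stem].append(leaf)
    return dict(plot)

# Values read off the stem-and-leaf plot on the slide.
data = [21, 32, 34, 34, 42, 44, 46, 48, 52, 56, 64]
plot = stem_and_leaf(data)
for stem in sorted(plot):
    leaves = "".join(str(leaf) for leaf in plot[stem])
    print(f"{stem}* | {leaves}")
```

Because every leaf is kept, the original measurements can be recovered from the display, which is exactly why it becomes unwieldy for large data sets.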

21 One-Way Scatter Plots

22 Two-Way Scatter Plots

23 Box Plots

24 Comment
Descriptive method to convey information about measures of location and dispersion
–Box-and-whisker plots
Construction of box plot
–Box is IQR
–Line at median
–Whiskers at smallest and largest observations
–Other conventions can be used, especially to represent extreme values
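The five numbers that this basic box plot draws can be computed with Python's standard library. This is a minimal sketch: the data are illustrative, the function name `five_number_summary` is an assumption, and `statistics.quantiles` is used with its default "exclusive" method, which is only one of several quartile conventions.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a basic box plot draws."""
    q1, q2, q3 = quantiles(values, n=4)   # three cut points: Q1, median, Q3
    return min(values), q1, q2, q3, max(values)

# Illustrative measurements (assumed for this sketch).
data = [21, 32, 34, 34, 42, 44, 46, 48, 52, 56, 64]
low, q1, med, q3, high = five_number_summary(data)
iqr = q3 - q1   # height of the box

print(f"whiskers: {low} to {high}")
print(f"box: {q1} to {q3} (IQR = {iqr}), line at median {med}")
```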

25 Good for Making Comparisons

26 Line Graphs

28 Summary
In practice, descriptive statistics play a major role
–Always the first 1-2 tables/figures in a paper
–The statistician needs to know about each variable before deciding how to analyze the data to answer research questions
In any analysis, 90% of the effort goes into setting up the data
–Descriptive statistics are part of that 90%

29 Measures of Central Tendency
Mean
–Arithmetic mean
–Geometric mean
Median
Mode

30 Arithmetic Mean (population)
Suppose we have N measurements of a particular variable in a population. We denote these N measurements as $X_1, X_2, X_3, \ldots, X_N$, where $X_1$ is the first measurement, $X_2$ is the second, etc.
Definition
More accurately called the arithmetic mean, it is defined as the sum of the observed measurements divided by the number of observations: $\mu = \frac{1}{N} \sum_{i=1}^{N} X_i$

31 Arithmetic Mean
Probably the most common of the measures of central tendency
–A.K.A. ‘Average’
Best suited to normally distributed data, although we tend to use it regardless of distribution
–μ for population mean

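32 Comment
Weakness
–Influenced by extreme values
Translations
–Additive: adding a constant c to every observation adds c to the mean
–Multiplicative: multiplying every observation by a constant c multiplies the mean by c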
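Both points on this slide can be checked numerically. The sketch below uses made-up values and a made-up constant; nothing here comes from the slides beyond the properties being illustrated.

```python
from statistics import mean, median

x = [2, 4, 6, 8]   # illustrative values (assumed)
c = 10             # arbitrary constant

# Additive translation: shifting every value by c shifts the mean by c.
shifted_mean = mean([v + c for v in x])

# Multiplicative translation: scaling every value by c scales the mean by c.
scaled_mean = mean([v * c for v in x])

# Weakness: one extreme value drags the mean but barely moves the median.
with_outlier = [2, 4, 6, 80]
outlier_mean = mean(with_outlier)
outlier_median = median(with_outlier)

print(mean(x), shifted_mean, scaled_mean)
print(outlier_mean, outlier_median)
```

Replacing 8 with 80 moves the mean from 5 to 23, while the median stays at 5.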

33 Geometric Mean
Used to describe data with extreme skewness to the right
–Ex.: laboratory data such as lipid measurements
Definition
–Antilog of the mean of the log $x_i$: $\mathrm{GM} = \exp\left(\frac{1}{n} \sum_{i=1}^{n} \ln x_i\right)$

34 Used to summarize the central tendency of a log-normal distribution
Definition
–Antilog of the mean of the log $x_i$
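The "antilog of the mean of the logs" definition translates directly into code. This sketch uses natural logs (any base gives the same result as long as the antilog uses the same base) and compares against the standard library's `statistics.geometric_mean`; the function name `geometric_mean_by_logs` and the data are assumptions for illustration.

```python
import math
from statistics import geometric_mean, mean

def geometric_mean_by_logs(values):
    """Antilog of the mean of the logs; every value must be positive."""
    logs = [math.log(v) for v in values]
    return math.exp(mean(logs))

# Illustrative right-skewed values (assumed for this sketch).
values = [1, 10, 100]
gm = geometric_mean_by_logs(values)
am = mean(values)

print(f"geometric mean = {gm:.3f}, arithmetic mean = {am}")
print(f"library result = {geometric_mean(values):.3f}")
```

For these values the geometric mean is 10 while the arithmetic mean is 37, which shows how much less the geometric mean is pulled by the large observation.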


36 Median
Frequently used if there are extreme values in a distribution or if the distribution is non-normal
Definition
–The value that divides the ‘ordered array’ into two equal parts
If there is an odd number of observations, the median is the (n+1)/2-th observation
–Ex.: Median of 11 observations is the 6th observation
If there is an even number of observations, the median is the midpoint between the middle two observations
–Ex.: Median of 12 observations is the midpoint between the 6th and 7th
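The odd and even rules above can be written out as a short function. This is a sketch with illustrative data; `median_by_rank` is a hypothetical name, and the result is compared against `statistics.median` as a sanity check.

```python
from statistics import median

def median_by_rank(values):
    """Median via the ordered-array rule: middle value, or midpoint of the middle two."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]      # the (n+1)/2-th observation, 1-based
    upper = n // 2                            # 0-based index of the (n/2 + 1)-th
    return (ordered[upper - 1] + ordered[upper]) / 2

# Illustrative data (assumed): 11 observations, then 12 after adding one more.
eleven = [21, 32, 34, 34, 42, 44, 46, 48, 52, 56, 64]
twelve = eleven + [70]

m11 = median_by_rank(eleven)   # 6th observation
m12 = median_by_rank(twelve)   # midpoint of the 6th and 7th observations
print(m11, m12, median(eleven), median(twelve))
```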

37 Mode
Not used very frequently in practice
Definition
–Value that occurs most frequently in the data set
If all values are different, there is no mode
There may be more than one mode
–Bimodal or multimodal
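A small sketch of these three cases follows. The function name `modes` and the data are assumptions; note that it follows the slide's convention of reporting no mode when every value occurs once, which differs from `statistics.multimode`, where all values would be returned in that case.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []   # all values different: no mode, per the slide's convention
    return sorted(v for v, count in counts.items() if count == top)

# Illustrative data sets (assumed for this sketch).
unimodal = modes([21, 32, 34, 34, 42])   # one value repeats
no_mode = modes([1, 2, 3])               # all values different
bimodal = modes([1, 1, 2, 2, 3])         # two values tie for most frequent
print(unimodal, no_mode, bimodal)
```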


39 Why Measures of Dispersion?

40 Range

41 Inter-Quartile Range

42 Percentiles and Quartiles
Definition of percentiles
–Given a set of n observations $x_1, x_2, \ldots, x_n$, the pth percentile P is the value of X such that p percent or less of the observations are less than P and (100-p) percent or less are greater than P
–$P_{10}$ indicates the 10th percentile, etc.
Definition of quartiles
–First quartile is $P_{25}$
–Second quartile is the median or $P_{50}$
–Third quartile is $P_{75}$
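The percentile definition is a pair of inequalities, so it can be tested directly for any candidate value. The sketch below is illustrative: the function name `satisfies_percentile_definition` and the data are assumptions, and integer arithmetic is used to avoid rounding issues in the comparisons.

```python
def satisfies_percentile_definition(values, p, candidate):
    """Check whether `candidate` qualifies as the p-th percentile of `values`.

    Requires that p percent or less of the observations lie below the candidate
    and (100 - p) percent or less lie above it.
    """
    n = len(values)
    below = sum(v < candidate for v in values)
    above = sum(v > candidate for v in values)
    return below * 100 <= p * n and above * 100 <= (100 - p) * n

# Illustrative data (assumed for this sketch).
data = [21, 32, 34, 34, 42, 44, 46, 48, 52, 56, 64]

q1_ok = satisfies_percentile_definition(data, 25, 34)      # first quartile
median_ok = satisfies_percentile_definition(data, 50, 44)  # second quartile
q3_ok = satisfies_percentile_definition(data, 75, 52)      # third quartile
wrong_q1 = satisfies_percentile_definition(data, 25, 42)   # too many values below
print(q1_ok, median_ok, q3_ok, wrong_q1)
```

With 11 observations, 42 fails as a first quartile because 4 of 11 values (about 36%) lie below it, which is more than 25%.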

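43 Variance and Standard Deviation (population)
Suppose we have N measurements of a particular variable in a population: $X_1, X_2, X_3, \ldots, X_N$. The mean is $\mu = \frac{1}{N} \sum_{i=1}^{N} X_i$. We define:
$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (X_i - \mu)^2$ as the variance
$\sigma = \sqrt{\sigma^2}$ as the standard deviation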

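44 Variance and Standard Deviation (sample)
Suppose we have n measurements of a particular variable in a sample: $x_1, x_2, x_3, \ldots, x_n$. The mean is $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$. We define:
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$ as the sample variance
$s = \sqrt{s^2}$ as the sample standard deviation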

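45 Why n-1 for Sample Variance and SD?
Population = [1, 2, 3], $\mu = 2$, $\sigma^2 = 0.667$
Repeated sampling with replacement, n = 2

Sample   Values   SS/(n-1)   SS/n
1        [1,1]    0          0
2        [1,2]    0.5        0.25
3        [1,3]    2          1
4        [2,1]    0.5        0.25
5        [2,2]    0          0
6        [2,3]    0.5        0.25
7        [3,1]    2          1
8        [3,2]    0.5        0.25
9        [3,3]    0          0
Average           0.667      0.333

SS is the sum of squared deviations from the sample mean, $\sum (x_i - \bar{x})^2$.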
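The repeated-sampling argument on this slide can be reproduced by enumerating all nine ordered samples of size 2 drawn with replacement from the population [1, 2, 3]. The variable and function names below are assumptions made for this sketch.

```python
from itertools import product
from statistics import mean

population = [1, 2, 3]
mu = mean(population)
# Population variance: divide by N.
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

def sum_of_squares(sample):
    """Sum of squared deviations from the sample's own mean."""
    m = mean(sample)
    return sum((x - m) ** 2 for x in sample)

n = 2
samples = list(product(population, repeat=n))   # all 9 ordered samples

# Average the two candidate estimators over every possible sample.
avg_div_n_minus_1 = mean([sum_of_squares(s) / (n - 1) for s in samples])
avg_div_n = mean([sum_of_squares(s) / n for s in samples])

print(f"population variance        = {sigma2:.3f}")
print(f"average of SS/(n-1)        = {avg_div_n_minus_1:.3f}")
print(f"average of SS/n            = {avg_div_n:.3f}")
```

Dividing by n-1 recovers the population variance on average (0.667), whereas dividing by n gives 0.333 and so underestimates it.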

46 $s^2$ is an unbiased estimate of $\sigma^2$, that is, $E(s^2) = \sigma^2$

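47 Coefficient of Variation
Relative variation rather than absolute variation such as standard deviation
Definition of C.V.
–The standard deviation divided by the mean, expressed as a percentage: $\mathrm{C.V.} = \frac{s}{\bar{x}} \times 100\%$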

48 Comment
Useful in comparing variation between two distributions
–Used particularly in comparing laboratory measures to identify those determinations with more variation
–Also used in QC analyses for comparing observers

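49 A Class of Students
Body weight: Mean=60 kg; SD=5 kg
Body height: Mean=170 cm; SD=10 cm
Which variable has greater variation, weight or height?
SD: is 10 cm > 5 kg? The units differ, so the two SDs cannot be compared directly.
CV: 10 cm/170 cm (5.9%) < 5 kg/60 kg (8.3%), so weight varies more relative to its mean.
Among the measures of dispersion covered here, the CV is the only one without a unit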
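The comparison on this slide is quick to verify in code. The means and SDs are the ones given on the slide; the function name `coefficient_of_variation` and the conversion of height to metres are assumptions added to show that the CV does not depend on the unit.

```python
def coefficient_of_variation(sd, mean_value):
    """CV as a percentage: SD divided by the mean, times 100."""
    return sd / mean_value * 100

cv_weight = coefficient_of_variation(sd=5, mean_value=60)      # kg
cv_height = coefficient_of_variation(sd=10, mean_value=170)    # cm

# Same heights expressed in metres: the SD changes, the CV does not.
cv_height_m = coefficient_of_variation(sd=0.10, mean_value=1.70)

print(f"CV of weight      = {cv_weight:.1f}%")
print(f"CV of height (cm) = {cv_height:.1f}%")
print(f"CV of height (m)  = {cv_height_m:.1f}%")
```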

50 Software
Statistical software
–SAS
–SPSS
–Stata
–Minitab
Graphical software
–SigmaPlot
–PowerPoint
–Excel

