Data Presentation Numerical Summary Measures Chung-Yi Li, PhD Dept. of Public Health, College of Med. NCKU.


2 Outline for Data Presentation
Types of Numerical Data
Tables
–Frequency Distributions
–Relative Frequency
Graphs
–Bar/Pie Charts
–Histograms
–Frequency Polygons
–Stem & Leaf Plot
–One-Way Scatter Plots
–Box Plots
–Two-Way Scatter Plots
–Line Graphs

3 Outline for Numerical Summary Measures
Measures of Central Tendency
–Mean / Median / Mode
Measures of Dispersion
–Range
–Interquartile Range
–Variance and Standard Deviation
–Coefficient of Variation
Grouped Data
–Grouped Mean / Grouped Variance

4 Types of Numerical Data
Nominal
–Dichotomous/binary: gender (1=female and 0=male)
–Categorical: blood type (1=O, 2=A, 3=B, and 4=AB) or race/ethnicity
Ordinal
–Level of severity: 1=fatal, 2=severe, 3=moderate, and 4=minor
–Likert scale: level of agreement, from 1=least agree to 5=most agree
Ranked
–Leading causes of death/cancer in Taiwan

5 Interval scale
–Temperature (°C)
Ratio scale
–Body height, weight, white blood cell concentration

6

7 Tables for Continuous Data

8 Guidelines
Closed-ended intervals are better than open-ended ones when constructing a frequency table, as they provide more information.
Intervals should be exhaustive (every observation falls in some interval) and must be mutually exclusive.
Frequency tables for continuous data are somewhat misleading…
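The guidelines above can be sketched in a few lines of Python. This is only an illustration, not the lecturer's procedure: the function name `frequency_table`, the age values, and the class edges are all hypothetical. Classes are half-open, [lower, upper), so they cannot overlap, and a value outside every class raises an error instead of being silently dropped.

```python
from collections import Counter

def frequency_table(values, edges):
    """Count values in half-open classes [edges[i], edges[i+1]).

    Half-open classes are mutually exclusive by construction; a value
    outside every class raises an error, which enforces exhaustiveness.
    """
    classes = list(zip(edges, edges[1:]))
    counts = Counter()
    for v in values:
        for lo, hi in classes:
            if lo <= v < hi:
                counts[(lo, hi)] += 1
                break
        else:  # no class matched this value
            raise ValueError(f"{v} falls outside the classes")
    n = len(values)
    # Each row: lower bound, upper bound, frequency, relative frequency
    return [(lo, hi, counts[(lo, hi)], counts[(lo, hi)] / n) for lo, hi in classes]

ages = [21, 24, 25, 29, 30, 33, 34, 35, 38, 41]   # hypothetical data
table = frequency_table(ages, [20, 25, 30, 35, 45])
for lo, hi, freq, rel in table:
    print(f"{lo}-{hi - 1}: {freq} ({rel:.0%})")
```

The relative frequencies always sum to 1 because every observation is assigned to exactly one class.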

9

10 Comment
Grouping a continuous variable might not be biologically plausible.
For example, in MCH (maternal and child health) studies, maternal age is normally categorized into age groups, the oldest being ≥35. Yet a woman aged 29 is physiologically more similar to a woman aged 30 than to one aged 25.

11 No Concern for Tabulating Categorical Data

12 Bar Chart

13 Pie Chart

14 Histogram

15 How About This One?

16 Frequency Polygons

17

18

19 Stem-and-Leaf Plots

20 Comment
Preserves each individual measurement, so it is not useful for large data sets
The stem is the first digit(s) of a measurement; the leaf is its last digit
Most useful for two-digit numbers, more cumbersome for three or more digits

20: X
30: XXX
40: XXXX
50: XX
60: X

Stem | Leaf
2*   | 1
3*   | 244
4*   | …
5*   | 26
6*   | 4
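As a rough sketch of how such a plot is built, the Python below splits each two-digit value into a stem (tens digit) and a leaf (units digit). The function name `stem_and_leaf` is hypothetical, and the data are only the values still legible in the plot above (the 40s row is omitted because its leaves are missing from the transcript).

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf rows for non-negative integers.

    The stem is everything but the last digit; the leaf is the last digit.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(str(leaf))
    return [f"{stem}* | {''.join(leaves)}" for stem, leaves in sorted(rows.items())]

data = [34, 21, 56, 32, 64, 34, 52]   # legible values only; 40s row omitted
plot = stem_and_leaf(data)
print("\n".join(plot))
```

Because every leaf is kept, the original values can be read back from the plot, which is exactly why it grows unwieldy for large data sets.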

21 One-Way Scatter Plots

22 Two-Way Scatter Plots

23 Box Plots

24 Comment
Descriptive method to convey information about measures of location and dispersion
–Box-and-whisker plots
Construction of box plot
–Box is IQR
–Line at median
–Whiskers at smallest and largest observations
–Other conventions can be used, especially to represent extreme values

25 Good for Making Comparisons

26 Line Graphs

27

28 Summary
In practice, descriptive statistics play a major role
–Always the first 1-2 tables/figures in a paper
–The statistician needs to know about each variable before deciding how to analyze the data to answer the research questions
In any analysis, 90% of the effort goes into setting up the data
–Descriptive statistics are part of that 90%

29 Measures of Central Tendency
Mean
–Arithmetic mean
–Geometric mean
Median
Mode

30 Arithmetic Mean (population)
Suppose we have N measurements of a particular variable in a population. We denote these N measurements as X_1, X_2, X_3, …, X_N, where X_1 is the first measurement, X_2 is the second, etc.
Definition
More accurately called the arithmetic mean, it is defined as the sum of the observed measurements divided by the number of observations: $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$

31 Arithmetic Mean
Probably the most common of the measures of central tendency
–A.K.A. ‘Average’
Definition
–Best suited to normally distributed data, although we tend to use it regardless of distribution
–μ for population mean

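32 Comment
Weakness
–Influenced by extreme values
Translations
–Additive: adding a constant c to every observation adds c to the mean
–Multiplicative: multiplying every observation by c multiplies the mean by c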
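A small numerical check of these properties, using Python's standard `statistics` module. The data values and constants are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import mean, median

x = [2, 4, 6, 8]                   # hypothetical observations
mean_x = mean(x)                   # arithmetic mean of the original data

shifted = [v + 10 for v in x]      # additive translation by c = 10
scaled = [v * 3 for v in x]        # multiplicative translation by c = 3

mean_shifted = mean(shifted)       # equals mean_x + 10
mean_scaled = mean(scaled)         # equals 3 * mean_x

# One extreme value pulls the mean far more than the median
with_outlier = [2, 4, 6, 80]
mean_outlier = mean(with_outlier)
median_outlier = median(with_outlier)
print(mean_x, mean_shifted, mean_scaled, mean_outlier, median_outlier)
```

Replacing 8 with 80 moves the mean from 5 to 23 while the median stays at 5, which is the weakness noted above.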

33 Geometric Mean
Used to describe data that are strongly skewed to the right
–Ex.: laboratory data such as lipid measurements
Definition
–Antilog of the mean of the log x_i: $\mathrm{GM} = \operatorname{antilog}\left(\frac{1}{n}\sum_{i=1}^{n}\log x_i\right)$

34 Used to describe the central value of a log-normal distribution (the mean on the log scale, transformed back to the original scale)
Definition
–Antilog of the mean of the log x_i
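The definition translates directly into code. The sketch below uses natural logarithms and `math.exp` as the antilog; any base gives the same result as long as the log and antilog match. The function name and the data are hypothetical.

```python
import math

def geometric_mean(values):
    """Antilog of the mean of the logs; defined only for positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    mean_log = sum(math.log(v) for v in values) / len(values)
    return math.exp(mean_log)

skewed = [1, 10, 100]                 # hypothetical right-skewed data
gm = geometric_mean(skewed)           # close to 10
am = sum(skewed) / len(skewed)        # arithmetic mean, 37.0
print(round(gm, 6), am)
```

For right-skewed data the geometric mean sits well below the arithmetic mean, which is dragged upward by the largest values.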

35

36 Median
Frequently used if there are extreme values in a distribution or if the distribution is non-normal
Definition
–The value that divides the ‘ordered array’ into two equal parts
If there is an odd number of observations, the median is the (n+1)/2-th observation
–Ex.: Median of 11 observations is the 6th observation
If there is an even number of observations, the median is the midpoint between the middle two observations
–Ex.: Median of 12 observations is the midpoint between the 6th and 7th
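The positional rule can be written out as follows. This is a minimal sketch with a hypothetical function name; Python lists are 0-based, so the (n+1)/2-th observation is at index n // 2 when n is odd.

```python
def median_by_position(values):
    """Median via the ordered array, following the odd/even rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the (n+1)/2-th observation (1-based) is index mid (0-based)
        return ordered[mid]
    # Even n: midpoint between the two middle observations
    return (ordered[mid - 1] + ordered[mid]) / 2

eleven = list(range(1, 12))    # 1..11, median is the 6th observation
twelve = list(range(1, 13))    # 1..12, median is midway between 6th and 7th
m11 = median_by_position(eleven)
m12 = median_by_position(twelve)
print(m11, m12)
```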

37 Mode
Not used very frequently in practice
Definition
–Value that occurs most frequently in the data set
If all values are different, there is no mode
There may be more than one mode
–Bimodal or multimodal
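A short sketch of these rules, with a hypothetical function name `modes`. It returns an empty list when every value occurs once (no mode) and several values when the data are bimodal or multimodal.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values; empty list if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []          # all values different: no mode
    return sorted(v for v, c in counts.items() if c == top)

no_mode = modes([1, 2, 3])
one_mode = modes([1, 2, 2, 3])
two_modes = modes([1, 1, 2, 2, 3])   # bimodal
print(no_mode, one_mode, two_modes)
```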

38

39 Why Measures of Dispersion?

40 Range

41 Inter-Quartile Range

42 Percentiles and Quartiles
Definition of percentiles
–Given a set of n observations x_1, x_2, …, x_n, the pth percentile P is the value of X such that p percent or less of the observations are less than P and (100-p) percent or less are greater than P
–P_10 indicates the 10th percentile, etc.
Definition of quartiles
–First quartile is P_25
–Second quartile is the median, or P_50
–Third quartile is P_75
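The definition can be checked numerically. In the sketch below, `is_pth_percentile` is a hypothetical helper that tests the two conditions of the definition, and the quartiles come from `statistics.quantiles` in the Python standard library (Python 3.8+). Software packages use slightly different interpolation conventions, so quartiles from other tools may differ a little on small data sets.

```python
from statistics import quantiles

def is_pth_percentile(data, value, p):
    """True if at most p% of data lie below value and at most (100-p)% above."""
    n = len(data)
    below = 100 * sum(x < value for x in data) / n
    above = 100 * sum(x > value for x in data) / n
    return below <= p and above <= 100 - p

data = list(range(1, 12))            # hypothetical data: 1, 2, ..., 11
q1, q2, q3 = quantiles(data, n=4)    # P25, P50 (median), P75
iqr = q3 - q1                        # interquartile range
print(q1, q2, q3, iqr)
print(is_pth_percentile(data, q1, 25), is_pth_percentile(data, 5, 25))
```

Here the value 5 fails the check for P_25 because more than 25% of the observations lie below it.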

43 Variance and Standard Deviation (population)
Suppose we have N measurements of a particular variable in a population: X_1, X_2, X_3, …, X_N. With population mean μ, we define
$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i-\mu)^2$ as the variance
$\sigma = \sqrt{\sigma^2}$ as the standard deviation

44 Variance and Standard Deviation (sample)
Suppose we have n measurements of a particular variable in a sample: x_1, x_2, x_3, …, x_n. With sample mean $\bar{x}$, we define
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$ as the sample variance
$s = \sqrt{s^2}$ as the sample standard deviation
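Python's standard `statistics` module provides both versions: `pvariance` and `pstdev` divide by N (population), while `variance` and `stdev` divide by n-1 (sample). The data below are hypothetical; the same eight values are treated first as a whole population and then as a sample, purely to show the effect of the divisor.

```python
from statistics import pvariance, pstdev, variance, stdev

values = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean = 5

# Sum of squared deviations from the mean is 32
pop_var = pvariance(values)    # 32 / 8 = 4      (divide by N)
pop_sd = pstdev(values)        # sqrt(4) = 2
samp_var = variance(values)    # 32 / 7 ≈ 4.571  (divide by n - 1)
samp_sd = stdev(values)        # sqrt(32 / 7) ≈ 2.138
print(pop_var, pop_sd, round(samp_var, 3), round(samp_sd, 3))
```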

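45 Why n-1 for Sample Variance and SD?
Population = [1, 2, 3], μ = 2, σ² = 0.667
Repeated sampling with n = 2 (all 9 ordered samples, drawn with replacement):

No.   Sample   Σ(x-x̄)²/(n-1)   Σ(x-x̄)²/n
1     [1,1]    0                0
2     [1,2]    0.5              0.25
3     [1,3]    2                1
4     [2,1]    0.5              0.25
5     [2,2]    0                0
6     [2,3]    0.5              0.25
7     [3,1]    2                1
8     [3,2]    0.5              0.25
9     [3,3]    0                0
Average        0.667            0.333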

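46 With the n-1 divisor, s² is an unbiased estimate of σ²: averaged over all possible samples it equals σ², whereas dividing by n underestimates it (s itself is still slightly biased for σ)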
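The repeated-sampling argument can be reproduced by enumerating every ordered sample of size 2 drawn with replacement from the population [1, 2, 3]. This is a sketch of that enumeration; the variable and function names are illustrative.

```python
from itertools import product

population = [1, 2, 3]
N = len(population)
mu = sum(population) / N
sigma2 = sum((x - mu) ** 2 for x in population) / N    # 2/3, about 0.667

n = 2
samples = list(product(population, repeat=n))          # 9 ordered samples

def sum_of_squares(sample):
    """Sum of squared deviations from the sample mean."""
    xbar = sum(sample) / len(sample)
    return sum((x - xbar) ** 2 for x in sample)

# Average the two candidate estimators over all possible samples
avg_n_minus_1 = sum(sum_of_squares(s) / (n - 1) for s in samples) / len(samples)
avg_n = sum(sum_of_squares(s) / n for s in samples) / len(samples)
print(round(sigma2, 3), round(avg_n_minus_1, 3), round(avg_n, 3))
```

Dividing by n-1 recovers σ² on average, while dividing by n gives only half of it in this small example.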

47 Coefficient of Variation
Measures relative variation, rather than absolute variation such as the standard deviation
Definition of C.V.: $\mathrm{CV} = \frac{s}{\bar{x}} \times 100\%$

48 Comment
Useful in comparing variation between two distributions
–Used particularly in comparing laboratory measures to identify those determinations with more variation
–Also used in QC analyses for comparing observers

49 A Class of Students
Body weight: Mean = 60 kg; SD = 5 kg
Body height: Mean = 170 cm; SD = 10 cm
Which variable has greater variation, weight or height?
By SD: 10 cm > 5 kg??? (the units differ, so the comparison is not meaningful)
By CV: 10 cm/170 cm (5.9%) < 5 kg/60 kg (8.3%), so weight varies relatively more
Unlike the SD, the CV has no unit
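The class example works out as follows. The function name is hypothetical; the means and SDs are the ones given on the slide.

```python
def coefficient_of_variation(sd, mean_value):
    """Relative variation: SD divided by the mean (unit-free)."""
    if mean_value == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean_value

cv_weight = coefficient_of_variation(5, 60)     # kg / kg
cv_height = coefficient_of_variation(10, 170)   # cm / cm
print(f"weight CV = {cv_weight:.1%}, height CV = {cv_height:.1%}")
```

Although height has the larger SD in raw numbers, weight has the larger CV, so weight is the more variable of the two relative to its mean.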

50 Software
Statistical software
–SAS
–SPSS
–Stata
–Minitab
Graphical software
–SigmaPlot
–PowerPoint
–Excel