Data Handling II: Describing and Depicting your Data Dr Yanzhong Wang Lecturer in Medical Statistics Division of Health and Social Care Research King's College London.

Presentation transcript:

Data Handling II: Describing and Depicting your Data Dr Yanzhong Wang Lecturer in Medical Statistics Division of Health and Social Care Research King's College London Drug Development Statistics & Data Management

2 Types of data Quantitative data – continuous, discrete – distributions may be symmetric or skewed Qualitative (categorical) data – binary – nominal, ordinal

3 Skewed Distributions [Figure: two distributions, one with a long tail to the left and one with a long tail to the right]

Symmetric Distribution

5 Summary statistics ‘Where the data are’ - location – mean, median, mode, geometric mean Used to describe baseline data and main outcomes ‘How variable the data are’ - spread – standard deviation, variance, range, interquartile range, 95% range Needed (primarily) to describe baseline data in RCT and cohort study

6 Definition of the Mean The mean of a sample of values is the arithmetic average and is determined by dividing the sum of the values by the number of the values.

7 Definition of the Median The median is the middle value of the ordered data. It is not affected by skewness and outliers, but is theoretically less precise than the mean.

Ordered Blood Glucose Values

Definition of the Mode The mode is the most frequent value. 9

Ordered Blood Glucose Values 10

Location = Central Tendency [Figure: histogram of count against blood glucose (mmol/litre)] Arithmetic Mean - outlier prone Median - only uses relative magnitudes Mode - not necessarily central (categorical data) 11
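The three measures of location defined above can be sketched in a few lines of Python. The readings below are hypothetical values chosen for illustration, not the blood glucose data from the slides, and the variable names are assumptions.

```python
import statistics

# Hypothetical blood glucose readings (mmol/litre); NOT the slide's data.
glucose = [3.6, 4.1, 4.4, 4.4, 4.7, 4.9, 5.0, 5.1, 6.0]

mean = statistics.mean(glucose)      # sum of the values / number of values
median = statistics.median(glucose)  # middle value of the ordered data
mode = statistics.mode(glucose)      # most frequent value

print(f"mean={mean:.2f} median={median} mode={mode}")

# One extreme reading pulls the mean up but barely moves the median,
# which is what "outlier prone" means for the arithmetic mean.
with_outlier = glucose + [19.0]
mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)
print(f"with outlier: mean={mean_out:.2f} median={median_out:.2f}")
```

Adding the single extreme value shifts the mean by more than one unit while the median moves by only a tenth, which is the contrast the slide draws between the two measures.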

Relation of mean, median and mode If distribution is unimodal (has only one mode) then: Mean=median=mode for symmetric distribution. Mean>median>mode for positively skewed distribution. Mean<median<mode for negatively skewed distribution. 12
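A small check of this ordering, using a hypothetical unimodal sample with a long right tail and its mirror image. The data are made up for illustration, and the ordering is a rule of thumb that need not hold for every skewed sample.

```python
import statistics

# Hypothetical positively skewed sample (long tail to the right).
skewed_right = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]
# Mirror image: negatively skewed (long tail to the left).
skewed_left = [15 - x for x in skewed_right]

def location(values):
    """Return (mean, median, mode) of a sample."""
    return (sum(values) / len(values),
            statistics.median(values),
            statistics.mode(values))

right = location(skewed_right)
left = location(skewed_left)
print("positively skewed: mean, median, mode =", right)
print("negatively skewed: mean, median, mode =", left)
```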

Serum Triglyceride Levels from Cord Blood of 282 Babies [Histogram: count against serum triglyceride level] 13

Log(Serum Triglyceride Levels) from Cord Blood of 282 Babies [Histogram: count against log(serum triglyceride) level] 14

Definition of the Geometric Mean The geometric mean of a sample of n values is determined by multiplying all the values together and taking the nth root (for only two values this is the more familiar square root). 15

Geometric Mean A common example of when the geometric mean is the correct choice of average is when averaging growth rates. Another method: take the log of each value, find the arithmetic mean and anti-log the result. exp((log(0.15) + … + log(1.66))/40) = 0.467
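Both routes to the geometric mean (nth root of the product, and anti-log of the mean log) can be compared directly. This is a minimal sketch on hypothetical positive values, not the triglyceride data, so it will not reproduce the 0.467 quoted above.

```python
import math
import statistics

# Hypothetical positive values; NOT the triglyceride data from the slides.
values = [0.15, 0.30, 0.45, 0.60, 1.66]
n = len(values)

# Method 1: multiply all the values together and take the nth root.
gm_root = math.prod(values) ** (1 / n)

# Method 2: take the log of each value, find the arithmetic mean,
# then anti-log the result.
gm_log = math.exp(sum(math.log(v) for v in values) / n)

# The standard library offers the same calculation (Python 3.8+).
gm_lib = statistics.geometric_mean(values)

print(gm_root, gm_log, gm_lib)
print("arithmetic mean:", statistics.mean(values))
```

The log route is usually preferred in practice because the product of many values can overflow or underflow, while the sum of logs stays well scaled.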

Serum Triglyceride Levels from Cord Blood of 282 Babies [Histogram: count against serum triglyceride level] Mean=0.506 Median=0.460 Geometric Mean=0.467 17

Why measures of variability are important Production of Aspirin: new production process of 100 mg tabs. Random sample from new process (mgs) - mean 99 mg. Random sample from old process (mgs) - mean 99 mg. Same means, but the new process is better because it is less variable. 18

Definition of Range The range of a sample of values is the largest value minus the smallest value. For the new process the range is 5. For the old process the range is 22. Range is simple, BUT – it only uses the min and max – it gets larger as sample size increases 19
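The range calculation can be illustrated as follows. The two samples are hypothetical: the original tablet weights did not survive in the transcript, so these values were chosen only so that both means are 99 mg and the ranges are 5 and 22, matching the summary figures quoted on the slides.

```python
# Hypothetical tablet weights (mg). These are NOT the original samples;
# they are made up to match the quoted means (99 mg) and ranges (5 and 22).
new_process = [96, 97, 98, 99, 99, 100, 100, 100, 100, 101]
old_process = [88, 91, 94, 97, 99, 100, 102, 104, 105, 110]

def sample_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

for name, sample in [("new", new_process), ("old", old_process)]:
    mean = sum(sample) / len(sample)
    print(f"{name} process: mean={mean:.0f} mg, range={sample_range(sample)} mg")
```

Identical means with very different ranges is the point of the aspirin example: a measure of location alone cannot distinguish the two processes.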

Definition of Inter-quartile Range The inter-quartile range of a sample of values is the difference between the upper and lower quartiles. The lower quartile is the value which is greater than ¼ of the sample and less than ¾ of the sample. Conversely, the upper quartile is the value which is greater than ¾ of the sample and less than ¼ of the sample. 20

Ordered Blood Glucose Values 1/4 of 40 = 10 3/4 of 40 = 30 21
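A sketch of the quartile calculation for 40 ordered values. The data are hypothetical, and the cut points come from Python's statistics.quantiles, whose default interpolation rule is only one of several conventions; other software may give slightly different quartiles for the same data.

```python
import statistics

# 40 hypothetical ordered values (3.0, 3.1, ..., 6.9); NOT the slide's data.
ordered = [round(3.0 + 0.1 * i, 1) for i in range(40)]

# quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
q1, q2, q3 = statistics.quantiles(ordered, n=4)
iqr = q3 - q1

# With 40 values, 1/4 of 40 = 10 and 3/4 of 40 = 30, so the quartiles
# fall at about the 10th and 30th ordered values.
print("10th and 11th values:", ordered[9], ordered[10], "-> Q1 =", q1)
print("30th and 31st values:", ordered[29], ordered[30], "-> Q3 =", q3)
print("inter-quartile range:", iqr)
```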

Inter-Quartile Range [Figure: histogram of count against blood glucose (mmol/litre), marking the lower quartile, upper quartile and inter-quartile range] 22

Standard deviation Neither measure (range or inter-quartile range) uses the numerical values - only relative magnitudes. A measure accounting for the values is the standard deviation. Consider the aspirin data from the new process (mean 99 mg): determine the deviations from the mean, then square, add, average and square-root. 23
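The recipe (deviations from the mean, then square, add, average, square-root) can be written out step by step. The weights are hypothetical values with mean 99 mg, not the slide's sample, and the "average" step here divides by n - 1, the usual convention for a sample standard deviation; that divisor is an assumption, since the transcript does not say which one was used.

```python
import math
import statistics

# Hypothetical tablet weights (mg) with mean 99; NOT the slide's sample.
weights = [96, 97, 98, 99, 99, 100, 100, 100, 100, 101]

mean = sum(weights) / len(weights)
deviations = [w - mean for w in weights]    # deviations from the mean
squared = [d ** 2 for d in deviations]      # square
total = sum(squared)                        # add
variance = total / (len(weights) - 1)       # average (n - 1 divisor: assumption)
sd = math.sqrt(variance)                    # square-root

print(f"sum of squares={total}, variance={variance:.3f}, sd={sd:.3f}")
print("library check:", statistics.stdev(weights))
```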

Measures of scatter/dispersion – ‘how variable the data are’ Range – smallest to biggest value – increases with sample size Standard deviation – measure of variation around the mean – affected by skewness and outliers Variance = square of standard deviation Interquartile range (IQR) – from 25th centile to 75th centile 24

Plotting Data: Histograms, Stem and Leaf Plots, Box Plots [Figure: stem and leaf plot of blood glucose (mmol/litre); Multiply Stem.Leaf by 10**] 25

Mean and standard deviation Best description if the distribution is reasonably symmetric (and has a single mode). Gives a full description if the data have a Normal distribution 26

[Figure: three distributions: Mean 3, s.d. 1; Mean 5, s.d. 1; Mean 5, s.d. 2] 27

Properties of Normal distribution Symmetric distribution – mean, median and mode equal Completely specified by mean and standard deviation 95% of distribution contained within mean ± 1.96 standard deviations 68% within mean ± 1 standard deviation 28
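The 95% and 68% figures can be checked numerically with the standard library's NormalDist. The mean of 5 and standard deviation of 2 in the second part are illustrative values only.

```python
from statistics import NormalDist

# Standard Normal distribution (mean 0, standard deviation 1).
z = NormalDist(mu=0, sigma=1)

# Probability of lying within 1.96 and within 1 standard deviation of the mean.
within_1_96 = z.cdf(1.96) - z.cdf(-1.96)
within_1 = z.cdf(1) - z.cdf(-1)
print(f"within mean ± 1.96 sd: {within_1_96:.4f}")
print(f"within mean ± 1 sd:    {within_1:.4f}")

# The same proportions hold for any mean and standard deviation,
# for example an illustrative mean of 5 and sd of 2.
d = NormalDist(mu=5, sigma=2)
lower, upper = d.mean - 1.96 * d.stdev, d.mean + 1.96 * d.stdev
coverage = d.cdf(upper) - d.cdf(lower)
print(f"95% range for mean 5, sd 2: {lower:.2f} to {upper:.2f}")
```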

Continuous data, not Normally distributed If symmetric use mean and standard deviation If skewed use median and IQR Unless Positively skewed, but log transformation creates symmetric distribution – use geometric mean 29

Nominal categorical data Mode. % in each category, especially when binary. [Table: Wheeze in last 12 months; columns Frequency (n) and %; rows No, Yes, Total]
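A frequency and percentage summary of a binary variable might be produced as below. The responses are hypothetical, since the counts in the wheeze table were not recoverable from the transcript.

```python
from collections import Counter

# Hypothetical answers to "wheeze in last 12 months"; NOT the slide's counts.
wheeze = ["No"] * 7 + ["Yes"] * 3

counts = Counter(wheeze)
total = sum(counts.values())
percentages = {category: 100 * n / total for category, n in counts.items()}
mode = counts.most_common(1)[0][0]   # most frequent category

for category, n in counts.items():
    print(f"{category:5s} {n:3d} {percentages[category]:5.1f}%")
print(f"Total {total:3d} 100.0%")
print("mode:", mode)
```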

Ordinal categorical data Median and IQR if enough separate values. Otherwise as for nominal. 31

Discrete quantitative data As for continuous data if many values, as for ordinal data if fewer.

33 Difference Between Standard Deviation & Standard Error

34 Measure of Variability of the Sample Mean Range, inter-quartile range and standard deviation relate to population (sample) not mean. To understand the difference carry out a sampling experiment using the Ritchie Index values

35 Values of the Ritchie Index (Measure of Joint Stiffness) in 50 Untreated Patients Mean = (14+…+21)/50 = 12.18

Values of the Ritchie Index Arithmetic Mean - outlier prone Median - only uses relative magnitudes Mode - not necessarily central (categorical data) Location = Central Tendency

37 Sampling Experiment Take a random sample (10) from the 50 values Determine the mean of the 10 values Repeat 50 times These means show variation - HOW LARGE IS IT ?
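The sampling experiment can be simulated as a rough sketch. The 50 "Ritchie Index" values here are randomly generated stand-ins (integers from 0 to 30), an assumption made because the original values are not in the transcript, so the numbers printed will not match the slides.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical stand-in for the 50 Ritchie Index values; NOT the original data.
population = [rng.randint(0, 30) for _ in range(50)]

sample_means = []
for _ in range(50):                        # repeat 50 times
    sample = rng.sample(population, 10)    # random sample of 10 from the 50 values
    sample_means.append(sum(sample) / 10)  # mean of the 10 values

sd_values = statistics.stdev(population)   # spread of the original values
sd_means = statistics.stdev(sample_means)  # spread of the sample means

print(f"original values: mean={statistics.mean(population):.2f} sd={sd_values:.2f}")
print(f"sample means:    mean={statistics.mean(sample_means):.2f} sd={sd_means:.2f}")
```

The sample means vary from sample to sample, but much less than the individual values do; the standard deviation of those means answers the question of how large the variation is.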

38 Variations in Samples [Figure: five histograms of values of the Ritchie Index] Mean=12.18 Mean=10.00 Mean=12.60 Mean=13.40 Mean=11.50

39 Ritchie Values [Figure: values of the Ritchie Index] Original values (mean ; sd )

40 Ritchie Values Sampling Experiment – Sample Means [Figure: values of the Ritchie Index] Sample means (mean ; sd ) Original values (mean ; sd )

41 Definition of the Standard Error The standard deviation of the sampling distribution of the mean is called the standard error of the mean.

42 Increasing Sample Size Increased precision (smaller standard error) Less skewness [Figure: sample means of the Ritchie Index for n=10 and n=15, each with sample means (mean ; sd )]

43 Standard error of the mean as a function of the sample size [Figure: standard error of the mean plotted against sample size]
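The relationship plotted on this slide is commonly estimated with the standard formula, standard error = sd / sqrt(n), for independent observations. The transcript does not state the formula, so the sketch below is an assumption about what the plot shows; the sd of 8.0 and the helper name standard_error are illustrative.

```python
import math

def standard_error(sd, n):
    """Estimated standard error of the mean: sd divided by sqrt(n)."""
    return sd / math.sqrt(n)

sd = 8.0  # hypothetical standard deviation of the individual values

for n in (10, 15, 50, 100, 400):
    print(f"n={n:4d}  standard error={standard_error(sd, n):.2f}")
```

Because of the square root, quadrupling the sample size only halves the standard error, which is why the curve flattens as the sample size grows.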

44 Population of Gene Lengths n=20, [Histogram: frequency against gene length (# of nucleotides)]

45 Samples of size : n= [Histogram: frequency against gene length (# of nucleotides)]

46 Practical Confusion A mean is often reported in medical papers as ± 1.37. What is 1.37: sd or se?

Thanks! Tea break