Bios 101 Lecture 4: Descriptive Statistics Shankar Viswanathan, DrPH. Division of Biostatistics Department of Epidemiology and Population Health Albert.


1 Bios 101 Lecture 4: Descriptive Statistics Shankar Viswanathan, DrPH. Division of Biostatistics Department of Epidemiology and Population Health Albert Einstein College of Medicine, NY October 25, 2011

2 Definition: Statistics is a science of variation. – It involves the collection, classification, analysis, and interpretation of data. – Biostatistics is the segment of statistics that deals with data arising from the biological sciences, especially medicine and population-based experiments.

3 Variables: Independent, Dependent. Scales: – Nominal (e.g., sex, race, study group) – Ordinal (e.g., severity of disease, attitude, birth order) – Continuous (e.g., height, age, forced expiratory volume)

4 Measurement scales – Nominal: Numbers or text representing unordered categories (e.g., 0 = male, 1 = female). – Ordinal: Numbers or text representing categories where order counts (e.g., grade of cancer: 1 = Grade I, 2 = Grade II, 3 = Grade III). – Continuous: Numerical data where any conceivable value is, in theory, attainable (e.g., height, weight, FEV).
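
The three scales map onto R data types roughly as follows. This is a minimal sketch; the values and the variable names (`sex`, `grade`, `height_cm`) are hypothetical and not taken from the lecture.

```r
# Hypothetical values for illustration only (not from the lecture)
sex <- factor(c("male", "female", "female", "male"))           # nominal
grade <- factor(c("I", "III", "II", "I"),
                levels = c("I", "II", "III"), ordered = TRUE)  # ordinal
height_cm <- c(172.5, 160.1, 165.0, 180.3)                     # continuous

levels(sex)            # alphabetical by default; the order carries no meaning
grade < "III"          # comparisons are defined only for ordered factors
is.numeric(height_cm)  # TRUE
```

Declaring the ordinal variable with `ordered = TRUE` is what lets R compare grades; an unordered factor would return `NA` with a warning for the same comparison.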

5 Summary of Data: Two distinct steps in processing the data – to describe the sample by means of descriptive statistics – to infer that the results observed can be generalized to other samples or the population (inferential statistics). Descriptive measures: – Nominal/ordinal: Frequencies, Percentages, Proportions – Continuous: Measures of location, Measures of spread
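
For nominal or ordinal variables, the frequencies and proportions mentioned above could be obtained along these lines. The `group` vector is made-up data used only to illustrate the calls.

```r
# Hypothetical study-group labels (not from the lecture)
group <- c("treatment", "control", "treatment", "treatment", "control")

counts <- table(group)        # frequencies per category
props  <- prop.table(counts)  # proportions (sum to 1)
pct    <- round(100 * props, 1)

counts
props
pct
```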

6 Descriptive Statistics: Graphical and numerical approaches to summarizing data. Measures of location: – Arithmetic mean – Median – Mode

7 Measures of location – Arithmetic mean: the most frequently used measure of location. The mean is calculated by summing all the observations in a set of data and dividing by the total number of observations.

8 Example: Mean. Listed are the initial measurements of forced expiratory volume in 1 second (FEV1) for the 13 subjects involved in the study.
Subject        1     2     3     4     5     6     7     8     9     10    11    12    13
FEV1 (liters)  2.30  2.15  3.50  2.60  2.75  2.82  4.05  2.25  2.68  3.00  4.02  2.85  3.38
> fev <- c(2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05, 2.25, 2.68, 3.00, 4.02, 2.85, 3.38)
> mean(fev)
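
The definition of the mean (sum of the observations divided by their number) can be checked by hand against `mean()`. A small sketch using the FEV1 values from the slide; `mean_by_hand` is a name introduced here for illustration.

```r
fev <- c(2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
         2.25, 2.68, 3.00, 4.02, 2.85, 3.38)

n <- length(fev)              # 13 observations
mean_by_hand <- sum(fev) / n  # 38.35 / 13

mean_by_hand                  # 2.95 liters
all.equal(mean_by_hand, mean(fev))
```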

9 Measures of location – Median: defined as the 50th percentile of a set of measurements. If a list of observations is ranked from the smallest to the largest, then half the values are greater than or equal to the median, and the other half are less than or equal to it. If n is odd, the median is the [(n+1)/2]th ranked value. If n is even, the median is the average of the two middlemost values.

10 Example: Median. Listed are the initial measurements of forced expiratory volume in 1 second (FEV1) for the 13 subjects involved in the study.
Subject        1     2     3     4     5     6     7     8     9     10    11    12    13
FEV1 (liters)  2.30  2.15  3.50  2.60  2.75  2.82  4.05  2.25  2.68  3.00  4.02  2.85  3.38
Arrange the data in ascending order: 2.15, 2.25, 2.30, 2.60, 2.68, 2.75, 2.82, 2.85, 3.00, 3.38, 3.50, 4.02, 4.05
Find the [(n+1)/2]th value, i.e. the [(13+1)/2]th = 7th value, which is 2.82.
> median(fev)
> fev1 <- fev[1:12]
> sort(fev1)
> median(fev1)
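
The ranking rule for odd and even n can be written out explicitly. The helper `median_by_rank` below is a hypothetical function introduced for illustration; in practice the built-in `median()` does the same job.

```r
fev <- c(2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
         2.25, 2.68, 3.00, 4.02, 2.85, 3.38)

# Hypothetical helper: median via the ranked position(s)
median_by_rank <- function(x) {
  s <- sort(x)
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                 # odd n: the middle value
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2  # even n: average of the two middle values
  }
}

median_by_rank(fev)        # n = 13: 7th ranked value, 2.82
median_by_rank(fev[1:12])  # n = 12: average of 2.75 and 2.82, 2.785
```

Dropping the 13th subject, as the slide does with `fev[1:12]`, is simply a way to get an even-sized sample for the second case.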

11 Measures of location – Mode: can be used as a summary measure for all types of data. The mode of a set of values is the observation that occurs most frequently. It is mostly used for nominal scale variables.
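
Base R has no built-in function for the statistical mode (`mode()` returns the storage type of an object instead), so one possible sketch tabulates the values and picks the most frequent. Both `stat_mode` and the `blood_group` data are hypothetical and introduced only for illustration.

```r
# Hypothetical helper: most frequent value(s); ties return all tied values
stat_mode <- function(x) {
  counts <- table(x)
  names(counts)[counts == max(counts)]
}

# Hypothetical nominal data (not from the lecture)
blood_group <- c("A", "O", "B", "O", "AB", "O", "A")

table(blood_group)
stat_mode(blood_group)   # "O"
```

Note that the helper returns the mode as a character string, even for numeric input, because it reads the category labels from the table.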

12 Measures of Dispersion or Spread Most common measures of spread (variability) of the data are – Variance – Standard deviation – Range – Interquartile range

13 Measures of Spread – Variance: the average of the squared deviations of the observations from the mean. – Standard deviation: the square root of the variance. It is attractive because it is expressed in the same units as the mean.
> var(fev)
> sd(fev)
> sqrt(var(fev))
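
One detail worth spelling out: R's `var()` and `sd()` compute the sample variance, which divides the sum of squared deviations by n - 1 rather than n. A sketch on the FEV1 data; `var_by_hand` and `sd_by_hand` are names introduced here.

```r
fev <- c(2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
         2.25, 2.68, 3.00, 4.02, 2.85, 3.38)

n   <- length(fev)
dev <- fev - mean(fev)                # deviations from the mean

var_by_hand <- sum(dev^2) / (n - 1)   # sample variance (n - 1 denominator)
sd_by_hand  <- sqrt(var_by_hand)      # same units as the data (liters)

all.equal(var_by_hand, var(fev))
all.equal(sd_by_hand, sd(fev))
```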

14 Measures of Spread – Range: the difference between the largest and the smallest observations; it can also be reported as (minimum, maximum). FEV data: Range = 4.05 - 2.15 = 1.90 liters. – Interquartile range: calculated by subtracting the 25th percentile from the 75th percentile; it encompasses the middle 50% of the data. 25th percentile = [(n+1)/4]th value; 75th percentile = [3(n+1)/4]th value. FEV data: Interquartile range = 3.38 - 2.60 = 0.78 liters.
> range(fev)
> stem(fev)
> summary(fev)
> boxplot(fev)
> boxplot(fev, main="FEV data", horizontal=TRUE)
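
The quartiles 2.60 and 3.38 used for the FEV data are what R's default quantile method (type 7) reports. The [(n+1)/4]th-value rule corresponds to a different convention (`type = 6` in `quantile()`), which gives slightly different quartiles for these 13 values. A sketch comparing the two:

```r
fev <- c(2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
         2.25, 2.68, 3.00, 4.02, 2.85, 3.38)

diff(range(fev))                                # 4.05 - 2.15 = 1.90

q7 <- quantile(fev, c(0.25, 0.75))              # R default (type 7): 2.60, 3.38
IQR(fev)                                        # 0.78

q6 <- quantile(fev, c(0.25, 0.75), type = 6)    # (n+1)p rule: 2.45, 3.44
unname(diff(q6))                                # 0.99
```

With n = 13, the (n+1)/4 rule asks for the 3.5th and 10.5th ranked values, which are interpolated halfway between neighbors. Neither answer is wrong, but it helps to state which convention was used when reporting an interquartile range.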

15 Standard Deviation vs. Standard Error – Standard error: the standard deviation of the distribution of sample means is known as the standard error. It is calculated by dividing the standard deviation of the sample by the square root of the number of observations.
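
Base R has no dedicated standard-error function, so the calculation described above is usually written out directly. A minimal sketch on the FEV1 data; `se` is a name introduced here.

```r
fev <- c(2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
         2.25, 2.68, 3.00, 4.02, 2.85, 3.38)

n  <- length(fev)
se <- sd(fev) / sqrt(n)   # standard error of the mean

sd(fev)   # spread of the individual observations
se        # precision of the sample mean; smaller than the SD by a factor of sqrt(n)
```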

16 Guidelines for reporting descriptive statistics
– Report all numbers with the appropriate degree of precision (two digits).
– When reporting percentages, always give the numerators and denominators of the calculation.
– Specify the denominators of rates, ratios, proportions, and percentages.
– Provide appropriate measures of central tendency and dispersion: for approximately normally distributed data, the mean and SD; for other distributions, the median, range, and interquartile range.
– Do NOT summarize continuous data with the mean and the standard error of the mean.
– Avoid using percentages to summarize small samples.
(Lang TA and Secic M, 2006)
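
As one way these guidelines might translate into practice, the sketch below formats summaries of the FEV1 data to two decimal places and reports a count with its denominator. The 3-liter cutoff and the output wording are assumptions made for illustration, not part of the lecture.

```r
fev <- c(2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
         2.25, 2.68, 3.00, 4.02, 2.85, 3.38)

# Mean with SD (not the standard error), two decimal places
mean_line <- sprintf("FEV1, mean (SD): %.2f (%.2f) liters", mean(fev), sd(fev))

# Median with the quartiles (R's default quantile method)
q <- quantile(fev, c(0.25, 0.75))
median_line <- sprintf("FEV1, median (IQR): %.2f (%.2f to %.2f) liters",
                       median(fev), q[1], q[2])

# Hypothetical cutoff: subjects with FEV1 above 3 liters.
# Numerator and denominator are given; no percentage, since n is small.
k <- sum(fev > 3)
count_line <- sprintf("FEV1 > 3 liters: %d of %d subjects", k, length(fev))

mean_line
median_line
count_line
```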

