Introduction to Bio-Medical statistics Vikas Dhikav, Department of Neurology, Dr. RML Hospital & PGIMER, New Delhi
Statistics: Collection, Interpretation & Analysis (CIA) of data
“Facts are many, but truth is one” - Rabindranath Tagore
“There are no fundamental differences between Eastern and Western minds” - JC Bose, 1917, “father of radio science”
Data type
Discrete versus continuous data. Discrete: whole numbers. Continuous: any value.
Types: Nominal, Ordinal, Interval, Ratio. Nominal: categories with no order, e.g. yes/no ("What's there in a name?"). Ordinal: ordered categories, e.g. good, better, best. Interval: the difference between two values is meaningful. Ratio: the ratio between values is meaningful, e.g. 10 is 2 times 5.
Ordinal scale
Interval scale: the difference between values is meaningful. Examples: time, temperature.
Ratio scale: the perfect type of data; measures the ratio between two quantities.
Measure it! “If you cannot describe a phenomenon in quantitative terms, your knowledge is unsatisfactory and meager” - Lord Kelvin
Population & sample. Population: also called the study universe. Sample: a small group drawn from the population.
Data: the numbers or measurements recorded. Variables: the characters or attributes being measured.
Describe data: done before analysis, by tabulation or plotting. Options: stem and leaf plot, histogram, bar chart, dot plot, scatter plot.
Stem and leaf plot: useful in knowing the shape of the distribution; retains the original data; also shows outliers; tells about the mode.
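As a minimal sketch of how a stem and leaf plot retains the original values, the snippet below splits each number into a stem (tens digit) and a leaf (units digit). It reuses the resident duty-hours data from the worked example later in this deck; the helper name `stem_and_leaf` is only illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group values by stem (tens digit); leaves are the units digits."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)
    return dict(rows)

# Duty hours of 10 resident doctors (data from the worked example in this deck).
hours = [12, 12, 14, 15, 16, 18, 18, 20, 20, 25]

plot = stem_and_leaf(hours)
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
# prints:
# 1 | 2 2 4 5 6 8 8
# 2 | 0 0 5
```

Every original value can be read back from the plot (stem 1, leaf 2 is 12), and repeated leaves on a row point to repeated values such as 12, 18 and 20.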
Histogram (histo = upright): graphical distribution of tabulated frequencies; used for continuous data; numbers get converted into ranges (bins).
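The step "numbers get converted into ranges" can be sketched as below. The bin width of 5 hours is an assumption chosen only for illustration, and the data are the duty hours from the worked example later in this deck.

```python
from collections import Counter

# Duty hours of 10 resident doctors (data from the worked example in this deck).
hours = [12, 12, 14, 15, 16, 18, 18, 20, 20, 25]

BIN_WIDTH = 5  # assumed bin width, not taken from the slides

def bin_start(value, width=BIN_WIDTH):
    """Return the lower edge of the bin that contains value."""
    return (value // width) * width

# Count how many observations fall into each range.
counts = Counter(bin_start(h) for h in hours)

for start in sorted(counts):
    end = start + BIN_WIDTH - 1
    print(f"{start}-{end}: {'#' * counts[start]}")
# prints:
# 10-14: ###
# 15-19: ####
# 20-24: ##
# 25-29: #
```

Unlike the stem and leaf plot, the individual values are no longer visible once they are binned; only the frequency in each range remains.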
Bar chart: more than 200 years old; used for categorical data; one axis shows categories, the other shows values; can be horizontal or vertical. Spaces between bars: bar chart; no spaces: histogram.
Dot plot: dots plotted on a simple scale; used for moderate-size data, mostly continuous; described about 100 years ago as a hand-drawn graph; clusters, gaps and outliers can be highlighted.
Dot plot example
Box and whisker plot: depicts numerical data via quartiles; tells about range, median, skewness and outliers; the data distribution can be known.
Example: The oldest person in the Neurology OPD of Dr. RML Hospital is 90 years old. The youngest person is 15. The median age is 44, while the lower quartile is 25 and the upper quartile is 67.
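A small sketch of what can be read off this five-number summary. The 1.5 x IQR fence used to flag outliers is a widely used convention and an assumption here; the slide itself does not state which outlier rule is intended.

```python
# Five-number summary from the Neurology OPD example above.
minimum, q1, median, q3, maximum = 15, 25, 44, 67, 90

data_range = maximum - minimum   # 75 years
iqr = q3 - q1                    # interquartile range: 42 years

# Assumed convention: points beyond 1.5 * IQR from the box are outliers.
lower_fence = q1 - 1.5 * iqr     # -38.0
upper_fence = q3 + 1.5 * iqr     # 130.0
has_outliers = minimum < lower_fence or maximum > upper_fence

# Rough skewness hint: compare the two whiskers.
lower_whisker = q1 - minimum     # 10 years
upper_whisker = maximum - q3     # 23 years

print(f"range={data_range}, IQR={iqr}, outliers={has_outliers}")
print(f"whiskers: lower={lower_whisker}, upper={upper_whisker}")
```

Under these assumptions no age falls outside the fences, and the longer upper whisker hints at a tail towards older ages.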
Tests of normality: It is important to know the data distribution right from the beginning. In descriptive statistics: goodness of fit. In inferential statistics: null hypothesis.
What is normality?
Tests… Descriptive: histogram, box and whisker plot, kurtosis. Inferential: Kolmogorov-Smirnov test.
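As a rough sketch of the idea behind the Kolmogorov-Smirnov test, the snippet below computes the largest gap D between the sample's cumulative distribution and a normal curve fitted to the same sample. It uses only the Python standard library and the duty-hours data from the worked example later in this deck. Two assumptions to note: the critical value 1.36/sqrt(n) is a large-sample approximation for the 5% level, and estimating the mean and SD from the same data makes that cut-off conservative, so this is an illustration rather than a full test.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def ks_statistic(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # Gap just after the step at x, and gap just before it.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Duty hours of 10 resident doctors (data from the worked example in this deck).
hours = [12, 12, 14, 15, 16, 18, 18, 20, 20, 25]

d = ks_statistic(hours)
critical = 1.36 / sqrt(len(hours))  # assumed approximate 5% critical value

print(f"D = {d:.3f}, approximate critical value = {critical:.3f}")
print("No evidence against normality" if d < critical else "Question normality")
```

A D smaller than the critical value means the null hypothesis (the data follow a normal distribution) is not rejected.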
Rules of thumb for normality: When n is greater than 30, these rules give a good approximation to the results of more sensitive tests. If the standard deviation is more than 3 points from the mean, question the normality of the data.
Methods of dispersion: range, standard deviation (S.D.), coefficient of variation (C.V.).
Methods… Range: the highest value minus the lowest value.
Calculating the standard deviation from the range: the standard deviation is approximately 1/4th of the range, i.e. (Maximum - Minimum)/4.
Example: Number of hours of duty of resident doctors in RMLH (n = 10): 12, 12, 14, 15, 16, 18, 18, 20, 20, 25. Mean = 17. Standard deviation = 4.1. Range = 25 - 12 = 13. Estimated standard deviation from the range = 13/4 = 3.25.
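The arithmetic in this example can be checked with a few lines of Python. This is a sketch using the standard library's `statistics` module; `stdev` is the sample standard deviation (dividing by n - 1), which is what reproduces the 4.1 quoted on the slide.

```python
from statistics import mean, stdev

# Duty hours of the 10 resident doctors from the example above.
hours = [12, 12, 14, 15, 16, 18, 18, 20, 20, 25]

mean_hours = mean(hours)               # 17
sd = stdev(hours)                      # sample SD, about 4.1
data_range = max(hours) - min(hours)   # 25 - 12 = 13
sd_from_range = data_range / 4         # range rule of thumb: 3.25

print(f"mean = {mean_hours}, SD = {sd:.1f}")
print(f"range = {data_range}, SD estimated from range = {sd_from_range}")
```

The range-based figure (3.25) is only a quick estimate; here it is somewhat lower than the calculated standard deviation (4.1).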
Variance: the squared value of the standard deviation (s²); of limited practical value because it is in squared units; the standard deviation is used more commonly.
Standard deviation: the most widely used measure of dispersion; the square root of the variance; depicts the average deviation from the mean; expressed in the same units as the data.
Standard deviation… A low standard deviation indicates that the data are close to the mean, while a high standard deviation indicates that the data are spread far from the mean.
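To make the link between variance and standard deviation concrete, here is a step-by-step sketch using the duty-hours data from the worked example in this deck. It assumes the sample formula (dividing by n - 1).

```python
from math import sqrt

# Duty hours of 10 resident doctors (data from the worked example in this deck).
hours = [12, 12, 14, 15, 16, 18, 18, 20, 20, 25]

n = len(hours)
mean_hours = sum(hours) / n                       # 17.0

# Step 1: squared deviation of every observation from the mean.
squared_devs = [(h - mean_hours) ** 2 for h in hours]
sum_sq = sum(squared_devs)                        # 148.0

# Step 2: variance = sum of squared deviations / (n - 1), in hours squared.
variance = sum_sq / (n - 1)

# Step 3: standard deviation = square root of variance, back in hours.
sd = sqrt(variance)

print(f"variance = {variance:.2f}, SD = {sd:.2f}")
```

The variance comes out in squared hours, which is hard to interpret; taking the square root returns the result to hours, the same unit as the data.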
Chebyshev rule: at least 3/4 of the data lie within 2 standard deviations of the mean, whatever the shape of the distribution.
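The 3/4 figure comes from the bound 1 - 1/k² with k = 2. A quick sketch checking it against the duty-hours data from the worked example in this deck (sample standard deviation assumed):

```python
from statistics import mean, stdev

# Duty hours of 10 resident doctors (data from the worked example in this deck).
hours = [12, 12, 14, 15, 16, 18, 18, 20, 20, 25]

k = 2                              # number of standard deviations
m, s = mean(hours), stdev(hours)

# Chebyshev's bound: at least 1 - 1/k^2 of the data lie within k SDs.
chebyshev_bound = 1 - 1 / k**2     # 0.75

within = [h for h in hours if abs(h - m) <= k * s]
fraction_within = len(within) / len(hours)

print(f"guaranteed: {chebyshev_bound:.2f}, observed: {fraction_within:.2f}")
```

The bound is a guaranteed minimum, so the observed fraction is usually higher; for these data every observation falls within 2 standard deviations.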
Standard error (SE): The standard deviation tells about the variability of the data and hence their spread; for normally distributed data, about 95% of observations lie within 2 standard deviations of the mean. The standard error tells how different our sample is likely to be from the population.
SE… The standard error is the accuracy with which the sample represents the population; it measures how far the sample mean is likely to deviate from the actual (population) mean. SE = SD/√n. The smaller the SE, the better the estimate; the larger the sample size, the smaller the SE.
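A short sketch of the standard error of the mean, SE = SD/sqrt(n), using the duty-hours data from the worked example in this deck. The second calculation is hypothetical: it assumes a sample four times larger with the same standard deviation, only to show the effect of sample size.

```python
from math import sqrt
from statistics import stdev

# Duty hours of 10 resident doctors (data from the worked example in this deck).
hours = [12, 12, 14, 15, 16, 18, 18, 20, 20, 25]

sd = stdev(hours)
n = len(hours)
se = sd / sqrt(n)                  # standard error of the mean

# Hypothetical: same SD, but 40 residents instead of 10.
se_n40 = sd / sqrt(40)

print(f"SE with n={n}: {se:.2f}")
print(f"SE with n=40 (same SD assumed): {se_n40:.2f}")
```

Because SE shrinks with the square root of n, quadrupling the sample size halves the standard error.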
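Pearson index of skewness: subtract the median from the mean, multiply by 3, and divide by the standard deviation, i.e. 3 x (mean - median)/SD. Values between -1 and +1 indicate no marked skewness; a value below -1 means left skewness; a value above +1 means right skewness.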
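The index can be sketched in a few lines. The first dataset is the duty-hours data from the worked example in this deck; the second is a hypothetical set with one large value, included only to show the sign of the index when there is a right tail.

```python
from statistics import mean, median, stdev

def pearson_skewness(values):
    """Pearson index: 3 * (mean - median) / standard deviation."""
    return 3 * (mean(values) - median(values)) / stdev(values)

# Duty hours of 10 resident doctors (data from the worked example in this deck).
hours = [12, 12, 14, 15, 16, 18, 18, 20, 20, 25]
skew_hours = pearson_skewness(hours)    # mean 17, median 17, so index is 0.0

# Hypothetical data with a long right tail (illustrative values only).
right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]
skew_right = pearson_skewness(right_tailed)

print(f"duty hours: {skew_hours:.2f}")
print(f"right-tailed example: {skew_right:.2f}")
```

For the duty-hours data the mean and median coincide, so the index is zero; in the hypothetical set the large value pulls the mean above the median and the index turns positive.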
Coefficient of variation: CV = (SD/mean) x 100. Expresses the standard deviation as a percentage of the mean; can help in comparing the variability of different datasets.
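A minimal sketch of the formula applied to the duty-hours data from the worked example in this deck; the function name is illustrative and the sample standard deviation is assumed.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV as a percentage: (SD / mean) * 100."""
    return stdev(values) / mean(values) * 100

# Duty hours of 10 resident doctors (data from the worked example in this deck).
hours = [12, 12, 14, 15, 16, 18, 18, 20, 20, 25]

cv = coefficient_of_variation(hours)
print(f"CV = {cv:.1f}%")
```

Because the CV has no units, it allows the spread of datasets measured on different scales (for example hours and kilograms) to be compared directly.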
Thank you!