INTRODUCTION TO BIOSTATISTICS Dr. S. Shaffi Ahamed, Asst. Professor, Dept. of Family and Comm. Medicine, KKUH
This session covers: Background and need to know Biostatistics; Definition of Statistics and Biostatistics; Types of data; Graphical representation of data; Frequency distribution of data
Basis
Dynamic nature of the Universe: the continuous change in Nature brings - uncertainty and - variability in each and every sphere of the Universe
We can by no means control or overpower the factor of uncertainty, but we are capable of measuring it in terms of probability
Sources of Medical Uncertainties Natural variation due to biological, environmental and sampling factors Natural variation among methods, observers, instruments etc. Errors in measurement or assessment or errors in knowledge Incomplete knowledge
Biostatistics is the science which helps in managing health care uncertainties
“Statistics is the science which deals with collection, classification and tabulation of numerical facts as the basis for explanation, description and comparison of phenomenon”. ------ Lovitt
“BIOSTATISTICS” (1) Statistics arising out of biological sciences, particularly from the fields of medicine and public health. (2) The methods used in dealing with statistics in the fields of medicine, biology and public health for planning, conducting and analyzing data which arise in investigations of these branches.
Reasons to know about biostatistics: Medicine is becoming increasingly quantitative. The planning, conduct and interpretation of much of medical research are becoming increasingly reliant on the statistical methodology. Statistics pervades the medical literature.
CLINICAL MEDICINE Documentation of medical history of diseases. Planning and conduct of clinical studies. Evaluating the merits of different procedures. In providing methods for definition of “normal” and “abnormal”.
PREVENTIVE MEDICINE To provide the magnitude of any health problem in the community. To find out the basic factors underlying ill-health. To evaluate the health programs that were introduced in the community (success/failure). To introduce and promote health legislation.
BASIC CONCEPTS Data: a set of values of one or more variables recorded on one or more observational units. Sources of data: 1. Routinely kept records 2. Surveys (census) 3. Experiments 4. External sources. Categories of data: 1. Primary data: observation, questionnaire, record form, interviews, survey 2. Secondary data: census, medical record, registry
TYPES OF DATA QUALITATIVE DATA DISCRETE QUANTITATIVE CONTINUOUS QUANTITATIVE
QUALITATIVE Nominal Example: Sex ( M, F) Exam result (P, F) Blood Group (A,B, O or AB) Color of Eyes (blue, green, brown, black)
ORDINAL Example: Response to treatment (poor, fair, good) Severity of disease (mild, moderate, severe) Income status (low, middle, high)
QUANTITATIVE (DISCRETE) Example: The no. of family members The no. of heart beats The no. of admissions in a day QUANTITATIVE (CONTINUOUS) Example: Height, Weight, Age, BP, Serum Cholesterol and BMI
Discrete data -- Gaps between possible values Number of Children Continuous data -- Theoretically, no gaps between possible values Hb
Scale of measurement Qualitative variable: A categorical variable Nominal (classificatory) scale - gender, marital status, race Ordinal (ranking) scale - severity scale, good/better/best
Scale of measurement Quantitative variable: A numerical variable: discrete; continuous Interval scale: Data are placed in meaningful intervals and order. The units of measurement are arbitrary. - Temperature (the differences 37 ºC − 36 ºC and 38 ºC − 37 ºC are equal), with no implication of ratio (30 ºC is not twice as hot as 15 ºC)
Ratio scale: Data is presented in frequency distribution in logical order. A meaningful ratio exists. - Age, weight, height, pulse rate - pulse rate of 120 is twice as fast as 60 - person with weight of 80kg is twice as heavy as the one with weight of 40 kg.
Scales of Measure Nominal – qualitative classification of equal value: gender, race, color, city Ordinal - qualitative classification which can be rank ordered: socioeconomic status of families Interval - Numerical or quantitative data: can be rank ordered and sizes compared : temperature Ratio - Quantitative interval data along with ratio: time, age. Nominal variables allow for only qualitative classification. That is, they can be measured only in terms of whether the individual items belong to some distinctively different categories, but we cannot quantify or even rank order those categories. For example, all we can say is that 2 individuals are different in terms of variable A (e.g., they are of different race), but we cannot say which one "has more" of the quality represented by the variable. Typical examples of nominal variables are gender, race, color, city, etc. Ordinal variables allow us to rank order the items we measure in terms of which has less and which has more of the quality represented by the variable, but still they do not allow us to say "how much more." A typical example of an ordinal variable is the socioeconomic status of families. For example, we know that upper-middle is higher than middle but we cannot say that it is, for example, 18% higher. Also this very distinction between nominal, ordinal, and interval scales itself represents a good example of an ordinal variable. For example, we can say that nominal measurement provides less information than ordinal measurement, but we cannot say "how much less" or how this difference compares to the difference between ordinal and interval scales. Interval variables allow us not only to rank order the items that are measured, but also to quantify and compare the sizes of differences between them. For example, temperature, as measured in degrees Fahrenheit or Celsius, constitutes an interval scale. 
We can say that a temperature of 40 degrees is higher than a temperature of 30 degrees, and that an increase from 20 to 40 degrees is twice as much as an increase from 30 to 40 degrees. Ratio variables are very similar to interval variables; in addition to all the properties of interval variables, they feature an identifiable absolute zero point, thus they allow for statements such as x is two times more than y. Typical examples of ratio scales are measures of time or space. For example, as the Kelvin temperature scale is a ratio scale, not only can we say that a temperature of 200 degrees is higher than one of 100 degrees, we can correctly state that it is twice as high. Interval scales do not have the ratio property. Most statistical data analysis procedures do not distinguish between the interval and ratio properties of the measurement scales.
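The interval versus ratio distinction can be checked with a little arithmetic. The sketch below is illustrative only: the helper name `celsius_to_kelvin` is an assumption introduced here, not part of the lecture. It takes the 30 ºC and 15 ºC readings used earlier and compares the naive Celsius ratio with the ratio on the Kelvin scale, which has an absolute zero.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert a Celsius reading to kelvin (a ratio scale with an absolute zero)."""
    return celsius + 273.15


# Differences are meaningful on an interval scale: both gaps are one degree.
diff_a = 37 - 36
diff_b = 38 - 37

# The naive ratio of Celsius readings is NOT meaningful.
naive_ratio = 30 / 15

# The ratio of the same two temperatures expressed in kelvin is meaningful.
kelvin_ratio = celsius_to_kelvin(30) / celsius_to_kelvin(15)

print(f"Equal differences: {diff_a == diff_b}")
print(f"Naive Celsius ratio: {naive_ratio:.2f}")
print(f"Ratio in kelvin:     {kelvin_ratio:.3f}")
```

On the Kelvin scale the warmer reading is only about 5% higher, which is why 30 ºC cannot be described as "twice as hot" as 15 ºC.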
CONTINUOUS DATA converted to QUALITATIVE DATA wt. (in kg): underweight, normal & overweight Ht. (in cm): short, medium & tall
Table 1 Distribution of blunt injured patients according to hospital length of stay
CLINIMETRICS A science called clinimetrics in which qualities are converted to meaningful quantities by using the scoring system. Examples: (1) Apgar score based on appearance, pulse, grimace, activity and respiration is used for neonatal prognosis. (2) Smoking Index: no. of cigarettes, duration, filter or not, whether pipe, cigar etc., (3) APACHE( Acute Physiology and Chronic Health Evaluation) score: to quantify the severity of condition of a patient
INVESTIGATION
Frequency Distributions “A Picture is Worth a Thousand Words”
Frequency Distributions What is a frequency distribution? A frequency distribution is an organization of raw data in tabular form, using classes (or intervals) and frequencies. What is a frequency count? The frequency or the frequency count for a data value is the number of times the value occurs in the data set.
Frequency Distributions data distribution – pattern of variability. the center of a distribution the ranges the shapes simple frequency distributions grouped & ungrouped frequency distributions
Categorical or Qualitative Frequency Distributions What is a categorical frequency distribution? A categorical frequency distribution represents data that can be placed in specific categories, such as gender, blood group, & hair color, etc.
Categorical or Qualitative Frequency Distributions -- Example Example: The blood types of 25 blood donors are given below. Summarize the data using a frequency distribution. AB B A O B O B O A O B O B B B A O AB AB O A B AB O A
Categorical Frequency Distribution for the Blood Types -- Example Continued Note: The classes for the distribution are the blood types.
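As a minimal sketch of how the categorical frequency distribution for the 25 donors could be tallied, the snippet below counts each blood type with Python's standard `collections.Counter`. The variable names are assumptions chosen for illustration.

```python
from collections import Counter

# Blood types of the 25 donors, in the order listed in the example.
donors = "AB B A O B O B O A O B O B B B A O AB AB O A B AB O A".split()

# Each distinct blood type is a class; its count is the class frequency.
blood_type_counts = Counter(donors)
total_donors = sum(blood_type_counts.values())

print("Class  Frequency")
for blood_type in ("A", "B", "O", "AB"):
    print(f"{blood_type:<6} {blood_type_counts[blood_type]}")
print(f"Total  {total_donors}")
```

The frequencies should add up to the number of donors, which is a quick check that no observation was dropped.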
Quantitative Frequency Distributions -- Ungrouped What is an ungrouped frequency distribution? An ungrouped frequency distribution simply lists the data values with the corresponding frequency counts with which each value occurs.
Quantitative Frequency Distributions – Ungrouped -- Example Example: The at-rest pulse rates for 16 athletes at a meet were 57, 57, 56, 57, 58, 56, 54, 64, 53, 54, 54, 55, 57, 55, 60, and 58. Summarize the information with an ungrouped frequency distribution.
Quantitative Frequency Distributions – Ungrouped -- Example Continued Note: The (ungrouped) classes are the observed values themselves.
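The ungrouped distribution for the pulse-rate example can be sketched in a few lines. This is an illustration under the assumption that each observed value is treated as its own class; the variable names are not from the lecture.

```python
from collections import Counter

# At-rest pulse rates of the 16 athletes from the example.
pulse_rates = [57, 57, 56, 57, 58, 56, 54, 64, 53, 54, 54, 55, 57, 55, 60, 58]

# Count how many times each observed value occurs.
pulse_freq = Counter(pulse_rates)
observed_values = sorted(pulse_freq)

print("Pulse rate  Frequency")
for value in observed_values:
    print(f"{value:<11} {pulse_freq[value]}")
print(f"Total       {sum(pulse_freq.values())}")
```

Only values that actually occur appear as classes, so rates such as 59 or 61 are absent from the table.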
Example of a simple frequency distribution (ungrouped)
Raw scores: 5 7 8 1 5 9 3 4 2 2 3 4 9 7 1 4 5 6 8 9 4 3 5 2 1
Score   f
9       3
8       2
7       2
6       1
5       4
4       4
3       3
2       3
1       3
Σf = 25
Relative Frequency Distribution Proportion of the total N Divide the frequency of each score by N Rel. f = f/N Sum of relative frequencies should equal 1.0 Gives us a frame of reference
Relative Frequency Example: The relative frequency for the ungrouped class of 57 will be 4/16 = 0.25.
Relative Frequency Distribution Note: The relative frequency for a class is obtained by computing f/n.
Example of a simple frequency distribution
Raw scores: 5 7 8 1 5 9 3 4 2 2 3 4 9 7 1 4 5 6 8 9 4 3 5 2 1
Score   f   rel f
9       3   .12
8       2   .08
7       2   .08
6       1   .04
5       4   .16
4       4   .16
3       3   .12
2       3   .12
1       3   .12
Σf = 25   Σ rel f = 1.0
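The rule Rel. f = f/N can be sketched as follows, using the 25 raw scores from the example. The variable names are assumptions for illustration only.

```python
from collections import Counter

# The 25 raw scores from the example.
scores = [5, 7, 8, 1, 5, 9, 3, 4, 2, 2, 3, 4, 9,
          7, 1, 4, 5, 6, 8, 9, 4, 3, 5, 2, 1]

n = len(scores)
freq = Counter(scores)

# Relative frequency of each score: its frequency divided by N.
rel_freq = {score: count / n for score, count in freq.items()}

print("Score  f  rel f")
for score in sorted(freq, reverse=True):
    print(f"{score:<6} {freq[score]}  {rel_freq[score]:.2f}")
print(f"Sum of relative frequencies: {sum(rel_freq.values()):.2f}")
```

Checking that the relative frequencies sum to 1.0 is a cheap way to catch a miscounted class.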
Cumulative Frequency and Cumulative Relative Frequency NOTE: Sometimes frequency distributions are displayed with cumulative frequencies and cumulative relative frequencies as well.
Cumulative Frequency and Cumulative Relative Frequency What is a cumulative frequency for a class? The cumulative frequency for a specific class in a frequency table is the sum of the frequencies for all values at or below the given class.
Cumulative Frequency and Cumulative Relative Frequency What is a cumulative relative frequency for a class? The cumulative relative frequency for a specific class in a frequency table is the sum of the relative frequencies for all values at or below the given class.
Cumulative Frequency and Cumulative Relative Frequency Note: Table with relative and cumulative relative frequencies.
Example of a simple frequency distribution (ungrouped)
Raw scores: 5 7 8 1 5 9 3 4 2 2 3 4 9 7 1 4 5 6 8 9 4 3 5 2 1
Score   f   cf   rel f   rel. cf
9       3    3   .12     .12
8       2    5   .08     .20
7       2    7   .08     .28
6       1    8   .04     .32
5       4   12   .16     .48
4       4   16   .16     .64
3       3   19   .12     .76
2       3   22   .12     .88
1       3   25   .12     1.0
Σf = 25   Σ rel f = 1.0
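Cumulative and cumulative relative frequencies are running totals of the f and rel f columns. The sketch below reproduces the cf and rel. cf columns for the 25 scores; note that it accumulates from the highest score downward, because that is the order in which the table above lists its rows. Variable names are assumptions for illustration.

```python
from collections import Counter
from itertools import accumulate

# The 25 raw scores from the example.
scores = [5, 7, 8, 1, 5, 9, 3, 4, 2, 2, 3, 4, 9,
          7, 1, 4, 5, 6, 8, 9, 4, 3, 5, 2, 1]

n = len(scores)
freq = Counter(scores)

# Rows in the same order as the table: 9, 8, ..., 1.
ordered_scores = sorted(freq, reverse=True)
f = [freq[score] for score in ordered_scores]

# Running totals give the cumulative frequency; dividing by N gives rel. cf.
cf = list(accumulate(f))
rel_cf = [running_total / n for running_total in cf]

print("Score  f  cf  rel cf")
for score, count, running_total, proportion in zip(ordered_scores, f, cf, rel_cf):
    print(f"{score:<6} {count}  {running_total:<3} {proportion:.2f}")
```

The last cumulative frequency must equal N and the last cumulative relative frequency must equal 1.0, whichever direction the accumulation runs.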
Quantitative Frequency Distributions -- Grouped What is a grouped frequency distribution? A grouped frequency distribution is obtained by constructing classes (or intervals) for the data, and then listing the corresponding number of values (frequency counts) in each interval.
Tabulate the hemoglobin values of 30 adult male patients listed below
Patient No   Hb (g/dl)    Patient No   Hb (g/dl)    Patient No   Hb (g/dl)
1            12.0         11           11.2         21           14.9
2            11.9         12           13.6         22           12.2
3            11.5         13           10.8         23           [missing]
4            14.2         14           12.3         24           11.4
5            [missing]    15           [missing]    25           10.7
6            13.0         16           15.7         26           12.5
7            10.5         17           12.6         27           11.8
8            12.8         18           9.1          28           15.1
9            13.2         19           12.9         29           13.4
10           [missing]    20           14.6         30           13.1
Steps for making a table
Step 1: Find the minimum (9.1) and maximum (15.7)
Step 2: Calculate the difference: 15.7 – 9.1 = 6.6
Step 3: Decide the number and width of the classes (7 class intervals): 9.0 – 9.9, 10.0 – 10.9, ...
Step 4: Prepare a dummy table: Hb (g/dl), tally marks, No. of patients
DUMMY TABLE
Hb (g/dl)      Tally marks   No. of patients
9.0 – 9.9
10.0 – 10.9
11.0 – 11.9
12.0 – 12.9
13.0 – 13.9
14.0 – 14.9
15.0 – 15.9
Total

TALLY MARKS TABLE (llll = crossed bundle of five)
Hb (g/dl)      Tally marks   No. of patients
9.0 – 9.9      l             1
10.0 – 10.9    lll           3
11.0 – 11.9    llll l        6
12.0 – 12.9    llll llll     10
13.0 – 13.9    llll          5
14.0 – 14.9    lll           3
15.0 – 15.9    ll            2
Total                        30
Table. Frequency distribution of 30 adult male patients by Hb
Hb (g/dl)      No. of patients
9.0 – 9.9      1
10.0 – 10.9    3
11.0 – 11.9    6
12.0 – 12.9    10
13.0 – 13.9    5
14.0 – 14.9    3
15.0 – 15.9    2
Total          30
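The grouping steps (find the range, choose class intervals, count values per class) can be sketched as below. This is an illustration under a stated assumption: only 26 of the 30 hemoglobin values are legible in the patient listing, so the list `hb_values` holds those 26 and the class counts total 26 rather than 30. Variable names are hypothetical.

```python
import math
from collections import Counter

# The 26 Hb values (g/dl) that are legible in the patient listing.
# Assumption: the four unreadable entries are simply left out.
hb_values = [12.0, 11.2, 14.9, 11.9, 13.6, 12.2, 11.5, 10.8, 14.2,
             12.3, 11.4, 10.7, 13.0, 15.7, 12.5, 10.5, 12.6, 11.8,
             12.8, 9.1, 15.1, 13.2, 12.9, 13.4, 14.6, 13.1]

# Steps 1 and 2: minimum, maximum and their difference.
hb_min, hb_max = min(hb_values), max(hb_values)
hb_range = hb_max - hb_min

# Step 3: classes of width 1 g/dl (9.0-9.9, 10.0-10.9, ...), so the
# class of a value is identified by its whole-number part.
class_counts = Counter(math.floor(value) for value in hb_values)

# Step 4: fill in the table, one row per class interval.
print(f"Min {hb_min}, Max {hb_max}, Range {hb_range:.1f}")
print("Hb (g/dl)     No. of patients")
for lower in range(9, 16):
    print(f"{lower:.1f} - {lower + 0.9:.1f}   {class_counts[lower]}")
print(f"Total         {sum(class_counts.values())}")
```

Because four observations are missing from the listing, the counts for some classes come out lower here than in the 30-patient table.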
Table Frequency distribution of adult patients by Hb and gender: Hb (g/dl) Gender Total Male Female <9.0 9.0 – 9.9 10.0 – 10.9 11.0 – 11.9 12.0 – 12.9 13.0 – 13.9 14.0 – 14.9 15.0 – 15.9 1 3 6 10 5 2 8 4 14 16 9 30 60
Elements of a Table
An ideal table should have: Number, Title, Column headings, Foot-notes
Number: table number for identification in a report
Title, place: describes the body of the table, the variables and the time period (what, how classified, where and when)
Column heading: variable name, No., percentages (%), etc.
Foot-note(s): describe some column/row headings, special cells, source, etc.
Table II. Distribution of 120 (Madras) Corporation divisions according to annual death rate based on registered deaths in 1975 and 1976 Figures in parentheses indicate percentages
DIAGRAMS/GRAPHS Discrete data --- Bar charts (one or two groups) Continuous data --- Histogram --- Frequency polygon (curve) --- Stem-and-leaf plot --- Box-and-whisker plot
Example data 68 63 42 27 30 36 28 32 79 27 22 28 24 25 44 65 43 25 74 51 36 42 28 31 28 25 45 12 57 51 12 32 49 38 42 27 31 50 38 21 16 24 64 47 23 22 43 27 49 28 23 19 11 52 46 31 30 43 49 12
Histogram Figure 1 Histogram of ages of 60 subjects
Polygon
Stem and leaf plot
Stem-and-leaf of Age   N = 60   Leaf Unit = 1.0
Count   Stem   Leaves
 6      1      122269
19      2      1223344555777788888
11      3      00111226688
13      4      2223334567999
 5      5      01127
 4      6      3458
 2      7      49
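A stem-and-leaf plot of this kind can be built by splitting each age into a tens digit (stem) and a units digit (leaf). The sketch below does this for the 60 ages in the example data; the variable names are assumptions for illustration.

```python
from collections import defaultdict

# Ages of the 60 subjects from the example data.
ages = [68, 63, 42, 27, 30, 36, 28, 32, 79, 27, 22, 28, 24, 25, 44,
        65, 43, 25, 74, 51, 36, 42, 28, 31, 28, 25, 45, 12, 57, 51,
        12, 32, 49, 38, 42, 27, 31, 50, 38, 21, 16, 24, 64, 47, 23,
        22, 43, 27, 49, 28, 23, 19, 11, 52, 46, 31, 30, 43, 49, 12]

# Sort first so that the leaves on each stem come out in ascending order.
stems = defaultdict(list)
for age in sorted(ages):
    stems[age // 10].append(age % 10)  # stem = tens digit, leaf = units digit

# One row per stem: (number of leaves, stem, leaves written as one string).
stem_leaf_rows = [
    (len(leaves), stem, "".join(str(leaf) for leaf in leaves))
    for stem, leaves in sorted(stems.items())
]

for count, stem, leaves in stem_leaf_rows:
    print(f"{count:>3}  {stem}  {leaves}")
```

Unlike a histogram, the plot keeps every original value readable, so the raw data can be recovered from it.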
Box plot
Descriptive statistics report: Boxplot
- minimum score and maximum score
- lower quartile and upper quartile
- median
- mean
- the skew of the distribution: positive skew: mean > median and the high-score whisker is longer; negative skew: mean < median and the low-score whisker is longer
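The quantities a boxplot reports can be sketched with Python's standard `statistics` module, here applied to the 60 ages from the example data. Quartile conventions differ between textbooks and software, so the quartile values from `statistics.quantiles` (default "exclusive" method) are one reasonable choice rather than the only one. Variable names are assumptions for illustration.

```python
import statistics

# Ages of the 60 subjects from the example data.
ages = [68, 63, 42, 27, 30, 36, 28, 32, 79, 27, 22, 28, 24, 25, 44,
        65, 43, 25, 74, 51, 36, 42, 28, 31, 28, 25, 45, 12, 57, 51,
        12, 32, 49, 38, 42, 27, 31, 50, 38, 21, 16, 24, 64, 47, 23,
        22, 43, 27, 49, 28, 23, 19, 11, 52, 46, 31, 30, 43, 49, 12]

minimum, maximum = min(ages), max(ages)
median = statistics.median(ages)
mean = statistics.mean(ages)

# Three cut points splitting the data into four parts; the middle one is the median.
lower_quartile, _, upper_quartile = statistics.quantiles(ages, n=4)

# Rule of thumb from the slide: compare the mean with the median.
if mean > median:
    skew = "positive"
elif mean < median:
    skew = "negative"
else:
    skew = "none"

print(f"Min {minimum}, Q1 {lower_quartile}, Median {median}, "
      f"Q3 {upper_quartile}, Max {maximum}")
print(f"Mean {mean:.2f} -> {skew} skew")
```

For these ages the mean exceeds the median, which matches the longer high-score tail visible in the stem-and-leaf plot.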
Pie Chart. Circular diagram: the total is 100%. Divided into segments, each representing a category. Decide which categories are placed adjacent to each other. The size of each slice is proportional to the amount in its category. Example: The prevalence of different degrees of hypertension in the population
Bar Graphs. The height of each bar indicates frequency. Frequency is on the Y axis and the categories of the variable on the X axis. The bars should be of equal width and should not touch each other. Example: The distribution of risk factors among cases with cardiovascular diseases
HIV case enrollment in the USA by gender. Bar chart: this shows relative trends in admissions by gender.
HIV case enrollment in the USA by gender. Stacked bar chart: this emphasizes the constancy of the overall admissions and shows the trends subtly.
Graphic Presentation of Data the frequency polygon (quantitative data) the histogram (quantitative data) the bar graph (qualitative data)
General rules for designing graphs: A graph should have a self-explanatory legend. A graph should help the reader to understand the data. Axes should be labeled, with units of measurement indicated. Scales are important: start with zero (otherwise use a // break). Avoid graphs with a three-dimensional impression; they may be misleading (the reader visualizes them less easily).
Any Questions