Biostatistics College of Medicine University of Malawi 2011
Definition of Statistics: Statistics is the science that studies the collection and interpretation of numerical data. The field of statistics is divided into mathematical and applied statistics.
Applied statistics concerns the application of the methods of mathematical statistics to specific areas such as public health and economics. Biostatistics is the branch of applied statistics that concerns the application of statistical methods to medical and biological problems.
Importance of Biostatistics: To better understand reports of research studies in your field. To obtain a foundation in statistical issues for designing and conducting your own research. To learn a few techniques for analyzing data.
Variables, Measurement Scales and Summarizing Data
Definition of a Variable: A variable is a characteristic that can have many different values.
Types of Variables: A categorical variable has values with interruptions or gaps between them (categories). A continuous variable has values that could, theoretically, be measured without gaps.
Scales of Measurement: Qualitative (nominal, ordinal, binary) and quantitative (discrete, continuous).
Qualitative: consists of a finite set of possible values or categories. Always categorical.
Nominal scale: qualitative categories that have no particular order. Examples: diagnosis (malaria, AIDS, accident, other); ethnic group (Chewa, Yao, Tumbuka).
Binary scale: two qualitative categories. Examples: gender (male, female); HIV status (uninfected, infected).
Ordinal scale: qualitative categories that have a natural order. As a result, we can decide that one outcome is "less than" or "more than" another.
Examples of ordinal scales: view of the statement that condoms can help prevent the spread of AIDS (agree, no opinion, disagree); severity of diarrhea (none, mild, moderate, severe).
Quantitative: numerical measurements or counts. May be categorical or continuous. Scales: discrete, continuous.
Discrete scale: there are a limited number of distinct possible values or groups of values. Examples: number of children in a family; number of decayed, missing or filled teeth.
Continuous scale: numeric scale with infinitely many values between any two observed values. Examples: weight in kg; age.
Summary of scales: qualitative (nominal, ordinal, binary), always categorical; quantitative (discrete, continuous), categorical or continuous.
Understanding the scale of measurement of data is key to knowing the correct methods for summarizing data, graphical display of data, and analysis of data.
Frequency Distributions: summarizing data with counts, using tabular displays and graphical displays.
Frequency Distribution of Sex
Sex      Frequency   Percent
F              675     54.35
M              567     45.65
Total         1242    100.00
Frequency Distribution: Frequency is the count of occurrences of each value. Relative frequency is: (frequency of the value) / (total number of observations).
Relative frequency may be expressed as a proportion or as a percent.
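As a minimal sketch of the calculation, the Python snippet below turns counts into proportions and percents. The counts are the ones in the Frequency Distribution of Sex table; the function name relative_frequencies is made up for this illustration.

```python
# Counts taken from the "Frequency Distribution of Sex" table.
counts = {"F": 675, "M": 567}

def relative_frequencies(counts):
    """Return {category: (proportion, percent)} for a dict of counts."""
    total = sum(counts.values())
    return {k: (n / total, 100 * n / total) for k, n in counts.items()}

rel = relative_frequencies(counts)
for category, (proportion, percent) in rel.items():
    print(f"{category}: n={counts[category]}, "
          f"proportion={proportion:.4f}, percent={percent:.2f}")
```

The proportions sum to 1 and the percents to 100, which is a quick check that no category was dropped.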
Frequency Distribution of Education of Adults
Education      Frequency   Percent
None                 251      43.1
Jr. Primary          203      34.9
Sr. Primary           87      15.0
Secondary +           41       7.0
Total                582     100.0
NB, on the number of significant figures: do not report estimates with more significant figures than the number of digits in the sample size. Examples: n=387, report to 3 significant figures; n=17, report to 2 significant figures.
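The rule of thumb above can be sketched in Python as follows. The helper names sig_figs_for_sample and round_sig are hypothetical; the worked value uses the "None" row of the education table (251 of 582 adults).

```python
import math

def sig_figs_for_sample(n):
    """Number of digits in the sample size n (387 -> 3, 17 -> 2)."""
    return len(str(abs(int(n))))

def round_sig(x, sig):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    magnitude = math.floor(math.log10(abs(x)))
    return round(x, sig - 1 - magnitude)

n = 582                      # adults in the education table
percent_none = 100 * 251 / n # unrounded percent with no education
reported = round_sig(percent_none, sig_figs_for_sample(n))
print(reported)
```

With n=582 the rule allows 3 significant figures, which is how the percent appears in the education table.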
Data cleaning: Frequency distribution of occupation (partial)
Occupation                 Frequency   Percent
BUSINESS                           1      0.08
FARMING                            4      0.32
assistant driver                   1      0.08
business                           7      0.56
cane cutting                       1      0.08
carpenter                          1      0.08
court messenger                    1      0.08
cowboy                             5      0.40
domestic work                      1      0.08
dulder                             2      0.16
employee                           1      0.08
employee (forest guard)            1      0.08
f/guard                            1      0.08
farmer                            63      5.07
farming                          462     37.20
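The table shows the same occupation entered under several spellings. A minimal Python sketch of one way to merge such labels before counting is given below. The counts are a subset of the table; the SYNONYMS mapping is an assumption made for illustration, since deciding which labels mean the same thing is a judgment for the analyst.

```python
from collections import Counter

# Subset of the raw occupation counts from the table above.
raw_counts = {
    "BUSINESS": 1, "FARMING": 4, "business": 7,
    "employee (forest guard)": 1, "f/guard": 1,
    "farmer": 63, "farming": 462,
}

# Hypothetical mapping of variant labels to one cleaned label.
SYNONYMS = {
    "farmer": "farming",
    "f/guard": "forest guard",
    "employee (forest guard)": "forest guard",
}

def clean_label(label):
    """Lower-case, collapse whitespace, then apply the synonym map."""
    key = " ".join(label.lower().split())
    return SYNONYMS.get(key, key)

cleaned = Counter()
for label, n in raw_counts.items():
    cleaned[clean_label(label)] += n

for label, n in sorted(cleaned.items()):
    print(label, n)
```

Running a frequency table first, as on the slide, is what reveals the variants that need a rule like this.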
Frequency Distribution of Age
Age      Frequency   Percent
0-4            219      17.6
5-9            200      16.1
10-14          175      14.1
15-19           95       7.7
20-29          188      15.1
30-39          127      10.2
40-49           85       6.8
50+            153      12.3
Total         1242     100.0
Bar Chart: a method of displaying a frequency distribution or percentage distribution. The length of each bar is proportional to the frequency of the category.
Figure: Frequency distribution
Pie Chart: another way of presenting frequencies. The portion of the pie (in degrees) for each category is computed as 360 × (relative frequency). Computer graphics packages are very useful.
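A short Python sketch of the angle calculation, using the counts from the education table. The variable names are chosen for this illustration only.

```python
# Counts from the "Frequency Distribution of Education of Adults" table.
education = {
    "None": 251,
    "Jr. Primary": 203,
    "Sr. Primary": 87,
    "Secondary +": 41,
}

total = sum(education.values())

# Angle of each slice in degrees: 360 x (relative frequency).
angles = {level: 360 * n / total for level, n in education.items()}

for level, angle in angles.items():
    print(f"{level}: {angle:.1f} degrees")
```

The angles must add up to 360 degrees, which is a useful check when drawing a pie chart by hand.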
Histogram: a special case of a bar chart where both axes are numerically meaningful. The height of the bars is proportional to the frequency of the category. The width of the bars is proportional to the width of the class interval. Bars adjoin. True class limits are important in constructing and labeling. The areas of the bars relative to each other give a visual impression of the frequency distribution.
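When class intervals have unequal widths, one common convention (an assumption here, not stated on the slide) is to plot frequency per unit of width, so that the area of each bar stays proportional to its frequency. The Python sketch below applies this to the age classes from the age table, treating each class as running from its lower limit up to the next lower limit. The open-ended 50+ class is left out because it has no upper limit.

```python
# (lower limit, upper limit, frequency) for the closed age classes.
classes = [
    (0, 5, 219), (5, 10, 200), (10, 15, 175), (15, 20, 95),
    (20, 30, 188), (30, 40, 127), (40, 50, 85),
]

def bar_heights(classes):
    """Height = frequency / class width, so that area = frequency."""
    return [freq / (upper - lower) for lower, upper, freq in classes]

heights = bar_heights(classes)
for (lower, upper, freq), h in zip(classes, heights):
    area = h * (upper - lower)
    print(f"{lower}-{upper}: height={h:.1f} per year, area={area:.0f}")
```

Under this convention the 20-29 class, although it contains more people than the 10-14 class, gets a lower bar because its count is spread over ten years rather than five.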
Frequency Distribution of Ages of a small sample
Age      Frequency   Relative frequency
11-20            3                  12%
21-30            1                   4%
31-40           12                  48%
41-50            2                   8%
51-60            6                  24%
More
Figure: Histogram of Age
Figure: Histogram of Age of Chikwawa Residents
Stem-and-Leaf Plots: a quick and easy way to organize data that gives a visual impression similar to a histogram while retaining much more detail from the data. Also a great way to sort data.
Figure 16: Stem and Leaf and Box Plot of Age of ICU Patients
Stem   Leaf         #
8      45           2
7      145566789    9
6      5            1
5      223366       6
4
3      149          3
2      9            1
1      599          3
       ----+----+----+----+
Each value is retained in the stem-and-leaf: in the lowest row, 1 599 represents the 3 ages 15, 19, 19; above it, 3 149 represents the ages 31, 34, 39; and so on.
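A stem-and-leaf display like this can be built in a few lines of Python. The sketch below assumes two-digit ages, with the tens digit as the stem and the units digit as the leaf. The list of ages is read back from the plot above, and the function name stem_and_leaf is made up for this illustration.

```python
from collections import defaultdict

# Ages read back from the stem-and-leaf plot of ICU patients.
ages = [84, 85, 71, 74, 75, 75, 76, 76, 77, 78, 79, 65,
        52, 52, 53, 53, 56, 56, 31, 34, 39, 29, 15, 19, 19]

def stem_and_leaf(values):
    """Return {stem: leaf string}, highest stem first.

    Stem = tens digit, leaf = units digit. Empty stems are kept
    so that gaps in the data remain visible.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lo, hi = min(leaves), max(leaves)
    return {stem: "".join(str(d) for d in leaves.get(stem, []))
            for stem in range(hi, lo - 1, -1)}

plot = stem_and_leaf(ages)
for stem, leaf in plot.items():
    print(f"{stem} | {leaf:<10} {len(leaf) or ''}")
```

Because the values are sorted before being split, the leaves in each row come out in order, which is why the display doubles as a way of sorting the data.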
Cumulative Frequency: Count of the individuals in each category and the ones lower. Count of individuals with values up to the end of each category.
Cumulative Relative Frequency: Relative frequency or percent of the individuals in each category and the ones lower. Relative frequency or percent of individuals with values up to the end of each category. Cumulative relative frequencies are used to find percentiles: a given percentile, p, is the value of a variable that divides the distribution such that p% of the values are at that value or lower.
Frequency Distribution of Ages of Chikwawa Residents
Age      Frequency   Percent   Cumulative Frequency   Cumulative Percent
0-4            219     17.63                    219                17.63
5-9            200     16.10                    419                33.74
10-14          175     14.09                    594                47.83
15-19           95      7.65                    689                55.48
20-29          188     15.14                    877                70.61
30-39          127     10.23                   1004                80.84
40-49           85      6.84                   1089                87.68
50+            153     12.32                   1242               100.00
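The cumulative columns are running totals of the frequency column. A minimal Python sketch, using the frequencies from the table above and the standard library function itertools.accumulate:

```python
from itertools import accumulate

# Age classes and frequencies from the Chikwawa table.
age_classes = ["0-4", "5-9", "10-14", "15-19",
               "20-29", "30-39", "40-49", "50+"]
frequencies = [219, 200, 175, 95, 188, 127, 85, 153]

total = sum(frequencies)

# Running total of the counts, then the same as a percent of the total.
cum_freq = list(accumulate(frequencies))
cum_pct = [100 * c / total for c in cum_freq]

for label, f, c, p in zip(age_classes, frequencies, cum_freq, cum_pct):
    print(f"{label:<6} {f:>5} {c:>6} {p:>7.2f}")
```

The last cumulative frequency must equal the sample size and the last cumulative percent must be 100, which gives a simple check on the table.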
Figure: Cumulative Frequency Polygon of Age
Percentile: a percentile is the value of a variable at or below which a specified percentage of the values fall.
For example: 50th percentile = the value that cuts the distribution of values so that 50% are below or equal to it; 25th percentile = lower quartile; 75th percentile = upper quartile.
Cumulative relative frequencies can be used to define percentiles. In our example, about 55% of residents are younger than 20, so age 20 is approximately the 55th percentile (P55 ≈ 20). A special use of the cumulative percentage polygon is for estimating percentiles of the sample.
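Reading a percentile off the cumulative polygon amounts to linear interpolation between class boundaries. The Python sketch below does this with the Chikwawa counts. It assumes that ages are in completed years, so the class 0-4 ends at the boundary 5, and it leaves out the open-ended 50+ class. The function name estimate_percentile is hypothetical.

```python
# Upper class boundaries and cumulative counts at each boundary
# (starting from 0 people below age 0).
boundaries = [0, 5, 10, 15, 20, 30, 40, 50]
cum_counts = [0, 219, 419, 594, 689, 877, 1004, 1089]
TOTAL = 1242  # sample size, including the 50+ class

def estimate_percentile(p):
    """Estimate the p-th percentile by interpolating along the polygon."""
    target = p / 100 * TOTAL
    for i in range(1, len(boundaries)):
        if target <= cum_counts[i]:
            lo_x, hi_x = boundaries[i - 1], boundaries[i]
            lo_c, hi_c = cum_counts[i - 1], cum_counts[i]
            return lo_x + (hi_x - lo_x) * (target - lo_c) / (hi_c - lo_c)
    raise ValueError("percentile falls in the open-ended 50+ class")

for p in (25, 50, 75):
    print(f"P{p} is about {estimate_percentile(p):.1f} years")
```

Interpolation assumes that ages are spread evenly within each class, so the result is an estimate of the sample percentile rather than its exact value.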
Figure: Cumulative Frequency Polygon of Age
Summary: You can do a lot with counts. Be aware of the type of variable. Convert counts to proportions. Proportions can be converted to percentages or other "rates". Tables and graphs must be carefully constructed and clearly labeled.