Uses of Biostatistics in Epidemiology (1) Amornrath Podhipak, Ph.D. Department of Epidemiology Faculty of Public Health Mahidol University 2006
Why Statistics? Why Computers? Why Software? Medical doctors and public health personnel. A tool for calculation.
Why do we need “statistics” in medicine and public health (particularly in epidemiology)?
* Medicine is becoming increasingly quantitative in describing a condition.
  Most malaria patients are infected with P. falciparum. → 82.5% were infected with P. falciparum.
  Those patients look pale. → Haemoglobin level was 9.89 g/dL, on average.
  Epidemiology is concerned with describing disease patterns in a group of people. Descriptive statistics give a clearer picture of what we want to describe.
* The answer to a research question needs to be more definite.
  Is the new treatment better? How much better? In what aspect? Is there any evidence? Could it be a real difference?
  Inferential statistics give an answer in a world of uncertainty.
Measurement of characteristics (variables vs. constants)
4 scales of measurement
Qualitative variables
- Nominal scale (group classification only)
- Ordinal scale (classification with ordering / ranking)
Quantitative variables
- Interval scale (magnitude + constant distance between points)
- Ratio scale (magnitude + constant distance between points + true zero)
Before using statistics, we need some kind of measurement in order to get more detailed information.
Weight? 80 kg. Height? 160 cm. Handsome? Intelligent? Income? 100,000. Married? BP? 140/90. HIV?
Nominal scale: Female / Male coded as 1 / 2. The code values have no numeric meaning.
Ordinal scale: 1, 2, 3. Equal distance between points does not reflect an equal interval in value.
Interval scale, e.g. degrees Celsius: the freezing point of water was assigned the value zero degrees Celsius. It is not the true ZERO of temperature (no heat).
Ratio scale, e.g. weight: 0 is a true ZERO (nothing here).
Equal distance between points means equal interval value.
Record form: Questionnaire (TB and Passive smoking)
Sex [ ] Male [ ] Female
Education [ ] 1-6 yr [ ] 7-9 yr [ ] 9+ yr
Family income ……………………. Baht/m
Passive Smoking ……...
Result from tuberculin test ……………………. mm
X-ray [ ] +ve [ ] -ve
Weight …………. kg, Height ………………….. cm
Variable (characteristic being measured)   Result of measurement                                          Type
Marital status                             single / married / divorced                                    nominal
Gender                                     male / female                                                  nominal
Smoking                                    yes / no                                                       nominal
Smoking                                    nonsmoker / light smoker / moderate smoker / heavy smoker      ordinal
Smoking                                    number of cigarettes per day                                   ratio
Feeling of pain                            yes / no                                                       nominal
Feeling of pain                            none / light / moderate / high                                 ordinal
Feeling of pain                            > 10                                                           ordinal
Attitude toward selective abortion         strongly agree / agree / not sure / disagree / strongly disagree   ordinal
Blood pressure                             mmHg                                                           ratio
Temperature                                degrees Celsius                                                interval
Weight                                     gram                                                           ratio
Tumor stage                                I, II, III, IV                                                 ordinal
Quantitative (numeric, metric) variables are classified as:
- continuous: can take all values in an interval, e.g. weight, temperature, etc.
- discrete: can take only certain values (often integer values), e.g. parity, number of sex partners, etc.
Continuous data can be categorised into groups, for which one needs to define the “lower boundary” and “upper boundary” of a value (or a class).
boundaries: 120.5, 121.5, 122.5, 123.5, …
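To make the idea of class boundaries concrete, here is a minimal Python sketch. It assumes heights recorded to the nearest 1 cm, so that a recorded value of 121 stands for any true value between 120.5 and 121.5. The function name `class_boundaries` and the `unit` parameter are illustrative choices, not part of the original slides.

```python
def class_boundaries(recorded_value, unit=1.0):
    """Return (lower, upper) boundaries of a value recorded to the nearest `unit`."""
    half = unit / 2
    return (recorded_value - half, recorded_value + half)


# Recorded heights in cm (assumed to be rounded to the nearest centimetre).
for value in (121, 122, 123):
    lower, upper = class_boundaries(value)
    print(f"recorded {value} cm covers {lower} to {upper} cm")
```

The upper boundary of one value is the lower boundary of the next, which gives the sequence 120.5, 121.5, 122.5, 123.5 listed on the slide.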
Descriptive statistics - a way to summarize a dataset (a group of measurements)
Example: Height of 100 children, … years of age
What are the values that best describe the height of these 100 persons?
1) Rearrange the data:
Minimum = 123
Maximum = 165
Range = 42 (Max - Min)
Median = 139 (value in the middle)
Mode = 140 (most repeated value)
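These five summary values can be computed with Python's standard `statistics` module, as in the sketch below. The nine heights are made up for illustration (the slide's 100 measurements are not available); they were chosen only so that the summaries match the figures quoted above.

```python
import statistics

# Made-up heights in cm (an assumption for illustration; not the slide's data).
heights_cm = [123, 130, 135, 138, 139, 140, 140, 150, 165]


def summarize(values):
    """Return minimum, maximum, range, median and mode of a list of numbers."""
    ordered = sorted(values)  # step 1: rearrange the data
    return {
        "minimum": ordered[0],
        "maximum": ordered[-1],
        "range": ordered[-1] - ordered[0],       # Max - Min
        "median": statistics.median(ordered),    # value in the middle
        "mode": statistics.mode(ordered),        # most repeated value
    }


for name, value in summarize(heights_cm).items():
    print(f"{name}: {value}")
```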
3) Present in a graph (histogram)
[Figure: histogram. x-axis: Height (cm); y-axis: Frequency]
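A histogram is built from a frequency count per class. The sketch below tallies heights into 10 cm classes and prints a simple text histogram. The data, the class width, and the starting point of 120 cm are all assumptions for illustration, and the helper `frequency_table` is a hypothetical name.

```python
from collections import Counter

# Made-up heights in cm (an assumption for illustration; not the slide's data).
heights_cm = [123, 130, 135, 138, 139, 140, 140, 150, 165]


def frequency_table(values, start, width):
    """Count integer values per class of the given width, beginning at `start`.

    Assumes every value is >= start. Returns (low, high, frequency) tuples.
    """
    counts = Counter((v - start) // width for v in values)
    n_classes = max(counts) + 1
    return [
        (start + i * width, start + (i + 1) * width - 1, counts.get(i, 0))
        for i in range(n_classes)
    ]


for low, high, freq in frequency_table(heights_cm, start=120, width=10):
    print(f"{low}-{high} cm | {'#' * freq} ({freq})")
```

With real data, the choice of class width changes the shape of the histogram, so it is worth trying more than one width.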
Methods of data presentation 1. Table 2. Graph - line graph - bar chart - pie chart
- scatter plot - area graph - error bar - histogram
Another set of values for describing a dataset is the MEAN and the STANDARD DEVIATION. The mean indicates the location. The standard deviation indicates (roughly) the spread of the data.
Example: Dataset 1: Age of 6 children. Mean = 4.0 years, sd = 0 years (no variation).
Example: Dataset 2: Age of 6 children. Mean = 4.0 years, sd = 1.79 years (with variation).
Or, another example:
The average body height of these children was … cm, with a standard deviation of 8.9 cm.
The average body height of these children was … cm, with a standard deviation of 0.2 cm.
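The contrast between the two age datasets can be reproduced in Python. The individual ages below are assumptions: the slide's raw values are not available, so these were picked to give the same mean and standard deviation as quoted, using the sample standard deviation (divisor n - 1).

```python
import statistics

# Assumed ages in years (illustrative; chosen to match the quoted summaries).
dataset_1 = [4, 4, 4, 4, 4, 4]   # every child has the same age
dataset_2 = [2, 2, 4, 4, 6, 6]   # same mean, but the ages vary


def mean_and_sd(values):
    """Return the mean and the sample standard deviation (divisor n - 1)."""
    return statistics.mean(values), statistics.stdev(values)


for name, data in (("Dataset 1", dataset_1), ("Dataset 2", dataset_2)):
    mean, sd = mean_and_sd(data)
    print(f"{name}: mean = {mean:.1f} years, sd = {sd:.2f} years")
```

Both datasets share the same location (mean), so only the standard deviation reveals that the second group is more spread out.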
If we categorize the data into qualitative groups (tall/short), a proportion can then be calculated.
Descriptive statistics (proportion and/or percentage):
Most of the children were less than 150 cm tall.
85% of them had a height of less than 152 cm.
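Turning a quantitative variable into a proportion only needs a cut-off and a count. The sketch below uses 20 made-up heights (an assumption, chosen so that the result matches the 85% quoted above); `percent_below` is a hypothetical helper name.

```python
# Made-up heights in cm (an assumption for illustration; not the slide's data).
heights_cm = [128, 131, 133, 135, 136, 138, 139, 140, 140, 141,
              143, 144, 146, 147, 148, 149, 151, 152, 155, 160]


def percent_below(values, cutoff):
    """Return the percentage of values strictly below the cut-off."""
    below = sum(1 for v in values if v < cutoff)
    return 100 * below / len(values)


print(f"{percent_below(heights_cm, 152):.0f}% of the children are shorter than 152 cm")
```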
A final note on defining a variable and a measurement
Important things to consider before making any measurement:
1. Do we measure the right thing? Example: fatty food and CVD.
2. What is the tool that can actually measure what we want to measure?
   Morphology (measure): indicators such as % standard weight, body mass index (wt/ht²), triceps skinfold thickness, weight for age, weight for height, etc.
   Food intake (ask); protein calorie intake (ask & calculate).
3. How valid is the instrument? Does the questionnaire actually get the fatty food intake information? (scope of questions, recall of subjects, certainty of reported amount of food, variability of ingredients, etc.) Does the information obtained actually reflect fatty food intake?
4. How precise is the instrument? Does the information precisely estimate the amount of fatty food intake for each individual?
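As a small worked example of one morphological indicator, body mass index (wt/ht²) can be calculated as below. The sketch assumes the usual convention of weight in kilograms and height in metres, and it reuses the 80 kg and 160 cm example values from the earlier measurement slide. The function name is illustrative.

```python
def body_mass_index(weight_kg, height_cm):
    """Return body mass index: weight (kg) divided by height (m) squared."""
    height_m = height_cm / 100  # convert centimetres to metres
    return weight_kg / height_m ** 2


bmi = body_mass_index(weight_kg=80, height_cm=160)
print(f"BMI = {bmi:.2f} kg/m^2")
```

The calculation is exact, but the result is only as valid and precise as the weight and height measurements that go into it.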
In summary:
Statistics (and epidemiology) deal with a group of persons (the bigger the group, the better the result), not one individual patient. We look for the characteristics which are most common in the group.
Descriptive statistics are used for explaining our sample (or findings), e.g.:
Most of the patients were anemic.
80% of them had a haemoglobin level of less than 10 g/dL.
The average haemoglobin level was 9.5 g/dL, with a standard deviation of 1.5 g/dL.
Inferential statistics are used to infer to the general population of interest.