1
Introduction to biostatistics
Georgi Iskrov, MBA, MPH, PhD Department of Social Medicine
2
Before we start
Final SMME I exam:
Entry test in Bioethics
Entry test in Biostats
______________________
1 case for statistical analysis and interpretation
1 bioethical case for comment and discussion
1 theory question from the bioethics questionnaire
3
Before we start
4
Outline
Population vs sample
Descriptive vs inferential statistics
Sampling methods
Sample size calculation
Level of measurement
Graphical summaries
5
Why do we need to use statistical methods?
To draw the strongest possible conclusions from limited amounts of data;
To generalize from a particular set of data to a more general conclusion.
What do we need to pay attention to?
Bias
Probability
6
Definition of biostatistics
The science of collecting, organizing, analyzing, interpreting and presenting data to support more effective decisions in a clinical context.
“Turning data into knowledge” (Patrick Heagerty)
7
Population vs Sample
A population includes all objects of interest, whereas a sample is only a portion of the population.
Parameters are associated with populations, and statistics with samples.
Parameters are usually denoted using Greek letters (μ, σ), while statistics are usually denoted using Roman letters (X, s).
There are several reasons why we do not work with whole populations:
They are usually large, and it is often impossible to get data for every object we are studying.
Sampling does not usually occur without cost, and the more items surveyed, the larger the cost.
8
Descriptive vs Inferential statistics
We compute statistics and use them to estimate parameters. The computation is the first part of the statistical analysis (descriptive statistics) and the estimation is the second part (inferential statistics).
Descriptive statistics: the procedures used to organize and summarize masses of data.
Inferential statistics: the methods used to find out something about a population, based on a sample.
9
Descriptive vs Inferential statistics
[Diagram: Population (parameters) → Sample (statistics) by sampling, from population to sample; Sample → Population by inferential statistics, from sample to population.]
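The distinction between parameters and statistics can be illustrated with a small simulation. This is only a sketch: the population below (systolic blood pressure values with mean 120 and SD 15) is hypothetical and generated at random, and the sample size of 100 is an arbitrary choice.

```python
import random
import statistics

# Hypothetical population: systolic blood pressure (mmHg) of 10,000 people.
rng = random.Random(42)
population = [rng.gauss(120, 15) for _ in range(10_000)]

# Parameters describe the whole population (usually unknown in practice).
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

# Statistics are computed from a sample and used to estimate the parameters.
sample = rng.sample(population, 100)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)

print(f"parameter mu    = {mu:.1f}   statistic x_bar = {x_bar:.1f}")
print(f"parameter sigma = {sigma:.1f}   statistic s     = {s:.1f}")
```

Computing x_bar and s is the descriptive step; using them as estimates of mu and sigma is the inferential step.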
10
Probability
A measure of the likelihood that a particular event will happen. It is expressed by a value between 0 and 1 (0.0: cannot happen; 1.0: sure to happen).
Note that we talk about the probability of an event, but what we measure is the rate in a group. If we observe that 5 babies in every 1000 have congenital heart disease, we say that the probability of a (single) baby being affected is 5 in 1000, or 0.005.
11
Sampling
Individuals in the population vary from one another with respect to an outcome of interest.
12
Sampling
When a sample is drawn, there is no certainty that it will be representative of the population.
[Figure: Sample A and Sample B]
13
Error
Random error can be conceptualized as sampling variability.
Bias (systematic error) is a difference between an observed value and the true value due to all causes other than sampling variability.
Accuracy is a general term denoting the absence of error of all kinds.
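The difference between random error and bias can be sketched with a simulation. Everything here is an assumption made for illustration: the population is randomly generated, and the "flawed procedure" that can only reach people with values above 110 is a hypothetical source of bias.

```python
import random
import statistics

rng = random.Random(1)
population = [rng.gauss(120, 15) for _ in range(10_000)]
true_mean = statistics.fmean(population)

# Random error: means of repeated random samples scatter around the true mean.
random_means = [statistics.fmean(rng.sample(population, 50)) for _ in range(200)]

# Bias: a hypothetical flawed procedure that only reaches people with values above 110.
reachable = [x for x in population if x > 110]
biased_means = [statistics.fmean(rng.sample(reachable, 50)) for _ in range(200)]

avg_random = statistics.fmean(random_means)
avg_biased = statistics.fmean(biased_means)

print(f"true mean                      = {true_mean:.1f}")
print(f"average of random sample means = {avg_random:.1f}")
print(f"average of biased sample means = {avg_biased:.1f}")
```

Individual random samples miss the true mean in both directions, so their errors average out; the biased procedure misses in the same direction every time, and repeating it does not help.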
14
Sampling
Sampling: a specific principle used to select members of the population to be included in the study. Because the target population is large, researchers have no choice but to study a number of cases or elements within the population, let them represent the population, and reach conclusions about the population from them.
Biased sample: one in which the method used to create the sample results in samples that are systematically different from the population.
Random sample: in random sampling, each item or element of the population has an equal chance of being chosen at each draw.
15
Sampling
[Figure: Population with Sample A and Sample B]
17
Sampling
Stages of sampling:
Defining target population
Determining sample size
Selecting a sampling method
Properties of a good sample:
Random selection
Representativeness by structure
Representativeness by number of cases
18
Sampling
Random sampling: sample group members are selected in a random manner.
Highly effective if all subjects participate in data collection
High level of sampling error when sample size is small
Systematic: every Nth member of the population is included in the study.
Time efficient
Cost efficient
High sampling bias if periodicity exists
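A minimal sketch of the two methods above, assuming a complete sampling frame is available. The frame of patient IDs 1 to 1000, the sample size of 20, and the function names are hypothetical choices for illustration.

```python
import random

def simple_random_sample(frame, n, rng):
    """Draw n distinct members; every member has the same chance of selection."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Take every k-th member after a random start within the first interval."""
    k = len(frame) // n        # sampling interval
    start = rng.randrange(k)   # random start position in 0..k-1
    return [frame[start + i * k] for i in range(n)]

rng = random.Random(7)
frame = list(range(1, 1001))   # hypothetical sampling frame: patient IDs 1..1000

srs = simple_random_sample(frame, 20, rng)
sys_sample = systematic_sample(frame, 20, rng)

print("simple random:", sorted(srs))
print("systematic:   ", sys_sample)
```

If the frame is ordered with a period that matches the interval k (for example, every 50th record comes from the same clinic day), the systematic sample inherits that pattern, which is the periodicity problem noted above.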
19
Sampling
Judgement: sample group members are selected on the basis of the researcher's judgement.
Time efficiency
Samples are not highly representative
Unscientific approach
Personal bias
Convenience: participants are obtained conveniently, with no requirements whatsoever.
High levels of simplicity and ease
Usefulness in pilot studies
Highest level of sampling error
Selection bias
20
Sampling
Snowball: sample group members nominate additional members to participate in the study.
Possibility to recruit hidden populations
Over-representation of a particular network
Reluctance of sample group members to nominate additional members
21
Sampling
Stratified: representation of specific subgroups or strata.
Effective representation of all subgroups
Precise estimates in cases of homogeneity or heterogeneity within strata
Knowledge of strata membership is required
Complex to apply in practice
Cluster: clusters of participants representing the population are identified as sample group members.
Time and cost efficient
Group-level information needs to be known
Usually higher sampling errors compared to alternative sampling methods
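Stratified and cluster sampling can be sketched as follows. This is one simple variant of each (proportionate stratified sampling and one-stage cluster sampling), not the only way to apply them; the population of 400 people, the strata by sex, and the eight "wards" are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, rng):
    """Proportionate stratified sampling: draw the same fraction from every stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    chosen = []
    for members in strata.values():
        n = max(1, round(len(members) * fraction))
        chosen.extend(rng.sample(members, n))
    return chosen

def cluster_sample(clusters, n_clusters, rng):
    """One-stage cluster sampling: pick whole clusters at random, keep all their members."""
    picked = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in picked for unit in clusters[name]]

rng = random.Random(3)
# Hypothetical population of 400 people as (id, sex): 100 "M" and 300 "F".
people = [(i, "F" if i % 4 else "M") for i in range(400)]

strat = stratified_sample(people, lambda p: p[1], 0.10, rng)

# Hypothetical clusters: eight wards of 50 people each.
clusters = {f"ward_{w}": [p for p in people if p[0] % 8 == w] for w in range(8)}
clus = cluster_sample(clusters, 2, rng)

print("stratified:", sum(p[1] == "M" for p in strat), "M,", sum(p[1] == "F" for p in strat), "F")
print("cluster sample size:", len(clus))
```

The stratified draw needs to know each person's stratum in advance, while the cluster draw only needs the list of clusters, which reflects the requirements listed above.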
22
Sample size calculation
Law of Large Numbers: as the number of trials of a random process increases, the difference between the observed average and the expected value tends to zero.
Application in biostatistics: the bigger the sample size, the smaller the margin of error.
A properly designed study will include a justification for the number of experimental units (people/animals) being examined.
Sample size calculations are necessary to design experiments that are large enough to produce useful information and small enough to be practical.
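The Law of Large Numbers can be observed in a short simulation. The sketch below assumes a fair coin (expected proportion of heads 0.5) and an arbitrary set of trial counts; the random seed is fixed only to make the run repeatable.

```python
import random

rng = random.Random(0)
expected = 0.5  # probability of "heads" for a fair coin
observed = {}

for n in (10, 100, 1_000, 10_000, 100_000):
    heads = sum(rng.random() < expected for _ in range(n))
    observed[n] = heads / n
    print(f"n = {n:>6}: observed proportion = {observed[n]:.4f}, "
          f"difference from expected = {abs(observed[n] - expected):.4f}")
```

The difference does not have to shrink at every single step, but it tends toward zero as n grows.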
23
Sample size calculation
Generally, the sample size for any study depends on:
Acceptable level of confidence;
Expected effect size and absolute error of precision;
Underlying scatter in the population;
Power of the study.
High power: large sample size, large effect, little scatter.
Low power: small sample size, small effect, lots of scatter.
24
Sample size calculation
For quantitative variables:
$$ n = \frac{Z^2 \cdot \mathrm{SD}^2}{d^2} $$
Z – standard normal value for the chosen confidence level (1.96 for 95%); SD – standard deviation; d – absolute error of precision.
25
Sample size calculation
For quantitative variables: A researcher is interested in knowing the average systolic blood pressure in the pediatric age group at a 95% level of confidence and a precision of 5 mmHg. The standard deviation, based on previous studies, is 25 mmHg.
n = (1.96² × 25²) / 5² = 96.04 => 97
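The calculation above can be checked with a few lines of code. This sketch assumes the usual formula n = Z² · SD² / d² for estimating a mean, with the result rounded up to the next whole subject; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_mean(sd, d, confidence=0.95):
    """Sample size for estimating a mean to within +/- d at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil((z * sd / d) ** 2)                  # always round up

n = sample_size_mean(sd=25, d=5, confidence=0.95)
print(n)
```

Rounding up rather than to the nearest integer ensures that the requested precision is met.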
26
Sample size calculation
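For qualitative variables:
$$ n = \frac{Z^2 \cdot p \, (1 - p)}{d^2} $$
Z – standard normal value for the chosen confidence level (1.96 for 95%); p – expected proportion in population; d – absolute error of precision.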
27
Sample size calculation
For qualitative variables: A researcher is interested in knowing the proportion of diabetes patients having hypertension. According to a previous study, the proportion is no more than 15%. The researcher wants to calculate the sample size with a 5% absolute precision error and a 95% confidence level.
n = (1.96² × 0.15 × 0.85) / 0.05² = 195.92 => 196
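The same check can be made for a proportion. This sketch assumes the usual formula n = Z² · p(1 − p) / d², rounded up; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_proportion(p, d, confidence=0.95):
    """Sample size for estimating a proportion p to within +/- d (absolute)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)      # always round up

n = sample_size_proportion(p=0.15, d=0.05, confidence=0.95)
print(n)
```

Note that p and d are both entered as fractions (0.15 and 0.05), not as percentages.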
28
When do you need biostatistics?
BEFORE you start your study! After that, it will be too late…
29
Planning
Research programme:
Aim
Object
Units of observation
Indices of observation
Place
Time
Statistical analyses
Methodology
30
Planning
Aim: the aim of the investigation summarizes and clearly formulates the research hypothesis.
Object: the object of the investigation is the event that is going to be studied.
Units of observation:
Logical unit – each studied case
Technical unit – the environment where the logical units are situated
Indices of observation – not too many, but important; measurable; additive and self-controlling.
Factorial
Resultative
31
Planning
Place
Time:
Single – events are studied at a single moment in time, the so-called “critical moment”.
Continuous – used to characterize a long-term tendency of the events.
Statistical analyses
Methodology
32
One vs Many Many measurements on one subject are not the same thing as one measurement on many subjects. With many measurements on one subject, you get to know the one subject quite well but you learn nothing about how the response varies across subjects. With one measurement on many subjects, you learn less about each individual, but you get a good sense of how the response varies across subjects.
33
Paired vs Unpaired
Data are paired when two or more measurements are made on the same observational unit (subjects, couples, and so on).
Data are unpaired when only one type of measurement is made on each unit.
34
Data processing
Data check and correction
Data coding
Data aggregation
According to the data usage: primary, secondary
According to the number of indices: simple, complex
It is always a good idea to summarize your data (at least for important variables):
You become familiar with the data and the characteristics of the sample that you are studying.
You can also identify problems with data collection or errors in the data (data management issues).
Range checks for illogical values.
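A range check for illogical values can be sketched as below. The records, variable names and plausibility limits are all hypothetical; real limits should come from the study protocol.

```python
# Hypothetical records and plausibility limits (not from any real study).
records = [
    {"id": 1, "age": 36, "sbp": 118},
    {"id": 2, "age": 430, "sbp": 125},  # age is probably a typing error
    {"id": 3, "age": 56, "sbp": -5},    # negative blood pressure is impossible
]
limits = {"age": (0, 120), "sbp": (50, 300)}

def range_check(records, limits):
    """Return (record id, variable, value) for every missing or out-of-range value."""
    problems = []
    for rec in records:
        for var, (low, high) in limits.items():
            value = rec.get(var)
            if value is None or not (low <= value <= high):
                problems.append((rec["id"], var, value))
    return problems

problems = range_check(records, limits)
for rec_id, var, value in problems:
    print(f"record {rec_id}: {var} = {value} is outside the plausible range")
```

Flagged values should be checked against the source documents before they are corrected or excluded.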
35
Variables vs Data
A variable is something whose value can vary. Data are the values you get when you measure a variable.
             Mr. Smith   Mrs. Johns   Mrs. Oliver
Age          36          43           56
Sex          Male        Female
Blood type   A
36
Quantitative (metric) variables
Continuous: measured units.
Metric continuous variables can be properly measured and have units of measurement.
Continuous values on a proper numeric line or scale.
Data are real numbers (located on the number line).
Discrete: counted units.
Integer values on a proper numeric line or scale.
Metric discrete variables can be properly counted and have units of measurement – ‘numbers of things’.
37
Qualitative (categorical) variables
Nominal: values in arbitrary categories.
Ordering of the categories is completely arbitrary. In other words, categories cannot be ordered in any meaningful way.
No units! Data do not have any units of measurement.
Ordinal: values in ordered categories.
Ordering of the categories is not arbitrary. It is now possible to order the categories in a meaningful way.
38
Levels of measurement
There are four levels of measurement: Nominal, Ordinal, Interval, and Ratio. These go from lowest level to highest level. Data are classified according to the highest level which they fit. Each additional level adds something the previous level didn't have.
Nominal is the lowest level. Only names are meaningful here.
Ordinal adds an order to the names.
Interval adds meaningful differences.
Ratio adds a true zero so that ratios are meaningful.
39
Levels of measurement
Nominal scale – e.g., genotype. You can code it with numbers, but the order is arbitrary and any calculations would be meaningless.
Ordinal scale – e.g., pain score from 1 to 10. The order matters but not the difference between values.
Interval scale – e.g., temperature in °C. The difference between two values is meaningful.
Ratio scale – e.g., height. It has a clear definition of 0: when the variable equals 0, there is none of that variable. When working with ratio variables, but not interval variables, you can look at the ratio of two measurements.
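Why ratios are meaningful for ratio variables but not for interval variables can be shown with a unit conversion. In this sketch the temperatures and heights are arbitrary example values: a ratio of temperatures changes when °C is converted to °F, while a ratio of heights stays the same in centimetres and inches.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

def cm_to_in(cm):
    """Convert centimetres to inches."""
    return cm / 2.54

# Interval scale: the "ratio" depends on the unit, so it carries no meaning.
ratio_c = 20 / 10                  # 2.0 in Celsius
ratio_f = c_to_f(20) / c_to_f(10)  # 68 / 50 in Fahrenheit

# Differences on an interval scale remain comparable in either unit.
diff_ratio_f = (c_to_f(30) - c_to_f(20)) / (c_to_f(20) - c_to_f(10))

# Ratio scale: the ratio is the same whatever the unit.
ratio_cm = 180 / 90
ratio_in = cm_to_in(180) / cm_to_in(90)

print(ratio_c, ratio_f, diff_ratio_f)
print(ratio_cm, ratio_in)
```

So 20 °C is not "twice as hot" as 10 °C, but a person 180 cm tall is twice as tall as one of 90 cm.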
40
Data processing
Some visual ways to summarize data:
Tables
Graphs
Bar charts
Histograms
Box plots
41
Frequency table
Elements:
Formal: title, main column, main row, legend
Logical
42
Frequency table
Simple table
Table 1. Anti-HBs (+) outcomes per group from a HBV screening study*
Screened group          Number of Anti-HBs (+) cases   %
Children of 7 y.        3                              10%
Children of 11 y.       7                              23%
Children of 17 y.
Roma people             1                              3%
Contacts in family
Health professionals    13                             43%
Total                   30                             100%
* Part of TPTBHB Project
Labelled elements: title, main row, main column, legend.
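A simple frequency table with absolute and relative frequencies can be built as sketched below. The list of blood types is hypothetical example data, not taken from the screening study.

```python
from collections import Counter

# Hypothetical observations of one qualitative variable.
blood_types = ["A", "O", "B", "A", "O", "O", "AB", "A", "O", "B"]

counts = Counter(blood_types)
total = sum(counts.values())

# One row per category: (category, number of cases, percentage of the total).
rows = [(category, n, 100 * n / total) for category, n in counts.most_common()]

print(f"{'Blood type':<12}{'Number':>8}{'%':>8}")
for category, n, pct in rows:
    print(f"{category:<12}{n:>8}{pct:>7.0f}%")
print(f"{'Total':<12}{total:>8}{100:>7}%")
```

The percentages of all categories add up to 100%, which is a quick consistency check for any table of this kind.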
43
Frequency table
Complex table (cross tabulation)
Table 2. HBV high-risk groups to be screened by residence*
Risk group \ Residence   Smolyan   Zlatograd   Rudozem   Subtotal
Contacts in family       65        20          15        100
Health professionals     98        30          22        150
Roma people
Total:                                                   350
* Part of TPTBHB Project
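A cross tabulation counts cases by two qualitative variables at once. The sketch below uses a handful of hypothetical individual records; the category names are borrowed from Table 2, but the counts are invented for illustration.

```python
from collections import Counter

# Hypothetical individual records: (risk group, residence).
records = [
    ("Contacts in family", "Smolyan"),
    ("Contacts in family", "Zlatograd"),
    ("Health professionals", "Smolyan"),
    ("Health professionals", "Smolyan"),
    ("Health professionals", "Rudozem"),
    ("Contacts in family", "Smolyan"),
]

cells = Counter(records)                             # one count per table cell
row_totals = Counter(group for group, _ in records)  # subtotals per risk group
col_totals = Counter(place for _, place in records)  # totals per residence

places = sorted(col_totals)
print("Risk group".ljust(22) + "".join(p.ljust(11) for p in places) + "Subtotal")
for group in sorted(row_totals):
    counts = "".join(str(cells[(group, p)]).ljust(11) for p in places)
    print(group.ljust(22) + counts + str(row_totals[group]))
print("Total".ljust(22) + "".join(str(col_totals[p]).ljust(11) for p in places)
      + str(len(records)))
```

Row subtotals and column totals must both add up to the grand total.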
44
Graphical summaries
One qualitative variable. Graph: bar chart, pie chart. Statistics: frequency table, relative frequency table, proportion.
Two qualitative variables. Graph: side-by-side bar chart, segmented bar chart. Statistics: two-way table, difference in proportions.
One quantitative variable. Graph: dotplot, histogram, boxplot. Statistics: measures of central tendency, measures of spread; other: five number summary, percentiles, distribution shape.
One quantitative by one qualitative variable. Graph: side-by-side boxplots, stacked dotplots. Statistics: statistics broken down by group, difference in means.
Two quantitative variables. Graph: scatterplot. Statistics: correlation.
45
Bar chart
A bar chart is a way to visually represent qualitative data.
Data are displayed either horizontally or vertically, allowing viewers to compare items, such as amounts, characteristics, and frequency. Bars can be arranged in order of frequency, so that the more frequent categories are emphasized. Bar charts can be single, stacked, or grouped.
46
Pie chart Pie chart is helpful when graphing qualitative data, where the information describes a trait or attribute and is not numerical. Each slice of pie represents a different category, and each trait corresponds to a different slice of the pie—with some slices usually noticeably larger than others.
47
Histogram
A histogram is used with quantitative data. Ranges of values, called classes, are listed at the bottom, and the classes with greater frequencies have taller bars. A histogram often looks similar to a bar chart, but they differ in the level of measurement of the data:
A bar chart is for categorical data, and the x-axis has no numeric scale.
A histogram is for quantitative data, and the x-axis is numeric.
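The class counts behind a histogram can be computed as sketched below. The ages, the class width of 10 years and the starting point of 20 are hypothetical choices; each class includes its lower limit and excludes its upper limit.

```python
def histogram_counts(values, low, width, n_classes):
    """Count how many values fall into each class [low + i*width, low + (i+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        idx = int((v - low) // width)
        if 0 <= idx < n_classes:  # values outside all classes are ignored
            counts[idx] += 1
    return counts

ages = [23, 27, 31, 34, 35, 38, 41, 42, 44, 47, 52, 58]  # hypothetical data
counts = histogram_counts(ages, low=20, width=10, n_classes=4)

# Text version of the histogram: one row per class, one '#' per observation.
for i, n in enumerate(counts):
    start = 20 + i * 10
    print(f"{start}-{start + 9}: {'#' * n}")
```

Unlike the categories of a bar chart, the classes here have a fixed numeric order and width, so they cannot be rearranged.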
48
Boxplot Boxplot is a method for graphically depicting groups of numerical data through their quartiles.
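A boxplot is drawn from the five number summary: minimum, first quartile, median, third quartile and maximum. The sketch below computes it for hypothetical data. Several conventions for calculating quartiles exist; the "inclusive" method of Python's statistics module is just one of them, and other software may give slightly different quartiles.

```python
import statistics

data = [2, 4, 4, 5, 7, 8, 9, 10, 12]  # hypothetical measurements

# Quartiles by the "inclusive" convention (one of several in common use).
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
summary = (min(data), q1, median, q3, max(data))
iqr = q3 - q1  # interquartile range: the length of the box

print("min, Q1, median, Q3, max =", summary)
print("IQR =", iqr)
```

The box spans Q1 to Q3 with a line at the median, and the whiskers extend toward the minimum and maximum.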
49
Scatterplot Scatterplot is a type of plot using Cartesian coordinates to display values for two variables for a set of data. Data are displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis.
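The statistic usually reported alongside a scatterplot is the correlation coefficient. The sketch below computes Pearson's r by hand for hypothetical paired data; the variable names and values are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired observations: each (dose, response) pair is one point on the plot.
dose = [1, 2, 3, 4, 5]
response = [2, 4, 5, 4, 5]

r = pearson_r(dose, response)
print(f"r = {r:.2f}")
```

Values of r near +1 or −1 correspond to points lying close to a straight line, while values near 0 indicate no linear relationship.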