Introduction to Biostatistics

Georgi Iskrov, MBA, MPH, PhD
Department of Social Medicine

Definition of biostatistics
The science of collecting, organizing, analyzing, interpreting and presenting data in order to support more effective decisions in a clinical context.

Importance of biostatistics
Identify and develop treatments for disease and estimate their effects
Identify risk factors for diseases
Design, monitor, analyze, interpret, and report results of clinical studies
Develop statistical methodologies to address questions arising from medical/public health data

When do you need biostatistics?
BEFORE you start your study! After that, it will be too late…

Population vs Sample
A population includes all objects of interest, whereas a sample is only a portion of the population.
Parameters are associated with populations and statistics with samples.
Parameters are usually denoted using Greek letters (μ, σ), while statistics are usually denoted using Roman letters (x̄, s).
There are several reasons why we don't work with populations:
Populations are usually large, and it is often impossible to get data for every object we're studying.
Sampling has a cost, and the more items surveyed, the larger the cost.
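The distinction between parameters and statistics can be sketched in a few lines of Python. This is only an illustration: the population of 10,000 simulated systolic blood pressure values and the figures 120 and 15 mmHg are assumptions, not data from the lecture.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: systolic blood pressure (mmHg) of 10,000 people.
population = [random.gauss(120, 15) for _ in range(10_000)]

# Parameters describe the whole population (mu, sigma).
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)   # population standard deviation

# Statistics describe a sample (x_bar, s).
sample = random.sample(population, 100)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)            # sample standard deviation (n - 1)

print(f"parameters: mu={mu:.1f}, sigma={sigma:.1f}")
print(f"statistics: x_bar={x_bar:.1f}, s={s:.1f}")
```

In practice only the sample values are available; the population lines are there to show what the statistics are trying to estimate.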

Descriptive vs Inferential statistics
We compute statistics and use them to estimate parameters. The computation is the first part of the statistical analysis (descriptive statistics) and the estimation is the second part (inferential statistics).
Descriptive statistics: the procedures used to organize and summarize masses of data.
Inferential statistics: the methods used to find out something about a population, based on a sample.

Inferential statistics
[Diagram: Population (parameters) → sampling, from population to sample → Sample (statistics) → inferential statistics, from sample to population]

Individuals in the population vary from one another with respect to an outcome of interest.

When a sample is drawn, there is no certainty that it will be representative of the population.
[Figure: Sample A and Sample B]

Biased sample
A biased sample is one in which the method used to create the sample results in samples that are systematically different from the population.
Random sample
In random sampling, each item or element of the population has an equal chance of being chosen at each draw.

[Figure: Sample A and Sample B drawn from the population]

Sampling
Random sampling
Each element in the population has an equal chance of being selected. While this is the preferred way of sampling, it is often difficult to do, because it requires a complete list of every element in the population. Computer-generated lists are often used with random sampling.
Systematic sampling
The list of elements is "counted off": every k-th element is taken. This is similar to lining everyone up and numbering off "1, 2, 3, 4; 1, 2, 3, 4; etc." When the numbering is done, all people numbered 4 would be used.
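Both procedures can be sketched with Python's standard library. The sampling frame of 20 ID numbers and the function names are hypothetical, chosen only to mirror the "numbering off" description above.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements; every element has the same chance."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def systematic_sample(frame, k, start=None, seed=None):
    """Take every k-th element, beginning at a random position among the first k."""
    rng = random.Random(seed)
    if start is None:
        start = rng.randrange(k)
    return frame[start::k]

frame = list(range(1, 21))          # hypothetical sampling frame: IDs 1..20
random_five = simple_random_sample(frame, 5, seed=1)

# start=3 is the 4th position, i.e. everyone who was "numbered 4".
every_fourth = systematic_sample(frame, k=4, start=3)
print(every_fourth)                 # [4, 8, 12, 16, 20]
```

Note that both functions need the complete list (`frame`), which is exactly the practical difficulty mentioned above.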

Sampling
Convenience sampling
In convenience sampling, readily available data are used, for example the first people the surveyor runs into.
Cluster sampling
The population is divided into groups (clusters), usually geographically. Clusters are randomly selected, and every element in the selected clusters is used.
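Cluster sampling can be sketched as follows. The villages and their members are invented for illustration; the key point is that the random draw is made over clusters, and then every member of a chosen cluster is included.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every element inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters: person IDs grouped by village.
clusters = {
    "Village A": [1, 2, 3],
    "Village B": [4, 5],
    "Village C": [6, 7, 8, 9],
    "Village D": [10],
}
selected = cluster_sample(clusters, n_clusters=2, seed=7)
for village, members in selected.items():
    print(village, members)
```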

Sampling
Stratified sampling
The population is divided into groups called strata, this time by some characteristic rather than geographically. For instance, the population might be separated into males and females. A sample is taken from each of these strata using either random, systematic, or convenience sampling.
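A minimal sketch of stratified sampling, assuming random sampling within each stratum and an equal number of units per stratum (both are choices made for this example; the text above allows other schemes). The `people` records are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Group units into strata, then draw a random sample inside each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {
        name: rng.sample(members, min(n_per_stratum, len(members)))
        for name, members in strata.items()
    }

# Hypothetical population of 40 people, stratified by sex.
people = [{"id": i, "sex": "male" if i % 2 else "female"} for i in range(1, 41)]
picked = stratified_sample(people, lambda p: p["sex"], n_per_stratum=5, seed=42)

for stratum, members in picked.items():
    print(stratum, [p["id"] for p in members])
```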

Inferential Statistics
[Figure: Sample A and Sample B drawn from the population]

Error
Random error can be conceptualized as sampling variability.
Bias (systematic error) is a difference between an observed value and the true value due to all causes other than sampling variability.
Accuracy is a general term denoting the absence of error of all kinds.
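The difference between random error and bias can be illustrated with a small simulation. All numbers here are assumptions made for the sketch: a true value of 120 mmHg, measurement noise with a standard deviation of 5 mmHg, and a device that reads 5 mmHg too high.

```python
import random
import statistics

rng = random.Random(1)
TRUE_VALUE = 120.0   # hypothetical true systolic pressure, mmHg
OFFSET = 5.0         # hypothetical miscalibration of the device, mmHg

# Random error only: readings scatter around the true value.
unbiased = [TRUE_VALUE + rng.gauss(0, 5) for _ in range(1000)]

# Random error plus bias: every reading is shifted by the same offset.
biased = [TRUE_VALUE + OFFSET + rng.gauss(0, 5) for _ in range(1000)]

mean_unbiased = statistics.fmean(unbiased)
mean_biased = statistics.fmean(biased)
print(f"random error only: {mean_unbiased:.1f}")
print(f"with bias:         {mean_biased:.1f}")
```

Averaging many readings shrinks the random error, but the biased mean stays about 5 mmHg away from the true value no matter how many readings are taken.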

Representative Sample
Properties of a good sample:
Random selection
Representativeness by structure
Representativeness by number of cases

Sample size calculation
Law of Large Numbers
As the number of trials of a random process increases, the difference between the observed average and the expected value tends to zero.
Application in biostatistics
The bigger the sample size, the smaller the margin of error.
A properly designed study will include a justification for the number of experimental units (people/animals) being examined.
Sample size calculations are necessary to design experiments that are large enough to produce useful information and small enough to be practical.
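Both ideas can be checked numerically. The sketch below simulates rolls of a fair die (expected value 3.5) and uses the usual normal-approximation margin of error for a mean, z × SD / √n; the die example and the function name `margin_of_error` are illustrative assumptions.

```python
import math
import random

def margin_of_error(sd, n, z=1.96):
    """Normal-approximation margin of error for a sample mean."""
    return z * sd / math.sqrt(n)

# Law of Large Numbers: the running mean of die rolls approaches 3.5.
rng = random.Random(0)
rolls = [rng.randint(1, 6) for _ in range(100_000)]
running_means = {n: sum(rolls[:n]) / n for n in (10, 1_000, 100_000)}
for n, mean in running_means.items():
    print(f"n={n:>7}: mean={mean:.3f}")

# Bigger sample, smaller margin of error (SD = 25 mmHg assumed).
for n in (25, 100, 400):
    print(f"n={n:>4}: margin={margin_of_error(25, n):.2f} mmHg")
```

Quadrupling the sample size only halves the margin of error, which is why sample size has to be planned rather than simply maximized.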

Generally, the sample size for any study depends on the:
Acceptable level of confidence
Power of the study
Expected effect size
Underlying event rate in the population
Standard deviation in the population

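For quantitative variables:
$$n = \frac{Z^2 \cdot SD^2}{d^2}$$
n – required sample size
Z – standard normal value for the chosen confidence level (e.g., 1.96 for 95%)
SD – expected standard deviation
d – absolute error of precision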

For quantitative variables (example): A researcher is interested in the average level of systolic blood pressure in children at a 95% level of confidence and a precision of 5 mmHg. The standard deviation, based on previous studies, is 25 mmHg.
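The example can be worked through in code, assuming the standard formula n = Z² · SD² / d² for estimating a mean and Z = 1.96 for 95% confidence (the transcript lists the symbols but not the formula itself, so the formula is an assumption here). The function name is hypothetical.

```python
import math

def sample_size_mean(z, sd, d):
    """Sample size for estimating a mean: n = (z * sd / d)^2, rounded up."""
    return math.ceil((z * sd / d) ** 2)

# 95% confidence (z = 1.96), SD = 25 mmHg, precision d = 5 mmHg.
n = sample_size_mean(z=1.96, sd=25, d=5)
print(n)  # (1.96 * 25 / 5)^2 = 96.04, rounded up to 97
```

The result is always rounded up, because rounding down would give slightly less precision than requested.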

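For qualitative variables:
$$n = \frac{Z^2 \cdot p \, (1 - p)}{d^2}$$
n – required sample size
Z – standard normal value for the chosen confidence level (e.g., 1.96 for 95%)
p – expected proportion
d – absolute error of precision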

For qualitative variables (example): A researcher is interested in the proportion of diabetes patients having hypertension. According to a previous study, the proportion is no more than 15%. The researcher wants to calculate the sample size with a 5% absolute precision error and a 95% confidence level.
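A sketch of this calculation, assuming the standard formula n = Z² · p(1 − p) / d² for estimating a proportion and Z = 1.96 for 95% confidence (the formula is not preserved in the transcript, so it is an assumption). The function name is hypothetical.

```python
import math

def sample_size_proportion(z, p, d):
    """Sample size for estimating a proportion: n = z^2 * p * (1 - p) / d^2, rounded up."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)

# 95% confidence (z = 1.96), expected proportion 15%, absolute precision 5%.
n = sample_size_proportion(z=1.96, p=0.15, d=0.05)
print(n)  # 3.8416 * 0.1275 / 0.0025 = 195.92, rounded up to 196
```

When no prior estimate of p is available, p = 0.5 gives the largest (most conservative) sample size for a given precision.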

Collection of Evidence (Data)
Stages of biomedical research:
Planning and organization
Conduct of the investigation
Data processing and analysis of results

Planning and organization
Research programme:
Aim
Object
Units of observation
Indices of observation
Place
Time
Statistical analyses
Methodology

Aim
The aim of the investigation is to summarize and clearly formulate the research hypothesis.
Object
The object of the investigation is the event that is going to be studied.
Units of observation
Logical unit – each studied case
Technical unit – the environment where the logical units are situated
Indices of observation – not too many, but important; measurable; additive and self-controlling.
Factorial
Resultative

Place
Time
Single – events are studied at a single moment in time, the so-called "critical moment".
Continuous – used to characterize a long-term tendency of the events
Statistical analyses
Methodology

One vs Many
Many measurements on one subject are not the same thing as one measurement on many subjects.
With many measurements on one subject, you get to know the one subject quite well but you learn nothing about how the response varies across subjects.
With one measurement on many subjects, you learn less about each individual, but you get a good sense of how the response varies across subjects.

Paired vs Unpaired
Data are paired when two or more measurements are made on the same observational unit (subjects, couples, and so on).
Data are unpaired when only one type of measurement is made on each unit.

Research plan:
Definition of the team responsible for the study, and preliminary training.
Administration and management of the study.

Information processing
Data check and correction
Data coding
Data aggregation
According to the data usage: primary, secondary
According to the number of indices: simple, complex

It is always a good idea to summarize your data:
You become familiar with the data and the characteristics of the sample that you are studying.
You can also identify problems with data collection or errors in the data (data management issues).
Range checks for illogical values.
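A range check can be as simple as comparing each value with plausible limits. In this sketch the records, the field names and the limits are all hypothetical; real limits depend on the variable and the study population.

```python
def range_check(records, limits):
    """Return (row index, field, value) for every missing or out-of-range value."""
    problems = []
    for i, rec in enumerate(records):
        for field, (low, high) in limits.items():
            value = rec.get(field)
            if value is None or not (low <= value <= high):
                problems.append((i, field, value))
    return problems

# Hypothetical data: age in years, systolic blood pressure (sbp) in mmHg.
records = [
    {"age": 36, "sbp": 120},
    {"age": 430, "sbp": 118},   # probably a typing error for 43
    {"age": 56, "sbp": None},   # missing measurement
]
limits = {"age": (0, 120), "sbp": (50, 300)}

flagged = range_check(records, limits)
print(flagged)  # [(1, 'age', 430), (2, 'sbp', None)]
```

The check only flags values; deciding whether 430 should be corrected to 43 requires going back to the source documents.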

Variables vs Data
A variable is something whose value can vary. Data are the values you get when you measure a variable.

              Mr. Smith   Mrs. Johns   Mrs. Oliver
Age           36          43           56
Sex           Male        Female
Blood type    A

Metric variables
Continuous (measured units)
Metric continuous variables can be properly measured and have units of measurement. Values are continuous on a proper numeric line or scale; the data are real numbers (located on the number line).
Discrete (counted units)
Metric discrete variables can be properly counted and have units of measurement: "numbers of things". Values are integers on a proper numeric line or scale.

Categorical variables
Nominal: values in arbitrary categories
The ordering of the categories is completely arbitrary. In other words, the categories cannot be ordered in any meaningful way.
No units! Data do not have any units of measurement.
Ordinal: values in ordered categories
The ordering of the categories is not arbitrary; it is possible to order the categories in a meaningful way.

Levels of Measurement
There are four levels of measurement: nominal, ordinal, interval, and ratio, from lowest to highest. Data are classified according to the highest level they fit. Each additional level adds something the previous level didn't have.
Nominal is the lowest level. Only names are meaningful here.
Ordinal adds an order to the names.
Interval adds meaningful differences.
Ratio adds a true zero, so that ratios are meaningful.

Levels of Measurement
Nominal scale – e.g., genotype
You can code it with numbers, but the order is arbitrary and any calculations would be meaningless.
Ordinal scale – e.g., pain score from 1 to 10
The order matters, but the difference between values does not.
Interval scale – e.g., temperature in °C
The difference between two values is meaningful.
Ratio scale – e.g., height
It has a clear definition of 0: when the variable equals 0, there is none of that variable. With ratio variables, but not interval variables, you can look at the ratio of two measurements.
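The interval/ratio distinction can be illustrated with temperature. The two readings (10 °C and 20 °C) are made up for the example; the Kelvin scale is used because it has a true zero and is therefore a ratio scale.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius (interval scale) to kelvin (ratio scale)."""
    return c + 273.15

c_low, c_high = 10.0, 20.0

difference = c_high - c_low        # 10.0 degrees: meaningful on an interval scale
naive_ratio = c_high / c_low       # 2.0, but 20 °C is not "twice as hot" as 10 °C
kelvin_ratio = celsius_to_kelvin(c_high) / celsius_to_kelvin(c_low)

print(difference, naive_ratio, round(kelvin_ratio, 3))
```

The ratio computed in kelvin is only about 1.035, which shows that the factor of 2 on the Celsius scale is an artifact of where its zero happens to be.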

Some visual ways to summarize data:
Tables
Graphs
Bar charts
Histograms
Box plots

Frequency table
Elements
Formal: title, main column, main row, legend
Logical

Frequency table
Table 1. Anti-HBs (+) outcomes per group from an HBV screening study*

Screened group                   Anti-HBs (+)   %
Children of 7 y.                 3              10%
Children of 11 y.                7              23%
Children of 17 y.
Roma people                      1              3%
HBsAg (+) contacts in family
Health professionals             13             43%
Total                            30             100%

* Part of TPTBHB Project
[Callouts on the slide: Title, Main row, Main column, Legend]

Frequency table
Simple table
Table 1 (Anti-HBs (+) outcomes per group from an HBV screening study) is an example of a simple table.
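A simple frequency table (counts and percentages for one categorical variable) can be tallied from raw data as sketched below. The blood type values are hypothetical and are not taken from the screening study.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, count, percent) rows, most frequent category first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(cat, n, round(100 * n / total, 1)) for cat, n in counts.most_common()]

# Hypothetical raw data: blood type of 10 patients.
blood_types = ["A", "O", "A", "B", "O", "A", "AB", "O", "A", "O"]
table = frequency_table(blood_types)

for category, count, percent in table:
    print(f"{category:<4}{count:>3}{percent:>7}%")
print(f"{'Total':<4}{len(blood_types):>2}{100.0:>7}%")
```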

Frequency table
Complex table (cross tabulation)
Table 2. HBV high-risk groups to be screened by residence*

Risk group \ Residence           Smolyan   Zlatograd   Rudozem   Subtotal
HBsAg (+) contacts in family     65        20          15        100
Health professionals             98        30          22        150
Roma people
Total:                                                           350

* Part of TPTBHB Project
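A cross tabulation counts how often each combination of two categorical variables occurs. The sketch below uses a handful of invented (risk group, residence) records rather than the project data, and the function name is hypothetical.

```python
from collections import Counter

def cross_tabulate(records):
    """Count (row, column) pairs and return cell counts plus row subtotals."""
    cells = Counter(records)
    row_totals = Counter(row for row, _ in records)
    return cells, row_totals

# Hypothetical individual records: (risk group, residence).
records = (
    [("Health professionals", "Smolyan")] * 3
    + [("Health professionals", "Rudozem")] * 1
    + [("Roma people", "Zlatograd")] * 2
)
cells, row_totals = cross_tabulate(records)

for (group, town), count in sorted(cells.items()):
    print(f"{group:<22}{town:<11}{count}")
print("Total:", sum(row_totals.values()))
```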

Graphical summaries
Bar charts: categorical data
Histograms: continuous data
Box plots
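The numbers behind a histogram and a box plot can be computed without any plotting library. This sketch assumes nine hypothetical ages, 10-year histogram bins, and quartiles from `statistics.quantiles` with the "inclusive" method (other quartile conventions give slightly different values).

```python
import statistics
from collections import Counter

# Hypothetical ages (years) of nine patients.
ages = [22, 25, 31, 36, 43, 47, 52, 56, 64]

# Histogram: count the values falling in each 10-year bin (20-29, 30-39, ...).
histogram = Counter((age // 10) * 10 for age in ages)

# Box plot: five-number summary (min, Q1, median, Q3, max).
q1, median, q3 = statistics.quantiles(ages, n=4, method="inclusive")
five_numbers = (min(ages), q1, median, q3, max(ages))

for start in sorted(histogram):
    print(f"{start}-{start + 9}: {'#' * histogram[start]}")
print("five-number summary:", five_numbers)
```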
