Introduction to Biostatistics

Presentation transcript:

Introduction to Biostatistics Georgi Iskrov, MBA, MPH, PhD Department of Social Medicine

Definition of biostatistics The science of collecting, organizing, analyzing, interpreting and presenting data for the purpose of more effective decisions in a clinical context.

Importance of biostatistics Identify and develop treatments for disease and estimate their effects; identify risk factors for diseases; design, monitor, analyze, interpret, and report results of clinical studies; develop statistical methodologies to address questions arising from medical/public health data.

When do you need biostatistics? BEFORE you start your study! After that, it will be too late…

Population vs Sample A population includes all objects of interest, whereas a sample is only a portion of the population. Parameters are associated with populations and statistics with samples. Parameters are usually denoted using Greek letters (μ, σ), while statistics are usually denoted using Roman letters (X, s). There are several reasons why we don't work with populations: they are usually large, and it is often impossible to get data for every object we're studying. Sampling is rarely free of cost, and the more items surveyed, the larger the cost.

Descriptive vs Inferential statistics We compute statistics, and use them to estimate parameters. The computation is the first part of the statistical analysis (descriptive statistics) and the estimation is the second part (inferential statistics). Descriptive statistics: the procedures used to organize and summarize masses of data. Inferential statistics: the methods used to find out something about a population, based on a sample.
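
The two steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is simulated (hypothetical blood pressure values with mean 120 and standard deviation 15), and the names `population`, `sample`, `mu`, `sigma`, `x_bar` and `s` are chosen here, not taken from the slides.

```python
import random
import statistics

# Hypothetical population: systolic blood pressure (mmHg) of 10,000 people.
rng = random.Random(42)  # fixed seed so the sketch is reproducible
population = [rng.gauss(120, 15) for _ in range(10_000)]

# Parameters describe the population (normally unknown in practice).
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

# Descriptive step: statistics summarize the sample we actually observed.
sample = rng.sample(population, 100)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)

# Inferential step: the statistic x_bar serves as an estimate of the parameter mu.
print(f"parameter mu = {mu:.1f}, statistic x_bar = {x_bar:.1f}")
print(f"parameter sigma = {sigma:.1f}, statistic s = {s:.1f}")
```

In a real study only `sample`, `x_bar` and `s` would be available; `mu` and `sigma` are computed here solely to show what is being estimated.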

Inferential statistics [Diagram: Population (parameters) → sampling, from population to sample → Sample (statistics) → inferential statistics, from sample to population]

Inferential statistics Individuals in the population vary from one another with respect to an outcome of interest.

Inferential statistics When a sample is drawn, there is no certainty that it will be representative of the population. [Figure: Sample A and Sample B]

Inferential statistics Biased sample: a sample is biased when the method used to create it results in samples that are systematically different from the population. Random sample: in random sampling, each item or element of the population has an equal chance of being chosen at each draw.

Inferential statistics [Figure: Sample A and Sample B drawn from the population]

Sampling Random sampling: each element in the population has an equal chance of being selected. While this is the preferred way of sampling, it is often difficult to do, because it requires a complete list of every element in the population. Computer-generated lists are often used with random sampling. Systematic sampling: the list of elements is "counted off", that is, every k-th element is taken. This is similar to lining everyone up and numbering off "1, 2, 3, 4; 1, 2, 3, 4; etc." When the numbering is done, all people numbered 4 are used.
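
Both schemes are easy to sketch in Python. The function names `simple_random_sample` and `systematic_sample`, and the sampling frame of IDs 1 to 20, are hypothetical and chosen only for illustration.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n elements without replacement; every element is equally likely."""
    return random.Random(seed).sample(frame, n)

def systematic_sample(frame, k, start=None, seed=None):
    """Take every k-th element, beginning at 0-based position `start`."""
    if start is None:
        # A random starting point within the first k elements.
        start = random.Random(seed).randrange(k)
    return frame[start::k]

frame = list(range(1, 21))  # hypothetical complete list of the population
print(simple_random_sample(frame, 5, seed=1))
print(systematic_sample(frame, k=4, start=3))  # everyone "numbered 4"
```

With `k=4` and `start=3`, the systematic sample picks IDs 4, 8, 12, 16 and 20, which matches the "numbering off" description. Both functions need the complete list `frame`, which is exactly the practical difficulty mentioned above.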

Sampling Convenience sampling: readily available data are used, for example the first people the surveyor runs into. Cluster sampling: the population is divided into groups (clusters), usually geographically. The clusters are randomly selected, and every element in the selected clusters is used.

Sampling Stratified sampling: the population is divided into groups called strata, this time by some characteristic rather than geographically. For instance, the population might be separated into males and females. A sample is then taken from each of these strata using either random, systematic, or convenience sampling.
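
The difference between cluster and stratified sampling can be sketched as follows. The helpers `stratified_sample` and `cluster_sample`, the `people` list and the town names are all hypothetical; random sampling is assumed inside each stratum, although the slides also allow systematic or convenience sampling there.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Random sample of up to n_per_stratum units from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {name: rng.sample(members, min(n_per_stratum, len(members)))
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every element inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

# Hypothetical data: (person id, sex) pairs and three geographic clusters.
people = [(i, "male" if i % 2 else "female") for i in range(1, 21)]
by_sex = stratified_sample(people, lambda p: p[1], n_per_stratum=3, seed=0)
towns = {"Town A": [1, 2, 3], "Town B": [4, 5], "Town C": [6, 7, 8, 9]}
from_towns = cluster_sample(towns, n_clusters=2, seed=0)
print(by_sex)
print(from_towns)
```

Stratified sampling takes some units from every stratum, whereas cluster sampling takes all units from some clusters.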

Inferential statistics [Figure: Sample A and Sample B drawn from the population]

Error Random error can be conceptualized as sampling variability. Bias (systematic error) is a difference between an observed value and the true value due to all causes other than sampling variability. Accuracy is a general term denoting the absence of error of all kinds.
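
A small simulation can make the distinction concrete. Everything here is assumed for illustration: the true value of 120 mmHg, the measurement noise with standard deviation 5, and a miscalibrated device that reads 8 mmHg too high.

```python
import random
import statistics

rng = random.Random(7)
TRUE_VALUE = 120.0  # hypothetical true systolic pressure, mmHg

# Random error only: readings scatter around the true value.
unbiased = [TRUE_VALUE + rng.gauss(0, 5) for _ in range(10_000)]

# Random error plus bias: a hypothetical device that reads 8 mmHg too high.
biased = [TRUE_VALUE + 8 + rng.gauss(0, 5) for _ in range(10_000)]

mean_unbiased = statistics.fmean(unbiased)
mean_biased = statistics.fmean(biased)
print(f"mean with random error only: {mean_unbiased:.1f}")
print(f"mean with systematic error:  {mean_biased:.1f}")
```

Averaging many readings shrinks the random error, but the systematic shift of the biased device remains no matter how many readings are taken.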

Representative Sample Properties of a good sample: random selection; representativeness by structure; representativeness by number of cases.

Sample size calculation Law of Large Numbers: as the number of trials of a random process increases, the difference between the observed average and the expected value tends to zero. Application in biostatistics: the bigger the sample size, the smaller the margin of error. A properly designed study will include a justification for the number of experimental units (people/animals) being examined. Sample size calculations are necessary to design experiments that are large enough to produce useful information and small enough to be practical.
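
The Law of Large Numbers can be illustrated with a simulated fair coin. This is a sketch under assumptions: the coin, the seed and the checkpoints at 10 to 100,000 flips are chosen here for illustration.

```python
import random

rng = random.Random(2024)
EXPECTED = 0.5  # expected proportion of heads for a fair coin

heads = 0
flips = 0
gaps = {}  # number of flips -> |observed proportion - expected proportion|
for n in (10, 100, 1_000, 10_000, 100_000):
    # Keep flipping until n flips have been made in total.
    while flips < n:
        heads += rng.random() < EXPECTED
        flips += 1
    gaps[n] = abs(heads / n - EXPECTED)
    print(f"n = {n:>7}: proportion of heads = {heads / n:.4f}")
```

The gap does not have to shrink at every single step, but it tends towards zero as the number of flips grows, which is why larger samples give smaller margins of error.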

Sample size calculation Generally, the sample size for any study depends on the: acceptable level of confidence; power of the study; expected effect size; underlying event rate in the population; standard deviation in the population.

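Sample size calculation For quantitative variables: n = (Z × SD / d)², where n – required sample size, Z – standard normal value for the chosen confidence level (e.g., 1.96 for 95%), SD – expected standard deviation, d – absolute error of precision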

Sample size calculation For quantitative variables: A researcher wants to estimate the average level of systolic blood pressure in children at a 95% level of confidence and with a precision of 5 mmHg. The standard deviation, based on previous studies, is 25 mmHg.
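
The calculation for this example can be sketched in Python, assuming the standard formula for estimating a mean, n = (Z × SD / d)², with Z taken from the standard normal distribution. The function name `sample_size_mean` is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_mean(sd, d, confidence=0.95):
    """Sample size for estimating a mean, assuming n = (Z * SD / d)^2."""
    # Two-sided Z value for the chosen confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Round up, because a fraction of a subject cannot be recruited.
    return math.ceil((z * sd / d) ** 2)

n_children = sample_size_mean(sd=25, d=5)
print(n_children)  # 97
```

By hand: (1.96 × 25 / 5)² = 9.8² = 96.04, which rounds up to 97 children.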

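Sample size calculation For qualitative variables: n = Z² × p × (1 − p) / d², where n – required sample size, Z – standard normal value for the chosen confidence level (e.g., 1.96 for 95%), p – expected proportion, d – absolute error of precision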

Sample size calculation For qualitative variables: A researcher is interested in the proportion of diabetes patients having hypertension. According to a previous study, this proportion is no more than 15%. The researcher wants to calculate the required sample size with a 5% absolute precision error and a 95% confidence level.
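
The same kind of sketch works for a proportion, assuming the standard formula n = Z² × p × (1 − p) / d². The function name `sample_size_proportion` is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_proportion(p, d, confidence=0.95):
    """Sample size for estimating a proportion, assuming n = Z^2 * p * (1 - p) / d^2."""
    # Two-sided Z value for the chosen confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)

n_patients = sample_size_proportion(p=0.15, d=0.05)
print(n_patients)  # 196
```

By hand: 1.96² × 0.15 × 0.85 / 0.05² ≈ 195.9, which rounds up to 196 patients.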

Collection of Evidence (Data) Stages of biomedical research: planning and organization; conduct of the investigation; data processing and analysis of results.

Planning and organization Research programme: aim; object; units of observation; indices of observation; place; time; statistical analyses; methodology.

Planning and organization Aim: the aim of the investigation summarizes and clearly formulates the research hypothesis. Object: the object of the investigation is the event that is going to be studied. Units of observation: logical unit – each studied case; technical unit – the environment where the logical units are situated. Indices of observation: not too many, but important; measurable; additive and self-controlling. Types of indices: factorial, resultative.

Planning and organization Place. Time: single – events are studied at a single moment in time, the so-called “critical moment”; continuous – used to characterize a long-term tendency of the events. Statistical analyses. Methodology.

One vs Many Many measurements on one subject are not the same thing as one measurement on many subjects. With many measurements on one subject, you get to know the one subject quite well but you learn nothing about how the response varies across subjects. With one measurement on many subjects, you learn less about each individual, but you get a good sense of how the response varies across subjects.

Paired vs Unpaired Data are paired when two or more measurements are made on the same observational unit (subjects, couples, and so on). Data are unpaired when only one type of measurement is made on each unit.
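
A minimal sketch of what pairing means in practice, using made-up blood pressure readings taken before and after treatment on the same four patients. The lists `before` and `after` are hypothetical data, not from the slides.

```python
import statistics

# Hypothetical paired data: the i-th entries of both lists belong to the same patient.
before = [150, 142, 160, 155]
after = [144, 140, 151, 150]

# Because the data are paired, each patient acts as their own control
# and the analysis works on the within-patient differences.
differences = [b - a for b, a in zip(before, after)]
mean_difference = statistics.fmean(differences)
print(differences)       # [6, 2, 9, 5]
print(mean_difference)   # 5.5
```

With unpaired data (for example, two separate groups of patients) the lists could not be matched element by element, and only group summaries such as the two means could be compared.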

Planning and organization Research plan: definition of the team responsible for the study, and preliminary training; administration and management of the study.

Information processing Data check and correction. Data coding. Data aggregation: according to the data usage – primary or secondary; according to the number of indices – simple or complex.

Information processing It is always a good idea to summarize your data: you become familiar with the data and the characteristics of the sample that you are studying; you can also identify problems with data collection or errors in the data (data management issues), for example with range checks for illogical values.
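
A range check can be sketched as below. The records, the field names `age` and `sbp`, and the plausible limits are all hypothetical; real limits would have to come from the study protocol.

```python
def range_check(records, limits):
    """Return (record id, field, value) for every missing or out-of-range value."""
    problems = []
    for rec in records:
        for field, (low, high) in limits.items():
            value = rec.get(field)
            if value is None or not low <= value <= high:
                problems.append((rec["id"], field, value))
    return problems

# Hypothetical records with two data-entry errors.
records = [
    {"id": 1, "age": 36, "sbp": 120},
    {"id": 2, "age": 430, "sbp": 135},  # age typed with an extra digit
    {"id": 3, "age": 56, "sbp": -5},    # negative blood pressure is illogical
]
limits = {"age": (0, 120), "sbp": (50, 300)}  # assumed plausible ranges

flagged = range_check(records, limits)
print(flagged)  # [(2, 'age', 430), (3, 'sbp', -5)]
```

Flagged values should be checked against the source documents rather than silently corrected.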

Variables vs Data A variable is something whose value can vary. Data are the values you get when you measure a variable.
              Mr. Smith    Mrs. Johns    Mrs. Oliver
Age           36           43            56
Sex           Male         Female
Blood type    A

Metric variables Continuous (measured units): values on a proper numeric line or scale. Metric continuous variables can be properly measured and have units of measurement. Data are real numbers (located on the number line). Discrete (counted units): integer values on a proper numeric line or scale. Metric discrete variables can be properly counted and have units of measurement – ‘numbers of things’.

Categorical variables Nominal: values in arbitrary categories. Ordering of the categories is completely arbitrary; in other words, categories cannot be ordered in any meaningful way. Ordinal: values in ordered categories. Ordering of the categories is not arbitrary; it is now possible to order the categories in a meaningful way. No units: categorical data do not have any units of measurement.

Levels of Measurement

Levels of Measurement There are four levels of measurement: Nominal, Ordinal, Interval, and Ratio. These go from lowest level to highest level. Data is classified according to the highest level which it fits. Each additional level adds something the previous level didn't have. Nominal is the lowest level. Only names are meaningful here. Ordinal adds an order to the names. Interval adds meaningful differences. Ratio adds a zero so that ratios are meaningful.

Levels of Measurement Nominal scale – e.g., genotype. You can code it with numbers, but the order is arbitrary and any calculations would be meaningless. Ordinal scale – e.g., pain score from 1 to 10. The order matters but not the difference between values. Interval scale – e.g., temperature in °C. The difference between two values is meaningful. Ratio scale – e.g., height. It has a clear definition of 0: when the variable equals 0, there is none of that variable. When working with ratio variables, but not interval variables, you can look at the ratio of two measurements.
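
Why ratios only make sense on a ratio scale can be checked numerically. The temperatures and heights below are made-up values; the helper names are hypothetical.

```python
def celsius_to_kelvin(c):
    return c + 273.15

def cm_to_inches(cm):
    return cm / 2.54

# Interval scale: the zero of the Celsius scale is arbitrary,
# so the ratio changes when the unit changes.
warm_c, cool_c = 20.0, 10.0
ratio_celsius = warm_c / cool_c
ratio_kelvin = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)
print(ratio_celsius, round(ratio_kelvin, 3))  # 2.0 versus about 1.035

# Ratio scale: height has a true zero, so the ratio survives a change of unit.
tall_cm, short_cm = 180.0, 90.0
ratio_cm = tall_cm / short_cm
ratio_inches = cm_to_inches(tall_cm) / cm_to_inches(short_cm)
print(ratio_cm, ratio_inches)
```

Saying that 20 °C is "twice as warm" as 10 °C is therefore meaningless, while saying that one person is twice as tall as another is not.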

Information processing Some visual ways to summarize data: tables, graphs, bar charts, histograms, box plots.

Frequency table Elements Formal Title Main column Main row Legend Logical

Frequency table
Table 1. Anti-HBs (+) outcomes per group from a HBV screening study*    [Title]
Screened group                  Anti-HBs (+)    %                       [Main row]
Children of 7 y.                3               10%
Children of 11 y.               7               23%
Children of 17 y.
Roma people                     1               3%
HBsAg /+/ contacts in family
Health professionals            13              43%
Total                           30              100%
* Part of TPTBHB Project                                                [Legend]
[Main column: the first column, listing the screened groups]

Frequency table Simple table
Table 1. Anti-HBs (+) outcomes per group from a HBV screening study*
Screened group                  Anti-HBs (+)    %
Children of 7 y.                3               10%
Children of 11 y.               7               23%
Children of 17 y.
Roma people                     1               3%
HBsAg /+/ contacts in family
Health professionals            13              43%
Total                           30              100%
* Part of TPTBHB Project
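
Building a simple frequency table from raw observations can be sketched with `collections.Counter`. The blood type data and the function name `frequency_table` are hypothetical; they are not the screening data from Table 1.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, count, percentage) rows, most frequent category first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(category, n, 100 * n / total) for category, n in counts.most_common()]

# Hypothetical raw data: blood types of ten patients.
blood_types = ["A", "O", "B", "A", "O", "O", "AB", "A", "O", "A"]
rows = frequency_table(blood_types)

print(f"{'Blood type':<12}{'n':>3}{'%':>6}")
for category, n, pct in rows:
    print(f"{category:<12}{n:>3}{pct:>5.0f}%")
print(f"{'Total':<12}{len(blood_types):>3}{100:>5.0f}%")
```

A simple table like this summarizes one variable: each category appears once, with its count and its share of the total.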

Frequency table Complex table (cross tabulation)
Table 2. HBV high-risk groups to be screened by residence*
Risk group \ Residence          Smolyan    Zlatograd    Rudozem    Subtotal
HBsAg /+/ contacts in family    65         20           15         100
Health professionals            98         30           22         150
Roma people
Total:                                                             350
* Part of TPTBHB Project
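
A cross tabulation counts units by two variables at once. The sketch below rebuilds only the two rows of Table 2 whose cell counts are fully given in the transcript; the function name `cross_tabulate` and the one-record-per-person layout are assumptions made for illustration.

```python
from collections import Counter

def cross_tabulate(records):
    """Count (row, column) pairs and return the cell counts plus row subtotals."""
    cells = Counter(records)
    row_totals = Counter()
    for (row, _column), n in cells.items():
        row_totals[row] += n
    return cells, row_totals

# One (risk group, residence) record per person, using the counts
# that appear in the two complete rows of Table 2.
records = (
    [("HBsAg /+/ contacts in family", "Smolyan")] * 65
    + [("HBsAg /+/ contacts in family", "Zlatograd")] * 20
    + [("HBsAg /+/ contacts in family", "Rudozem")] * 15
    + [("Health professionals", "Smolyan")] * 98
    + [("Health professionals", "Zlatograd")] * 30
    + [("Health professionals", "Rudozem")] * 22
)

cells, row_totals = cross_tabulate(records)
print(cells[("Health professionals", "Zlatograd")])  # 30
print(dict(row_totals))
```

The row subtotals computed this way (100 and 150) agree with the Subtotal column of the table.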

Graphical summaries Bar charts: categorical data. Histograms: continuous data. Box plots.