Introduction to Biostatistics


1 Introduction to Biostatistics
Dr. M. H. Rahbar
Professor of Biostatistics
Department of Epidemiology
Director, Data Coordinating Center
College of Human Medicine
Michigan State University

2 What does “STATISTICS” mean?
The word “Statistics” has several meanings:
It is frequently used to refer to recorded data.
Statistics also denotes characteristics calculated for a set of data, for example, the sample mean.
Statistics also refers to statistical methodology: the techniques and procedures for the design of experiments and for the collection, organization, and analysis of the information contained in a data set, used to make inferences about the population parameters.

3 What do statisticians do?
To guide the design of an experiment or survey prior to data collection
To analyze data using proper statistical procedures and techniques
To present and interpret results to researchers and other decision makers, including government and industry

4 WHY STUDY STATISTICS?
Knowledge of statistics is essential for people going into research, management, or graduate study.
A basic understanding of statistics is useful for conducting investigations and presenting results effectively.
An understanding of statistics can help anyone discriminate between fact and fancy in daily life.
A course in statistics should help one know when, and for what, a statistician should be consulted.

5 Definition of Population & Sample
A population is a set of measurements of interest to the researcher.
Examples:
1. Income of households living in Karachi
2. The number of children in families living in Pakistan
3. The health status of adults in a community
A subset of the population is called a sample.
A sample is usually selected so that it is representative of the population.

6 Descriptive & Inferential Statistics
1. Descriptive statistics deal with the enumeration, organization, and graphical representation of data.
2. Inferential statistics are concerned with reaching conclusions from incomplete information, that is, generalizing from a specific sample to the population.
An example of inferential statistics is using available information about the health status of people in a sample to draw inferences about the underlying population from which the sample was selected.

7 INFERENTIAL STATISTICS
The objective of inferential statistics is to make inferences about the population parameters based on the information contained in the sample.
Estimation (e.g., estimating the prevalence of hypertension among adults living in Karachi)
Hypothesis testing (e.g., testing the effectiveness of a new drug for reducing cholesterol levels)

8 Sources of Data
Data may come from different sources:
Surveillance systems (e.g., NIH)
Planned surveys (government, universities, NGOs)
Experiments (pharmaceutical companies)
Health organizations (administrative data sets)
Private sector (banks, companies, etc.)
Government (all government agencies)
Here we will focus on surveys and experiments.
What is the difference between a survey and an experiment?

9 Difference between Surveys & Experiments
Survey data represent observations of events or phenomena over which few, if any, controls are imposed (e.g., assessing the association between different lifestyles and heart disease).
In an experiment, we purposely design a research plan that imposes controls over the amount of exposure (treatment), for example to a drug (e.g., clinical trials).

10 Sampling Methods
Random Sampling (Simple)
Systematic Sampling
Stratified Sampling
Cluster Sampling
Convenience Sampling
More complex sampling
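As a rough illustration of how the first four methods differ, here is a minimal Python sketch. The sampling frame (100 household IDs), the two strata, the cluster size, and the sample size are all hypothetical and are not taken from the slides.

```python
import random

# Hypothetical sampling frame: 100 household IDs, each tagged with a
# made-up stratum. These values are illustrative assumptions only.
rng = random.Random(42)  # fixed seed so the sketch is reproducible
frame = [{"id": i, "stratum": "urban" if i < 60 else "rural"} for i in range(100)]
n = 10  # desired sample size

# Simple random sampling: every subset of size n is equally likely.
simple = rng.sample(frame, n)

# Systematic sampling: pick a random start, then take every k-th unit.
k = len(frame) // n
start = rng.randrange(k)
systematic = frame[start::k]

# Stratified sampling: sample within each stratum, in proportion to its size.
stratified = []
for name in ("urban", "rural"):
    units = [u for u in frame if u["stratum"] == name]
    share = round(n * len(units) / len(frame))
    stratified.extend(rng.sample(units, share))

# Cluster sampling: split the frame into clusters of 10 consecutive units,
# pick one whole cluster at random, and keep every unit in it.
clusters = [frame[i:i + 10] for i in range(0, len(frame), 10)]
cluster_sample = rng.choice(clusters)

for label, sample in [("simple", simple), ("systematic", systematic),
                      ("stratified", stratified), ("cluster", cluster_sample)]:
    print(label, sorted(u["id"] for u in sample))
```

Convenience sampling is left out on purpose: it has no random mechanism to simulate, which is exactly why it may not be representative.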

11 Some Epidemiologic Studies
Retrospective studies: Retrospective studies gather past data from selected cases and controls to determine differences, if any, in the exposure to a suspected factor. They are commonly referred to as case-control studies.
Prospective studies: Prospective studies are usually cohort studies in which one enrolls a group of healthy people and follows them over a certain period to determine the frequency with which a disease develops.

12 Qualitative and Quantitative Variables
Examples of qualitative variables are occupation, sex, and marital status.
Variables that yield observations that can be measured are considered quantitative variables. Examples of quantitative variables are weight, height, and age.
Quantitative variables can be further classified as discrete or continuous.

13 VARIABLE TYPES
Categorical variables (e.g., sex, marital status, income category)
Continuous variables (e.g., age, income, weight, height, time to achieve an outcome)
Discrete variables (e.g., number of children in a family)
Binary or dichotomous variables (e.g., responses to Yes or No questions)

14 VARIABLE SCALES
SCALE OF VARIABLE
Nominal Scale
Ordinal Scale
Interval Scale
Interval Ratio Scale

15 Scale of Data
1. Nominal: These data do not represent an amount or quantity (e.g., marital status, sex).
2. Ordinal: These data represent an ordered series of relationships (e.g., level of education).
3. Interval: These data are measured on an interval scale with equal units but an arbitrary zero point (e.g., temperature in Fahrenheit).
4. Interval Ratio (ratio scale): These data have equal units and a true zero point, so ratios are meaningful. For a variable such as weight, we can meaningfully compare one value with another (say, 100 kg is twice 50 kg).

16 VARIABLES IN THE PROTOCOL
TYPES OF VARIABLE
independent
dependent
intermediate
confounding

17 Independent Variable
The characteristic being observed and/or measured that is hypothesized to influence an event or outcome (the dependent variable).
NOTE: The independent variable is not influenced by the event or outcome, but may cause it or contribute to its variation.

18 Dependent Variable
A variable whose value depends on the effect of other variables (i.e., “independent variables”) in the relationship being studied. Synonyms: outcome or response variable.
NOTE: An event or outcome whose variation we seek to explain or account for by the influence of independent variables.

19 Intermediate Variable
A variable that occurs in a causal pathway from an independent to a dependent variable. Synonyms: intervening, mediating.
NOTES:
It produces variation in the dependent variable, and is caused to vary by the independent variable.
Such a variable is “associated” with both the dependent and independent variables.

20 Confounding Variable
A factor (that is itself a determinant of the outcome) that distorts the apparent effect of a study variable on the outcome.
NOTE: Such a factor may be unequally distributed among the exposed and the unexposed, and thereby influence the apparent magnitude and even the direction of the effect.

21 Organizing Data
Frequency Table
Frequency Histogram
Relative Frequency Histogram
Frequency Polygon
Relative Frequency Polygon
Bar Chart
Pie Chart
Stem-and-Leaf Display
Box Plot

22 Frequency Table
Suppose we are interested in studying the number of children in the families living in a community. The following data were collected from a random sample of n = 30 families from the community:
2, 2, 5, 3, 0, 1, 3, 2, 3, 4, 1, 3, 4, 5, 7, 3, 2, 4, 1, 0, 5, 8, 6, 5, 4, 2, 4, 4, 7, 6
Organize these data in a frequency table!

23
X = No. of Children   Count (Freq.)   Relative Freq.
0                     2               2/30 = 0.067
1                     3               3/30 = 0.100
2                     5               5/30 = 0.167
3                     5               5/30 = 0.167
4                     6               6/30 = 0.200
5                     4               4/30 = 0.133
6                     2               2/30 = 0.067
7                     2               2/30 = 0.067
8                     1               1/30 = 0.033
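The counts and relative frequencies for the family-size data can be checked with a short script. This is a minimal sketch using Python's standard library; the variable names are illustrative.

```python
from collections import Counter

# Number of children in each of the n = 30 sampled families (slide 22).
children = [2, 2, 5, 3, 0, 1, 3, 2, 3, 4, 1, 3, 4, 5, 7,
            3, 2, 4, 1, 0, 5, 8, 6, 5, 4, 2, 4, 4, 7, 6]

n = len(children)
counts = Counter(children)  # maps each value of X to its frequency

print(f"{'X':>3} {'Freq.':>6} {'Rel. freq.':>11}")
for x in sorted(counts):
    # Relative frequency = count / sample size.
    print(f"{x:>3} {counts[x]:>6} {counts[x] / n:>11.3f}")
```

Because the variable is discrete with few distinct values, each value gets its own row; no grouping into classes is needed.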

24

25 Frequency Table
Now suppose we need to construct a similar frequency table for the age of patients with heart-related problems in a clinic. The following data were collected from a random sample of n = 30 patients who went to the emergency room of the clinic for heart-related problems. The measurements are:
42, 38, 51, 53, 40, 68, 62, 36, 32, 45, 51, 67, 53, 59, 47, 63, 52, 64, 61, 43, 56, 58, 66, 54, 56, 52, 40, 55, 72, 69

26 Age Groups Frequency Relative Frequency 2 2/30=0.067 3 3/30=0.100 4 4/30=0.134 8 8/30=0.267 67-72 Total n=30 1.00
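For a continuous variable such as age, the values are first grouped into classes. The sketch below tallies the age data from slide 25 under assumed class limits: five-year classes starting at 32, with a final class of 67-72 (the only class label that is legible in the table above). These limits are an assumption and may differ from the ones used on the original slide.

```python
# Ages of the n = 30 sampled patients (slide 25).
ages = [42, 38, 51, 53, 40, 68, 62, 36, 32, 45, 51, 67, 53, 59, 47,
        63, 52, 64, 61, 43, 56, 58, 66, 54, 56, 52, 40, 55, 72, 69]

# ASSUMED inclusive class limits; not confirmed by the source.
classes = [(32, 36), (37, 41), (42, 46), (47, 51),
           (52, 56), (57, 61), (62, 66), (67, 72)]

n = len(ages)
table = []
for low, high in classes:
    freq = sum(low <= age <= high for age in ages)  # count ages in the class
    table.append((f"{low}-{high}", freq, freq / n))

print(f"{'Age group':<10} {'Freq.':>6} {'Rel. freq.':>11}")
for label, freq, rel in table:
    print(f"{label:<10} {freq:>6} {rel:>11.3f}")
print(f"{'Total':<10} {sum(f for _, f, _ in table):>6} {1.0:>11.3f}")
```

Different class limits give a different table for the same data, so the choice of classes should be stated whenever a grouped frequency table is reported.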

27 Measures of Central Tendency
Where is the heart of the distribution?
1. Mean
2. Median
3. Mode

28 Sample Mean
The arithmetic mean (or, simply, mean) is computed by summing all the observations in the sample and dividing the sum by the number of observations. For a sample of five household incomes, 6,000, 10,000, 10,000, 14,000, 50,000, the sample mean is (6,000 + 10,000 + 10,000 + 14,000 + 50,000) / 5 = 90,000 / 5 = 18,000.

29 Sample Median
In a list ranked from the smallest measurement to the largest, the median is the middle value.
In our example of five household incomes, we first rank the measurements: 6,000, 10,000, 10,000, 14,000, 50,000.
The sample median is 10,000.
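The three measures of central tendency can be computed for the five household incomes with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative.

```python
import statistics

# The five household incomes from the sample mean and median slides.
incomes = [6000, 10000, 10000, 14000, 50000]

mean = statistics.mean(incomes)      # sum of observations / number of observations
median = statistics.median(incomes)  # middle value of the ranked list
mode = statistics.mode(incomes)      # most frequent value

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```

Here the mean (18,000) is well above the median (10,000) because the single large income of 50,000 pulls the mean upward, while the median depends only on the middle of the ranked list.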

30 Measures of Dispersion or Variability
Range
Variance
Standard deviation

31 Formula for Sample Variance & Standard deviation S
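A sketch of the standard definitions, assuming the usual n − 1 denominator (an assumption, since the formula is not legible on this slide). For observations x_1, ..., x_n with sample mean x̄:

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i,
\qquad
S^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2,
\qquad
S = \sqrt{S^2}
```

S^2 is the sample variance and S is the sample standard deviation, which is expressed in the same units as the data.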

32 Calculation of Variance and Standard deviation
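The data used on this slide are not available, so the following sketch reuses the five household incomes from the sample mean slide as a stand-in. It assumes the usual n − 1 denominator for the sample variance and also computes the range.

```python
import math

# Stand-in data: the five household incomes from the sample mean slide.
incomes = [6000, 10000, 10000, 14000, 50000]

n = len(incomes)
mean = sum(incomes) / n

# Squared deviations of each observation from the sample mean.
squared_devs = [(x - mean) ** 2 for x in incomes]

# Sample variance with the n - 1 denominator (assumed convention).
variance = sum(squared_devs) / (n - 1)
std_dev = math.sqrt(variance)

# Range: largest observation minus smallest observation.
data_range = max(incomes) - min(incomes)

print(f"range    = {data_range}")
print(f"variance = {variance:.0f}")
print(f"std dev  = {std_dev:.2f}")
```

The standard deviation is usually the easier figure to interpret because it is in the same units as the data, whereas the variance is in squared units.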

33 Empirical Rule
For a Normal distribution, approximately:
a) 68% of the measurements fall within one standard deviation around the mean
b) 95% of the measurements fall within two standard deviations around the mean
c) 99.7% of the measurements fall within three standard deviations around the mean

34 Suppose the reaction time of a particular drug has a Normal distribution with a mean of 10 minutes and a standard deviation of 2 minutes. Approximately:
a) 68% of the subjects taking the drug will have a reaction time between 8 and 12 minutes
b) 95% of the subjects taking the drug will have a reaction time between 6 and 14 minutes
c) 99.7% of the subjects taking the drug will have a reaction time between 4 and 16 minutes
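The intervals and percentages in the reaction-time example can be checked against the Normal distribution directly. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

# Reaction time in minutes: Normal with mean 10 and standard deviation 2.
reaction = NormalDist(mu=10, sigma=2)

coverage = {}
for k in (1, 2, 3):
    low = reaction.mean - k * reaction.stdev
    high = reaction.mean + k * reaction.stdev
    # Probability of falling within k standard deviations of the mean.
    coverage[k] = reaction.cdf(high) - reaction.cdf(low)
    print(f"within {k} SD: {low:.0f} to {high:.0f} minutes, "
          f"probability {coverage[k]:.4f}")
```

The computed probabilities are close to, but not exactly, 68%, 95%, and 99.7%, which is why the rule is stated as an approximation.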

