Simple statistics for clinicians on respiratory research By Giovanni Sotgiu Hygiene and Preventive Medicine Institute University of Sassari Medical School.


Simple statistics for clinicians on respiratory research By Giovanni Sotgiu Hygiene and Preventive Medicine Institute University of Sassari Medical School Italy

What are your expectations?

Too difficult to explain medical statistics in 30 min…..

What is medical statistics?

What is medical statistics?
"...Discipline concerned with the treatment of numerical data derived from groups of individuals..." (P. Armitage)
"...Art of dealing with variation in data through collection, classification and analysis in such a way as to obtain reliable results..." (J. M. Last)

A collection of statistical procedures well suited to the analysis of healthcare-related data

Why we need to study statistics in the field of medicine……..

Why we need to study statistics…
1) Basic requirement of medical research
2) Update your medical knowledge
3) Data management and treatment

Road map
1) Basic concepts
2) Sample and population
3) Probability
4) Data description
5) Measures of disease

Basic concepts

1. Homogeneity
All individuals have similar values or belong to the same category.
Example: all individuals are Chinese, women, middle-aged (30 to 40 years old) and work in the same factory → homogeneity in nationality, gender, age and occupation.

1. Variation
Differences in height, weight, treatment…

1. Variation
Toss a coin → the marked face may land up or down.
Treat patients suffering from TB with the same antibiotics → some of them recover and others do not.

1. Variation
→ No variation, no statistics.

What is the target of our studies?

Population

2. Population
→ The whole collection of individuals that one intends to study

2. Population
Limits to studying a whole population → economic issues, short time

2. Population and sample

2. Sample
→ A representative part of the population

Sampling By chance!

Random
Random event:
→ the event may or may not occur in one experiment
→ before the experiment, nobody can be sure whether the event will occur

Random
Please give some examples of random events…

The mathematical procedures whereby we convert information about the sample into intelligent guesses about the population belong to inferential statistics (generalization)

Probability

3. Probability
Measures the possibility of occurrence of a random event.
P(A) = (number of ways event A can occur) / (total number of possible outcomes)

Estimation of probability → frequency
Number of observations: n (large enough)
Number of occurrences of random event A: m
P(A) ≈ m/n → relative frequency theory

3. Probability
A → random event
P(A) → probability of the random event A
P(A) = 1 if an event always occurs
P(A) = 0 if an event never occurs
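As an illustration of the two definitions above, here is a minimal Python sketch. The function names and the record of ten coin tosses are hypothetical and are not part of the original slides; the point is only to contrast the classical ratio with the relative frequency m/n.

```python
from fractions import Fraction

def classical_probability(favourable: int, total: int) -> Fraction:
    """P(A) = ways A can occur / total number of equally likely outcomes."""
    if total <= 0 or not 0 <= favourable <= total:
        raise ValueError("need total > 0 and 0 <= favourable <= total")
    return Fraction(favourable, total)

def relative_frequency(outcomes, event) -> float:
    """Estimate P(A) as m/n: m occurrences of `event` among n observations."""
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one observation")
    m = sum(1 for outcome in outcomes if outcome == event)
    return m / n

# Hypothetical record of 10 coin tosses (H = heads, T = tails).
tosses = list("HTHHTTHTHH")
p_theory = classical_probability(1, 2)        # one favourable face out of two
p_observed = relative_frequency(tosses, "H")  # m = 6 heads in n = 10 tosses
print(p_theory, p_observed)
```

With only ten tosses the observed frequency need not equal 1/2; the estimate is expected to approach the probability only when n is large enough.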

Please, give some examples for probability of a random event and frequency of that random event

Parameters and statistics

4. Parameter
A measurement describing some characteristic of a population, or a measurement of the distribution of a characteristic of a population → Greek letter (μ, π, etc.)
Usually unknown

→ To estimate the parameter of a population we need a sample

4. Statistic
A measurement describing some characteristic of a sample, or a measurement of the distribution of a characteristic of a sample → Latin letter (s, p, etc.)

4. Statistic
Please give an example of a parameter and of a statistic.
Does a parameter vary? Does a statistic vary?

Sampling Error

5. Sampling Error
Difference between observed value and true value

5. Sampling Error
1) Systematic error (fixed)
2) Measurement error (random)
3) Sampling error (random)

Sampling error
The statistic differs from the parameter!
The statistics of different samples from the same population differ from each other!

Sampling error
Sampling error exists in any sampling research.
It cannot be avoided, but it can be estimated.
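A small Python sketch can make the idea of sampling error concrete. The five population values below are hypothetical; the code enumerates every possible sample of size 2 and compares each sample mean (a statistic) with the population mean (the parameter).

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of five measurements.
population = [4, 6, 8, 10, 12]
mu = mean(population)  # parameter: population mean

# Every possible sample of size 2 and its mean (a statistic).
samples = list(combinations(population, 2))
sample_means = [mean(s) for s in samples]

# Sampling error of each sample: statistic minus parameter.
sampling_errors = [m - mu for m in sample_means]

for s, m, e in zip(samples, sample_means, sampling_errors):
    print(f"sample={s}  mean={m}  sampling error={e:+}")
```

The sample means differ from one another and from the parameter, yet in this toy population their average equals the population mean.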

Nature of data

Variables and data
Variables are labels whose values can literally vary.
Data are the values you get from observing → measuring, counting, assessing, etc.

Data
Categorical data → nominal data, ordinal data
Metric data → discrete data, continuous data

Nominal categorical data
Data can be allocated to one of a number of categories.
Examples: blood type, sex, linezolid treatment (yes/no).
Data cannot be arranged in an ordering scheme.

Ordinal categorical data
Data can be allocated to one of a number of categories, and the categories have a meaningful order.
Differences between categories cannot be determined or are meaningless.
Example: very satisfied, satisfied, neutral, unsatisfied, very unsatisfied (with a new treatment).

Discrete metric data
Countable variables → the number of possible values is finite or countable.
Examples: number of days of hospitalization, number of men treated with isoniazid.

Continuous metric data
Measurable variables
Infinitely many possible values → continuous scale covering a range of values without gaps
Units: kg, m, mmHg, years

Describing data….. with tables

Describing data with tables
1) actual frequency
2) relative and cumulative frequency
3) grouped frequency
4) open-ended groups
5) cross-tabulation

1) Frequency table
Frequency distribution
[Table: TB mortality (%) (the variable) | Tally | No. of wards (the frequency)]

2) Relative frequency, cumulative frequency
Relative frequency → proportion of the total
[Table: No. of resistances | No. of patients | Relative frequency (%) | Cumulative frequency (%)]
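The relative and cumulative frequency columns can be derived from the actual frequencies alone. The following Python sketch assumes hypothetical counts (number of resistances → number of patients); the function name is illustrative.

```python
from itertools import accumulate

def relative_and_cumulative(counts):
    """Return (relative %, cumulative %) per category, in the order given."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations")
    relative = {k: 100 * v / total for k, v in counts.items()}
    cumulative = dict(zip(relative, accumulate(relative.values())))
    return relative, cumulative

# Hypothetical data: number of drug resistances -> number of patients.
counts = {0: 10, 1: 5, 2: 3, 3: 2}
relative, cumulative = relative_and_cumulative(counts)

for k in counts:
    print(f"{k} resistances: n={counts[k]:2d}  "
          f"rel={relative[k]:5.1f}%  cum={cumulative[k]:5.1f}%")
```

The cumulative column only makes sense when the categories have an order, which is why it is shown here for a discrete metric variable.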

3) Grouped frequency
Grouped frequency → works for continuous metric data
[Table: Birth weight | No. of infants born to mothers with TB; groups have a width of 300 g, each with a class lower limit and a class upper limit]
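As a rough sketch of how a grouped frequency table is built, the Python snippet below bins hypothetical birth weights (whole grams) into classes 300 g wide. The weights, the lowest class limit of 2700 g and the function name are assumptions made for illustration.

```python
from collections import Counter

def grouped_frequency(values, lowest_limit, width):
    """Count integer values per class; keys are (lower limit, upper limit)."""
    if any(v < lowest_limit for v in values):
        raise ValueError("a value lies below the lowest class limit")
    bins = Counter((v - lowest_limit) // width for v in values)
    return {
        (lowest_limit + i * width, lowest_limit + (i + 1) * width - 1): bins[i]
        for i in sorted(bins)
    }

# Hypothetical birth weights in grams.
weights = [2750, 2900, 3100, 3200, 3350, 3400, 3650, 4000]
table = grouped_frequency(weights, lowest_limit=2700, width=300)

for (lower, upper), n in table.items():
    print(f"{lower}-{upper} g: {n}")
```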

General rules
Frequency table → nominal, ordinal and discrete metric data
Grouped frequency table → continuous metric data

4) Open-ended group
One or more values, called outliers, lie far away from the general mass of the data.
Use ≤ or ≥

5) Cross-tabulation
Two variables within a single group of individuals
Pulmonary mass | TB/HIV+ yes | TB/HIV+ no | Totals
Benign         |             |            |
Malignant      | 4           | 4          | 8
Totals         |             |            |

Describing data….. with charts

3. Describing data with charts
1) Charting nominal data: a) pie chart, b) simple bar chart, c) clustered bar chart, d) stacked bar chart
2) Charting ordinal data: a) pie chart, b) bar chart, c) dotplot
3) Charting discrete metric data
4) Charting continuous metric data: histogram
5) Charting cumulative ordinal or discrete metric data: step chart
6) Charting cumulative continuous metric data: cumulative frequency curve or ogive
7) Charting time-based data: time-series chart

1-a) Pie chart
4-5 categories
One variable
Start at 0° and follow the same order as the table
[Figure: adverse events of ethionamide]

1-b) Simple bar chart
Bars of the same width, with equal spaces between bars

1-c) Clustered bar chart

1-d) Stacked bar chart

2-3) Dot-plot
Useful with ordinal variables if the number of categories is too large for a bar chart

4) Histogram
[Figure: percentage age distribution of pregnant women with TB; y-axis: TB cases (%)]

6) Cumulative frequency curve

Describing data from its distributional shape

Symmetric mound-shaped distributions
[Figure: percentage age distribution of pregnant women with TB]

Skewed distributions
[Figure: age distribution of migrants who develop TB]

Bimodal distributions
A bimodal distribution is one with two distinct humps

Normal-ness
Symmetric
Same mean, median and mode

Describing data with numeric summary values

1. Numbers, proportions (percentages)
2. Summary measures of location
3. Summary measures of spread

Numbers and proportions
Numbers → actual frequencies
A percentage is a proportion multiplied by 100
1) Prevalence
2) Incidence

Prevalence
Nature → relative frequency → proportion of existing cases in some population at a given time t0
[Figure: population at time t0, split into disease and health]

Prevalence
Prevalence = (No. of existing cases of a disease at t0) / (total population); it ranges from 0 to 1
Example: populations A (N = 6) and B (N = 4), each with one existing case
Absolute frequency: fa = 1 in both → no comparison
Relative frequency: fr = 0.17 in A, fr = 0.25 in B → comparison

Prevalence
[Figure: three populations with P = 0, P = 0.25 and P = 1]

Prevalence
Prevalence data → highlight the time of the evaluation
Example: P (2010) = 0.17, i.e. P (2010) = 17 per 100 individuals
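The prevalence slides can be sketched in a few lines of Python. The example reuses populations A (N = 6) and B (N = 4) from the slide and assumes one existing case in each, which is consistent with the relative frequencies 0.17 and 0.25; the function name is illustrative.

```python
def prevalence(existing_cases: int, population: int) -> float:
    """Prevalence = existing cases at t0 / total population (between 0 and 1)."""
    if population <= 0 or not 0 <= existing_cases <= population:
        raise ValueError("need population > 0 and 0 <= cases <= population")
    return existing_cases / population

# Populations A and B, assuming one existing case in each.
p_a = prevalence(1, 6)
p_b = prevalence(1, 4)

print(f"A: {p_a:.2f} ({100 * p_a:.0f} per 100)")
print(f"B: {p_b:.2f} ({100 * p_b:.0f} per 100)")
```

The absolute frequencies are identical, but dividing by the population size shows that the disease is relatively more common in B.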

Incidence
→ Estimates the risk of developing disease
[Figure: people at risk (healthy) followed from t0 to t1; some move from health to disease]

Incidence
Incidence = (No. of new cases during the period t0 to t1) / (total population at risk)
- Measures the probability or risk of developing disease during a given time period
- Absolute risk → probability of developing an adverse event

Incidence
- Assess the health status at baseline → exclude prevalent cases at t0
- Define a follow-up for the cohort → healthy people followed up for a given time period

Cohort
Closed population → adds no new members over time, and loses members only to disease or death
Open population → may gain members over time, through immigration or birth, or lose members through emigration

Cumulative incidence
- Closed population
- Individual time period at risk → same period for all the members
[Figure: people A to E followed over time from t0 = 0 to t1 = 3]

Cumulative incidence
Cumulative incidence = (No. of new cases during the period t0 to t1) / (total population at risk)

Cumulative incidence
Example: population at risk at t0 = 24; new cases = 3; follow-up = 3 years
CI in 3 years = 3/24 = 0.125 new cases per individual at risk enrolled at t0 = 12.5 new cases per 100 individuals at risk enrolled at t0
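The cumulative incidence example can be checked with a short Python sketch. It assumes that the figure 24 in the slide is the number of people at risk enrolled at t0 in a closed population; the function name is illustrative.

```python
def cumulative_incidence(new_cases: int, at_risk_at_t0: int) -> float:
    """New cases during follow-up divided by people at risk at baseline."""
    if at_risk_at_t0 <= 0 or not 0 <= new_cases <= at_risk_at_t0:
        raise ValueError("need at_risk_at_t0 > 0 and 0 <= new_cases <= at_risk_at_t0")
    return new_cases / at_risk_at_t0

# Assumed reading of the slide: 24 people at risk at t0, 3 new cases in 3 years.
ci = cumulative_incidence(new_cases=3, at_risk_at_t0=24)
print(f"CI over 3 years: {ci} per person, {100 * ci} per 100 people")
```

The follow-up length does not enter the calculation, so it always has to be reported alongside the result.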

Cumulative incidence… critical features
- Closed population → rare
- Short follow-up and enrollment of a few individuals
- Open population

Open population
- Non-cases (drop-outs) and cases during the follow-up
- Enrollment of new individuals during the follow-up
- Length of follow-up not uniform

Open population
[Figure: people A to I followed over time from t0 to t1, with drop-outs and cases marked]

Dynamic cohort
Individual time period at risk not uniform → estimate the population at risk:
- Total person-time
- Estimate of the total person-time

Dynamic cohort
Total person-time = Σ individual time periods at risk
Person-time: person-days, person-months, person-years

Density of incidence
Density of incidence = (No. of new cases during the period t0 to t1) / (total person-time)

Density of incidence
N (individuals) | Individual time period at risk (years) | Person-years
1 (A)           | 5   | 1 person × 5 years = 5 person-years
3 (B, C, D)     | 2   | 3 persons × 2 years = 6 person-years
2 (E, F)        | 2.5 | 2 persons × 2.5 years = 5 person-years
2 (G, H)        | 1.5 | 2 persons × 1.5 years = 3 person-years
1 (I)           | 3   | 1 person × 3 years = 3 person-years
Total person-time: 22 person-years

Density of incidence
1 new case / 22 person-years = 0.045 new cases per person-year → 45 per 1000 person-years
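The person-time calculation can be reproduced with a minimal Python sketch. The individual follow-up times are those listed in the slide for people A to I; the variable names are illustrative.

```python
# Years at risk contributed by each person (A to I), as listed in the slide.
years_at_risk = {
    "A": 5,
    "B": 2, "C": 2, "D": 2,
    "E": 2.5, "F": 2.5,
    "G": 1.5, "H": 1.5,
    "I": 3,
}

new_cases = 1
total_person_years = sum(years_at_risk.values())  # sum of individual times
density = new_cases / total_person_years          # cases per person-year

print(f"Total person-time: {total_person_years} person-years")
print(f"Incidence density: {density:.3f} per person-year "
      f"(about {1000 * density:.0f} per 1000 person-years)")
```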

Open population
Estimate of the total person-time → individual time period at risk not known for all
- Migration
- Movement of the cohort in the middle of the follow-up

Estimate of the total person-time
[(P0 + Pt)/2] × follow-up

Estimate of the total person-time
At t0: 100 people; follow-up: 3 years; new cases: 3; drop-outs: 17; enrollment during the follow-up: 16
→ P0 = 100; Pt = (100 − 3 − 17 + 16) = 96
[(P0 + Pt)/2] × follow-up = [(100 + 96)/2] × 3 = 294 person-years

Estimate of the total person-time
Test the estimate (at t0: 100 people; follow-up: 3 years; new cases: 3; drop-outs: 17; enrollment during the follow-up: 16):
80 people × 3 years = 240 person-years
Movement of the cohort: (17 × 1.5) + (3 × 1.5) + (16 × 1.5) = 54 person-years
240 + 54 = 294 person-years

Incidence rate
Incidence rate = (No. of new cases during the period t0 to t1) / (estimate of total person-time)
3 new cases / 294 person-years × 1000 = 10.2 per 1000 person-years
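The estimate of total person-time and the resulting incidence rate can be verified with a short Python sketch using the numbers from the slides (100 people at t0, 3 new cases, 17 drop-outs, 16 people enrolled during a 3-year follow-up). The function name is illustrative, and the cross-check assumes, as the slides do, that everyone who enters or leaves contributes half of the follow-up.

```python
def estimated_person_time(p0: float, pt: float, follow_up_years: float) -> float:
    """Mid-period approximation: average population size times follow-up."""
    return (p0 + pt) / 2 * follow_up_years

p0, new_cases, drop_outs, enrolled, follow_up = 100, 3, 17, 16, 3

pt = p0 - new_cases - drop_outs + enrolled           # population at the end
person_years = estimated_person_time(p0, pt, follow_up)

# Cross-check: people present throughout, plus half the follow-up for movers.
stayers = p0 - new_cases - drop_outs
check = stayers * follow_up + (new_cases + drop_outs + enrolled) * follow_up / 2

rate_per_1000 = new_cases / person_years * 1000
print(pt, person_years, check, round(rate_per_1000, 1))
```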

Summary measures of location
1) Mode: the category or value that occurs most often (typical-ness) → categorical and discrete metric data
2) Median: the middle value when the values are in ascending order (central-ness) → ordinal and metric data
3) Mean (average): the sum of the values divided by the number of values
4) Percentiles: the values that divide the ordered data into 100 equal-sized groups

Choosing the most appropriate measure
                  | Mode | Median                    | Mean
Nominal           | yes  | no                        | no
Ordinal           | yes  | yes                       | no
Metric discrete   | yes  | yes, when markedly skewed | yes
Metric continuous | yes  | yes, when markedly skewed | yes
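To see why the median is preferred for markedly skewed metric data, the following Python sketch computes the three measures on hypothetical lengths of hospital stay with one very long admission. It uses the standard library `statistics` module; the data are invented for illustration.

```python
import statistics

# Hypothetical days of hospitalization; one long stay skews the data.
days = [2, 3, 3, 4, 5, 7, 30]

mode = statistics.mode(days)      # most frequent value
median = statistics.median(days)  # middle value of the ordered data
mean = statistics.mean(days)      # sum of values / number of values

print(f"mode={mode}  median={median}  mean={mean:.2f}")
```

The single 30-day stay pulls the mean well above the median, while the mode and median stay close to the typical patient.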

Summary measures of spread
Range → distance from the smallest value to the largest
IQR (interquartile range) → spread of the middle half of the values
Boxplot → graphical summary of the three quartile values, the minimum and maximum values, and outliers
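The range and interquartile range can be sketched in Python as follows. The data are hypothetical, and the quartiles come from `statistics.quantiles` (Python 3.8 or later) with its default "exclusive" method; other software may use a different quartile convention and give slightly different values.

```python
import statistics

# Hypothetical days of hospitalization.
days = [2, 3, 3, 4, 5, 7, 30]

value_range = max(days) - min(days)           # largest minus smallest value
q1, q2, q3 = statistics.quantiles(days, n=4)  # the three quartile values
iqr = q3 - q1                                 # spread of the middle half

print(f"range={value_range}  Q1={q1}  median={q2}  Q3={q3}  IQR={iqr}")
```

The range is dominated by the single extreme value, whereas the IQR is not affected by it.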

Standard deviation
Roughly the average distance of the data values from the mean value
The smaller the average distance, the narrower the spread, and vice versa
Used with metric data only

1. Subtract the mean from each of the n values in the sample, to give the differences
2. Square each of these differences
3. Add these squared values together (sum of squares)
4. Divide the sum of squares by 1 less than the sample size (n − 1)
5. Take the square root
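The five steps above can be followed literally in a short Python sketch. The data are hypothetical and the function name is illustrative; the result is compared with `statistics.stdev` from the standard library, which also divides by n − 1.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation, following the five steps one by one."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    differences = [v - mean for v in values]   # step 1: subtract the mean
    squared = [d * d for d in differences]     # step 2: square each difference
    sum_of_squares = sum(squared)              # step 3: sum of squares
    variance = sum_of_squares / (n - 1)        # step 4: divide by n - 1
    return math.sqrt(variance)                 # step 5: square root

# Hypothetical sample of eight measurements.
data = [2, 4, 4, 4, 5, 5, 7, 9]
sd = sample_sd(data)
print(f"by hand: {sd:.3f}  statistics.stdev: {statistics.stdev(data):.3f}")
```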

Standard deviation and the normal distribution

The Basic Steps of Statistical Work
1. Design of study
Professional design: research aim, subjects, measures, etc.

Statistical design: sampling or allocation method, sample size, randomization, data processing, etc.

2. Collection of data
Source of data: government report system, registration system, routine records, ad hoc survey

Data collection → accurate, complete, timely
Protocol: place, subjects, timing; training; pilot; questionnaire; instruments; sampling method and sample size; budget
Procedure: observation, interview, filling in forms, letter, telephone, web

3. Data sorting
Checking: by hand, with computer software
Amending: missing data?
Grouping: according to categorical variables (sex, occupation, disease…) or according to numerical variables (age, income, blood pressure…)

4. Data analysis
Descriptive statistics (show the sample): mean, incidence rate… → tables and plots
Inferential statistics (towards the population): estimation, hypothesis test (comparison)

Definition of Selection Bias
Selection biases are distortions that result from procedures used to select subjects and from factors that influence study participation. The common element of such biases is that the association between exposure and disease is different for those who participate and for all those who should be theoretically eligible for study, including those who do not participate.

Definition of Selection Bias
It is sometimes (but not always) possible to disentangle the effects of participation from those of disease determinants using standard methods for the control of confounding. One example is the bias introduced by matching in case-control studies.

Definition of Confounding
Confounding → bias in estimating an epidemiologic measure of effect, resulting from an imbalance of other causes of disease in the compared groups (mixing of effects)

Characteristics of a Confounder
- Associated with disease (in the non-exposed)
- Associated with exposure (in the source population)
- Not an intermediate cause