Published by Vernon Rich. Modified over 9 years ago.
2
Simple statistics for clinicians on respiratory research By Giovanni Sotgiu Hygiene and Preventive Medicine Institute University of Sassari Medical School Italy
3
What are your expectations?
4
Too difficult to explain medical statistics in 30 min…..
5
What is medical statistics?
6
What is medical statistics?
“...Discipline concerned with the treatment of numerical data derived from groups of individuals...” (P. Armitage)
“...Art of dealing with variation in data through collection, classification and analysis in such a way as to obtain reliable results...” (J. M. Last)
7
Collection of statistical procedures well-suited to the analysis of healthcare-related data
8
Why we need to study statistics in the field of medicine……..
9
Why we need to study statistics…
1) Basic requirement of medical research
2) Update your medical knowledge
3) Data management and treatment
10
Road map
1) Basic concepts
2) Sample and population
3) Probability
4) Data description
5) Measures of disease
11
Basic concepts
12
1. Homogeneity
All individuals have similar values or belong to the same category.
Ex.: all individuals are Chinese, women, middle-aged (30-40 years old) and work in the same factory: homogeneity in nationality, gender, age and occupation.
13
1. Variation
Differences in height, weight, treatment…
14
1. Variation
Toss a coin: the marked face may land up or down.
Treat patients suffering from TB with the same antibiotics: some of them recover and others do not.
15
no variation, no statistics 1. Variation
16
What is the target of our studies?
17
Population
18
the whole collection of individuals that one intends to study 2. Population
19
2. Population
- economic issues
- short time
20
2. Population and sample
21
a representative part of the population 2. Sample
22
Sampling By chance!
23
Random
Random event: an event that may or may not occur in a given experiment. Before the experiment, nobody can be sure whether the event will occur.
24
Random Please, give some examples of random event…
25
The mathematical procedures whereby we convert information about the sample into intelligent guesses about the population fall under inferential statistics (generalization)
26
Probability
27
3. Probability
Measures the possibility of occurrence of a random event.
P(A) = (number of ways event A can occur) / (total number of possible outcomes)
28
Estimation of probability: frequency
Number of observations: n (large enough)
Number of occurrences of random event A: m
P(A) ≈ m/n (relative frequency theory)
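The relative-frequency idea can be illustrated with a short simulation. This is only a sketch: the fair coin, the seed and the function name `estimate_probability` are assumptions made for illustration, not part of the original slides.

```python
import random


def estimate_probability(n_trials: int, seed: int = 0) -> float:
    """Relative frequency m/n of 'heads' in n_trials simulated fair-coin tosses."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is reproducible
    m = sum(1 for _ in range(n_trials) if rng.random() < 0.5)  # occurrences of event A
    return m / n_trials


# The estimate tends to settle near the true probability 0.5 as n grows.
for n in (10, 100, 10_000):
    print(n, estimate_probability(n))
```

With a small n the relative frequency can be far from 0.5, which is why the slide asks for n to be "large enough".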
29
3. Probability
A: a random event
P(A): probability of the random event A
P(A) = 1 if the event always occurs
P(A) = 0 if the event never occurs
30
Please, give some examples for probability of a random event and frequency of that random event
31
Parameters and statistics
32
4. Parameter
A measurement describing some characteristic of a population, or a measurement of the distribution of a characteristic of a population
Denoted by a Greek letter (μ, π, etc.)
Usually unknown
33
to know the parameter of a population we need a sample
34
4. Statistic
A measurement describing some characteristic of a sample, or a measurement of the distribution of a characteristic of a sample
Denoted by a Latin letter (s, p, etc.)
35
Please give an example for parameter and statistics Does a parameter vary? Does a statistic vary? 4. Statistic
36
Sampling Error
37
5. Sampling Error Difference between observed value and true value
38
5. Sampling Error 1) Systematic error (fixed) 2) Measurement error (random) 3) Sampling error (random)
39
Sampling error
The statistic differs from the parameter!
The statistics of different samples from the same population differ from each other!
40
Sampling error
Sampling error exists in any sampling research.
It cannot be avoided, but it can be estimated.
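A small simulation can make sampling error concrete. It is a sketch under stated assumptions: the population of 10,000 normally distributed "heights", the sample size of 50 and the seed are all hypothetical choices, not values from the slides.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed (an assumption) for reproducibility

# Hypothetical population: 10,000 heights in cm.
population = [rng.gauss(170, 10) for _ in range(10_000)]
mu = statistics.mean(population)  # the parameter (normally unknown in practice)

# Five random samples of 50 individuals each give five different statistics.
sample_means = [statistics.mean(rng.sample(population, 50)) for _ in range(5)]
sampling_errors = [m - mu for m in sample_means]

for m, e in zip(sample_means, sampling_errors):
    print(f"sample mean = {m:.2f}, sampling error = {e:+.2f}")
```

Each sample mean differs from the population mean and from the other sample means, even though every sample comes from the same population.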
41
Nature of data
42
Variables and data
Variables are labels whose values can literally vary.
Data are the values you get from observing, measuring, counting, assessing, etc.
43
Data
- Categorical data: nominal data, ordinal data
- Metric data: discrete data, continuous data
44
Nominal categorical data
Data can be allocated into one of a number of categories
Examples: blood type, sex, linezolid treatment (y/n)
Data cannot be arranged in an ordering scheme
45
Ordinal categorical data
Data can be allocated to one of a number of categories, but the categories have to be put in a meaningful order
Differences between categories cannot be determined or are meaningless
Example: very satisfied, satisfied, neutral, unsatisfied, very unsatisfied (new treatment)
46
Discrete metric data
Countable variables: the number of possible values is a finite number
Examples: number of days of hospitalization, number of men treated with isoniazid
47
Continuous metric data
Measurable variables
Infinitely many possible values: a continuous scale covering a range of values without gaps
Examples: kg, m, mmHg, years
48
Describing data….. with tables
49
Describing data with tables
1) actual frequency
2) relative and cumulative frequency
3) grouped frequency
4) open-ended groups
5) cross-tabulation
50
1) Frequency table
Frequency distribution (first column: variable; last column: frequency)
TB mortality (%)    Tally                         No. of wards
11.2-15.1           1, 1, 1, 1, 1, 1, 1, 1, 1     9
15.2-20.1           1, 1, 1, 1, 1, 1, 1, 1        8
20.2-25.1           1, 1, 1, 1, 1                 5
25.2-30.1           1, 1, 1                       3
30.2-35.1           1,1
51
2) Relative frequency, cumulative frequency
Relative frequency: proportion of the total
No. of resistances    No. of patients    Relative frequency (%)    Cumulative frequency (%)
0                     5                  12.5                      12.5
1                     6                  15                        27.5
2                     14                 35                        62.5
3                     10                 25                        87.5
4                     3                  7.5                       95
7                     1                  2.5                       97.5
8                     1                  2.5                       100
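The relative and cumulative columns follow directly from the counts. The sketch below recomputes them from the number of patients per number of resistances given on the slide; the variable names are illustrative assumptions.

```python
from itertools import accumulate

# No. of resistances -> No. of patients, as listed on the slide.
counts = {0: 5, 1: 6, 2: 14, 3: 10, 4: 3, 7: 1, 8: 1}

total = sum(counts.values())  # 40 patients in all
# Relative frequency (%): each count as a share of the total.
relative = {k: 100 * v / total for k, v in counts.items()}
# Cumulative frequency (%): running sum of the relative frequencies.
cumulative = dict(zip(counts, accumulate(relative.values())))

for k in counts:
    print(k, counts[k], relative[k], cumulative[k])
```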
52
3) Grouped frequency
Grouped frequency works for continuous metric data
Birth weight (g)    No. of infants born from mothers with TB
2700-2999           2
3000-3299           3
3300-3599           9
3600-3899           9
3900-4199           4
4200-4499           3
Group width: 300 g. The first number of each group is the class lower limit; the second is the class upper limit.
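Grouping raw measurements into classes of width 300 g can be sketched as follows. The raw birth weights are hypothetical (the slide only gives the grouped counts), and `class_limits` is an illustrative helper name.

```python
from collections import Counter


def class_limits(weight_g: int, start: int = 2700, width: int = 300) -> tuple[int, int]:
    """Return the (lower, upper) class limits of the group containing weight_g."""
    if weight_g < start:
        raise ValueError("weight below the first class lower limit")
    lower = start + ((weight_g - start) // width) * width
    return lower, lower + width - 1


# Hypothetical birth weights in grams (illustrative only, not the slide's raw data).
weights = [2750, 3010, 3320, 3450, 3599, 3600, 3875, 4100, 4250]
grouped = Counter(class_limits(w) for w in weights)

for (lower, upper), n in sorted(grouped.items()):
    print(f"{lower}-{upper}: {n}")
```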
53
General rules
Frequency table: nominal, ordinal and discrete metric data
Grouped frequency table: continuous metric data
54
4) Open-ended group
Used when one or more values, called outliers, lie far away from the general mass of the data
Use ≤ or ≥
55
5) Cross-tabulation
Two variables within a single group of individuals
Pulmonary mass    TB/HIV+: Yes    TB/HIV+: No    Totals
Benign            21              11             32
Malignant         4               4              8
Totals            25              15             40
56
Describing data….. with charts
57
3. Describing data with charts
1) Charting nominal data
   a) pie chart
   b) simple bar chart
   c) clustered bar chart
   d) stacked bar chart
2) Charting ordinal data
   a) pie chart
   b) bar chart
   c) dotplot
3) Charting discrete metric data
4) Charting continuous metric data: histogram
5) Charting cumulative ordinal or discrete metric data: step chart
6) Charting cumulative continuous metric data: cumulative frequency curve or ogive
7) Charting time-based data: time-series chart
58
1-a) Pie chart 4-5 categories One variable Start at 0° in the same order as the table Adverse events of ethionamide
59
1-b) Simple bar chart
Bars have the same width, with equal spaces between bars
60
1-c) Clustered bar chart
61
1-d) Stacked bar chart
62
2-3) Dot-plot Useful with ordinal variables if the number of categories is too large for a bar chart
63
4) Histogram
[Figure: percentage age distribution of pregnant women with TB; horizontal axis: age group (<19, 20-24, 25-29, 30-34, >35); vertical axis: TB cases (%), 0 to 40]
64
6) Cumulative frequency curve
65
Describing data from its distributional shape
66
Symmetric mound-shaped distributions
[Figure: percentage age distribution of pregnant women with TB, by age group (<19, 20-24, 25-29, 30-34, >35); vertical axis 0 to 40]
67
Skewed distributions
[Figure: age distribution for migrants who develop TB, by age group (15-24, 25-34, 35-44, 45-54, 55-64, 65-74, 75-84, >85); vertical axis 0 to 160]
68
Bimodal distributions A bimodal distribution is one with two distinct humps
69
Normal-ness Symmetric Same mean, median, mode
70
Describing data with numeric summary value
71
1. Numbers, proportions (percentages)
2. Summary measures of location
3. Summary measures of spread
72
Numbers and proportions
Numbers: actual frequencies
Percentage: a proportion multiplied by 100
1) Prevalence
2) Incidence
73
Prevalence
- Nature: relative frequency
- Number of existing cases in some population at a given time t0
[Figure: population at t0; legend: disease, health]
74
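Prevalence
Prevalence = (No. of existing cases of a disease at t0) / (total population), ranging from 0 to 1
Population A (N = 6): absolute frequency f_a = 1, relative frequency f_r = 0.17
Population B (N = 4): absolute frequency f_a = 1, relative frequency f_r = 0.25
Absolute frequencies allow no comparison between populations; relative frequencies do
[Figure legend: disease, health]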
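The comparison of populations A and B can be reproduced with a few lines. This is a minimal sketch; the function name `prevalence` is an illustrative assumption.

```python
def prevalence(existing_cases: int, population: int) -> float:
    """Proportion of existing cases in the total population at a given time."""
    if population <= 0:
        raise ValueError("population must be positive")
    return existing_cases / population


# One existing case in each population, but different population sizes.
p_a = prevalence(1, 6)  # population A, N = 6
p_b = prevalence(1, 4)  # population B, N = 4
print(round(p_a, 2), round(p_b, 2))
```

The absolute frequency is 1 in both populations, so only the relative frequency shows that the disease is more common in B.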
75
Prevalence
P = 0; P = 0.25; P = 1
[Figure: three populations illustrating each value; legend: disease, health]
76
Prevalence
Prevalence data:
- Highlight the time of the evaluation
Example: P (2010) = 0.17, i.e. 17 per 100 individuals
77
Incidence
Estimates the risk of developing disease
[Figure: people at risk (healthy) at t0, followed up to t1; legend: disease, health]
78
Incidence
Incidence = (No. of new cases during the given period t0 - t1) / (total population at risk)
- Measures the probability or risk of developing disease during a given time period
- Absolute risk: probability of developing an adverse event
79
Incidence
- Assess the health status at baseline: exclude prevalent cases at t0
- Define a follow-up for the cohort: healthy people followed up for a given time period
80
Cohort
Closed population: adds no new members over time, and loses members only to disease/death
Open population: may gain members over time, through immigration or birth, or lose members through emigration
81
Cumulative incidence
- Closed population
- Individual time period at risk: same period for all the members
[Figure: individuals A to E followed over time from t0 = 0 to t1 = 3]
82
Cumulative incidence
Cumulative incidence = (No. of new cases during the given period t0 - t1) / (total population at risk)
83
Cumulative incidence
Example: people at risk at t0 = 24; new cases = 3; follow-up = 3 years
CI in 3 years = 3/24 = 0.125 new cases per individual at risk enrolled at t0, i.e. 12.5 new cases per 100 individuals at risk enrolled at t0
[Figure: people followed over time from t0 = 0 to t1 = 3]
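The cumulative incidence example can be checked in code. This is a sketch; `cumulative_incidence` is an illustrative name, and the inputs are the 3 new cases among 24 people at risk followed for 3 years.

```python
def cumulative_incidence(new_cases: int, at_risk_at_t0: int) -> float:
    """New cases during follow-up divided by the population at risk at baseline."""
    if at_risk_at_t0 <= 0:
        raise ValueError("population at risk must be positive")
    return new_cases / at_risk_at_t0


ci = cumulative_incidence(new_cases=3, at_risk_at_t0=24)
print(ci)        # new cases per individual at risk over the 3-year follow-up
print(ci * 100)  # new cases per 100 individuals at risk
```

The follow-up length does not enter the formula, so it always has to be reported alongside the value.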
84
Cumulative incidence…critical features
- Closed population: rare
- Short follow-up and enrollment of a few individuals
- Open population
85
Open population -Non cases (drop-out) and cases during the follow-up - Enrollment of new individuals during the follow-up - Length of follow-up not uniform
86
Open population
[Figure: individuals A to I followed over time between t0 and t1, with drop-outs and cases marked]
87
Dynamic cohort
Individual time period at risk: not uniform
Estimate the population at risk:
- Total person-time
- Estimate of the total person-time
88
Dynamic cohort
Total person-time: sum of the individual time periods at risk
Person-time units: person-days, person-months, person-years
89
Density of incidence
Density of incidence = (No. of new cases during the given period t0 - t1) / (total person-time)
90
Density of incidence
N             Individual time period at risk (years)    Calculation             Person-years
1 (A)         5                                         1 person x 5 years      5 person-years
3 (B, C, D)   2                                         3 persons x 2 years     6 person-years
2 (E, F)      2.5                                       2 persons x 2.5 years   5 person-years
2 (G, H)      1.5                                       2 persons x 1.5 years   3 person-years
1 (I)         3                                         1 person x 3 years      3 person-years
Total person-time: 22 person-years
91
Density of incidence
1 new case / 22 person-years = 0.045 new cases per person-year = 45 per 1000 person-years
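The person-time bookkeeping can be sketched as below, using the group sizes and individual times at risk from the slide (individuals A to I) and the single new case. The variable names are illustrative assumptions.

```python
# (number of people, individual time at risk in years) for A; B, C, D; E, F; G, H; I
groups = [(1, 5.0), (3, 2.0), (2, 2.5), (2, 1.5), (1, 3.0)]

# Total person-time is the sum of each individual's time at risk.
total_person_years = sum(n * years for n, years in groups)

new_cases = 1
density = new_cases / total_person_years  # new cases per person-year
per_1000 = density * 1000                 # new cases per 1000 person-years

print(total_person_years, round(density, 3), round(per_1000))
```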
92
Open population
Estimate of the total person-time
Individual time period at risk not known for all
- Migration
Assumption: movement of the cohort occurs in the middle of the follow-up
93
Estimate of the total person-time
Total person-time ≈ (P0 + Pt)/2 x follow-up
94
Estimate of the total person-time
At t0: 100 people
Follow-up: 3 years
New cases: 3
Drop-out: 17
Enrollment during the follow-up: 16
P0 = 100; Pt = (100 - 3 - 17 + 16) = 96
(P0 + Pt)/2 x follow-up = (100 + 96)/2 x 3 = 294 person-years
95
Estimate of the total person-time
At t0: 100 people; follow-up: 3 years; new cases: 3; drop-out: 17; enrollment during the follow-up: 16
Test the estimate:
People followed for the whole period: 80 people x 3 years = 240 person-years
Movement of the cohort (assumed at mid follow-up): (17 x 1.5) + (3 x 1.5) + (16 x 1.5) = 54 person-years
240 + 54 = 294 person-years
96
Incidence rate
Incidence rate = (No. of new cases during the given period t0 - t1) / (estimate of total person-time)
3 new cases / 294 person-years x 1000 = 10.2 per 1000 person-years
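The midpoint estimate and the resulting incidence rate can be verified together. This is a sketch with illustrative names; the numbers are those of the worked example (100 people at t0, 3 new cases, 17 drop-outs, 16 enrolled, 3 years of follow-up).

```python
def estimated_person_time(p0: int, pt: int, follow_up_years: float) -> float:
    """Midpoint estimate: average population size times length of follow-up."""
    return (p0 + pt) / 2 * follow_up_years


p0 = 100
new_cases, drop_outs, enrolled = 3, 17, 16
follow_up_years = 3

pt = p0 - new_cases - drop_outs + enrolled  # population at the end of follow-up
person_years = estimated_person_time(p0, pt, follow_up_years)

# Cross-check: people present throughout contribute the full period; everyone who
# enters or leaves is assumed to do so at mid follow-up and contributes half of it.
full = (p0 - new_cases - drop_outs) * follow_up_years
partial = (new_cases + drop_outs + enrolled) * follow_up_years / 2

rate_per_1000 = new_cases / person_years * 1000
print(pt, person_years, full + partial, round(rate_per_1000, 1))
```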
97
Summary measures of location
1) mode: the category or value that occurs most often (typical-ness). Categorical and metric discrete data
2) median: the middle value when the values are in ascending order (central-ness). Ordinal and metric data
3) mean (average): the sum of the values divided by the number of values
4) percentile: percentiles divide the ordered values into 100 equal-sized groups
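The four measures of location can be computed with Python's standard `statistics` module. The data are hypothetical days of hospitalization, chosen only for illustration; note that several percentile conventions exist and `statistics.quantiles` uses its default "exclusive" method here.

```python
import statistics

# Hypothetical days of hospitalization for seven patients (illustrative only).
days = [3, 5, 5, 6, 7, 8, 21]

mode = statistics.mode(days)      # most frequent value
median = statistics.median(days)  # middle value in ascending order
mean = statistics.mean(days)      # sum of values / number of values

# 99 cut points dividing the ordered data into 100 groups; index 49 is the 50th percentile.
percentiles = statistics.quantiles(days, n=100)
p50 = percentiles[49]

print(mode, median, round(mean, 2), p50)
```

The single long stay (21 days) pulls the mean above the median, which is the usual argument for preferring the median with markedly skewed data.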
98
Choosing the most appropriate measure
                     Mode    Median                       Mean
Nominal              yes     no                           no
Ordinal              yes     yes                          no
Metric discrete      yes     yes, when markedly skewed    yes
Metric continuous    yes     yes, when markedly skewed    yes
99
Summary measures of spread
Range: distance from the smallest value to the largest
IQR (interquartile range): spread of the middle half of the values
Boxplot: graphical summary of the three quartile values, the minimum and maximum values, and outliers
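Range and interquartile range can be sketched as follows. The data are hypothetical, and the quartiles come from `statistics.quantiles` with its default "exclusive" method; other software may use slightly different quartile conventions.

```python
import statistics

# Hypothetical days of hospitalization for seven patients (illustrative only).
days = [3, 5, 5, 6, 7, 8, 21]

value_range = max(days) - min(days)             # smallest to largest value
q1, q2, q3 = statistics.quantiles(days, n=4)    # the three quartile values
iqr = q3 - q1                                   # spread of the middle half

print(value_range, q1, q2, q3, iqr)
```

The range is dominated by the single extreme value, while the IQR is not, which is why the IQR is often reported together with the median.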
100
Standard deviation
Average distance of all the data values from the mean value
The smaller the average distance is, the narrower the spread, and vice versa
Used with metric data only
101
1. Subtract the mean from each of the n values in the sample, to give the difference values
2. Square each of these differences
3. Add these squared values together (sum of squares)
4. Divide the sum of squares by 1 less than the sample size (n-1)
5. Take the square root
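The five steps translate almost line by line into code. The sample values are hypothetical, and the result is compared with `statistics.stdev`, which also divides by n-1.

```python
import math
import statistics

# Hypothetical sample of eight metric values (illustrative only).
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)
mean = sum(sample) / n

differences = [x - mean for x in sample]   # step 1: subtract the mean
squared = [d ** 2 for d in differences]    # step 2: square each difference
sum_of_squares = sum(squared)              # step 3: add the squares together
variance = sum_of_squares / (n - 1)        # step 4: divide by n - 1
sd = math.sqrt(variance)                   # step 5: take the square root

print(round(sd, 3), round(statistics.stdev(sample), 3))
```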
102
Standard deviation and the normal distribution
103
The Basic Steps of Statistical Work 1. Design of study Professional design: Research aim Subjects, Measures, etc.
104
Statistical design: Sampling or allocation method, Sample size, Randomization, Data processing, etc.
105
2. Collection of data Source of data Government report system Registration system Routine records Ad hoc survey
106
Data collection: accurate, complete, timely
Protocol: place, subjects, timing; training; pilot; questionnaire; instruments; sampling method and sample size; budget
Procedure: observation, interview, form filling, letter, telephone, web
107
3. Data Sorting
Checking: by hand or with computer software
Amending: missing data?
Grouping:
- according to categorical variables (sex, occupation, disease…)
- according to numerical variables (age, income, blood pressure…)
108
4. Data Analysis
Descriptive statistics (show the sample): mean, incidence rate…; tables and plots
Inferential statistics (towards the population): estimation; hypothesis test (comparison)
109
Definition of Selection Bias Selection bias: Selection biases are distortions that result from procedures used to select subjects and from factors that influence study participation. The common element of such biases is that the association between exposure and disease is different for those who participate and those who should be theoretically eligible for study, including those who do not participate.
110
Definition of Selection Bias It is sometimes (but not always) possible to disentangle the effects of participation from those of disease determinants using standard methods for the control of confounding. One example is the bias introduced by matching in case-control studies.
111
Definition of Confounding Confounding: bias in estimating an epidemiologic measure of effect resulting from an imbalance of other causes of disease in the compared groups. (mixing of effects)
112
Characteristics of a Confounder
- associated with disease (in non-exposed)
- associated with exposure (in source population)
- not an intermediate cause