Chapter 3 Descriptive Statistics for Qualitative Data
Review: categorical data
Binary: gender, coded 1, 2 (m, f)
Multiple categories, nominal: ABO blood type (A, B, AB, O)
Multiple categories, ordinal: degree of protein found in urine (-, ±, +, ++, +++); type of blood pressure (low, normal, high)
Statistical Description for enumeration data
Absolute measure: the number counted in each category (frequency). Absolute measures can hardly be used to compare different populations, because the populations usually differ in size.
Relative measure. Three kinds of relative measures: (1) proportion, (2) intensity (rate), (3) ratio.
(1) Proportion (constituent ratio): a part considered in relation to the whole. E.g., proportion by sex, proportion by age, proportion of deaths by disease.
Table 3.1 Proportions of deaths from 5 diseases in 2001
Columns: Disease | Mortality | Proportion (%)
Rows: Malignant tumor, Circulation system, Respiration system, Digestive system, Infectious disease, Total
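The proportion defined above can be sketched in a few lines of Python. This is only an illustration: the function name `proportions` and the death counts are hypothetical and are not the values of Table 3.1.

```python
def proportions(counts):
    """Return each category's share of the total, in percent."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total count must be positive")
    return {name: 100.0 * n / total for name, n in counts.items()}


# Hypothetical death counts, NOT taken from Table 3.1.
deaths = {"Malignant tumor": 50, "Circulation system": 30, "Other causes": 20}
shares = proportions(deaths)

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

Because every category is divided by the same total, the proportions always add up to 100%; a proportion says how the whole is divided, not how frequently the event occurs in a population.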
Example 1. Question: In which grade is myopia the most serious?
(2) Intensity
Example: A smoking population was followed up for a recorded number of person-years, during which 346 lung cancer cases were found. The incidence rate of lung cancer in the smoking population is:
Incidence rate = 346 / (person-years observed) = 61.47 per 100,000 person-years
In general:
Numerator: total number of events occurring in the period
Denominator: sum of the person-years observed in the period
Unit: persons per person-year, or 1/year
Nature: the relative frequency per unit of time
Example The mortality rate of liver cancer in Guangzhou is 32 per 100,000 per year.
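The intensity calculation above can be sketched as follows. The function name `incidence_rate` is an illustrative choice, and the person-years figure is a hypothetical value chosen only for the demonstration; it is not the follow-up total of the smoking example.

```python
def incidence_rate(events, person_years, per=100_000):
    """Events per `per` person-years of observation."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return events / person_years * per


# 346 cases as in the example; 500,000 person-years is HYPOTHETICAL.
rate = incidence_rate(346, 500_000)
print(f"{rate:.2f} per 100,000 person-years")
```

The denominator is time at risk rather than a head count, which is why the result carries a unit of 1/year.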
(3) Ratio
A ratio is one number divided by another related number.
Examples:
Sex ratio of students in this class: No. of males : No. of females = 52%
Coefficient of variation: CV = SD / mean
Ratio of time spent per clinic visit: large hospital : community health station = 81.9 min : 18.6 min = 4.40
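A small sketch of the two ratio examples above. The clinic-visit times come from the slide; the sample used for the coefficient of variation is hypothetical, and the sketch assumes the sample standard deviation (n - 1 denominator), which the slide does not specify.

```python
import statistics


def ratio(a, b):
    """One number divided by another related number."""
    return a / b


def coefficient_of_variation(values):
    """CV = SD / mean, using the sample SD (an assumption)."""
    return statistics.stdev(values) / statistics.mean(values)


# Time per clinic visit, from the slide: 81.9 min vs 18.6 min.
visit_ratio = ratio(81.9, 18.6)
print(round(visit_ratio, 2))

# Hypothetical measurements, for illustration only.
cv = coefficient_of_variation([10.0, 12.0, 14.0])
print(round(cv, 4))
```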
2. Cautions in the use of relative measures
a. The denominator should be big enough; otherwise the absolute measure should be used. Example: out of 5 cases, 3 were cured. Is 60% a meaningful cure rate?
b. Pay attention to the population from which the relative measure comes. Prevalence rate: the population is the students in the same grade. Proportion (constituent ratio): the population is all the patients.
The above two frequency distributions both describe populations made up of patients only. To describe the prevalence rate, one has to look at the general population.
c. Pooled estimate of the frequency
Pooled estimate = (sum of numerators) / (sum of denominators)
Example: The prevalence of myopia among 3 grades ≠ (sum of the three grade-specific prevalences) / 3
The prevalence of myopia among 3 grades = (total number of myopia cases) / (total number of students) = 192/1175 = 16.34%
d. Comparability between frequencies or between frequency distributions: check that the other conditions are balanced.
e. If the distributions of other variables are different, “standardization” is needed to improve comparability.
f. To compare two samples, a hypothesis test is needed (see the chi-square test).
The following emphasizes the last two points: standardization and hypothesis testing.
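Returning to caution c, the difference between a pooled estimate and a simple average of rates can be sketched as below. Only the totals 192 and 1175 come from the example; the per-grade split is hypothetical and was chosen so that the two answers differ visibly.

```python
def pooled_rate(cases, totals):
    """Sum of numerators divided by sum of denominators."""
    return sum(cases) / sum(totals)


def mean_of_rates(cases, totals):
    """Unweighted average of the group rates (NOT a pooled estimate)."""
    rates = [c / t for c, t in zip(cases, totals)]
    return sum(rates) / len(rates)


# HYPOTHETICAL per-grade counts that add up to 192 cases and 1175 students.
cases = [12, 60, 120]
students = [600, 375, 200]

pooled = pooled_rate(cases, students)
naive = mean_of_rates(cases, students)
print(f"pooled: {pooled:.4f}, mean of rates: {naive:.4f}")
```

The unweighted mean gives each grade the same weight regardless of its size, so it drifts toward the rate of the small grades; the pooled estimate weights each grade by its number of students.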
3. Standardization for crude frequency or crude intensity
Table 10-3 Incidence rates of infectious diseases, children of two cities
Crude incidence rate of city A = 28.96; crude incidence rate of city B = [value missing]
Strange!? The two crude rates are not comparable, because the compositions of the two populations are quite different.
Standardized incidence rate of city A = 793/24767 = 32.02‰
Standardized incidence rate of city B = 3523/24767 = 142.25‰
Two steps:
1. Select a standard population, to be used as the “weights”.
2. Take the weighted average of the actual incidence rates; the result is the directly standardized rate.
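The two steps of direct standardization can be sketched as follows. All numbers are hypothetical (they are not the values of Table 10-3), and the function name `direct_standardized_rate` is an illustrative choice.

```python
def direct_standardized_rate(age_specific_rates, standard_population):
    """Weighted average of the actual group rates, weighted by a standard population."""
    if len(age_specific_rates) != len(standard_population):
        raise ValueError("rates and standard population must align by group")
    expected_cases = sum(
        r * n for r, n in zip(age_specific_rates, standard_population)
    )
    return expected_cases / sum(standard_population)


# Step 1: a HYPOTHETICAL standard population, one entry per age group.
standard_pop = [1000, 2000, 1000]

# Step 2: HYPOTHETICAL age-specific incidence rates of two cities.
rates_city_a = [0.010, 0.020, 0.050]
rates_city_b = [0.020, 0.030, 0.060]

std_a = direct_standardized_rate(rates_city_a, standard_pop)
std_b = direct_standardized_rate(rates_city_b, standard_pop)
print(f"A: {std_a * 1000:.2f} per 1000, B: {std_b * 1000:.2f} per 1000")
```

Both cities are weighted by the same standard population, so any remaining difference between the standardized rates reflects the group-specific rates and not the population composition.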
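Known: age-specific populations $N_{i1}$, $N_{i2}$; total numbers of deaths $\sum_i D_{i1} = 432$, $\sum_i D_{i2} = 210$.
Select a set of standard mortality rates $P_i$.
Standard mortality ratio:
$\mathrm{SMR}_1 = \sum_i D_{i1} / \sum_i N_{i1} P_i = 432 / \sum_i N_{i1} P_i$ (smokers)
$\mathrm{SMR}_2 = \sum_i D_{i2} / \sum_i N_{i2} P_i = 210 / \sum_i N_{i2} P_i$ (non-smokers)
Standardized mortality rate:
$P'_1 = 34.60 \times \mathrm{SMR}_1$ (per $10^5$), $P'_2 = 34.60 \times \mathrm{SMR}_2 = 29.83$ (per $10^5$)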
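The SMR calculation above (often called indirect standardization) can be sketched as below. The populations, standard rates and observed deaths are hypothetical; only the multiplier 34.60 per 100,000 is taken from the slide, and the sketch assumes it plays the role of the overall mortality rate of the standard population.

```python
def standard_mortality_ratio(observed_deaths, populations, standard_rates):
    """Observed deaths divided by the deaths expected under the standard rates."""
    expected = sum(n * p for n, p in zip(populations, standard_rates))
    return observed_deaths / expected


def indirectly_standardized_rate(smr, standard_overall_rate):
    """Standardized rate = SMR x overall rate of the standard population."""
    return smr * standard_overall_rate


# HYPOTHETICAL age-specific populations and standard mortality rates.
populations = [100_000, 200_000, 100_000]
standard_rates = [0.0001, 0.0005, 0.0014]  # deaths per person per year
observed = 200                             # HYPOTHETICAL total deaths

smr = standard_mortality_ratio(observed, populations, standard_rates)
rate = indirectly_standardized_rate(smr, 34.60)  # 34.60 per 100,000, from the slide
print(f"SMR = {smr:.2f}, standardized rate = {rate:.2f} per 100,000")
```

Only the total number of deaths is needed for the studied group, which makes this approach usable when age-specific death counts are unavailable or too small to give stable rates.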
Table The total number of patients between
$a_0$: indicator for the base year; $a_n$: indicator for the $n$-th year.
Average speed of growth = average speed of development − 1
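A minimal sketch of these two quantities, assuming the usual geometric-mean definition, average speed of development = (a_n / a_0)^(1/n), which the slide text itself does not spell out. The indicator values are hypothetical.

```python
def average_speed_of_development(a0, an, n):
    """Geometric mean of the yearly development speeds over n years (assumed definition)."""
    if a0 <= 0 or an <= 0 or n <= 0:
        raise ValueError("a0, an and n must be positive")
    return (an / a0) ** (1.0 / n)


def average_speed_of_growth(a0, an, n):
    """Average speed of growth = average speed of development - 1."""
    return average_speed_of_development(a0, an, n) - 1.0


# HYPOTHETICAL indicator: 100 in the base year, 121 two years later.
dev = average_speed_of_development(100, 121, 2)
growth = average_speed_of_growth(100, 121, 2)
print(f"development: {dev:.3f}, growth: {growth:.1%}")
```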