Introduction to Medical Statistics Sun Jing Health Statistics Department.


1 Introduction to Medical Statistics Sun Jing Health Statistics Department

2 Contents • Introduction to basic concepts in Medical Statistics • Small game to practice basic concepts ★

3 Vocabulary Medical Statistics 医学统计学

4 Statistics • The discipline concerned with the treatment of numerical data derived from groups of individuals (P. Armitage). • The science and art of dealing with variation in data through collection, classification and analysis in such a way as to obtain reliable results (J.M. Last).

5 Medical Statistics • Application of mathematical statistics in the field of medicine

6 Homogeneity: All individuals have similar values or belong to the same category. Example: all individuals are Chinese, women, middle-aged (30~40 years old), and work in a textile mill ---- homogeneity in nationality, gender, age and occupation. Homogeneity and Variation

7 Variation: the differences in height, weight… Toss a coin: the marked face may land up or down ---- variation! Treat patients suffering from pneumonia with the same antibiotic: some of them recover and others do not ---- variation! If there is no variation, there is no need for statistics. Can you give an example of variation in the medical field? Homogeneity and Variation

8 Population: The whole collection of individuals that one intends to study. Sample: A representative part of the population. Population and sample

9 Questions: Which one is a “population”? All the cases of hepatitis B collected in a hospital in Guangzhou. All the deaths among the permanent residents of a city. All the rats used for testing the toxicity of a medicine.

10 Randomization : An important way to make the sample representative. Randomization

11 Probability: A measure of the possibility that a random event occurs. A : random event P(A) : Probability of the random event A P(A)=1, if an event always occurs. P(A)=0, if an event never occurs.

12 Random By chance! Random event: the event may or may not occur in one experiment. Before one experiment, nobody is sure whether the event will occur or not. Question: Please give some examples of random events. There must be some regularity in a large number of experiments.

13 Number of observations: n (large enough) Number of occurrences of random event A: m P(A) ≈ m/n (Frequency or Relative frequency) Question: Please give some examples of the probability of a random event, and the frequency of that random event. Estimation of Probability ---- Frequency
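
The frequency estimate on this slide can be illustrated with a short simulation. This is only a sketch: the function name `relative_frequency`, the true probability `p` and the seed are assumptions made for illustration and are not part of the slides.

```python
import random

def relative_frequency(n, p=0.5, seed=0):
    """Simulate n trials of a random event A with true probability p; return m/n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    m = sum(1 for _ in range(n) if rng.random() < p)  # m = number of occurrences of A
    return m / n

# As n grows, the relative frequency m/n settles near the true probability p.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

With a small n the frequency can be far from p; with a large n it is usually close, which is the regularity that lets frequency stand in for probability.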

14 Parameter and statistic Parameter : A measure of the population or of the distribution of the population. A parameter is usually represented by a Greek letter such as μ, π. -- Parameters are usually unknown

15 To know the parameter of a population, we need a sample Statistic: A measure of the sample or of the distribution of the sample. A statistic is usually represented by a Latin letter such as s and p. Questions: Please give an example of a parameter and of a statistic. Does a parameter vary? Does a statistic vary?

16 5. Sampling Error The difference between observed value and true value. Three kinds of error: (1) Systematic error (fixed) (2) Measurement error (random) (3) Sampling error (random)

17 Sampling error The statistics of different samples from the same population differ from each other! The statistics also differ from the parameter! Sampling error exists in any sampling research. It cannot be avoided but may be estimated.

18 8.2 Types of data 1. Numerical Variable and Measurement Data A variable that describes a characteristic of individuals quantitatively -- Numerical Variable The data of a numerical variable -- Measurement Data

19 2. Categorical Variable and Enumeration Data A variable that describes the category of individuals according to one of their characteristics -- Categorical Variable The number of individuals in each category -- Enumeration Data

20 Special case of categorical variable : Ordinal variable and rank data An order exists among all possible categories -- Ordinal variable The data of an ordinal variable, which represent the order of individuals only -- Rank data

21 Examples Which type of variable does each belong to? RBC (4.58 × 10^6 /l) Diastolic/systolic blood pressure (8/12 kPa) Percentage of individuals with blood type A (20%) Protein in urine (++) Transition rate of cell (90%)

22 2. Measures for Average (1) Arithmetic Mean : it is calculated by summing all the observations in a set of data and dividing by the total number of measurements. Based on observed data Example: Blood sugar 6.2, 5.4, 5.7, 5.3, 6.1, 6.0, 5.8, 5.9
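
A minimal sketch of the arithmetic mean for the blood sugar example, using only the eight values listed on the slide (the variable names are illustrative):

```python
# Blood sugar observations from the slide
blood_sugar = [6.2, 5.4, 5.7, 5.3, 6.1, 6.0, 5.8, 5.9]

# Arithmetic mean = sum of the observations / number of observations
mean = sum(blood_sugar) / len(blood_sugar)
print(round(mean, 2))  # 46.4 / 8 = 5.8
```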

23 (2) Geometric mean Example 9-4 See Table 9-4

24 (3) Median Ranking the observed values from the smallest to the largest, Median = the value in the middle. It is also called the 50th percentile: half the values would be greater than it and the other half would be less than it

25 (3) Median Based on raw data Example 1: (7 values) 120, 123, 125, 127, 128, 130, 132 Median = 127 Example 2: (8 values) 118, 120, 123, 125, 127, 128, 130, 132 Median = (125+127)/2 = 126
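
The two cases (odd and even number of values) can be sketched as one small function. The function name `median` is an illustrative choice; the data are the two examples from the slide.

```python
def median(values):
    """Return the middle value of the ranked data (mean of the two middle values if n is even)."""
    ordered = sorted(values)          # rank from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd n: a single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle values

print(median([120, 123, 125, 127, 128, 130, 132]))       # 127
print(median([118, 120, 123, 125, 127, 128, 130, 132]))  # 126.0
```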

26 3. Measures for variability (1) Range Range = Maximum - Minimum Based on only two observations, it ignores the observations within the two extremes. The greater the number of observations, the greater the range tends to be.

27 (2) Interquartile range Lower Quartile: 25th percentile Upper Quartile: 75th percentile Difference between the two Quartiles = Upper Quartile - Lower Quartile = 13.120 – 8.083 = 5.037

28 (3) Variance and Standard Deviation Variance is calculated by subtracting the mean of a set of data values from each of the observations, squaring these deviations, adding them up, and dividing by one less than the number of observations in the data set. The mean of squared deviation

29 (3)Variance and Standard Deviation Standard deviation (SD): is the square root of the variance.
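
A sketch of the sample variance and standard deviation as described in words above, applied to the blood sugar values from the arithmetic mean slide. The choice of that data set is an assumption for illustration; the slides do not work this example.

```python
import math

blood_sugar = [6.2, 5.4, 5.7, 5.3, 6.1, 6.0, 5.8, 5.9]

n = len(blood_sugar)
mean = sum(blood_sugar) / n

# Sum of squared deviations from the mean, divided by (n - 1)
variance = sum((x - mean) ** 2 for x in blood_sugar) / (n - 1)

# Standard deviation is the square root of the variance
sd = math.sqrt(variance)

print(round(variance, 4), round(sd, 4))
```

Dividing by n - 1 rather than n is the convention stated on the slide for a sample.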

30 (4)Coefficient of Variation CV: is a ratio of standard deviation to arithmetic mean multiplied by 100. Example 9-10 Variation of height and variation of weight
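
The coefficient of variation makes height and weight comparable even though their units differ. The sketch below uses hypothetical means and standard deviations, since the data of Example 9-10 are not reproduced in this transcript; the function name `cv` is also illustrative.

```python
def cv(sd, mean):
    """Coefficient of variation: standard deviation as a percentage of the mean."""
    return sd / mean * 100

# Hypothetical values, NOT taken from Example 9-10
height_cv = cv(sd=6.0, mean=170.0)   # height in cm
weight_cv = cv(sd=7.0, mean=60.0)    # weight in kg

print(round(height_cv, 2), round(weight_cv, 2))
```

Under these assumed numbers, weight varies more than height relative to its mean, a comparison the raw standard deviations (in cm and kg) could not support.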

31 Absolute measure: The numbers counted for each category (frequencies) The absolute measure can hardly be used for comparison between different populations.

32 Relative measure Three kinds of relative measures: Frequency (Proportion) Intensity (Rate) Ratio

33 Relative Frequency It is proportion or Relative frequency!

34 Eg3.1 In an alcohol drinking survey of 2327 people aged between 15 and 65, 347 were found to be alcohol abusers. Estimate the relative frequency of alcohol abuse. According to the formula, the relative frequency of alcohol abuse = 347/2327 × 100% = 14.9%.

35 Proportion (constituent ratio): A part considered in relation to the whole. E.g., proportion by sex, proportion by age, proportion of deaths by disease

36 Table 3.1 Proportions of deaths from 5 diseases in 2001
Disease              Mortality (deaths)   Proportion (%)
Malignant tumor              50               33.33
Circulation system           40               26.67
Respiration system           30               20.00
Digestive system             20               13.33
Infectious disease           10                6.67
Total                       150              100.00
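
The proportions in Table 3.1 follow directly from the death counts. A minimal sketch (the dictionary and variable names are illustrative):

```python
# Death counts from Table 3.1
deaths = {
    "Malignant tumor": 50,
    "Circulation system": 40,
    "Respiration system": 30,
    "Digestive system": 20,
    "Infectious disease": 10,
}

total = sum(deaths.values())

# Proportion (%) = part / whole * 100
proportions = {disease: 100 * n / total for disease, n in deaths.items()}

for disease, pct in proportions.items():
    print(f"{disease:<20}{deaths[disease]:>5}{pct:>8.2f}")
```

The proportions always add up to 100%, so a larger share for one disease necessarily means a smaller share for the others.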

37 Example 1 Question: Which grade has the most serious myopia problem?

38 Prevalence rates describe: P(Myopia | First grade), P(Myopia | Second grade), P(Myopia | Third grade). Proportions among myopia cases describe: P(First grade | Myopia), P(Second grade | Myopia), P(Third grade | Myopia). Which grade has the most serious myopia problem? Answer: P(Myopia | Third grade) is the maximum -- the third grade has the highest prevalence of myopia. P(Second grade | Myopia) is the maximum -- among the myopia cases, the absolute number of second-grade students is the highest.
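
The difference between the two sets of conditional probabilities can be sketched with counts. The numbers below are hypothetical, chosen only so that the pattern matches the answer on the slide; the table for this example is not reproduced in the transcript.

```python
# Hypothetical counts (NOT from the slides): grade -> (students, myopia cases)
grades = {
    "first": (400, 60),
    "second": (600, 120),
    "third": (300, 90),
}

total_cases = sum(cases for _, cases in grades.values())

# Prevalence: P(Myopia | grade) = cases in the grade / students in the grade
prevalence = {g: cases / students for g, (students, cases) in grades.items()}

# Proportion among cases: P(grade | Myopia) = cases in the grade / all cases
share = {g: cases / total_cases for g, (_, cases) in grades.items()}

print(max(prevalence, key=prevalence.get))  # grade with the highest prevalence
print(max(share, key=share.get))            # grade contributing the most cases
```

With these assumed counts the second grade contributes the most cases only because it has the most students; the prevalence, not the share, answers which grade is most seriously affected.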

39 (2) Intensity Example: A smoking population was followed up for 562,833 person-years, and 346 lung cancer cases were found. The incidence rate of lung cancer in the smoking population is: Incidence rate = 346/562833 = 61.47 per 100,000 person-years

40 In general, Denominator: Sum of the person-years observed in the period Numerator: Total number of the event appearing in the period Unit: person/person year, or 1/Year Nature: the relative frequency per unit of time.

41 Eg3.2 In an infection survey, the researchers observed 500 patients in a hospital; the total observation time was 12500 person-days. It was found that 59 patients were infected in the hospital. Calculate the daily infection rate in the hospital. According to the formula, the daily infection rate = 59/12500 = 0.00472 = 0.472%, that is, about 0.472% of patients may be infected each day in this hospital.
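
Both intensity examples (lung cancer per person-year, hospital infection per person-day) use the same calculation: events divided by observed person-time. A minimal sketch, where the function name `rate` and its `per` scaling parameter are illustrative choices:

```python
def rate(events, person_time, per=1):
    """Intensity: number of events divided by total person-time, scaled by `per`."""
    return events / person_time * per

# Lung cancer: 346 cases in 562833 person-years, per 100,000 person-years
lung_cancer = rate(346, 562833, per=100_000)

# Hospital infection: 59 infections in 12500 person-days, as a percentage per day
infection = rate(59, 12500, per=100)

print(round(lung_cancer, 2))  # 61.47
print(round(infection, 3))    # 0.472
```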

42 Example The mortality rate of liver cancer in Guangzhou is 32 per 100,000 per year.

43 (3) Ratio Ratio is a number divided by another related number Examples Sex ratio of students in this class: No. of males : No. of females = 52% Coefficient of variation: CV=SD/mean Ratio of time spent per clinic visit: Large hospital : Community health station = 81.9 min. : 18.6 min. = 4.40

44 Ratio : It is the quotient of any two values. It expresses how many times one value is of the other. E.g., sex ratio, ratio of the numbers of sickbeds in two hospitals, relative risk

45 Eg3.3 There were 14750 doctors in a city with a population of 8,100,000 in 2000. Find the number of doctors per 1000 people in this city. According to the formula, the number of doctors per 1000 people = 14750/8100 = 1.82, that is, there are about 1.82 doctors per 1000 people.

46 13.1 Principles of research design 1. Control 2. Balance 3. Randomization 4. Replication

47 2. Balance: The experimental group and control group are almost the same in all aspects except the treatment.
Experimental group: Subject + Treatment + Others → Effect of treatment + Effect of others
Control group: Subject + Control + Others → Effect of control + Effect of others

48 3. Randomization There are many factors that we know may influence the results but that are very difficult to deal with. Randomization is the best choice! Example: To improve the homogeneity of subjects, collect a number of students of the same age and gender; randomly assign them to two groups so that the groups are balanced in height and weight.
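
Random assignment into two groups can be sketched as a shuffle followed by a split. The function name `randomize`, the subject labels and the seed are assumptions for illustration; real trials often use more elaborate schemes such as blocked or stratified randomization.

```python
import random

def randomize(subjects, seed=None):
    """Randomly split subjects into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy so the caller's list is not modified
    rng.shuffle(shuffled)       # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

students = [f"student_{i:02d}" for i in range(1, 21)]  # hypothetical subject IDs
experimental, control = randomize(students, seed=42)
print(experimental)
print(control)
```

Because chance alone decides the group, unmeasured factors such as height and weight tend to balance out between the groups on average.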

49 Randomization is the prerequisite of statistical inference. Randomization ≠ casual (haphazard) selection. Randomization means that every subject in the population has the same probability of being sampled for the research.

50 4. Replication One meaning of replication : The results can be reproduced in different labs and by different researchers. Another meaning of replication : The study should be performed in a big enough sample. Altman & Dore checked 90 papers: 39% mentioned their sample size and why. Sample sizes of 27% papers were too small to make a conclusion.

51 Experimental design 1. Why? To plan and arrange subject selection, treatment assignment, data collection and statistical analysis. To ensure validity, reproducibility and economy. 2. Types of research Experiment: animal experiment, clinical trial, community intervention trial Survey Both need to be well designed!

52 Survey design 1. Survey Observes the existing process Without intervention Must be well designed Examples of surveys: Health condition survey Epidemiological survey Etiologic survey Clinical follow-up survey Sanitary survey …….

53 regression coefficient regression coefficient: measures the quantitative dependency relationship of the variable Y on X.

54 correlation coefficient correlation coefficient (also called the product-moment correlation coefficient): measures the strength and direction of the linear relationship between the two variables. “Regression” has become the statistical term that describes the quantitative dependency between variables, and has given rise to new statistical concepts such as the “regression equation” and the “regression coefficient”.
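
Both coefficients can be computed from the same sums of squares and cross-products. The sketch below uses the standard least-squares slope and product-moment formulas; the function name and the five (X, Y) pairs are hypothetical and do not come from the slides.

```python
import math

def regression_and_correlation(x, y):
    """Return (b, r): the regression coefficient of Y on X and the correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    lxx = sum((xi - mean_x) ** 2 for xi in x)                        # sum of squares of X
    lyy = sum((yi - mean_y) ** 2 for yi in y)                        # sum of squares of Y
    lxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) # cross-products
    b = lxy / lxx                    # change in Y per unit change in X
    r = lxy / math.sqrt(lxx * lyy)   # strength and direction, between -1 and 1
    return b, r

# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b, r = regression_and_correlation(x, y)
print(round(b, 3), round(r, 4))
```

The two coefficients always share the same sign, but b carries the units of Y per unit of X while r is unit-free.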

