Introduction to Multivariate Analysis Epidemiological Applications in Health Services Research Dr. Ibrahim Awad Ibrahim.


1 Introduction to Multivariate Analysis Epidemiological Applications in Health Services Research Dr. Ibrahim Awad Ibrahim.

2 Areas to be addressed today
- Introduction to variables and data
- Simple linear regression
- Correlation
- Population covariance
- Multiple regression
- Canonical correlation
- Discriminant analysis
- Logistic regression
- Survival analysis
- Principal component analysis
- Factor analysis
- Cluster analysis

3 Types of variables (Stevens’ classification, 1951)
- Nominal
  - distinct categories: race, religion, county, sex
- Ordinal
  - rankings: education, health status, smoking levels
- Interval
  - equal differences between levels: time, temperature, blood glucose levels
- Ratio
  - interval with natural zero: bone density, weight, height

4 Variables used in data analysis
- Dependent: result, outcome
  - developing CHD
- Independent: explanatory
  - age, sex, diet, exercise
- Latent constructs
  - SES, satisfaction, health status
- Measurable indicators
  - education, employment, revisit, miles walked

5 Variables in data example

6 Data
- Data screening and transformation
- Normality
- Independence
- Correlation (or lack of independence)

7 Variable types and measures of central tendency
- Nominal: mode
- Ordinal: median
- Interval: mean
- Ratio: geometric mean and harmonic mean
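As a rough illustration of the pairing above, Python's standard `statistics` module provides one function per measure. All data values and variable names here are hypothetical.

```python
from statistics import mode, median, mean, geometric_mean, harmonic_mean

# Hypothetical values, one list per measurement level
sex = ["F", "M", "F", "F", "M"]        # nominal: only the mode is meaningful
health_status = [1, 2, 2, 3, 5]        # ordinal ranks (1 = poor, 5 = excellent)
temperature_c = [36.5, 37.0, 38.5]     # interval
birth_weight_kg = [1.0, 2.0, 4.0]      # ratio: natural zero, so ratios make sense

print("mode:", mode(sex))
print("median:", median(health_status))
print("mean:", mean(temperature_c))
print("geometric mean:", geometric_mean(birth_weight_kg))
print("harmonic mean:", harmonic_mean(birth_weight_kg))
```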

8 Simple linear regression
Y = A + BX
[Figure: straight line with intercept A and slope B, plotted on X and Y axes]
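A minimal sketch of how A and B in Y = A + BX could be estimated by ordinary least squares. The slide does not say which estimation method is intended, so least squares is an assumption, and the data are made up to lie exactly on Y = 1 + 2X.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (A, B) for the least-squares line Y = A + B*X."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

# Hypothetical data lying exactly on Y = 1 + 2X
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
A, B = fit_line(xs, ys)
print(A, B)
```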

9 Correlation
- Mean = $\mu$
- Variance = (SD)$^2$ = $\sigma^2$
- Population covariance = $\sigma_{xy} = E[(X-\mu_x)(Y-\mu_y)]$
- Product moment coefficient = $\rho = \sigma_{xy} / (\sigma_x \sigma_y)$
- It lies between -1 and 1
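The definitions above can be sketched in a few lines. This version treats the data as the whole population (dividing by n rather than n - 1), which is an assumption. The function name and data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product moment coefficient: covariance divided by the product of the SDs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

# Hypothetical paired measurements
r_pos = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
r_neg = pearson_r([1, 2, 3], [3, 2, 1])   # perfectly decreasing relationship
print(r_pos, r_neg)
```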

10 Example physical and mental health indicators

11 Negative correlation

12 Population covariance
[Figure: four scatter plots with $\rho = 0.00$, $\rho = 0.33$, $\rho = 0.6$, and $\rho = 0.88$]

13 Multiple regression and correlation
- Simple linear: $Y = \alpha + \beta X$
- Multiple regression: $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_p X_p$
- [Diagram labels: EF (ejection fraction), body fat, exercise]
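A sketch of multiple regression fitted by least squares through the normal equations, using only the standard library. The variable names echo the slide's diagram labels (ejection fraction, body fat, exercise), but the data, the coefficients, and the helper names `solve` and `fit_multiple` are invented for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                       # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_multiple(rows, ys):
    """Return [intercept, coef_1, ..., coef_p] from the normal equations X'X b = X'y."""
    design = [[1.0] + list(r) for r in rows]           # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical, noise-free data: EF = 70 - 0.5 * body_fat + 2 * exercise
rows = [(20, 1), (25, 3), (30, 2), (35, 5), (40, 4)]   # (body fat %, exercise hours)
ys = [70 - 0.5 * bf + 2 * ex for bf, ex in rows]
coef = fit_multiple(rows, ys)
print(coef)
```

Solving the normal equations directly is fine for a small, well-conditioned example like this one; with highly correlated predictors the system becomes ill-conditioned.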

14 Issues with regression
- Missing values
  - random
  - pattern
  - mean substitution and maximum likelihood (ML)
- Dummy variables
  - equal intervals!
- Multicollinearity
  - independent variables are highly correlated
- Garbage can method
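One reading of the "equal intervals" warning is that coding ordered categories as 1, 2, 3 forces the regression to treat the steps between them as equally large, whereas dummy (indicator) coding does not. A small sketch of dummy coding under that interpretation, with hypothetical category names and a hypothetical helper `dummy_code`:

```python
def dummy_code(values, reference):
    """Return (levels, rows): one 0/1 indicator column per non-reference level."""
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

# Hypothetical smoking levels; "none" is the reference category
smoking = ["none", "light", "heavy", "light"]
levels, rows = dummy_code(smoking, reference="none")
print(levels)
print(rows)
```

Each non-reference level then gets its own regression coefficient, so no spacing between levels is assumed.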

15 Canonical correlation
- An extension of multiple regression
- Multiple Y variables and multiple X variables
- Finds several linear combinations of the X variables and the same number of linear combinations of the Y variables.
- These combinations are called canonical variables, and the correlations between the corresponding pairs of canonical variables are called canonical correlations.

16 Correlation matrix
- Data screening and transformation
- Normality
- Independence
- Correlation (or lack of independence)

17 Discriminant analysis
- A method used to classify an individual into one of two or more groups based on a set of measurements
- Examples:
  - at risk for
    - heart disease
    - cancer
    - diabetes, etc.
- It can be used for prediction and description

18 Discriminant analysis
- a and b are wrongly classified
- The discriminant function describes the probability of being classified in the right group.
- [Figure: groups A and B, with individuals a and b falling on the wrong side of the boundary]
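To make the idea of wrongly classified individuals concrete, here is a one-variable sketch. It places the cut-off halfway between the two group means, which matches the linear discriminant rule only under the assumptions of equal variances and equal prior probabilities. The measurements and function names are hypothetical.

```python
from statistics import mean

def midpoint_cutoff(group_a, group_b):
    """Cut-off halfway between the group means (assumes equal variances and priors)."""
    return (mean(group_a) + mean(group_b)) / 2

def classify(x, cutoff):
    """Assign to A below the cut-off, B otherwise (assumes A has the lower mean)."""
    return "A" if x < cutoff else "B"

# Hypothetical measurements for two known groups
group_a = [110, 120, 125, 130, 150]
group_b = [140, 155, 160, 165, 170]

cutoff = midpoint_cutoff(group_a, group_b)
wrong_a = [x for x in group_a if classify(x, cutoff) != "A"]
wrong_b = [x for x in group_b if classify(x, cutoff) != "B"]
print(cutoff, wrong_a, wrong_b)
```

With these numbers one member of each group lands on the wrong side of the cut-off, analogous to a and b on the slide.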

19 Logistic regression
- An alternative to discriminant analysis for classifying an individual into one of two populations based on a set of criteria.
- It is appropriate for any combination of discrete or continuous variables.
- It uses maximum likelihood estimation to classify individuals based on the independent variable list.
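A sketch of maximum likelihood estimation for a logistic model with a single predictor, using Newton-Raphson iterations. The slides do not specify the optimisation algorithm, so Newton-Raphson is an assumption, and the data and function names are hypothetical.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(xs, ys, iterations=25):
    """Maximum likelihood fit of P(Y=1) = sigmoid(b0 + b1*x) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        ps = [sigmoid(b0 + b1 * x) for x in xs]
        # Gradient of the log-likelihood
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Information matrix (negative Hessian), a symmetric 2x2 matrix
        ws = [p * (1 - p) for p in ps]
        h00 = sum(ws)
        h01 = sum(w * x for w, x in zip(ws, xs))
        h11 = sum(w * x * x for w, x in zip(ws, xs))
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: the outcome becomes more likely as x increases, with overlap
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)

def predict(x):
    """Classify into population 1 when the fitted probability is at least 0.5."""
    return 1 if sigmoid(b0 + b1 * x) >= 0.5 else 0

print(b0, b1, predict(1), predict(8))
```

The overlap in the data matters: if the two outcomes were perfectly separated by x, the likelihood would have no finite maximum and the iterations would not settle.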

20 Survival analysis (event history analysis)
- Analyzes the length of time it takes a specific event to occur.
- Time to death, organ failure, retirement, etc.
- Length of time = f(explanatory variables (covariates))

21 Survival data example
[Figure: follow-up timeline with axis ticks at 1980, 1985, and 1990; subjects marked as died, lost, or surviving]
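Data of this kind, where some subjects die and others are lost or still surviving at the end of follow-up, are commonly summarised with the Kaplan-Meier product-limit estimator. The slides do not name that estimator, so it is brought in here as an illustration only. The follow-up times are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time.

    events[i] is 1 if subject i died at times[i], 0 if censored
    (lost to follow-up or still surviving when the study ended).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        leaving = sum(1 for tt, _ in data if tt == t)
        if deaths:
            surv *= 1 - deaths / at_risk   # conditional survival past time t
            curve.append((t, surv))
        at_risk -= leaving                 # deaths and censored both leave the risk set
        i += leaving
    return curve

# Hypothetical years of follow-up since 1980 for seven subjects
times = [2, 3, 5, 5, 8, 10, 10]
events = [1, 0, 1, 1, 0, 0, 0]             # 1 = died, 0 = lost or surviving
curve = kaplan_meier(times, events)
print(curve)
```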

22 Log-linear regression
- A regression model in which the dependent variable is the log of survival time (t) and the independent variables are the explanatory variables.
- Multiple regression: $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_p X_p$
- Log-linear regression: $\log(t) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_p X_p + e$

23 Cox proportional hazards model
- Another method to model the relationship between survival time and a set of explanatory variables.
- The proportion of the population who die up to time (t) is the lined area.
- [Figure: curve with the area up to time t shaded; axis ticks at 1980, 1985, and 1990]

24 Cox proportional hazards model
- The hazard functions of groups 1 and 2 are proportional at every time (t), so that
- h1(t)/h2(t) is constant over time.
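For reference, the proportionality property follows directly from the usual textbook form of the Cox model. The notation is not from the slides and is introduced here as an assumption: $h_0(t)$ is an unspecified baseline hazard and $X_{ij}$ is the value of covariate $j$ for group $i$.

```latex
\[
  h_i(t) = h_0(t)\,\exp\!\left(\beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_p X_{ip}\right)
\]
\[
  \frac{h_1(t)}{h_2(t)}
  = \exp\!\left(\beta_1 (X_{11} - X_{21}) + \dots + \beta_p (X_{1p} - X_{2p})\right)
\]
```

The baseline hazard $h_0(t)$ cancels in the ratio, so the ratio does not depend on $t$.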

25 Principal component analysis
- Aimed at simplifying the description of a set of interrelated variables.
- All variables are treated equally.
- You end up with uncorrelated new variables called principal components (PCs).
- Each one is a linear combination of the original variables.
- The measure of the information conveyed by each is the variance.
- The PCs are arranged in descending order of the variance explained.

26 Principal component analysis
- A general rule is to select PCs explaining at least 5% of the variance, but you can set the cutoff higher for parsimony.
- Theory should guide the selection of the cutoff point.
- PCA is sometimes used to alleviate multicollinearity.
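A small two-variable sketch of how the variance explained by each component could be computed and compared with a cutoff. It uses the closed-form eigenvalues of the 2x2 sample covariance matrix. The data, the function name, and the use of the covariance rather than the correlation matrix are assumptions.

```python
from math import sqrt

def explained_variance_2d(xs, ys):
    """Proportion of total variance carried by each of the two principal components."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of [[sxx, sxy], [sxy, syy]] are the component variances
    half_trace = (sxx + syy) / 2
    spread = sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    eigenvalues = [half_trace + spread, half_trace - spread]   # descending order
    total = sxx + syy
    return [ev / total for ev in eigenvalues]

# Hypothetical pair of correlated measurements
proportions = explained_variance_2d([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
kept = [p for p in proportions if p >= 0.05]   # the 5% rule of thumb
print(proportions, len(kept))
```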

27 Factor analysis
- The objective is to understand the underlying structure explaining the relationships among the original variables.
- We use the factor loading of each variable on the generated factors to determine the usability of that variable.
- Theory again guides the interpretation of the structures depicted by the common factors encompassing the selected variables.

28 Factor analysis

29

30 Cluster analysis
- A method for classifying individuals into previously unknown groups
- It proceeds from the most general to the most specific:
  - Kingdom: Animalia
  - Phylum: Chordata
  - Subphylum: Vertebrata
  - Class: Mammalia
  - Order: Primates
  - Family: Hominidae
  - Genus: Homo
  - Species: sapiens

31 Patient clustering
- Major: patients
  - Type: medical
  - Subtype: neurological
  - Class: genetic
  - Order: late onset
  - Disease: Guillain-Barré syndrome
- Hierarchical: divisive or agglomerative
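A minimal sketch of the agglomerative (bottom-up) variant: every individual starts as its own cluster, and the two closest clusters are merged repeatedly. Single linkage on one variable is used purely for brevity and is an assumption, as are the data and the function name.

```python
def agglomerate(points, k):
    """Single-linkage agglomerative clustering of 1-D points down to k clusters."""
    clusters = [[p] for p in points]           # start: one cluster per individual
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge the closest pair
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical ages at disease onset for six patients
groups = agglomerate([5, 7, 6, 40, 42, 70], k=3)
print(groups)
```

A divisive method would run in the opposite direction, starting from one all-inclusive cluster and splitting it.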

32 Conclusions

33 Presentation Schedule
- 4 each on 4/22 and 4/27
- 5 on 4/29
- Each presentation should be a maximum of 10 minutes, plus 5 minutes for discussion.
- E-mail me your software and hardware requirements for your presentation.
- Final projects due 5/7/99 by 5:00 pm in my office.

34 Presentation Schedule 1

35 Presentation Schedule 2

36 Presentation Schedule 3

