Published by Penelope Mills. Modified over 8 years ago.
1
Introduction to Multivariate Analysis
Epidemiological Applications in Health Services Research
Dr. Ibrahim Awad Ibrahim
2
Areas to be addressed today
• Introduction to variables and data
• Simple linear regression
• Correlation
• Population covariance
• Multiple regression
• Canonical correlation
• Discriminant analysis
• Logistic regression
• Survival analysis
• Principal component analysis
• Factor analysis
• Cluster analysis
3
Types of variables (Stevens' classification, 1951)
• Nominal
  ◦ distinct categories: race, religion, county, sex
• Ordinal
  ◦ rankings: education, health status, smoking levels
• Interval
  ◦ equal differences between levels: time, temperature, blood glucose levels
• Ratio
  ◦ interval with a natural zero: bone density, weight, height
4
Variables used in data analysis
• Dependent: result, outcome
  ◦ developing CHD
• Independent: explanatory
  ◦ age, sex, diet, exercise
• Latent constructs
  ◦ SES, satisfaction, health status
• Measurable indicators
  ◦ education, employment, revisit, miles walked
5
Variables in data example
6
Data
• Data screening and transformation
• Normality
• Independence
• Correlation (or lack of independence)
7
Variable types and measures of central tendency
• Nominal: mode
• Ordinal: median
• Interval: mean
• Ratio: geometric mean and harmonic mean
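The pairing of scale type and summary measure above can be tried out with Python's standard `statistics` module. This is only a sketch: the four small data sets are hypothetical and were made up for illustration, not taken from the slides.

```python
import statistics

# Hypothetical illustrative data (not from the slides).
blood_type = ["A", "O", "O", "B", "O", "AB"]   # nominal
health_status = [1, 2, 2, 3, 4, 5, 5]          # ordinal ranks, 1 = poor ... 5 = excellent
temperature_c = [36.5, 37.0, 37.5, 38.0]       # interval
weight_kg = [50.0, 60.0, 72.0]                 # ratio

mode_value = statistics.mode(blood_type)           # most frequent category
median_value = statistics.median(health_status)    # middle rank
mean_value = statistics.mean(temperature_c)        # arithmetic mean
geo_mean = statistics.geometric_mean(weight_kg)    # n-th root of the product
harm_mean = statistics.harmonic_mean(weight_kg)    # reciprocal of the mean reciprocal

print(mode_value, median_value, mean_value, geo_mean, harm_mean)
```

For positive ratio data the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean, so the three can be compared directly.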
8
Simple linear regression
[Figure: fitted line on X and Y axes, with intercept A and slope B]
Y = A + BX
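As a minimal sketch of how A and B in Y = A + BX might be estimated, the function below uses ordinary least squares, which is an assumption here since the slide does not name an estimation method. The helper name `fit_line` and the age and blood pressure values are hypothetical.

```python
def fit_line(x, y):
    """Return (A, B) minimizing the sum of squared residuals of Y = A + B*X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx                      # B: change in Y per unit of X
    intercept = mean_y - slope * mean_x    # A: value of Y when X = 0
    return intercept, slope

# Hypothetical, exactly linear data: sbp = 100 + 0.6 * age
age = [30, 40, 50, 60, 70]
sbp = [118, 124, 130, 136, 142]
A, B = fit_line(age, sbp)
print(A, B)
```

Because the made-up data lie exactly on a line, the fit recovers the intercept and slope that generated them; real data would leave residuals around the line.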
9
Correlation
• Mean = μ
• Variance = σ² (the square of the standard deviation, SD)
• Population covariance: σ_xy = E[(X - μ_x)(Y - μ_y)]
• Product-moment correlation coefficient: ρ = σ_xy / (σ_x σ_y)
• It lies between -1 and 1
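The covariance and product-moment coefficient can be computed directly from their definitions. The sketch below treats each list as a complete population (dividing by n rather than n - 1); the function names and the scores are hypothetical.

```python
import math

def population_covariance(x, y):
    """Average product of deviations from the means (divides by n)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n

def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = math.sqrt(population_covariance(x, x))   # variance is cov(X, X)
    sd_y = math.sqrt(population_covariance(y, y))
    return population_covariance(x, y) / (sd_x * sd_y)

# Hypothetical scores
physical = [1, 2, 3, 4, 5]
r_perfect = correlation(physical, [2, 4, 6, 8, 10])     # exact positive line
r_negative = correlation(physical, [10, 8, 6, 4, 2])    # exact negative line
r_partial = correlation(physical, [2, 1, 4, 3, 5])      # noisy positive trend
print(r_perfect, r_negative, r_partial)
```

The three calls illustrate the two extremes of the coefficient and one intermediate value.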
10
Example physical and mental health indicators
11
Negative correlation
12
Population covariance
[Figure: four scatter plots with ρ = 0.00, ρ = 0.33, ρ = 0.6, and ρ = 0.88]
13
Multiple regression and correlation
Simple linear: Y = α + βX
Multiple regression: Y = α + β_1 X_1 + β_2 X_2 + β_3 X_3 + ... + β_p X_p
[Figure: example variables EF (ejection fraction), body fat, exercise]
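A minimal sketch of fitting the multiple regression equation by least squares, assuming the normal equations are solved with plain Gaussian elimination. The helpers `solve` and `fit_multiple` are hypothetical names, and the EF, body fat, and exercise values are invented so that the true coefficients are known.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]    # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def fit_multiple(rows, y):
    """Least-squares estimates [alpha, beta_1, ..., beta_p] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]        # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data generated exactly as EF = 70 - 0.5*body_fat + 2*exercise
predictors = [(20, 1), (25, 3), (30, 2), (35, 5), (40, 4)]
ef = [70 - 0.5 * fat + 2 * ex for fat, ex in predictors]
alpha, beta_fat, beta_ex = fit_multiple(predictors, ef)
print(alpha, beta_fat, beta_ex)
```

In practice a statistics package would be used instead; the point is only that each β is the change in Y per unit of its X with the other X variables held fixed.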
14
Issues with regression
• Missing values
  ◦ random
  ◦ pattern
  ◦ mean substitution and maximum likelihood (ML)
• Dummy variables
  ◦ equal intervals!
• Multicollinearity
  ◦ independent variables are highly correlated
• Garbage can method
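One reading of the "equal intervals" caution is that entering a categorical predictor as integer codes (1, 2, 3) implies equally spaced levels, whereas 0/1 dummy variables do not. The sketch below shows that kind of coding; the function name `dummy_code`, the smoking categories, and the choice of reference level are all hypothetical.

```python
def dummy_code(values, reference):
    """Map each category to 0/1 indicator columns, omitting the reference level."""
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

smoking = ["never", "former", "current", "never"]
levels, coded = dummy_code(smoking, reference="never")
print(levels)   # ['current', 'former']
print(coded)    # [[0, 0], [0, 1], [1, 0], [0, 0]]
```

A variable with k categories yields k - 1 columns; including all k together with an intercept would make the columns perfectly collinear.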
15
Canonical correlation
• An extension of multiple regression
• Multiple Y variables and multiple X variables
• Finds several linear combinations of the X variables and the same number of linear combinations of the Y variables.
• These combinations are called canonical variables, and the correlations between the corresponding pairs of canonical variables are called CANONICAL CORRELATIONS.
16
Correlation matrix
• Data screening and transformation
• Normality
• Independence
• Correlation (or lack of independence)
17
Discriminant analysis
• A method used to classify an individual into one of two or more groups based on a set of measurements
• Examples:
  ◦ at risk for
    ▪ heart disease
    ▪ cancer
    ▪ diabetes, etc.
• It can be used for prediction and description
18
Discriminant analysis
• a and b are wrongly classified
• The discriminant function describes the probability of being classified in the right group.
[Figure: groups A and B, with misclassified individuals a and b]
19
Logistic regression
• An alternative to discriminant analysis for classifying an individual into one of two populations based on a set of criteria.
• It is appropriate for any combination of discrete or continuous variables.
• It uses maximum likelihood estimation to fit the model, which then classifies individuals based on the independent variables.
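To make the maximum likelihood step concrete, here is a minimal sketch with one predictor, fitted by plain gradient ascent on the log-likelihood. The optimizer, learning rate, step count, function names, and data are all assumptions for illustration; statistical packages normally use a Newton-type method.

```python
import math

def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, rate=0.1, steps=5000):
    """Maximize the log-likelihood of P(y = 1 | x) = sigmoid(a + b*x) by gradient ascent."""
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        # Gradient of the log-likelihood: sums of (observed - predicted)
        grad_a = sum(yi - sigmoid(a + b * xi) for xi, yi in zip(x, y))
        grad_b = sum((yi - sigmoid(a + b * xi)) * xi for xi, yi in zip(x, y))
        a += rate * grad_a / n
        b += rate * grad_b / n
    return a, b

# Hypothetical data: x is a centered risk score, y = 1 if the disease developed
x = [-3, -2, -1, 0, 1, 2, 3, -1, 1]
y = [0, 0, 0, 0, 1, 1, 1, 1, 0]
a, b = fit_logistic(x, y)

def classify(score):
    """Assign population 1 when the fitted probability exceeds 0.5."""
    return 1 if sigmoid(a + b * score) > 0.5 else 0

print(a, b, classify(3), classify(-3))
```

The data deliberately overlap (both outcomes occur at scores -1 and 1), because with perfectly separated groups the likelihood has no finite maximum.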
20
Survival analysis (event history analysis)
• Analyzes the length of time it takes for a specific event to occur.
• Time to death, organ failure, retirement, etc.
• Length of time = function of the explanatory variables (covariates)
21
Survival data example
[Figure: patient follow-up timeline with ticks at 1980, 1985, and 1990; each patient is marked as died, lost, or surviving]
22
Log-linear regression
• A regression model in which the dependent variable is the log of survival time (t) and the independent variables are the explanatory variables.
Multiple regression: Y = α + β_1 X_1 + β_2 X_2 + β_3 X_3 + ... + β_p X_p
Log-linear regression: log(t) = α + β_1 X_1 + β_2 X_2 + β_3 X_3 + ... + β_p X_p + e
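A minimal sketch of the log-linear idea with a single covariate: regress log(t) on X by least squares and read exp(β) as the multiplicative change in survival time per unit of X. The data are hypothetical, `fit_line` is an illustrative helper, and the sketch assumes every survival time is fully observed, which ignores the censoring ("lost" patients) that real survival data contain.

```python
import math

def fit_line(x, y):
    """Least-squares intercept and slope for y = alpha + beta * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta

# Hypothetical data: survival time doubles with each unit of the covariate
dose = [0, 1, 2, 3]
survival_months = [6.0, 12.0, 24.0, 48.0]

log_t = [math.log(t) for t in survival_months]    # dependent variable is log(t)
alpha, beta = fit_line(dose, log_t)

baseline_time = math.exp(alpha)   # fitted survival time when X = 0
time_ratio = math.exp(beta)       # multiplicative change in t per unit of X
print(baseline_time, time_ratio)
```

On the log scale the coefficients are additive, so back-transforming with exp turns them into ratios of survival times.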
23
Cox proportional hazards model
• Another method to model the relationship between survival time and a set of explanatory variables.
• The proportion of the population who die up to time (t) is the lined area.
[Figure: curve over a time axis with ticks at 1980, 1985, and 1990; the area up to time t is lined]
24
Cox proportional hazards model
• The hazard functions h(t) of groups 1 and 2 are proportional at every time t, so that h1(t)/h2(t) is constant.
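The constant ratio follows from the usual form of the Cox model, written out below. The symbols h_0 (an unspecified baseline hazard shared by both groups) and the group indicator X (1 for group 1, 0 for group 2) do not appear on the slide and are introduced here as assumptions.

```latex
% Cox model: baseline hazard times an exponential function of the covariates
h(t \mid X_1, \dots, X_p) = h_0(t)\,\exp(\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p)

% With a single group indicator X (1 for group 1, 0 for group 2):
\frac{h_1(t)}{h_2(t)} = \frac{h_0(t)\,\exp(\beta \cdot 1)}{h_0(t)\,\exp(\beta \cdot 0)} = \exp(\beta)
```

The baseline hazard cancels, so the ratio depends on β only and not on t.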
25
Principal component analysis
• Aimed at simplifying the description of a set of interrelated variables.
• All variables are treated equally.
• You end up with uncorrelated new variables called principal components (PCs).
• Each one is a linear combination of the original variables.
• The measure of the information conveyed by each is its variance.
• The PCs are arranged in descending order of the variance explained.
26
Principal component analysis
• A general rule is to select PCs that each explain at least 5% of the variance, but you can set a higher cutoff for parsimony.
• Theory should guide the selection of this cutoff point.
• PCA is sometimes used to alleviate multicollinearity.
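The variance-explained rule can be sketched for the two-variable case, where the eigenvalues of the 2x2 covariance matrix have a closed form. The function name `pca_2d` and the data are hypothetical, and the sketch works on the unstandardized covariance matrix; running PCA on the correlation matrix instead is a common alternative that can give different shares.

```python
import math

def pca_2d(x, y):
    """Variances of the two principal components (eigenvalues), largest first."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((v - mean_x) ** 2 for v in x) / (n - 1)
    syy = sum((v - mean_y) ** 2 for v in y) / (n - 1)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]]
    half_trace = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return half_trace + spread, half_trace - spread

# Hypothetical, strongly correlated measurements
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 2.0, 2.0, 4.0, 4.0]

var1, var2 = pca_2d(x, y)
total = var1 + var2                        # equals the total variance of x and y
explained = [var1 / total, var2 / total]   # share of variance per component
kept = [i + 1 for i, share in enumerate(explained) if share >= 0.05]
print(explained, kept)
```

With these made-up values the second component explains well under 5% of the variance, so only the first would be retained under the rule.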
27
Factor analysis
• The objective is to understand the underlying structure that explains the relationships among the original variables.
• We use the factor loading of each variable on the generated factors to determine the usability of that variable.
• Theory again guides the interpretation of the structures depicted by the common factors encompassing the selected variables.
28
Factor analysis
30
Cluster analysis
• A method for classifying individuals into previously unknown groups
• It proceeds from the most general to the most specific:
• Kingdom: Animalia
  Phylum: Chordata
  Subphylum: Vertebrata
  Class: Mammalia
  Order: Primates
  Family: Hominidae
  Genus: Homo
  Species: sapiens
31
Patient clustering
• Major: patients
  Types: medical
  Subtype: neurological
  Class: genetic
  Order: late onset
  Disease: Guillain-Barré syndrome
• Hierarchical: divisive or agglomerative
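A minimal sketch of the agglomerative (bottom-up) variant: every individual starts as its own cluster and the two closest clusters are merged repeatedly. Single linkage, one-dimensional data, the function name `agglomerate`, and the ages are all assumptions made for illustration.

```python
def agglomerate(points, n_clusters):
    """Single-linkage agglomerative clustering of one-dimensional points."""
    clusters = [[p] for p in points]    # start: every point is its own cluster
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]    # merge the two closest clusters
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical ages at disease onset
ages_at_onset = [21, 23, 24, 55, 58, 60, 61]
groups = agglomerate(ages_at_onset, n_clusters=2)
print(groups)
```

A divisive method runs the other way, starting from one cluster that contains everyone and splitting it step by step.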
32
Conclusions
33
Presentation Schedule
• 4 each on 4/22 and 4/27
• 5 on 4/29
• Each presentation should be a maximum of 10 minutes, plus 5 minutes for discussion.
• E-mail me your software and hardware requirements for your presentation.
• Final projects due 5/7/99 by 5:00 pm in my office.
34
Presentation Schedule 1
35
Presentation Schedule 2
36
Presentation Schedule 3