Introduction to Multivariate Analysis Epidemiological Applications in Health Services Research Dr. Ibrahim Awad Ibrahim.

Areas to be addressed today
- Introduction to variables and data
- Simple linear regression
- Correlation
- Population covariance
- Multiple regression
- Canonical correlation
- Discriminant analysis
- Logistic regression
- Survival analysis
- Principal component analysis
- Factor analysis
- Cluster analysis

Types of variables (Stevens’ classification, 1951)
- Nominal
  - distinct categories: race, religion, county, sex
- Ordinal
  - rankings: education, health status, smoking levels
- Interval
  - equal differences between levels: time, temperature, blood glucose levels
- Ratio
  - interval with a natural zero: bone density, weight, height

Variables used in data analysis
- Dependent: result, outcome
  - developing CHD
- Independent: explanatory
  - age, sex, diet, exercise
- Latent constructs
  - SES, satisfaction, health status
- Measurable indicators
  - education, employment, revisit, miles walked

Variables in data example

Data
- Data screening and transformation
- Normality
- Independence
- Correlation (or lack of independence)

Variable types and measures of central tendency
- Nominal: mode
- Ordinal: median
- Interval: mean
- Ratio: geometric mean and harmonic mean
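The pairing of scale type and summary measure on this slide can be tried out with Python's standard `statistics` module. This is only a sketch: all sample values and variable names below are hypothetical and chosen for illustration.

```python
import statistics

# Hypothetical sample values, one list per measurement scale.
blood_types = ["A", "O", "O", "B", "O", "AB"]   # nominal
health_status = [1, 2, 2, 3, 4, 5, 5]            # ordinal ranks (1 = poor, 5 = excellent)
temperatures_c = [36.5, 37.0, 37.5, 38.0]        # interval
weights_kg = [50.0, 60.0, 72.0]                  # ratio

nominal_mode = statistics.mode(blood_types)               # most frequent category
ordinal_median = statistics.median(health_status)         # middle rank
interval_mean = statistics.mean(temperatures_c)           # arithmetic mean
ratio_geometric = statistics.geometric_mean(weights_kg)   # needs positive values
ratio_harmonic = statistics.harmonic_mean(weights_kg)     # needs positive values

print(nominal_mode, ordinal_median, interval_mean, ratio_geometric, ratio_harmonic)
```

The geometric and harmonic means are only meaningful when the scale has a natural zero, which is why they are reserved for ratio variables here.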

Simple linear regression
$Y = A + BX$
(Figure: regression line plotted against axes X and Y, with intercept A and slope B)
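As a rough illustration of fitting Y = A + BX, the sketch below estimates A and B by ordinary least squares. The least-squares method, the function name `fit_line`, and the age and blood pressure values are assumptions added for illustration; the slide itself only states the equation.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-products / sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data constructed to lie exactly on Y = 90 + 0.8 * X.
ages = [30, 40, 50, 60, 70]
sbp = [114, 122, 130, 138, 146]

a, b = fit_line(ages, sbp)
print(a, b)
```

With real data the points will not fall exactly on a line, and A and B are then the values that minimize the sum of squared vertical distances.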

Correlation
- Mean = $\mu$
- Variance (SD)$^2$ = $\sigma^2$
- Population covariance = $\sigma_{xy} = E[(X - \mu_x)(Y - \mu_y)]$
- Product-moment coefficient = $\rho = \sigma_{xy} / (\sigma_x \sigma_y)$
- It lies between -1 and 1
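The product-moment coefficient can be computed directly from its definition as covariance divided by the product of the standard deviations. The following is a minimal sketch; the function name `pearson_r` and the data are hypothetical, and the sample is treated as the whole population (division by n).

```python
import math

def pearson_r(xs, ys):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

# Hypothetical data: a perfect positive, a perfect negative, and a weaker relationship.
r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
r_mixed = pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(r_positive, r_negative, r_mixed)
```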

Example physical and mental health indicators

Negative correlation

Population covariance
(Figure: four panels with $\rho = 0.00$, $\rho = 0.33$, $\rho = 0.6$, and $\rho = 0.88$)

Multiple regression and correlation
Simple linear: $Y = \alpha + \beta X$
Multiple regression: $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_p X_p$
(Figure labels: EF (ejection fraction), body fat, exercise)
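To make the multiple regression equation concrete, the sketch below fits the intercept and two slopes by solving the least-squares normal equations with plain Gaussian elimination. The helper names `solve` and `fit_multiple`, the data, and the "true" coefficients (70, -0.5, 2) are hypothetical; the outcome is built without noise so the fit recovers them exactly.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_multiple(rows, ys):
    """Return [alpha, beta_1, ..., beta_p] from the normal equations (X'X) b = X'y."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical predictors: (body fat %, exercise hours per week).
predictors = [(20, 1), (25, 3), (30, 2), (35, 5), (22, 4), (28, 0)]
# Hypothetical ejection fraction generated from assumed coefficients, no noise.
ef = [70 - 0.5 * fat + 2 * ex for fat, ex in predictors]

alpha, beta_fat, beta_exercise = fit_multiple(predictors, ef)
print(alpha, beta_fat, beta_exercise)
```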

Issues with regression
- Missing values
  - random
  - pattern
  - mean substitution and ML
- Dummy variables
  - equal intervals!
- Multicollinearity
  - independent variables are highly correlated
- Garbage can method
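One common reading of the dummy variable point is that coding a categorical predictor as 1, 2, 3 assumes equal intervals between categories, whereas 0/1 indicator columns do not. The sketch below shows indicator coding against a reference category; the function name `dummy_code` and the smoking categories are hypothetical.

```python
def dummy_code(values, reference):
    """Return (levels, rows): one 0/1 column per level other than the reference."""
    levels = sorted(set(values) - {reference})
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return levels, rows

# Hypothetical smoking status for four subjects; "never" is the reference category.
smoking = ["never", "former", "current", "never"]
levels, coded = dummy_code(smoking, reference="never")
print(levels)
print(coded)
```

Subjects in the reference category receive zeros in every column, so each regression coefficient is interpreted relative to that category.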

Canonical correlation
- An extension of multiple regression
- Multiple Y variables and multiple X variables
- Finds several linear combinations of the X variables and the same number of linear combinations of the Y variables
- These combinations are called canonical variables, and the correlations between the corresponding pairs of canonical variables are called canonical correlations

Correlation matrix
- Data screening and transformation
- Normality
- Independence
- Correlation (or lack of independence)

Discriminant analysis
- A method used to classify an individual into one of two or more groups based on a set of measurements
- Examples:
  - at risk for
    - heart disease
    - cancer
    - diabetes, etc.
- It can be used for prediction and description

Discriminant analysis
- a and b are wrongly classified
- The discriminant function describes the probability of being classified in the right group
(Figure: groups A and B, with misclassified cases a and b)

Logistic regression
- An alternative to discriminant analysis for classifying an individual into one of two populations based on a set of criteria
- It is appropriate for any combination of discrete or continuous variables
- It uses maximum likelihood estimation to classify individuals based on the independent variable list
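A minimal sketch of the idea follows: a one-predictor logistic model is fitted by maximizing the likelihood, and individuals are then classified by whether the predicted probability reaches 0.5. Plain gradient ascent is used here only because it is short; statistical packages typically use other optimizers. The function names, learning rate, step count, cutoff, and data are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, rate=0.1, steps=5000):
    """Estimate (a, b) in P(y=1) = sigmoid(a + b*x) by gradient ascent on the log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the log-likelihood with respect to a and b.
        grad_a = sum(y - sigmoid(a + b * x) for x, y in zip(xs, ys))
        grad_b = sum((y - sigmoid(a + b * x)) * x for x, y in zip(xs, ys))
        a += rate * grad_a / n
        b += rate * grad_b / n
    return a, b

# Hypothetical centered risk score and observed outcome (1 = event, 0 = no event).
scores = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
events = [0, 0, 0, 1, 0, 1, 0, 1, 1]

a, b = fit_logistic(scores, events)
p_low = sigmoid(a + b * -2.0)
p_high = sigmoid(a + b * 2.0)
predicted_high = 1 if p_high >= 0.5 else 0  # classify with an assumed 0.5 cutoff
print(a, b, p_low, p_high, predicted_high)
```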

Survival analysis (event history analysis)
- Analyzes the length of time it takes for a specific event to occur
- Time to death, organ failure, retirement, etc.
- Length of time = function of explanatory variables (covariates)

Survival data example
(Figure: subjects marked as died, lost, or surviving)

Log-linear regression
- A regression model in which the dependent variable is the log of survival time (t) and the independent variables are the explanatory variables
Multiple regression: $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_p X_p$
Log-linear: $\log(t) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_p X_p + e$

Cox proportional hazards model
- Another method to model the relationship between survival time and a set of explanatory variables
- The proportion of the population who die up to time (t) is the lined area
(Figure: shaded area up to time t)

Cox proportional hazards model
- The hazard function (h) at time (t) is proportional among groups 1 and 2, so that
- $h_1(t) / h_2(t)$ is constant over time
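The proportionality statement can be written out using the usual form of the Cox model. The baseline hazard $h_0(t)$ and the subject subscripts are notation introduced here for illustration and do not appear on the slide.

```latex
h_i(t) = h_0(t)\,\exp(\beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_p X_{ip})

\frac{h_1(t)}{h_2(t)} = \exp\bigl(\beta_1 (X_{11} - X_{21}) + \dots + \beta_p (X_{1p} - X_{2p})\bigr)
```

Because $h_0(t)$ cancels in the ratio, the right-hand side does not depend on $t$, which is what "proportional hazards" means.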

Principal component analysis
- Aimed at simplifying the description of a set of interrelated variables
- All variables are treated equally
- You end up with uncorrelated new variables called principal components (PCs)
- Each one is a linear combination of the original variables
- The measure of the information conveyed by each is its variance
- The PCs are arranged in descending order of the variance explained

Principal component analysis
- A general rule is to select PCs explaining at least 5% of the variance, but you can go higher for parsimony
- Theory should guide the selection of this cutoff point
- Sometimes it is used to alleviate multicollinearity
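For two variables, the variances of the principal components are the eigenvalues of the 2 x 2 covariance matrix, which have a closed form. The sketch below computes them, the share of variance each explains, and which components pass the 5% rule. The function names and data are hypothetical, and the sample covariance (division by n - 1) is an assumption.

```python
import math

def covariance_2d(xs, ys):
    """Return (sxx, sxy, syy), the entries of the sample covariance matrix."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    syy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return sxx, sxy, syy

def principal_variances(sxx, sxy, syy):
    """Eigenvalues of [[sxx, sxy], [sxy, syy]], largest first."""
    center = (sxx + syy) / 2
    half_gap = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return center + half_gap, center - half_gap

# Hypothetical pair of strongly correlated measurements.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

sxx, sxy, syy = covariance_2d(xs, ys)
variances = principal_variances(sxx, sxy, syy)
total = sum(variances)
shares = [v / total for v in variances]
kept = [i for i, share in enumerate(shares) if share >= 0.05]  # 5% rule from the slide
print(shares, kept)
```

With variables this strongly correlated, nearly all the variance falls on the first component, so the second is dropped.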

Factor analysis
- The objective is to understand the underlying structure explaining the relationships among the original variables
- We use the factor loading of each variable on the generated factors to determine the usability of that variable
- It is again guided by theory as to which structures are depicted by the common factors encompassing the selected variables

Factor analysis

Cluster analysis
- A method for classifying individuals into previously unknown groups
- It proceeds from the most general to the most specific:
  - Kingdom: Animalia
  - Phylum: Chordata
  - Subphylum: Vertebrata
  - Class: Mammalia
  - Order: Primates
  - Family: Hominidae
  - Genus: Homo
  - Species: sapiens

Patient clustering
- Major: patients
  - Types: medical
  - Subtype: neurological
  - Class: genetic
  - Order: late-onset
  - Disease: Guillain-Barré syndrome
- Hierarchical: divisive or agglomerative
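The agglomerative variant starts with every individual in its own cluster and repeatedly merges the two closest clusters. The sketch below does this for one-dimensional data using single linkage (distance between the closest members); the linkage rule, the function name `agglomerate`, and the ages at onset are assumptions made for illustration.

```python
def agglomerate(values, n_clusters):
    """Merge the two closest clusters (single linkage) until n_clusters remain."""
    clusters = [[v] for v in sorted(values)]  # start: one cluster per individual
    while len(clusters) > n_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                distance = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or distance < best[0]:
                    best = (distance, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical ages at disease onset for eight patients.
onset_ages = [5, 71, 40, 6, 73, 42, 7, 70]
groups = agglomerate(onset_ages, n_clusters=3)
print(groups)
```

A divisive method would run in the opposite direction, starting from one cluster containing everyone and splitting it step by step.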

Conclusions

Presentation Schedule
- 4 each on 4/22 and 4/27
- 5 on 4/29
- Each presentation should be a maximum of 10 minutes, with 5 minutes for discussion
- Send me your software and hardware requirements for your presentation
- Final projects due 5/7/99 by 5:00 pm in my office

Presentation Schedule 1

Presentation Schedule 2

Presentation Schedule 3