Published by Ethelbert Scott, modified over 8 years ago
1
Chapter 37: Multiple, Logistic, and Proportional Hazard Regression (2013.7.31)
2
1. Different kinds of multiple regression

| Type of regression | Type of dependent (Y) variable | Example Y variables |
|---|---|---|
| Linear | Continuous (interval or ratio) | Enzyme activity; renal function (creatinine clearance); weight |
| Logistic | Binary or dichotomous | Death during surgery; graduation; recurrence of cancer |
| Polytomous (multinomial) logistic | Discrete variable with more than two outcomes | (In R: multinom() function of the nnet package) |
| Proportional hazards | Elapsed time to a one-time event | Months until death; days until patient is weaned from ventilator; quarters in school before graduation |
| Andersen-Gill | Elapsed time to an event that can recur | Months until next seizure; days until next occurrence of atrial fibrillation |
| Poisson | Number of events in a time period | Number of hospitalizations; number of falls |

・ The regression methods listed above (plus some more) are special cases of the generalized linear model (GLM).
・ Dummy variables (indicator variables) encode categorical X variables as numbers, e.g., male = 0, female = 1.
・ Multivariate vs. multivariable: strictly, multivariable methods have several X variables, whereas multivariate methods have several Y variables.
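A minimal base-R sketch of two points on this slide, using simulated data with hypothetical variable names (x, y, sex): ordinary linear regression is the GLM with a Gaussian family, so lm() and glm(family = gaussian) give the same coefficients; and a two-level factor is turned into a 0/1 dummy variable by model.matrix().

```r
# Simulated data for illustration only (hypothetical variables x, y).
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)

# Linear regression is the GLM with a Gaussian family and identity link,
# so both fits give the same coefficients.
fit_lm  <- lm(y ~ x)
fit_glm <- glm(y ~ x, family = gaussian)
all.equal(coef(fit_lm), coef(fit_glm))

# Other rows of the table use other families, e.g.
# glm(y ~ x, family = binomial) for logistic regression and
# glm(y ~ x, family = poisson) for counts.

# Dummy (indicator) coding: with "male" as the reference level,
# model.matrix() creates a column that is 0 for male and 1 for female.
sex <- factor(c("male", "female", "female", "male"),
              levels = c("male", "female"))
X <- model.matrix(~ sex)
X
```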
3
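2. Multiple linear regression (1)
Multiple linear regression extends simple linear regression to more than one X variable. It fits the model to the data by finding the coefficient values that bring the model's predictions as close as possible to the actual data (least squares).
・ Mathematical model: $y = a + b_1 x_1 + b_2 x_2 + \dots + b_k x_k + \varepsilon$

| R formula | Model |
|---|---|
| y ~ x | $y = a + bx + \varepsilon$ |
| y ~ x1 + x2 | $y = a + b_1 x_1 + b_2 x_2 + \varepsilon$ |
| y ~ x1 + x2 + x1:x2, or equivalently y ~ x1 * x2 or y ~ (x1 + x2)^2 | $y = a + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2 + \varepsilon$ ($x_1 x_2$: interaction between $x_1$ and $x_2$) |
| y ~ x - 1 (y ~ x + 0 is the same) | $y = bx + \varepsilon$ (no intercept) |
| y ~ 1 + x + I(x^2) | $y = b_0 + b_1 x + b_2 x^2 + \varepsilon$ |

Note: y ~ x + I(x^2) fits the same model as y ~ poly(x, 2, raw = TRUE). The default poly(x, 2) uses orthogonal polynomials, which give the same fitted values but different coefficients.
R: obj <- lm(formula, data)
http://cse.naro.affrc.go.jp/takezawa/r-tips/r/71.html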
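The equivalences in the formula table can be checked directly. This is a sketch on simulated data (the data frame d and its columns x1, x2, y are hypothetical), using only base R.

```r
# Simulated data (hypothetical variables) to check the formula table.
set.seed(2)
d <- data.frame(x1 = runif(40), x2 = runif(40))
d$y <- 1 + 2 * d$x1 - d$x2 + 3 * d$x1 * d$x2 + rnorm(40, sd = 0.1)

# Three ways to write the interaction model give identical fits.
f_a <- lm(y ~ x1 + x2 + x1:x2, data = d)
f_b <- lm(y ~ x1 * x2, data = d)
f_c <- lm(y ~ (x1 + x2)^2, data = d)
all.equal(coef(f_a), coef(f_b))
all.equal(coef(f_a), coef(f_c))

# Regression through the origin: "- 1" and "+ 0" both drop the intercept.
f_d <- lm(y ~ x1 - 1, data = d)
f_e <- lm(y ~ x1 + 0, data = d)
all.equal(coef(f_d), coef(f_e))

# Quadratic term: I(x1^2) matches a raw polynomial; the default orthogonal
# poly() gives the same fitted values but different coefficients.
q1 <- lm(y ~ x1 + I(x1^2), data = d)
q2 <- lm(y ~ poly(x1, 2, raw = TRUE), data = d)
q3 <- lm(y ~ poly(x1, 2), data = d)
all.equal(unname(coef(q1)), unname(coef(q2)))
all.equal(fitted(q1), fitted(q3))
```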
4
3. Multiple linear regression (2)
Multiple linear regression models do not distinguish between the X variable(s) you really care about and the other X variable(s) you are adjusting for (called covariates). You make that distinction when interpreting the results. Chapter 38 explains a problem with variable-selection methods: they can overfit the data.

| Function | Explanation |
|---|---|
| AIC(obj) | Calculate AIC (Akaike's Information Criterion) |
| coefficients(obj) | Extract coefficients (same as coef(obj)) |
| deviance(obj) | Residual sum of squares, sum(residuals(obj)^2) |
| formula(obj) | Extract the model formula |
| predict(obj, newdata) | Predicted values for the X values in the data frame newdata |
| residuals(obj) | Extract residuals (same as resid(obj)) |
| step(obj) | Stepwise variable selection (by AIC) |
| summary(obj) | Print a summary of the regression analysis |

http://cse.naro.affrc.go.jp/takezawa/r-tips/r/71.html
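A short sketch of the extractor functions in the table applied to one fitted model. The data are simulated and the names (d, x1, x2, x3, obj) are hypothetical; x3 is deliberately pure noise so that step() has something it might drop.

```r
# Simulated data for illustration; x3 has no real effect on y.
set.seed(3)
d <- data.frame(x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))
d$y <- 1 + 2 * d$x1 + 0.5 * d$x2 + rnorm(30)

obj <- lm(y ~ x1 + x2 + x3, data = d)

coef(obj)                                          # same as coefficients(obj)
all.equal(deviance(obj), sum(residuals(obj)^2))    # residual sum of squares, two ways
AIC(obj)
formula(obj)
summary(obj)

# Prediction for one new subject (hypothetical X values).
p <- predict(obj, newdata = data.frame(x1 = 0, x2 = 1, x3 = 0))
p

# Stepwise selection by AIC; trace = 0 suppresses the step-by-step log.
reduced <- step(obj, trace = 0)
formula(reduced)
```

As the slide warns, the model chosen by step() can overfit, so treat it as exploratory.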
5
4. Multiple logistic regression (1)
Logistic regression is used when there are two possible outcomes.
・ Mathematical model: $\ln(\text{odds}) = a + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$
Logistic regression computes an odds ratio for each independent variable, $e^{b_i}$, along with its 95% CI.
R: glm(formula, data, family = binomial)

    x <- c(24,18,15,16,10,26,2,24,18,22,3,6,15,12,6,6,12,12,18,3,8,9,12,6,8,12)
    y <- c(1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
    result <- glm(y ~ x, family = binomial)
    # Wald CI: exp(coef +- 1.96 * se)
    se <- summary(result)$coefficients[, "Std. Error"]
    z <- qnorm(0.975)
    upper <- exp(coef(result) + z * se)
    lower <- exp(coef(result) - z * se)
    cbind(lower, upper)

                      lower     upper
    (Intercept) 0.009506751 0.7072379
    x           1.019217257 1.3837280

    exp(confint(result))
    Waiting for profiling to be done...
                      2.5 %   97.5 %
    (Intercept) 0.006570897 0.560349
    x           1.036782016 1.422514

・ exp(confint.default(result)): Wald CI, identical to the manual calculation above
・ exp(confint(result)): profile likelihood confidence interval (in older R versions the glm method was supplied by the MASS package, as MASS:::confint.glm)
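Why is $e^{b}$ an odds ratio? A minimal check on the slide's own data: predict the probability at two X values one unit apart (10 and 11 are arbitrary choices for illustration), convert each to odds, and compare the ratio of the odds with exp(coef).

```r
# Data from the slide above.
x <- c(24,18,15,16,10,26,2,24,18,22,3,6,15,12,6,6,12,12,18,3,8,9,12,6,8,12)
y <- c(1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
result <- glm(y ~ x, family = binomial)

# Predicted probabilities at x = 10 and x = 11 (chosen arbitrarily).
p <- predict(result, newdata = data.frame(x = c(10, 11)), type = "response")
odds <- p / (1 - p)

# The ratio of the two odds equals exp(coefficient of x), the odds ratio.
or_from_odds <- unname(odds[2] / odds[1])
or_from_coef <- unname(exp(coef(result)["x"]))
all.equal(or_from_odds, or_from_coef)

# The same probability by hand: inverse logit of the linear predictor.
p_by_hand <- unname(plogis(coef(result)[1] + coef(result)[2] * 10))
all.equal(unname(p[1]), p_by_hand)
```

Because the log odds are linear in x, the odds ratio for a one-unit increase is the same whichever starting value of x is used.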
6
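5. Multiple logistic regression (2): example, prediction of obesity
Odds ratio for age: $OR_2 = 1.02$ per year. Because odds ratios multiply, the odds ratio for a 10-year age difference is $1.02^{10} \approx 1.22$, i.e., a 22% increase in the odds per 10 years of age.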
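The rescaling above is simple arithmetic; here it is in R, using only the per-year odds ratio given on the slide. The same result follows from the logistic coefficient $b = \ln(1.02)$, since the odds ratio for a k-unit change is $e^{kb}$.

```r
# Odds ratio for age per year, as given on the slide.
or_per_year <- 1.02

# Odds ratios multiply, so the OR for a k-year difference is OR^k.
k <- 10
or_per_decade <- or_per_year^k
round(or_per_decade, 2)   # 1.22

# Equivalently, from the logistic coefficient b = log(OR): exp(k * b).
b <- log(or_per_year)
all.equal(or_per_decade, exp(k * b))
```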
7
6. Multiple proportional hazards regression
Proportional hazards regression is used when the outcome is the elapsed time to an event; it is most often used to analyze survival time.
・ Mathematical model: $h(t) = h_0(t)\,\exp(b_1 x_1 + b_2 x_2 + \dots + b_k x_k)$
・ Hazard ratio: A survival curve plots cumulative survival as a function of time. The steepness of the survival curve reflects the rate of dying in a short time interval; this rate is termed the hazard. When comparing two groups, investigators often assume that the ratio of their hazard functions (the hazard ratio) is constant over time.
R:

    library(survival)
    data(kidney)
    kidney.cox <- coxph(Surv(time, status) ~ sex + disease, data = kidney)
    summary(kidney.cox)
    kidney.fit <- survfit(kidney.cox)
    plot(kidney.fit)
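To make "hazard" and "constant hazard ratio" concrete, here is a base-R sketch with made-up survival curves (no data from the slide). Assumptions, for illustration only: group A follows a Weibull survival curve with shape 1.5 and scale 10, and group B has a hazard ratio of 2, which under proportional hazards means $S_B(t) = S_A(t)^{2}$. The hazard is approximated numerically as the rate of dying in a short interval among those still alive.

```r
# Hypothetical parameters (not from the slide): Weibull baseline, HR = 2.
shape <- 1.5
scale <- 10
hr    <- 2

surv_a <- function(t) exp(-(t / scale)^shape)   # S_A(t)
surv_b <- function(t) surv_a(t)^hr              # S_B(t) = S_A(t)^HR under PH

# Hazard: rate of dying in a short interval among those still alive,
# approximated as -(log S(t + dt) - log S(t)) / dt.
hazard <- function(surv, t, dt = 1e-6) {
  -(log(surv(t + dt)) - log(surv(t))) / dt
}

times <- c(1, 5, 10, 20)
h_a <- hazard(surv_a, times)
h_b <- hazard(surv_b, times)

h_a                  # each group's hazard rises over time here
round(h_b / h_a, 4)  # prints 2 2 2 2: the ratio stays constant
```

Each hazard changes with time, yet their ratio does not; that is exactly the proportional hazards assumption that coxph() relies on.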
8
7. Assumptions
・ Sampling from a population (an assumption common to all statistical analyses).
・ Linear effects, with no interactions beyond those specified in the model. If interactions exist, extend the model to include them.
・ Independent observations: the data from each subject provide independent information about the relationships among the variables. This assumption does not hold if some subjects are twins or siblings.
・ The random component of the model is correct: multiple linear regression assumes a Gaussian distribution (or an approximately Gaussian one); logistic regression assumes a binomial distribution.
・ Additional assumptions of proportional hazards regression: see Chapters 5 and 29.
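"If interactions exist, extend the model" can be sketched in base R by fitting the model with and without the interaction term and comparing the two with an F test. The data below are simulated with a built-in interaction; the names x1, x2, fit_main, fit_int are hypothetical.

```r
# Simulated data with a true x1:x2 interaction (illustration only).
set.seed(4)
n  <- 200
x1 <- runif(n)
x2 <- rbinom(n, 1, 0.5)
y  <- 1 + 2 * x1 + x2 + 3 * x1 * x2 + rnorm(n, sd = 0.5)

fit_main <- lm(y ~ x1 + x2)   # assumes no interaction
fit_int  <- lm(y ~ x1 * x2)   # model extended with the x1:x2 interaction

# F test comparing the nested models; a small P value says the
# no-interaction model is inadequate.
anova(fit_main, fit_int)
coef(fit_int)["x1:x2"]
```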
9
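8. Correlated observations
Non-independent (correlated) observations are common. One of the assumptions of multiple regression is that each observation is independent. This assumption is violated in many experimental designs:
・ Longitudinal studies: multiple observations on the same subject at different time points.
・ Crossover studies: multiple observations on the same subject after different treatments.
・ Multiple observations on each individual: e.g., measurements from both knees in arthritis.
・ Clusters: some subjects are recruited from one group (family, hospital, ...) and others from another group.
・ Case-control studies (Chapter 28): subjects are recruited as matched pairs.
・ Meta-analyses: the question is how to combine the data.
All of the above are useful experimental designs, but they complicate data analysis.
Simple alternatives are often adequate. E.g., when a clinical study is conducted at several sites, analyze the data from each site separately.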
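The simple alternative in the last line (analyze each site separately) can be sketched with split() and lapply(). The multicenter data below are simulated; the site labels A, B, C, the site-specific baselines, and the column names are all hypothetical.

```r
# Hypothetical multicenter data: 3 sites, 20 subjects each.
set.seed(5)
d <- data.frame(site = factor(rep(c("A", "B", "C"), each = 20)),
                x    = rnorm(60))
site_shift <- c(A = 0, B = 2, C = -1)   # hypothetical site-specific baseline
d$y <- site_shift[as.character(d$site)] + 1.5 * d$x + rnorm(60, sd = 0.5)

# Fit the same model separately within each site, so observations
# from different sites are never pooled in one fit.
fits   <- lapply(split(d, d$site), function(s) lm(y ~ x, data = s))
slopes <- sapply(fits, function(f) coef(f)[["x"]])
slopes
```

Comparing the per-site slopes also shows whether the effect is consistent across sites.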
10
9. Appropriate methods for analyzing correlated data
Fancier methods are sometimes needed for correlated observations. The methods used to properly analyze correlated data are far beyond the scope of this book. Here is a list of some of them:
・ Generalized estimating equations (GEE)
・ Mixed-effects models, also called random-effects models
・ Conditional logistic regression, or conditional proportional hazards regression
・ Repeated-measures ANOVA, or analysis of covariance (ANCOVA)
・ Hierarchical or multilevel regression models
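Most of these methods need add-on packages, but repeated-measures ANOVA can be run in base R with an Error() term in aov(). This is a sketch on simulated data (10 hypothetical subjects, each measured under 3 hypothetical conditions c1 to c3); Error(subject/cond) tells aov() that cond varies within subjects, so the condition effect is tested against within-subject variation rather than treating all 30 values as independent.

```r
# Hypothetical repeated-measures data: 10 subjects x 3 conditions,
# so observations within a subject are correlated.
set.seed(6)
d <- expand.grid(subject = factor(1:10), cond = factor(c("c1", "c2", "c3")))
subj_effect <- rnorm(10, sd = 2)   # between-subject variation
d$y <- subj_effect[as.integer(d$subject)] +
       c(0, 1, 2)[as.integer(d$cond)] + rnorm(nrow(d), sd = 0.5)

# cond is tested in the within-subject stratum ("Error: subject:cond").
fit <- aov(y ~ cond + Error(subject/cond), data = d)
summary(fit)
```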