Data Analytics – ITWS-4600/ITWS-6600/MATP-4450


Ctd. Local Linear Models, LDA (cf. PCA, FA), Mixed Models: Optimizing, Iterating
Peter Fox
Data Analytics – ITWS-4600/ITWS-6600/MATP-4450
Group 4 Module 13, April 9, 2018

Smoothing/ local …
https://web.njit.edu/all_topics/Prog_Lang_Docs/html/library/modreg/html/00Index.html
http://cran.r-project.org/doc/contrib/Ricci-refcard-regression.pdf

Classes of local regression
Locally weighted scatterplot smoothing: LOESS, LOWESS
Fitting is done locally: the fit at a point x is made using points in a neighborhood of x, weighted by their distance from x (with differences in ‘parametric’ variables being ignored when computing the distance).

Classes of local regression
The size of the neighborhood is controlled by α (set by span).
For α < 1, the neighborhood includes proportion α of the points, and these have tricubic weighting (proportional to (1 - (dist/maxdist)^3)^3).
For α > 1, all points are used, with the ‘maximum distance’ assumed to be α^(1/p) times the actual maximum distance for p explanatory variables.

Classes of local regression
For the default family, fitting is by (weighted) least squares.
For family="symmetric" a few iterations of an M-estimation procedure with Tukey's biweight are used. Be aware that as the initial value is the least-squares fit, this need not be a very resistant fit.
It can be important to tune the control list to achieve acceptable speed.
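The span and family settings described above can be tried with the loess function in the stats package. This is a minimal sketch on simulated data; the data, the span of 0.3, and the helper name tricube are assumptions made for illustration, not part of the slides.

```r
# Simulated data (illustrative assumption, not from the slides)
set.seed(1)
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(200, sd = 0.3)
y[c(50, 120)] <- y[c(50, 120)] + 5   # two gross outliers in the response

# Tricube weight for a scaled distance u = dist/maxdist (hypothetical helper)
tricube <- function(u) ifelse(abs(u) < 1, (1 - abs(u)^3)^3, 0)

# Default family ("gaussian"): weighted least squares in each neighborhood
fit_ls <- loess(y ~ x, span = 0.3, degree = 2)

# family = "symmetric": a few M-estimation iterations with Tukey's biweight
fit_rob <- loess(y ~ x, span = 0.3, degree = 2, family = "symmetric")

# Fitted values of both variants at the location of the first outlier
x0 <- data.frame(x = x[50])
pred_ls  <- predict(fit_ls,  newdata = x0)
pred_rob <- predict(fit_rob, newdata = x0)
```

Plotting both fitted curves over the scatterplot is one way to see how much the outliers pull the least-squares version.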

Friedman
Friedman's super smoother (supsmu in modreg, now part of stats) is a running lines smoother which chooses between three spans for the lines. The running lines smoothers are symmetric, with k/2 data points each side of the predicted point, and values of k as 0.5 * n, 0.2 * n and 0.05 * n, where n is the number of data points. If span is specified, a single smoother with span span * n is used.
https://web.njit.edu/all_topics/Prog_Lang_Docs/html/library/modreg/html/supsmu.html

Friedman
The best of the three smoothers is chosen by cross-validation for each prediction. The best spans are then smoothed by a running lines smoother and the final prediction chosen by linear interpolation.
For small samples (n < 40) or if there are substantial serial correlations between observations close in x-value, then a pre-specified fixed span smoother (span > 0) should be used. Reasonable span values are 0.2 to 0.4.
https://web.njit.edu/all_topics/Prog_Lang_Docs/html/library/modreg/html/supsmu.html
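A short sketch of the two modes described above, cross-validated span versus a fixed span, using supsmu from the stats package. The simulated data and the fixed span of 0.3 are illustrative assumptions.

```r
# Simulated data (illustrative assumption)
set.seed(2)
x <- sort(runif(300, 0, 10))
y <- sin(x) + rnorm(300, sd = 0.3)

# Default: span = "cv", the span is chosen by cross-validation
fit_cv <- supsmu(x, y)

# Fixed span, as suggested for small samples or serially correlated data
fit_fixed <- supsmu(x, y, span = 0.3)

# Both results are lists with components x and y (the smoothed values)
head(cbind(x = fit_cv$x, cv = fit_cv$y, fixed = fit_fixed$y))
```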

Local non-param
lplm (in Rearrangement)
Local nonparametric method: local linear regression estimator with box kernel (default), for conditional mean functions

Ridge regression
Addresses ill-posed regression problems using filtering approaches (e.g. high-pass)
Often called “regularization”
lm.ridge (in MASS)
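As a rough illustration of lm.ridge, the sketch below fits a nearly collinear regression over a grid of penalty values. The simulated data and the lambda grid are assumptions chosen for the example; with lambda = 0 the ridge coefficients should coincide with ordinary least squares.

```r
library(MASS)

# Simulated, nearly collinear predictors (illustrative assumption)
set.seed(3)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)
y  <- 1 + 2 * x1 + 2 * x2 + rnorm(n)
d  <- data.frame(y, x1, x2)

# Ordinary least squares for reference
ols <- coef(lm(y ~ x1 + x2, data = d))

# Ridge fits over a grid of penalties; lambda = 0 corresponds to OLS
lambdas <- seq(0, 10, by = 0.5)
ridge <- lm.ridge(y ~ x1 + x2, data = d, lambda = lambdas)

# One row of coefficients (intercept, x1, x2) per lambda value
ridge_coef <- coef(ridge)
```

Plotting the columns of ridge_coef against lambdas shows how the two unstable slope estimates are pulled together as the penalty grows.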

Quantile regression
quantreg (in R) is desired if conditional quantile functions are of interest.
One advantage of quantile regression, relative to ordinary least squares regression, is that the quantile regression estimates are more robust against outliers in the response measurements.
In practice we often prefer using different measures of central tendency and statistical dispersion to obtain a more comprehensive analysis of the relationship between variables.

Splines
smooth.spline, splinefun (stats, modreg) and ns (in splines)
http://www.inside-r.org/r-doc/splines
A spline is a numeric function that is piecewise-defined by polynomial functions, and which possesses a sufficiently high degree of smoothness at the places where the polynomial pieces connect (which are known as knots).

Splines
For interpolation, splines are often preferred to polynomial interpolation: they yield similar results to interpolating with higher degree polynomials while avoiding instability due to overfitting.
Features: simplicity of their construction, their ease and accuracy of evaluation, and their capacity to approximate complex shapes
Most common: cubic spline, i.e., of degree 3, in particular the cubic B-spline
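The three functions named above can be compared on a toy problem. This is a sketch under stated assumptions: the sine curve, the noise level, and df = 5 for the natural spline basis are all illustrative choices.

```r
# Interpolation: a natural cubic spline through 9 exact points of sin(x)
xk <- seq(0, 2 * pi, length.out = 9)
yk <- sin(xk)
f  <- splinefun(xk, yk, method = "natural")
interp_val <- f(pi / 3)          # value between the knots

# Smoothing: noisy observations of the same curve (illustrative assumption)
set.seed(4)
xs <- seq(0, 2 * pi, length.out = 200)
ys <- sin(xs) + rnorm(200, sd = 0.2)
sm <- smooth.spline(xs, ys)      # smoothing parameter chosen by GCV
smooth_val <- predict(sm, x = pi / 3)$y

# Regression on a natural cubic spline basis with 5 degrees of freedom
fit_ns <- lm(ys ~ splines::ns(xs, df = 5))
```

The interpolating spline passes through every knot exactly, while the two smoothing variants trade exactness for a smoother curve through noisy data.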

More…
Partial Least Squares Regression (PLSR): mvr (in pls)
Principal Component Regression (PCR)
Canonical Powered Partial Least Squares (CPPLS)

PCR creates components to explain the observed variability in the predictor variables, without considering the response variable at all.
On the other hand, PLSR does take the response variable into account, and therefore often leads to models that are able to fit the response variable with fewer components.
Whether or not that ultimately translates into a better model, in terms of its practical use, depends on the context.
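To make the PCR idea concrete, here is a base R stand-in that builds components with prcomp (from the predictors only) and then regresses the response on the leading component scores. It does not use the pls package named in the slides; the simulated data and the choice of k = 1 component are assumptions for illustration.

```r
# Simulated data: x1 and x2 share a common signal z, x3 is noise (assumption)
set.seed(5)
n <- 150
z <- rnorm(n)
X <- cbind(x1 = z + rnorm(n, sd = 0.1),
           x2 = z + rnorm(n, sd = 0.1),
           x3 = rnorm(n))
y <- 2 * z + rnorm(n, sd = 0.5)

# Step 1: principal components of the predictors; y is not used here
pc <- prcomp(X, center = TRUE, scale. = TRUE)

# Step 2: regress the response on the first k component scores
k <- 1
scores  <- pc$x[, 1:k, drop = FALSE]
pcr_fit <- lm(y ~ scores)
r2 <- summary(pcr_fit)$r.squared
```

A PLSR fit would instead choose its components using y as well, which is the contrast the slide draws.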

Linear Discriminant Analysis
Find a linear combination of features that characterizes or separates two or more classes of objects or events, i.e. a linear classifier; cf. dimension reduction followed by classification (multiple classes, e.g. facial recognition).
Function lda in package MASS
The dependent variable (the class) is categorical and the independent variables are continuous.
Assumes normally distributed classes and equal class covariances; Fisher's LD does not (cf. fdaCMA in package CMA).
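A minimal sketch of lda from MASS, using the built-in iris data as a stand-in example (the data set is an assumption, not one used in the slides). The class is categorical and the four predictors are continuous, matching the setup described above.

```r
library(MASS)

# Fit LDA: Species is the class, the four measurements are the predictors
fit <- lda(Species ~ ., data = iris)

# Predict on the training data: classes, posteriors and discriminant scores
pred <- predict(fit, iris)

# Training accuracy (optimistic, since the same data were used for fitting)
acc <- mean(pred$class == iris$Species)

# With 3 classes there are at most 2 linear discriminants
dim(pred$x)
```

The columns of pred$x are the linear combinations of features that separate the classes, which can be plotted much like principal component scores.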

Relation to PCA, FA?
Both seek linear combinations of variables which best “explain” the data (variance).
LDA explicitly models the difference between the classes of data.
PCA, on the other hand, does not take into account any difference in class.
Factor analysis (FA) builds the feature combinations based on differences of factors rather than similarities.

Relation to PCA, FA?
Discriminant analysis is not an interdependence technique: a distinction between independent variables and dependent variables is made (unlike factor analysis).
NB: If you have categorical independent variables, the equivalent technique is Discriminant Correspondence Analysis (discrimin.coa in ade4).
See also Flexible DA (fda) and Mixture DA (mda) in mda.

Now mixed models

What is a mixed model?
Often known as latent class (mixed models) or linear, or non-linear mixed models
Basic type: mix of two models
Random component to model, or is unobserved
Systematic component = observed…
E.g. linear model: y = y0 + br x + bs z
y0 – intercept
br – random coefficient
bs – systematic coefficient
Or y = y0 + fr(x,u,v,w) + fs(z,a,b)
Or …

Example
Gender – systematic; Movie preference – random?
In semester – systematic; Students on campus – random?
Summer – systematic; People at the beach – random?

Remember latent variables?
In factor analysis, the goal was to use observed variables (as components) in “factors”.
Some variables were not used. Why? Low cross-correlations? Small contribution to explaining the variance?
Mixed models aim to include them!!
Thoughts?

Latent class (LC)
LC models do not rely on the traditional modeling assumptions which are often violated in practice (linear relationship, normal distribution, homogeneity), so they are less subject to biases associated with data not conforming to model assumptions.
In addition, LC models can include variables of mixed scale types (nominal, ordinal, continuous and/or count variables) in the same analysis.

Latent class (LC)
For improved cluster or segment description, the relationship between the latent classes and external variables (covariates) can be assessed simultaneously with the identification of the clusters.
This eliminates the need for the usual second stage of analysis, where a discriminant analysis is performed to relate the cluster results to demographic and other variables.

Kinds of Latent Class Models
Three common statistical application areas of LC analysis are those that involve
1) clustering of cases,
2) variable reduction and scale construction, and
3) prediction.

Thus!
To construct and then run a mixed model, YOU must make many choices, including: the nature of the hierarchy, the fixed effects, and the random effects.

Beyond mixture = 2? Hierarchy, fixed, random = 3? More? Changes over time – a fourth dimension?

Comparing lm, glm, lme4, lcmm
lmm.data <- read.table("http://www.unt.edu/rss/class/Jon/R_SC/Module9/lmm.data.txt",
                       header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
summary(lmm.data)
       id             extro            open           agree           social       class   school
 Min.   :   1.0   Min.   :30.20   Min.   :22.30   Min.   :18.48   Min.   : 46.31   a:300   I  :200
 1st Qu.: 300.8   1st Qu.:54.17   1st Qu.:36.20   1st Qu.:31.90   1st Qu.: 89.32   b:300   II :200
 Median : 600.5   Median :60.15   Median :39.98   Median :35.05   Median : 99.20   c:300   III:200
 Mean   : 600.5   Mean   :60.27   Mean   :40.06   Mean   :35.07   Mean   : 99.53   d:300   IV :200
 3rd Qu.: 900.2   3rd Qu.:66.50   3rd Qu.:43.93   3rd Qu.:38.42   3rd Qu.:109.83           V  :200
 Max.   :1200.0   Max.   :90.83   Max.   :57.87   Max.   :58.44   Max.   :151.96           VI :200

Comparing lm, glm, lme4, lcmm
> head(lmm.data)
  id    extro     open    agree    social class school
1  1 63.69356 43.43306 38.02668  75.05811     d     IV
2  2 69.48244 46.86979 31.48957  98.12560     a     VI
3  3 79.74006 32.27013 40.20866 116.33897     d     VI
4  4 62.96674 44.40790 30.50866  90.46888     c     IV
5  5 64.24582 36.86337 37.43949  98.51873     d     IV
6  6 50.97107 46.25627 38.83196  75.21992     d      I
> nrow(lmm.data)
[1] 1200

Comparing lm, glm, lme4, lcmm
lm.1 <- lm(extro ~ open + social, data = lmm.data)
summary(lm.1)

Call:
lm(formula = extro ~ open + social, data = lmm.data)

Residuals:
     Min       1Q   Median       3Q      Max
-30.2870  -6.0657  -0.1616   6.2159  30.2947

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 58.754056   2.554694  22.998   <2e-16 ***
open         0.025095   0.046451   0.540    0.589
social       0.005104   0.017297   0.295    0.768
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.339 on 1197 degrees of freedom
Multiple R-squared: 0.0003154, Adjusted R-squared: -0.001355
F-statistic: 0.1888 on 2 and 1197 DF, p-value: 0.828

And then
lm.2 <- lm(extro ~ open + agree + social, data = lmm.data)
summary(lm.2)

Call:
lm(formula = extro ~ open + agree + social, data = lmm.data)

Residuals:
     Min       1Q   Median       3Q      Max
-30.3151  -6.0743  -0.1586   6.2851  30.0167

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 57.839518   3.148056  18.373   <2e-16 ***
open         0.024749   0.046471   0.533    0.594
agree        0.026538   0.053347   0.497    0.619
social       0.005082   0.017303   0.294    0.769
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.342 on 1196 degrees of freedom
Multiple R-squared: 0.0005222, Adjusted R-squared: -0.001985
F-statistic: 0.2083 on 3 and 1196 DF, p-value: 0.8907

anova(lm.1, lm.2)
Analysis of Variance Table

Model 1: extro ~ open + social
Model 2: extro ~ open + agree + social
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1   1197 104400
2   1196 104378  1    21.598 0.2475  0.619
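The F statistic in this table can be reproduced by hand from the printed numbers, which may help when reading the later comparisons. The sketch below uses only values shown in the table (rounded as printed); the variable names are illustrative.

```r
# Values as printed in the anova table
df1 <- 1197                 # residual df, model 1
df2 <- 1196                 # residual df, model 2
rss2 <- 104378              # residual sum of squares, model 2
ss_diff <- 21.598           # reduction in RSS from adding 'agree'

# F = (reduction in RSS per added parameter) / (residual mean square of model 2)
f_stat <- (ss_diff / (df1 - df2)) / (rss2 / df2)

# Upper-tail probability of the F distribution with (1, 1196) df
p_val <- pf(f_stat, df1 - df2, df2, lower.tail = FALSE)

round(c(F = f_stat, p = p_val), 4)
```

An F value this small says that adding agree removes almost none of the residual variation.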

Nesting, etc
lm.3 <- lm(extro ~ open + social + class + school, data = lmm.data)
summary(lm.3)

Call:
lm(formula = extro ~ open + social + class + school, data = lmm.data)

Residuals:
     Min       1Q   Median       3Q      Max
-13.1368  -0.9154   0.0176   0.8631  13.6773

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 43.069523   0.476596  90.369   <2e-16 ***
open         0.010793   0.008346   1.293    0.196
social      -0.001773   0.003106  -0.571    0.568
classb       2.038816   0.136575  14.928   <2e-16 ***
classc       3.696904   0.136266  27.130   <2e-16 ***
classd       5.654166   0.136286  41.488   <2e-16 ***
schoolII     7.921787   0.167294  47.353   <2e-16 ***
schoolIII   12.119003   0.166925  72.602   <2e-16 ***
schoolIV    16.052566   0.167100  96.066   <2e-16 ***
schoolV     20.410702   0.166936 122.266   <2e-16 ***
schoolVI    28.063091   0.167009 168.033   <2e-16 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.669 on 1189 degrees of freedom
Multiple R-squared: 0.9683, Adjusted R-squared: 0.968
F-statistic: 3631 on 10 and 1189 DF, p-value: < 2.2e-16

Nesting, etc
lm.4 <- lm(extro ~ open + agree + social + class + school, data = lmm.data)
summary(lm.4)

Call:
lm(formula = extro ~ open + agree + social + class + school, data = lmm.data)

Residuals:
     Min       1Q   Median       3Q      Max
-13.1270  -0.9090   0.0155   0.8734  13.7295

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 43.254814   0.577059  74.957   <2e-16 ***
open         0.010833   0.008349   1.298    0.195
agree       -0.005474   0.009605  -0.570    0.569
social      -0.001762   0.003107  -0.567    0.571
classb       2.044195   0.136939  14.928   <2e-16 ***
classc       3.701818   0.136577  27.104   <2e-16 ***
classd       5.660806   0.136822  41.374   <2e-16 ***
schoolII     7.924110   0.167391  47.339   <2e-16 ***
schoolIII   12.117899   0.166983  72.569   <2e-16 ***
schoolIV    16.050765   0.167177  96.011   <2e-16 ***
schoolV     20.406924   0.167115 122.113   <2e-16 ***
schoolVI    28.065860   0.167127 167.931   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.669 on 1188 degrees of freedom
Multiple R-squared: 0.9683, Adjusted R-squared: 0.968
F-statistic: 3299 on 11 and 1188 DF, p-value: < 2.2e-16

Analyze the variances**
anova(lm.3, lm.4)
Analysis of Variance Table

Model 1: extro ~ open + social + class + school
Model 2: extro ~ open + agree + social + class + school
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1   1189 3311.4
2   1188 3310.5  1   0.90492 0.3247 0.5689

Specific interaction term
# 'class:school' - different situation than one
# with random effects (e.g., nested variables).
lm.5 <- lm(extro ~ open + social + class:school, data = lmm.data)
summary(lm.5)

Summary
Call:
lm(formula = extro ~ open + social + class:school, data = lmm.data)

Residuals:
    Min      1Q  Median      3Q     Max
-9.8354 -0.3287  0.0141  0.3329 10.3912

Coefficients: (1 not defined because of singularities)
                    Estimate Std. Error  t value Pr(>|t|)
(Intercept)        8.008e+01  3.073e-01  260.581   <2e-16 ***
open               6.019e-03  4.965e-03    1.212    0.226
social             5.239e-04  1.853e-03    0.283    0.777
classa:schoolI    -4.038e+01  1.970e-01 -204.976   <2e-16 ***
classb:schoolI    -3.460e+01  1.971e-01 -175.497   <2e-16 ***
classc:schoolI    -3.186e+01  1.970e-01 -161.755   <2e-16 ***
classd:schoolI    -2.998e+01  1.972e-01 -152.063   <2e-16 ***
classa:schoolII   -2.814e+01  1.974e-01 -142.558   <2e-16 ***
classb:schoolII   -2.675e+01  1.971e-01 -135.706   <2e-16 ***
classc:schoolII   -2.563e+01  1.970e-01 -130.139   <2e-16 ***
classd:schoolII   -2.456e+01  1.969e-01 -124.761   <2e-16 ***
classa:schoolIII  -2.356e+01  1.970e-01 -119.605   <2e-16 ***
classb:schoolIII  -2.259e+01  1.970e-01 -114.628   <2e-16 ***
classc:schoolIII  -2.156e+01  1.970e-01 -109.482   <2e-16 ***
classd:schoolIII  -2.064e+01  1.971e-01 -104.697   <2e-16 ***
classa:schoolIV   -1.974e+01  1.972e-01 -100.085   <2e-16 ***
classb:schoolIV   -1.870e+01  1.970e-01  -94.946   <2e-16 ***
classc:schoolIV   -1.757e+01  1.970e-01  -89.165   <2e-16 ***
classd:schoolIV   -1.660e+01  1.969e-01  -84.286   <2e-16 ***
classa:schoolV    -1.548e+01  1.970e-01  -78.609   <2e-16 ***
classb:schoolV    -1.430e+01  1.970e-01  -72.586   <2e-16 ***
classc:schoolV    -1.336e+01  1.974e-01  -67.687   <2e-16 ***
classd:schoolV    -1.202e+01  1.970e-01  -61.051   <2e-16 ***
classa:schoolVI   -1.045e+01  1.970e-01  -53.038   <2e-16 ***
classb:schoolVI   -8.532e+00  1.971e-01  -43.298   <2e-16 ***
classc:schoolVI   -5.575e+00  1.969e-01  -28.310   <2e-16 ***
classd:schoolVI           NA         NA       NA       NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9844 on 1174 degrees of freedom
Multiple R-squared: 0.9891, Adjusted R-squared: 0.9889
F-statistic: 4264 on 25 and 1174 DF, p-value: < 2.2e-16

# The output of both models (lm.5 and lm.6) shows 'NA' where an interaction
# term is redundant with one listed somewhere above it (there are 4 classes and 6 schools).

Specific interaction term
lm.6 <- lm(extro ~ open + agree + social + class:school, data = lmm.data)
summary(lm.6)
# some output omitted…

Coefficients: (1 not defined because of singularities)
                    Estimate Std. Error  t value Pr(>|t|)
(Intercept)        8.036e+01  3.680e-01  218.376   <2e-16 ***
open               6.097e-03  4.964e-03    1.228    0.220
agree             -7.751e-03  5.699e-03   -1.360    0.174
social             5.468e-04  1.852e-03    0.295    0.768
…
classd:schoolVI           NA         NA       NA       NA

Residual standard error: 0.9841 on 1173 degrees of freedom
Multiple R-squared: 0.9891, Adjusted R-squared: 0.9889
F-statistic: 4103 on 26 and 1173 DF, p-value: < 2.2e-16

Compare interaction terms
anova(lm.5, lm.6)
Analysis of Variance Table

Model 1: extro ~ open + social + class:school
Model 2: extro ~ open + agree + social + class:school
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1   1174 1137.7
2   1173 1135.9  1    1.7916 1.8502  0.174

Structure in glm
Even the more flexible Generalized Linear Model (glm) function cannot handle nested effects, although it can handle some types of random effects (e.g., repeated measures designs/data, which are not covered here).
The primary benefit of the 'glm' function is the ability to specify non-normal distributions.
Output from the 'glm' function offers the Akaike Information Criterion (AIC), which can be used to compare models and is much preferred over R-square or even adjusted R-square.
A lower AIC indicates a better fitting model; an AIC of -22.45 indicates a better fitting model than one with an AIC of 14.25.

glm?
'glm' function offers the Akaike Information Criterion (AIC) – so…
glm.1 <- glm(extro ~ open + social + class + school, data = lmm.data)
summary(glm.1)

Call:
glm(formula = extro ~ open + social + class + school, data = lmm.data)

Deviance Residuals:
     Min       1Q   Median       3Q      Max
-13.1368  -0.9154   0.0176   0.8631  13.6773

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 43.069523   0.476596  90.369   <2e-16 ***
open         0.010793   0.008346   1.293    0.196
social      -0.001773   0.003106  -0.571    0.568
classb       2.038816   0.136575  14.928   <2e-16 ***
classc       3.696904   0.136266  27.130   <2e-16 ***
classd       5.654166   0.136286  41.488   <2e-16 ***
schoolII     7.921787   0.167294  47.353   <2e-16 ***
schoolIII   12.119003   0.166925  72.602   <2e-16 ***
schoolIV    16.052566   0.167100  96.066   <2e-16 ***
schoolV     20.410702   0.166936 122.266   <2e-16 ***
schoolVI    28.063091   0.167009 168.033   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 2.785041)

    Null deviance: 104432.7 on 1199 degrees of freedom
Residual deviance:   3311.4 on 1189 degrees of freedom
AIC: 4647.5

Number of Fisher Scoring iterations: 2

Glm2, 3
> glm.2 <- glm(extro ~ open + social + class:school, data = lmm.data)
> glm.3 <- glm(extro ~ open + agree + social + class:school, data = lmm.data)

Compare…
Glm1 – AIC: 4647.5
Glm2 – AIC: 3395.5
Glm3 – AIC: 3395.6
Conclusion?
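For a Gaussian model, the AIC can be recomputed from the residual sum of squares, which shows where the three values come from. The sketch below assumes that glm.2 and glm.3 have the same residual sums of squares and coefficient counts as lm.5 and lm.6 (the formulas are identical); the function name gaussian_aic is hypothetical.

```r
# AIC for a Gaussian linear model from its residual sum of squares:
#   -2 * logLik = n * log(2 * pi * RSS / n) + n
#   parameters  = number of estimated coefficients + 1 (error variance)
gaussian_aic <- function(rss, n, n_coef) {
  k <- n_coef + 1
  n * log(2 * pi * rss / n) + n + 2 * k
}

n <- 1200
aic1 <- gaussian_aic(3311.4, n, 11)  # glm.1: residual deviance 3311.4, 11 coefficients
aic2 <- gaussian_aic(1137.7, n, 26)  # glm.2: RSS taken from the lm.5 anova row
aic3 <- gaussian_aic(1135.9, n, 27)  # glm.3: RSS taken from the lm.6 anova row

round(c(glm.1 = aic1, glm.2 = aic2, glm.3 = aic3), 1)
```

Read this way, the large drop from glm.1 to glm.2 comes from the much smaller RSS of the class:school model, while the tiny RSS gain from adding agree in glm.3 is slightly outweighed by the penalty of 2 for the extra coefficient.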

However…
In order to adequately test these nested (random) effects, we must turn to another type of modeling function/package.
> library(lme4)

However…
The Linear Mixed Effects (lme4) package is designed to fit a linear mixed model, a generalized linear mixed model, or a nonlinear mixed model.
Example – following lm and glm
Fit linear mixed effect models with fixed effects for open & social or open, agree, & social, as well as random/nested effects for class within school, to predict scores on the outcome variable, extroversion (extro).

BIC v. AIC
Note in the output we can use the Bayesian Information Criterion (BIC) to compare models; it is similar to, but more conservative than (and thus preferred over), the AIC mentioned previously. Like AIC, lower BIC reflects better model fit.
The 'lmer' function uses REstricted Maximum Likelihood (REML) to estimate the variance components (which is preferred over standard Maximum Likelihood, also available as an option).
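The sense in which BIC is "more conservative" can be checked directly: both criteria start from -2 × log-likelihood, but AIC adds 2 per parameter while BIC adds log(n) per parameter. The sketch below uses ordinary lm fits on simulated data (an illustrative assumption; it does not use lmer or the lmm.data set).

```r
# Simulated data where x2 is irrelevant to y (illustrative assumption)
set.seed(6)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + rnorm(n)

m1 <- lm(y ~ x1)
m2 <- lm(y ~ x1 + x2)

# Manual computation for m2 from its log-likelihood
ll <- logLik(m2)
k  <- attr(ll, "df")                 # coefficients + error variance
aic_manual <- -2 * as.numeric(ll) + 2 * k
bic_manual <- -2 * as.numeric(ll) + log(n) * k

# Cost of adding x2 under each criterion (positive means "worse")
aic_change <- AIC(m2) - AIC(m1)
bic_change <- BIC(m2) - BIC(m1)
```

Because log(200) is about 5.3, BIC charges more than AIC for the extra coefficient, so bic_change is always larger than aic_change here.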

Random effects 1
Note below, class is nested within school: class is 'under' school.
Random effects are specified inside parentheses and can be repeated measures, interaction terms, or nested (as is the case here).
Simple interactions simply use the colon separator: (1|school:class)
lmm.1 <- lmer(extro ~ open + social + class + (1|school/class), data = lmm.data)
summary(lmm.1)

Summary(lmm.1)
Linear mixed model fit by REML ['lmerMod']
Formula: extro ~ open + social + class + (1 | school/class)
   Data: lmm.data

REML criterion at convergence: 3521.5

Scaled residuals:
     Min       1Q   Median       3Q      Max
-10.0144  -0.3373   0.0164   0.3378  10.5788

Random effects:
 Groups       Name        Variance Std.Dev.
 class:school (Intercept)  2.8822  1.6977
 school       (Intercept) 95.1725  9.7556
 Residual                  0.9691  0.9844
Number of obs: 1200, groups: class:school, 24; school, 6

Fixed effects:
             Estimate Std. Error t value
(Intercept) 5.712e+01  4.052e+00  14.098
open        6.053e-03  4.965e-03   1.219
social      5.085e-04  1.853e-03   0.274
classb      2.047e+00  9.835e-01   2.082
classc      3.698e+00  9.835e-01   3.760
classd      5.656e+00  9.835e-01   5.751

Correlation of Fixed Effects:
       (Intr) open   social classb classc
open   -0.049
social -0.046 -0.006
classb -0.121 -0.002  0.005
classc -0.121 -0.001  0.000  0.500
classd -0.121  0.000  0.002  0.500  0.500

Random effects 2
lmm.2 <- lmer(extro ~ open + agree + social + class + (1|school/class), data = lmm.data)
summary(lmm.2)

Summary(lmm.2)
Linear mixed model fit by REML ['lmerMod']
Formula: extro ~ open + agree + social + class + (1 | school/class)
   Data: lmm.data

REML criterion at convergence: 3528.1

Scaled residuals:
     Min       1Q   Median       3Q      Max
-10.0024  -0.3360   0.0056   0.3403  10.6559

Random effects:
 Groups       Name        Variance Std.Dev.
 class:school (Intercept)  2.8836  1.6981
 school       (Intercept) 95.1716  9.7556
 Residual                  0.9684  0.9841
Number of obs: 1200, groups: class:school, 24; school, 6

Fixed effects:
              Estimate Std. Error t value
(Intercept) 57.3838787  4.0565827  14.146
open         0.0061302  0.0049634   1.235
agree       -0.0077361  0.0056985  -1.358
social       0.0005313  0.0018523   0.287
classb       2.0547978  0.9837264   2.089
classc       3.7049300  0.9837084   3.766
classd       5.6657332  0.9837204   5.759

Correlation of Fixed Effects:
       (Intr) open   agree  social classb classc
open   -0.048
agree  -0.047 -0.012
social -0.045 -0.006 -0.009
classb -0.121 -0.002 -0.006  0.005
classc -0.121 -0.001 -0.005  0.001  0.500
classd -0.121  0.000 -0.007  0.002  0.500  0.500

Extract
# To extract the estimates of the fixed effects parameters.
fixef(lmm.2)
  (Intercept)          open         agree        social        classb        classc        classd
57.3838786775  0.0061301545 -0.0077360954  0.0005312869  2.0547977907  3.7049300285  5.6657331867

Extract
# To extract the estimates of the random effects parameters.
ranef(lmm.2)
$`class:school`
      (Intercept)
a:I    -3.4072737
a:II    0.9313953
a:III   1.3514697
a:IV    1.2673650
a:V     1.2019019
a:VI   -1.3448582
b:I     0.3041239
b:II    0.2723129
b:III   0.2902246
b:IV    0.2664160
b:V     0.3434127
b:VI   -1.4764901
c:I     1.3893592
c:II   -0.2505584
c:III  -0.3458313
c:IV   -0.2497709
c:V    -0.3678469
c:VI   -0.1753517
d:I     1.2899307
d:II   -1.1384176
d:III  -1.3554560
d:IV   -1.2252297
d:V    -0.9877007
d:VI    3.4168733

$school
    (Intercept)
I    -13.989270
II    -6.114665
III   -1.966833
IV     1.940013
V      6.263157
VI    13.867597

Random effects 2
# To extract the coefficients for each group of the random effect factors
# (class:school = 24 groups; school = 6 groups)
coef(lmm.2)
$`class:school`
      (Intercept)        open        agree       social   classb  classc   classd
a:I      53.97660 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
a:II     58.31527 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
a:III    58.73535 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
a:IV     58.65124 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
a:V      58.58578 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
a:VI     56.03902 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
b:I      57.68800 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
b:II     57.65619 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
b:III    57.67410 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
b:IV     57.65029 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
b:V      57.72729 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
b:VI     55.90739 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
c:I      58.77324 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
c:II     57.13332 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
c:III    57.03805 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
c:IV     57.13411 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
c:V      57.01603 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
c:VI     57.20853 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
d:I      58.67381 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
d:II     56.24546 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
d:III    56.02842 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
d:IV     56.15865 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
d:V      56.39618 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
d:VI     60.80075 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733

$school
    (Intercept)        open        agree       social   classb  classc   classd
I      43.39461 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
II     51.26921 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
III    55.41705 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
IV     59.32389 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
V      63.64704 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
VI     71.25148 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733

attr(,"class")
[1] "coef.mer"
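The group-level intercepts printed by coef(lmm.2) are the fixed intercept plus the corresponding random-effect estimate, while the slopes are identical across groups because only the intercept was given a random effect. The arithmetic can be checked with the printed values; the variable names below are illustrative.

```r
# Values copied from the fixef(lmm.2) and ranef(lmm.2) output
fixed_intercept <- 57.3838787
ranef_school_I  <- -13.989270    # $school, row I
ranef_cs_aI     <- -3.4072737    # $`class:school`, row a:I

# Group-specific intercepts as reported by coef(lmm.2)
coef_school_I <- fixed_intercept + ranef_school_I   # printed as 43.39461
coef_cs_aI    <- fixed_intercept + ranef_cs_aI      # printed as 53.97660

round(c(school_I = coef_school_I, class_school_aI = coef_cs_aI), 5)
```

Note that each table adds only its own random effect to the fixed intercept; the class:school rows do not also include the school-level deviation.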

Random effects 2
coef(lmm.2)$`class:school`
# Prints only the 24-row class:school table (a:I through d:VI), identical to the $`class:school` component of the full coef(lmm.2) listing.

prediction
# To extract the predicted values (based on the fitted model).
yhat <- fitted(lmm.2)
summary(yhat)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  39.91   54.43   60.16   60.27   66.35   80.49

prediction
# To extract the residuals (errors); and summarize, as well as plot them.
residuals <- resid(lmm.2)
summary(residuals)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-9.843000 -0.330600  0.005528  0.000000  0.334800 10.490000

Plot residuals
hist(residuals)

Reading, etc.
http://data-informed.com/focus-predictive-analytics/
Lab this week
NB. Not covering logistic regression since most students know it; if not: https://www.r-bloggers.com/how-to-perform-a-logistic-regression-in-r/