Data Analytics – ITWS-4600/ITWS-6600/MATP-4450

1 Data Analytics – ITWS-4600/ITWS-6600/MATP-4450
Ctd. Local Linear Models, LDA (cf. PCA, FA), Mixed Models: Optimizing, Iterating
Peter Fox
Data Analytics – ITWS-4600/ITWS-6600/MATP-4450
Group 4, Module 13, April 9, 2018

2 Smoothing/ local …

3 Classes of local regression
Locally (weighted) scatterplot smoothing: LOESS, LOWESS. Fitting is done locally: the fit at a point x is made using points in a neighborhood of x, weighted by their distance from x (with differences in 'parametric' variables ignored when computing the distance).

4 (figure)

5 Classes of local regression
The size of the neighborhood is controlled by α (set by span). For α < 1, the neighbourhood includes proportion α of the points, and these have tricubic weighting (proportional to (1 - (dist/maxdist)^3)^3). For α > 1, all points are used, with the ‘maximum distance’ assumed to be α^(1/p) times the actual maximum distance for p explanatory variables.

6 Classes of local regression
For the default family, fitting is by (weighted) least squares. For family="symmetric" a few iterations of an M-estimation procedure with Tukey's biweight are used. Be aware that as the initial value is the least-squares fit, this need not be a very resistant fit. It can be important to tune the control list to achieve acceptable speed.
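A minimal sketch of these options in R (not from the original slides; simulated data and illustrative names):

# loess() is in the stats package; span sets alpha, the neighborhood size
set.seed(42)
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(200, sd = 0.3)
d <- data.frame(x, y)
fit1 <- loess(y ~ x, data = d, span = 0.75)                       # default (weighted) least-squares fit
fit2 <- loess(y ~ x, data = d, span = 0.25, family = "symmetric") # smaller span, M-estimation
plot(x, y)
lines(x, predict(fit1), col = "blue")
lines(x, predict(fit2), col = "red")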

7 Friedman (supsmu, now in stats; formerly modreg)
is a running lines smoother which chooses between three spans for the lines. The running lines smoothers are symmetric, with k/2 data points each side of the predicted point, and values of k as 0.5 * n, 0.2 * n and 0.05 * n, where n is the number of data points. If span is specified, a single smoother with span span * n is used.

8 Friedman The best of the three smoothers is chosen by cross-validation for each prediction. The best spans are then smoothed by a running lines smoother and the final prediction chosen by linear interpolation. For small samples (n < 40), or if there are substantial serial correlations between observations close in x-value, a pre-specified fixed span smoother (span > 0) should be used. Reasonable span values are 0.2 to 0.4.
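A short sketch of supsmu on simulated data (illustrative names; span = 0.3 follows the fixed-span advice above):

# supsmu() is in the stats package
set.seed(1)
x <- sort(runif(300, 0, 10))
y <- cos(x) + rnorm(300, sd = 0.4)
s.cv  <- supsmu(x, y)              # default: spans chosen by cross-validation
s.fix <- supsmu(x, y, span = 0.3)  # pre-specified fixed span
plot(x, y)
lines(s.cv, col = "blue")
lines(s.fix, col = "red")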

9 Local non-param lplm (in Rearrangement)
Local nonparametric method, local linear regression estimator with box kernel (default), for conditional mean functions

10 Ridge regression Addresses ill-posed regression problems by filtering/shrinking the coefficient estimates; often called "regularization". lm.ridge (in MASS)
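A minimal lm.ridge sketch on the built-in longley data (the lambda grid is illustrative):

library(MASS)
fit <- lm.ridge(Employed ~ GNP + Unemployed + Armed.Forces,
                data = longley, lambda = seq(0, 0.1, length.out = 50))
select(fit)  # lambda suggested by HKB, L-W and generalized cross-validation
plot(fit)    # coefficient paths versus lambda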

11 Quantile regression quantreg (in R)
Quantile regression is desired if conditional quantile functions are of interest. One advantage over ordinary least squares regression is that the quantile regression estimates are more robust against outliers in the response measurements. In practice we often prefer using different measures of central tendency and statistical dispersion to obtain a more comprehensive analysis of the relationship between variables.
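A sketch using quantreg's own engel data (food expenditure vs. household income; the tau values are illustrative):

library(quantreg)
data(engel)
fit.med <- rq(foodexp ~ income, tau = 0.50, data = engel)  # median regression
fit.q75 <- rq(foodexp ~ income, tau = 0.75, data = engel)  # upper quartile
summary(fit.med)
plot(engel$income, engel$foodexp)
abline(fit.med, col = "blue")
abline(fit.q75, col = "red")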

12 Splines smooth.spline, splinefun (in stats; formerly modreg) and ns (in splines)
a numeric function that is piecewise-defined by polynomial functions, and which possesses a sufficiently high degree of smoothness at the places where the polynomial pieces connect (which are known as knots)

13 Splines For interpolation, splines are often preferred to polynomial interpolation: they yield similar results to interpolating with higher-degree polynomials while avoiding instability due to overfitting. Features: simplicity of construction, ease and accuracy of evaluation, and capacity to approximate complex shapes. Most common: the cubic spline (degree 3), in particular the cubic B-spline.
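A sketch of smooth.spline and ns on simulated data (illustrative names; df = 5 is arbitrary):

set.seed(7)
x <- seq(0, 2 * pi, length.out = 150)
y <- sin(x) + rnorm(150, sd = 0.25)
fit <- smooth.spline(x, y)  # smoothing parameter chosen by generalized cross-validation
plot(x, y)
lines(fit, col = "blue")
library(splines)
fit.ns <- lm(y ~ ns(x, df = 5))  # natural cubic spline basis inside lm()
lines(x, fitted(fit.ns), col = "red")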

14 More… Partial Least Squares Regression (PLSR), Principal Component Regression (PCR), Canonical Powered Partial Least Squares (CPPLS): mvr (in pls)

15 PCR creates components to explain the observed variability in the predictor variables, without considering the response variable at all. PLSR, on the other hand, does take the response variable into account, and therefore often leads to models that fit the response with fewer components. Whether that ultimately translates into a better model, in terms of practical use, depends on the context.
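A sketch comparing PCR and PLSR on the pls package's gasoline data (octane vs. NIR spectra; ncomp = 10 is illustrative):

library(pls)
data(gasoline)
fit.pcr  <- pcr(octane ~ NIR, ncomp = 10, data = gasoline, validation = "CV")
fit.plsr <- plsr(octane ~ NIR, ncomp = 10, data = gasoline, validation = "CV")
summary(fit.pcr)   # cross-validated RMSEP by number of components
summary(fit.plsr)  # PLSR typically reaches a given RMSEP with fewer components
plot(RMSEP(fit.plsr))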

16 Linear Discriminant Analysis
Find a linear combination of features that characterizes or separates two or more classes of objects or events, i.e., a linear classifier; cf. dimension reduction followed by classification (multiple classes, e.g., facial recognition). Function lda in package MASS. The dependent variable (the class) is categorical and the independent variables are continuous. Assumes normal distribution of classes and equal class covariances; cf. Fisher LD, which does not (fdaCMA in package CMA).
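A minimal lda sketch on the built-in iris data (resubstitution accuracy, for illustration only):

library(MASS)
fit <- lda(Species ~ ., data = iris)
fit  # prior probabilities, group means, discriminant coefficients
pred <- predict(fit, iris)
table(iris$Species, pred$class)  # confusion matrix on the training data
plot(fit)  # data projected onto the discriminant axes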

17 Relation to PCA, FA? Both seek linear combinations of variables which best "explain" the data (variance). LDA explicitly models the difference between the classes of data; PCA, on the other hand, does not take into account any difference in class. Factor analysis (FA) builds the feature combinations based on differences of factors rather than similarities.

18 (figure)

19 Relation to PCA, FA? Discriminant analysis is not an interdependence technique: a distinction between independent variables and dependent variables is made (unlike factor analysis). NB: if you have categorical independent variables, the equivalent technique is Discriminant Correspondence Analysis (discrimin.coa in ade4). See also Flexible DA (fda) and Mixture DA (mda) in package mda, sketched below.
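A short sketch of the fda and mda functions (package mda) on iris, for illustration:

library(mda)
fit.fda <- fda(Species ~ ., data = iris)  # flexible discriminant analysis
fit.mda <- mda(Species ~ ., data = iris)  # mixture discriminant analysis
fit.fda  # prints training misclassification error
table(iris$Species, predict(fit.mda, iris))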

20 Now mixed models

21 What is a mixed model?
Often known as latent class (mixed) models, or linear or non-linear mixed models. Basic type: a mix of two models, a random component (unobserved, to be modeled) and a systematic component (observed).
E.g., a linear model: y = y0 + br*x + bs*z, where y0 is the intercept, br the random coefficient, and bs the systematic coefficient.
Or y = y0 + fr(x,u,v,w) + fs(z,a,b). Or …

22 Example: Gender – systematic; movie preference – random?
In semester – systematic; students on campus – random? Summer – systematic; people at the beach – random?

23 Remember latent variables?
In factor analysis the goal was to use observed variables (as components) in "factors". Some variables were not used – why? Low cross-correlations? Small contribution to explaining the variance? Mixed models aim to include them! Thoughts?

24 Latent class (LC) LC models do not rely on the traditional modeling assumptions which are often violated in practice (linear relationship, normal distribution, homogeneity), and so are less subject to biases associated with data not conforming to model assumptions. In addition, LC models can include variables of mixed scale types (nominal, ordinal, continuous and/or count variables) in the same analysis.

25 Latent class (LC) For improved cluster or segment description, the relationship between the latent classes and external variables (covariates) can be assessed simultaneously with the identification of the clusters. This eliminates the need for the usual second stage of analysis, where a discriminant analysis is performed to relate the cluster results to demographic and other variables.

26 Kinds of Latent Class Models
Three common statistical application areas of LC analysis are those that involve 1) clustering of cases, 2) variable reduction and scale construction, and 3) prediction.
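For illustration only, a latent class linear mixed model sketch with the lcmm package, using its hlme() function and bundled data_hlme example data (the column names Y, Time, ID are taken from that package's documentation and are an assumption here):

library(lcmm)
# One-class reference model, then a 2-class model; 'mixture' lists the
# fixed effects allowed to differ across latent classes
m1 <- hlme(Y ~ Time, random = ~ Time, subject = "ID", data = data_hlme)
m2 <- hlme(Y ~ Time, mixture = ~ Time, random = ~ Time,
           subject = "ID", ng = 2, data = data_hlme, B = m1)
summarytable(m1, m2)  # log-likelihood, BIC, class sizes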

27 Thus! To construct and then run a mixed model, YOU must make many choices, including: the nature of the hierarchy, the fixed effects, and the random effects.

28 Beyond mixture = 2? Hierarchy, fixed, random = 3? More?
Changes over time – a fourth dimension?

29 Comparing lm, glm, lme4, lcmm
lmm.data <- read.table("<data file URL omitted>", header = TRUE, sep = ",",
                       na.strings = "NA", dec = ".", strip.white = TRUE)
summary(lmm.data)
      id         extro          open          agree         social     class   school
 Min. : 1.0  Min.   :30.20  Min.   :22.30  Min.   :18.48  Min.   : …   a:300   I–VI:
 Median : …  Median :60.15  Median :39.98  Median :35.05  Median : …   b:300   200 each
 Mean   : …  Mean   :60.27  Mean   :40.06  Mean   :35.07  Mean   : …   c:300
 Max.   : …  Max.   :90.83  Max.   :57.87  Max.   :58.44  Max.   : …   d:300
 (quartiles and remaining values omitted)

30 Comparing lm, glm, lme4, lcmm
> head(lmm.data)
id extro open agree social class school
(numeric columns omitted; rows 1–6 have class/school d IV, a VI, d VI, c IV, d IV, d I)
> nrow(lmm.data)
[1] 1200

31 Comparing lm, glm, lme4, lcmm
lm.1 <- lm(extro ~ open + social, data = lmm.data)
summary(lm.1)
Call: lm(formula = extro ~ open + social, data = lmm.data)
Coefficients: only the intercept is significant (<2e-16 ***); open and social are not.
Residual standard error on 1197 degrees of freedom; F-statistic on 2 and 1197 DF, p-value: 0.828 (numeric estimates omitted)

32 And then lm.2 <- lm(extro ~ open + agree + social, data = lmm.data)
summary(lm.2)
Call: lm(formula = extro ~ open + agree + social, data = lmm.data)
Coefficients: again only the intercept is significant (<2e-16 ***); open, agree and social are not.
Residual standard error on 1196 degrees of freedom (remaining numeric output omitted)

33 anova(lm.1, lm.2)
Analysis of Variance Table
Model 1: extro ~ open + social
Model 2: extro ~ open + agree + social
Res.Df RSS Df Sum of Sq F Pr(>F) (values omitted)

34 Nesting, etc.
lm.3 <- lm(extro ~ open + social + class + school, data = lmm.data)
summary(lm.3)
Call: lm(formula = extro ~ open + social + class + school, data = lmm.data)
Coefficients: the intercept, all class dummies (classb–classd) and all school dummies (schoolII–schoolVI) are significant (<2e-16 ***); open and social are not.
Residual standard error on 1189 degrees of freedom; F-statistic on 10 and 1189 DF, p-value: < 2.2e-16 (numeric estimates omitted)

35 Nesting, etc.
lm.4 <- lm(extro ~ open + agree + social + class + school, data = lmm.data)
summary(lm.4)
Call: lm(formula = extro ~ open + agree + social + class + school, data = lmm.data)
Coefficients: as in lm.3, the intercept and all class and school dummies are significant (<2e-16 ***); open, agree and social are not.
Residual standard error on 1188 degrees of freedom; F-statistic on 11 and 1188 DF, p-value: < 2.2e-16 (numeric estimates omitted)

36 Analyze the variances
anova(lm.3, lm.4)
Analysis of Variance Table
Model 1: extro ~ open + social + class + school
Model 2: extro ~ open + agree + social + class + school
Res.Df RSS Df Sum of Sq F Pr(>F) (values omitted)

37 Specific interaction term
# 'class:school' - a different situation than one with random effects
# (e.g., nested variables).
lm.5 <- lm(extro ~ open + social + class:school, data = lmm.data)
summary(lm.5)

38 Summary
Call: lm(formula = extro ~ open + social + class:school, data = lmm.data)
Coefficients: (1 not defined because of singularities)
The intercept (estimate 8.008e…) and every class:school interaction term shown (classa:schoolI through classb:schoolII) are significant (<2e-16 ***); open (6.019e…) and social (5.239e…) are not. (standard errors, exponents and t values omitted)

39 Summary (ctd.) The remaining class:school interaction terms, classc:schoolII through classb:schoolVI, are all significant (<2e-16 ***). (numeric estimates omitted)

40 Summary (ctd.) classc:schoolVI is significant (<2e-16 ***); classd:schoolVI is NA.
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error on 1174 degrees of freedom; F-statistic: 4264 on 25 and 1174 DF, p-value: < 2.2e-16 (R-squared values omitted).
The output of both models shows 'NA' where an interaction term is redundant with one listed somewhere above it (there are 4 classes and 6 schools).

41 Specific interaction term
lm.6 <- lm(extro ~ open + agree + social + class:school, data = lmm.data)
summary(lm.6)  # some output omitted…
Coefficients: (1 not defined because of singularities)
The intercept is significant (<2e-16 ***); open, agree and social are not; classd:schoolVI is again NA.
Residual standard error on 1173 degrees of freedom; F-statistic on 26 and 1173 DF, p-value: < 2.2e-16 (numeric estimates omitted)

42 Compare interaction terms
anova(lm.5, lm.6)
Analysis of Variance Table
Model 1: extro ~ open + social + class:school
Model 2: extro ~ open + agree + social + class:school
Res.Df RSS Df Sum of Sq F Pr(>F) (values omitted)

43 Structure in glm Even the more flexible Generalized Linear Model (glm) function cannot handle nested effects, although it can handle some types of random effects (e.g., repeated measures designs/data, not covered here). The primary benefit of the glm function is the ability to specify non-normal distributions. Output from the glm function offers the Akaike Information Criterion (AIC), which can be used to compare models and is much preferred over R-squared or even adjusted R-squared: a lower AIC indicates a better-fitting model.

44 glm? The glm function offers the Akaike Information Criterion (AIC) – so…
glm.1 <- glm(extro ~ open + social + class + school, data = lmm.data)
summary(glm.1)
Call: glm(formula = extro ~ open + social + class + school, data = lmm.data)
Deviance Residuals: Min 1Q Median 3Q Max (values omitted)

45 glm? Coefficients: the intercept, all class dummies (classb–classd) and all school dummies (schoolII–schoolVI) are significant (<2e-16 ***); open and social are not (estimates and standard errors omitted).

46 glm?
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family: value omitted)
Null deviance on 1199 degrees of freedom; residual deviance on 1189 degrees of freedom
AIC: 4647.5 (see slide 48)
Number of Fisher Scoring iterations: 2

47 glm.2, glm.3
> glm.2 <- glm(extro ~ open + social + class:school, data = lmm.data)
> glm.3 <- glm(extro ~ open + agree + social + class:school, data = lmm.data)

48 Compare… glm.1 - AIC: 4647.5; glm.2 - AIC: 3395.5; glm.3 - AIC: 3395.6
Conclusion?

49 However… In order to adequately test these nested (random) effects, we must turn to another type of modeling function/package. > library(lme4)

50 However… The Linear Mixed Effects (lme4) package is designed to fit linear, generalized linear, or nonlinear mixed models. Example, following lm and glm: fit linear mixed effects models with fixed effects for open and social (or open, agree, and social), plus random/nested effects for class within school, to predict scores on the outcome variable, extroversion (extro).

51 BIC v. AIC Note in the output that we can use the Bayesian Information Criterion (BIC) to compare models; it is similar to, but more conservative than (and thus preferred over), the AIC mentioned previously. Like AIC, lower BIC reflects better model fit. The lmer function uses REstricted Maximum Likelihood (REML) to estimate the variance components, which is preferred over standard Maximum Likelihood (also available as an option).

52 Random effects 1
Note below: class is nested within school, i.e., class is 'under' school. Random effects are specified inside parentheses and can be repeated measures, interaction terms, or nested (as is the case here). Simple interactions just use the colon separator: (1|school:class)
lmm.1 <- lmer(extro ~ open + social + class + (1|school/class), data = lmm.data)
summary(lmm.1)

53 summary(lmm.1)
Linear mixed model fit by REML ['lmerMod']
Formula: extro ~ open + social + class + (1 | school/class)
Data: lmm.data
REML criterion at convergence: (value omitted)
Scaled residuals: Min 1Q Median 3Q Max (values omitted)
Random effects:
Groups Name Variance Std.Dev.
class:school (Intercept), school (Intercept), Residual (values omitted)
Number of obs: 1200; groups: class:school, 24; school, 6

54 summary(lmm.1), ctd.
Fixed effects: Estimate Std. Error t value
(Intercept) 5.712e+01, open 6.053e…, social 5.085e…, classb 2.047e…, classc 3.698e…, classd 5.656e… (standard errors, most exponents and t values omitted)
Correlation of Fixed Effects among (Intr), open, social, classb, classc, classd (values omitted)

55 Random effects 2
lmm.2 <- lmer(extro ~ open + agree + social + class + (1|school/class), data = lmm.data)
summary(lmm.2)
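The two mixed models can then be compared by information criteria (a sketch; note that anova() refits the REML fits with ML for the likelihood-ratio test):

AIC(lmm.1, lmm.2)
BIC(lmm.1, lmm.2)   # BIC is the more conservative criterion (see slide 51)
anova(lmm.1, lmm.2)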

56 summary(lmm.2)
Linear mixed model fit by REML ['lmerMod']
Formula: extro ~ open + agree + social + class + (1 | school/class)
Data: lmm.data
REML criterion at convergence: (value omitted)
Scaled residuals and Random effects (class:school, school, Residual): values omitted
Number of obs: 1200; groups: class:school, 24; school, 6

57 summary(lmm.2), ctd.
Fixed effects: Estimate Std. Error t value for (Intercept), open, agree, social, classb, classc, classd (values omitted)
Correlation of Fixed Effects among (Intr), open, agree, social, classb, classc (values omitted)

 
58 Extract
# To extract the estimates of the fixed effects parameters.
fixef(lmm.2)
(Intercept) open agree social classb classc classd (values omitted)

59 Extract # To extract the estimates of the random effects parameters.
ranef(lmm.2)
$`class:school` – an (Intercept) deviation for each of the 24 class:school combinations, a:I through d:VI (values omitted)
$school – an (Intercept) deviation for each school, I through VI (values omitted)

60 Random effects 2
# To extract the coefficients for each group of the random effect factors
# (class:school = 24 groups; school = 6 groups)
coef(lmm.2)
$`class:school`
(Intercept) open agree social classb classc classd – rows a:I through a:V (values omitted)

61 Random effects 2, ctd. – rows a:VI through c:III (values omitted)

62 Random effects 2, ctd. – rows c:IV through d:VI (values omitted)

63 Random effects 2, ctd.
$school
(Intercept) open agree social classb classc classd – rows I through VI (values omitted)
attr(,"class")
[1] "coef.mer"

64 Random effects 2
coef(lmm.2)$`class:school`  # ….
(Intercept) open agree social classb classc classd – 24 rows, a:I through d:VI (values omitted)

65 Prediction
# To extract the predicted values (based on the fitted model).
yhat <- fitted(lmm.2)
summary(yhat)
Min. 1st Qu. Median Mean 3rd Qu. Max. (values omitted)

66 Prediction
# To extract the residuals (errors), summarize, and plot them.
residuals <- resid(lmm.2)
summary(residuals)
Min. 1st Qu. Median Mean 3rd Qu. Max. (values omitted)

67 Plot residuals hist(residuals)

68 Reading, etc. http://data-informed.com/focus-predictive-analytics/
Lab this week. NB: not covering logistic regression since most students know it – if not:

