Data Mining Packages in R: logistic regression and SVM Jiang Du March 2008.

Logistic Regression
lrm in package "Design"
– http://biostat.mc.vanderbilt.edu/s/Design/html/lrm.html
glm in package "stats"
– http://finzi.psych.upenn.edu/R/library/stats/html/glm.html
…

Logistic Regression: lrm
Usage
lrm(formula, data, subset, na.action=na.delete, method="lrm.fit", model=FALSE, x=FALSE, y=FALSE, linear.predictors=TRUE, se.fit=FALSE, penalty=0, penalty.matrix, tol=1e-7, strata.penalty=0, var.penalty=c('simple','sandwich'), weights, normwt, ...)
Arguments
– formula: a formula object. An offset term can be included. The offset causes fitting of a model such as logit(Y=1) = Xβ + W, where W is the offset variable having no estimated coefficient. The response variable can be any data type; lrm converts it in alphabetic or numeric order to an S factor variable and recodes it 0, 1, 2, ... internally.
– data: data frame to use. Default is the current frame.
Usage
## S3 method for class 'lrm':
predict(object, ..., type=c("lp", "fitted", "fitted.ind", "mean", "x", "data.frame", "terms", "adjto", "adjto.data.frame", "model.frame"), se.fit=FALSE, codes=FALSE)
Arguments
– object: an object created by lrm.
– ...: arguments passed to predict.Design, such as kint and newdata (newdata is used when predicting on data other than the fitting data). See predict.Design for how NAs are handled.
– type: …
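The recoding and offset rules in the lrm argument description can be illustrated with base R alone. This sketch is not lrm's internal code: it uses factor() to show the alphabetic/numeric ordering that yields codes 0, 1, 2, ..., and base R's glm() with offset() to show an offset term W entering the linear predictor with no estimated coefficient. The variables y, x, w, yb and the simulated data are hypothetical.

```r
# Illustration only (base R, not lrm internals): level ordering and 0-based codes
y <- c("yes", "no", "maybe", "no", "yes")
f <- factor(y)                 # levels are sorted alphabetically
levels(f)                      # "maybe" "no" "yes"
codes <- as.integer(f) - 1     # recode to 0, 1, 2, ...
codes                          # 2 1 0 1 2

levels(factor(c(10, 2, 1)))    # numeric values sort numerically: "1" "2" "10"

# Offset: logit(Pr(yb = 1)) = b0 + b1 * x + w, where w has no coefficient
set.seed(1)
n  <- 200
x  <- rnorm(n)
w  <- rnorm(n)                                 # hypothetical offset variable
yb <- rbinom(n, 1, plogis(0.5 + 1.2 * x + w))  # simulated binary response
fit <- glm(yb ~ x + offset(w), family = binomial)

names(coef(fit))               # "(Intercept)" "x": nothing is estimated for w
# The fitted linear predictor is X %*% beta plus the offset itself
lp_check <- drop(cbind(1, x) %*% coef(fit)) + w
all.equal(unname(fit$linear.predictors), unname(lp_check))  # TRUE
```

The offset is simply added to the linear predictor with its coefficient fixed at 1, which is what "no estimated coefficient" means in the lrm description.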

Logistic Regression: lrm
Fitting training data
– model = lrm(Class ~ X + Y + Z, data = train)
Prediction on new data
– To get logit(Y=1): predict(model, newdata = test, type = "lp")
– To get Pr(Y=1): predict(model, newdata = test, type = "fitted.ind")
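A minimal end-to-end sketch of the lrm fit-then-predict steps. It assumes the rms package, the maintained successor of Design, which still provides lrm() and the "lp" and "fitted.ind" prediction types; the simulated data frame and the train/test split are hypothetical stand-ins for the slide's train and test. For a binary outcome the two prediction types are linked by the logistic function, Pr(Y=1) = 1 / (1 + exp(-logit(Y=1))), which the last line checks.

```r
# Assumes the rms package (successor to Design) is installed
library(rms)

set.seed(42)
n <- 300
# Hypothetical simulated data: predictors X, Y, Z and a 0/1 response Class
sim <- data.frame(X = rnorm(n), Y = rnorm(n), Z = rnorm(n))
sim$Class <- rbinom(n, 1, plogis(-0.3 + 1.0 * sim$X - 0.8 * sim$Y + 0.5 * sim$Z))

train <- sim[1:200, ]
test  <- sim[201:300, ]

model <- lrm(Class ~ X + Y + Z, data = train)

lp   <- predict(model, newdata = test, type = "lp")          # logit(Y=1)
prob <- predict(model, newdata = test, type = "fitted.ind")  # Pr(Y=1)

# For a binary outcome, the probability is the inverse logit of the linear predictor
all.equal(unname(prob), unname(plogis(lp)))
```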

?formula
The models fit by, e.g., the lm and glm functions are specified in a compact symbolic form. The ~ operator is basic in the formation of such models. An expression of the form y ~ model is interpreted as a specification that the response y is modelled by a linear predictor specified symbolically by model. Such a model consists of a series of terms separated by + operators. The terms themselves consist of variable and factor names separated by : operators. Such a term is interpreted as the interaction of all the variables and factors appearing in the term. In addition to + and :, a number of other operators are useful in model formulae. The * operator denotes factor crossing: a*b is interpreted as a+b+a:b. The ^ operator indicates crossing to the specified degree. For example, (a+b+c)^2 is identical to (a+b+c)*(a+b+c), which in turn expands to a formula containing the main effects for a, b and c together with their second-order interactions. The %in% operator indicates that the terms on its left are nested within those on the right. For example, a + b %in% a expands to the formula a + a:b. The - operator removes the specified terms, so that (a+b+c)^2 - a:b is identical to a + b + c + b:c + a:c. It can also be used to remove the intercept term: y ~ x - 1 is a line through the origin. A model with no intercept can also be specified as y ~ x + 0 or y ~ 0 + x.
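The formula expansion rules can be checked directly with base R's terms(), whose "term.labels" attribute lists the terms a formula expands to and whose "intercept" attribute is 0 once the intercept has been removed. The helper name expand() is hypothetical; the variables a, b, c, x, y need not exist, because terms() only parses the formula.

```r
# Hypothetical helper: list the terms a formula expands to
expand <- function(f) attr(terms(f), "term.labels")

expand(y ~ a * b)                 # "a" "b" "a:b"
expand(y ~ (a + b + c)^2)         # "a" "b" "c" "a:b" "a:c" "b:c"
expand(y ~ a + b %in% a)          # "a" "a:b"
expand(y ~ (a + b + c)^2 - a:b)   # "a" "b" "c" "a:c" "b:c"

# Removing the intercept: the "intercept" attribute drops from 1 to 0
attr(terms(y ~ x), "intercept")      # 1
attr(terms(y ~ x - 1), "intercept")  # 0
attr(terms(y ~ x + 0), "intercept")  # 0
attr(terms(y ~ 0 + x), "intercept")  # 0
```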

Logistic Regression: glm
Fitting training data
– model = glm(Class ~ X + Y + Z, data = train, family = binomial(logit))
Prediction on new data
– To get logit(Y=1): predict(model, newdata = test)
– To get Pr(Y=1): predict(model, newdata = test, type = "response")
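A base-R sketch of the glm steps, using a hypothetical simulated data set in place of train and test. predict() on a glm returns the link scale (here the logit) by default, and type = "response" applies the inverse link, so the two outputs differ only by plogis(). The 0.5 cutoff used to turn probabilities into class labels is an illustrative choice, not part of the slide.

```r
set.seed(1)
n <- 300
# Hypothetical data: three predictors and a 0/1 response
sim <- data.frame(X = rnorm(n), Y = rnorm(n), Z = rnorm(n))
sim$Class <- rbinom(n, 1, plogis(-0.3 + 1.0 * sim$X - 0.8 * sim$Y + 0.5 * sim$Z))
train <- sim[1:200, ]
test  <- sim[201:300, ]

model <- glm(Class ~ X + Y + Z, data = train, family = binomial(logit))

lp   <- predict(model, newdata = test)                     # logit(Y=1), default type = "link"
prob <- predict(model, newdata = test, type = "response")  # Pr(Y=1)

# The response scale is the inverse logit of the link scale
all.equal(unname(prob), unname(plogis(lp)))  # TRUE

# Illustrative 0.5 cutoff to get class labels, then test-set accuracy
pred_class <- ifelse(prob > 0.5, 1, 0)
mean(pred_class == test$Class)
```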

SVM
svm in package "e1071"
– http://www.potschi.de/svmtut/svmtut.html
ksvm in package "kernlab"
– http://rss.acs.unt.edu/Rdoc/library/kernlab/html/ksvm.html

SVM: svm
kernel – the kernel used in training and predicting. You might consider changing some of the following parameters, depending on the kernel type.
– linear: u'*v
– polynomial: (gamma*u'*v + coef0)^degree
– radial basis: exp(-gamma*|u-v|^2)
– sigmoid: tanh(gamma*u'*v + coef0)
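To make the four kernel formulas concrete, here they are as plain R functions of two numeric vectors u and v, plus a small Gram-matrix helper. The function names are hypothetical, and this is not how e1071 computes kernels (it calls the compiled LIBSVM library); gamma, coef0 and degree are the same tuning parameters named in the list.

```r
# Hypothetical hand-written versions of the four kernels
k_linear     <- function(u, v) sum(u * v)
k_polynomial <- function(u, v, gamma, coef0, degree) (gamma * sum(u * v) + coef0)^degree
k_radial     <- function(u, v, gamma) exp(-gamma * sum((u - v)^2))
k_sigmoid    <- function(u, v, gamma, coef0) tanh(gamma * sum(u * v) + coef0)

u <- c(1, 2)
v <- c(3, -1)
k_linear(u, v)                                          # 1*3 + 2*(-1) = 1
k_polynomial(u, v, gamma = 0.5, coef0 = 1, degree = 3)  # (0.5*1 + 1)^3 = 3.375
k_radial(u, v, gamma = 0.1)                             # exp(-0.1 * 13) = exp(-1.3)
k_sigmoid(u, v, gamma = 0.5, coef0 = 0)                 # tanh(0.5)

# Gram matrix: K[i, j] = kernel(row i, row j) for the rows of a matrix
gram <- function(X, kernel, ...) {
  n <- nrow(X)
  K <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      K[i, j] <- kernel(X[i, ], X[j, ], ...)
    }
  }
  K
}

Xm <- rbind(u, v, c(0, 0))
K  <- gram(Xm, k_radial, gamma = 0.1)
isSymmetric(K)   # TRUE
diag(K)          # 1 1 1: each point is at distance 0 from itself
```

With the radial kernel, larger gamma makes K[i, j] fall off faster with distance, which is why gamma is the main parameter to tune for that kernel.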

SVM: svm
Training
– model = svm(Class ~ X + Y + Z, data = train, type = "C", kernel = "linear")
Prediction
– predict(model, newdata = test)
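A runnable version of the svm training and prediction steps, using the built-in iris data in place of the hypothetical Class ~ X + Y + Z. It assumes e1071 is installed, and the random 100/50 split is illustrative. The full type name "C-classification" is spelled out here; the shorter "C" is assumed to be accepted as an abbreviation of it. Because Species is a factor, predict() returns a factor of class labels.

```r
# Assumes the e1071 package is installed
library(e1071)

set.seed(123)
idx   <- sample(nrow(iris), 100)   # illustrative random split
train <- iris[idx, ]
test  <- iris[-idx, ]

# Linear-kernel C-classification with the factor Species as the response
model <- svm(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
             data = train, type = "C-classification", kernel = "linear")

pred <- predict(model, newdata = test)   # factor of predicted species
table(predicted = pred, actual = test$Species)
mean(pred == test$Species)               # test-set accuracy
```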
