Data Mining Packages in R: logistic regression and SVM
Jiang Du
March 2008


Logistic Regression
lrm in package "Design"
–http://biostat.mc.vanderbilt.edu/s/Design/html/lrm.html
glm in package "stats"
–http://finzi.psych.upenn.edu/R/library/stats/html/glm.html
…

Logistic Regression: lrm
Usage
  lrm(formula, data, subset, na.action=na.delete, method="lrm.fit",
      model=FALSE, x=FALSE, y=FALSE, linear.predictors=TRUE, se.fit=FALSE,
      penalty=0, penalty.matrix, tol=1e-7, strata.penalty=0,
      var.penalty=c('simple','sandwich'), weights, normwt, ...)
Arguments
formula
–a formula object. An offset term can be included. The offset causes fitting of a model such as logit(Y=1) = Xβ + W, where W is the offset variable having no estimated coefficient. The response variable can be any data type; lrm converts it in alphabetic or numeric order to an S factor variable and recodes it 0,1,2,... internally.
data
–data frame to use. Default is the current frame.
Usage
  ## S3 method for class 'lrm':
  predict(object, ..., type=c("lp", "fitted", "fitted.ind", "mean", "x",
      "data.frame", "terms", "adjto", "adjto.data.frame", "model.frame"),
      se.fit=FALSE, codes=FALSE)
Arguments
object
–an object created by lrm
...
–arguments passed to predict.Design, such as kint and newdata (which is used if you are predicting out of data). See predict.Design to see how NAs are handled.
type
–…

Logistic Regression: lrm
Fitting training data
–model = lrm(Class ~ X + Y + Z, data=train)
Prediction on new data
–To get logit(Y=1): predict(model, newdata = test, type = "lp")
–To get Pr(Y=1): predict(model, newdata = test, type = "fitted.ind")
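
To make these calls concrete, here is a minimal runnable sketch on simulated data (the names Class, X, Y, Z mirror the slide and are purely illustrative; the Design package has since been superseded by rms, which provides lrm with the same interface):

  library(rms)  # successor to Design; exports lrm and predict.lrm

  set.seed(1)
  n     <- 200
  train <- data.frame(X = rnorm(n), Y = rnorm(n), Z = rnorm(n))
  train$Class <- rbinom(n, 1, plogis(0.8 * train$X - 0.5 * train$Z))
  test  <- train[1:20, ]

  fit <- lrm(Class ~ X + Y + Z, data = train)

  lp <- predict(fit, newdata = test, type = "lp")  # logit(Y=1), the linear predictor
  pr <- plogis(lp)                                 # Pr(Y=1) via the inverse logit
  head(cbind(logit = lp, prob = pr))

Since plogis is the inverse logit, pr should agree with the probabilities returned by the fitted-value prediction types on a binary response.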

?formula
The models fit by, e.g., the lm and glm functions are specified in a compact symbolic form. The ~ operator is basic in the formation of such models. An expression of the form y ~ model is interpreted as a specification that the response y is modelled by a linear predictor specified symbolically by model. Such a model consists of a series of terms separated by + operators. The terms themselves consist of variable and factor names separated by : operators. Such a term is interpreted as the interaction of all the variables and factors appearing in the term.
In addition to + and :, a number of other operators are useful in model formulae.
–The * operator denotes factor crossing: a*b is interpreted as a+b+a:b.
–The ^ operator indicates crossing to the specified degree. For example, (a+b+c)^2 is identical to (a+b+c)*(a+b+c), which in turn expands to a formula containing the main effects for a, b and c together with their second-order interactions.
–The %in% operator indicates that the terms on its left are nested within those on the right. For example, a + b %in% a expands to the formula a + a:b.
–The - operator removes the specified terms, so that (a+b+c)^2 - a:b is identical to a + b + c + b:c + a:c. It can also be used to remove the intercept term: y ~ x - 1 is a line through the origin. A model with no intercept can also be specified as y ~ x + 0 or y ~ 0 + x.
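
These expansions are easy to check directly in R, since terms() works on the symbolic formula without needing any data (a, b, c, x, y below are placeholder names):

  f1 <- y ~ a * b                   # crossing: expands to a + b + a:b
  attr(terms(f1), "term.labels")    # "a" "b" "a:b"

  f2 <- y ~ (a + b + c)^2 - a:b     # second-order crossing, minus one interaction
  attr(terms(f2), "term.labels")    # "a" "b" "c" "a:c" "b:c"

  f3 <- y ~ x - 1                   # regression through the origin
  attr(terms(f3), "intercept")      # 0: no intercept term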

Logistic Regression: glm
Fitting training data
–model = glm(Class ~ X + Y + Z, data=train, family=binomial(logit))
Prediction on new data
–To get logit(Y=1): predict(model, newdata = test)
–To get Pr(Y=1): predict(model, newdata = test, type = "response")
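
The same simulated setup as above works unchanged with base R's glm (names are illustrative):

  set.seed(1)
  n     <- 200
  train <- data.frame(X = rnorm(n), Y = rnorm(n), Z = rnorm(n))
  train$Class <- rbinom(n, 1, plogis(0.8 * train$X - 0.5 * train$Z))
  test  <- train[1:20, ]

  model <- glm(Class ~ X + Y + Z, data = train, family = binomial(logit))

  lp <- predict(model, newdata = test)                     # default type = "link": the logit
  pr <- predict(model, newdata = test, type = "response")  # Pr(Y=1)
  head(cbind(logit = lp, prob = pr))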

SVM
svm in package "e1071"
ksvm in package "kernlab"
–http://rss.acs.unt.edu/Rdoc/library/kernlab/html/ksvm.html

SVM: svm
Kernel
–the kernel used in training and predicting. You might consider changing some of the following parameters, depending on the kernel type.
–linear: u'*v
–polynomial: (gamma*u'*v + coef0)^degree
–radial basis: exp(-gamma*|u-v|^2)
–sigmoid: tanh(gamma*u'*v + coef0)
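
These kernel parameters map one-to-one onto arguments of svm(). A small sketch on the built-in iris data, with purely illustrative parameter values:

  library(e1071)

  # polynomial kernel: (gamma*u'*v + coef0)^degree
  m_poly <- svm(Species ~ ., data = iris, kernel = "polynomial",
                gamma = 0.5, coef0 = 1, degree = 3)

  # radial basis kernel: exp(-gamma*|u-v|^2)
  m_rbf  <- svm(Species ~ ., data = iris, kernel = "radial", gamma = 0.5)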

SVM: svm
Training
–model = svm(Class ~ X + Y + Z, data=train, type = "C", kernel = "linear")
Prediction
–predict(model, newdata = test)
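
A minimal end-to-end sketch on simulated data (names mirror the slide and are illustrative). The slide's type = "C" is written out as "C-classification" here; e1071 partially matches the abbreviated form to the same value:

  library(e1071)

  set.seed(1)
  n     <- 200
  train <- data.frame(X = rnorm(n), Y = rnorm(n), Z = rnorm(n))
  train$Class <- factor(ifelse(0.8 * train$X - 0.5 * train$Z + rnorm(n, sd = 0.5) > 0,
                               "pos", "neg"))
  test  <- train[1:20, ]

  model <- svm(Class ~ X + Y + Z, data = train,
               type = "C-classification", kernel = "linear")

  pred <- predict(model, newdata = test)
  table(predicted = pred, actual = test$Class)  # confusion matrix on the held-out rows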