MATH 3359 Introduction to Mathematical Modeling Project Multiple Linear Regression Multiple Logistic Regression
Project
Dataset: any field you are interested in, with a large sample size
Methods: simple/multiple linear regression, simple/multiple logistic regression
Due on April 23rd
Outline
Multiple Linear Regression
  Introduction
  Make scatter plots of the data
  Fit multiple linear regression model
  Prediction
Multiple Logistic Regression
  Introduction
  Fit multiple logistic regression model
Exercise
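Recall: Simple Linear Regression
Given a data set $\{y_i, x_i,\ i = 1, \dots, n\}$ of $n$ observations, where $y_i$ is the dependent variable and $x_i$ is the independent variable, the linear regression model is
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \dots, n,$
or
$E(y_i) = \beta_0 + \beta_1 x_i,$
where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon_i$ is a random error term with mean zero.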
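Multiple Linear Regression
Given a data set $\{y_i, x_{i1}, \dots, x_{ik},\ i = 1, \dots, n\}$ of $n$ observations, where $y_i$ is the dependent variable and $x_{i1}, \dots, x_{ik}$ are the independent variables, the linear regression model is
$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \varepsilon_i, \quad i = 1, \dots, n.$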
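Generally, we can transform the $x_i$'s before plugging them into the model, and they need not be independent of each other.
1. Transformations: a function of a predictor, such as its logarithm, is used in place of the predictor.
2. Dependent case: one predictor may be a function of another, e.g. $x_2 = x_1^2$.
3. Cross-product terms: the product of two predictors, e.g. $x_1 x_2$, enters the model as an additional term.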
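The three cases listed above can be written directly in an R model formula. The following is a minimal sketch on synthetic data; the variable names `x1`, `x2`, `y` and every coefficient value are made up for illustration and do not come from the slides.

```r
# Synthetic data for illustration only (not from the slides).
set.seed(1)
n  <- 200
x1 <- runif(n, 1, 10)
x2 <- runif(n, 1, 10)
y  <- 2 + 3 * log(x1) + 1 * x2 + 0.5 * x2^2 + 0.8 * x1 * x2 + rnorm(n, sd = 0.5)
dat <- data.frame(y, x1, x2)

# 1. Transformation:      log(x1) enters the model in place of x1.
# 2. Dependent case:      x2 and x2^2 both enter; I() protects ^ inside a formula.
# 3. Cross-product term:  x1:x2 adds the product x1 * x2 as a predictor.
fit <- lm(y ~ log(x1) + x2 + I(x2^2) + x1:x2, data = dat)

# Estimates, in order: intercept, log(x1), x2, x2^2, cross-product.
est <- unname(coef(fit))
print(round(est, 2))
```

The model is still linear in the coefficients, which is why `lm` can fit it even though it is not linear in `x1` and `x2`.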
Example
The data include the selling price at auction of 32 antique grandfather clocks. The age of each clock and the number of people who made a bid are also recorded in this dataset.
Variables: Age, Bidders, Price
Recall: Scatter Plots (function ‘plot’)
plot(auction$Age, auction$Price, main = 'Relationship between Price and Age')
plot(auction$Bidders, auction$Price, main = 'Relationship between Price and Number of bidders')
plot(auction)  # scatter plot matrix of all pairs of variables
Fit Multiple Linear Regression Model (function ‘lm’ in R)
reg = lm(formula, data)
summary(reg)
In our example,
reg = lm(Price ~ Age + Bidders, data = auction)
> summary(reg)
Call:
lm(formula = Price ~ Age + Bidders, data = auction)
Residuals: Min 1Q Median 3Q Max
Coefficients:
            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                                e-08 ***
Age                                        e-14 ***
Bidders                                    e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Hence, the function of best fit is
Price = $\hat\beta_0 + \hat\beta_1$ * Age + $\hat\beta_2$ * Bidders,
where the $\hat\beta$'s are the values in the Estimate column (the intercept estimate is negative).
Prediction (function ‘predict’ in R)
Predict the average price of a clock with Age=150, Bidders=10:
predict(reg, data.frame(Age = 150, Bidders = 10))
Predict the average price for two clocks, one with Age=150, Bidders=10 and one with Age=160, Bidders=5:
predict(reg, data.frame(Age = c(150, 160), Bidders = c(10, 5)))
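To see what `predict` computes, the sketch below compares it with plugging the new values into the fitted equation by hand. The auction data are not reproduced here, so the `auction` data frame in this block is a synthetic stand-in whose values and generating coefficients are arbitrary assumptions.

```r
# Synthetic stand-in for the auction data (all values are made up).
set.seed(42)
auction <- data.frame(
  Age     = round(runif(32, 100, 200)),
  Bidders = sample(5:15, 32, replace = TRUE)
)
auction$Price <- -1300 + 12 * auction$Age + 85 * auction$Bidders +
  rnorm(32, sd = 100)

reg <- lm(Price ~ Age + Bidders, data = auction)
b   <- coef(reg)

new_clock  <- data.frame(Age = 150, Bidders = 10)
by_predict <- predict(reg, new_clock)
by_hand    <- b["(Intercept)"] + b["Age"] * 150 + b["Bidders"] * 10

# predict() is the fitted equation evaluated at the new values.
print(all.equal(unname(by_predict), unname(by_hand)))

# A confidence interval for the mean price at these values.
ci <- predict(reg, new_clock, interval = "confidence")
print(ci)
```

The `interval = "confidence"` argument gives an interval for the average price; `interval = "prediction"` would give the wider interval for a single new clock.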
Exercise
1. Download data: ‘Mass and Physical Measurements for Male Subjects’
2. Import the txt file in R
3. Use ‘Mass’ as the response and ‘Fore’, ‘Waist’, ‘Height’ and ‘Thigh’ as independent variables
4. Make a scatter plot of the response against each of the independent variables
5. Fit the multiple linear regression
6. Predict ‘Mass’ with Fore=30, Waist=180, Height=38 and Thigh=58, and with Fore=29, Waist=179, Height=39 and Thigh=57
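Recall: Simple Logistic Regression
Let $p = P(Y = 1)$ be the probability of the event of interest.
Odds: $\dfrac{p}{1-p}$
Log-odds: $\log\left(\dfrac{p}{1-p}\right)$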
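Recall: Simple Logistic Regression
Logistic regression models the log-odds as a linear function of the independent variable:
$\log\left(\dfrac{p}{1-p}\right) = \beta_0 + \beta_1 X$, where $p = P(Y = 1 \mid X)$.
Equivalently, $p = \dfrac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$, which is not a linear function of $X$.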
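The relationship between probability, odds and log-odds can be checked numerically. This is a small sketch using base R's `plogis` (inverse logit) and `qlogis` (logit); the coefficients `b0` and `b1` are hypothetical values chosen only for illustration.

```r
# Hypothetical coefficients, chosen only for illustration.
b0 <- -4
b1 <- 0.8
x  <- 0:10

log_odds <- b0 + b1 * x        # linear in x
p        <- plogis(log_odds)   # exp(l) / (1 + exp(l)), S-shaped in x
odds     <- p / (1 - p)

# Going back from p recovers the linear predictor.
print(all.equal(log(odds), log_odds))
print(all.equal(qlogis(p), log_odds))

# Equal steps in x give equal steps in log-odds but unequal steps in p.
print(diff(log_odds))
print(round(diff(p), 3))
```

Plotting `p` against `x` with `plot(x, p, type = 'b')` shows the S-shaped curve, while `plot(x, log_odds, type = 'b')` is a straight line.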
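Multiple Logistic Regression
With independent variables $X_1, \dots, X_k$ and $p = P(Y = 1 \mid X_1, \dots, X_k)$, the model is
$\log\left(\dfrac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k.$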
Example: the mtcars dataset in R
am: transmission, 0: auto, 1: manual
hp: gross horsepower
wt: weight (lb/1000)
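Multiple Logistic Regression (function ‘glm’ in R)
logreg = glm(formula, family = 'binomial', data = binary)
glm: generalized linear model
family: distribution of the response; 'binomial' is used for a 0/1 response and gives the logit link by default
data: name of the dataset
In the example,
logreg = glm(am ~ hp + wt, family = 'binomial', data = mtcars)
> summary(logreg)
Call:
glm(formula = am ~ hp + wt, family = "binomial", data = mtcars)
Coefficients:
            Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)
hp
wt
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Final Model:
$\log\left(\dfrac{\hat p}{1-\hat p}\right) = \hat\beta_0 + \hat\beta_{hp} \cdot \text{hp} + \hat\beta_{wt} \cdot \text{wt}$, where $\hat p$ is the estimated probability that am = 1 (manual) and the $\hat\beta$'s are the values in the Estimate column.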
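For every one-unit increase in hp (holding wt fixed), the log-odds of manual (versus auto) increase by $\hat\beta_{hp}$; equivalently, the odds of manual (versus auto) are multiplied by $\exp(\hat\beta_{hp}) > 1$.
For every one-unit increase in wt (holding hp fixed), the log-odds of manual (versus auto) decrease by $|\hat\beta_{wt}|$; equivalently, the odds of manual (versus auto) are multiplied by $\exp(\hat\beta_{wt}) < 1$.
Here $\hat\beta_{hp}$ and $\hat\beta_{wt}$ are the estimated coefficients of hp and wt.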
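A minimal sketch of the logistic fit for this example, using `glm` with `family = binomial` on the `mtcars` data that ships with base R. The car used for the predicted probability (hp = 120, wt = 2.8) is a hypothetical choice for illustration.

```r
# mtcars is part of base R (datasets package), so no import is needed.
logreg <- glm(am ~ hp + wt, family = binomial, data = mtcars)
print(summary(logreg))

b <- coef(logreg)

# exp(coefficient) is the factor by which the odds of a manual
# transmission are multiplied for a one-unit increase in that variable.
odds_ratio <- exp(b)
print(odds_ratio)

# Estimated probability of manual for a hypothetical car with
# hp = 120 and wt = 2.8 (2800 lb); type = "response" returns p, not log-odds.
p_hat <- predict(logreg, data.frame(hp = 120, wt = 2.8), type = "response")
print(p_hat)
```

Without `type = "response"`, `predict` returns the log-odds, which `plogis` converts to a probability.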
Exercise
1. Import data from web:
2. Fit the logistic regression of admit (as response) on gre, rank and gpa (as independent variables). What is the final logistic model? Are the three independent variables significant?
glm(formula, family = 'binomial', data = )