MATH 3359 Introduction to Mathematical Modeling Linear System, Simple Linear Regression
Outline
- Linear System
  - Solve a linear system
  - Compute the inverse matrix
  - Compute eigenvalues and eigenvectors
- Simple Linear Regression
  - Make scatter plots of the data
  - Fit a linear regression model
  - Prediction
Linear System
3x₁ + x₂ − 6x₃ = −10
2x₁ + x₂ − 5x₃ = −8
6x₁ − 3x₂ + x₃ = 0
In matrix form: Ax = b, with
A = [3 1 −6; 2 1 −5; 6 −3 1], x = (x₁, x₂, x₃)ᵀ, b = (−10, −8, 0)ᵀ
Function ‘solve’ in R
1. Solve the system: x = solve(A, b)
2. Find the inverse matrix: A_inverse = solve(A)
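As a sketch, both uses of `solve` on an illustrative 3×3 system (loosely based on the earlier slide; the third row of the matrix is an assumption where the slide text is hard to read):

```r
# Illustrative 3x3 system A x = b; the third row of A is an assumption
# where the slide text is unclear.
A <- matrix(c(3,  1, -6,
              2,  1, -5,
              6, -3,  1),
            nrow = 3, byrow = TRUE)
b <- c(-10, -8, 0)

x <- solve(A, b)       # 1. solve A x = b
A_inverse <- solve(A)  # 2. inverse of A

x                # solution vector: -2 -4  0
A_inverse %*% b  # same solution, recovered via the inverse
```

`solve(A, b)` is preferred over `solve(A) %*% b` in practice, since it avoids forming the inverse explicitly.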
Function ‘eigen’ in R
y = eigen(A, symmetric = TRUE/FALSE, only.values = TRUE/FALSE)
Eigenvalues: y$values
Eigenvectors: y$vectors
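A minimal sketch with a small symmetric matrix (the matrix is chosen just for illustration). Note that the abbreviated `y$val` and `y$vec` also work through R's partial matching of list names; the full component names are `values` and `vectors`:

```r
# Small symmetric example matrix (chosen for illustration)
A <- matrix(c(2, 1,
              1, 2), nrow = 2, byrow = TRUE)

y <- eigen(A, symmetric = TRUE)
y$values    # eigenvalues in decreasing order: 3 1
y$vectors   # corresponding unit eigenvectors, one per column
```

Setting `only.values = TRUE` skips the eigenvector computation, which is faster when only the eigenvalues are needed.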
Exercise
x₁ + x₂ + x₃ = −5
−x₁ + x₃ = −3
3x₁ + x₂ + x₃ = −3
1. Solve the linear system.
2. Find the inverse of the coefficient matrix.
3. Compute the eigenvalues and eigenvectors of the coefficient matrix.
Simple Linear Regression
Given a data set {(xᵢ, yᵢ), i = 1, …, n} of n observations, where yᵢ is the dependent variable and xᵢ is the independent variable, the simple linear regression model is
yᵢ = β₀ + β₁xᵢ + εᵢ,  i = 1, …, n,
or, in matrix form,
y = Xβ + ε,
where β₀ is the intercept, β₁ is the slope, and the εᵢ are random error terms.
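The least-squares estimates have closed forms, β̂₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and β̂₀ = ȳ − β̂₁x̄, which can be coded directly. A minimal sketch with made-up data, cross-checked against R's built-in `lm`:

```r
# Closed-form least-squares estimates (illustrative data)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

beta1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
beta0 <- mean(y) - beta1 * mean(x)                                  # intercept

c(beta0, beta1)              # hand-rolled estimates
as.numeric(coef(lm(y ~ x)))  # lm produces the same values
```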
Example As Earth’s population continues to grow, the solid waste generated by the population grows with it. Governments must plan for disposal and recycling of ever-growing amounts of solid waste. Planners can use data from the past to predict future waste generation and plan for enough facilities for disposing of and recycling the waste. Let 1990 correspond to x = 0.
1. Scatter Plots — Function ‘plot’
x = c(0:4)
y = c(19358, 19484, 20293, 21499, 23561)
plot(x, y,
     main = 'Tons of Solid Waste Generated From 1990 to 1994',
     xlab = 'Year',
     ylab = 'Tons of Solid Waste Generated (in thousands)',
     xlim = c(0, 4), ylim = c(19000, 25000))
2. Fit Linear Regression Model — Function ‘lm’ in R
reg = lm(formula, data)
summary(reg)
In our example:
x = c(0:4)
y = c(19358, 19484, 20293, 21499, 23561)
reg = lm(y ~ x)
summary(reg)
> summary(reg)

Call:
lm(formula = y ~ x)

Residuals:
     1      2      3      4      5
 603.2 -312.9 -546.0 -382.1  637.8

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  18754.8      512.4  36.603 4.48e-05 ***
x             1042.1      209.2   4.982   0.0155 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 661.5 on 3 degrees of freedom
Multiple R-squared: 0.8922, Adjusted R-squared: 0.8562
F-statistic: 24.82 on 1 and 3 DF, p-value: 0.0155

Hence, the function of best fit is y = 1042.1x + 18754.8.
3. Graph the function of best fit with the scatterplot of the data — Function ‘abline’
plot(x, y,
     main = 'Tons of Solid Waste Generated From 1990 to 1994',
     xlab = 'Year',
     ylab = 'Tons of Solid Waste Generated (in thousands)',
     xlim = c(0, 4), ylim = c(19000, 25000))
abline(reg)
4. Prediction — Function ‘predict’ in R
Predict the average tons of waste in 2000 (x = 10) and 2005 (x = 15):
predict(reg, data.frame(x = c(10, 15)))
Result: 29175.8 for 2000 and 34386.3 for 2005 (thousands of tons).
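`predict` can also return interval estimates through its `interval` argument, a standard feature of `predict.lm` not covered on the slide. A sketch using the waste data:

```r
x <- 0:4                                   # years since 1990
y <- c(19358, 19484, 20293, 21499, 23561)  # waste (thousands of tons)
reg <- lm(y ~ x)

newdata <- data.frame(x = c(10, 15))       # years 2000 and 2005
predict(reg, newdata)                           # point predictions
predict(reg, newdata, interval = "prediction")  # with 95% prediction intervals
```

The prediction intervals here are wide, since they extrapolate well beyond the observed range x = 0, …, 4 with only 3 residual degrees of freedom.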
Exercise
The Prestige data set (from the car package) records, for each occupation:
- Education: Average education of occupational incumbents, years, in 1971.
- Income: Average income of incumbents, dollars, in 1971.
- Women: Percentage of incumbents who are women.
- Prestige: Pineo-Porter prestige score for occupation, from a social survey conducted in the mid-1960s.
- Census: Canadian Census occupational code.
- Type: Type of occupation. A factor with levels (note: out of order): bc, Blue Collar; prof, Professional, Managerial, and Technical; wc, White Collar.
Exercise
Import the data:
library(car)
View(Prestige)
education = Prestige$education
prestige = Prestige$prestige
1. Make a scatterplot of the data, letting x represent education and y represent prestige.
2. Find the line that best fits the above measurements.
3. Graph the function of best fit with the scatterplot of the data.
4. With the function found in part 2, predict the average prestige when education = 16 and 17.