Excel - Data Analysis Dr. Theodore Cleveland University of Houston CIVE 1331 – Computing for Engineers Lecture 009 NEXT LECTURE: PowerPoint, Excel
Linear Regression Analysis Factors that affect data: lack of precision in instruments, human error, and unaccounted variables or factors in the model.
Least Squares Method Least squares is a method that selects the line that minimizes the sum of the squared errors between the data points and the line. This is the line that best fits our data.
LS Example Consider fitting the following data to a line (linear equation)
LS Example Graphical Interpretation
Linear Regression To minimize the sum of the squared errors, we seek the values of a and b that solve the following equations, where a = slope of the line, b = y-intercept of the line, n = number of data points, x = x-coordinates of the individual points, and y = y-coordinates of the individual points.
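The slope and intercept defined above can be computed directly from sums over the data. The sketch below uses the standard closed-form least-squares solution for a straight line, which may be written differently from the equations on the original slide. The function name fit_line and the sample data are assumptions made for illustration; they are not the data from the lecture example.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a*x + b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Closed-form solution of the two least-squares equations
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b = (sum_y - a * sum_x) / n
    return a, b


# Hypothetical data, chosen only to illustrate the calculation
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]

a, b = fit_line(xs, ys)
print(f"slope a = {a:.4f}, intercept b = {b:.4f}")
```

The result can be compared against the equation that an Excel trendline reports for the same points.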
Linear Regression How good a fit has been achieved can be measured with r² (r² is called the coefficient of determination; it is the square of the linear correlation coefficient r and is one measure of how well a straight line explains the data), which is given by the following expression: r² = 1 - SSE/SST, where SSE is the sum of the squares of the error and SST is the sum of the squares of the deviations about the mean, which are given by:
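As a rough illustration of r² = 1 - SSE/SST, the sketch below takes SSE as the sum of squared differences between each y value and the line a*x + b, and SST as the sum of squared differences between each y value and the mean of y, following the verbal definitions above. The function name r_squared, the sample data, and the coefficients a and b are assumptions for illustration only.

```python
def r_squared(xs, ys, a, b):
    """Return r^2 = 1 - SSE/SST for the line y = a*x + b."""
    mean_y = sum(ys) / len(ys)
    # SSE: squared error between each data point and the line
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    # SST: squared deviation of each data point about the mean
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - sse / sst


# Hypothetical data and a line assumed to have been fitted to it
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]
a, b = 1.96, 0.22

r2 = r_squared(xs, ys, a, b)
print(f"r^2 = {r2:.6f}")
```

A value close to 1 indicates that the straight line explains most of the variation in y.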
Regression Tools In Excel Plot the data using an “XY Scatter” graph. Select the data by clicking on one of the data points; several of the points should become highlighted when they have been selected. In the menu bar, click Add Trendline under the Chart menu, or click the right mouse button to get a quick-reference pop-up panel. Under “Options”, select the options to display the equation and the r² value. Click OK. To remove the trend line, select it and press the Delete key.
Regression Tools The regression tools can also be employed in Excel to determine the coefficients for the equation of the line without graphing the line first. Under Tools on the menu bar, select “Data Analysis”. Under Data Analysis select “Regression”. Specify the range of cells with the dependent variable (y). Specify the range of cells with the independent variable (x). Specify a point in the spreadsheet for the output data; select a point that has nothing in about 10 columns to its right. Click OK. The slope and intercept are given under the heading “Coefficients”. If “Data Analysis” doesn’t appear under “Tools”, select “Tools - Add-Ins - Analysis ToolPak”.
Optimization Problems In many problems in engineering we try to find the optimum solution. The optimum solution may maximize profits, maximize success, minimize cost, minimize the amount of material, etc.
Optimization Problems In order to formulate an optimization problem we must first develop an objective function. OBJECTIVE FUNCTION: An objective function establishes a relationship which is to be optimized as a function of a series of independent variables (x1, x2, etc.): y = f(x1, x2, x3, ..., xi)
Optimization Problems The independent variables xi are the selected parameters which will be varied to maximize or minimize the objective function. For example, if profits are to be maximized for a company that makes two products A and B, x1 may be set equal to the number of units of product A, and x2 may be set equal to the number of units of product B. To maximize profits we may write the objective function y as a function of the profit per unit of the two products. If A earns $5 a unit and B earns $10 a unit, then y = 5 x1 + 10 x2
Constraints We must apply some constraints to our solution. Let g(X) = number of hours worked. Then g(X) <= [some maximum number of hours worked]: we require that our workforce work no more than 8000 hours a month. X1 >= 0, X2 >= 0: we cannot produce a negative number of widgets. X1 + X2 >= 1000: we require that at least 1000 units/month be manufactured.
Use Additional Information to Create Equations Labor costs $12/hour. Product A: sold for $120 per unit, requires 5 hours labor/unit. ProfitA = X1 * (120 - 12*5) = X1 * 60, or $60 per unit of product A. Product B: sold for $80 per unit, requires 3 hours labor/unit. ProfitB = X2 * (80 - 12*3) = X2 * 44, or $44 per unit of product B.
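Objective Equations Revisited The additional information has transformed our objective function (the equation we want to maximize) into: Y = 60X1 + 44X2, where: X1 >= 0, X2 >= 0, X1 + X2 >= 1000 (at least 1000 units), and 5X1 + 3X2 <= 8000 (no more than 8000 labor hours)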
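The objective function and constraints can be written out as small functions to check a candidate production plan by hand. This is only a sketch: the function names are hypothetical, and the constraints are taken from the verbal descriptions in the lecture (no more than 8000 labor hours per month at 5 and 3 hours per unit, at least 1000 units per month, no negative quantities).

```python
LABOR_RATE = 12.0  # dollars per hour of labor


def unit_profit(price, hours_per_unit):
    """Profit per unit = selling price minus labor cost."""
    return price - LABOR_RATE * hours_per_unit


profit_a = unit_profit(120.0, 5.0)  # product A
profit_b = unit_profit(80.0, 3.0)   # product B


def objective(x1, x2):
    """Total profit Y for x1 units of A and x2 units of B."""
    return profit_a * x1 + profit_b * x2


def is_feasible(x1, x2):
    """True if the plan (x1, x2) satisfies every constraint."""
    return (
        x1 >= 0
        and x2 >= 0
        and x1 + x2 >= 1000          # at least 1000 units per month
        and 5 * x1 + 3 * x2 <= 8000  # at most 8000 labor hours per month
    )


plan = (100, 900)  # hypothetical trial plan
print(objective(*plan), is_feasible(*plan))
```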
More on Functions of two variables Unlike problems you have solved before, this problem is a function of two variables. In Calculus 3, you will learn methods to solve analytically for minimum and maximum values of functions of several variables. For now, we can solve this problem numerically using the Excel Solver. Your calculator may even have similar functionality built into it. But first we will examine the problem graphically.
Graphs of 1 independent variable We recognize that we can plot the value of a function on the Y axis and the independent variable on the X axis. A minimum value of the function plotted on the Y axis can be seen easily at X = 2. Recall that analytically we would solve for the minimum by finding the roots of the function’s derivative.
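Finding the minimum from the root of the derivative can be sketched numerically. The function plotted on the slide is not given in the text, so the example below assumes a hypothetical function f(x) = (x - 2)² + 1, whose minimum is at x = 2, and locates the root of its derivative by bisection. The bisection helper is an illustrative choice, not a method named in the lecture.

```python
def f(x):
    """Hypothetical function with a minimum at x = 2."""
    return (x - 2.0) ** 2 + 1.0


def df(x):
    """Derivative of f, worked out by hand."""
    return 2.0 * (x - 2.0)


def bisect_root(g, lo, hi, tol=1e-10):
    """Find a root of g in [lo, hi]; assumes g(lo) and g(hi) differ in sign."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_mid == 0.0:
            return mid
        if (g_lo < 0) == (g_mid < 0):
            lo, g_lo = mid, g_mid  # root lies in the upper half
        else:
            hi = mid               # root lies in the lower half
    return 0.5 * (lo + hi)


x_min = bisect_root(df, 0.0, 5.0)
print(f"minimum near x = {x_min:.6f}, f(x) = {f(x_min):.6f}")
```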
Graphs of Two Independent variables We can choose to represent an equation F(X1,X2) as a 3D graph with: X1 on one axis, X2 on a second axis, and F(X1,X2) on a third axis (usually vertical). You can think of the grid at the bottom of the graph as a list of coordinates (X1,X2) at which we will evaluate F. The value of F(X1,X2) is shown as the surface elevation on the third axis. Where does a maximum occur?
Data Analysis Our first step will be to name a cell for each variable we are using. (This step is not necessary, but it facilitates future steps.) We will label one cell X, one cell Y, and one cell F. Labeling a Cell Step 1) Select a cell. Step 2) Enter a label in the cell Name Box and press [Enter].
Excel Solver Enter the objective function in the cell labeled “F”, e.g. =60*X+44*Y. From the “Tools” menu select “Solver”. *If Solver is not available, click on “Add-ins” under “Tools” and check “Solver”.
Excel Solver Select the Target Cell; this is going to be our function, cell “F” (alternatively we could type “B6”). Under the “Equal To:” radio buttons select “Max”. Under “By Changing Cells” enter “X,Y” (alternatively B4,B5).
Excel Solver Now we need to enter our constraints. Recall that they were: X1 + X2 >= 1000 (produce at least 1000 units), 5X1 + 3X2 <= 8000 (no more than 8000 hours), X1 >= 0 and X2 >= 0 (can’t produce negative units). In order to enter these constraints into Excel, we need to rearrange them so that a single variable appears on one side of each inequality: X1 <= (8000 - 3X2)/5, X1 >= 1000 - X2, X1 >= 0 and X2 >= 0
Enter Constraints: Select Add to add constraints. An Add Constraint dialog will allow you to enter your equations. When you are done, select “Solve” from the Solver dialog. Follow the dialogs.
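The Solver result can be cross-checked outside Excel. The sketch below does not reproduce Solver's algorithm. It uses a simpler technique that works for small two-variable linear problems: intersect every pair of constraint boundaries, keep the intersections that satisfy all constraints, and pick the corner with the largest profit. It assumes the feasible region is bounded and treats the "can't produce negative units" conditions as X1 >= 0 and X2 >= 0. All names are hypothetical.

```python
from itertools import combinations

# Each constraint is (c1, c2, rhs), meaning c1*x1 + c2*x2 <= rhs.
constraints = [
    (5.0, 3.0, 8000.0),      # 5*x1 + 3*x2 <= 8000 labor hours
    (-1.0, -1.0, -1000.0),   # x1 + x2 >= 1000 units
    (-1.0, 0.0, 0.0),        # x1 >= 0
    (0.0, -1.0, 0.0),        # x2 >= 0
]


def objective(x1, x2):
    """Profit: $60 per unit of A plus $44 per unit of B."""
    return 60.0 * x1 + 44.0 * x2


def feasible(x1, x2, tol=1e-9):
    """True if (x1, x2) satisfies every constraint within a tolerance."""
    return all(c1 * x1 + c2 * x2 <= rhs + tol for c1, c2, rhs in constraints)


def corner_points():
    """Feasible intersections of every pair of constraint boundaries."""
    points = []
    for (a1, a2, r1), (b1, b2, r2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:
            continue  # parallel boundaries do not intersect
        # Cramer's rule for the 2x2 system
        x1 = (r1 * b2 - a2 * r2) / det
        x2 = (a1 * r2 - r1 * b1) / det
        if feasible(x1, x2):
            points.append((x1, x2))
    return points


best = max(corner_points(), key=lambda p: objective(*p))
print(f"x1 = {best[0]:.2f}, x2 = {best[1]:.2f}, profit = {objective(*best):.2f}")
```

Because this approach checks only corner points, it applies to linear objectives with linear constraints; it is meant as a sanity check on the Solver output, not a replacement for it.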
Histograms It is often necessary to analyze data to show how the values are distributed within their range. A plot of this distribution is called a histogram or a relative frequency plot. To create a histogram, you subdivide the range of the data into a series of adjacent, equally spaced intervals. Once the intervals are defined, you determine how many data values fall within each interval. The relative frequency is the percentage of the total data items that fall into a given interval.
Histograms Enter your data in a single column (or row) in Excel. In another column, enter a list of “Bins”, the border values that define your intervals.
Histograms From the “Tools” menu select “Data Analysis.” If it does not exist select “Analysis ToolPak” from the “Add-ins” menu. Select “Histogram” from the Data Analysis Dialog
Histograms Under Input Range, select your data values. Under Bin Range, select your range of border values. Change additional options if needed.
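The counting behind a histogram can be sketched in a few lines. The example assumes a convention in which a value is counted in the first bin whose border value is greater than or equal to it, with a final "More" slot for values above the last border; this is believed to match the Excel Histogram tool but should be checked against your version. The data and bin values are hypothetical.

```python
from bisect import bisect_left


def histogram(data, bins):
    """Return (edges, counts); counts has one extra 'More' slot at the end."""
    edges = sorted(bins)
    counts = [0] * (len(edges) + 1)
    for value in data:
        # Index of the first edge that is >= value; len(edges) means 'More'
        counts[bisect_left(edges, value)] += 1
    return edges, counts


# Hypothetical data and bin border values
data = [48, 55, 62, 70, 71, 74, 78, 85, 93, 100]
bins = [60, 70, 80, 90]

edges, counts = histogram(data, bins)
relative = [100.0 * c / len(data) for c in counts]  # percent of all values

for label, c, r in zip([str(e) for e in edges] + ["More"], counts, relative):
    print(f"{label:>5}  count = {c}  relative frequency = {r:.1f}%")
```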
Histogram Your Output
Other Statistical Data You can use Excel functions such as MAX(cell_range), MIN(cell_range), AVERAGE(cell_range), and so on to get several statistical values.
Other Statistical Data Alternatively the Data Analysis menu has a “Descriptive Statistics” option that will display many single-variable statistics at once
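For comparison, the same kind of single-variable summary can be produced with Python's standard statistics module. This is a sketch with hypothetical data; the selection of statistics is an assumption and is shorter than the list Excel reports.

```python
import statistics

# Hypothetical data values
data = [48, 55, 62, 70, 71, 74, 78, 85, 93, 100]

summary = {
    "count": len(data),
    "minimum": min(data),                 # like MIN(cell_range)
    "maximum": max(data),                 # like MAX(cell_range)
    "mean": statistics.mean(data),        # like AVERAGE(cell_range)
    "median": statistics.median(data),
    "sample standard deviation": statistics.stdev(data),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```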
Other Statistical Data Here is the output from the “Descriptive Statistics” tool:
End of Lecture The remaining slides show how to generate a 3D plot and are not necessary for the course. Next Time Algorithms Newton’s Method Jacobian Programming Touch on Fortran
Addendum: 3D Plot in Excel Creating 3D plots becomes much easier with the use of named cells. To create a cell name, select the cell, enter a name into the cell name field, and press [Enter].
Create a 3D plot Create 3 named fields (X, Y, F). *Note: we cannot give a field the same name as an existing cell reference, i.e. we cannot use X1, because a cell in column X, row 1 already exists. Now enter the equation in cell “F” (e.g. =5*X+3*Y). Cell A3 renamed “F” The formula in “F”
Create a 3D plot Create the data range (X1’s, X2’s) in a table. Highlight the area. Note that the upper left corner is blank. From the menu select “Data”, then “Table”.
Create a 3D plot Enter “X” for “Row input cell” Enter “Y” for “Column input cell” Position cursor on upper left corner of table and enter “=F”
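What the data table computes can be sketched as a nested loop: each cell holds F evaluated at the X value from the top row and the Y value from the left column. The example uses the formula F = 5X + 3Y from the addendum; the X and Y values are hypothetical.

```python
def f(x, y):
    """The example formula entered in the cell named F."""
    return 5 * x + 3 * y


x_values = [0, 1, 2, 3]   # top row of the table (row input cell X)
y_values = [0, 10, 20]    # left column of the table (column input cell Y)

# table[r][c] is F at x_values[c] and y_values[r]
table = [[f(x, y) for x in x_values] for y in y_values]

# Print in the same layout as the spreadsheet table
print("      " + "".join(f"{x:>6}" for x in x_values))
for y, row in zip(y_values, table):
    print(f"{y:>6}" + "".join(f"{value:>6}" for value in row))
```

The resulting grid of F values is the data that the surface chart plots.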
Create a 3D plot Your table should now be populated with “F” values. Highlight the table. From the “Insert” menu, select “Chart”.
Create a 3D plot Select “Surface” from the “Standard [Chart] Types” Click “Next” to accept default options for your chart
Create a 3D Plot View your chart; you may wish to change some options.