Simple Linear Regression: Relationships Between Quantitative Variables
Scatterplot: The x-axis represents the number of games played; the y-axis represents the number of times at bat. Each point represents a particular baseball player's number of games played and number of times at bat.
Scatterplots: The explanatory variable (independent variable) is X. The response variable (dependent variable) is Y. Both must be quantitative! Look for a pattern in the scatterplot. Ex: positive linear trend (association). Ex: negative linear trend (association). Ex: curved trend (nonlinear).
Positive Linear Association: Are the number of home runs and the number of runs batted in related? As the number of home runs increases, how does that affect the number of runs batted in?
Negative Linear Association: It is believed that carbonation (depth) of concrete leads to corrosion of the steel frame and thus reduces the strength of the concrete. What type of relationship appears to be modeled in this scatterplot?
Nonlinear Pattern in scatterplot does not resemble a straight line.
Nonlinear Pattern in scatterplot resembles a typical exponential decay. As the frying time increases, the moisture content decreases exponentially.
y^ = y-intercept + (slope) x, that is, y^ = b0 + (b1) x. Prisoners^ = 0.00552*Persons – 4800, so b0 = -4800 (y-intercept) and b1 = 0.00552 (slope).
y^ = y-intercept + (slope) x. The slope is the rate at which the y-variable changes for each 1-unit increase in the x-variable. The y-intercept is the value of y when the x-variable is equal to 0. It is also where the line crosses the y-axis.
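The slope and y-intercept of a least squares line can be computed directly from data. The following is a minimal Python sketch using the standard least squares formulas. The function name `least_squares_line` and the data values are made up for illustration and do not come from the data sets in these slides.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1): y-intercept and slope of the least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx               # slope: change in y per 1-unit increase in x
    b0 = mean_y - b1 * mean_x    # y-intercept: predicted y when x = 0
    return b0, b1


# Hypothetical data that lie exactly on the line y = 1 + 2x
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
b0, b1 = least_squares_line(xs, ys)
print("y-intercept:", b0, "slope:", b1)
```

Because the made-up points lie exactly on a line, the fitted slope and y-intercept recover that line. With real data the points scatter around the fitted line instead.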
Using the Least Squares Regression Line (Equation): Prisoners^ = – *Persons. If 10,000,000 persons were living in a state, what is the predicted number of prisoners for that state?
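A prediction is made by substituting the x value into the regression equation. The sketch below assumes the equation intended here is the one stated earlier in the deck, Prisoners^ = .00552*Persons – 4800, since the coefficients on this slide did not survive. The names `B0`, `B1`, and `predict_prisoners` are illustrative only.

```python
# Assumption: coefficients taken from the earlier slide's equation,
# Prisoners^ = .00552*Persons - 4800
B0 = -4800.0    # y-intercept
B1 = 0.00552    # slope: predicted prisoners per additional person


def predict_prisoners(persons):
    """Predicted number of prisoners for a state with the given population."""
    return B0 + B1 * persons


predicted = predict_prisoners(10_000_000)
print(round(predicted))  # 50400
```

Under that assumption the arithmetic is 0.00552 × 10,000,000 – 4800 = 55,200 – 4800 = 50,400 prisoners.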
Predicting: range = range in nautical miles; fuelgph = fuel use in gallons per hour. range = – (fuelgph). What is the predicted range of an airplane consuming 3410 gallons per hour?
Residual Error Consuming 3410 gallons per hour, the actual range traveled was 4988 miles. Our prediction calculated it to be miles. We underestimated! Residual = =
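Residuals: A residual is the difference between the observed (actual) value and the predicted value generated from the least squares regression equation: residual = observed – predicted. A positive residual means the prediction was an underestimate; a negative residual means it was an overestimate.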
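As a small illustration of the calculation, the Python sketch below computes residuals as observed minus predicted. The line (y-intercept 1, slope 2) and the data points are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def residuals(xs, ys, b0, b1):
    """Residual for each point: observed y minus predicted y^ = b0 + b1*x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Hypothetical line y^ = 1 + 2x gives predictions 3, 5, 7 at x = 1, 2, 3
xs = [1, 2, 3]
ys = [5, 4, 7]          # observed values (made up)
res = residuals(xs, ys, b0=1.0, b1=2.0)
print(res)  # [2.0, -1.0, 0.0]
```

The first point was underestimated (positive residual), the second was overestimated (negative residual), and the third was predicted exactly.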
Residual Plot There should be no apparent pattern in the residual plot. Any pattern indicates that a linear model is not the best equation to use.
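One way to see what a pattern in the residuals looks like, without drawing the plot, is to fit a least squares line to data that are actually curved and then inspect the signs of the residuals in order of x. This is a minimal Python sketch; the data (y = x squared) and all names are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the least squares line y^ = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


# Hypothetical curved data: y = x^2
xs = [0, 1, 2, 3, 4]
ys = [0, 1, 4, 9, 16]

b0, b1 = least_squares_line(xs, ys)
res = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
signs = ["+" if r > 0 else "-" for r in res]
print(res)    # [2.0, -1.0, -2.0, -1.0, 2.0]
print(signs)  # ['+', '-', '-', '-', '+']
```

The residuals are positive at both ends and negative in the middle. That U shape is the kind of pattern that signals a straight line is not the best model for these data.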
Correlation Coefficient, r: r measures the strength of the linear relationship between the 2 variables. -1 ≤ r ≤ 1. The sign of r indicates the direction of the relationship; the magnitude indicates the strength of the relationship. Perfectly linear if r = -1 or r = 1. No units.
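The properties above can be checked numerically. The Python sketch below computes r from its standard definition. The function name `correlation` and the small data sets are made up for illustration.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r between two quantitative variables."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data sets
r_pos = correlation([1, 2, 3, 4], [2, 4, 6, 8])   # perfectly linear, increasing
r_neg = correlation([1, 2, 3, 4], [8, 6, 4, 2])   # perfectly linear, decreasing
r_mid = correlation([1, 2, 3, 4], [1, 3, 2, 4])   # positive but not perfect

# Changing the units of x (multiplying by 1000) leaves r unchanged
r_rescaled = correlation([1000, 2000, 3000, 4000], [1, 3, 2, 4])

print(r_pos, r_neg, round(r_mid, 2), round(r_rescaled, 2))
```

The first two data sets give r = 1 and r = -1, the third gives a value strictly between 0 and 1, and rescaling x does not change r, which is what "no units" means in practice.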
Correlation Coefficient Game Go to Select Games Select Correlation Correctly match the correlation coefficient r to the corresponding scatterplot.
Coefficient of Determination: R² is the squared value of r, the correlation coefficient. It represents the proportion of variation in the response (y) variable that is explained by the linear model using the explanatory (x) variable.
Coefficient of Determination: R² = 0.72. Therefore, 72% of the variation in the response variable (range) is explained by the explanatory variable (costph).
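For simple linear regression, the two descriptions of R² agree: squaring r gives the same number as computing the share of total variation in y that the fitted line accounts for. The Python sketch below checks this on a small hypothetical data set; none of the values come from the airplane data.

```python
import math

# Hypothetical data
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)   # total variation in y

# Route 1: square the correlation coefficient
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# Route 2: 1 - (unexplained variation / total variation)
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # sum of squared residuals
explained_share = 1 - sse / syy

print(round(r_squared, 2), round(explained_share, 2))  # 0.64 0.64
```

Both routes give 0.64, so for these made-up points 64% of the variation in y is explained by the linear relationship with x.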
Investigate Go to Select Data Applet Select Baseball (data list on right) Select Accept Data Set Select Scatter Plot (at top of frame) Check Regression and Residual (at left of frame) Select At-Bats as X-variable and Hits as Y-variable Click Up-Date at lower right corner Now change variables and Up-Date! Observe the relationship between variables graphically and statistically.
Draw Your Regression Line: Go to ye/index.html. Select Begin. Click in the scatterplot at one end and drag to draw a line. Select your guess at r. Select Show Minimum MSE, then select Draw Regression Line. How close were you?
Influential Observation: An influential observation is a point (usually with an extreme x value) that influences the least squares regression equation and correlation coefficient. Go to ession.html. Click the pointer anywhere in the scatterplot to create a new point, and notice how it changes the least squares regression equation and the value of r.
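The same experiment can be tried without the applet. The Python sketch below fits a line to a small hypothetical data set, then adds one made-up point with an extreme x value and refits. The helper name `fit_and_r` and all data values are assumptions for illustration.

```python
import math


def fit_and_r(xs, ys):
    """Return (slope, intercept, r) for the least squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / math.sqrt(sxx * syy)


# Hypothetical data on the line y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
slope_before, _, r_before = fit_and_r(xs, ys)

# Add one point with an extreme x value, far from the trend of the others
slope_after, _, r_after = fit_and_r(xs + [20], ys + [0])

print("before:", round(slope_before, 2), round(r_before, 2))
print("after: ", round(slope_after, 2), round(r_after, 2))
```

In this sketch a single added point is enough to turn a perfect positive relationship into a negative slope and a negative r, which is why points with extreme x values deserve a close look.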
Causation vs. Correlation Confounding factors (variables) Causation is very difficult to prove and requires an extensive experiment. The response variable could be causing the explanatory variable to change. …