Stat 281: Ch. 11--Regression A man was in a hot-air balloon. Soon he found himself lost with nothing but green fields for as far as the eye could see.


1 Stat 281: Ch. 11--Regression A man was in a hot-air balloon. Soon he found himself lost with nothing but green fields for as far as the eye could see. Eventually, he floated over a man who was walking his dog. He leaned over the basket and yelled out, “Hello! Where am I?” The man on the ground replied, “You’re about 20 feet above the ground in a hot-air balloon.” The balloonist angrily shouted back, “You must be a statistician.” “Why do you say that?” asked the man on the ground. “Well,” came the reply, “You’re absolutely correct but your answer is completely useless.” “Oh, I see,” replied the walker, “And you must be a manager.” “Actually, you’re right,” said the balloonist. “How did you know?” “First, you were lost. Then, after deciding what information you needed to solve the problem, you asked someone else to get it for you. Now that you have the information, you’re still lost, but it’s someone else’s fault.”

2 The Data
- Bivariate Data consist of values of two variables recorded from the same subjects.
- Any combination of numeric or categorical variables may be used:
  – Categorical/categorical
  – Categorical/numeric
  – Numeric/numeric

3 Two Categorical Variables
- The data may be displayed in a Cross-Tabulation or Contingency Table.
- Totals may also be included.
- The data may be given as percentages instead of frequencies.
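As a rough illustration of the cross-tabulation idea, the sketch below counts pairs of categories and adds totals and percentages. The subjects, the category labels, and the helper name `cross_tab` are all made up for this example; none of them come from the slides.

```python
from collections import Counter

# Hypothetical bivariate categorical data: one (group, preference) pair per subject.
subjects = [
    ("F", "coffee"), ("M", "tea"), ("F", "tea"), ("M", "coffee"),
    ("F", "coffee"), ("M", "coffee"), ("F", "tea"), ("F", "coffee"),
]

def cross_tab(pairs):
    """Build a contingency table as a nested dict: table[row][col] = frequency."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    return table, rows, cols

table, rows, cols = cross_tab(subjects)

# Marginal totals for each row and each column, plus the grand total.
row_totals = {r: sum(table[r].values()) for r in rows}
col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
grand_total = len(subjects)

# The same table expressed as percentages of the grand total.
percent = {r: {c: 100 * table[r][c] / grand_total for c in cols} for r in rows}

for r in rows:
    print(r, table[r], "total:", row_totals[r])
print("column totals:", col_totals, "grand total:", grand_total)
```

Percentages could equally be taken within each row or each column; dividing by the grand total is just one of the possible choices.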

4 Side by Side Bar Graph

5 Categorical/Numeric
- View the numeric data as separate samples from populations defined by levels of the categorical variable.
- Use summary statistics for side-by-side comparison.
- Also use side-by-side dotplots, boxplots, or stem-and-leaf displays.
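A minimal sketch of the side-by-side comparison: split the numeric values by the level of the categorical variable, then summarize each group separately. The records below are invented for illustration (they are not the data from any example in this deck), and the helper names are hypothetical.

```python
from statistics import mean, median

# Hypothetical (level, value) records; the values are made up for this sketch.
records = [
    ("Northeast", 28.0), ("Midwest", 40.0), ("West", 50.0),
    ("Northeast", 32.0), ("Midwest", 44.0), ("West", 58.0),
    ("Northeast", 36.0), ("Midwest", 51.0), ("West", 60.0),
]

def split_by_level(records):
    """Group the numeric values into one sample per categorical level."""
    groups = {}
    for level, value in records:
        groups.setdefault(level, []).append(value)
    return groups

def summarize(values):
    """Summary statistics for one sample."""
    return {
        "n": len(values),
        "min": min(values),
        "median": median(values),
        "mean": mean(values),
        "max": max(values),
    }

groups = split_by_level(records)
summary = {level: summarize(values) for level, values in groups.items()}

# Print one row per level so the groups can be compared side by side.
for level, stats in summary.items():
    print(level, stats)
```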

6 Example: A random sample of households from three different parts of the country was obtained and their electric bill for June was recorded. The data is given in the table below. The part of the country is a categorical variable with three levels. The electric bill is a numeric variable.

7 Comparison Using DotPlots [Figure: side-by-side dotplots of the Northeast, Midwest, and West samples on a common axis running from 24.0 to 64.0.]

8 Box-and-Whisker Plots

9 Two Quantitative Variables
- Express as ordered pairs (x, y).
- x is thought of as the input or independent variable.
- y is thought of as the output or dependent variable.
- But this may not always reflect an actual cause-and-effect relationship.
- Use a Scatter Plot to graph the pairs.

10 Example: In a study involving children’s fear related to being hospitalized, the age and the score each child made on the Child Medical Fear Scale (CMFS) are given in the table below. Construct a scatter diagram for this data.

11 Scatter Plot

12 What are we looking for?
- Linear relationship
  – Straight line only; other kinds of relationship may exist, but we set them aside for now.
- Due to random variation the points are not usually right on a line.
- We want them “close” to a line.
- Recall the algebra definition of slope: rise/run or Δy/Δx.
- Extremely important interpretation: the slope is the change in y as x moves 1 unit to the right.

13 Understanding Slope
- A slope of 1 means that if you move x to the right 1 unit, y goes up 1.
- A slope of ½ means that if you move x to the right 1 unit, y goes up ½.
- A slope of -2 means that if you move x to the right 1 unit, y goes down 2.
- A slope of 0 means that if you move x to the right 1 unit, y stays the same.

14 Understanding Linear Relationships
- Data are said to display a linear relationship if they approximate a line with non-zero slope.
- A slope of zero implies that y is the same for any x, so x has no effect on y; that is, there is no relationship between x and y.
- (Undefined slope also means no relationship, but then there would be only one x value in the data, which really leaves nothing to analyze.)
- Important: If there is a linear relationship, the slope does NOT indicate the strength of the relationship. For example, change the units: the slope will change, but whatever relationship exists must still be the same!
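The point about units can be checked numerically. The sketch below uses invented data with x recorded in meters, then re-expresses x in centimeters. It assumes the usual least squares slope and Pearson correlation formulas; the helper names `ss`, `slope`, and `corr` are hypothetical.

```python
import math

def ss(u, v):
    """Sum of cross-deviations: sum((u - mean(u)) * (v - mean(v)))."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))

def slope(x, y):
    """Least squares slope of y on x."""
    return ss(x, y) / ss(x, x)

def corr(x, y):
    """Linear correlation coefficient r."""
    return ss(x, y) / math.sqrt(ss(x, x) * ss(y, y))

# Hypothetical data: x measured in meters, y in arbitrary units.
x_m = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

# The same measurements with x expressed in centimeters.
x_cm = [100.0 * v for v in x_m]

slope_m, slope_cm = slope(x_m, y), slope(x_cm, y)
r_m, r_cm = corr(x_m, y), corr(x_cm, y)

print("slope per meter:     ", slope_m)
print("slope per centimeter:", slope_cm)
print("r (meters):", r_m, " r (centimeters):", r_cm)
```

The slope shrinks by a factor of 100 when x is rescaled, while r is unchanged, which is why the slope cannot serve as a measure of strength.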

15 Linear Correlation
- A way to quantify (put a number on) the strength of a linear relationship.
- Often called the Pearson correlation coefficient after its inventor, Karl Pearson.
- Note: Correlation in a population is a parameter signified by ρ (rho), while correlation in a sample is a statistic signified by r.
- Both -1≤ρ≤1 and -1≤r≤1.
- A negative slope gives a negative correlation, and a positive slope gives a positive correlation.
- A -1 or 1 can only occur if all the data are actually right on the line (a perfect relationship).
- A correlation of 0 implies no linear relationship.
- A correlation close to -1 or 1 implies a strong linear relationship.

16 Defining Correlation
- We begin with another quantity: covariance.
- It is helpful to note its similarity to variance.
- Think of breaking up the square and giving one quantity to each of the variables:
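The formulas for this slide did not survive in the transcript. Assuming the standard sample definitions were intended, the comparison would look like this, with the squared deviation in the variance split into two factors and one factor handed to each variable:

```latex
s_x^2 = \frac{\sum (x - \bar{x})^2}{n - 1}
      = \frac{\sum (x - \bar{x})(x - \bar{x})}{n - 1},
\qquad
\operatorname{cov}(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}
```

Here $\bar{x}$ and $\bar{y}$ are the sample means and $n$ is the number of (x, y) pairs; the notation $\operatorname{cov}(x, y)$ is a choice made for this sketch.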

17 Defining Correlation 2
- Recall that SS(x), the sum of squares, is like the “variance without the divisor.” We now have two variables, so we will have a sum of squares for x and a sum of squares for y:
- And in a similar way, the “covariance without the divisor” is the sum of squares for xy:
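The slide's formulas are missing from the transcript, so the sketch below assumes the usual textbook forms: SS(x) = Σ(x − x̄)² and SS(xy) = Σ(x − x̄)(y − ȳ), along with the equivalent shortcuts Σx² − (Σx)²/n and Σxy − (Σx)(Σy)/n. The data and function names are invented for illustration.

```python
import math

def ss_definition(u, v):
    """Definitional form: sum of (u - mean(u)) * (v - mean(v))."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))

def ss_shortcut(u, v):
    """Computational shortcut: sum(u*v) - sum(u) * sum(v) / n."""
    n = len(u)
    return sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / n

# Hypothetical data, made up for this sketch.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

# Passing the same variable twice gives an ordinary sum of squares.
ss_x = ss_definition(x, x)    # SS(x)
ss_y = ss_definition(y, y)    # SS(y)
ss_xy = ss_definition(x, y)   # SS(xy)

print("SS(x) =", ss_x, " SS(y) =", ss_y, " SS(xy) =", ss_xy)
print("shortcut forms:", ss_shortcut(x, x), ss_shortcut(y, y), ss_shortcut(x, y))
```

Dividing any of these by n − 1 would give the corresponding variance or covariance.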

18 Defining Correlation 3
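The content of this slide is not in the transcript. Given the covariance and sum-of-squares setup on the preceding slides, the definition it most likely showed is the standard one below; treat it as an assumption rather than a quotation. Here $s_x$ and $s_y$ are the sample standard deviations, $SS(x) = \sum (x - \bar{x})^2$, $SS(y) = \sum (y - \bar{y})^2$, and $SS(xy) = \sum (x - \bar{x})(y - \bar{y})$.

```latex
r = \frac{\operatorname{cov}(x, y)}{s_x \, s_y}
  = \frac{SS(xy)}{\sqrt{SS(x)\, SS(y)}}
```

The two forms agree because the $n - 1$ divisors in the covariance and in the two standard deviations cancel.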

19 Example: no correlation. As x increases, there is no definite shift in y.

20 Example: positive correlation. As x increases, y also increases.

21 Example: negative correlation. As x increases, y decreases.

22 Example: The table below presents the weight (in thousands of pounds) x and the gasoline mileage (miles per gallon) y for ten different automobiles. Find the linear correlation coefficient.

23

24 Calculations

25 Regression Analysis
- Find the equation for the “best fit line.”
- Least squares criterion: Find the constants b₀ and b₁ such that the sum of squared deviations between data values and predicted values is as small as possible, i.e. minimize Σ(y − ŷ)².
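To make the criterion concrete, the sketch below evaluates the sum of squared deviations between observed and predicted y for a few candidate lines. The data are invented, and the candidate (b0, b1) pairs are arbitrary except for the first, which is the least squares solution for this particular data set.

```python
import math

# Hypothetical data, made up for this sketch.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

def sse(b0, b1):
    """Sum of squared deviations between observed y and predicted b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# The first candidate is the least squares line for this data;
# the others are nearby lines chosen only for comparison.
candidates = [(2.2, 0.6), (2.0, 0.6), (2.2, 0.7), (4.0, 0.0)]
for b0, b1 in candidates:
    print(f"b0={b0}, b1={b1}: SSE={sse(b0, b1):.2f}")

best = min(candidates, key=lambda c: sse(*c))
```

Any line other than the least squares line gives a larger sum, which is what "best fit" means under this criterion.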

26 Observed and predicted values of y:

27 The equation of the line of best fit:
Determined by
- b₀: y-intercept
- b₁: slope
Values that satisfy the least squares criterion:
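The formulas on this slide are missing from the transcript. A minimal sketch, assuming the standard least squares solution ŷ = b₀ + b₁x with b₁ = SS(xy)/SS(x) and b₀ = ȳ − b₁x̄, is given below. The data and the function name `fit_line` are hypothetical.

```python
import math

def fit_line(x, y):
    """Return (b0, b1) for the least squares line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_x = sum((a - x_bar) ** 2 for a in x)
    b1 = ss_xy / ss_x          # slope: SS(xy) / SS(x)
    b0 = y_bar - b1 * x_bar    # intercept: chosen so the line passes through (x_bar, y_bar)
    return b0, b1

# Hypothetical data, made up for this sketch.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = fit_line(x, y)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")

# Predicted value at the mean of x, which should equal the mean of y.
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
y_hat_at_mean = b0 + b1 * x_bar
```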

28 Example: A recent article measured the job satisfaction of subjects with a 14-question survey. The data below represent the job satisfaction scores, y, and the salaries, x, for a sample of similar individuals.
1. Draw a scatter diagram for these data.
2. Find the equation of the line of best fit.

29 Preliminary calculations needed to find b₁ and b₀:

30 Finding b₁ and b₀:

31 Scatter diagram:

32 Some Misc. Notes
1. Keep at least three extra decimal places while doing the calculations to ensure an accurate answer.
2. When rounding off the calculated values of b₀ and b₁, always keep at least two significant digits in the final answer.
3. The slope b₁ represents the predicted change in y per unit increase in x.
4. The y-intercept b₀ is the value of y where the line of best fit intersects the y-axis.
5. The line of best fit will always pass through the point (x̄, ȳ).

33 Making Predictions
1. One of the main purposes for obtaining a regression equation is for making predictions.
2. For a given value of x, we can predict a value of y.
3. The regression equation should be used to make predictions only about the population from which the sample was drawn.
4. The regression equation should be used only to cover the sample domain on the input variable. You can estimate values outside the domain interval, but use caution and use values close to the domain interval.
5. Use current data. A sample taken in 1987 should not be used to make predictions in 1999.
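One way to build the domain caution into a prediction routine is sketched below. The coefficients, the sampled x values, and the function name `predict` are all hypothetical; the check only flags extrapolation and does not judge how far outside the domain is too far.

```python
import math

def predict(x_new, b0, b1, x_min, x_max):
    """Return (predicted y, whether x_new lies inside the sampled x range)."""
    y_hat = b0 + b1 * x_new
    inside_domain = x_min <= x_new <= x_max
    return y_hat, inside_domain

# Hypothetical fitted coefficients and sampled x values, made up for this sketch.
b0, b1 = 2.2, 0.6
x_sample = [1.0, 2.0, 3.0, 4.0, 5.0]
x_min, x_max = min(x_sample), max(x_sample)

y_in, ok_in = predict(3.5, b0, b1, x_min, x_max)     # inside the sampled range
y_out, ok_out = predict(9.0, b0, b1, x_min, x_max)   # extrapolation beyond the range

for x_new, y_hat, ok in [(3.5, y_in, ok_in), (9.0, y_out, ok_out)]:
    note = "" if ok else " (outside sample domain: use with caution)"
    print(f"x={x_new}: predicted y={y_hat:.2f}{note}")
```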

