Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data


[Figure: regression plot relating Weight and Height, with fitted line Y = 2.31464 + 1.28722X and r = 0.559]

Chapter Goals
To be able to present bivariate data in tabular and graphic form
To become familiar with the ideas of descriptive presentation
To gain an understanding of the distinction between the basic purposes of correlation analysis and regression analysis

3.1 ~ Bivariate Data
Bivariate Data: Consists of the values of two different response variables that are obtained from the same population of interest
Three combinations of variable types:
1. Both variables are qualitative (attribute)
2. One variable is qualitative (attribute) and the other is quantitative (numerical)
3. Both variables are quantitative (both numerical)

Two Qualitative Variables
When bivariate data results from two qualitative (attribute or categorical) variables, the data is often arranged in a cross-tabulation or contingency table.
Example: A survey was conducted to investigate the relationship between gender and preferences for television, radio, or newspaper as a source of national news. The results are given in the table below:

Marginal Totals
This table may be extended to display the marginal totals (or marginals). The total of the marginal totals is the grand total:

              TV   Radio   NP   Row Totals
Male         280    175   305      760
Female       115    275   170      560
Col. Totals  395    450   475     1320

Note: Contingency tables often show percentages (relative frequencies). These percentages are based on the entire sample or on the subsample (row or column) classifications.

Percentages Based on the Grand Total (Entire Sample)
The previous contingency table may be converted to percentages of the grand total by dividing each frequency by the grand total and multiplying by 100.
For example, 175 becomes 13.3%: $\left(\frac{175}{1320}\right) \times 100 = 13.3$

              TV    Radio   NP    Row Totals
Male         21.2   13.3   23.1     57.6
Female        8.7   20.8   12.9     42.4
Col. Totals  29.9   34.1   36.0    100.0
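The conversion described above can be sketched in a few lines of Python. The counts are the survey frequencies from the contingency table; the dictionary layout and the function name `percent_of_grand_total` are illustrative choices, not part of the original slides.

```python
# Survey counts from the contingency table (rows: gender, columns: news medium).
counts = {
    "Male": {"TV": 280, "Radio": 175, "NP": 305},
    "Female": {"TV": 115, "Radio": 275, "NP": 170},
}


def percent_of_grand_total(table):
    """Express every cell as a percentage of the grand total, rounded to 0.1."""
    grand_total = sum(sum(row.values()) for row in table.values())
    return {
        group: {col: round(100 * n / grand_total, 1) for col, n in row.items()}
        for group, row in table.items()
    }


percentages = percent_of_grand_total(counts)
for group, row in percentages.items():
    print(group, row)
```

Each printed row should agree with the corresponding row of the percentage table, for example 13.3 for the Male/Radio cell.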

Percentages Based on Grand Total Illustration
These same statistics (numerical values describing sample results) can be shown in a (side-by-side) bar graph:
[Figure: side-by-side bar graph, "Percentages Based on Grand Total"; horizontal axis: Media (TV, Radio, NP); vertical axis: Percent; separate bars for Male and Female]

Percentages Based on Row (Column) Totals The entries in a contingency table may also be expressed as percentages of the row (column) totals by dividing each row (column) entry by that row’s (column’s) total and multiplying by 100. The entries in the contingency table below are expressed as percentages of the column totals: Note: These statistics may also be displayed in a side-by-side bar graph
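As a rough sketch of the column-total version, the same survey counts can be divided by their column totals instead of the grand total. The function name `percent_of_column_totals` and the data layout are assumptions made for this illustration; a row-total version would follow the same pattern with the roles of rows and columns swapped.

```python
# Survey counts from the contingency table (rows: gender, columns: news medium).
counts = {
    "Male": {"TV": 280, "Radio": 175, "NP": 305},
    "Female": {"TV": 115, "Radio": 275, "NP": 170},
}
columns = ["TV", "Radio", "NP"]


def percent_of_column_totals(table, cols):
    """Express every cell as a percentage of its column total, rounded to 0.1."""
    col_totals = {c: sum(row[c] for row in table.values()) for c in cols}
    return {
        group: {c: round(100 * row[c] / col_totals[c], 1) for c in cols}
        for group, row in table.items()
    }


column_percentages = percent_of_column_totals(counts, columns)
for group, row in column_percentages.items():
    print(group, row)
```

Within each column the Male and Female percentages add to 100, apart from rounding.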

One Qualitative & One Quantitative Variable
1. When bivariate data results from one qualitative and one quantitative variable, the quantitative values are viewed as separate samples
2. Each set is identified by levels of the qualitative variable
3. Each sample is described using summary statistics, and the results are displayed for side-by-side comparison
4. Statistics for comparison: measures of central tendency, measures of variation, 5-number summary
5. Graphs for comparison: dotplot, boxplot
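A minimal sketch of the side-by-side comparison, assuming hypothetical data: the bill amounts below are invented for illustration and are not the values from the slides. Quartiles are computed with the "inclusive" method of Python's `statistics.quantiles`, which is only one of several conventions and may differ slightly from a textbook's hand method.

```python
from statistics import median, quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile convention."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median(values), q3, max(values)


# Hypothetical June electric bills (dollars), one sample per level of the
# qualitative variable "part of the country". Not the data from the slides.
bills = {
    "Northeast": [23.75, 31.00, 38.50, 46.20, 52.80, 59.40],
    "Midwest": [34.10, 36.25, 37.90, 39.40, 41.75, 44.30],
    "West": [40.50, 47.80, 52.30, 55.10, 58.60, 63.90],
}

summaries = {region: five_number_summary(v) for region, v in bills.items()}
for region, summary in summaries.items():
    print(region, summary)
```

Printing one summary per level puts the samples next to each other, which is the numerical counterpart of a side-by-side boxplot.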

Example: A random sample of households from three different parts of the country was obtained and their electric bill for June was recorded. The data is given in the table below: The part of the country is a qualitative variable with three levels of response. The electric bill is a quantitative variable. The electric bills may be compared with numerical and graphical techniques.

Comparison Using Dotplots
[Figure: dotplots of the electric bills for the Northeast, Midwest, and West on a common scale from 24.0 to 64.0]
The electric bills in the Northeast tend to be more spread out than those in the Midwest. The bills in the West tend to be higher than those in both the Northeast and the Midwest.

Comparison Using Box-and-Whisker Plots
[Figure: box-and-whisker plots, "The Monthly Electric Bill"; axis: Electric Bill]

Two Quantitative Variables
1. Expressed as ordered pairs: (x, y)
2. x: input variable, independent variable; y: output variable, dependent variable
Scatter Diagram: A plot of all the ordered pairs of bivariate data on a coordinate axis system. The input variable x is plotted on the horizontal axis, and the output variable y is plotted on the vertical axis.
Note: Use scales so that the range of the y-values is equal to or slightly less than the range of the x-values. This creates a window that is approximately square.

Example: In a study involving children’s fear related to being hospitalized, the age and the score each child made on the Child Medical Fear Scale (CMFS) are given in the table below: Construct a scatter diagram for this data.

Solution: age = input variable, CMFS = output variable
[Figure: scatter diagram, "Child Medical Fear Scale"; horizontal axis: Age; vertical axis: CMFS]

3.2 ~ Linear Correlation
Measures the strength of a linear relationship between two variables
As x increases, no definite shift in y: no correlation
As x increases, a definite shift in y: correlation
Positive correlation: x increases, y increases
Negative correlation: x increases, y decreases
If the ordered pairs follow a straight-line path: linear correlation

Example: No Correlation
As x increases, there is no definite shift in y:
[Figure: scatter diagram; horizontal axis: Input; vertical axis: Output]

Example: Positive Correlation
As x increases, y also increases:
[Figure: scatter diagram; horizontal axis: Input; vertical axis: Output]

Example: Negative Correlation
As x increases, y decreases:
[Figure: scatter diagram; horizontal axis: Input; vertical axis: Output]

Please Note
Perfect positive correlation: all the points lie along a line with positive slope
Perfect negative correlation: all the points lie along a line with negative slope
If the points lie along a horizontal or vertical line: no correlation
If the points exhibit some other nonlinear pattern: no linear relationship, no correlation
Need some way to measure correlation

Coefficient of Linear Correlation: r, measures the strength of the linear relationship between two variables
Pearson’s Product Moment Formula: $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}$, where $s_x$ and $s_y$ are the sample standard deviations of x and y
Notes:
r = +1: perfect positive correlation
r = -1: perfect negative correlation

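Alternate Formula for r
$r = \frac{SS(xy)}{\sqrt{SS(x)\, SS(y)}}$
SS(x) = "sum of squares for x" = $\sum x^2 - \frac{(\sum x)^2}{n}$
SS(y) = "sum of squares for y" = $\sum y^2 - \frac{(\sum y)^2}{n}$
SS(xy) = "sum of squares for xy" = $\sum xy - \frac{\sum x \sum y}{n}$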
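The alternate formula translates directly into code. The sketch below assumes the usual definitions SS(x) = Σx² − (Σx)²/n, SS(y) = Σy² − (Σy)²/n, SS(xy) = Σxy − (Σx)(Σy)/n and r = SS(xy)/√(SS(x)·SS(y)); the function names and the sample data are hypothetical and are not taken from the slides.

```python
from math import sqrt


def sums_of_squares(x, y):
    """Return (SS(x), SS(y), SS(xy)) computed from the raw sums."""
    n = len(x)
    ss_x = sum(v * v for v in x) - sum(x) ** 2 / n
    ss_y = sum(v * v for v in y) - sum(y) ** 2 / n
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    return ss_x, ss_y, ss_xy


def linear_correlation(x, y):
    """Linear correlation coefficient r = SS(xy) / sqrt(SS(x) * SS(y))."""
    ss_x, ss_y, ss_xy = sums_of_squares(x, y)
    return ss_xy / sqrt(ss_x * ss_y)


# Hypothetical data: y tends to fall as x rises, so r should be negative.
x = [1, 2, 3, 4, 5]
y = [9, 7, 8, 4, 3]
print(round(linear_correlation(x, y), 2))
```

Points that lie exactly on a rising line give r = +1 and points on a falling line give r = -1, which makes a quick sanity check for the function.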

Example: The table below presents the weight (in thousands of pounds) x and the gasoline mileage (miles per gallon) y for ten different automobiles. Find the linear correlation coefficient:

Completing the Calculation for r
$r = \frac{SS(xy)}{\sqrt{SS(x)\, SS(y)}} = \frac{-42.79}{\sqrt{(7.449)(1116.9)}} = -0.47$
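The final arithmetic step can be checked quickly. This sketch assumes the three sums of squares shown on the slide are SS(xy) = -42.79, SS(x) = 7.449 and SS(y) = 1116.9; the variable names are illustrative.

```python
from math import sqrt

# Sums of squares as read from the slide (assumed assignment to x, y, xy).
ss_xy = -42.79
ss_x = 7.449
ss_y = 1116.9

r = ss_xy / sqrt(ss_x * ss_y)
print(round(r, 2))
```

The negative sign is what one would expect here: heavier automobiles tend to get lower gasoline mileage.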

Please Note
r is usually rounded to the nearest hundredth
r close to 0: little or no linear correlation
As the magnitude of r increases, towards -1 or +1, there is an increasingly stronger linear correlation between the two variables
r can be estimated from the scatter diagram, provided the window is approximately square; the estimate is useful for checking calculations

3.3 ~ Linear Regression
Regression analysis finds the equation of the line that best describes the relationship between two variables
One use of this equation: to make predictions

Models or Prediction Equations
Some examples of various possible relationships:
Linear: $\hat{y} = b_0 + b_1 x$
Quadratic: $\hat{y} = a + bx + cx^2$
Exponential: $\hat{y} = a(b^x)$
Logarithmic: $\hat{y} = a \log_b x$
Note: What would a scatter diagram look like to suggest each relationship?

Method of Least Squares
Equation of the best-fitting line: $\hat{y} = b_0 + b_1 x$
Predicted value: $\hat{y}$
Least squares criterion: Find the constants b0 and b1 such that the sum $\sum (y - \hat{y})^2 = \sum (y - (b_0 + b_1 x))^2$ is as small as possible
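To make the criterion concrete, the sketch below evaluates the sum of squared differences between observed and predicted y for two candidate lines. The data and the candidate coefficients are hypothetical; for this particular data set the line with b0 = 2.2 and b1 = 0.6 happens to be the least squares line, so any other line should give a larger sum.

```python
def sum_squared_errors(x, y, b0, b1):
    """Sum of (y - y_hat)^2 for the candidate line y_hat = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

sse_best = sum_squared_errors(x, y, b0=2.2, b1=0.6)   # least squares line for this data
sse_other = sum_squared_errors(x, y, b0=2.0, b1=0.7)  # a nearby alternative line

print(round(sse_best, 2), round(sse_other, 2))
```

The first value is smaller than the second, which is what the least squares criterion asks for.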

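Illustration
Observed and predicted values of y:
[Figure: the line $\hat{y} = b_0 + b_1 x$ with an observed point (x, y), the predicted point (x, $\hat{y}$) on the line, and the vertical distance $y - \hat{y}$ between them]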

The Line of Best Fit Equation
The equation is determined by:
b0: y-intercept
b1: slope
Values that satisfy the least squares criterion:
$b_1 = \frac{SS(xy)}{SS(x)}$, $b_0 = \frac{\sum y - b_1 \sum x}{n}$
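A minimal sketch of computing the two constants, assuming the formulas b1 = SS(xy)/SS(x) and b0 = (Σy − b1·Σx)/n. The function name `line_of_best_fit` and the data are hypothetical and are not taken from the slides.

```python
def line_of_best_fit(x, y):
    """Return (b0, b1) for the least squares line y_hat = b0 + b1 * x."""
    n = len(x)
    ss_x = sum(v * v for v in x) - sum(x) ** 2 / n
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    b1 = ss_xy / ss_x                   # slope
    b0 = (sum(y) - b1 * sum(x)) / n     # y-intercept
    return b0, b1


# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = line_of_best_fit(x, y)
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
print(round(b0, 4), round(b1, 4))
print(round(b0 + b1 * x_bar, 4), y_bar)  # the line evaluated at the mean of x
```

The second printed line shows that the fitted line, evaluated at the mean of x, returns the mean of y.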

Example: A recent article measured the job satisfaction of subjects with a 14-question survey. The data below represents the job satisfaction scores, y, and the salaries, x, for a sample of similar individuals:
1) Draw a scatter diagram for this data
2) Find the equation of the line of best fit

Finding b1 & b0 Preliminary calculations needed to find b1 and b0:

Line of Best Fit
Solution 1)
$b_1 = \frac{SS(xy)}{SS(x)} = \frac{118.75}{229.5} = 0.5174$
$b_0 = \frac{\sum y - b_1 \sum x}{n} = \frac{133 - (0.5174)(234)}{8} = 1.4902$
Equation of the line of best fit: $\hat{y} = 1.49 + 0.517x$
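The arithmetic on this slide can be reproduced as a check. The sketch assumes the preliminary values SS(xy) = 118.75, SS(x) = 229.5, Σy = 133, Σx = 234 and n = 8, read from the calculation above; the variable names are illustrative.

```python
# Preliminary values as read from the slide (assumed assignment).
ss_xy = 118.75
ss_x = 229.5
sum_y = 133
sum_x = 234
n = 8

b1 = ss_xy / ss_x                 # slope, kept at full precision
b0 = (sum_y - b1 * sum_x) / n     # y-intercept, using the unrounded slope

print(round(b1, 4), round(b0, 4))
```

Carrying the unrounded slope into the intercept is what gives 1.4902; substituting the rounded 0.5174 instead would give roughly 1.4911, which illustrates why extra decimal places are kept during the calculation.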

Scatter Diagram
Solution 2)
[Figure: scatter diagram, "Job Satisfaction Survey"; horizontal axis: Salary; vertical axis: Job Satisfaction]

Please Note
Keep at least three extra decimal places while doing the calculations to ensure an accurate answer
When rounding off the calculated values of b0 and b1, always keep at least two significant digits in the final answer
The slope b1 represents the predicted change in y per unit increase in x
The y-intercept is the value of y where the line of best fit intersects the y-axis
The line of best fit will always pass through the point $(\bar{x}, \bar{y})$

Making Predictions
1. One of the main purposes for obtaining a regression equation is for making predictions
2. For a given value of x, we can predict a value of $\hat{y}$
3. The regression equation should be used to make predictions only about the population from which the sample was drawn
4. The regression equation should be used only to cover the sample domain on the input variable. You can estimate values outside the domain interval, but use caution and use values close to the domain interval.
5. Use current data. A sample taken in 1987 should not be used to make predictions in 1999.
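The caution about the sample domain can be built into a small helper. This is a sketch only: the coefficients 1.49 and 0.517 come from the job satisfaction example, while the domain limits `X_MIN` and `X_MAX` are hypothetical placeholders, since the data table is not reproduced in this transcript.

```python
def predict(x, b0, b1, x_min, x_max):
    """Return (y_hat, in_domain) for the line y_hat = b0 + b1 * x.

    in_domain is False when x lies outside the sampled range of the input
    variable, signalling that the prediction should be treated with caution.
    """
    y_hat = b0 + b1 * x
    in_domain = x_min <= x <= x_max
    return y_hat, in_domain


B0, B1 = 1.49, 0.517      # line of best fit from the job satisfaction example
X_MIN, X_MAX = 23, 36     # hypothetical sample domain for salary

for salary in (30, 50):
    y_hat, ok = predict(salary, B0, B1, X_MIN, X_MAX)
    note = "" if ok else " (outside the sample domain, use caution)"
    print(f"x = {salary}: predicted y = {y_hat:.2f}{note}")
```

Returning the flag alongside the prediction leaves the decision to the caller, which matches the advice that estimates outside the domain are possible but should be used carefully.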