We’re ‘NUT’ Giving Up Fundraiser One Grand Prize  Airline tickets Montreal/Ft Lauderdale Return  3-Nights’ Accommodation at Marriott Fort Lauderdale.

Presentation transcript:

We’re ‘NUT’ Giving Up Fundraiser One Grand Prize  Airline tickets Montreal/Ft Lauderdale Return  3-Nights’ Accommodation at Marriott Fort Lauderdale  2 Tickets to Florida Panthers Alumni Box  Dinner with Florida Panthers Jesse Winchester  2 Signed Florida Panthers Jerseys  2 Tickets to Miami Dolphins game 500 tickets sold at $100 each Should Mr. Lieff buy one?

We’re ‘NUT’ Giving Up Fundraiser  Airline tickets Montreal/Ft Lauderdale Return: $…  3-Nights’ Accommodation at Marriott Fort Lauderdale: $…  2 Tickets to Florida Panthers Alumni Box: $400  Dinner with Florida Panthers Jesse Winchester: $…  2 Signed Florida Panthers Jerseys: $…  2 Tickets to Miami Dolphins game: $200  TOTAL: $2500 E(X) = 2500 × 1/500 = 5 So the expected winnings are $5 per $100 ticket, an expected net loss of $95. You are better off taking your $100 to a blackjack table where E(X) = 98.5!
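
The expected-value calculation on this slide can be sketched in a few lines of Python. This is only an illustration: it assumes exactly one winning ticket among the 500 sold and treats the $2500 total as the full prize value. The function name is made up for this example.

```python
from fractions import Fraction

def expected_winnings(prize_value, tickets_sold):
    """Expected prize value per ticket, assuming exactly one ticket wins."""
    return Fraction(prize_value, tickets_sold)

prize_value = 2500   # total prize value in dollars (from the slide)
tickets_sold = 500   # tickets sold (from the slide)
ticket_price = 100   # price per ticket in dollars (from the slide)

expected = expected_winnings(prize_value, tickets_sold)
net = expected - ticket_price  # expected gain after paying for the ticket

print(f"Expected winnings per ticket: ${float(expected):.2f}")
print(f"Expected net result per ticket: ${float(net):.2f}")
```

A negative net result means that, on average, each ticket costs more than it returns, which is normal for a fundraiser.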

1.3 Trends in Data Questions? pp. 20–24 #1, 4, 9, 11, 14 Learning goals: Describe the trend and correlation in a scatter plot Use a line of best fit to make predictions MSIP / Home Learning: p. 37 #2, 3, (6-7 or 8)

Variables Variable (Mathematics)  a symbol denoting an unknown quantity (x, y, θ, etc.) Variable (Statistics)  A measurable attribute; these typically vary over time or between individuals  e.g., height, age, favourite hockey team  Can be discrete, continuous or categorical Continuous: Weight (digital scale) Discrete: Number of siblings Categorical: Hair colour

Scatter Plot a graph that shows two numeric variables each axis represents a variable each point indicates a pair of values (x, y) may show a trend

The Two Types of Variables on a Scatter Plot Independent Variable  Horizontal axis  Time is independent (why?)  Timing is dependent (e.g., time to run 100m) Dependent Variable  Values depend on the independent variable  Vertical axis Format: “dependent vs. independent”  e.g., a graph of arm span vs. height means arm span is the dependent variable and height is the independent

What is a trend? the ‘direction’ of the data a pattern of average behavior that occurs over time e.g., costs tend to increase over time (inflation) need two variables to exhibit a trend (time can be one)

An Example of a trend U.S. population from 1780 to 1960 Describe the trend

Correlations Strength can be…  None – no clear pattern in the data  Weak – data loosely follows a pattern  Strong – data follows a clear pattern If strong or weak, the direction can be…  Positive – data rises from left to right (overall): as x increases, y increases  Negative – data drops from left to right (overall): as x increases, y decreases elationPicture.html Strong, positive linear correlation

AGENDA for Fri-Mon 1.3 Median-Median Line  Using a regression equation  Fathom Activity - Predict your weight as an NHL player 1.4 Trends With Technology  Correlation Coefficient (R)  Coefficient of Determination (R²)  Residuals / Least-Squares Line Fathom Investigation: finding the Least Squares Line

Line of Best Fit A straight line that represents the trend in the data Can be used to make predictions (graph or equation) Can be drawn or calculated  Fathom has 3: movable, median-median, least squares Gives no measurement of the strength of the trend (that’s next class!)

An example line of best fit This is temperature recycling data with a median-median line added. What type of trend are we looking at?

Median-Median Line

Creating a Median-Median Line Divide the points (ordered by x) into 3 symmetric groups  If there is 1 extra point, include it in the middle group  If there are 2 extra points, include one in each end group Calculate the median x- and y-coordinates for each group and plot the 3 median points (x, y) If the median points are in a straight line, connect them  Otherwise, draw a line through the two outer median points, then shift it 1/3 of the way toward the middle median point, keeping the same slope; this is the line of best fit
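
The construction above can be sketched in Python as follows. This is a minimal sketch, not Fathom's implementation: it sorts the points by x, ignores the special handling some textbooks apply to tied x-values at group boundaries, and assumes the two outer median points have different x-coordinates. The function name and the sample data are hypothetical.

```python
from statistics import median

def median_median_line(points):
    """Return (slope, intercept) of the median-median line for (x, y) pairs."""
    pts = sorted(points)              # order the points by x
    n = len(pts)
    if n < 3:
        raise ValueError("need at least 3 points")
    size, extra = divmod(n, 3)
    # 1 extra point goes to the middle group; 2 extras go one to each end group.
    end_size = size + (1 if extra == 2 else 0)
    groups = [pts[:end_size], pts[end_size:n - end_size], pts[n - end_size:]]
    # Median x and median y are taken separately within each group.
    (x1, y1), (x2, y2), (x3, y3) = [
        (median(x for x, _ in g), median(y for _, y in g)) for g in groups
    ]
    slope = (y3 - y1) / (x3 - x1)     # slope through the two outer median points
    b_outer = y1 - slope * x1         # intercept of the line through the outer points
    b_middle = y2 - slope * x2        # intercept of the parallel line through the middle point
    intercept = b_outer + (b_middle - b_outer) / 3   # shift 1/3 of the way
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1.
sample = [(x, 2 * x + 1) for x in range(9)]
slope, intercept = median_median_line(sample)
print(slope, intercept)
```

When the three median points already lie on a straight line, the shift is zero and the result is simply the line through them.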

Median-Median Line (10 points)

Lines of Best Fit – why 3? Drawing a line of best fit is arbitrary  Hit as many points as possible  Have the same number of points above and below the line  Outliers tend to be ignored The median-median line is easy to construct and takes the spread of the data into consideration The least-squares line takes every point into consideration but is based on a complicated formula Good-Better-Best is a recurring theme in this course  3.3 Measures of Spread (Range, IQR, StdDev)

Using a regression equation The equation of a line of best fit will be in the form y = mx + b e.g., Toronto Maple Leafs roster on 3-Oct-13  W = 7.25H – 332 Mr. Lieff is 73.5” tall. His weight as a Maple Leaf would be:  W = 7.25(73.5) – 332 = 200.875, or 201 lbs.
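
As a quick check of this kind of prediction, the regression equation W = 7.25H – 332 from the slide can be wrapped in a small Python function. The function name is invented for this sketch; the coefficients are the ones given for the Maple Leafs roster.

```python
def predict_weight_lbs(height_in):
    """Predicted weight in pounds from height in inches, using W = 7.25H - 332."""
    return 7.25 * height_in - 332

predicted = predict_weight_lbs(73.5)   # Mr. Lieff's height from the slide
print(predicted, round(predicted))
```

The model only describes the players it was fitted to, so predictions for heights far from those on the roster should be treated with caution.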

Fathom Activity – How much would you weigh as an NHL player? To Generate and Import Data: Click  Pick a group of players that you want to associate with TEAM: Pick your favourite OR select Position, Country, Status, etc.  Select REPORT  BIOS  Click GO>  Copy the URL Open Fathom Click File  Import  Import From URL Paste the URL Double-click the Collection name and shorten it Expand the Collection, right-click the first case and click Cut Case.

To create a graph of Weight vs. Height Create a scatter plot of Weight vs. Height  Double click the Collection icon (cardboard box)  Click the Cases tab  Create a graph in the workspace  Drag Weight and Height to the respective axes Which is dependent? Right-click and select Median-Median Line Use the equation to:  Predict your weight based on your height  Discuss with a neighbour: is the prediction reasonable? Are there any limitations to the model?  Extension: How would you predict your NHL height based on your current weight?

Scatter Plots - Summary A graph that compares two numeric variables  One is dependent on the other May show a correlation  positive/negative  strong/weak A line may be a good model  Median-Median and Least-Squares  If not, non-linear (can be quadratic, exponential, logarithmic, etc.) Excel can do these

1.4 Trends in Data Using Technology Learning goal: Describe and measure the strength of trends Questions? p. 37 #2, 3, (6-7 or 8) MSIP / Home Learning: p. 51 #1-2, 3-5 (Fathom), 8

Regression The process of fitting a line or curve to a set of data A line of best fit is a linear regression (Excel or Fathom) A curve can be quadratic, cubic, exponential, logarithmic, etc. (Excel) We do this to generate a mathematical model (graph or equation) We can use the equation to make predictions  Interpolation – within the span of the data  Extrapolation – outside of the span of the data

Example armspan = 0.87 height + 22 y = 0.87x + 22 What is the arm span of a student who is 175 cm tall?  y = 0.87(175) + 22  = 174.25 cm How tall is a student with a 160 cm arm span?  y = 0.87x + 22  160 = 0.87x + 22  160 – 22 = 0.87x  138 = 0.87x  x = 138 ÷ 0.87  ≈ 158.6 cm
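
The two calculations in this example (predicting y from x, and solving for x given y) can be sketched in Python, along with a check for whether a prediction is an interpolation or an extrapolation. The coefficients come from the slide; the function names and the list of observed heights are hypothetical, made up only to illustrate the check.

```python
SLOPE = 0.87       # from the slide: armspan = 0.87 * height + 22
INTERCEPT = 22

def arm_span_from_height(height_cm):
    """Use the model directly: y = 0.87x + 22."""
    return SLOPE * height_cm + INTERCEPT

def height_from_arm_span(arm_span_cm):
    """Rearrange the model to solve for x: x = (y - 22) / 0.87."""
    return (arm_span_cm - INTERCEPT) / SLOPE

def is_interpolation(x, observed_xs):
    """True if x lies within the span of the data used to fit the model."""
    return min(observed_xs) <= x <= max(observed_xs)

# Hypothetical heights (cm) of the students in the sample.
observed_heights = [152, 160, 168, 171, 175, 180, 188]

print(round(arm_span_from_height(175), 2))
print(round(height_from_arm_span(160), 2))
print(is_interpolation(175, observed_heights))   # inside the data range
print(is_interpolation(210, observed_heights))   # outside the data range
```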

Correlation Coefficient r² is the coefficient of determination  Takes on values from 0 to 1  r² is the proportion of the variation in the y-variable that is explained by the linear relationship with x  if r² = 0.52 for the Leafs weight vs. height, 52% of the variation in weight is explained by height r is the correlation coefficient  indicates the strength and direction of a linear relationship  r = 0: no linear relationship  r = 1: perfect positive correlation  r = –1: perfect negative correlation
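
For readers who want to see where r and r² come from, here is a minimal Python sketch of the standard Pearson formula. Fathom and Excel compute these values for you; the function name and the height/weight data below are hypothetical and are not the Leafs roster.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical heights (in) and weights (lbs), for illustration only.
heights = [70, 71, 72, 73, 74, 75]
weights = [185, 190, 200, 198, 210, 215]

r = pearson_r(heights, weights)
r_squared = r ** 2   # coefficient of determination for a linear fit
print(round(r, 3), round(r_squared, 3))
```

Data lying exactly on a rising line gives r = 1, and data lying exactly on a falling line gives r = –1.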

Residuals A residual is the vertical distance between a point and the line of best fit. If the model you are considering is a good fit, the residuals should be small and have no noticeable pattern. The least-squares line minimizes the sum of the squares of the residuals.
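
The idea that the least-squares line minimizes the sum of squared residuals can be illustrated with a short Python sketch. It uses the standard closed-form formulas for slope and intercept and keeps the sign of each residual (observed minus predicted), so a negative residual means the point lies below the line. The function names and data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def residuals(xs, ys, slope, intercept):
    """Signed residuals: observed y minus the y predicted by the line."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

def sum_squared_residuals(xs, ys, slope, intercept):
    return sum(r ** 2 for r in residuals(xs, ys, slope, intercept))

# Hypothetical heights (in) and weights (lbs), for illustration only.
heights = [70, 71, 72, 73, 74, 75]
weights = [185, 190, 200, 198, 210, 215]

slope, intercept = least_squares_line(heights, weights)
best = sum_squared_residuals(heights, weights, slope, intercept)
nudged = sum_squared_residuals(heights, weights, slope + 0.1, intercept)
print(best <= nudged)   # any other line has a sum of squares at least as large
```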

Least Squares Line Weight vs. Height (NHL) w = 7.23h – 325

Using the equation How much does a player who is 71 in tall weigh?  w = 7.23(71) – 325  = 188.33 lbs How tall is a player who weighs 180 lbs?  w = 7.23h – 325  h = (w + 325) ÷ 7.23  So h = (180 + 325) ÷ 7.23  = 69.85” or 177.4 cm

NHL Least-Squares Line Activity See handout

1.5 Comparing Apples to Oranges oranges/

The Power of Data Chapter 1.5 – The Media Mathematics of Data Management (Nelson) MDM 4U There are 3 kinds of lies: lies, damn lies and statistics.

Example 1 – Changing the scale on the axis Why is the following graph misleading?

Example 1 – Scale from 0 Consider that this is a bar graph – could it still be misleading?

Include every category!

Example 2 – Using a Small Sample For the following surveys, consider:  The sample size  If there is any (mis)leading language

Example 2 – Using a Small Sample “4 out of 5 dentists recommend Trident sugarless gum to their patients who chew gum.” “In the past, we found errors in 4 out of 5 of the returns people brought in for a Second Look review.” (H&R Block) “Did you know that 1 in 4 women can misread a traditional pregnancy test result?” (Clearblue Easy Digital Pregnancy Test) “Using Pedigree® DentaStix® daily can reduce the build up of tartar by up to 80%.” “Did you know that the average Canadian wastes $500 of food in a year?” (Zip-Lock Freezer bags)

Details on the Trident Survey How many dentists did they ask?  Actual number: …  4 out of 5 is convincing but reasonable  5 out of 5 is preposterous  3 out of 5 is good but not great  Actual statistic: 85% Recommend Trident over what?  There were 2 other options: chewing sugared gum, or not chewing gum

Misleading Statements(?) How could these statements be misleading? “More people stay with Bell Mobility than any other provider.” “Every minute of every hour of every business day, someone comes back to Bell.”

“More people stay with Bell Mobility than any other provider.” Does not specify how many more customers stay with Bell.  e.g. Percentage of customers renewing their plan: Bell: 30% Rogers: 29% Telus: 25% Fido: 28% Did they compare percentages or totals? What does it mean to “stay with Bell”? Honour entire contract? Renew contract at the end of a term? Are early terminations factored in? If so, does Bell have a higher cost for early terminations? Competitors’ renewal rates may have decreased due to family plans / bundling Does the data include Private / Corporate plans?
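
The question "Did they compare percentages or totals?" can be made concrete with a small Python sketch. The renewal percentages are the example figures from the slide; the customer counts are entirely hypothetical, invented here only to show that the provider with the highest renewal rate need not be the one with the most renewing customers.

```python
# Renewal rates in percent (example figures from the slide).
renewal_rate_pct = {"Bell": 30, "Rogers": 29, "Telus": 25, "Fido": 28}

# HYPOTHETICAL customer counts, invented for illustration only.
customers = {"Bell": 1_000_000, "Rogers": 1_200_000,
             "Telus": 900_000, "Fido": 500_000}

# Number of renewing customers per provider.
renewals = {p: customers[p] * renewal_rate_pct[p] // 100 for p in customers}

top_by_rate = max(renewal_rate_pct, key=renewal_rate_pct.get)
top_by_total = max(renewals, key=renewals.get)

print("Highest renewal rate:", top_by_rate)
print("Most renewing customers:", top_by_total)
```

With these made-up counts the two rankings disagree, so "more people stay" could be true under one measure and false under the other.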

“Every minute of every hour of every business day, someone comes back to Bell.” 60 mins x 7 hours x 5 days = 2 100/wk What does it mean to “Come back to Bell”? How many hours in a business day?
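
The weekly figure depends directly on the answer to "How many hours in a business day?". A short Python sketch makes that dependence visible; the function name is made up, and the 8-hour case is an assumed alternative, not a figure from the slide.

```python
def returns_per_week(hours_per_business_day, business_days_per_week=5):
    """One returning customer per minute of every business hour."""
    return 60 * hours_per_business_day * business_days_per_week

print(returns_per_week(7))   # the slide's 7-hour business day
print(returns_per_week(8))   # an assumed 8-hour business day
```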

How does the media use (misuse) data? To inform the public about world events in an objective manner It sometimes gives misleading or false impressions to sway the public or to increase ratings It is important to:  Study statistics to understand how information is represented or misrepresented  Correctly interpret tables/charts presented by the media

MSIP / Homework Read pp. 57 – 60 Ex. 1-2 Complete p. 60 #1-6 Final Project Example – Manipulating Data (on wiki) Final Project Example – Manipulating Data Examples