Stat 155, Section 2, Last Time Normal Distribution: –Interpretation: 68%-95%-99.7% rule –Computation of areas (frequencies) –Inverse Normal area computation.


Stat 155, Section 2, Last Time Normal Distribution: –Interpretation: 68%-95%-99.7% rule –Computation of areas (frequencies) –Inverse Normal area computation Diagnostics (for Normal approximation) –Normal Quantile plot (linear?) Relations between variables –Scatterplots – useful visualization

Reading In Textbook Approximate Reading for Today’s Material: Pages , , Approximate Reading for Next Class: Pages , ,

Special Request Date: Mon, 29 Jan :37: Subject: special request Professor Marron, I was wondering if you could do a problem like C5 in class tomorrow? I went through the notes, and I went through the workbook for Excel but it seems as though I just can't seem to get how to make a normal standard distribution. Thank you.

An Earlier Date: Thu, 25 Jan :34: (EST)
> I don't understand how you draw a density curve on excel without having data points. In the NORMDIST function, you need data to go along with u, s, and False. How do you draw it without, or where is the data?
Right, in the class example that we considered (Stor155Eg8Done.xls), we thought about fitting a normal curve to a data set. We did this by taking the mean and s.d. of the data set, and then using those to generate the appropriate member of the family of normal curves. Problem C5 is in some sense easier, since basically the work of calculating the mean and s.d. is already done (note they are given as 63.1 and 4.8). So you only need to go through the other steps of generating the graphics input. I guess that one question that will come up is "what to use for endpoints of the x grid?" You could experiment a bit, but usually mean +- 3 s.d. gives a nice looking curve. We will see why in today's class meeting.

Another Date: Sun, 28 Jan :32: (EST)
> Hey Professor Marron, I'm having some problems with C5. Ok, so I opened excel spreadsheet, I typed in the mean and median, I did mean+ 3sd mean- 3sd For the X-value under NOrdmdist I put in the two x-values and then put in the sd and mean, and put in FALSE for the cumulative since we want a height distribution, but it gave me back a number. Am I doing the wrong function?
Hmm, sounds like you may not be computing enough points to generate the plot. Basically you should generate a whole column of X-values, and then plug all of those into NORMDIST. An example of this is available in Class Eg 8, which is linked to page 19 of the notes for 1/23/07. In that spreadsheet, this grid is in cells E78-E178. The corresponding calls of NORMDIST appear in the range J78-J178. Then you plot those against each other.
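The grid-then-evaluate recipe from these two replies can also be sketched outside Excel. The following is a minimal Python sketch, not the class spreadsheet: the function name `normal_density` and the choice of 101 grid points (matching the size of the range E78-E178) are assumptions made for illustration, and the mean and s.d. are the values quoted for Problem C5.

```python
import math

def normal_density(x, mean, sd):
    """Height of the normal density curve at x.

    This plays the role of NORMDIST with cumulative = FALSE.
    """
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

mean, sd = 63.1, 4.8   # values given in Problem C5
n_points = 101         # assumption: same number of cells as E78-E178

# Column of x values running from mean - 3 s.d. to mean + 3 s.d.
lo, hi = mean - 3 * sd, mean + 3 * sd
step = (hi - lo) / (n_points - 1)
x_grid = [lo + i * step for i in range(n_points)]

# One density height per grid point (the "column of NORMDIST calls")
heights = [normal_density(x, mean, sd) for x in x_grid]

# Print a few rows; plotting heights against x_grid gives the curve
for x, h in list(zip(x_grid, heights))[::25]:
    print(f"x = {x:6.2f}   height = {h:.5f}")
```

A single call on one x value returns one number, which is exactly what the student saw; the curve only appears once the whole column of heights is plotted against the whole column of x values.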

Decision Problem: When should I do additional things in class? When should I send people to Open Tutorial Sessions? Depends on # benefitted 1 or 2: send to Open Tutorials Majority: should do in Class

Your Opinion? 1.Raise hand if you think this is worth class time right now. 2.Raise hand if you find this prospect boring, and want to move on instead. If we do this, go to Class Example 8:

Variable Relationships Chapter 2 in Text Idea: Look beyond single quantities, to how quantities relate to each other. E.g. How do HW scores “relate” to Exam scores? Section 2.1: Useful graphical device: Scatterplot

Plotting Bivariate Data Toy Example: (1,2) (3,1) (-1,0) (2,-1)

Plotting Bivariate Data Common Name: “Scatterplot” A look under the hood: EXCEL: Chart Wizard (colored bar icon) Chart Type: XY (scatter) Subtype controls points only, or lines Later steps similar to above (can massage the pic!)

Important Aspects of Relations I.Form of Relationship II.Direction of Relationship III.Strength of Relationship

I.Form of Relationship Linear: Data approximately follow a line Previous Class Scores Example Final vs. High values of HW is “best” Nonlinear: Data follows different pattern Nice Example: Bralower’s Fossil Data

Bralower’s Fossil Data From T. Bralower, formerly of Geological Sci. Studies Global Climate, millions of years ago: Ratios of Isotopes of Strontium Reflects Ice Ages, via Sea Level (50 meter difference!) As function of time Clearly nonlinear relationship

II. Direction of Relationship Positive Association (slopes upwards) X bigger ⇒ Y bigger Negative Association (slopes down) X bigger ⇒ Y smaller E.g. X = alcohol consumption, Y = Driving Ability Clear negative association

III. Strength of Relationship Idea: How close are points to lying on a line? Revisit Class Scores Example: Final Exam is “closely related to HW” Midterm 1 less closely related to HW Midterm 2 even less related to Midterm 1

Linear Relationship HW 2.3, 2.5, 2.7, 2.11

Comparing Scatterplots Additional Useful Visual Tool: Overlaying multiple data sets Allows comparison Use different colors or symbols Easy in EXCEL (colors are automatic) Already done in HW scores example:

Comparing Scatterplots HW 2.17

And Now for Something Completely Different Remember it takes a college degree to fly a plane, but only a high school diploma to fix one. After every flight, Qantas pilots fill out a form, called a gripe sheet which tells mechanics about problems with the aircraft. The mechanics correct the problems, document their repairs on the form, and then pilots review the gripe sheets before the next flight.

And Now for Something Completely Different Never let it be said that ground crews lack a sense of humor. Here are some actual maintenance complaints submitted by Qantas' pilots (marked with a P) and the solutions recorded (marked with an S) by maintenance engineers. By the way, Qantas is the only major airline that has never, ever, had an accident.

And Now for Something Completely Different P: Left inside main tire almost needs replacement. S: Almost replaced left inside main tire.

And Now for Something Completely Different P: Test flight OK, except auto-land very rough. S: Auto-land not installed on this aircraft.

And Now for Something Completely Different P: Dead bugs on windshield. S: Live bugs on back-order.

And Now for Something Completely Different P: Evidence of leak on right main landing gear. S: Evidence removed.

And Now for Something Completely Different P: IFF inoperative in OFF mode. S: IFF always inoperative in OFF mode.

And Now for Something Completely Different P: Number 3 engine missing. S: Engine found on right wing after brief search.

And Now for Something Completely Different P: Noise coming from under instrument panel. Sounds like a midget pounding on something with a hammer. S: Took hammer away from midget.

Section 2.2: Correlation Main Idea: Quantify Strength of Relationship Context: –A numerical summary –In spirit of mean and standard deviation –But now applies to pairs of variables

Section 2.2: Correlation Main Idea: Quantify Strength of Relationship Specific Goals: –Near 1: for positive relat’ship & nearly linear –> 0: for positive relationship (slopes up) –= 0: for no relationship –< 0: for negative relationship (slopes down) –Near -1: for negative relat’ship & nearly linear

Correlation - Approach Numerical Approach: for data $(x_1, y_1), \dots, (x_n, y_n)$ symmetric around $(0, 0)$, the sum of products $\sum_{i=1}^{n} x_i y_i$ has similar properties Worked out Example:

Correlation – Graphical View Plots (a) & (b), illustrating $\sum_{i=1}^{n} x_i y_i$: > 0 for positive relationship < 0 for negative relationship Bigger for data closer to line Problem 1: Not between -1 & 1 Problem 2: Feels “Scale”, see plot (c) Problem 3: Feels “Shift” even more, see (d) (even gets sign wrong!)

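Correlation - Approach Solution to above problems: Standardize! Define Correlation $r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$, where $\bar{x}$, $\bar{y}$ are the means and $s_x$, $s_y$ the standard deviations of the x's and y's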

Correlation - Example Revisit above example r is always same, and ~1, for (a), (c), (d) r < 0, and not so close to -1, for (b)

Correlation - Example A look under the hood Cols A&B: generated random numbers (will study later) Product versions used SUMPRODUCT r computed with CORREL (important) r’s same for (a) & (c), since Y’s are “just shifted” r’s also same for (d), since x’s and Y’s shifted (standardization cancels shifts & scales)
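The claim that standardization cancels shifts and scales can be checked numerically. The sketch below is a Python illustration rather than the class spreadsheet: the data values and the helper names (`product_sum`, `sample_sd`, `correlation`) are hypothetical, and the correlation is computed from the usual standardized-product definition with an n - 1 divisor, which should agree with what CORREL reports.

```python
import math

def product_sum(xs, ys):
    """Raw sum of products (what SUMPRODUCT computes)."""
    return sum(x * y for x, y in zip(xs, ys))

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (n - 1 divisor)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """Average product of standardized x's and y's (n - 1 divisor)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    total = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Hypothetical data, roughly symmetric around (0, 0), sloping upward
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-1.8, -1.1, 0.2, 0.9, 2.1]

ys_scaled = [10 * y for y in ys]    # change of scale in Y
xs_shifted = [x - 5 for x in xs]    # shift the x's left
ys_shifted = [y + 5 for y in ys]    # shift the Y's up

for label, a, b in [("original", xs, ys),
                    ("Y scaled", xs, ys_scaled),
                    ("both shifted", xs_shifted, ys_shifted)]:
    print(f"{label:13s} product sum = {product_sum(a, b):8.2f}   "
          f"r = {correlation(a, b):.4f}")
```

In this example the raw product sum grows with the scale change and turns negative after the shift, while r is unchanged in all three cases.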

Correlation - Example Revisit Class Scores Example: r is always > 0 r is biggest for Final vs. HW r is smallest for MT2 vs. MT1

Correlation - Example Fun Example from Publisher’s Website: Choose Statistical Applets Correlation and Regression Gives feeling for how correlation is affected by changing data.

Correlation - Example Fun Example from Publisher’s Website: Interesting Exercise: Choose points to give correlation r = 0.95 (within 0.01) Destroy with a few outliers

Correlation - HW HW: a

Correlation - Outliers Caution: Outliers can strongly affect correlation, r HW: 2.27b 2.30 (big outlier reduces correlation) Also: recompute correlation with outlier removed
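A small numerical illustration of this caution follows. It is a Python sketch with made-up data (not the data from the homework problems): six points close to a line, plus one hypothetical outlier at (7, 0). The helper name `correlation` is an assumption for illustration; it uses the algebraically equivalent form of r based on sums of squares and cross products.

```python
import math

def correlation(xs, ys):
    """Sample correlation r, computed from centered sums of products."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data lying close to a line
xs = [1, 2, 3, 4, 5, 6]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]

# Same data with one big outlier added
xs_out = xs + [7]
ys_out = ys + [0.0]

r_without = correlation(xs, ys)
r_with = correlation(xs_out, ys_out)

print(f"r without outlier: {r_without:.3f}")
print(f"r with outlier:    {r_with:.3f}")
```

Recomputing with the outlier removed, as the slide suggests, shows how much of the drop in r is due to that single point.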

And now for something completely different Recall Distribution of majors of students in this course:

And Now for Something Completely Different Tried to Google “Public Policy Jokes” But couldn’t find anything decent. Next tried “Public Health Jokes” And came up with…

And Now for Something Completely Different Regular Consumption of Guinness Well now, you see it's like this....

And Now for Something Completely Different A herd of buffalo can only move as fast as the slowest buffalo. And when the herd is hunted, it is the slowest and weakest ones at the rear that are killed. This natural selection is good for the herd as a whole because only the fittest survive thus improving the general health and speed of the entire herd.

And Now for Something Completely Different In much the same way the human brain only operates as quickly as the slowest of its brain cells. Excessive intake of alcohol kills brain cells, as we all know, and naturally the alcohol attacks the slowest/weakest cells first....

And Now for Something Completely Different So it is as plain as the nose on your face that regular consumption of Guinness will eliminate the weaker, slower brain cells thus leaving the remaining cells the best in the brain.

And Now for Something Completely Different The end result, of course, is a faster more efficient brain. If you doubt this at all, tell me, isn't it true that we always feel a bit smarter after a few pints?

Section 2.3: Linear Regression Idea: Fit a line to data in a scatterplot To learn about “basic structure” To “model data” To provide “prediction of new values”

Linear Regression Recall some basic geometry: A line is described by an equation: y = mx + b, where m = slope and b = y intercept Varying m & b gives a “family of lines”, Indexed by “parameters” m & b

Basics of Lines Textbook’s notation: Y = bx + a b = m (above) = slope a = b (above) = y-intercept

Basics of Lines HW (to review line ideas): C6: Fred keeps his savings in his mattress. He begins with $500 from his mother, and adds $100 each year. His total savings y, after x years are given by the equation: y = 500 + 100x (a) Draw a graph of this equation.

Basics of Lines C6: (cont.) (b) After 20 years, how much will Fred have? ($2500) (c) If Fred adds $200 instead of $100 each year to his initial $500, what is the equation that describes his savings after x years? (y = 500 + 200x)
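Fred's savings form a line with intercept 500 (the starting amount) and slope equal to the yearly deposit. As a quick check of the answers to C6, here is a minimal Python sketch; the function name `savings` and its parameter names are hypothetical.

```python
def savings(years, start=500, yearly_deposit=100):
    """Total savings after `years` years: intercept + slope * years."""
    return start + yearly_deposit * years

# Part (b): $100 per year for 20 years
print(savings(20))

# Part (c): same starting amount, but $200 per year
print(savings(20, yearly_deposit=200))
```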

Linear Regression Approach: Given a scatterplot of data: Find a & b (i.e. choose a line) to “best fit the data”

Linear Regression - Approach Given a line, $y = bx + a$, “indexed” by $a$ and $b$: Define “residuals” = “data Y” – “Y on line” = $y_i - (b x_i + a)$ Now choose $a$ and $b$ to make these “small”

Linear Regression - Approach Excellent Demo, by Charles Stanton, CSUSB Try choosing points near a line Then throw in outlier Clear and put points on curve Use “Residual Plot” to diagnose that line is not a good fit to data.

Linear Regression - Approach JAVA Demo, by David Lane at Rice U. Try drawing lines (to min MSE) Experiment with slopes And intercepts Guess r?

Linear Regression - Approach Make Residuals > 0, by squaring Least Squares: adjust $a$ and $b$ to Minimize the “Sum of Squared Errors” $SSE = \sum_{i=1}^{n} (y_i - (b x_i + a))^2$
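To make the least squares idea concrete, here is a minimal Python sketch applied to the toy data from the earlier scatterplot slide, (1,2) (3,1) (-1,0) (2,-1). It uses the textbook's notation y = bx + a. The closed-form expressions for a and b are the standard least squares solution, not something taken from these slides, and the function names are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y = b*x + a minimizing the sum of squared errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b

def sum_squared_errors(xs, ys, a, b):
    """Sum of squared residuals, residual = data Y - Y on line."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

# Toy example from the scatterplot slide
xs = [1, 3, -1, 2]
ys = [2, 1, 0, -1]

a, b = least_squares_line(xs, ys)
best = sum_squared_errors(xs, ys, a, b)
print(f"fitted line: y = {b:.4f} x + {a:.4f},  SSE = {best:.4f}")

# Nudging the slope or intercept away from the fit should not lower the SSE
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    trial = sum_squared_errors(xs, ys, a + da, b + db)
    print(f"a {da:+.1f}, b {db:+.1f}:  SSE = {trial:.4f}")
```

Comparing the SSE at the fitted values with the SSE at nearby values of a and b mirrors what the interactive demos do by hand: move the line, watch the squared errors, and keep the line that makes them smallest.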