Stat 31, Section 1, Last Time 2-way tables –Testing for Independence –Chi-Square distance between data & model –Chi-Square Distribution –Gives P-values.


1 Stat 31, Section 1, Last Time: 2-way tables (testing for independence, using the chi-square distance between data & model and the chi-square distribution, which gives P-values via CHIDIST); Simpson’s Paradox (lurking variables can reverse comparisons); recall of Linear Regression (fit a line to a scatterplot).

2 Recall Linear Regression Idea: fit a line to data in a scatterplot (recall Class Example 14, https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg14.xls). Purposes: to learn about “basic structure”, to “model data”, and to provide “prediction of new values”.

3 Inference for Regression Goal: develop hypothesis tests and confidence intervals for the slope and intercept parameters, a and b. Also study prediction.

4 Inference for Regression Idea: do statistical inference on the slope $a$ and the intercept $b$. Model: $Y_i = aX_i + b + e_i$ for $i = 1, \dots, n$. Assume: $e_1, \dots, e_n$ are random, independent and $N(0, \sigma)$.

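5 Inference for Regression Viewpoint: data are generated as follows: each $Y_i$ is chosen from a normal distribution centered on the line $y = ax + b$ above $X_i$, i.e. from $N(aX_i + b, \sigma)$. Note: $a$ and $b$ are “parameters”.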
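The data-generating viewpoint can be mimicked in a few lines of Python. This is a minimal sketch, assuming the normal-error model $Y_i = aX_i + b + e_i$ with $e_i$ independent $N(0, \sigma)$; the function name `simulate_regression_data` and the values $a = 2$, $b = 1$, $\sigma = 0.5$ are hypothetical, chosen only for illustration.

```python
import random


def simulate_regression_data(xs, a, b, sigma, seed=None):
    """Generate Y_i = a*X_i + b + e_i, with e_i independent N(0, sigma)."""
    rng = random.Random(seed)
    # Each Y_i is drawn from a normal curve centred on the line, at height a*X_i + b
    return [a * x + b + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical parameter values, for illustration only
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = simulate_regression_data(xs, a=2.0, b=1.0, sigma=0.5, seed=31)
for x, y in zip(xs, ys):
    print(x, round(y, 2))
```

Setting `sigma=0` puts every point exactly on the line $y = ax + b$, a quick way to see that $\sigma$ controls the scatter about the line.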

6 Inference for Regression Parameters $a$ and $b$ determine the underlying model (distribution). Estimate them with the Least Squares Estimates $\hat a$ and $\hat b$ (using SLOPE and INTERCEPT in Excel, based on the data).
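The least squares estimates have a short closed form: $\hat a = \sum (X_i - \bar X)(Y_i - \bar Y) / \sum (X_i - \bar X)^2$ and $\hat b = \bar Y - \hat a \bar X$, the same quantities Excel's SLOPE and INTERCEPT return. A minimal Python sketch (the function name is hypothetical), checked on toy data that lie exactly on $y = 2x + 1$:

```python
def least_squares(xs, ys):
    """Return (a_hat, b_hat), the least squares slope and intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a_hat = sxy / sxx              # slope
    b_hat = y_bar - a_hat * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return a_hat, b_hat


# Toy data lying exactly on y = 2x + 1 (made up for illustration)
print(least_squares([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0)
```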

7 Inference for Regression Distributions of $\hat a$ and $\hat b$? Under the above assumptions, the sampling distributions are $\hat a \sim N(a, \sigma_{\hat a})$ and $\hat b \sim N(b, \sigma_{\hat b})$. Centerpoints are right (unbiased); spreads are more complicated.
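A quick simulation can illustrate “centerpoints are right”: averaging many least squares estimates from data generated by the model lands close to the true $a$ and $b$. This is a sketch under assumed, hypothetical values ($a = 2$, $b = 1$, $\sigma = 0.5$, $X = 1, \dots, 8$, 5000 repetitions); it illustrates unbiasedness but does not prove it.

```python
import random
import statistics


def slope_intercept(xs, ys):
    """Least squares slope and intercept."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return a_hat, y_bar - a_hat * x_bar


def simulate_estimates(xs, a, b, sigma, reps=5000, seed=0):
    """Repeatedly generate data from the model and refit the line."""
    rng = random.Random(seed)
    a_hats, b_hats = [], []
    for _ in range(reps):
        ys = [a * x + b + rng.gauss(0.0, sigma) for x in xs]
        a_hat, b_hat = slope_intercept(xs, ys)
        a_hats.append(a_hat)
        b_hats.append(b_hat)
    return a_hats, b_hats


xs = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical design
a_hats, b_hats = simulate_estimates(xs, a=2.0, b=1.0, sigma=0.5)
print("average slope estimate:", round(statistics.fmean(a_hats), 3))
print("average intercept estimate:", round(statistics.fmean(b_hats), 3))
print("SD of slope estimates:", round(statistics.stdev(a_hats), 4))
```

The last line gives a simulated spread for $\hat a$; for this design, $\sigma / \sqrt{\sum (X_i - \bar X)^2} = 0.5/\sqrt{42} \approx 0.077$.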

8 Inference for Regression Formula for SD of $\hat a$: $\sigma_{\hat a} = \dfrac{\sigma}{\sqrt{\sum_{i=1}^{n} (X_i - \bar X)^2}}$. Big (small) for $\sigma$ big (small, resp.): accurate data → accurate estimate of slope. Small for x’s more spread out: data more spread → more accurate. Small for more data: more data → more accuracy.
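The three remarks about $\sigma_{\hat a}$ can be checked numerically. A minimal sketch, assuming $\sigma$ is known; the function name `sd_slope` and the x values are hypothetical.

```python
import math


def sd_slope(xs, sigma):
    """SD of the least squares slope: sigma / sqrt(sum of (X_i - X_bar)^2)."""
    x_bar = sum(xs) / len(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sigma / math.sqrt(sxx)


base = sd_slope([1, 2, 3, 4], sigma=1.0)
print(sd_slope([1, 2, 3, 4], sigma=0.5) < base)      # True: more accurate data
print(sd_slope([2, 4, 6, 8], sigma=1.0) < base)      # True: x's more spread out
print(sd_slope([1, 2, 3, 4] * 2, sigma=1.0) < base)  # True: more data (each x used twice)
```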

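9 Inference for Regression Formula for SD of $\hat b$: $\sigma_{\hat b} = \sigma \sqrt{\dfrac{1}{n} + \dfrac{\bar X^2}{\sum_{i=1}^{n} (X_i - \bar X)^2}}$. Big (small) for $\sigma$ big (small, resp.): accurate data → accurate estimate of intercept. Smaller for $\bar X$ closer to 0: centered data → more accurate intercept. Smaller for more data: more data → more accuracy.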
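A small Python sketch of the formula for $\sigma_{\hat b}$, assuming $\sigma$ is known (the function name `sd_intercept` and the x values are hypothetical). Two designs with the same spread of x’s show the effect of centering: with $\bar X = 0$ the SD is just $\sigma/\sqrt{n}$, while $\bar X = 10$ inflates it ninefold.

```python
import math


def sd_intercept(xs, sigma):
    """SD of the least squares intercept: sigma * sqrt(1/n + X_bar^2 / Sxx)."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sigma * math.sqrt(1 / n + x_bar ** 2 / sxx)


# Same spread of x's, centred at 0 versus at 10 (hypothetical values)
print(sd_intercept([-1.5, -0.5, 0.5, 1.5], sigma=1.0))  # 0.5
print(sd_intercept([8.5, 9.5, 10.5, 11.5], sigma=1.0))  # 4.5
```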

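10 Inference for Regression One more detail: need to estimate $\sigma$ using the data. For this use $s = \sqrt{\dfrac{1}{n-2} \sum_{i=1}^{n} (Y_i - \hat Y_i)^2}$, where $\hat Y_i = \hat a X_i + \hat b$. This is similar to the earlier sd estimate $s = \sqrt{\dfrac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar X)^2}$, except that variation is measured about the fit line; $n - 2$ plays the role of $n - 1$ from before.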
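A minimal sketch of $s$ in Python (the function name `residual_sd` is hypothetical, and the toy data are made up for illustration). The divisor $n - 2$ reflects the two estimated parameters, $\hat a$ and $\hat b$.

```python
import math


def residual_sd(xs, ys):
    """s = sqrt( sum of (Y_i - Yhat_i)^2 / (n - 2) ): spread about the fit line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b_hat = y_bar - a_hat * x_bar
    sse = sum((y - (a_hat * x + b_hat)) ** 2 for x, y in zip(xs, ys))
    # Divide by n - 2, since slope and intercept were both estimated from the data
    return math.sqrt(sse / (n - 2))


print(round(residual_sd([0, 1, 2, 3], [1, 3, 5, 8]), 4))  # 0.3873
```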

11 Inference for Regression Now for probability distributions: since we are estimating $\sigma$ by $s$, use the t distribution (TDIST and TINV in Excel) with degrees of freedom $= n - 2$.
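What Excel’s TDIST and TINV compute can be mirrored with Python’s standard library alone. The sketch below (function names hypothetical) integrates the t density numerically with Simpson’s rule and inverts it by bisection. It is meant to show what the two functions do, not to replace a tested library routine such as SciPy’s `scipy.stats.t`.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0."""
    upper = abs(x)
    h = upper / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def tdist_two_tailed(x, df):
    """2-sided P-value P(|T| >= |x|), the quantity Excel's TDIST(x, df, 2) reports."""
    return 2 * (1 - t_cdf(abs(x), df))


def tinv_two_tailed(p, df):
    """Critical value t* with P(|T| >= t*) = p, as Excel's TINV(p, df) reports."""
    lo, hi = 0.0, 100.0  # search range assumed to contain t* (fine for usual p, df)
    for _ in range(60):  # bisection: the 2-sided tail probability falls as t grows
        mid = (lo + hi) / 2
        if tdist_two_tailed(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(tinv_two_tailed(0.05, 7), 3))  # 2.365, the 95% multiplier for df = 7
```

With $n$ data points, `tinv_two_tailed(0.05, n - 2)` gives the multiplier $t^*$ for a 95% interval, and `tdist_two_tailed(t_stat, n - 2)` gives a 2-sided P-value.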

12 Inference for Regression Convenient packaged analysis in Excel: Tools → Data Analysis → Regression. Illustrate the application using Class Example 27, Old Text Problem 8.6 (now 10.12).

13 Inference for Regression Class Example 27, Old Text Problem 8.6 (now 10.12) Utility companies estimate energy used by their customers. Natural gas consumption depends on temperature. A study recorded average daily gas consumption y (100s of cubic feet) for each month. The explanatory variable x is the average number of heating degree days for that month. Data for October through June are:

14 Inference for Regression Data for October through June are:
Month | X = Deg. Days | Y = Gas Cons’n
Oct | 15.6 | 5.2
Nov | 26.8 | 6.1
Dec | 37.8 | 8.7
Jan | 36.4 | 8.5
Feb | 35.5 | 8.8
Mar | 18.6 | 4.9
Apr | 15.3 | 4.5
May | 7.9 | 2.5
Jun | 0 | 1.1
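As a cross-check on Excel’s SLOPE and INTERCEPT, the least squares line for the gas data can be computed directly. A minimal Python sketch; the nine data pairs are copied from the slide, while the variable and function names are hypothetical.

```python
# Gas data from the slide: X = heating degree days, Y = gas consumption (100s of cubic feet)
months = ["Oct", "Nov", "Dec", "Jan", "Feb", "Mar", "Apr", "May", "Jun"]
deg_days = [15.6, 26.8, 37.8, 36.4, 35.5, 18.6, 15.3, 7.9, 0.0]
gas = [5.2, 6.1, 8.7, 8.5, 8.8, 4.9, 4.5, 2.5, 1.1]


def least_squares(xs, ys):
    """Least squares slope a_hat and intercept b_hat."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a_hat = sxy / sxx
    return a_hat, y_bar - a_hat * x_bar


a_hat, b_hat = least_squares(deg_days, gas)
print(f"fit line: y = {a_hat:.4f} x + {b_hat:.4f}")
```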

15 Inference for Regression Class Example 27, Old Text Problem 8.6 (now 10.12) Excel Analysis: https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg27.xls Good News: lots of things are done automatically. Bad News: Excel uses different language, so its output needs careful interpretation.

16 Inference for Regression Excel Glossary:
Excel | Stat 31
R Square | $r^2$ = proportion of the sum of squares explained by the line
Intercept | intercept $b$
X Variable | slope $a$
Coefficients | estimates $\hat a$ & $\hat b$

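17 Inference for Regression Excel Glossary:
Excel | Stat 31
Standard Error | estimates of $\sigma_{\hat a}$ & $\sigma_{\hat b}$ (recall from sampling distributions)
t Stat | (Est. - mean) / SE, with mean = 0 under $H_0$, i.e. put on the scale of the t distribution
P-value | for the 2-sided test of $H_0: a = 0$ (resp. $H_0: b = 0$)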

18 Inference for Regression Excel Glossary:
Excel | Stat 31
Lower 95%, Upper 95% | ends of the 95% confidence intervals for $a$ and $b$ (since 0.95 was chosen for the confidence level)
Predicted Y | points on the line at $X_i$, i.e. $\hat Y_i = \hat a X_i + \hat b$
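The Excel output columns described in this glossary can be reproduced by hand. A sketch in Python for the gas data, assuming the two-sided 95% critical value $t^* = 2.365$ for $n - 2 = 7$ degrees of freedom (Excel’s TINV(0.05, 7) gives 2.3646); because $t^*$ is rounded, the interval ends may differ from Excel’s in the last digits. The P-value column is omitted since it needs the t distribution function (TDIST in Excel), and the dictionary layout is an illustrative choice, not Excel’s format.

```python
import math

deg_days = [15.6, 26.8, 37.8, 36.4, 35.5, 18.6, 15.3, 7.9, 0.0]
gas = [5.2, 6.1, 8.7, 8.5, 8.8, 4.9, 4.5, 2.5, 1.1]
T_STAR = 2.365  # two-sided 95% t critical value for df = 9 - 2 = 7


def regression_table(xs, ys, t_star):
    """R Square plus Excel-style coefficient rows (P-values omitted)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    a_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b_hat = y_bar - a_hat * x_bar
    sse = sum((y - (a_hat * x + b_hat)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    se_a = s / math.sqrt(sxx)
    se_b = s * math.sqrt(1 / n + x_bar ** 2 / sxx)
    rows = {}
    for name, est, se in [("Intercept", b_hat, se_b), ("X Variable", a_hat, se_a)]:
        rows[name] = {
            "Coefficients": est,
            "Standard Error": se,
            "t Stat": est / se,  # (estimate - 0) / SE, for H0: parameter = 0
            "Lower 95%": est - t_star * se,
            "Upper 95%": est + t_star * se,
        }
    r_square = 1 - sse / syy  # proportion of the sum of squares explained by the line
    return r_square, rows


r_square, rows = regression_table(deg_days, gas, T_STAR)
print(f"R Square: {r_square:.4f}")
for name, row in rows.items():
    print(name, {k: round(v, 4) for k, v in row.items()})
```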

19 Inference for Regression Excel Glossary:
Excel | Stat 31
Residuals | $Y_i - \hat Y_i$ for $i = 1, \dots, n$ (recall: these gave useful information about the quality of fit)
Standard Residuals | residuals on a standardized scale
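The residual columns can be sketched the same way for the gas data. Here the standardized residual is taken to be residual / $s$; this is an assumption for illustration, and Excel’s Standard Residuals may use a slightly different scaling. The function name is hypothetical.

```python
import math

deg_days = [15.6, 26.8, 37.8, 36.4, 35.5, 18.6, 15.3, 7.9, 0.0]
gas = [5.2, 6.1, 8.7, 8.5, 8.8, 4.9, 4.5, 2.5, 1.1]


def residual_columns(xs, ys):
    """Return predicted values, residuals, and residuals divided by s."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b_hat = y_bar - a_hat * x_bar
    predicted = [a_hat * x + b_hat for x in xs]
    residuals = [y - p for y, p in zip(ys, predicted)]
    s = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    # Assumed scaling: divide by s (Excel's convention may differ slightly)
    standardized = [r / s for r in residuals]
    return predicted, residuals, standardized


predicted, residuals, standardized = residual_columns(deg_days, gas)
for p, r, z in zip(predicted, residuals, standardized):
    print(f"{p:7.3f} {r:7.3f} {z:7.3f}")
```

Least squares residuals always sum to zero (up to rounding), so it is a pattern in their signs or sizes, not their total, that signals a poor fit.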

20 Inference for Regression Some useful variations: Class Example 28, Old Text Problems 10.8 - 10.10 (now 10.13 – 10.15) Excel Analysis: https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg28.xls

21 Inference for Regression Class Example 28, (now 10.13 – 10.15) Old 10.8: Engineers made measurements of the Leaning Tower of Pisa over the years 1975 – 1987. “Lean” is the difference between a point’s position if the tower were straight and its actual position, in tenths of a millimeter in excess of 2.9 meters. The data are:

22 Inference for Regression Class Example 28, (now 10.13 – 10.15) Old 10.8: The data are:
Year | Lean
75 | 642
76 | 644
77 | 656
78 | 667
79 | 673
80 | 688
81 | 696
82 | 698
83 | 713
84 | 717
85 | 725
86 | 742
87 | 757
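For part (c) of this problem, the slope interval $\hat a \pm t^* \, SE(\hat a)$ can be sketched in Python. Year is coded as in the table (75 for 1975), so the slope is in the table’s lean units per year. $t^* = 2.201$ is assumed as the two-sided 95% critical value for $13 - 2 = 11$ degrees of freedom (TINV(0.05, 11) in Excel); the function name is hypothetical.

```python
import math

years = list(range(75, 88))  # 75 = 1975, ..., 87 = 1987, as coded in the table
lean = [642, 644, 656, 667, 673, 688, 696, 698, 713, 717, 725, 742, 757]
T_STAR = 2.201  # two-sided 95% t critical value for df = 13 - 2 = 11


def slope_ci(xs, ys, t_star):
    """Least squares fit plus the interval a_hat +/- t* * SE(a_hat)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b_hat = y_bar - a_hat * x_bar
    sse = sum((y - (a_hat * x + b_hat)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    se_a = s / math.sqrt(sxx)
    return a_hat, b_hat, (a_hat - t_star * se_a, a_hat + t_star * se_a)


a_hat, b_hat, (lower, upper) = slope_ci(years, lean, T_STAR)
print(f"fit line: lean = {a_hat:.4f} * year + ({b_hat:.2f})")
print(f"95% CI for the rate of change: ({lower:.2f}, {upper:.2f})")
```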

23 Inference for Regression Class Example 28, (now 10.13 – 10.15) Old 10.8: (a) Plot the data. Does the trend in lean over time appear to be linear? (b) What is the equation of the least squares fit line? (c) Give a 95% confidence interval for the average rate of change of the lean. https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg28.xls

24 Inference for Regression HW: 10.3 b,c 10.5

25 And Now for Something Completely Different Etymology of “And now for something completely different”: has anybody heard of this before?

26 And Now for Something Completely Different What is “etymology”? Google responses to “define: etymology”: “The history of words; the study of the history of words.” (csmp.ucop.edu/crlp/resources/glossary.html) “The history of a word shown by tracing its development from another language.” (www.animalinfo.org/glosse.htm)

27 And Now for Something Completely Different What is “etymology”? Etymology is derived from the Greek word ἔτυμον (etymon) meaning "a sense" and λόγος (logos) meaning "word." Etymology is the study of the original meaning and development of a word, tracing its meaning back as far as possible. (www.two-age.org/glossary.htm)

28 And Now for Something Completely Different Google response to “define: and now for something completely different”: And Now For Something Completely Different is a film spinoff from the television comedy series Monty Python's Flying Circus. The title originated as a catchphrase in the TV show. Many Python fans feel that it excellently describes the nonsensical, non sequitur feel of the program. (en.wikipedia.org/wiki/And_Now_For_Something_Completely_Different)

29 And Now for Something Completely Different Google Search for “And now for something completely different” gives more than 100 results. A perhaps interesting one: http://www.mwscomp.com/mpfc/mpfc.html

30 And Now for Something Completely Different Google Search for “Stat 31 and now for something completely different” gives: [PPT] Slide 1 (File Format: Microsoft Powerpoint 97, View as HTML) ... And now for something completely different… Review Ideas on State Lotteries, from our study of Expected Value... https://www.unc.edu/~marron/UNCstat31-2005/Stat31-05-03-31.ppt But what is missing?

31 Prediction in Regression Idea: given data $(X_1, Y_1), \dots, (X_n, Y_n)$, we can find the Least Squares Fit Line and do inference for the parameters. Given a new X value, say $X_0$, what will the new Y value $Y_0$ be?

32 Prediction in Regression Dealing with variation in prediction: under the model, $Y_0 = aX_0 + b + e_0$. A sensible guess about $Y_0$, based on the given $X_0$, is $\hat Y_0 = \hat a X_0 + \hat b$ (the point on the fit line above $X_0$).
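The point prediction is just the fit line evaluated at the new $X_0$. A sketch using the gas-consumption data from Class Example 27; the new value $X_0 = 20$ heating degree days and the function names are hypothetical.

```python
deg_days = [15.6, 26.8, 37.8, 36.4, 35.5, 18.6, 15.3, 7.9, 0.0]
gas = [5.2, 6.1, 8.7, 8.5, 8.8, 4.9, 4.5, 2.5, 1.1]


def fit_line(xs, ys):
    """Least squares slope a_hat and intercept b_hat."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return a_hat, y_bar - a_hat * x_bar


def predict(a_hat, b_hat, x0):
    """Point on the fit line above x0: Y0_hat = a_hat * x0 + b_hat."""
    return a_hat * x0 + b_hat


a_hat, b_hat = fit_line(deg_days, gas)
x0 = 20.0  # hypothetical new month with 20 heating degree days
print(round(predict(a_hat, b_hat, x0), 2))
```

Because the fit line passes through $(\bar X, \bar Y)$, predicting at $X_0 = \bar X$ simply returns $\bar Y$.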

33 Prediction in Regression What about variation about this guess? Natural approach: present an interval (as done with Confidence Intervals). Careful: there are two notions of this: 1. Confidence Interval for the mean of $Y_0$; 2. Prediction Interval for the value of $Y_0$.

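34 Prediction in Regression 1. Confidence Interval for the mean of $Y_0$: use $\hat Y_0 \pm t^* SE_{\hat\mu}$, where $SE_{\hat\mu} = s \sqrt{\dfrac{1}{n} + \dfrac{(X_0 - \bar X)^2}{\sum_{i=1}^{n} (X_i - \bar X)^2}}$ and where $t^*$ comes from the t distribution with $n - 2$ degrees of freedom (TINV in Excel).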

35 Prediction in Regression Interpretation of $SE_{\hat\mu}$: smaller for $X_0$ closer to $\bar X$, but never 0; smaller for x’s more spread out; larger for larger $s$.
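The behavior of $SE_{\hat\mu}$ can be seen by computing the confidence interval for the mean at several $X_0$ values. A sketch with the gas data, assuming $t^* = 2.365$ (two-sided 95%, df = 7); the function name and the chosen $X_0$ values ($\bar X$, $\bar X + 5$, $\bar X + 15$, all inside the observed range) are hypothetical.

```python
import math

deg_days = [15.6, 26.8, 37.8, 36.4, 35.5, 18.6, 15.3, 7.9, 0.0]
gas = [5.2, 6.1, 8.7, 8.5, 8.8, 4.9, 4.5, 2.5, 1.1]
T_STAR = 2.365  # two-sided 95% t critical value for df = 9 - 2 = 7


def mean_response_ci(xs, ys, x0, t_star):
    """CI for the mean of Y0 (the line's height a*x0 + b): Y0_hat +/- t* SE_mu."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b_hat = y_bar - a_hat * x_bar
    sse = sum((y - (a_hat * x + b_hat)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    y0_hat = a_hat * x0 + b_hat
    se_mu = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    return y0_hat - t_star * se_mu, y0_hat + t_star * se_mu


x_bar = sum(deg_days) / len(deg_days)
widths = []
for x0 in [x_bar, x_bar + 5, x_bar + 15]:
    lo, hi = mean_response_ci(deg_days, gas, x0, T_STAR)
    widths.append(hi - lo)
    print(f"x0 = {x0:5.1f}: CI ({lo:.3f}, {hi:.3f}), width {hi - lo:.3f}")
```

The width is smallest at $X_0 = \bar X$ and grows as $X_0$ moves away, but it never shrinks to 0.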

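36 Prediction in Regression 2. Prediction Interval for the value of $Y_0$: use $\hat Y_0 \pm t^* SE_{\hat y}$, where $SE_{\hat y} = s \sqrt{1 + \dfrac{1}{n} + \dfrac{(X_0 - \bar X)^2}{\sum_{i=1}^{n} (X_i - \bar X)^2}}$, and again $t^*$ has $n - 2$ degrees of freedom.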

37 Prediction in Regression Interpretation of $SE_{\hat y}$: similar remarks to above… The additional “$1 +$” accounts for the added variation in $Y_0$ compared to its mean $aX_0 + b$.
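Putting both intervals side by side shows the effect of the extra “1 +”: at the same $X_0$, the prediction interval for $Y_0$ always contains the confidence interval for its mean. A sketch with the gas data, assuming $t^* = 2.365$ (two-sided 95%, df = 7) and a hypothetical $X_0 = 20$; the function name is also hypothetical.

```python
import math

deg_days = [15.6, 26.8, 37.8, 36.4, 35.5, 18.6, 15.3, 7.9, 0.0]
gas = [5.2, 6.1, 8.7, 8.5, 8.8, 4.9, 4.5, 2.5, 1.1]
T_STAR = 2.365  # two-sided 95% t critical value for df = 9 - 2 = 7


def ci_and_pi(xs, ys, x0, t_star):
    """Return (CI for the mean of Y0, PI for the value of Y0) at x0."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b_hat = y_bar - a_hat * x_bar
    sse = sum((y - (a_hat * x + b_hat)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    y0_hat = a_hat * x0 + b_hat
    d = 1 / n + (x0 - x_bar) ** 2 / sxx
    se_mu = s * math.sqrt(d)     # SE for the mean response a*x0 + b
    se_y = s * math.sqrt(1 + d)  # the extra "1 +" adds the variation of Y0 itself
    ci = (y0_hat - t_star * se_mu, y0_hat + t_star * se_mu)
    pi = (y0_hat - t_star * se_y, y0_hat + t_star * se_y)
    return ci, pi


ci, pi = ci_and_pi(deg_days, gas, 20.0, T_STAR)  # hypothetical X0 = 20
print("CI for mean:", tuple(round(v, 3) for v in ci))
print("PI for value:", tuple(round(v, 3) for v in pi))
```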

38 Prediction in Regression Revisit Class Example 28, (now 10.13 – 10.15) Old 10.8: Engineers made measurements of the Leaning Tower of Pisa over the years 1975 – 1987. “Lean” is the difference between a point’s position if the tower were straight and its actual position, in tenths of a millimeter in excess of 2.9 meters. The data are listed above…

39 Prediction in Regression Class Example 28, (now 10.13 – 10.15) Old 10.9: (a) Plot the data. Does the trend in lean over time appear to be linear? (b) What is the equation of the least squares fit line? (c) Give a 95% confidence interval for the average rate of change of the lean. https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg28.xls

40 Prediction in Regression HW: 10.20, and add part: (f) Calculate a 95% Confidence Interval for the mean oxygen uptake of individuals with heart rate 96, and of individuals with heart rate 115.

41

42 Additional Issues in Regression Robustness Outliers via Java Applet HW on outliers

