Published by Derrick Dennis. Modified over 9 years ago.
1
Stor 155, Section 2, Last Time 2-way Tables –Sliced populations in 2 different ways –Look for independence of factors –Chi Square Hypothesis test Simpson’s Paradox –Aggregating can give opposite impression Inference for Regression –Sampling Distributions – TDIST & TINV
2
Reading In Textbook Approximate Reading for Today’s Material: Pages 634-667 & Review Approximate Reading for Next Class: Pages 634-667 & Review
3
Inference for Regression Chapter 10 Recall: Scatterplots Fitting Lines to Data Now study statistical inference associated with fit lines E.g. When is slope statistically significant?
4
Recall Scatterplot For data (x,y) View by plot: (1,2) (3,1) (-1,0) (2,-1)
5
Recall Linear Regression Idea: Fit a line to data in a scatterplot To learn about “basic structure” To “model data” To provide “prediction of new values”
6
Recall Linear Regression Given a line, $y = ax + b$, “indexed” by $a$ and $b$. Define “residuals” = “data Y” – “Y on line” = $Y_i - (aX_i + b)$. Now choose $a$ and $b$ to make these “small”.
7
Recall Linear Regression Make residuals nonnegative by squaring. Least Squares: adjust $a$ and $b$ to minimize the “Sum of Squared Errors”: $SSE = \sum_{i=1}^{n} (Y_i - (aX_i + b))^2$
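The least squares idea can be sketched in a few lines of Python. This is a minimal illustration, not the course's Excel workflow: it assumes the standard closed-form solution (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x) and uses the four points from the "Recall Scatterplot" slide. The function names `least_squares` and `sse` are made up for this example.

```python
# Points from the "Recall Scatterplot" slide.
xs = [1, 3, -1, 2]
ys = [2, 1, 0, -1]

def least_squares(xs, ys):
    """Return (slope a, intercept b) minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = y_bar - a * x_bar
    return a, b

def sse(a, b, xs, ys):
    """Sum of squared residuals for the line y = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))

a_hat, b_hat = least_squares(xs, ys)
best = sse(a_hat, b_hat, xs, ys)
print(f"slope a = {a_hat:.4f}, intercept b = {b_hat:.4f}, SSE = {best:.4f}")

# Nudging either coefficient away from the fit makes the SSE larger.
print(sse(a_hat + 0.1, b_hat, xs, ys) > best)
print(sse(a_hat, b_hat - 0.1, xs, ys) > best)
```

Comparing the SSE at the fitted line with the SSE at nearby lines is a quick sanity check that the fitted line really is the minimizer.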
8
Least Squares in Excel Computation: 1. INTERCEPT (computes y-intercept b) 2. SLOPE (computes slope a) Revisit Class Example 14: http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg14.xls
9
Inference for Regression Idea: do statistical inference on: –Slope a –Intercept b Model: $Y_i = aX_i + b + e_i$ Assume: the errors $e_i$ are random, independent and $N(0, \sigma_e)$
10
Inference for Regression Viewpoint: Data generated as: the line y = ax + b, with each $Y_i$ chosen from the distribution around the line at $X_i$. Note: a and b are “parameters”
11
Inference for Regression Parameters $a$ and $b$ determine the underlying model (distribution). Estimate with the Least Squares Estimates: $\hat{a}$ and $\hat{b}$ (using SLOPE and INTERCEPT in Excel, based on data)
12
Inference for Regression Distributions of $\hat{a}$ and $\hat{b}$? Under the above assumptions, the sampling distributions are: $\hat{a} \sim N(a, SD_{\hat{a}})$ and $\hat{b} \sim N(b, SD_{\hat{b}})$. Centerpoints are right (unbiased). Spreads are more complicated.
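The claim that the centerpoints are right (unbiased) can be checked by simulation. The sketch below is illustrative only: the true slope, intercept, and error SD are hypothetical values chosen for the example, and the x values are arbitrary. It generates many data sets from a line plus independent normal errors, fits each by least squares, and averages the estimates.

```python
import random

def least_squares(xs, ys):
    """Return (slope, intercept) of the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

random.seed(0)                            # fixed seed so the run is repeatable
TRUE_A, TRUE_B, SIGMA = 2.0, 1.0, 1.0     # hypothetical parameters
xs = [float(i) for i in range(10)]        # hypothetical design points

slopes, intercepts = [], []
for _ in range(2000):
    # Y_i = a * X_i + b + normal error, as in the regression model
    ys = [TRUE_A * x + TRUE_B + random.gauss(0, SIGMA) for x in xs]
    a_hat, b_hat = least_squares(xs, ys)
    slopes.append(a_hat)
    intercepts.append(b_hat)

mean_slope = sum(slopes) / len(slopes)
mean_intercept = sum(intercepts) / len(intercepts)
print(f"average slope estimate:     {mean_slope:.3f} (true {TRUE_A})")
print(f"average intercept estimate: {mean_intercept:.3f} (true {TRUE_B})")
```

Individual estimates vary from sample to sample, but their averages should land close to the true parameter values.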
13
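Inference for Regression Formula for SD of $\hat{a}$: $SD_{\hat{a}} = \sigma_e / (\sqrt{n-1}\, s_x)$, where $s_x$ is the sample SD of the x’s. Big (small) for $\sigma_e$ big (small, resp.) –Accurate data → Accurate est. of slope. Small for x’s more spread out –Data more spread → More accurate. Small for more data –More data → More accuracy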
14
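Inference for Regression Formula for SD of $\hat{b}$: $SD_{\hat{b}} = \sigma_e \sqrt{1/n + \bar{x}^2 / ((n-1) s_x^2)}$. Big (small) for $\sigma_e$ big (small, resp.) –Accurate data → Accurate est. of intercept. Smaller for $\bar{x} \approx 0$ –Centered data → More accurate intercept. Smaller for more data –More data → More accuracy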
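The qualitative behavior of the two spreads can be checked numerically. The sketch below assumes the standard textbook formulas for simple linear regression (SD of slope = sigma / sqrt(Sxx), SD of intercept = sigma * sqrt(1/n + x_bar^2 / Sxx), with Sxx the sum of squared deviations of the x's). The data sets and function names are hypothetical.

```python
import math

def sxx(xs):
    """Sum of squared deviations of the x's from their mean."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs)

def sd_slope(xs, sigma):
    """SD of the least squares slope for error SD sigma."""
    return sigma / math.sqrt(sxx(xs))

def sd_intercept(xs, sigma):
    """SD of the least squares intercept for error SD sigma."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sigma * math.sqrt(1 / n + x_bar ** 2 / sxx(xs))

base = [1, 2, 3, 4, 5]               # hypothetical x values
spread = [2 * x for x in base]       # same count, more spread out
more = base * 2                      # same pattern, twice the data
centered = [x - 3 for x in base]     # shifted so the mean of x is 0

print("noisier data, slope SD:     ", sd_slope(base, 2.0), "vs", sd_slope(base, 1.0))
print("more spread x, slope SD:    ", sd_slope(spread, 1.0), "vs", sd_slope(base, 1.0))
print("more data, slope SD:        ", sd_slope(more, 1.0), "vs", sd_slope(base, 1.0))
print("centered x, intercept SD:   ", sd_intercept(centered, 1.0), "vs", sd_intercept(base, 1.0))
print("more data, intercept SD:    ", sd_intercept(more, 1.0), "vs", sd_intercept(base, 1.0))
```

Each printed pair shows the changed design next to the baseline, so the direction of each effect can be read off directly.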
15
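Inference for Regression One more detail: Need to estimate $\sigma_e$ using data. For this use: $s_e = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} (Y_i - (\hat{a}X_i + \hat{b}))^2}$. Similar to earlier sd estimate, except variation is about the fit line. $n-2$ is similar to $n-1$ from before.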
16
Inference for Regression Now for Probability Distributions: Since we are estimating $\sigma_e$ by $s_e$, use TDIST and TINV with degrees of freedom = $n - 2$
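For readers working outside Excel, the two t-distribution lookups can be approximated with the Python standard library alone. This is a rough sketch, not Excel's algorithm: `t_two_tail` plays the role of TDIST(x, df, 2) (the two-tailed area beyond |x|) and `t_critical` plays the role of TINV(p, df) (the cutoff with two-tailed area p). Both function names are made up here; the tail area is computed by Simpson's rule on the t density and the cutoff by bisection.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_two_tail(t, df, steps=2000):
    """P(|T| > |t|): two-tailed area, like Excel's TDIST(|t|, df, 2)."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps                       # steps is even, as Simpson's rule needs
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3             # area between 0 and |t|
    return 1 - 2 * central

def t_critical(p, df):
    """Cutoff c with P(|T| > c) = p, like Excel's TINV(p, df).

    Bisection on [0, 200]; very small p with tiny df may need a wider bracket.
    """
    lo, hi = 0.0, 200.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_two_tail(mid, df) > p:     # tail still too big: cutoff is larger
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n = 9                                    # e.g. nine (x, y) pairs
df = n - 2                               # degrees of freedom for regression
print(f"95% cutoff for df = {df}: {t_critical(0.05, df):.3f}")
print(f"two-tailed area beyond 2.0: {t_two_tail(2.0, df):.4f}")
```

The degrees of freedom are n - 2 because two parameters (slope and intercept) are estimated before measuring the variation about the line.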
17
Inference for Regression Convenient Packaged Analysis in Excel: Tools → Data Analysis → Regression. Illustrate application using: Class Example 32, Old Text Problem 10.12
18
Inference for Regression Class Example 32, Old Text Problem 10.12 Utility companies estimate energy used by their customers. Natural gas consumption depends on temperature. A study recorded average daily gas consumption y (100s of cubic feet) for each month. The explanatory variable x is the average number of heating degree days for that month. Data for October through June are:
19
Inference for Regression Data for October through June are:
Month   X = Deg. Days   Y = Gas Cons’n
Oct     15.6            5.2
Nov     26.8            6.1
Dec     37.8            8.7
Jan     36.4            8.5
Feb     35.5            8.8
Mar     18.6            4.9
Apr     15.3            4.5
May     7.9             2.5
Jun     0               1.1
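As a cross-check on the spreadsheet analysis, the same quantities can be computed by hand in Python from the nine data points above. This is a sketch under the usual simple linear regression formulas, not a reproduction of Excel's output layout. The critical value 2.365 is the tabled two-sided 95% t cutoff for 7 degrees of freedom (what TINV(0.05, 7) returns, rounded), and the variable names are illustrative.

```python
import math

# Degree days (x) and gas consumption (y), October through June.
degree_days = [15.6, 26.8, 37.8, 36.4, 35.5, 18.6, 15.3, 7.9, 0.0]
gas = [5.2, 6.1, 8.7, 8.5, 8.8, 4.9, 4.5, 2.5, 1.1]

n = len(degree_days)
x_bar = sum(degree_days) / n
y_bar = sum(gas) / n
sxx = sum((x - x_bar) ** 2 for x in degree_days)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(degree_days, gas))

# Least squares estimates (Excel: SLOPE and INTERCEPT).
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Estimate of the error SD, using n - 2 degrees of freedom.
residuals = [y - (slope * x + intercept) for x, y in zip(degree_days, gas)]
s_e = math.sqrt(sum(r * r for r in residuals) / (n - 2))

# Standard errors of the two estimates.
se_slope = s_e / math.sqrt(sxx)
se_intercept = s_e * math.sqrt(1 / n + x_bar ** 2 / sxx)

# t statistic for testing slope = 0, and a 95% confidence interval.
T_STAR = 2.365                      # tabled t cutoff, df = 7, two-sided 95%
t_slope = slope / se_slope
ci_slope = (slope - T_STAR * se_slope, slope + T_STAR * se_slope)

print(f"slope     = {slope:.4f} (SE {se_slope:.4f}, t = {t_slope:.2f})")
print(f"intercept = {intercept:.4f} (SE {se_intercept:.4f})")
print(f"s_e       = {s_e:.4f}")
print(f"95% CI for slope: [{ci_slope[0]:.4f}, {ci_slope[1]:.4f}]")
```

If the t statistic for the slope exceeds the cutoff in absolute value, the slope is statistically significant at the 5% level (two-sided).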
20
Inference for Regression Class Example 32, Old Text Problem 10.12 Excel Analysis: http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg32.xls Good News: Lots of things done automatically Bad News: Different language, so need careful interpretation
21
Inference for Regression Excel Glossary:
Excel          Stor 155
R2             $r^2$ = Prop’n of Sum of Squares Explained by Line
Intercept      Intercept b
X Variable     Slope a
Coefficient    Estimates $\hat{a}$ & $\hat{b}$
22
Inference for Regression Excel Glossary:
Excel              Stor 155
Standard Errors    Estimates of $SD_{\hat{a}}$ & $SD_{\hat{b}}$ (recall from Sampling Dist’ns)
T – Stat.          (Est. – mean) / SE, i.e. put on scale of T – distribution
P-value            For 2-sided test of: $H_0$: parameter $= 0$ vs. $H_A$: parameter $\neq 0$
23
Inference for Regression Excel Glossary:
Excel                    Stor 155
Lower 95%, Upper 95%     Ends of 95% Confidence Interval for a and b (since chose 0.95 for Confidence level)
Predicted Y              Points on line at $X_i$, i.e. $\hat{Y}_i = \hat{a}X_i + \hat{b}$
24
Inference for Regression Excel Glossary:
Excel                 Stor 155
Residuals             $Y_i - \hat{Y}_i$ for each data point. Recall: gave useful information about quality of fit (useful to plot)
Standard Residuals    Residuals on standardized scale
25
Inference for Regression Some useful variations: Class Example 33, Text Problems 10.23 - 10.25 Excel Analysis: http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg33.xls
26
Inference for Regression Class Example 33, (10.23 – 10.25) Engineers made measurements of the Leaning Tower of Pisa over the years 1975 – 1987. “Lean” is the difference between a point’s position if the tower were straight and its actual position, in tenths of a millimeter, in excess of 2.9 meters (so a lean of 2.9642 meters is recorded as 642). The data are:
27
Inference for Regression Class Example 33, (10.23 – 10.25) The data are:
Year   Lean
75     642
76     644
77     656
78     667
79     673
80     688
81     696
82     698
83     713
84     717
85     725
86     742
87     757
28
Inference for Regression Class Example 33, (10.23 – 10.25): (a) Plot the data. Does the trend in lean over time appear to be linear? (b) What is the equation of the least squares fit line? (c) Give a 95% confidence interval for the average rate of change of the lean. http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg33.xls
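Parts (b) and (c) can be sketched in Python as a check on the spreadsheet, using the Year and Lean values listed for this example. The helper name `fit_with_slope_ci` is made up for illustration, and 2.201 is the tabled two-sided 95% t cutoff for 13 - 2 = 11 degrees of freedom (TINV(0.05, 11), rounded). The average rate of change of the lean is the slope, so its confidence interval answers part (c).

```python
import math

years = list(range(75, 88))   # 75 = 1975, ..., 87 = 1987
lean = [642, 644, 656, 667, 673, 688, 696, 698, 713, 717, 725, 742, 757]

def fit_with_slope_ci(xs, ys, t_star):
    """Least squares line plus a confidence interval for the slope.

    t_star is the two-sided t cutoff for n - 2 degrees of freedom.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))          # variation about the fit line
    se_slope = s_e / math.sqrt(sxx)
    margin = t_star * se_slope
    return slope, intercept, (slope - margin, slope + margin)

T_STAR_DF11 = 2.201   # tabled t cutoff, df = 11, two-sided 95%
slope, intercept, ci = fit_with_slope_ci(years, lean, T_STAR_DF11)

print(f"fit line: lean = {slope:.4f} * year + ({intercept:.2f})")
print(f"95% CI for rate of change: [{ci[0]:.3f}, {ci[1]:.3f}] coded units per year")
```

Part (a) still calls for a plot; a quick look at the residuals from this fit is one way to judge whether a straight line is reasonable.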
29
Inference for Regression HW: 10.17 b,c 10.26 (using log base 10, for part c: Est’d slope: 0.194 Est'd intercept: -379 95% CI for slope: [0.186, 0.202])
30
And Now for Something Completely Different Graphical Displays: Important Topic in Statistics Has large impact Need to think carefully to do this Watch for attempts to fool you
31
And Now for Something Completely Different Graphical Displays: Interesting Article: “How to Display Data Badly” Howard Wainer The American Statistician, 38, 137-147. Internet Available: http://links.jstor.org
32
And Now for Something Completely Different Main Idea: Point out 12 types of bad displays With reasons behind Here are some favorites…
33
And Now for Something Completely Different Hiding the data in the scale
34
And Now for Something Completely Different The eye perceives areas as “size”:
35
And Now for Something Completely Different Change of Scales in Mid-Axis. Really trust the Post???
36
Review Slippery Issues Major Confusion: Population Quantities Vs. Sample Quantities
37
Review Slippery Issues Population Quantities: Parameters Will never know But can think about Sample Quantities: Estimates (of parameters) Numbers we work with Contain info about parameters
38
Review Slippery Issues Population Mathematical Notation: $\mu$, $\sigma$, $p$ (fixed & unknown). Sample Mathematical Notation: $\bar{X}$, $s$, $\hat{p}$ (summaries of data, have numbers)
39
Review Slippery Issues Sampling Distributions: Measurement Error: $\bar{X} \sim N(\mu, \sigma/\sqrt{n})$. Counting / Proportions: $\hat{p} \sim N(p, \sqrt{p(1-p)/n})$ (approximately)
40
Review Slippery Issues Confidence Intervals: Based on margin of error $m$: Measurement Error: $[\bar{X} - m, \bar{X} + m]$ brackets $\mu$ 95% of time. Counting / Proportions: $[\hat{p} - m, \hat{p} + m]$ brackets $p$ 95% of time
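A short Python sketch of the two margin-of-error calculations follows. The numbers are hypothetical, chosen only to make the arithmetic easy to follow, and the proportion interval plugs the sample proportion into the SD formula, which is one common choice rather than necessarily the one used in class. `NormalDist` is from the Python standard library.

```python
import math
from statistics import NormalDist

# Normal cutoff leaving 2.5% in each tail (about 1.96).
Z_STAR = NormalDist().inv_cdf(0.975)

def mean_ci(x_bar, sigma, n):
    """95% CI for a population mean with known SD sigma."""
    m = Z_STAR * sigma / math.sqrt(n)          # margin of error
    return x_bar - m, x_bar + m

def proportion_ci(p_hat, n):
    """95% CI for a population proportion, plugging in p_hat for p."""
    m = Z_STAR * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - m, p_hat + m

# Hypothetical measurement-error example: mean 10, sigma 2, n = 100.
low, high = mean_ci(10.0, 2.0, 100)
print(f"mean CI:       [{low:.3f}, {high:.3f}]")

# Hypothetical counting example: 40% of n = 400 say yes.
p_low, p_high = proportion_ci(0.40, 400)
print(f"proportion CI: [{p_low:.3f}, {p_high:.3f}]")
```

In both cases the interval is the estimate plus or minus the margin of error, and it is the interval (not the parameter) that varies from sample to sample.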
41
Review Slippery Issues Hypothesis Testing: Statement of Hypotheses: Actual Test: P-value = P{what was seen or more conclusive (m.c.) | boundary between $H_0$ and $H_A$}
42
Hypothesis Testing from 3/22 Other views of hypothesis testing: View 2: Z-scores Idea: instead of reporting p-value (to assess statistical significance) Report the Z-score A different way of measuring significance
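The relationship between the two ways of reporting significance can be sketched in Python. The test below is a hypothetical two-sided test about a mean with known SD; none of the numbers come from the lecture. The Z-score measures how many standard deviations the sample mean sits from the hypothesized value, and the p-value is the corresponding normal tail area.

```python
import math
from statistics import NormalDist

def z_score(x_bar, mu_0, sigma, n):
    """Standardized distance of the sample mean from the hypothesized mean."""
    return (x_bar - mu_0) / (sigma / math.sqrt(n))

def two_sided_p_value(z):
    """Area in both normal tails beyond |z|."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical data: sample mean 10.5, hypothesized mean 10, sigma 2, n = 64.
z = z_score(10.5, 10.0, 2.0, 64)
p = two_sided_p_value(z)
print(f"Z-score = {z:.2f}, two-sided p-value = {p:.4f}")
```

A larger |Z| always corresponds to a smaller p-value, so either number can be used to rank how strong the evidence is.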
43
Hypothesis Testing – Z scores E.g. Fast Food Menus: Test Using P-value = P{what saw or m.c. | $H_0$ & $H_A$ bd’ry}
44
Hypothesis Testing – Z scores P-value = P{what saw or m.c. | $H_0$ & $H_A$ bd’ry}