Stat 31, Section 1, Last Time Linear transformations Standardization (subt. mean / div. by SD) 5 Number Summary & Outlier Rule Modelling distributions - density Normal distributions Density Interpretation Computation
Computation of Normal Areas Classical Approach: Tables See inside covers of text Summarizes area computations Because can’t use calculus Constructed by “computers” (a job description in the early 1900’s!)
Computation of Normal Areas EXCEL Computation: works in terms of “lower areas” E.g. for Area < 1.3
Computation of Normal Areas Interactive Version (used for above pic) From Webster West’s Website: http://www.stat.sc.edu/~west/applets/normaldemo.html
Computation of Normal Areas EXCEL Computation: (of above e.g.) Enter parameters x is “cutoff point” Return is Area below x
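For readers without Excel, the same "lower area" computation can be sketched in Python. This is only an illustration, not part of the class spreadsheet: it uses the standard library's statistics.NormalDist, and it assumes a standard normal (mean 0, SD 1) because the slide does not state the parameters for the Area < 1.3 example.

```python
from statistics import NormalDist

# Assumption: mean 0 and SD 1; the slide does not give the parameters.
standard_normal = NormalDist(mu=0, sigma=1)

cutoff = 1.3                               # x, the "cutoff point"
lower_area = standard_normal.cdf(cutoff)   # area below x

print(f"Area below {cutoff}: {lower_area:.4f}")
```

Under that assumption the area below 1.3 comes out at about 0.9032, i.e. roughly 90%.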
Computation of Normal Areas Computation of areas over intervals: (use subtraction) Area between a and b = (Area below b) - (Area below a)
Computation of Normal Areas Computation of areas over intervals: (use subtraction for EXCEL too) E.g. Use Excel to check 68 - 95 - 99.7% Rule https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg10.xls
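The subtraction idea can also be checked outside the spreadsheet. The sketch below is a Python stand-in for the Excel check, not a copy of it: the helper name area_between is made up for this illustration, and a standard normal is assumed so that "within k SDs" means the interval from -k to k.

```python
from statistics import NormalDist

def area_between(dist, a, b):
    """Area over the interval (a, b): lower area at b minus lower area at a."""
    return dist.cdf(b) - dist.cdf(a)

# Assumption: standard normal, so cutoffs are measured in SDs from the mean.
z = NormalDist(mu=0, sigma=1)

areas = {k: area_between(z, -k, k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} SD of the mean: {100 * area:.2f}%")
```

The three areas are about 68.27%, 95.45% and 99.73%, which is where the rounded 68 - 95 - 99.7% Rule comes from.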
Normal Area HW HW: 1.89 1.92 (Hint: the % above 500 = 100% - % below 500) 1.97 1.104 (50%, 9.18%, 0.38%, 40.82%) Caution: Don’t just “twiddle EXCEL until answer appears”. Understand it!!!
Inverse of Area Function Inverse of Frequencies: “Quantiles” Idea: Given area, find “cutoff” x I.e. for Area = 80% This x is the “quantile”
Inverse of Area Function EXCEL Computation of Quantiles: Use NORMINV Continue Class Example: https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg10.xls “Probability” is “Area” Enter mean and SD parameters
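As a rough Python counterpart to NORMINV, the sketch below finds the cutoff for Area = 80% and then feeds it back into the area function to show that the two computations undo each other. It assumes a standard normal (mean 0, SD 1), since the slide does not say which parameters the 80% picture used.

```python
from statistics import NormalDist

# Assumption: mean 0 and SD 1; the slide does not give the parameters.
z = NormalDist(mu=0, sigma=1)

area = 0.80
quantile = z.inv_cdf(area)     # given area, find the cutoff x
round_trip = z.cdf(quantile)   # given that cutoff, recover the area

print(f"80% quantile: {quantile:.4f}")
print(f"area below it: {round_trip:.4f}")
```

Here inv_cdf plays the role of NORMINV ("Probability" in, cutoff out) and cdf plays the role of the lower-area computation.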
Inverse Area Example When a machine works normally, it fills bottles with mean = 25 oz, and SD = 0.2 oz. The machine is “out of control” when it overfills. Choose an “alarm level”, which will give only 1 % false alarms. Want: cutoff, x, so that Area above = 1% Note: Area below = 100% - Area above = 99% https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg10.xls
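The bottle example can be worked the same way in Python, as a cross-check on the spreadsheet rather than a replacement for it. The variable names are invented for this sketch; the mean, SD and 1% false alarm rate are the ones given in the example.

```python
from statistics import NormalDist

# Fill amounts when the machine works normally (from the example).
fill = NormalDist(mu=25, sigma=0.2)

false_alarm_rate = 0.01                   # want Area above = 1%
area_below = 1 - false_alarm_rate         # so Area below = 99%
alarm_level = fill.inv_cdf(area_below)    # cutoff x with 99% of the area below it

print(f"Alarm level: {alarm_level:.3f} oz")
print(f"Area above alarm level: {1 - fill.cdf(alarm_level):.4f}")
```

The alarm level comes out at about 25.465 oz: a bottle filled above that level happens only about 1% of the time while the machine is working normally.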
Inverse Area HW 1.105 1.106 (-0.675, 0.675, 89.9, 110.1, 1.35, 0.7%) 1.107
Normal Diagnostic When is the Normal Model “good”? Useful Graphical Device: Q-Q plot = Normal Quantile Plot Won’t devote class time: Useful info in text about this
Variable Relationships Chapter 2 in Text Idea: Look beyond single quantities, to how quantities relate to each other. E.g. How do HW scores “relate” to Exam scores? Section 2.1: Useful graphical device: Scatterplot
Recall Scatterplot E.g. Toy Example: (1,2) (3,1) (-1,0) (2,-1)
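To see where the four toy points land, here is a small text-only scatterplot in Python. It is a sketch for this toy data only: the function name ascii_scatter is made up, and the approach assumes integer coordinates, which real data such as class scores would not need to satisfy.

```python
points = [(1, 2), (3, 1), (-1, 0), (2, -1)]

def ascii_scatter(points):
    """Return rows of text, top row = largest y, with '*' at each data point."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    rows = []
    for y in range(max(ys), min(ys) - 1, -1):      # top of plot first
        row = "".join(
            "*" if (x, y) in points else "."
            for x in range(min(xs), max(xs) + 1)   # left to right
        )
        rows.append(row)
    return rows

for line in ascii_scatter(points):
    print(line)
```

Columns run from x = -1 to x = 3 and rows from y = 2 down to y = -1, so each point is one '*' on the grid.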
Scatterplot E.g. Data from related Intro. Stat. Class (actual scores) How does HW score predict Final Exam? x = HW, y = Final Exam https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg11.xls In top half of HW scores: Better HW ⇒ Better Final For lower HW: Final is much more “random”
Scatterplots Common Terminology: When thinking about “X causes Y”, Call X the “Explanatory Var.” or “Indep. Var.” Call Y the “Response Var.” or “Dep. Var.” (think of “Y as function of X”) (although not always sensible)
Scatterplots Note: Sometimes think about causation, Other times: “Explore Relationship” HW: 2.1
Class Scores Scatterplots https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg11.xls How does HW predict Midterm 1? x = HW, y = MT1 Still: better HW ⇒ better Exam But for each HW, wider range of MT1 scores I.e. HW doesn’t predict MT1 as well as Final “Outliers” in scatterplot may not be outliers in either individual variable e.g. HW = 72, MT1 = 94 (bad HW, but good MT1?, fluke???)
Class Scores Scatterplots https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg11.xls How does MT1 predict MT2? x = MT1, y = MT2 Idea: less “causation”, more “exploration” Still higher MT1 associated with higher MT2 For each MT1, wider range of MT2 i.e. “not good predictor” Interesting Outliers: MT1 = 100, MT2 = 56 (oops!) MT1 = 23, MT2 = 74 (woke up!)
Important Aspects of Relations Form of Relationship Direction of Relationship Strength of Relationship
I. Form of Relationship Linear: Data approximately follow a line Previous Class Scores Example https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg11.xls Final vs. High values of HW is “best” Nonlinear: Data follow a different pattern Nice Example: Bralower’s Fossil Data https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg12.xls
Bralower’s Fossil Data https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg12.xls From T. Bralower, formerly of Geological Sci. Studies Global Climate, millions of years ago: Ratios of Isotopes of Strontium Reflects Ice Ages, via Sea Level (50 meter difference!) As function of time Clearly nonlinear relationship
II. Direction of Relationship Positive Association: X bigger ⇒ Y bigger Negative Association: X bigger ⇒ Y smaller E.g. X = alcohol consumption, Y = Driving Ability Clear negative association
III. Strength of Relationship Idea: How close are points to lying on a line? Revisit Class Scores Example: https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg11.xls Final Exam is “closely related to HW” Midterm 1 less closely related to HW Midterm 2 even less closely related to Midterm 1
Linear Relationship HW 2.3, 2.5, 2.7, 2.9