Stat 155, Section 2, Last Time Reviewed Excel Computation of: –Time Plots (i.e. Time Series) –Histograms Modelling Distributions: Densities (Areas) Normal Density Curve (very useful model) Fitting Normal Densities (using mean and s.d.)
Reading In Textbook Approximate Reading for Today’s Material: Pages 71-83, Approximate Reading for Next Class: Pages ,
2 Views of Normal Fitting 1. “Fit Model to Data”: Choose $\mu$ & $\sigma$. 2. “Fit Data to Model”: First Standardize Data, Then use $N(0,1)$. Note: same thing, just different rescalings (choose scale depending on need)
Normal Distribution Notation The “normal distribution, with mean $\mu$ & standard deviation $\sigma$” is abbreviated as: $N(\mu, \sigma)$
Interpretation of Z-scores Recall Z-score Idea: Transform data $X_i$ By subtracting mean & dividing by s.d. To get $Z_i = (X_i - \bar{X}) / s$ (mean 0, s.d. 1) Interpret as $X_i = \bar{X} + Z_i s$ I.e. “$X_i$ is $Z_i$ sd’s above the mean”
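The standardization recalled above can be sketched in a few lines of Python. This is only an illustration: the `scores` values are hypothetical, the helper name `z_scores` is made up for this sketch, and it assumes the sample s.d. (the n - 1 divisor).

```python
from statistics import mean, stdev

def z_scores(data):
    """Standardize: subtract the sample mean, then divide by the sample s.d."""
    m = mean(data)
    s = stdev(data)  # assumption: sample s.d. with the n - 1 divisor
    return [(x - m) / s for x in data]

scores = [62, 75, 81, 90, 97]  # hypothetical data, for illustration only
z = z_scores(scores)
for x, zi in zip(scores, z):
    # a negative value means "below the mean"
    print(f"{x} is {zi:+.2f} sd's from the mean")
```

After the transformation the Z-scores have mean 0 and s.d. 1, whatever the original scale was.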
Interpretation of Z-scores Same idea for Normal Curves: Z-scores are on $N(0,1)$ scale, so use areas to interpret them Important Areas: 1. Within 1 sd of mean: about 68%, “the majority”
Interpretation of Z-scores 2. Within 2 sd of mean: about 95%, “really most” 3. Within 3 sd of mean: about 99.7%, “almost all”
Interpretation of Z-scores Interactive Version (used for above pics) From Publisher’s Website: Statistical Applets Normal Curve
Interpretation of Z-scores Summary: These relations are called the “68 - 95 - 99.7 % Rule” HW: 1.86 (a: , b: 234, 298), 1.87
Computation of Normal Areas Classical Approach: Tables See inside covers of text Summarizes area computations Because can’t use calculus (the Normal density has no closed-form antiderivative) Constructed by “computers” (a job description in the early 1900’s!)
Computation of Normal Areas EXCEL Computation: works in terms of “lower areas” E.g. for Area < 1.3 is
Computation of Normal Areas Interactive Version (used for above pic) From Same Publisher’s Website: Statistical Applets Normal Curve
Computation of Normal Areas EXCEL Computation: (of above e.g.) Use NORMDIST(x, mean, standard_dev, cumulative) Enter parameters: x is “cutoff point”, cumulative is TRUE Return is Area below x
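For readers without Excel, the same “lower area” can be sketched with Python's standard library. The helper name `lower_area` and the mean and s.d. used below are hypothetical, chosen only for illustration; they are not taken from the class example.

```python
from statistics import NormalDist

def lower_area(x, mean, sd):
    """Area under the Normal density with this mean and s.d., below cutoff x."""
    return NormalDist(mu=mean, sigma=sd).cdf(x)

# Hypothetical parameters (mean 1.0, s.d. 0.5), for illustration only
area = lower_area(1.3, mean=1.0, sd=0.5)
print(f"Area below 1.3: {area:.4f}")
```

As with the Excel function, the return value is always the area to the left of the cutoff.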
Computation of Normal Areas Computation of areas over intervals: (use subtraction) Area between $a$ and $b$ = (Area below $b$) - (Area below $a$)
Computation of Normal Areas Computation of areas over intervals: (use subtraction for EXCEL too) E.g. Use Excel to check 68 - 95 - 99.7 % Rule
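The subtraction idea can also be checked outside Excel. The sketch below uses Python's `statistics.NormalDist` on the standard Normal scale; the helper name `area_between` is made up for this illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal: mean 0, s.d. 1

def area_between(a, b, dist=Z):
    """Area over the interval (a, b): lower area at b minus lower area at a."""
    return dist.cdf(b) - dist.cdf(a)

# Areas within 1, 2 and 3 sd's of the mean
for k in (1, 2, 3):
    print(f"within {k} sd: {100 * area_between(-k, k):.1f}%")
```

Because the areas are computed on the Z-score scale, the same three percentages apply to any Normal curve.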
Normal Area HW HW (use Excel): (Hint: the % above 130 = 100% - % below 130) 1.99 (see discussion above) Caution: Don’t just “twiddle EXCEL until answer appears”. Understand it!!!
And Now for Something Completely Different A mind blowing video clip: 8 year old Skateboarding Twins: Do they ever miss? You can explore farther… Thanks to Devin Coley for the link
Inverse of Area Function Inverse of Frequencies: “Quantiles” Idea: Given area, find “cutoff” x I.e. for Area = 80% This x is the “quantile”
Inverse of Area Function EXCEL Computation of Quantiles: Use NORMINV(probability, mean, standard_dev) Continue Class Example: “Probability” is “Area” (below the cutoff) Enter mean and SD parameters
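A rough Python counterpart of the quantile computation, for comparison. The helper name `quantile` is hypothetical, and the default mean 0 and s.d. 1 are assumptions made for this sketch, not the class example's parameters.

```python
from statistics import NormalDist

def quantile(p, mean=0.0, sd=1.0):
    """Cutoff x such that the area below x equals p."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(p)

x80 = quantile(0.80)  # cutoff with 80% of the area below it
print(f"80% quantile on the Z scale: {x80:.3f}")

# Going back through the area function recovers the 80%
print(f"area below that cutoff: {NormalDist().cdf(x80):.2f}")
```

The quantile function undoes the area function: feed it an area, get back the cutoff.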
Inverse Area Example When a machine works normally, it fills bottles with mean = 25 oz, and SD = 0.2 oz. The machine is “out of control” when it overfills. Choose an “alarm level”, which will give only 1 % false alarms. Want: cutoff, x, so that Area above = 1% Note: Area below = 100% - Area above = 99%
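The alarm-level calculation can be sketched in Python as follows, using the mean (25 oz), SD (0.2 oz) and 1% false alarm rate stated above. The variable names are made up for this illustration.

```python
from statistics import NormalDist

fill = NormalDist(mu=25.0, sigma=0.2)  # fill amounts when the machine works normally
false_alarm_rate = 0.01                # want: Area above the cutoff = 1%

# Area below = 100% - Area above = 99%, so take the 99% quantile
alarm_level = fill.inv_cdf(1 - false_alarm_rate)
print(f"alarm level: {alarm_level:.3f} oz")

# Check: the area above the alarm level should come back as 1%
print(f"area above: {1 - fill.cdf(alarm_level):.4f}")
```

The key step is converting the upper area (1%) to a lower area (99%) before inverting, since the quantile function works in terms of lower areas.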
Inverse Area HW 1.95, 1.101, 1.107, a (-0.674, 0.674) (4.3%)
Normal Diagnostic When is the Normal Model “good”? Useful Graphical Device: Q-Q plot = Normal Quantile Plot Idea: look at plot which is approximately linear for data from Normal Model
Normal Quantile Plot Approach, for data $X_1, \ldots, X_n$: 1. Sort data 2. Compute “Theoretical Proportions” 3. Compute “Theoretical Z-scores” (Normal quantiles of those proportions) 4. Plot Sorted Data (Y-axis) vs. Theoretical Z-scores (X-axis)
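The four steps can be sketched in Python as below. Two things are assumptions of this sketch rather than something stated on the slide: the theoretical proportions are taken to be i / (n + 1), which is one common convention among several, and the `temps` values are hypothetical. The helper name `normal_quantile_points` is also made up.

```python
from statistics import NormalDist

def normal_quantile_points(data):
    """Return (theoretical Z-score, sorted data value) pairs for a Q-Q plot."""
    n = len(data)
    sorted_data = sorted(data)                             # step 1
    # step 2: assumption, proportions i / (n + 1); other conventions exist
    proportions = [i / (n + 1) for i in range(1, n + 1)]
    z = [NormalDist().inv_cdf(p) for p in proportions]     # step 3
    return list(zip(z, sorted_data))                       # step 4: x = z, y = data

temps = [14.2, 15.1, 13.8, 16.0, 14.9, 15.5, 14.6]  # hypothetical data
for z_i, x_i in normal_quantile_points(temps):
    print(f"{z_i:+.3f}  {x_i}")
```

Plotting the second column against the first gives the Normal quantile plot; data from a Normal model should fall close to a line.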
Normal Quantile Plot Several Examples: Show how to compute in Excel Steps as above
Normal Quantile Plot Main Lessons: Melbourne Winter Temperature Data –Gaussian is good, so looks ~ linear –So OK, to use normal model for these data –Adding trendline helps in assessing linearity
Normal Quantile Plot Main Lessons: Intro Stat Course Exam Scores Data –Skewed distributions ⇒ nonlinearity –Outliers show up clearly –Normal model unreliable here Combined plot highlights –Mean = Y-intercept –Standard Deviation = Slope
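The “Mean = Y-intercept, Standard Deviation = Slope” lesson can be checked numerically. The sketch below is an illustration under stated assumptions: the data are simulated (mean 70, s.d. 10, not the class scores), the theoretical proportions are taken as i / (n + 1), and the trendline is an ordinary least-squares line. The helper name `qq_line` is made up.

```python
import random
from statistics import NormalDist, mean, stdev

def qq_line(data):
    """Least-squares line through the Normal quantile plot: (intercept, slope)."""
    n = len(data)
    y = sorted(data)
    # assumption: theoretical proportions i / (n + 1)
    z = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    z_bar, y_bar = mean(z), mean(y)
    slope = (sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
             / sum((zi - z_bar) ** 2 for zi in z))
    intercept = y_bar - slope * z_bar
    return intercept, slope

random.seed(0)
scores = [random.gauss(70, 10) for _ in range(500)]  # simulated, not real scores
intercept, slope = qq_line(scores)
print(f"intercept {intercept:.2f} vs. mean {mean(scores):.2f}")
print(f"slope     {slope:.2f} vs. s.d. {stdev(scores):.2f}")
```

For Normal data the intercept matches the sample mean and the slope lands close to the sample s.d., which is why the plot doubles as a picture of the fitted model.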
Normal Quantile Plot Main Lessons: Simulated Bimodal Data –Curve is flat near modes –Roughly linear near peaks –Corresponds to two normal subpopulations –Goes up fast at valley
Normal Quantile Plot Homework:
And now for something completely different Recall Distribution of majors of students in this course:
And now for something completely different How about a biology joke? A seventh grade Biology teacher arranged a demonstration for his class. He took two earthworms and in front of the class he did the following: He dropped the first worm into a beaker of water where it dropped to the bottom and wriggled about. He dropped the second worm into a beaker of ethyl alcohol and it immediately shriveled up and died. He asked the class if anyone knew what this demonstration was intended to show them.
And now for something completely different A boy in the second row immediately shot his arm up and, when called on, said: "You're showing us that if you drink alcohol, you won't have worms."
Variable Relationships Chapter 2 in Text Idea: Look beyond single quantities, to how quantities relate to each other. E.g. How do HW scores “relate” to Exam scores? Section 2.1: Useful graphical device: Scatterplot
Plotting Bivariate Data Toy Example: (1,2) (3,1) (-1,0) (2,-1)
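To see how each (x, y) pair becomes a position in the plot, here is a small text-only sketch of the toy example in Python. The function name `ascii_scatter` is made up for this illustration, and it assumes integer coordinates, as in the four points above.

```python
points = [(1, 2), (3, 1), (-1, 0), (2, -1)]  # the toy example

def ascii_scatter(points):
    """Rows of text, top row = largest y; '*' marks a data point."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    rows = []
    for y in range(max(ys), min(ys) - 1, -1):      # top of the plot first
        row = "".join("*" if (x, y) in points else "."
                      for x in range(min(xs), max(xs) + 1))
        rows.append(row)
    return rows

for row in ascii_scatter(points):
    print(row)
```

Columns run from x = -1 (left) to x = 3 (right), and rows from y = 2 (top) to y = -1 (bottom).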
Plotting Bivariate Data Sometimes: Can see more insightful patterns by connecting points
Plotting Bivariate Data Sometimes: Useful to switch off points, and only look at lines/curves
Plotting Bivariate Data Common Name: “Scatterplot” A look under the hood: EXCEL: Chart Wizard (colored bar icon) Chart Type: XY (scatter) Subtype controls points only, or lines Later steps similar to above (can massage the pic!)
Scatterplot E.g. Data from related Intro. Stat. Class (actual scores) A. How does HW score predict Final Exam? X = HW, Y = Final Exam i. In top half of HW scores: Better HW ⇒ Better Final ii. For lower HW: Final is much more “random”
Scatterplots Common Terminology: When thinking about “X causes Y”, Call X the “Explanatory Var.” or “Indep. Var.” Call Y the “Response Var.” or “Dep. Var.” (think of “Y as function of X”) (although not always sensible)
Scatterplots Note: Sometimes think about causation, Other times: “Explore Relationship” HW: 2.1
Class Scores Scatterplots B. How does HW predict Midterm 1? X = HW, Y = MT1 i. Still better HW ⇒ better Exam ii. But for each HW, wider range of MT1 scores iii. I.e. HW doesn’t predict MT1 as well as Final iv. “Outliers” in scatterplot may not be outliers in either individual variable e.g. HW = 72, MT1 = 94 (bad HW, but good MT1?, fluke???)
Class Scores Scatterplots C. How does MT1 predict MT2? X = MT1, Y = MT2 i. Idea: less “causation”, more “exploration” ii. Still higher MT1 associated with higher MT2 iii. For each MT1, wider range of MT2 i.e. “not good predictor” iv. Interesting Outliers: MT1 = 100, MT2 = 56 (oops!) MT1 = 23, MT2 = 74 (woke up!)
Important Aspects of Relations I.Form of Relationship II.Direction of Relationship III.Strength of Relationship
I. Form of Relationship Linear: Data approximately follow a line Previous Class Scores Example: Final vs. High values of HW is “best” Nonlinear: Data follow a different pattern Nice Example: Bralower’s Fossil Data
Bralower’s Fossil Data From T. Bralower, formerly of Geological Sci. Studies Global Climate, millions of years ago: Ratios of Isotopes of Strontium Reflects Ice Ages, via Sea Level (50 meter difference!) As function of time Clearly nonlinear relationship
II. Direction of Relationship Positive Association: X bigger ⇒ Y bigger Negative Association: X bigger ⇒ Y smaller E.g. X = alcohol consumption, Y = Driving Ability Clear negative association
III. Strength of Relationship Idea: How close are points to lying on a line? Revisit Class Scores Example: Final Exam is “closely related to HW” Midterm 1 less closely related to HW Midterm 2 even less closely related to Midterm 1