Stat 31, Section 1, Last Time Time series plots Numerical Summaries of Data: –Center: Mean, Median –Spread: Range, Variance, S.D., IQR 5 Number Summary.


1 Stat 31, Section 1, Last Time Time series plots Numerical Summaries of Data: –Center: Mean, Median –Spread: Range, Variance, S.D., IQR 5 Number Summary & Outlier Rule Course Organization & Website https://www.unc.edu/%7Emarron/UNCstat31-2005/Stat31sec1Home.html

2 Comments From Grader I encountered some problems in the grading. These problems are: 1. the homework pages are not stapled together. 2. the answers are not the same order as the questions. 3. the results, especially in excel tables, are not highlighted. Could you please emphasize the above problems in your class? If the students follow the rules, the grading will be much easier. In the grading of homework #2, I also hope that you can allow me to enforce the rules by giving zero points.

3 Linear Transformations Idea: What happens to data & summaries, when data are: “shifted and scaled” i.e. “panned and zoomed” Math: $x_i \mapsto a + b x_i$, i.e. Shifted by $a$, Scaled by $b$

4 Linear Transformations Effect on linear summaries: Centerpoints, $\bar{x}$ and the median, “follow data”: $\bar{x}_{new} = a + b \bar{x}$. Spreads, $s$ and IQR, “feel scale, not shift”: $s_{new} = |b| s$.
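A quick numerical check of these effects may help. The sketch below assumes the transformation is written as x -> a + b*x (shift a, scale b); the data values and the helper name `linear_transform` are made up for illustration and do not come from the course files.

```python
import statistics

def linear_transform(data, a, b):
    """Shift by a and scale by b: each x becomes a + b*x (hypothetical helper)."""
    return [a + b * x for x in data]

data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]   # made-up values
a, b = 10.0, -3.0                        # shift and (negative) scale
new = linear_transform(data, a, b)

mean_old, mean_new = statistics.mean(data), statistics.mean(new)
med_old, med_new = statistics.median(data), statistics.median(new)
sd_old, sd_new = statistics.stdev(data), statistics.stdev(new)

# Centerpoints follow the data: both shift and scale apply.
print(mean_new, a + b * mean_old)
print(med_new, a + b * med_old)
# Spreads feel the scale only, and only through its absolute value.
print(sd_new, abs(b) * sd_old)
```

A negative scale is used on purpose: the mean and median flip sign with the data, while the SD stays positive.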

5 Most Useful Linear Transformation: “Standardization” Goal: put data sets on “common scale” Approach: 1. Subtract Mean, to “center at 0” 2. Divide by S.D., to “give common SD = 1”

6 Standardization Result is called “z-score”: $z_i = (x_i - \bar{x}) / s$. Note that $x_i = \bar{x} + z_i s$. Thus $z_i$ is interpreted as: “number of SDs from the mean”

7 Standardization Example Buffalo Snowfall Data: https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg7Done.xls Standardized data have same (EXCEL default) histogram shape as raw data. (Since axes and bin edges just follow the transformation) i.e. “shape” doesn’t depend on “scaling”

8 Standardization Example A look under the hood: https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg7Raw.xls 1. Compute AVERAGE and SD 2. Standardize by: a. Create Formula in cell B2 b. Drag downwards c. Keep Mean and SD cells fixed using $s 3. Check stand’d data have mean 0 & SD 1; note that “8.247E-16 = 0”, i.e. zero up to rounding error
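The same steps can be sketched outside EXCEL. This is only an illustration: the numbers below are made up (they are not the Buffalo snowfall data), `standardize` is a hypothetical helper name, and the sample SD is assumed, as with EXCEL's STDEV.

```python
import statistics

def standardize(data):
    """Return z-scores: subtract the mean, then divide by the sample SD."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)   # sample SD (n - 1 in the denominator)
    return [(x - mean) / sd for x in data]

values = [25.0, 39.8, 52.5, 63.6, 79.7, 110.5]   # made-up values
z = standardize(values)

z_mean = statistics.mean(z)
z_sd = statistics.stdev(z)
print(z_mean)   # zero up to rounding error
print(z_sd)     # one up to rounding error
```

As in the spreadsheet, the mean of the standardized data may print as a tiny nonzero number rather than exactly 0.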

9 Standardization HW C6: For the 18 female scores in 1.49, use EXCEL to: a. Give the list of standardized scores b. Give the Z-score for: (i) the mean (0) (ii) the median (-0.0967) (iii) the smallest (-1.52) (iv) the largest (2.23) 1.79

10 Modelling Distributions Text: Section 1.3 Idea: Approximate histograms by: an “idealized curve” i.e. a “density curve” that represents the population

11 Idealized Curve Example Recall Hidalgo Stamps Data, Shifting Bin Movie (made # modes change): https://www.unc.edu/~marron/UNCstat31-2005/StampsHistLoc.mpg Add idealized curve: https://www.unc.edu/~marron/UNCstat31-2005/StampsHistLocKDE.mpg Note: “population curve” shows why histogram modes appear and disappear

12 Interpretation of Density Areas under density curve, give “relative frequency” Proportion of data between $a$ and $b$ = Area under density curve between $a$ and $b$

13 Interpretation of Density Note: Total Area under density = 1 (since relative freq. of everything is 1) HW: 1.78 (b: 0.8), 1.79 Work with pencil and paper, not EXCEL
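To make the "area = relative frequency" idea concrete, here is a small sketch using a made-up triangular density on [0, 2] (not one of the homework curves). The function names and the trapezoid-rule helper are assumptions chosen for illustration.

```python
def triangle_density(x):
    """Made-up density: rises from 0 at x=0 to 1 at x=1, then falls to 0 at x=2."""
    if 0.0 <= x <= 1.0:
        return x
    if 1.0 < x <= 2.0:
        return 2.0 - x
    return 0.0

def area_under(density, lo, hi, steps=1000):
    """Approximate the area under `density` between lo and hi (trapezoid rule)."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        left = lo + i * width
        right = left + width
        total += 0.5 * (density(left) + density(right)) * width
    return total

total_area = area_under(triangle_density, 0.0, 2.0)   # relative freq. of everything
middle = area_under(triangle_density, 0.5, 1.5)       # proportion between 0.5 and 1.5
print(total_area, middle)
```

For this curve the middle proportion can also be found with pencil and paper: each of the two corner triangles left out has area 0.125, so the middle area is 0.75.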

14 Most Useful Density “Normal Curve” = “Gaussian Density” Shape: “like a mound” E.g. of “sand dumped from a truck” Older, worse, description: “bell shaped”

15 Normal Density Example Winter Daily Maximum Temperatures in Melbourne, Australia https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg9Done.xls Notes: Top Histogram is “mound shaped” Plus “small scale random variation” So model with “Normal Density”?

16 Normal Density Curves Note: there is a family of normal curves, indexed by: i. “Center”, i.e. Mean = $\mu$ ii. “Spread”, i.e. Stand. Deviation = $\sigma$ Terminology: $\mu$ & $\sigma$ are called “parameters”; $\mu$ is Greek “mu”, $\sigma$ is Greek “sigma” ~ s

17 Family of Normal Curves Think about: “Shifts” (pans) indexed by $\mu$ “Scales” (zooms) indexed by $\sigma$ Nice interactive graphical example: http://www.stat.sc.edu/~west/applets/normaldemo1.html (note area under curve is always 1)

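18 Normal Curve Mathematics The “normal density curve” is: $f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2 \sigma^2}}$, a usual “function” of $x$, where the circle constant is $\pi$ = 3.14… and the natural number is $e$ = 2.7…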

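19 Normal Curve Mathematics Main Ideas: Basic shape is: $e^{-x^2/2}$ “Shifted to mu”: $e^{-(x - \mu)^2/2}$ “Scaled by sigma”: $e^{-(x - \mu)^2/(2\sigma^2)}$ Make Total Area = 1: divide by $\sigma \sqrt{2\pi}$ $f(x) \to 0$ as $x \to \pm\infty$, but never $f(x) = 0$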
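The build-up above can be checked numerically. The sketch below assumes the standard normal density formula; `normal_density` and `total_area` are hypothetical helper names, and the total area is approximated with a midpoint sum over 10 SDs on each side of the mean.

```python
import math

def normal_density(x, mu, sigma):
    """Normal density: basic shape exp(-x^2/2), shifted to mu, scaled by sigma."""
    shape = math.exp(-0.5 * ((x - mu) / sigma) ** 2)
    return shape / (sigma * math.sqrt(2.0 * math.pi))   # makes total area 1

def total_area(mu, sigma, half_width=10.0, steps=20000):
    """Midpoint-rule area over [mu - 10 sigma, mu + 10 sigma]."""
    lo = mu - half_width * sigma
    width = 2.0 * half_width * sigma / steps
    heights = (normal_density(lo + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return sum(heights) * width

area_standard = total_area(0.0, 1.0)
area_shifted_scaled = total_area(5.0, 2.5)   # made-up mu and sigma
print(area_standard, area_shifted_scaled)    # both very close to 1
```

Changing mu only moves the curve and changing sigma only stretches it; the division by sigma * sqrt(2 * pi) keeps the area at 1 in both cases.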

20 Normal Model Fitting Idea: Choose $\mu$ and $\sigma$ to give: “good” fit to data. Approach: IF the distribution is “mound shaped” & outliers are negligible THEN a “good” choice of normal model is: $\mu = \bar{x}$, $\sigma = s$
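A minimal sketch of this fitting step, assuming the intended choice is the sample mean for the center and the sample SD for the spread. The temperatures are made up (they are not the Melbourne data), and Python's `statistics.NormalDist` stands in for the spreadsheet.

```python
import statistics

temps = [12.1, 13.4, 14.0, 14.8, 15.2, 15.9, 16.3, 17.5]   # made-up values

mu_hat = statistics.mean(temps)       # center of the fitted curve
sigma_hat = statistics.stdev(temps)   # spread of the fitted curve (sample SD)

model = statistics.NormalDist(mu_hat, sigma_hat)
print(model.mean, model.stdev)
print(model.cdf(mu_hat))   # half of the area lies below the fitted mean
```

This only makes sense when the histogram is mound shaped with negligible outliers, since both the mean and the SD are sensitive to outliers.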

21 Normal Fitting Example Revisit Melbourne Daily Max Temps https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg9Done.xls Fit curve, using “Visually good” approximation

22 Normal Fitting Example A look under the hood https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg9Done.xls Use chosen (not default) histogram bins for nice comparison bins Use longer range to avoid the “More” bin Can compute with density formula (Two steps, in cols F and G) Or use NORMDIST function (col J, check same as col G)
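The "formula versus built-in function" check can be imitated as follows. This is a sketch under assumptions: the two steps are taken to be the exponent and then the scaled exponential (the actual contents of the spreadsheet columns are not shown here), the parameters and bin centers are made up, and `statistics.NormalDist.pdf` plays the role of NORMDIST.

```python
import math
from statistics import NormalDist

mu, sigma = 15.0, 3.0                                 # made-up parameters
bin_centers = [6.0 + 1.5 * i for i in range(13)]      # 6.0, 7.5, ..., 24.0

# Step 1: the exponent of the density formula at each bin center.
exponents = [-((x - mu) ** 2) / (2.0 * sigma ** 2) for x in bin_centers]
# Step 2: exponentiate and divide by sigma * sqrt(2 pi).
by_formula = [math.exp(e) / (sigma * math.sqrt(2.0 * math.pi)) for e in exponents]

# Library version, for comparison.
by_library = [NormalDist(mu, sigma).pdf(x) for x in bin_centers]

max_gap = max(abs(f - g) for f, g in zip(by_formula, by_library))
print(max_gap)   # agreement up to rounding error
```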

23 Normal Curve HW C7: A study of distance runners found a mean weight of 63.1 kg, with a standard deviation of 4.8 kg. Assuming that the distribution of weights is normal, use EXCEL to draw the density curve of the weight distribution.

24 2 Views of Normal Fitting 1. “Fit Model to Data” Choose $\mu$ & $\sigma$. 2. “Fit Data to Model” First Standardize Data Then use Normal with $\mu = 0$, $\sigma = 1$. Note: same thing, just different rescalings (choose scale depending on need)

25 Normal Distribution Notation The “normal distribution, with mean $\mu$ & standard deviation $\sigma$” is abbreviated as: $N(\mu, \sigma)$

26 Interpretation of Z-scores Idea: Z-scores are on the $N(0, 1)$ scale, so use areas to interpret Important Areas: 1. Within 1 sd of mean: about 68%, “the majority”

27 Interpretation of Z-scores 2. Within 2 sd of mean: about 95%, “really most” 3. Within 3 sd of mean: about 99.7%, “almost all”

28 Interpretation of Z-scores Interactive Version (used for above pics) From Webster West’s Website: http://www.stat.sc.edu/~west/applets/empiricalrule.html

29 Interpretation of Z-scores Summary: These relations are called the “68 - 95 - 99.7 % Rule” HW: 1.82 (a: 234-298, b: 234, 298), 1.83
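The three areas in the rule can be recomputed directly. The sketch below uses the standard normal curve from Python's `statistics.NormalDist`; `area_within` is a hypothetical helper name.

```python
from statistics import NormalDist

standard = NormalDist(0.0, 1.0)   # Z-score scale: mean 0, SD 1

def area_within(k):
    """Area under the standard normal curve within k SDs of the mean."""
    return standard.cdf(k) - standard.cdf(-k)

areas = {k: area_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(k, round(100.0 * area, 1))   # percent within k SDs
```

Because Z-scores put any normal curve on this common scale, the same three areas apply whatever the mean and SD of the original data.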

