Statistical Data Analysis 2011/2012 M. de Gunst Lecture 2.



Statistical Data Analysis: Introduction
Topics:
- Summarizing data
- Exploring distributions
- Bootstrap
- Robust methods
- Nonparametric tests
- Analysis of categorical data
- Multiple linear regression

Today’s topic: Exploring distributions (Chapter 3)
- Introduction: goal of investigations
- 3.1. Quantile function and location-scale families
- 3.2. QQ-plots for one sample
- 3.3. Symplots
- 3.4. (Empirical) QQ-plots for two samples
- 3.5. Goodness of fit tests: Shapiro-Wilk test for normal distribution

Exploring distributions: introduction (1)
Data set:
- its empirical distribution
- its underlying distribution

Exploring distributions: introduction (2)
Data set: values of a variable, measured on a sample from a population.
- empirical distribution = distribution of the data
- underlying distribution = distribution of the variable in the population
Goal: find the underlying distribution.
The empirical distribution helps to determine the underlying distribution.
A different sample from the same population has a different empirical distribution but the same underlying distribution.

Exploring distributions: introduction (3)
Goal: find the underlying distribution.
What do the graphical and numerical summaries tell us about the underlying distribution of a data set? And what do they not tell us?

Exploring distributions: introduction (4)
Goal: find the underlying distribution.
This lecture’s questions:
For one sample:
- Do the data originate from a specific distribution?
- Is the underlying distribution symmetric?
For two samples:
- Do the samples have the same underlying distribution?
Answers with graphs and tests.

3.1. Quantile function and location-scale families
F distribution function of X.
α-quantile: $x_\alpha$ such that $F(x_\alpha) = \alpha$.
More general: quantile function of F: $F^{-1}(\alpha) = \inf\{x : F(x) \ge \alpha\}$, which is the ordinary inverse of F if F is strictly increasing.
[Figure: a distribution function and two of its quantiles]

Location-scale family (1)
If X has distribution function F, then a + bX (with b > 0) has distribution function $F_{a,b}$ given by $F_{a,b}(x) = F\left(\frac{x-a}{b}\right)$.
Location-scale family of F: $\{F_{a,b} : a \in \mathbb{R},\ b > 0\}$.
Expectation and variance of distribution $F_{a,b}$?
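The question about expectation and variance can be answered with the standard rules for a linear transformation. The derivation below is a sketch that assumes X has finite expectation and variance; the symbols $\mu_F$ and $\sigma_F^2$ for the expectation and variance under F are introduced here for illustration.

```latex
% Y = a + bX has distribution function F_{a,b}, with b > 0.
\begin{align*}
\mathrm{E}(a + bX)   &= a + b\,\mathrm{E}X      = a + b\,\mu_F, \\
\mathrm{Var}(a + bX) &= b^{2}\,\mathrm{Var}\,X  = b^{2}\,\sigma_F^{2}, \\
\mathrm{sd}(a + bX)  &= b\,\sigma_F .
\end{align*}
```

So a shifts the expectation and b rescales the standard deviation, which is why a and b are called location and scale.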

Location-scale family (2)
Quantiles of F and $F_{a,b}$ in the location-scale family of F have a linear relationship: $F_{a,b}^{-1}(\alpha) = a + b\,F^{-1}(\alpha)$.
Example: quantiles of N(2,16) against quantiles of N(0,1).
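The N(2,16) example can be checked numerically. The sketch below assumes that the second parameter of N(2,16) is the variance, so that a = 2 and b = 4; it uses Python's standard-library `statistics.NormalDist` rather than the R functions of the course.

```python
from statistics import NormalDist

# F is the standard normal; the family member F_{a,b} with a = 2, b = 4
# is N(2, 16), assuming 16 denotes the variance (standard deviation 4).
std_normal = NormalDist(mu=0.0, sigma=1.0)
member = NormalDist(mu=2.0, sigma=4.0)

alphas = [0.05, 0.25, 0.5, 0.75, 0.95]

# Pairs (quantile of N(0,1), quantile of N(2,16)) for each alpha.
pairs = [(std_normal.inv_cdf(p), member.inv_cdf(p)) for p in alphas]

for z, q in pairs:
    # Third column: the value predicted by the linear relation a + b * z.
    print(f"{z:8.4f} {q:8.4f} {2 + 4 * z:8.4f}")
```

The second and third printed columns agree, so the points lie on the line with intercept 2 and slope 4.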

Different location-scale families
Quantiles of F and G from different location-scale families do not have a linear relationship.
Example: quantiles of members of different location-scale families against each other.
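A small numerical illustration of the non-linear relationship, as a sketch: it compares quantiles of the exponential distribution with rate 1 against quantiles of N(0,1). The choice of these two distributions and the helper `exp_quantile` are assumptions made for this example only.

```python
import math
from statistics import NormalDist

def exp_quantile(p, rate=1.0):
    """Quantile function of the exponential distribution: -log(1 - p) / rate."""
    return -math.log(1.0 - p) / rate

alphas = [0.1, 0.3, 0.5, 0.7, 0.9]
z = [NormalDist().inv_cdf(p) for p in alphas]   # quantiles of N(0,1)
e = [exp_quantile(p) for p in alphas]           # quantiles of exp(1)

# Slopes between successive points (z_i, e_i). A linear relationship
# would give the same slope everywhere.
slopes = [(e[i + 1] - e[i]) / (z[i + 1] - z[i]) for i in range(len(alphas) - 1)]

for s in slopes:
    print(f"{s:.3f}")
```

The slopes keep increasing from left to right, so the points bend upward instead of following a straight line.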

3.2. Plots for one sample: QQ-plots (1)
So far: theoretical quantiles.
For data $x_1, \ldots, x_n$:
Empirical α-quantile $x_\alpha$: fraction α of $x_1, \ldots, x_n$ is ≤ $x_\alpha$.
Empirical quantiles roughly correspond to theoretical quantiles of the underlying distribution.
QQ-plot: plot of empirical quantiles against theoretical quantiles of a particular distribution.
If the QQ-plot shows a roughly straight line? Then the underlying distribution of the data belongs to the location-scale family of that distribution.

Plots for one sample: QQ-plots (2)
QQ-plot: plot of empirical quantiles against theoretical quantiles of a particular distribution.
If the QQ-plot shows a roughly straight line, then the underlying distribution of the data belongs to the location-scale family of that distribution.
Example: 25 data from N(0,9); 100 data from N(0,9).
R: qqnorm, qqexp, qqchisq, etc.
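To make the construction of a normal QQ-plot concrete, here is a minimal sketch that computes the plotted points without drawing them. The plotting positions (i - 0.5)/n are an assumption (R's qqnorm may use slightly different positions), the helper names are hypothetical, and the simulated sample only mimics the "100 data from N(0,9)" example, reading 9 as the variance.

```python
import math
import random
from statistics import NormalDist

def normal_qq_points(data):
    """Return (theoretical N(0,1) quantile, empirical quantile) pairs."""
    n = len(data)
    ordered = sorted(data)  # empirical quantiles are the order statistics
    std_normal = NormalDist()
    # Assumed plotting positions (i - 0.5) / n, for i = 1, ..., n.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

def correlation(points):
    """Pearson correlation of the x- and y-coordinates of the points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2012)                            # fixed seed for repeatability
sample = [rng.gauss(0.0, 3.0) for _ in range(100)]   # simulated N(0, 9) data
points = normal_qq_points(sample)
r = correlation(points)
print(f"correlation of QQ-plot points: {r:.3f}")
```

A correlation close to 1 is a numerical counterpart of "roughly straight line"; it does not replace looking at the plot itself.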

Plots for one sample: QQ-plots (3)
If the QQ-plot shows a roughly straight line, then the underlying distribution of the data belongs to the location-scale family of the “distribution on x-axis”.
Now: which specific distribution of this family do the data come from, or: which location and scale does the underlying distribution have?
How to estimate expectation and standard deviation of the underlying distribution from the QQ-plot?
- Estimate by eye intercept (a) and slope (b) of the best fitting line in the QQ-plot;
- Express expectation (or standard deviation) of the underlying distribution in terms of the (known) expectation (or (known) standard deviation) of the “distribution on x-axis”, a and b.
Recall: intercept (a) and slope (b) of the best fitting line in the QQ-plot are location and scale parameters w.r.t. the “distribution on x-axis”.
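The two steps can be sketched in code. Instead of estimating the line by eye, this example swaps in an ordinary least-squares fit, which is only one possible stand-in. It assumes N(0,1) on the x-axis (known expectation 0, known standard deviation 1) and plotting positions (i - 0.5)/n; the function names and the simulated data are hypothetical.

```python
import random
from statistics import NormalDist

def fit_qq_line(data):
    """Least-squares intercept a and slope b of the normal QQ-plot of data."""
    n = len(data)
    ys = sorted(data)
    xs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

def location_scale_estimates(a, b, mean_x_axis=0.0, sd_x_axis=1.0):
    """Expectation a + b * mean and standard deviation b * sd of the underlying distribution."""
    return a + b * mean_x_axis, b * sd_x_axis

rng = random.Random(7)                               # fixed seed for repeatability
sample = [rng.gauss(5.0, 2.0) for _ in range(200)]   # simulated data, true mean 5 and sd 2
a, b = fit_qq_line(sample)
est_mean, est_sd = location_scale_estimates(a, b)
print(f"intercept a = {a:.2f}, slope b = {b:.2f}")
print(f"estimated expectation = {est_mean:.2f}, estimated sd = {est_sd:.2f}")
```

With N(0,1) on the x-axis the estimates are simply a and b; for another reference distribution, pass its known expectation and standard deviation to `location_scale_estimates`.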

Plots for one sample: QQ-plots (4)
QQ-plot: plot of empirical quantiles against theoretical quantiles of a particular distribution.
If the QQ-plot does not show a straight line, then the underlying distribution of the data belongs to the location-scale family of another distribution.
Example: 25 data from N(0,9); 100 data from N(0,9).
In this case: of one with heavier tails. How to see this?

3.3. Plots for one sample: symplot (1)
Symmetry: X is symmetric around θ if X – θ and θ – X have the same distribution.
If X is continuous, then its probability density is symmetric around θ.
How to check?
- histogram, stem-and-leaf plot
- skewness parameter
- difference in sample mean and sample median
- quantile function → symplot

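Plots for one sample: symplot (2)
If X is symmetric around θ, then $F^{-1}(1-\alpha) - \theta = \theta - F^{-1}(\alpha)$.
Thus: linear relationship between lower and upper theoretical quantiles.
Analogously for data from a symmetric distribution: linear relationship between lower and upper empirical quantiles.
Symplot: plot of points that pair each lower empirical quantile with the matching upper empirical quantile.
R: symplot
[Figure: density symmetric around θ, with area α to the left of $F^{-1}(\alpha)$ and to the right of $F^{-1}(1-\alpha)$]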
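One common way to build the points of a symmetry plot is sketched below. The exact convention is an assumption, not taken from the slides: the centre is estimated by the sample median, and the i-th smallest value is paired with the i-th largest value as distances to that median. The function name `symplot_points` is hypothetical and is not the R function symplot.

```python
from statistics import median

def symplot_points(data):
    """Pairs (median - i-th smallest value, i-th largest value - median)."""
    ordered = sorted(data)
    n = len(ordered)
    med = median(ordered)
    return [(med - ordered[i], ordered[n - 1 - i] - med) for i in range(n // 2)]

symmetric_data = [1, 2, 3, 4, 5]
skewed_data = [1, 2, 3, 4, 10]

symmetric_points = symplot_points(symmetric_data)
skewed_points = symplot_points(skewed_data)

print(symmetric_points)  # points on the line y = x
print(skewed_points)     # first point far above the line y = x
```

For data from a symmetric distribution the points scatter around the line y = x; a long right tail pushes the points for the extreme observations above that line.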

Plots for one sample: symplot (3)
If the data come from a symmetric distribution: linear relationship between lower and upper empirical quantiles.
3 examples: N(0,1), exp(1), chisq(df=3).

Intermezzo: Scheme for exploring distributions

3.4. Plots for two samples: (empirical) QQ-plots (1)
Do two samples have the same underlying distribution?
How to answer this with empirical quantiles?

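Plots for two samples: (empirical) QQ-plots (2)
Plot for comparing the distributions of two samples of sizes m and n.
(Empirical) QQ-plot:
- If m = n, then plot the points formed by the ordered values of the two samples, smallest against smallest up to largest against largest.
- If m < n, then plot the m ordered values of the smaller sample against the corresponding empirical quantiles of the larger sample.
R: qqplot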
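The two cases can be sketched as follows. For unequal sample sizes the larger sample's ordered values are linearly interpolated at m equally spaced positions; this interpolation rule is an assumption about one reasonable convention (it is similar in spirit to what R's qqplot does, but is not claimed to match it exactly). The function names are hypothetical.

```python
import math

def interpolate_order_stats(sorted_vals, m):
    """Linearly interpolate m equally spaced empirical quantiles from sorted values."""
    n = len(sorted_vals)
    if m < 2:
        raise ValueError("need at least two points")
    out = []
    for j in range(m):
        pos = j * (n - 1) / (m - 1)        # fractional 0-based position
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo]))
    return out

def qq_points(x, y):
    """Points of the empirical QQ-plot of sample x against sample y."""
    sx, sy = sorted(x), sorted(y)
    m = min(len(sx), len(sy))
    if len(sx) > m:
        sx = interpolate_order_stats(sx, m)
    if len(sy) > m:
        sy = interpolate_order_stats(sy, m)
    return list(zip(sx, sy))

equal_points = qq_points([3, 1, 2], [6, 2, 4])
unequal_points = qq_points([1, 2, 3], [10, 20, 30, 40, 50])
print(equal_points)
print(unequal_points)
```

In both small examples the points lie on a straight line that is not y = x, the situation in which the two samples would be judged to come from the same location-scale family but not from the same distribution.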

Plots for two samples: (empirical) QQ-plots (3)
Do the data in samples a and b have the same underlying distribution?
We see a roughly straight line, but not the line y = x, so the underlying distributions of a and b come from the same location-scale family, but are not the same!

3.5. Tests for goodness of fit (1)
Exploring distributions: now more formal methods.
Testing
Ingredients of a test?
- Hypotheses H_0 and H_1
- Test statistic T
- Distribution of T under H_0, and knowledge of how it is changed/shifted under H_1
- Rule for when H_0 will be rejected: rejection rule based either on critical region or on p-value
How to perform a test?
- Describe the 4 above ingredients
- Choose significance level α
- Calculate and report value t of T
- Report whether t is in the critical region, or whether p-value < α
- Formulate conclusion of test: “H_0 rejected” or “H_0 not rejected”
- If possible, translate conclusion to practical context
NB. When asked to perform a test, you have to do all 6 steps!

Tests for goodness of fit (2)
Situation: $x_1, \ldots, x_n$ independent realizations from unknown distribution F.
Hypotheses: $H_0: F \in \mathcal{F}_0$ against $H_1: F \notin \mathcal{F}_0$; often $\mathcal{F}_0 = \{F_0\}$ or $\mathcal{F}_0$ is a location-scale family.
Be cautious with too strong conclusions:
- When n is small, the power is small and the conclusion is not very reliable.
- When n is very, very large, H_0 is almost always rejected.

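Goodness of fit tests: Shapiro-Wilk test (1)
Test for the null hypothesis that the underlying distribution is normal:
$x_1, \ldots, x_n$ independent realizations from unknown distribution F.
Now $\mathcal{F}_0$ is the location-scale family of normal distributions.
Test statistic: $W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$, with tabulated constants $a_i$ and ordered data $x_{(1)} \le \ldots \le x_{(n)}$ (values in (0,1]).
Distribution under H_0 from tables or computer package.
H_0 is rejected for “small” values of W.
R: shapiro.test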

Goodness of fit tests: Shapiro-Wilk test (2)
R:
> shapiro.test(beewax)
    Shapiro-Wilk normality test
    data: beewax
    W = , p-value =
Conclusion?
> shapiro.test(rexp(50))
    Shapiro-Wilk normality test
    data: rexp(50)
    W = , p-value = (“small”)
Conclusion?

Recap
- 3. Introduction: goal of investigations
- 3.1. Quantile function and location-scale families
- 3.2. QQ-plots for one sample
- 3.3. Symplots
- 3.4. (Empirical) QQ-plots for two samples
- 3.5. Goodness of fit tests: Shapiro-Wilk test for normal distribution

Investigating distributions
The end