Chapter 3: Uncertainty
"Variation arises in data generated by a model"; "how to transform knowledge of this variation into statements about the uncertainty surrounding the model parameters."
Confidence intervals via the frequentist / repeated-sampling / classical approach.

Parameter θ. T(Y_1, ..., Y_n) is an estimate of θ, with Var(T) = τ^2/n and nV → τ^2 in probability as n → ∞; V^{1/2} is the standard error (estimated s.d.) of T.
Average: Ȳ has mean μ and variance σ^2/n; S^2 = Σ(Y_i − Ȳ)^2/(n − 1) estimates σ^2, so V^{1/2} = n^{-1/2} S is the s.e. of Ȳ.
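A minimal R sketch of these summaries for a generic numeric sample y (the data values here are illustrative, not from the text):

y <- c(2.1, 5.3, 8.8, 4.0, 7.2)       # illustrative data
n <- length(y)
ybar <- mean(y)                        # estimate of mu
S2 <- sum((y - ybar)^2) / (n - 1)      # estimates sigma^2 (same as var(y))
se <- sqrt(S2 / n)                     # V^{1/2} = n^{-1/2} S, s.e. of ybar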

Pivot: a function Z(θ_0) of the data and the parameter whose distribution is known; the distribution of Z(θ_0) does not depend on θ_0.
Exponential: Pr(Y_j/θ_0 ≤ u) = 1 − exp(−u), u > 0, so Z(θ_0) = Σ Y_j/θ_0 is gamma with shape n and unit scale (a sum).
Approximate: Z(θ_0) = (T − θ_0)/V^{1/2} → N(0, 1) in distribution, so Pr(T − V^{1/2} z_{1−α} ≤ θ_0 ≤ T − V^{1/2} z_α) ≈ 1 − 2α, where Φ(z_α) = α.
Approximate (1 − 2α) × 100% CI for θ_0: an interval estimate.
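As an illustration (made-up data, not from the text), the exponential pivot gives an exact interval because Σ Y_j/θ_0 ~ Gamma(n, 1):

set.seed(1)
y <- rexp(20, rate = 1/5)                     # illustrative sample, true theta = 5
n <- length(y); alpha <- 0.025
# Z(theta) = sum(y)/theta ~ Gamma(shape = n, scale = 1); invert its quantiles
c(sum(y) / qgamma(1 - alpha, shape = n),
  sum(y) / qgamma(alpha, shape = n))          # exact (1 - 2*alpha) CI for theta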

Birth data. Approximate 95% CI for μ_0 based on the normal pivot Z(μ_0) = n^{1/2}(ȳ − μ_0)/s.
Day 1 data: n = 16, ȳ = 8.77, s = 4.30; α = .025, z_{.025} = −1.96.
CI: (6.66, 10.87) hours of labor.
Pr(T − V^{1/2} z_{1−α} ≤ μ_0 ≤ T − V^{1/2} z_α) ≈ 1 − 2α.
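The day-1 interval can be reproduced in R from the summary figures quoted on the slide:

n <- 16; ybar <- 8.77; s <- 4.30; alpha <- 0.025
ybar + c(-1, 1) * qnorm(1 - alpha) * s / sqrt(n)   # approx 95% CI, about (6.7, 10.9)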

Binomial distribution. Parameters m, π; observation R; π̂ = R/m; var(π̂) = π(1 − π)/m; s.e. {π̂(1 − π̂)/m}^{1/2}.
Pivotal quantity (π̂ − π)/{π̂(1 − π̂)/m}^{1/2} is approximately N(0, 1).
Suppose m = 1000 and π̂ = .35: approx 95% CI is .35 ± 1.96 × .015. Margin of error.
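A short R check of the margin-of-error arithmetic quoted above:

m <- 1000; pihat <- 0.35
se <- sqrt(pihat * (1 - pihat) / m)        # about 0.015
pihat + c(-1, 1) * qnorm(0.975) * se       # approx 95% CI, roughly 0.35 +/- 0.03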

Delta method (Gauss's method of linearization). T_n is available, but we are interested in h(T_n); equivalently, θ is estimated, but interest is in h(θ).
(T_n − θ)/var(T_n)^{1/2} → Z in distribution and n var(T_n) → τ^2 in probability, so T_n = θ + n^{-1/2} τ Z_n.
(continues)

For smooth h(·): h(y) ≈ h(x) + (y − x) h'(x) for y near x, so
h(T_n) = h(θ + T_n − θ) ≈ h(θ) + (T_n − θ) h'(θ) = h(θ) + n^{-1/2} τ Z_n h'(θ),
giving h(T_n) ≈ N( h(θ), h'(θ)^2 var(T_n) ).

Poisson, Y: mean λ, variance λ. Many techniques "expect" constant variance, normality, a linear model.
Seek h(·) such that var(h(Y)) = 1: h'(λ)^2 λ = 1, so h'(λ) = 1/λ^{1/2} and h(λ) = 2 λ^{1/2}.
Work with Y^{1/2} ≈ N(λ^{1/2}, 1/4).
Approx 95% CI for λ^{1/2}: ( Y^{1/2} − z_{.975}/2, Y^{1/2} − z_{.025}/2 ). Square up.
y = 16 births on day 1: CI for λ^{1/2} is 4 ± 1.96/2; squaring up gives (9.1, 24.8).
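In R, the square-root (variance-stabilising) interval for the day-1 count looks like this:

y <- 16; alpha <- 0.025
ci_sqrt <- sqrt(y) + c(-1, 1) * qnorm(1 - alpha) / 2   # CI for sqrt(lambda)
ci_sqrt^2                                              # "square up": about (9.1, 24.8)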

Tests. Null hypothesis H 0 : supposeaverage labor time  0 = 6 hours Alternative H A :  > 6 hours Oxford = 8.77 hours n = 16 Is this extreme? Is average time longer in Oxford? Pivot t = ( -  0 )/(s/n 1/2 ) 2.58 = t obs Pr 0 (T  t obs )  1 -  (2.58) =.005 P-value, significance level Choice

Normal model, N(μ, σ^2): mean μ, variance σ^2. Standard normal Z = (Y − μ)/σ ~ N(0, 1); density φ(z), cdf Φ(z); Y = μ + σ Z.

Chi-squared distribution. Z_1, ..., Z_n ~ IN(0, 1); W = Z_1^2 + ... + Z_n^2 has n degrees of freedom; chi-squared variables are additive. Quantiles: qchisq(); e.g. qchisq(.975, 14) = 26.12.
(1 − 2α) CI for σ^2: ( (n − 1) S^2/c_{n−1}(1 − α), (n − 1) S^2/c_{n−1}(α) ).
Cross-fertilized maize: n_1 = 15, s_1^2 = 837.3, α = .025:
( 14 × 837.3/26.12, 14 × 837.3/5.63 ) = (449, 2082) eighths of inches squared.
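The same interval directly in R, using the maize figures from the slide:

n <- 15; s2 <- 837.3; alpha <- 0.025
(n - 1) * s2 / qchisq(c(1 - alpha, alpha), df = n - 1)   # about (449, 2082)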

Left: chi-squared. Right: Student's t.

Student's t distribution. Maize data, differences: n = 15, ȳ = 20.93, s^2 = 1424.6.
95% CI: 20.93 ± (1424.6/15)^{1/2} × 2.14 = (0.03, 41.84).
Is H_0: μ = 0 plausible? 0 is not in the 95% confidence interval.
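Reproducing the interval in R from the quoted summaries (t_{14}(.975) ≈ 2.14):

n <- 15; dbar <- 20.93; s2 <- 1424.6
dbar + c(-1, 1) * qt(0.975, df = n - 1) * sqrt(s2 / n)   # about (0.03, 41.84)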

F distribution. F = (W/ν)/(W'/ν'), with W ~ χ^2_ν and W' ~ χ^2_{ν'} independent; F ~ F_{ν,ν'}. F_{ν,∞} ~ χ^2_ν/ν; F_{1,ν} ~ T_ν^2.
Maize: n_1 = n_2 = 15, s_1^2 = 837.3, s_2^2 = 269.4. Variances σ^2 and ψσ^2.
CI for ψ: ( F_{n_1−1, n_2−1}(α) s_2^2/s_1^2, F_{n_1−1, n_2−1}(1 − α) s_2^2/s_1^2 ); with α = .025 this is (0.108, 0.958).
H_0: ψ = 1?
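In R, with the F quantile function qf():

n1 <- 15; n2 <- 15; s1_2 <- 837.3; s2_2 <- 269.4; alpha <- 0.025
(s2_2 / s1_2) * qf(c(alpha, 1 - alpha), df1 = n1 - 1, df2 = n2 - 1)   # about (0.108, 0.958)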

Normal random sample. Ȳ = μ + n^{-1/2} σ Z and S^2 = (n − 1)^{-1} σ^2 W, with Z ~ N(0, 1) and W ~ χ^2_{n−1} independently.
T = Z/{W/(n − 1)}^{1/2} is Student's t with n − 1 df, and T is a pivotal quantity for μ.
(1 − 2α) × 100% CI: ȳ ± n^{-1/2} s t_{n−1}(1 − α).

Bivariate data

Bivariate distribution. cov(Y_1, Y_2) = E[(Y_1 − μ_1)(Y_2 − μ_2)] = σ_12 = cov(Y_2, Y_1).
Collect into a square array: cov(Y, Y) = Σ, the 2 × 2 covariance matrix, with the variances σ_11 and σ_22 on the diagonal and the covariances σ_12 and σ_21 off the diagonal.
Correlation ρ = σ_12/√(σ_11 σ_22).

correlations -0.7, 0, 0.7

yahoo.com shares

Multivariate normal. p-variate Y = (Y_1, ..., Y_p)^T: p linear combinations of IN(0, 1) variables; linear combinations of normals are normal.
If it exists, the density function is f(y; μ, Σ), with E(Y) = μ and cov(Y, Y) = Σ (these are vectors and matrices).
Curves of constant density are ellipses.

Properties. Marginals are also (multivariate) normal; conditionals are (multivariate) normal.
Bivariate case: E(Y_1) = E(Y_2) = 0, var(Y_1) = var(Y_2) = 1, cov(Y_1, Y_2) = ρ; Y_1 and Y_2 are N(0, 1).
Conditional distribution: Y_1 given Y_2 is N(ρ Y_2, 1 − ρ^2).
If Y_1 and Y_2 are uncorrelated, they are independent.
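A small simulation sketch (using MASS::mvrnorm, one of several ways to draw multivariate normals in R) illustrates these properties for ρ = 0.7:

library(MASS)
rho <- 0.7
Sigma <- matrix(c(1, rho, rho, 1), 2, 2)
y <- mvrnorm(n = 10000, mu = c(0, 0), Sigma = Sigma)
cor(y[, 1], y[, 2])                 # close to rho
# conditional mean of Y1 given Y2 is rho * Y2, conditional variance 1 - rho^2
var(y[, 1] - rho * y[, 2])          # close to 1 - rho^2 = 0.51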

If Y is N_p(μ, Σ), then a + B^T Y ~ N_q(a + B^T μ, B^T Σ B).
A surprise: (Y − μ)^T Σ^{-1} (Y − μ) ~ χ^2_p.
Another surprise: Ȳ and S^2 are statistically independent.

Proof. S^2 is based on the Y_i − Ȳ; these are uncorrelated with Ȳ, and all are normal, hence the Y_i − Ȳ are independent of Ȳ.
Use. Suppose we have samples of size n_1 and n_2 from IN(μ_1, σ_1^2) and IN(μ_2, σ_2^2); then Ȳ_1 − Ȳ_2 is normal with mean μ_1 − μ_2 and variance σ_1^2/n_1 + σ_2^2/n_2.

Pooled estimate of σ^2: S^2 = {(n_1 − 1)S_1^2 + (n_2 − 1)S_2^2}/(n_1 + n_2 − 2); ν S^2/σ^2 ~ χ^2_ν, independent of Ȳ_1 − Ȳ_2, where ν = n_1 + n_2 − 2.
Confidence interval: (ȳ_1 − ȳ_2) ± {S^2(n_1^{-1} + n_2^{-1})}^{1/2} t_ν(1 − α).
Maize: 20.93 ± {(837.3 + 269.4)/2}^{1/2} (1/15 + 1/15)^{1/2} × 2.05; 95% CI (3.34, 38.53). Doesn't include 0.
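Checking the pooled two-sample interval in R with the quoted maize summaries:

n1 <- 15; n2 <- 15; s1_2 <- 837.3; s2_2 <- 269.4; dbar <- 20.93
nu <- n1 + n2 - 2
S2 <- ((n1 - 1) * s1_2 + (n2 - 1) * s2_2) / nu               # pooled variance estimate
dbar + c(-1, 1) * qt(0.975, nu) * sqrt(S2 * (1/n1 + 1/n2))   # about (3.34, 38.5)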

Simulation: computer generation of artificial data.
Uses: how much variability to expect; adequacy of an approximation; sensitivity of conclusions; to provide insight.
How variable are normal probability plots? What does bivariate normal data look like?
Based on pseudo-random numbers, e.g. approximately IN(0, 1).
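For instance, a quick R experiment shows how much normal probability plots of genuinely normal samples vary (sample size 20 is an arbitrary choice here):

set.seed(2)
op <- par(mfrow = c(2, 4))
for (i in 1:8) qqnorm(rnorm(20), main = paste("sample", i))
par(op)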

Tiger Woods 20%, Lance Armstrong 30%, Serena Williams 50%: pictures in cereal boxes with these percentages.
How many boxes do you expect to have to buy to get all 3? X = 3, 4, 5, …

Assume the pictures are distributed randomly: for each box, Pr{picture = Tig} = .2, Pr{picture = Lan} = .3, Pr{picture = Ser} = .5.
Simulate many times and look at summary(): Min., 1st Qu., Median, Mean, 3rd Qu., Max.
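A minimal R sketch of this simulation (1000 replicates is an assumed choice; the slide's own run count and output are not quoted here):

set.seed(3)
boxes_needed <- function() {
  seen <- character(0); k <- 0
  while (length(unique(seen)) < 3) {
    seen <- c(seen, sample(c("Tig", "Lan", "Ser"), 1, prob = c(.2, .3, .5)))
    k <- k + 1
  }
  k
}
x <- replicate(1000, boxes_needed())
summary(x)   # distribution of the number of boxes needed to collect all 3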

Linear congruential generator. X_{j+1} = (a X_j + c) mod M, U_j = X_j/M; e.g. M = 2^48, a = 5^17, c = 1.
Study by simulation!
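A runnable sketch of the recursion in R; the classic Park-Miller parameters (M = 2^31 − 1, a = 16807, c = 0) are substituted for the slide's M = 2^48 generator so that the arithmetic stays exact in double precision:

lcg <- function(n, seed = 1, a = 16807, c = 0, M = 2^31 - 1) {
  x <- numeric(n)
  x[1] <- seed
  for (j in seq_len(n - 1)) x[j + 1] <- (a * x[j] + c) %% M   # X_{j+1} = (a X_j + c) mod M
  x / M                                                       # U_j = X_j / M, roughly Uniform(0,1)
}
u <- lcg(1000)
hist(u)    # study the generator by simulation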

Other distributions. Continuous cdf F with inverse F^{-1}: Y = F^{-1}(U) has cdf F.
N(0, 1): Z = Φ^{-1}(U), then Y = μ + σ Z. Exponential: −log(1 − U)/λ.
In R: qnorm, qgamma, qchisq, qt, qf.
Discrete: lay out segments of lengths p_i along [0, 1].
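For example, inversion for the exponential in R (the rate 1/7.9 is an arbitrary illustrative value):

set.seed(4)
u <- runif(10000)
lambda <- 1 / 7.9
y <- -log(1 - u) / lambda        # exponential by inversion of F(y) = 1 - exp(-lambda * y)
c(mean(y), 1 / lambda)           # sample mean vs. theoretical mean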

Birth data. Poisson arrivals, λ = 12.9/day: Pr(N = y) = λ^y e^{-λ}/y!, y = 0, 1, 2, 3, ...   (2.6)
Arrival times uniform during the day: V_1, ..., V_N with density 1/24, 0 < v < 24.
Women remain for gamma times, shape α = 3.15, mean μ = 7.93 hours: G_1, ..., G_N with density λ^α y^{α−1} exp{−λ y}/Γ(α), y > 0, λ = α/μ.   (2.7)
Departure times: V_1 + G_1, ..., V_N + G_N.
Record how many women are present at each arrival/departure.
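A sketch in R of one simulated day under this model (variable names are illustrative; the count of women present is tracked at each arrival and departure time):

set.seed(5)
lambda <- 12.9; alpha <- 3.15; mu <- 7.93
N <- rpois(1, lambda)                              # number of arrivals in the day
V <- sort(runif(N, 0, 24))                         # arrival times, uniform over the day
G <- rgamma(N, shape = alpha, rate = alpha / mu)   # durations of stay, mean mu hours
D <- V + G                                         # departure times
times <- sort(c(V, D))
present <- sapply(times, function(t) sum(V <= t & D > t))   # women present at each event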