Probability Distributions: A Brief Introduction
Normal (Gaussian) Distribution
Bell-shaped distribution in which individuals tend to cluster around the group mean (which equals the median).
Used to model many biological phenomena.
Many estimators have approximately normal sampling distributions (see Central Limit Theorem).
Notation: $Y \sim N(\mu, \sigma^2)$, where $\mu$ is the mean and $\sigma^2$ is the variance.
Obtaining Probabilities and Quantiles in R (the third argument is the standard deviation $\sigma$, not the variance):
To obtain $F(y) = P(Y \le y)$, use the function pnorm(y, mu, sigma).
To obtain the pth quantile $y_p$, where $P(Y \le y_p) = p$, use the function qnorm(p, mu, sigma).
Virtually all statistics textbooks give the cdf (or upper-tail probabilities) for the standardized normal random variable $Z = (Y - \mu)/\sigma \sim N(0, 1)$.
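A minimal R sketch of the two calls named on this slide. The mean of 100 and standard deviation of 15 are illustrative values chosen here, not taken from the slides. It also shows that standardizing first gives the same answer as passing the mean and standard deviation directly.

```r
# Illustrative parameters only (assumed for this sketch): Y ~ N(100, 15^2)
mu    <- 100
sigma <- 15   # pnorm/qnorm expect the standard deviation, not the variance

# Lower-tail probability F(y) = P(Y <= y)
y <- 120
p_direct <- pnorm(y, mu, sigma)

# Same probability through the standardized value z = (y - mu) / sigma
z <- (y - mu) / sigma
p_standard <- pnorm(z)   # defaults are mean = 0, sd = 1

# The 0.975 quantile, directly and by rescaling the standard normal quantile
q_direct  <- qnorm(0.975, mu, sigma)
q_rescale <- mu + sigma * qnorm(0.975)

print(c(p_direct = p_direct, p_standard = p_standard))
print(c(q_direct = q_direct, q_rescale = q_rescale))
```

Both pairs of printed values should agree, which is why textbook tables only need to cover the N(0,1) case.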
Normal Distribution – Density Functions (pdf)
[Table: standard normal probabilities. Rows: integer part and first decimal place of z. Columns: second decimal place of z.]
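Chi-Square Distribution
Indexed by “degrees of freedom ($\nu$)”: $X \sim \chi^2_\nu$
If $Z \sim N(0,1)$, then $Z^2 \sim \chi^2_1$.
Assuming independence: if $Z_1, \ldots, Z_\nu$ are independent $N(0,1)$, then $\sum_{i=1}^{\nu} Z_i^2 \sim \chi^2_\nu$.
Obtaining Probabilities in R:
To obtain the upper-tail probability $1 - F(x) = P(X \ge x)$, use 1 - pchisq(x, nu), or equivalently pchisq(x, nu, lower.tail = FALSE). Note that pchisq(x, nu) on its own returns the lower tail $F(x) = P(X \le x)$.
To obtain the pth quantile $x_p$, where $P(X \le x_p) = p$, use qchisq(p, nu).
Virtually all statistics textbooks give upper-tail cut-off values for commonly used upper (and sometimes lower) tail probabilities.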
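A short R sketch of the chi-square calls, with a simulation check of the sum-of-squared-normals construction. The degrees of freedom, cut-off, seed, and simulation size are arbitrary choices made for illustration.

```r
nu <- 5       # hypothetical degrees of freedom
x  <- 11.07   # hypothetical cut-off value

lower <- pchisq(x, nu)                      # F(x) = P(X <= x)
upper <- pchisq(x, nu, lower.tail = FALSE)  # P(X >= x), equal to 1 - lower

# Quantile: the value x_p with P(X <= x_p) = p (first argument is p, not x)
x95 <- qchisq(0.95, nu)

# Simulation check: the sum of nu squared independent N(0,1) draws
set.seed(1)
z   <- matrix(rnorm(nu * 10000), ncol = nu)
sim <- rowSums(z^2)

print(c(lower = lower, upper = upper, x95 = x95))
print(c(sim_mean = mean(sim), sim_var = var(sim)))  # should be near nu and 2 * nu
```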
Chi-Square Distributions
Critical Values for Chi-Square Distributions (Mean = $\nu$, Variance = $2\nu$)
Student’s t-Distribution
Indexed by “degrees of freedom ($\nu$)”: $T \sim t_\nu$
Let $Z \sim N(0,1)$ and $X \sim \chi^2_\nu$.
Assuming independence of $Z$ and $X$: $T = \dfrac{Z}{\sqrt{X/\nu}} \sim t_\nu$
Obtaining Probabilities/Quantiles in R:
To obtain $F(t) = P(T \le t)$, use pt(t, nu).
To obtain the pth quantile, use qt(p, nu).
Virtually all statistics textbooks give upper-tail cut-off values for commonly used upper-tail probabilities.
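A brief R sketch of pt and qt, with a simulation of the standard construction $T = Z/\sqrt{X/\nu}$ from an independent standard normal $Z$ and chi-square $X$. The degrees of freedom, seed, and simulation size are assumptions made for illustration.

```r
nu <- 10   # hypothetical degrees of freedom

# Quantile: the value t_p with P(T <= t_p) = p
t975 <- qt(0.975, nu)

# Lower-tail probability; feeding the quantile back in recovers p
p_back <- pt(t975, nu)

# Simulation check of T = Z / sqrt(X / nu) with Z and X independent
set.seed(2)
n_sim <- 20000
z     <- rnorm(n_sim)
x     <- rchisq(n_sim, nu)
t_sim <- z / sqrt(x / nu)

print(c(t975 = t975, p_back = p_back))
print(mean(t_sim <= t975))   # should be close to 0.975
```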
Critical Values for Student’s t-Distributions
$E(T) = 0$ (for $\nu > 1$), $V(T) = \nu/(\nu - 2)$ (for $\nu > 2$)
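F-Distribution
Indexed by 2 “degrees of freedom ($\nu_1$, $\nu_2$)”: $W \sim F_{\nu_1,\nu_2}$
Let $X_1 \sim \chi^2_{\nu_1}$ and $X_2 \sim \chi^2_{\nu_2}$.
Assuming independence of $X_1$ and $X_2$: $W = \dfrac{X_1/\nu_1}{X_2/\nu_2} \sim F_{\nu_1,\nu_2}$
Obtaining Probabilities/Quantiles in R:
To obtain $F(w) = P(W \le w)$, use pf(w, nu1, nu2).
To obtain the pth quantile, use qf(p, nu1, nu2).
Virtually all statistics textbooks give upper-tail cut-off values for commonly used upper-tail probabilities.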
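A small R sketch of pf and qf, together with a simulation of the standard ratio construction $W = (X_1/\nu_1)/(X_2/\nu_2)$ from two independent chi-square variables. All numeric settings are illustrative assumptions.

```r
nu1 <- 3    # hypothetical numerator degrees of freedom
nu2 <- 20   # hypothetical denominator degrees of freedom

# 0.95 quantile, i.e. the cut-off leaving 5% in the upper tail
w95 <- qf(0.95, nu1, nu2)

# Lower-tail probability at that cut-off recovers 0.95
p_back <- pf(w95, nu1, nu2)

# Simulation check of W = (X1 / nu1) / (X2 / nu2) with X1, X2 independent
set.seed(3)
n_sim <- 20000
w_sim <- (rchisq(n_sim, nu1) / nu1) / (rchisq(n_sim, nu2) / nu2)

print(c(w95 = w95, p_back = p_back))
print(mean(w_sim <= w95))   # should be close to 0.95
```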
Critical Values for F-distributions P(F ≤ Table Value) = 0.95
Multivariate Normal Distribution
Results Involving Multivariate Normal - I
Results Involving Multivariate Normal - II
Results Involving Multivariate Normal - III
Multivariate Normal Likelihood Function
Maximum Likelihood Estimator of $\mu$
Maximum Likelihood Estimator of $\Sigma$
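As a hedged illustration, the standard multivariate normal results are that the ML estimator of the mean vector is the vector of sample means, and the ML estimator of the covariance matrix is the average outer product of the centered observations (divisor n rather than n - 1). The sketch below computes both on simulated bivariate data; the true parameters, sample size, and seed are assumptions made here, not values from the slides.

```r
set.seed(4)
n <- 500

# Assumed "true" parameters for the simulation
mu_true    <- c(1, 2)
Sigma_true <- matrix(c(1, 0.4,
                       0.4, 1), nrow = 2, byrow = TRUE)

# Simulate Y = mu + L Z, where L %*% t(L) = Sigma_true and Z is standard normal
L <- t(chol(Sigma_true))            # lower-triangular Cholesky factor
Z <- matrix(rnorm(2 * n), nrow = 2)
Y <- t(mu_true + L %*% Z)           # n x 2 data matrix, one observation per row

# ML estimate of the mean vector: column means
mu_hat <- colMeans(Y)

# ML estimate of the covariance matrix: centered cross-products divided by n
centered  <- sweep(Y, 2, mu_hat)
Sigma_hat <- crossprod(centered) / n

print(mu_hat)
print(Sigma_hat)
```

R's cov(Y) uses the divisor n - 1, so Sigma_hat equals cov(Y) * (n - 1) / n; the two differ noticeably only for small n.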
Results for ML Estimators and Large-Sample Properties
Data – Heights of Adult Children and Parents
Adult children's heights are reported to the inch, with each group represented by the median of its grouped values (Galton reports 62.2”, …, 73.2”).
Galton adjusts female heights by multiplying them by 1.08.
We use 61.2” for his “Below” category and 74.2” for his “Above” category.
Mid-parent heights are the average of the two parents' heights (after the female adjustment), with grouped values at the median (64.5”, …, 72.5” by Galton).
We use 63.5” for “Below” and 73.5” for “Above”.
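The mid-parent calculation described above can be sketched in R as follows. The three father and mother heights are invented for illustration and are not Galton's records.

```r
# Hypothetical heights in inches (illustration only, not from Galton's data)
father <- c(70.0, 68.5, 72.0)
mother <- c(64.0, 62.5, 66.0)

# Female heights are multiplied by 1.08 before averaging
mother_adj <- 1.08 * mother

# Mid-parent height: average of father's height and adjusted mother's height
mid_parent <- (father + mother_adj) / 2

print(mid_parent)
```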
Joint Density Function
$\mu_1 = \mu_2 = 0$, $\sigma_1 = \sigma_2 = 1$, $\rho = 0.4$
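A sketch of the standard bivariate normal density in R, with defaults set to the parameter values on this slide (zero means, unit standard deviations, correlation 0.4). The function name dbvnorm and the grid used for the numerical check are choices made for this example.

```r
# Bivariate normal density; dbvnorm is a name chosen for this sketch
dbvnorm <- function(y1, y2, mu1 = 0, mu2 = 0, s1 = 1, s2 = 1, rho = 0.4) {
  z1 <- (y1 - mu1) / s1
  z2 <- (y2 - mu2) / s2
  # Quadratic form in the exponent
  q <- (z1^2 - 2 * rho * z1 * z2 + z2^2) / (1 - rho^2)
  exp(-q / 2) / (2 * pi * s1 * s2 * sqrt(1 - rho^2))
}

# Density at the mean, where it is highest
peak <- dbvnorm(0, 0)

# Numerical check that the density integrates to about 1 over a wide grid
step <- 0.05
grid <- seq(-6, 6, by = step)
dens <- outer(grid, grid, dbvnorm)
total <- sum(dens) * step^2

print(c(peak = peak, total = total))
```

The matrix dens can be passed to contour(grid, grid, dens) or persp(grid, grid, dens) to draw the surface.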
Marginal Distribution of Y (p. 1)
Marginal Distribution of Y (p. 2)
Conditional Distribution of $Y_2$ Given $Y_1 = y_1$ (p. 1)
Conditional Distribution of $Y_2$ Given $Y_1 = y_1$ (p. 2)
This is referred to as the REGRESSION of $Y_2$ on $Y_1$.
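For reference, a sketch of the standard bivariate normal result for this conditional distribution, writing $\mu_1, \mu_2$ for the means, $\sigma_1, \sigma_2$ for the standard deviations, and $\rho$ for the correlation:

```latex
\[
Y_2 \mid Y_1 = y_1 \;\sim\;
N\!\left( \mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}\,(y_1 - \mu_1),\;
          \sigma_2^2\,(1 - \rho^2) \right)
\]
```

The conditional mean is linear in $y_1$ with slope $\rho\,\sigma_2/\sigma_1$, and the conditional variance does not depend on $y_1$.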
Summary of Results
Heights of Adult Children and Parents: Empirical Data
Based on 924 pairs (F. Galton)
$Y_2$ = adult child's height: $Y_2 \sim N(68.1, 6.39)$, $\sigma_2 = 2.53$
$Y_1$ = mid-parent's height: $Y_1 \sim N(68.3, 3.18)$, $\sigma_1 = 1.78$
$\mathrm{COV}(Y_1, Y_2) = 2.02$, $\rho = 0.45$, $\rho^2 = 0.20$
$Y_2 \mid Y_1 = y_1$ is normal, with conditional mean and standard deviation:

y1              Unconditional   63.5   66.5   69.5   72.5
E[Y2|y1]        68.1            65.0   66.9   68.8   70.8
sigma(Y2|y1)    2.53            2.26 (the same for every y1)
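The conditional means and standard deviation in this table can be reproduced from the summary statistics. The sketch below uses the fitted line 24.5 + 0.638 y1 quoted elsewhere in these slides. The slope computed from the rounded covariance and variance comes out slightly different, presumably because the slides worked from unrounded statistics.

```r
# Summary statistics as reported on the slide
var1  <- 3.18   # variance of mid-parent height Y1
var2  <- 6.39   # variance of adult child height Y2
cov12 <- 2.02   # covariance of Y1 and Y2

# Correlation and slope implied by the rounded values
rho         <- cov12 / sqrt(var1 * var2)
slope_check <- cov12 / var1   # about 0.635, close to the quoted 0.638

# Conditional means from the regression line quoted in the slides
y1        <- c(63.5, 66.5, 69.5, 72.5)
cond_mean <- 24.5 + 0.638 * y1

# Conditional standard deviation: sqrt(var2 * (1 - rho^2)), the same for every y1
cond_sd <- sqrt(var2 * (1 - rho^2))

print(round(c(rho = rho, slope_check = slope_check), 3))
print(round(cond_mean, 1))
print(round(cond_sd, 2))
```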
[Figure: three lines relating expected child height to parent height, labeled “E(Child) = Parent + constant”, “Galton’s Finding”, and “E(Child) independent of parent”.]
Expectations and Variances
$E(Y_1) = 68.3$, $V(Y_1) = 3.18$
$E(Y_2) = 68.1$, $V(Y_2) = 6.39$
$E(Y_2 \mid Y_1 = y_1) = 24.5 + 0.638\,y_1$
$E_{Y_1}[E(Y_2 \mid Y_1 = y_1)] = E_{Y_1}[24.5 + 0.638\,Y_1] = 24.5 + 0.638(68.3) = 68.1 = E(Y_2)$
$V(Y_2 \mid Y_1 = y_1) = 5.11$
$E_{Y_1}[V(Y_2 \mid Y_1 = y_1)] = 5.11$
$V_{Y_1}[E(Y_2 \mid Y_1 = y_1)] = V_{Y_1}[24.5 + 0.638\,Y_1] = (0.638)^2\,V(Y_1) = (0.407)(3.18) = 1.29$
$E_{Y_1}[V(Y_2 \mid Y_1 = y_1)] + V_{Y_1}[E(Y_2 \mid Y_1 = y_1)] = 5.11 + 1.29 = 6.40 = V(Y_2)$ (with round-off)
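The arithmetic on this slide can be checked in a few lines of R. All inputs are the rounded values quoted above, and the variable names are chosen for this sketch.

```r
# Values quoted on the slide
intercept <- 24.5
slope     <- 0.638
e_y1      <- 68.3   # E(Y1)
v_y1      <- 3.18   # V(Y1)
v_cond    <- 5.11   # V(Y2 | Y1 = y1), constant in y1

# Expectation of the conditional mean: recovers E(Y2)
e_y2 <- intercept + slope * e_y1

# Variance of the conditional mean: slope^2 * V(Y1)
v_between <- slope^2 * v_y1

# Expected conditional variance plus variance of the conditional mean: recovers V(Y2)
v_y2 <- v_cond + v_between

print(round(c(e_y2 = e_y2, v_between = v_between, v_y2 = v_y2), 2))
```

The total of about 6.40 differs from the reported 6.39 only through rounding of the inputs.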