Multivariable Distributions (Ch. 4)
It is often useful to take more than one measurement in a random experiment; the data are then collected in pairs (x_i, y_i).
Def. 4.1-1: Let X and Y be two discrete random variables defined over the support S. The probability that X=x and Y=y is denoted f(x,y)=P(X=x, Y=y); f(x,y) is the joint probability mass function (joint p.m.f.) of X and Y, and it satisfies:
- 0 ≤ f(x,y) ≤ 1;
- ΣΣ_{(x,y)∈S} f(x,y) = 1;
- P[(X,Y)∈A] = ΣΣ_{(x,y)∈A} f(x,y), for A ⊆ S.
Illustration Example
Ex. 4.1-3: Roll a pair of fair dice; let X be the smaller and Y the larger outcome. An outcome of (3,2) or (2,3) gives X=2 and Y=3, with probability 2/36; the outcome (2,2) gives X=2 and Y=2, with probability 1/36. Thus the joint p.m.f. of X and Y is f(x,y) = 2/36 for 1 ≤ x < y ≤ 6 and f(x,y) = 1/36 for 1 ≤ x = y ≤ 6:

          x=1    x=2    x=3    x=4    x=5    x=6  |  f₂(y)
  y=6    2/36   2/36   2/36   2/36   2/36   1/36  |  11/36
  y=5    2/36   2/36   2/36   2/36   1/36         |   9/36
  y=4    2/36   2/36   2/36   1/36                |   7/36
  y=3    2/36   2/36   1/36                       |   5/36
  y=2    2/36   1/36                              |   3/36
  y=1    1/36                                     |   1/36
  -------------------------------------------------------
  f₁(x) 11/36   9/36   7/36   5/36   3/36   1/36     (marginal p.m.f.s)
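A quick sanity check of this table — a minimal enumeration sketch (not from the text):

```python
from fractions import Fraction
from collections import defaultdict

# Enumerate all 36 equally likely outcomes of two fair dice and
# tabulate the joint p.m.f. of X = min and Y = max.
f = defaultdict(Fraction)
for d1 in range(1, 7):
    for d2 in range(1, 7):
        f[(min(d1, d2), max(d1, d2))] += Fraction(1, 36)

# Marginal p.m.f.s obtained by summing over the other variable.
f1 = {x: sum(p for (a, _), p in f.items() if a == x) for x in range(1, 7)}
f2 = {y: sum(p for (_, b), p in f.items() if b == y) for y in range(1, 7)}

print(f[(2, 3)])                    # 1/18  (= 2/36)
print(f1[1], f2[1])                 # 11/36, 1/36
print(f[(1, 1)] == f1[1] * f2[1])   # False -> X and Y are dependent
```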
Marginal Probability and Independence
Def. 4.1-2: Let X and Y have the joint p.m.f. f(x,y) with space S. The marginal p.m.f. of X is f₁(x) = Σ_y f(x,y) = P(X=x), x ∈ S₁; the marginal p.m.f. of Y is f₂(y) = Σ_x f(x,y) = P(Y=y), y ∈ S₂. X and Y are independent iff P(X=x, Y=y) = P(X=x)P(Y=y), i.e., f(x,y) = f₁(x)f₂(y) for all x ∈ S₁, y ∈ S₂; otherwise X and Y are dependent.
X and Y in Ex. 4.1-3 are dependent: 1/36 = f(1,1) ≠ f₁(1)f₂(1) = (11/36)(1/36).
Ex. 4.1-4: The joint p.m.f. is f(x,y) = (x+y)/21, x=1,2,3, y=1,2. Then f₁(x) = Σ_{y=1,2} (x+y)/21 = (2x+3)/21, x=1,2,3; likewise f₂(y) = Σ_{x=1,2,3} (x+y)/21 = (3y+6)/21, y=1,2. Since f(x,y) ≠ f₁(x)f₂(y) — e.g., f(1,1) = 2/21 while f₁(1)f₂(1) = (5/21)(9/21) — X and Y are dependent.
Ex. 4.1-6: f(x,y) = xy²/13, (x,y) = (1,1), (1,2), (2,2).
Quick Dependence Checks
In practice, dependence can be determined quickly if either:
- the support of X and Y is NOT rectangular, i.e., S is not the product set {(x,y): x ∈ S₁, y ∈ S₂}, as in Ex. 4.1-6; or
- f(x,y) cannot be factored (separated) into the product of an x-alone function and a y-alone function. In Ex. 4.1-4, f(x,y) is a sum, not a product, of x-alone and y-alone terms.
Ex. 4.1-7: [probability histogram for a joint p.m.f.]
Mathematical Expectation
If u(X₁,X₂) is a function of two random variables X₁ and X₂, then
E[u(X₁,X₂)] = ΣΣ_{(x₁,x₂)∈S} u(x₁,x₂) f(x₁,x₂),
if it exists, is called the mathematical expectation (or expected value) of u(X₁,X₂).
The mean of Xᵢ, i=1,2: μᵢ = E(Xᵢ) = ΣΣ_{(x₁,x₂)∈S} xᵢ f(x₁,x₂) = Σ_{xᵢ} xᵢ fᵢ(xᵢ).
The variance of Xᵢ: σᵢ² = E[(Xᵢ−μᵢ)²] = ΣΣ_{(x₁,x₂)∈S} (xᵢ−μᵢ)² f(x₁,x₂).
Ex. 4.1-8: A player selects a chip from a bowl of 8 chips: 3 marked (0,0), 2 marked (1,0), 2 marked (0,1), and 1 marked (1,1); hence f(0,0)=3/8, f(1,0)=2/8, f(0,1)=2/8, f(1,1)=1/8.
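A minimal sketch of the Ex. 4.1-8 computation, with u(X₁,X₂) = X₁+X₂ taken as an illustrative choice of payoff (the slide's u did not survive extraction):

```python
from fractions import Fraction

# Joint p.m.f. from the bowl: 3 chips (0,0), 2 chips (1,0), 2 chips (0,1), 1 chip (1,1).
f = {(0, 0): Fraction(3, 8), (1, 0): Fraction(2, 8),
     (0, 1): Fraction(2, 8), (1, 1): Fraction(1, 8)}

def expect(u, f):
    """E[u(X1, X2)] = sum over the support of u(x1, x2) * f(x1, x2)."""
    return sum(u(x1, x2) * p for (x1, x2), p in f.items())

print(expect(lambda x1, x2: x1 + x2, f))                    # 3/4
print(expect(lambda x1, x2: x1, f))                         # mu_1 = 3/8
print(expect(lambda x1, x2: (x1 - Fraction(3, 8))**2, f))   # sigma_1^2 = 15/64
```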
Joint Probability Density Function
The joint probability density function (joint p.d.f.) of two continuous-type random variables X and Y is an integrable function f(x,y) satisfying:
- f(x,y) ≥ 0;
- ∫_{-∞}^{∞} ∫_{-∞}^{∞} f(x,y) dx dy = 1;
- P[(X,Y)∈A] = ∫∫_A f(x,y) dx dy, for an event A.
Ex. 4.1-9: For a given joint p.d.f. and A = {(x,y): 0<x<1, 0<y<x}, compute P[(X,Y)∈A] by double integration; the marginal p.d.f.s multiply back to f(x,y), so X and Y are independent.
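The slide's p.d.f. for Ex. 4.1-9 was lost in extraction; purely for illustration, assume f(x,y) = 4xy on the unit square, which factors as (2x)(2y) and so matches the independence conclusion. The probability over A can then be checked numerically:

```python
from scipy.integrate import dblquad

# Assumed joint p.d.f. (illustrative only): f(x,y) = 4xy, 0 < x < 1, 0 < y < 1.
# P[(X,Y) in A] with A = {0 < x < 1, 0 < y < x}: inner integral over y from 0 to x.
p, err = dblquad(lambda y, x: 4 * x * y, 0, 1, lambda x: 0, lambda x: x)
print(p)  # 0.5
```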
Independence of Continuous-Type R.V.s
Two continuous-type random variables X and Y are independent iff their joint p.d.f. factors into the product of their marginal p.d.f.s.
Ex. 4.1-10: X and Y have the joint p.d.f. f(x,y) = 2, 0 ≤ x ≤ y ≤ 1 (cf. Ex. 4.3-5). The support S = {(x,y): 0 ≤ x ≤ y ≤ 1} is bounded by the lines x=0, y=1, and x=y. The marginal p.d.f.s are
f₁(x) = ∫_x^1 2 dy = 2(1−x), 0 ≤ x ≤ 1;  f₂(y) = ∫_0^y 2 dx = 2y, 0 ≤ y ≤ 1.
Various expected values follow, e.g., E(X) = ∫_0^1 2x(1−x) dx = 1/3 and E(Y) = ∫_0^1 2y² dy = 2/3.
Since f(x,y) = 2 ≠ f₁(x)f₂(y) = 4y(1−x), X and Y are dependent!
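A Monte Carlo sketch of these marginal means; it uses the fact (an assumption stated here, easily checked) that the ordered pair (min, max) of two independent U(0,1) draws has exactly this density 2 on the triangle:

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=(100_000, 2))
x, y = u.min(axis=1), u.max(axis=1)   # (X, Y) ~ f(x,y) = 2 on 0 <= x <= y <= 1

print(x.mean(), y.mean())        # ~ 1/3 and ~ 2/3, matching E(X), E(Y)
print(np.corrcoef(x, y)[0, 1])   # ~ 0.5, nonzero -> consistent with dependence
```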
Multivariate Hypergeometric Distribution
Ex. 4.1-11: Of 200 students, 40 have As, 60 have Bs, and 100 have Cs, Ds, or Fs. A sample of size 25 is taken at random without replacement. Let X₁ be the number of A students and X₂ the number of B students, so 25−X₁−X₂ is the number of other students. The space is S = {(x₁,x₂): x₁,x₂ ≥ 0, x₁+x₂ ≤ 25}, and the joint p.m.f. is
f(x₁,x₂) = C(40,x₁) C(60,x₂) C(100, 25−x₁−x₂) / C(200,25), (x₁,x₂) ∈ S.
The marginal p.m.f. of X₁ can also be obtained from the knowledge of the model: lumping the 160 non-A students into one class gives the hypergeometric p.m.f.
f₁(x₁) = C(40,x₁) C(160, 25−x₁) / C(200,25), x₁ = 0,1,…,25.
Since f(x₁,x₂) ≠ f₁(x₁)f₂(x₂), X₁ and X₂ are dependent!
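These probabilities are available directly in SciPy (a sketch; the evaluation point (5, 10, 10) is an arbitrary illustration):

```python
from scipy.stats import multivariate_hypergeom, hypergeom

# Joint p.m.f.: f(x1, x2) = C(40,x1) C(60,x2) C(100, 25-x1-x2) / C(200, 25).
print(multivariate_hypergeom.pmf(x=[5, 10, 10], m=[40, 60, 100], n=25))

# Marginal of X1: lump B and C/D/F students together -> hypergeometric(200, 40, 25).
print(hypergeom.pmf(5, M=200, n=40, N=25))
```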
Binomial ⇒ Trinomial Distribution
Trinomial distribution: the experiment is repeated n times, with probability p₁ of a perfect item, p₂ of a second, and p₃ = 1−p₁−p₂ of a defective. Let X₁ count the perfect items, X₂ the seconds, and X₃ = n−X₁−X₂ the defectives. The joint p.m.f. is
f(x₁,x₂) = n!/[x₁! x₂! (n−x₁−x₂)!] · p₁^{x₁} p₂^{x₂} p₃^{n−x₁−x₂}, x₁,x₂ ≥ 0, x₁+x₂ ≤ n.
Marginally, X₁ is b(n,p₁) and X₂ is b(n,p₂); yet X₁ and X₂ are dependent.
Ex. 4.1-13: In manufacturing a certain item, 95% of the items are good, 4% are "seconds", and 1% are defective. An inspector observes n = 20 items selected at random, counting the number X of seconds and the number Y of defectives. The probability that at least 2 seconds or at least 2 defective items are found, i.e., A = {(x,y): x ≥ 2 or y ≥ 2}, is
P(A) = 1 − P(X ≤ 1 and Y ≤ 1) = 1 − Σ_{x=0,1} Σ_{y=0,1} 20!/[x! y! (20−x−y)!] (0.04)^x (0.01)^y (0.95)^{20−x−y} ≈ 0.204.
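A sketch of the Ex. 4.1-13 computation via the complement, using SciPy's multinomial p.m.f.:

```python
from scipy.stats import multinomial

# (X, Y, 20-X-Y) is trinomial with n = 20 and p = (0.04, 0.01, 0.95).
n, p = 20, [0.04, 0.01, 0.95]

# P(X >= 2 or Y >= 2) = 1 - P(X <= 1 and Y <= 1).
p_comp = sum(multinomial.pmf([x, y, n - x - y], n=n, p=p)
             for x in (0, 1) for y in (0, 1))
print(1 - p_comp)  # ~ 0.204
```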
Correlation Coefficient
For two random variables X₁ and X₂:
- the mean of Xᵢ, i=1,2: μᵢ = E(Xᵢ);
- the variance of Xᵢ: σᵢ² = E[(Xᵢ−μᵢ)²];
- the covariance of X₁ and X₂: Cov(X₁,X₂) = E[(X₁−μ₁)(X₂−μ₂)] = E(X₁X₂) − μ₁μ₂;
- the correlation coefficient of X₁ and X₂: ρ = Cov(X₁,X₂)/(σ₁σ₂).
Ex. 4.2-1: X₁ and X₂ have a given joint p.m.f. → not a product of an x₁-alone and an x₂-alone factor ⇒ dependent!
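A sketch computing Cov(X,Y) and ρ exactly for the p.m.f. f(x,y) = (x+y)/21 of Ex. 4.1-4:

```python
from fractions import Fraction

f = {(x, y): Fraction(x + y, 21) for x in (1, 2, 3) for y in (1, 2)}

E = lambda g: sum(g(x, y) * p for (x, y), p in f.items())
mu1, mu2 = E(lambda x, y: x), E(lambda x, y: y)
var1 = E(lambda x, y: (x - mu1) ** 2)
var2 = E(lambda x, y: (y - mu2) ** 2)
cov = E(lambda x, y: (x - mu1) * (y - mu2))

print(mu1, mu2, cov)   # 46/21, 11/7, -2/147 (slightly negative -> dependent)
print(float(cov) / (float(var1) * float(var2)) ** 0.5)   # rho
```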
Insights into the Meaning of ρ
Over the points of S, ρ tends to be positive when, with high probability, points lie above both means or below both means simultaneously.
The least-squares regression line is the line through (μ_X, μ_Y) whose slope b minimizes K(b) = E{[(Y−μ_Y) − b(X−μ_X)]²}, the expected squared vertical distance from a point to the line. The minimizing slope is b = ρσ_Y/σ_X, with minimum value K = σ_Y²(1−ρ²).
- ρ = ±1: K(b) = 0 ⇒ all the probability lies on the least-squares regression line.
- ρ = 0: K(b) = σ_Y² and the line is y = μ_Y; X and Y may or may not be independent!
Thus ρ measures the amount of linearity in the probability distribution.
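The slope and minimum-value claims follow from expanding K(b); a short derivation:

```latex
\begin{align*}
K(b) &= E\{[(Y-\mu_Y) - b(X-\mu_X)]^2\}
      = \sigma_Y^2 - 2b\,\rho\sigma_X\sigma_Y + b^2\sigma_X^2,\\
K'(b) &= -2\rho\sigma_X\sigma_Y + 2b\sigma_X^2 = 0
       \;\Longrightarrow\; b = \rho\,\frac{\sigma_Y}{\sigma_X},\\
K\!\left(\rho\,\frac{\sigma_Y}{\sigma_X}\right)
      &= \sigma_Y^2(1-\rho^2),
\end{align*}
```

so K = 0 forces ρ = ±1, and ρ = 0 leaves K = σ_Y².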
Example
Ex. 4.2-2: Roll a pair of fair 4-sided dice; let X be the number of ones and Y the number of twos and threes. Then (X,Y) is trinomial with n = 2, p₁ = 1/4, p₂ = 1/2, so the joint p.m.f. is
f(x,y) = 2!/[x! y! (2−x−y)!] (1/4)^x (1/2)^y (1/4)^{2−x−y}, x,y ≥ 0, x+y ≤ 2.
With μ_X = 1/2, μ_Y = 1, σ_X² = 3/8, σ_Y² = 1/2, and ρ = −√(1/3), the line of best fit is
y = μ_Y + ρ(σ_Y/σ_X)(x − μ_X) = 4/3 − (2/3)x.
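A sketch that enumerates this trinomial (n = 2, p₁ = 1/4, p₂ = 1/2) and recovers the best-fit line numerically:

```python
from math import comb

n, p1, p2 = 2, 0.25, 0.5
f = {(x, y): comb(n, x) * comb(n - x, y)
     * p1**x * p2**y * (1 - p1 - p2)**(n - x - y)
     for x in range(n + 1) for y in range(n + 1 - x)}

E = lambda g: sum(g(x, y) * p for (x, y), p in f.items())
mx, my = E(lambda x, y: x), E(lambda x, y: y)
sx2 = E(lambda x, y: (x - mx)**2)
cov = E(lambda x, y: (x - mx) * (y - my))
b = cov / sx2

print(mx, my, b)                            # 0.5, 1.0, -2/3
print(f"y = {my - b*mx:.4f} + ({b:.4f})x")  # y = 4/3 - (2/3)x
```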
Independence ⇒ ρ=0
Independence implies E(XY) = E(X)E(Y), so Cov(X,Y) = 0 and ρ = 0. The converse is not necessarily true!
Ex. 4.2-3: The joint p.m.f. of X and Y is f(x,y) = 1/3, (x,y) = (0,1), (1,0), (2,1). The support is not "rectangular", so X and Y are dependent. Yet μ_X = 1, μ_Y = 2/3, and E(XY) = (0·1 + 1·0 + 2·1)/3 = 2/3 = μ_X μ_Y, so Cov(X,Y) = 0 and ρ = 0.
Empirical data: from n bivariate observations (xᵢ,yᵢ), i = 1,…,n, we can compute the sample mean and variance of each variate, as well as the sample correlation coefficient and the sample least-squares regression line, as in the sketch below. (Ref. p. 241)
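NumPy gives the sample versions directly (a sketch with made-up illustrative data, not from the text):

```python
import numpy as np

# Hypothetical bivariate sample.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

r = np.corrcoef(x, y)[0, 1]              # sample correlation coefficient
b = r * y.std(ddof=1) / x.std(ddof=1)    # slope of the least-squares line
a = y.mean() - b * x.mean()              # the line passes through (x-bar, y-bar)
print(r, f"y = {a:.3f} + {b:.3f} x")
```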
Conditional Distributions
Def. 4.3-1: The conditional probability mass function of X, given that Y=y, is g(x|y) = f(x,y)/f₂(y), provided f₂(y) > 0. Likewise, h(y|x) = f(x,y)/f₁(x), provided f₁(x) > 0.
Ex. 4.3-1: X and Y have the joint p.m.f. f(x,y) = (x+y)/21, x=1,2,3, y=1,2, with f₁(x) = (2x+3)/21 and f₂(y) = (3y+6)/21. Thus, given Y=y, the conditional p.m.f. of X is g(x|y) = (x+y)/(3y+6):
- when y=1, g(x|1) = (x+1)/9, x=1,2,3, so g(1|1):g(2|1):g(3|1) = 2:3:4;
- when y=2, g(x|2) = (x+2)/12, x=1,2,3, so g(1|2):g(2|2):g(3|2) = 3:4:5.
Similar expressions hold for h(y|x). Since the conditional p.m.f. changes with y, X and Y are dependent!
Conditional Mean and Variance
The conditional mean of Y, given X=x, is E(Y|x) = Σ_y y h(y|x).
The conditional variance of Y, given X=x, is Var(Y|x) = E{[Y−E(Y|x)]² | x} = E(Y²|x) − [E(Y|x)]².
Ex. 4.3-2 (from Ex. 4.3-1): X and Y have the joint p.m.f. f(x,y) = (x+y)/21, x=1,2,3, y=1,2, so h(y|x) = (x+y)/(2x+3) and
E(Y|x) = Σ_{y=1,2} y(x+y)/(2x+3) = (3x+5)/(2x+3),
Var(Y|x) = (5x+9)/(2x+3) − [(3x+5)/(2x+3)]² = (x+1)(x+2)/(2x+3)².
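A quick exact check of these conditional formulas (a sketch):

```python
from fractions import Fraction

f1 = lambda x: Fraction(2 * x + 3, 21)
h = lambda y, x: Fraction(x + y, 21) / f1(x)   # h(y|x) = (x+y)/(2x+3)

for x in (1, 2, 3):
    m = sum(y * h(y, x) for y in (1, 2))              # E(Y|x) = (3x+5)/(2x+3)
    v = sum((y - m) ** 2 * h(y, x) for y in (1, 2))   # (x+1)(x+2)/(2x+3)^2
    print(x, m, v)
```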
Relationship between the Conditional Means
When the conditional means are linear, they are
E(Y|x) = μ_Y + ρ(σ_Y/σ_X)(x − μ_X) and E(X|y) = μ_X + ρ(σ_X/σ_Y)(y − μ_Y).
The point (μ_X, μ_Y) lies on both of these lines and is their intersection. The product of the two slopes is ρ², and the ratio of the slopes is σ_Y²/σ_X². These relations allow the unknown quantities to be derived from the known ones.
Example
Ex. 4.3-3: X and Y have the trinomial p.m.f. with parameters n, p₁, p₂, p₃ = 1−p₁−p₂. Their marginal p.m.f.s are b(n,p₁) and b(n,p₂), so μ_X = np₁, σ_X² = np₁(1−p₁), μ_Y = np₂, σ_Y² = np₂(1−p₂). Given X=x, Y is b(n−x, p₂/(1−p₁)), so E(Y|x) = (n−x)p₂/(1−p₁) is linear in x; equating its slope −p₂/(1−p₁) to ρσ_Y/σ_X yields
ρ = −√( p₁p₂ / [(1−p₁)(1−p₂)] ).
Example for Continuous-Type R.V.s
Ex. 4.3-5 (from Ex. 4.1-10): f(x,y) = 2, 0 ≤ x ≤ y ≤ 1, and f₁(x) = 2(1−x), so
h(y|x) = f(x,y)/f₁(x) = 1/(1−x), x < y < 1.
⇒ The conditional distribution of Y given X=x is U(x,1). [U(a,b) has mean (a+b)/2 and variance (b−a)²/12, so E(Y|x) = (1+x)/2 and Var(Y|x) = (1−x)²/12.]
Bivariate Normal Distribution
The joint p.d.f. of X ~ N(μ_X, σ_X²) and Y ~ N(μ_Y, σ_Y²) with correlation coefficient ρ is
f(x,y) = 1/(2πσ_Xσ_Y√(1−ρ²)) · exp(−q/2), where
q = 1/(1−ρ²) [ ((x−μ_X)/σ_X)² − 2ρ((x−μ_X)/σ_X)((y−μ_Y)/σ_Y) + ((y−μ_Y)/σ_Y)² ].
Therefore, the conditional distribution of Y, given X=x, is
N( μ_Y + ρ(σ_Y/σ_X)(x−μ_X), σ_Y²(1−ρ²) ):
the conditional mean is a linear function of x, and the conditional variance is a constant w.r.t. x.
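A Monte Carlo sketch of the linear conditional mean and constant conditional variance (the parameter values are arbitrary illustrations):

```python
import numpy as np

mux, muy, sx, sy, rho = 1.0, -2.0, 2.0, 3.0, 0.6
cov = [[sx**2, rho*sx*sy], [rho*sx*sy, sy**2]]

rng = np.random.default_rng(1)
x, y = rng.multivariate_normal([mux, muy], cov, size=500_000).T

x0 = 2.0                           # condition on X near x0
sel = np.abs(x - x0) < 0.05
print(y[sel].mean(), muy + rho*(sy/sx)*(x0 - mux))   # ~ equal (linear in x0)
print(y[sel].var(), sy**2 * (1 - rho**2))            # ~ equal (free of x0)
```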
Examples
Ex. 5.6-1 and Ex. 5.6-2: [worked computations for the bivariate normal distribution]
Bivariate Normal: ρ=0 ⇒ Independence
Thm. 5.6-1: If X and Y have a bivariate normal distribution with correlation coefficient ρ, then X and Y are independent iff ρ = 0. The same holds for trivariate and multivariate normal distributions.
When ρ = 0, the exponent q separates into the sum of two single-variable quadratics, so f(x,y) factors into the product of the N(μ_X,σ_X²) and N(μ_Y,σ_Y²) p.d.f.s.
Transformations of R.V.s
In Section 3.5, a single variable X with p.d.f. f(x) is transformed to Y = v(X), an increasing or decreasing function, via:
- continuous type: g(y) = f(v⁻¹(y)) |d v⁻¹(y)/dy|;
- discrete type: g(y) = f(v⁻¹(y)).
Ex. 4.4-1: X ~ b(n,p), Y = X²; with n = 3, p = 1/4, g(y) = f(√y), y = 0, 1, 4, 9.
What transformation u(X/n) leads to a variance free of p? Taylor's expansion about p gives
u(X/n) ≈ u(p) + (X/n − p) u′(p), so Var[u(X/n)] ≈ [u′(p)]² · p(1−p)/n.
The variance is constant, or free of p, when u′(p) ∝ 1/√(p(1−p)); taking u(p) = arcsin(√p) gives Var[u(X/n)] ≈ 1/(4n), whether X is b(100, 1/4) or b(100, 9/10).
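A simulation sketch of the variance-stabilizing arcsine transformation for the slide's b(100, 1/4) and b(100, 9/10) cases:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 200_000

for p in (0.25, 0.90):
    x = rng.binomial(n, p, size=reps)
    print(p,
          np.var(x / n),                      # ~ p(1-p)/n, depends on p
          np.var(np.arcsin(np.sqrt(x / n))))  # ~ 1/(4n) = 0.0025 for both p
```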
Multivariate Transformations
When Y = u(X) does not have a single-valued inverse, the possible inverse functions must be considered individually, with each range delimited to match the right inverse. For multivariate transformations, the derivative is replaced by the Jacobian.
Let continuous random variables X₁ and X₂ have the joint p.d.f. f(x₁,x₂). If Y₁ = u₁(X₁,X₂), Y₂ = u₂(X₁,X₂) has the single-valued inverse X₁ = v₁(Y₁,Y₂), X₂ = v₂(Y₁,Y₂), then the joint p.d.f. of Y₁ and Y₂ is
g(y₁,y₂) = |J| f(v₁(y₁,y₂), v₂(y₁,y₂)), where J = det [ ∂x₁/∂y₁  ∂x₁/∂y₂ ; ∂x₂/∂y₁  ∂x₂/∂y₂ ].
[Most difficult] Mapping the support of (X₁,X₂) onto that of (Y₁,Y₂) must be considered.
Transformation to Independent Variables
Ex. 4.4-2: X₁ and X₂ have the joint p.d.f. f(x₁,x₂) = 2, 0 < x₁ < x₂ < 1. Consider Y₁ = X₁/X₂, Y₂ = X₂, with inverse X₁ = Y₁Y₂, X₂ = Y₂ and Jacobian J = det [ y₂  y₁ ; 0  1 ] = y₂.
The support maps onto 0 < y₁ < 1, 0 < y₂ < 1, so g(y₁,y₂) = 2y₂ there. The marginal p.d.f.s are g₁(y₁) = ∫₀¹ 2y₂ dy₂ = 1 (i.e., Y₁ is U(0,1)) and g₂(y₂) = 2y₂, 0 < y₂ < 1.
∵ g(y₁,y₂) = g₁(y₁)g₂(y₂) ∴ Y₁, Y₂ independent.
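A simulation sketch consistent with the factorization (again using min/max of two uniforms to sample the triangle):

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.uniform(size=(200_000, 2))
x1, x2 = u.min(axis=1), u.max(axis=1)   # joint p.d.f. 2 on 0 < x1 < x2 < 1

y1, y2 = x1 / x2, x2
print(np.corrcoef(y1, y2)[0, 1])   # ~ 0: consistent with independence
print(y1.mean(), y2.mean())        # ~ 1/2 (U(0,1)) and ~ 2/3 (density 2*y2)
```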
Transformation to Dependent Variables
Ex. 4.4-3: X₁ and X₂ are independent, each with p.d.f. f(x) = e^{−x}, 0 < x < ∞; their joint p.d.f. is f(x₁,x₂) = e^{−x₁}e^{−x₂}, 0 < x₁ < ∞, 0 < x₂ < ∞. Consider Y₁ = X₁ − X₂, Y₂ = X₁ + X₂. The inverse is X₁ = (Y₂+Y₁)/2, X₂ = (Y₂−Y₁)/2, with |J| = 1/2, and the support maps onto −y₂ < y₁ < y₂, 0 < y₂ < ∞, so
g(y₁,y₂) = (1/2) e^{−y₂}, |y₁| < y₂ < ∞.
The marginal p.d.f.s are
g₁(y₁) = ∫_{|y₁|}^{∞} (1/2)e^{−y₂} dy₂ = (1/2)e^{−|y₁|}, −∞ < y₁ < ∞ (the double exponential p.d.f.), and
g₂(y₂) = ∫_{−y₂}^{y₂} (1/2)e^{−y₂} dy₁ = y₂e^{−y₂}, 0 < y₂ < ∞.
∵ g(y₁,y₂) ≠ g₁(y₁)g₂(y₂) ∴ Y₁, Y₂ dependent.
Beta Distribution
Ex. 4.4-4: X₁ and X₂ have independent gamma distributions with parameters (α, θ) and (β, θ), so their joint p.d.f. is
f(x₁,x₂) = 1/[Γ(α)Γ(β)θ^{α+β}] · x₁^{α−1} x₂^{β−1} e^{−(x₁+x₂)/θ}, 0 < x₁, x₂ < ∞.
Consider Y₁ = X₁/(X₁+X₂), Y₂ = X₁+X₂, i.e., X₁ = Y₁Y₂, X₂ = Y₂ − Y₁Y₂, with |J| = y₂. Then, for 0 < y₁ < 1 and 0 < y₂ < ∞,
g(y₁,y₂) = [ y₁^{α−1}(1−y₁)^{β−1} / (Γ(α)Γ(β)) ] · y₂^{α+β−1} e^{−y₂/θ} / θ^{α+β}.
The marginal p.d.f.s are
g₁(y₁) = Γ(α+β)/[Γ(α)Γ(β)] · y₁^{α−1}(1−y₁)^{β−1}, 0 < y₁ < 1 (the beta p.d.f.), and
g₂(y₂) = y₂^{α+β−1} e^{−y₂/θ} / [Γ(α+β)θ^{α+β}] (a gamma p.d.f.).
∵ g(y₁,y₂) = g₁(y₁)g₂(y₂) ∴ Y₁, Y₂ independent.
Box-Muller Transformation
Ex. 5.3-4: X₁ and X₂ have independent uniform distributions U(0,1). Consider
Z₁ = √(−2 ln X₁) cos(2πX₂), Z₂ = √(−2 ln X₁) sin(2πX₂).
Two independent U(0,1) variables ⇒ two independent N(0,1) variables!
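A minimal implementation sketch of the Box-Muller transformation:

```python
import numpy as np

def box_muller(n, rng=None):
    """Turn 2n independent U(0,1) draws into 2n independent N(0,1) draws."""
    rng = rng or np.random.default_rng()
    x1, x2 = rng.uniform(size=(2, n))
    r = np.sqrt(-2.0 * np.log(x1))   # radius from one uniform
    theta = 2.0 * np.pi * x2         # angle from the other
    return r * np.cos(theta), r * np.sin(theta)

z1, z2 = box_muller(100_000)
print(z1.mean(), z1.std(), np.corrcoef(z1, z2)[0, 1])  # ~0, ~1, ~0
```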
Distribution Function Technique
Ex. 5.3-5: Z is N(0,1), U is χ²(r), and Z and U are independent, so the joint p.d.f. of Z and U is the product of their individual p.d.f.s. The distribution function technique then shows the target variable is χ²(r+1), consistent with Z² + U being the sum of independent χ²(1) and χ²(r) variables.
Another Example
Ex. 4.4-5: U ~ χ²(r₁) and V ~ χ²(r₂) are independent, so the joint p.d.f. of U and V is the product of their p.d.f.s, and U + V is χ²(r₁+r₂).
Knowledge of known distributions and their associated integration relationships is useful for deriving unknown distributions.
Order Statistics
The order statistics are the observations of the random sample arranged in order of magnitude from smallest to largest. We assume there are no ties (no identical observations).
Ex. 6.9-1: n = 5 trials yield {0.62, 0.98, 0.31, 0.81, 0.53} from the p.d.f. f(x) = 2x, 0 < x < 1. The order statistics are {0.31, 0.53, 0.62, 0.81, 0.98}; the sample median is 0.62, and the sample range is 0.98 − 0.31 = 0.67.
Ex. 6.9-2: Let Y₁ < Y₂ < Y₃ < Y₄ < Y₅ be the order statistics of X₁, X₂, X₃, X₄, X₅, each from the p.d.f. f(x) = 2x, 0 < x < 1. The event {Y₄ < 1/2} occurs iff at least 4 of the Xᵢ are less than 1/2 — at least 4 "successes", each with probability F(1/2) = (1/2)² = 1/4 — so
P(Y₄ < 1/2) = C(5,4)(1/4)⁴(3/4) + (1/4)⁵ = 16/1024 = 1/64.
General Cases
The event that the rth order statistic Y_r is at most y, {Y_r ≤ y}, occurs iff at least r of the n observations are no more than y. The probability of "success" on each trial is F(y), and we must have at least r successes. Thus
G_r(y) = P(Y_r ≤ y) = Σ_{k=r}^{n} C(n,k) [F(y)]^k [1−F(y)]^{n−k},
and differentiating gives the p.d.f.
g_r(y) = n!/[(r−1)!(n−r)!] [F(y)]^{r−1} [1−F(y)]^{n−r} f(y).
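A sketch of the binomial-sum formula, applied to the Ex. 6.9-2 computation with F(y) = y²:

```python
from scipy.stats import binom

def cdf_order_stat(r, n, Fy):
    """P(Y_r <= y) = P(at least r of n observations <= y); success prob F(y)."""
    return binom.sf(r - 1, n, Fy)   # sf(r-1) = P(# successes >= r)

# f(x) = 2x on (0,1) has F(y) = y^2, so F(1/2) = 1/4.
print(cdf_order_stat(4, 5, 0.25))   # 1/64 = 0.015625
```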
Alternative Approach
A heuristic way to obtain g_r(y): within a short interval Δy around y, r−1 of the items fall below y, one falls in (y, y+Δy], and n−r fall above y+Δy. The multinomial probability of this arrangement, with n trials and single-trial probabilities F(y), f(y)Δy, and 1−F(y+Δy), is approximately
g_r(y)Δy ≈ n!/[(r−1)! 1! (n−r)!] [F(y)]^{r−1} [f(y)Δy] [1−F(y)]^{n−r}.
Ex. 5.9-3 (from Ex. 6.9-2): Y₁ < Y₂ < Y₃ < Y₄ < Y₅ are the order statistics of X₁,…,X₅, each from the p.d.f. f(x) = 2x, 0 < x < 1; e.g.,
g₄(y) = 5!/(3! 1!) (y²)³ (1−y²) (2y) = 40y⁷(1−y²), 0 < y < 1.
More Examples
Ex.: 4 independent trials (order statistics Y₁ < … < Y₄) from a distribution with f(x) = 1, 0 < x < 1. The p.d.f. of Y₃ is
g₃(y) = 4!/(2! 1!) y²(1−y) · 1 = 12y²(1−y), 0 < y < 1.
Ex.: 7 independent trials (Y₁ < … < Y₇) from a distribution with f(x) = 3(1−x)², 0 < x < 1. Find the probability that the sample median, i.e. Y₄, is less than a given point:
- Method 1: find g₄(y), then integrate it up to that point.
- Method 2: find P(Y₄ < y) = Σ_{k=4}^{7} C(7,k)[F(y)]^k[1−F(y)]^{7−k}, then evaluate by Table II on p. 647.
Order Statistics of Uniform Distributions
Thm. 3.5-2: If X has distribution function F(X), then F(X) has the distribution U(0,1). Hence W₁ < W₂ < … < W_n, the order statistics of {F(X₁), F(X₂), …, F(X_n)}, are the order statistics of n independent observations from U(0,1).
The distribution function of U(0,1) is G(w) = w, 0 < w < 1, so the p.d.f. of the rth order statistic W_r = F(Y_r) is
g_r(w) = n!/[(r−1)!(n−r)!] w^{r−1}(1−w)^{n−r}, 0 < w < 1 — a beta p.d.f., with E(W_r) = r/(n+1).
⇒ The Y's partition the support of X into n+1 parts, and thus n+1 areas under f(x) and above the x-axis; on average, each area equals 1/(n+1).
Percentiles
The (100p)th sample percentile π_p is defined s.t. the area under f(x) to the left of π_p is p. Therefore, Y_r is an estimator of π_p, where r = (n+1)p. In case (n+1)p is not an integer, a (weighted) average of Y_r and Y_{r+1} is used, with r = floor[(n+1)p]. The sample median is the 50th percentile: Y_{(n+1)/2} for odd n, or the average of Y_{n/2} and Y_{n/2+1} for even n.
Ex. 6.9-5: X is the weight of a bar of soap; n = 12 observations of X are listed: 1013, 1019, 1021, 1024, 1026, 1028, 1033, 1035, 1039, 1040, 1043, …
∵ n = 12, the sample median is (y₆+y₇)/2 = (1028+1033)/2 = 1030.5.
∵ (n+1)(0.25) = 3.25, the 25th percentile or first quartile is y₃ + 0.25(y₄−y₃) = 1021.75.
∵ (n+1)(0.75) = 9.75, the 75th percentile or third quartile is y₉ + 0.75(y₁₀−y₉) = 1039.75.
∵ (n+1)(0.6) = 7.8, the 60th percentile is y₇ + 0.8(y₈−y₇) = 1034.6.
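A sketch of the (n+1)p rule on the soap-weight data. The 12th (largest) observation did not survive extraction, but every percentile above depends only on the listed order statistics:

```python
import math

y = [1013, 1019, 1021, 1024, 1026, 1028, 1033, 1035, 1039, 1040, 1043]  # y_1..y_11
n = 12  # sample size; y_12, the largest value, is not needed below

def percentile(p):
    """(100p)th sample percentile via r = (n+1)p, interpolating y_r and y_{r+1}."""
    r = (n + 1) * p
    k = math.floor(r)
    return y[k - 1] + (r - k) * (y[k] - y[k - 1])   # y is 0-indexed

print((y[5] + y[6]) / 2)   # median = (y_6 + y_7)/2 = 1030.5
print(percentile(0.25))    # 1021.75
print(percentile(0.75))    # 1039.75
print(percentile(0.60))    # 1034.6
```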
Another Example
Ex. 5.6-7: Let Y₁ < Y₂ < … < Y₁₃ be the order statistics of 13 independent trials from a continuous-type distribution with 35th percentile π₀.₃₅. Find P(Y₃ < π₀.₃₅ < Y₇).
The event {Y₃ < π₀.₃₅ < Y₇} happens iff there are at least 3 but fewer than 7 "successes" (observations below π₀.₃₅), where the success probability is p = 0.35. Thus
P(Y₃ < π₀.₃₅ < Y₇) = Σ_{k=3}^{6} C(13,k)(0.35)^k(0.65)^{13−k} ≈ 0.7574, by Table II on pp. 677–681.
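The same binomial-sum evaluation in code (a sketch):

```python
from scipy.stats import binom

# {Y_3 < pi_0.35 < Y_7} <=> between 3 and 6 of the 13 observations fall below
# pi_0.35; each observation does so independently with probability p = 0.35.
p = binom.cdf(6, 13, 0.35) - binom.cdf(2, 13, 0.35)
print(p)  # ~ 0.7574
```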