Probability overview
  Event space: the set of possible outcomes
    Frequentist: run an experiment to get an outcome. Probability describes what happens over multiple trials.
    Bayesian: one outcome is correct, but we don't have enough information to choose. Probability describes the degree of belief in each outcome, based on what we know now.
  Random variable (RV): a function from the event space to R or R^n
    E.g., number of heads in 10 tosses of a coin, inches of rain
    Sometimes we use the image of the RV as the event space
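To make the idea of a random variable as a function concrete, here is a minimal sketch. It assumes a fair coin and uses 3 tosses instead of 10 to keep the event space small; the names `tosses`, `num_heads` and `pmf` are illustrative and not from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Event space: every sequence of 3 coin tosses (assumed fair, so equally likely).
tosses = list(product("HT", repeat=3))
prob = Fraction(1, len(tosses))

def num_heads(outcome):
    """Random variable: maps an outcome (a toss sequence) to a real number."""
    return outcome.count("H")

# Probabilities on the image of the RV, {0, 1, 2, 3},
# which can then serve as the event space itself.
pmf = Counter()
for w in tosses:
    pmf[num_heads(w)] += prob

for k in sorted(pmf):
    print(k, pmf[k])
```

The eight toss sequences collapse to four values of the RV, which is the sense in which the image of the RV can replace the original event space.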
Event spaces
  Discrete event space: W = {w_1, w_2, …}
    PMF: p_j ≥ 0, ∑_j p_j = 1; p_j gives the probability of w_j (so 0 ≤ p_j ≤ 1)
  Continuous event space: W a subset of R or R^n
    PDF: p(x) ≥ 0, ∫_W p(x) dx = 1; P(A) = ∫_A p(x) dx gives the probability of the event A (p(x) itself may be > 1)
    E.g., uniform distribution on [a,b]: p(x) = 1/(b-a) for a < x < b
  Discrete events in a continuous space: Dirac 𝛿
    ∫_W 𝛿(x) dx = 1 whenever W contains an open interval containing 0
    p(x) = ∑_{1≤i≤m} p_i 𝛿(x - x_i) corresponds to the PMF P(X = x_i) = p_i
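The remark that a density may exceed 1 can be checked numerically. The sketch below assumes the uniform distribution on [0, 0.5] and a simple midpoint rule; `uniform_pdf` and `integrate` are hypothetical helper names.

```python
def uniform_pdf(x, a, b):
    """Density of the uniform distribution on (a, b)."""
    return 1.0 / (b - a) if a < x < b else 0.0

def integrate(f, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

a, b = 0.0, 0.5
density = uniform_pdf(0.25, a, b)                               # a density value, not a probability
total = integrate(lambda x: uniform_pdf(x, a, b), a, b)         # integral over W
prob_A = integrate(lambda x: uniform_pdf(x, a, b), 0.1, 0.2)    # P(A) for A = [0.1, 0.2]

print(density, total, prob_A)
```

Here the density is 2 everywhere on (0, 0.5), yet the total integral is still 1 and P(A) for A = [0.1, 0.2] is about 0.2.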
Mean and variance
  Where is the center of mass of an RV X?
    Expected value or mean: weighted average of outcomes, weighted by probability
    𝜇(X) = E(X) = ∫_W x p(x) dx
    If a, b are real, then E(aX + bY) = aE(X) + bE(Y) (linearity)
  How much is the mass spread out?
    Variance and standard deviation: squared distance from the mean, weighted by probability
    𝜎^2(X) = Var(X) = E((X - 𝜇(X))^2)
    𝜎(X), the square root of Var(X), has the same units as X
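As a worked instance of these definitions, the following sketch uses the discrete analogue (a sum instead of an integral) for an assumed fair six-sided die. The helper `expect` and the choice Y = X^2 for the linearity check are illustrative.

```python
import math
from fractions import Fraction

# Assumed example: a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expect(g):
    """E(g(X)): outcomes weighted by probability."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expect(lambda x: x)                 # 7/2
var = expect(lambda x: (x - mean) ** 2)    # 35/12
std = math.sqrt(var)                       # same units as X

# Linearity holds even for dependent RVs; here Y = X^2.
lhs = expect(lambda x: 2 * x + 3 * x ** 2)
rhs = 2 * expect(lambda x: x) + 3 * expect(lambda x: x ** 2)

print(mean, var, std, lhs == rhs)
```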
Joint density
  Given two RVs X and Y: P(x,y) or P(X=x, Y=y) is the joint density.
  Independence: P(x,y) = P(x)P(y) for all x, y (a.e.)
    This gives a product structure on the joint distribution
    Roughly: info about x gives no info about y, and vice versa
  (Figure: grid with marginals P(x_j) and P(y_i) along the axes and cell entries P(x_j)P(y_i).)
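The product structure can be illustrated on a small discrete table. This is a sketch with made-up probabilities; `marginals` and `is_independent` are hypothetical helpers, and rows index x while columns index y.

```python
from fractions import Fraction as F

def marginals(joint):
    """Return (P(x_j) for each row, P(y_i) for each column) of a joint table."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return px, py

def is_independent(joint):
    """True if every cell equals the product of its row and column marginals."""
    px, py = marginals(joint)
    return all(joint[j][i] == px[j] * py[i]
               for j in range(len(px)) for i in range(len(py)))

# A joint table built as an outer product of assumed marginals.
px = [F(1, 4), F(3, 4)]
py = [F(1, 3), F(2, 3)]
product_joint = [[a * b for b in py] for a in px]

# A joint table where X determines Y completely.
dependent_joint = [[F(1, 2), F(0)],
                   [F(0), F(1, 2)]]

print(is_independent(product_joint), is_independent(dependent_joint))
```

Exact fractions are used so the equality test is not affected by floating-point rounding.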
Joint density
  Given two RVs X and Y: P(x,y) or P(X=x, Y=y) is the joint density.
  Marginal probability: P(x) = ∫_{W(Y)} P(x,y) dy
    Integrate out the effect of y (or x); in the discrete case, sum along rows/columns
  (Figure: grid with cell entries P(x_j, y_i) and marginals P(x_j), P(y_i) along the axes.)
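For the continuous case, integrating out y can be sketched numerically. The joint density p(x, y) = x + y on the unit square is an assumed example (its marginal is x + 1/2), and `integrate` is a simple midpoint rule rather than a production quadrature routine.

```python
def joint_pdf(x, y):
    """Assumed joint density on the unit square: p(x, y) = x + y."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

def integrate(f, lo, hi, n=200):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

def marginal_x(x):
    """P(x): integrate the joint density over all y."""
    return integrate(lambda y: joint_pdf(x, y), 0.0, 1.0)

total = integrate(marginal_x, 0.0, 1.0)   # the marginal is itself a PDF

print(marginal_x(0.3), total)
```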
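Joint density
  Conditional probability: P(x,y) = P(x|y) P(y), or P(x|y) = P(x,y) / P(y)
    P(x|y) = P(x,y) / ∫_{W(X)} P(x,y) dx
    E.g., fix y = y_i and normalize to get a PDF in x
  (Figure: the slice y = y_i of the joint grid, with marginal P(y_i) and normalized entries P(x_j, y_i)/P(y_i).)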
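The "fix y = y_i and normalize" step looks like this on a small discrete table. The probabilities are made up for illustration and `conditional_x_given_y` is a hypothetical helper; `joint[j][i]` stands for P(x_j, y_i).

```python
from fractions import Fraction as F

# Assumed joint table: joint[j][i] = P(x_j, y_i).
joint = [[F(1, 8), F(1, 4)],
         [F(3, 8), F(1, 4)]]

def conditional_x_given_y(joint, i):
    """P(x_j | y_i): take the column for y_i and divide by P(y_i)."""
    column = [row[i] for row in joint]
    p_y = sum(column)              # marginal P(y_i), the normalizing constant
    return [p / p_y for p in column]

cond0 = conditional_x_given_y(joint, 0)
cond1 = conditional_x_given_y(joint, 1)

print(cond0, cond1)
```

Each conditional sums to 1, and the two differ, so in this table knowing y does change what we believe about x.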
Covariance/correlation
  Cov(X,Y) = E((X - 𝜇(X))(Y - 𝜇(Y)))
    (X - 𝜇(X))(Y - 𝜇(Y)) is an RV from W(X) x W(Y) to R
    Measures the direction and magnitude of common movement
  Really an inner product of f = X - 𝜇(X) and g = Y - 𝜇(Y):
    <f,g> = ∫_{W(X)xW(Y)} f(x) g(y) p(x,y) dx dy, with ∣∣f∣∣ = 𝜎(X) and ∣∣g∣∣ = 𝜎(Y)
  Correlation: Corr(X,Y) = Cov(X,Y) / (𝜎(X) 𝜎(Y)) = <f,g> / (∣∣f∣∣ ∣∣g∣∣)
    (between -1 and 1 by Cauchy-Schwarz)
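A small discrete example of these formulas, with sums in place of integrals. The joint table and the values of X and Y are assumptions chosen so the two variables tend to move together; `expect` is a hypothetical helper.

```python
import math
from fractions import Fraction as F

xs = [0, 1]
ys = [0, 1]
# Assumed joint table: joint[j][i] = P(X = xs[j], Y = ys[i]).
joint = [[F(2, 5), F(1, 10)],
         [F(1, 10), F(2, 5)]]

def expect(g):
    """E(g(X, Y)) under the joint distribution."""
    return sum(g(x, y) * joint[j][i]
               for j, x in enumerate(xs) for i, y in enumerate(ys))

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
cov = expect(lambda x, y: (x - mu_x) * (y - mu_y))
var_x = expect(lambda x, y: (x - mu_x) ** 2)
var_y = expect(lambda x, y: (y - mu_y) ** 2)
corr = float(cov) / math.sqrt(var_x * var_y)

print(cov, corr)
```

The covariance is 3/20 and the correlation is about 0.6, inside [-1, 1] as Cauchy-Schwarz requires.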
Bayes' Rule
  Use P(x,y) = P(x|y) P(y) and P(x,y) = P(y|x) P(x) to get
    P(x|y) = P(y|x) P(x) / P(y)
  E.g., y = x-ray measurements from an object x
    P(x|y) is difficult, but P(y|x) is easy, and P(x) encodes our beliefs about x
    Maximizing P(x|y) over x is the same as minimizing -log P(x|y); P(y) does not depend on x, so it drops out:
    Solve x* = argmin_x ( -log P(y|x) - log P(x) )
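The minimization can be sketched with a scalar toy problem. Everything specific here is an assumption, not the x-ray model from the slide: a Gaussian likelihood y = x + noise with unit variance, a Gaussian prior on x with mean 0 and unit variance, and a brute-force grid search in place of a real optimizer. Additive constants in the log terms are dropped because they do not affect the argmin.

```python
def neg_log_likelihood(y, x, noise_var=1.0):
    """-log P(y|x) up to a constant, assuming y = x + Gaussian noise."""
    return 0.5 * (y - x) ** 2 / noise_var

def neg_log_prior(x, prior_var=1.0):
    """-log P(x) up to a constant, assuming a zero-mean Gaussian prior."""
    return 0.5 * x ** 2 / prior_var

def map_estimate(y, candidates):
    """x* = argmin over candidates of -log P(y|x) - log P(x)."""
    return min(candidates,
               key=lambda x: neg_log_likelihood(y, x) + neg_log_prior(x))

# Candidate values of x on a grid from -5 to 5 in steps of 0.01.
grid = [k / 100 for k in range(-500, 501)]
y_observed = 2.0
x_star = map_estimate(y_observed, grid)

print(x_star)
```

With equal noise and prior variances, the estimate lands halfway between the prior mean 0 and the measurement 2, showing how the prior pulls the answer away from the raw data.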