Intro to Probability. Slides from Professor Pan, Yan, SYSU.


1 Intro to Probability. Slides from Professor Pan, Yan, SYSU

2 Probability Theory
Example of a random experiment:
– We poll 60 users who are using one of two search engines and record which search engine each one used and the number of "good hits" returned by that search engine
[Figure: scatter of the 60 users, one point per user; horizontal axis X = 0..8 (number of good hits), one row per search engine.]

3 Probability Theory
Random variables
– X and Y are called random variables
– Each has its own sample space: S_X = {0, 1, 2, 3, 4, 5, 6, 7, 8} and S_Y = {1, 2}

4 Probability Theory
Probability
– P(X=i, Y=j) is the probability (relative frequency) of observing X = i and Y = j
– P(X, Y) refers to the whole table of probabilities
– Properties: 0 ≤ P ≤ 1, Σ P = 1

P(X=i, Y=j), as counts out of 60:

         X=0  X=1  X=2  X=3  X=4  X=5  X=6  X=7  X=8
  Y=1     3    6    8    8    5    3    1    0    0
  Y=2     0    0    0    1    4    5    8    6    2
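
A minimal sketch of how the table above can be built and checked in code (Python with NumPy is an assumption here; the slides specify no language). The counts are the ones read off the slide.

    import numpy as np

    # Counts of the 60 users; rows are Y=1 and Y=2, columns are X=0..8.
    counts = np.array([[3, 6, 8, 8, 5, 3, 1, 0, 0],
                       [0, 0, 0, 1, 4, 5, 8, 6, 2]])

    # Joint probability P(X=i, Y=j) as a relative frequency.
    P = counts / counts.sum()

    assert np.all((P >= 0) & (P <= 1))   # 0 <= P <= 1
    assert np.isclose(P.sum(), 1.0)      # probabilities sum to 1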

5 Probability Theory
Marginal probability
– P(X=i) is the marginal probability that X = i, i.e., the probability that X = i, ignoring Y

6 Probability Theory
Marginal probability
– P(X=i) is the marginal probability that X = i, i.e., the probability that X = i, ignoring Y
– From the table: P(X=i) = Σ_j P(X=i, Y=j)   (SUM RULE)
– Note that Σ_i P(X=i) = 1 and Σ_j P(Y=j) = 1

From the counts above: P(Y=1) = 34/60, P(Y=2) = 26/60, and P(X=i), as counts out of 60:

         X=0  X=1  X=2  X=3  X=4  X=5  X=6  X=7  X=8
 P(X)     3    6    8    9    9    8    9    6    2
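
A short sketch of the sum rule on the same table (again Python/NumPy, an assumption): the marginals are just the column and row sums of the joint.

    import numpy as np

    counts = np.array([[3, 6, 8, 8, 5, 3, 1, 0, 0],
                       [0, 0, 0, 1, 4, 5, 8, 6, 2]])
    P = counts / counts.sum()        # joint P(X=i, Y=j); rows are Y

    P_X = P.sum(axis=0)              # SUM RULE: P(X=i) = sum_j P(X=i, Y=j)
    P_Y = P.sum(axis=1)              # P(Y=j) = sum_i P(X=i, Y=j)

    print(P_X * 60)                  # [3. 6. 8. 9. 9. 8. 9. 6. 2.]
    print(P_Y * 60)                  # [34. 26.]
    assert np.isclose(P_X.sum(), 1.0) and np.isclose(P_Y.sum(), 1.0)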

7 Probability Theory
Conditional probability
– P(X=i | Y=j) is the probability that X = i, given that Y = j
– From the table: P(X=i | Y=j) = P(X=i, Y=j) / P(Y=j)
[Figure: P(X | Y=1) plotted over X = 0..8, i.e., the Y=1 row of the joint table rescaled by 1/P(Y=1).]

8 Probability Theory
Conditional probability
– How about the opposite conditional probability, P(Y=j | X=i)?
– P(Y=j | X=i) = P(X=i, Y=j) / P(X=i)
– Note that Σ_j P(Y=j | X=i) = 1

P(Y=j | X=i), computed from the joint counts and the marginal counts P(X=i) = (3, 6, 8, 9, 9, 8, 9, 6, 2)/60:

         X=0  X=1  X=2  X=3  X=4  X=5  X=6  X=7  X=8
  Y=1    3/3  6/6  8/8  8/9  5/9  3/8  1/9  0/6  0/2
  Y=2    0/3  0/6  0/8  1/9  4/9  5/8  8/9  6/6  2/2
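
The same table gives the conditionals directly; a sketch (Python/NumPy assumed) of P(Y=j | X=i) = P(X=i, Y=j) / P(X=i), with the check that each column sums to 1:

    import numpy as np

    counts = np.array([[3, 6, 8, 8, 5, 3, 1, 0, 0],
                       [0, 0, 0, 1, 4, 5, 8, 6, 2]])
    P = counts / counts.sum()        # joint P(X=i, Y=j); rows are Y
    P_X = P.sum(axis=0)              # marginal P(X=i), nonzero for every i here

    P_Y_given_X = P / P_X            # divide each column by its marginal
    print(P_Y_given_X.sum(axis=0))   # sum_j P(Y=j | X=i) = 1 for every i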

9 Summary of types of probability
– Joint probability: P(X, Y)
– Marginal probability (ignore the other variable): P(X) and P(Y)
– Conditional probability (condition on the other variable having a certain value): P(X|Y) and P(Y|X)

10 Probability Theory
Constructing the joint probability
– Suppose we know: the probability that the user will pick each search engine, P(Y=j), and, for each search engine, the probability of each number of good hits, P(X=i | Y=j)
– Can we construct the joint probability, P(X=i, Y=j)?
– Yes. Rearranging P(X=i | Y=j) = P(X=i, Y=j) / P(Y=j), we get P(X=i, Y=j) = P(X=i | Y=j) P(Y=j)   (PRODUCT RULE)
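
A sketch of the product rule in the same setting (Python/NumPy assumed): taking P(X=i | Y=j) and P(Y=j) from the table and multiplying them back recovers the joint exactly.

    import numpy as np

    counts = np.array([[3, 6, 8, 8, 5, 3, 1, 0, 0],
                       [0, 0, 0, 1, 4, 5, 8, 6, 2]])
    P = counts / counts.sum()            # joint P(X=i, Y=j); rows are Y
    P_Y = P.sum(axis=1)                  # P(Y=j)
    P_X_given_Y = P / P_Y[:, None]       # P(X=i | Y=j), row by row

    # PRODUCT RULE: P(X=i, Y=j) = P(X=i | Y=j) P(Y=j)
    assert np.allclose(P_X_given_Y * P_Y[:, None], P)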

11 Summary of computational rules
– SUM RULE: P(X) = Σ_Y P(X, Y) and P(Y) = Σ_X P(X, Y)
– Notation: we simplify P(X=i, Y=j) to P(X, Y) for clarity
– PRODUCT RULE: P(X, Y) = P(X|Y) P(Y) and P(X, Y) = P(Y|X) P(X)

12 Ordinal variables
– In our example, X has a natural order 0…8: X is a number of hits, and for the ordering of the columns in the joint table above, nearby X's have similar probabilities
– Y does not have a natural order

13 Probabilities for real numbers
– Can't we treat real numbers as IEEE doubles with 2^64 possible values? Hah, hah. No!
– How about quantizing real variables to a reasonable number of values? Sometimes works, but…
– We need to carefully account for ordinality
– Doing so can lead to cumbersome mathematics

14 Probability theory for real numbers
– Quantize X using bins of width Δ
– Then X ∈ {…, -2Δ, -Δ, 0, Δ, 2Δ, …}
– Define P_Q(X=x) = probability that x < X ≤ x + Δ
– Problem: P_Q(X=x) depends on the choice of Δ. Solution: let Δ → 0
– Problem: in that case, P_Q(X=x) → 0. Solution: define a probability density
  P(x) = lim_{Δ→0} P_Q(X=x) / Δ = lim_{Δ→0} (probability that x < X ≤ x + Δ) / Δ
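
A numerical sketch of this limit (plain Python; the unit Gaussian is an illustrative choice, not from the slide): P_Q(X=x)/Δ settles toward the density value as Δ shrinks.

    import math

    def gauss_cdf(x):
        # CDF of the unit Gaussian via the error function.
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    def gauss_pdf(x):
        return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

    x = 0.5
    for delta in (1.0, 0.1, 0.01, 0.001):
        P_Q = gauss_cdf(x + delta) - gauss_cdf(x)  # P(x < X <= x + delta)
        print(delta, P_Q / delta)                  # approaches P(x) as delta -> 0
    print("P(x) =", gauss_pdf(x))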

15 Probability theory for real numbers
Probability density
– Suppose P(x) is a probability density
– Properties: P(x) ≥ 0; it is NOT necessary that P(x) ≤ 1; ∫_x P(x) dx = 1
– Probabilities of intervals: P(a < X ≤ b) = ∫_{x=a}^{b} P(x) dx
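
Two quick numerical checks of these properties (Python/NumPy assumed; the Gaussian is again an illustrative stand-in for P(x)):

    import numpy as np

    def p(x, sigma=1.0):
        # Zero-mean Gaussian density with standard deviation sigma.
        return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    # P(a < X <= b) as a numerical integral (trapezoidal rule).
    a, b = -1.0, 1.0
    xs = np.linspace(a, b, 10001)
    print(np.trapz(p(xs), xs))      # about 0.683 for one standard deviation

    # A density may exceed 1: a narrow Gaussian peaks near 3.99.
    print(p(0.0, sigma=0.1))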

16 Probability theory for real numbers
Joint, marginal and conditional densities
– Suppose P(x, y) is a joint probability density: ∫_x ∫_y P(x, y) dx dy = 1, and P((X, Y) ∈ R) = ∫_R P(x, y) dx dy for a region R of the (x, y) plane
– Marginal density: P(x) = ∫_y P(x, y) dy
– Conditional density: P(x|y) = P(x, y) / P(y)
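
A grid-based sketch of these formulas (Python/NumPy assumed; the product of two unit Gaussians is an illustrative joint density):

    import numpy as np

    xs = np.linspace(-5, 5, 401)
    ys = np.linspace(-5, 5, 401)
    X, Y = np.meshgrid(xs, ys, indexing="ij")
    p_xy = np.exp(-0.5 * (X**2 + Y**2)) / (2 * np.pi)   # joint density P(x, y)

    print(np.trapz(np.trapz(p_xy, ys, axis=1), xs))     # total mass, close to 1

    p_x = np.trapz(p_xy, ys, axis=1)      # marginal: integrate out y
    print(np.trapz(p_x, xs))              # close to 1

    j = 200                               # grid index where y = 0
    p_y = np.trapz(p_xy[:, j], xs)        # P(y) at y = 0
    p_x_given_y = p_xy[:, j] / p_y        # conditional P(x | y = 0)
    print(np.trapz(p_x_given_y, xs))      # close to 1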

17 The Gaussian distribution
N(x | μ, σ²) = (1 / √(2πσ²)) exp(-(x - μ)² / (2σ²)), where μ is the mean and σ is the standard deviation
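
A direct transcription of that density (Python/NumPy assumed), with a check that it integrates to 1:

    import numpy as np

    def gaussian(x, mu, sigma):
        # N(x | mu, sigma^2) = exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    xs = np.linspace(-6, 6, 12001)
    print(np.trapz(gaussian(xs, 0.0, 1.0), xs))   # about 1.0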

18 Mean and variance
– The mean of X is E[X] = Σ_X X P(X) or E[X] = ∫_x x P(x) dx
– The variance of X is VAR(X) = Σ_X (X - E[X])² P(X) or VAR(X) = ∫_x (x - E[X])² P(x) dx
– The standard deviation of X is STD(X) = √VAR(X)
– The covariance of X and Y is COV(X, Y) = Σ_X Σ_Y (X - E[X]) (Y - E[Y]) P(X, Y) or COV(X, Y) = ∫_x ∫_y (x - E[X]) (y - E[Y]) P(x, y) dx dy
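
These definitions applied to the joint table from the search-engine example (Python/NumPy assumed):

    import numpy as np

    counts = np.array([[3, 6, 8, 8, 5, 3, 1, 0, 0],
                       [0, 0, 0, 1, 4, 5, 8, 6, 2]])
    P = counts / counts.sum()            # joint P(X, Y); rows are Y = 1, 2
    x_vals = np.arange(9)                # X in 0..8
    y_vals = np.array([1, 2])            # Y in {1, 2}

    P_X, P_Y = P.sum(axis=0), P.sum(axis=1)
    EX = np.sum(x_vals * P_X)                     # E[X]
    EY = np.sum(y_vals * P_Y)                     # E[Y]
    VX = np.sum((x_vals - EX) ** 2 * P_X)         # VAR(X)
    SX = np.sqrt(VX)                              # STD(X)
    # COV(X, Y) = sum_X sum_Y (X - E[X]) (Y - E[Y]) P(X, Y)
    COV = np.sum(np.outer(y_vals - EY, x_vals - EX) * P)
    print(EX, VX, SX, COV)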

19 Mean and variance of the Gaussian
E[X] = μ, VAR(X) = σ², STD(X) = σ

20 How can we use probability as a framework for machine learning?

21 Maximum likelihood estimation
– Say we have a density P(x|θ) with parameter θ
– The likelihood of a set of independent and identically distributed (i.i.d.) data x = (x_1, …, x_N) is P(x|θ) = Π_{n=1}^{N} P(x_n|θ)
– The log-likelihood is L = ln P(x|θ) = Σ_{n=1}^{N} ln P(x_n|θ)
– The maximum likelihood (ML) estimate of θ is θ_ML = argmax_θ L = argmax_θ Σ_{n=1}^{N} ln P(x_n|θ)
– Example: for Gaussian likelihood P(x|θ) = N(x | μ, σ²),
  L = -(1/(2σ²)) Σ_{n=1}^{N} (x_n - μ)² - (N/2) ln σ² - (N/2) ln(2π)
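
A minimal sketch of ML estimation for the Gaussian case (Python/NumPy assumed; the data is synthetic, generated only for illustration). Setting the derivatives of L with respect to μ and σ² to zero gives the familiar closed-form estimates:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=2.0, scale=0.5, size=1000)   # i.i.d. draws, illustrative

    mu_ML = x.mean()                      # argmax of L over mu: the sample mean
    var_ML = np.mean((x - mu_ML) ** 2)    # argmax of L over sigma^2
    print(mu_ML, np.sqrt(var_ML))         # close to the true 2.0 and 0.5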

22 Comments on notation from now on
– Instead of Σ_j P(X=i, Y=j), we write Σ_X P(X, Y)
– P() and p() are used interchangeably
– Discrete and continuous variables are treated the same, so Σ_X, Σ_x, ∫_X and ∫_x are interchangeable
– θ_ML and θ^ML are interchangeable
– argmax_θ f(θ) is the value of θ that maximizes f(θ)
– In the context of data x_1, …, x_N, the symbols x and X (in any typeface) refer to the entire set of data
– N(x | μ, σ²) = (1 / √(2πσ²)) exp(-(x - μ)² / (2σ²))
– log() = ln() and exp(x) = e^x
– p_context(x) and p(x|context) are interchangeable

24 Questions?

26 Maximum likelihood estimation
– Example: for Gaussian likelihood P(x|θ) = N(x | μ, σ²),
  L = -(1/(2σ²)) Σ_{n=1}^{N} (x_n - μ)² - (N/2) ln σ² - (N/2) ln(2π)
– Objective of regression: minimize the error E(w) = ½ Σ_n (t_n - y(x_n, w))²
– Connection: maximizing the Gaussian log-likelihood of targets t_n with mean y(x_n, w) is, up to terms that do not depend on w, the same as minimizing E(w) (see the sketch below)
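
A minimal least-squares sketch of that objective (Python/NumPy assumed; the straight-line model y(x, w) = w0 + w1·x and the synthetic data are illustrative choices): np.linalg.lstsq minimizes exactly the sum-of-squares error E(w).

    import numpy as np

    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 1.0, 50)
    t = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=x.size)  # noisy targets

    A = np.column_stack([np.ones_like(x), x])   # design matrix for w0 + w1 x
    w, *_ = np.linalg.lstsq(A, t, rcond=None)   # minimizes sum (t - A w)^2
    print(w)                                    # close to [1.0, 2.0]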

