Probability for Machine Learning

1 Probability for Machine Learning
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya

2 Probabilistic Machine Learning
Not all machine learning models are probabilistic, but most of them have probabilistic interpretations.
Predictions need to have an associated confidence; confidence = probability.
Arguments for the probabilistic approach:
- Complete framework for machine learning
- Makes assumptions explicit
- Recovers most non-probabilistic models as special cases
- Modular: easily extensible

3 References
- “Introduction to Probability Models”, Sheldon Ross
- “Introduction to Probability and Statistics for Engineers and Scientists”, Sheldon Ross
- “Introduction to Probability”, Dimitri P. Bertsekas, John N. Tsitsiklis

4 Basics
- Random experiment $E$, outcome $\omega \in \Omega$, events $F$, sample space $(\Omega, F)$
- Probability measure $P : F \to \mathbb{R}$
- Axioms of probability, basic laws of probability
- Discrete sample space, discrete probability measure
- Continuous sample space, continuous probability measure
- Conditional probability, multiplication rule, theorem of total probability, Bayes' theorem (see the worked example below)
- Independence: pairwise, mutual, and conditional independence
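A minimal worked example of Bayes' theorem together with the theorem of total probability, as a Python sketch; the diagnostic-test numbers below are illustrative assumptions, not taken from the slides.

```python
# A worked example of Bayes' theorem on a made-up diagnostic test.
# All numbers here are illustrative assumptions.
p_disease = 0.01            # prior P(D)
p_pos_given_disease = 0.95  # likelihood P(+ | D)
p_pos_given_healthy = 0.05  # false-positive rate P(+ | not D)

# Theorem of total probability: P(+) = P(+|D)P(D) + P(+|not D)P(not D)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(D | +) = P(+|D) P(D) / P(+)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive test) = {p_disease_given_pos:.3f}")  # ~0.161
```

Even with a fairly accurate test, the low prior keeps the posterior small; this is the standard base-rate effect that Bayes' theorem makes explicit.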

5 Random Variables
A random variable is a function $X : \Omega \to \mathbb{R}$.
Example: the experiment is tossing two coins, scoring 1 for heads and 0 for tails; the random variable $X$ is the sum of the two scores. Then
$\{X = 2\} \equiv \{\omega : \text{sum of scores} = 2\} = \{(1,1)\}$
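A minimal simulation of this random variable, assuming NumPy is available and encoding heads as 1 and tails as 0 as above:

```python
import numpy as np

rng = np.random.default_rng(0)
# Sample outcomes omega = (coin1, coin2), heads = 1, tails = 0.
omega = rng.integers(0, 2, size=(100_000, 2))
x = omega.sum(axis=1)  # the random variable X = sum of scores

# P(X = 2) should be close to 1/4 for fair coins.
print("estimated P(X = 2):", (x == 2).mean())
```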

6 Discrete Random Variables
Probability mass function: $p(x) = P(X = x)$, with $p(x) \ge 0$ and $\sum_x p(x) = 1$.

7 Example distributions: Discrete
- Bernoulli: $x \sim \mathrm{Ber}(p)$, $x \in \{0,1\}$: $p(x) = p^x (1-p)^{1-x}$
- Binomial: $x \sim \mathrm{Bin}(n,p)$, $x \in \{0,\dots,n\}$: $p(x) = \binom{n}{x} p^x (1-p)^{n-x}$
- Poisson: $x \sim \mathrm{Poisson}(\lambda)$, $x \in \{0,1,\dots\}$: $p(x) = \dfrac{e^{-\lambda} \lambda^x}{x!}$
- Geometric: $x \sim \mathrm{Geo}(p)$, $x \in \{1,2,\dots\}$: $p(x) = (1-p)^{x-1} p$
- Empirical distribution: given $D = \{x_1,\dots,x_N\}$, $p_{\mathrm{emp}}(A) = \frac{1}{N} \sum_i \delta_{x_i}(A)$, where $\delta_{x_i}$ is the Dirac measure at $x_i$
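As a quick sanity check, these pmf formulas can be compared against scipy.stats; a sketch assuming SciPy is available:

```python
import numpy as np
from math import comb, exp, factorial
from scipy import stats

n, p, lam = 10, 0.3, 2.0
x = 4

# Binomial pmf: C(n, x) p^x (1-p)^(n-x)
assert np.isclose(comb(n, x) * p**x * (1 - p)**(n - x),
                  stats.binom.pmf(x, n, p))

# Poisson pmf: e^(-lam) lam^x / x!
assert np.isclose(exp(-lam) * lam**x / factorial(x),
                  stats.poisson.pmf(x, lam))

# Geometric pmf on support {1, 2, ...}: (1-p)^(x-1) p
assert np.isclose((1 - p)**(x - 1) * p, stats.geom.pmf(x, p))
```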

8 Continuous Random Variables
Probability density function $f$: $P(a \le X \le b) = \int_a^b f(x)\,dx$, with $f(x) \ge 0$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

9 Example density functions
- Uniform: $x \sim U(a,b)$: $f(x) = \dfrac{1}{b-a}$ for $x \in [a,b]$
- Exponential: $x \sim \mathrm{Exp}(\lambda)$: $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$
- Standard normal: $x \sim N(0,1)$: $f(x) = \dfrac{1}{\sqrt{2\pi}} e^{-x^2/2}$
- Gaussian: $x \sim N(\mu,\sigma)$: $f(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2 / 2\sigma^2}$
- Laplace: $x \sim \mathrm{Lap}(\mu,b)$: $f(x) = \dfrac{1}{2b} e^{-|x-\mu|/b}$
- Gamma: $x \sim \mathrm{Gam}(\alpha,\beta)$: $f(x) = \dfrac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}$ for $x \ge 0$
- Beta: $x \sim \mathrm{Beta}(\alpha,\beta)$: $f(x) = \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha-1} (1-x)^{\beta-1}$ for $x \in [0,1]$
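Each density should integrate to 1 over its support; a minimal numerical check of the Gamma and Beta formulas above, assuming SciPy is available:

```python
import numpy as np
from scipy import integrate
from scipy.special import gamma as G

alpha, beta = 2.5, 1.5

# Gamma(alpha, beta) density on [0, inf).
f = lambda x: beta**alpha / G(alpha) * x**(alpha - 1) * np.exp(-beta * x)
total, _ = integrate.quad(f, 0, np.inf)
print(total)  # ~1.0

# Beta(alpha, beta) density on [0, 1].
g = lambda x: (G(alpha + beta) / (G(alpha) * G(beta))
               * x**(alpha - 1) * (1 - x)**(beta - 1))
total, _ = integrate.quad(g, 0, 1)
print(total)  # ~1.0
```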

10 Random Variables
Cumulative distribution function: $F(x) = P(X \le x)$. For a continuous random variable, $F(x) = \int_{-\infty}^{x} f(t)\,dt$ and $f(x) = F'(x)$.
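A small sketch, assuming NumPy and SciPy, of how the empirical CDF of samples tracks the analytic CDF:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
samples = rng.normal(size=100_000)

# Empirical CDF at x: fraction of samples <= x.
for x in (-1.0, 0.0, 1.0):
    print(x, (samples <= x).mean(), stats.norm.cdf(x))
```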

11 Moments
Mean: $E[X] = \int x\, f(x)\,dx$
Variance: $\mathrm{Var}(X) = E[(X - E[X])^2]$
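Both moments can be estimated by Monte Carlo averaging over samples; a minimal sketch with NumPy, using the exponential distribution because its moments are known in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0
x = rng.exponential(scale=1 / lam, size=1_000_000)

# For Exp(lambda): E[X] = 1/lambda, Var(X) = 1/lambda^2.
print("mean:", x.mean(), "expected:", 1 / lam)
print("var :", x.var(), "expected:", 1 / lam**2)
```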

12 Random Vectors and Joint Distributions
- Discrete random vector: joint pmf $p(x_1,\dots,x_k) = P(X_1 = x_1,\dots,X_k = x_k)$
- Continuous random vector: joint pdf $f(x_1,\dots,x_k)$

13 Example multi-variate distributions
- Multivariate Gaussian: $x \sim N(\mu, \Sigma)$: $f(x) = (2\pi)^{-k/2} |\Sigma|^{-1/2} \exp\!\left(-\tfrac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$
- Multinomial: $x \sim \mathrm{Mult}(n; p_1,\dots,p_k)$: $f(x_1,\dots,x_k) = \dfrac{n!}{x_1! \cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k}$, with $\sum_i x_i = n$
- Dirichlet: $x \sim \mathrm{Dir}(\alpha_1,\dots,\alpha_k)$: $f(x_1,\dots,x_k) = \dfrac{\Gamma(\sum_i \alpha_i)}{\prod_i \Gamma(\alpha_i)} \prod_i x_i^{\alpha_i - 1}$
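A sketch of sampling from a multivariate Gaussian with NumPy and checking the empirical mean and covariance against the parameters; the values of $\mu$ and $\Sigma$ below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

x = rng.multivariate_normal(mu, Sigma, size=200_000)

# Empirical mean and covariance should approach mu and Sigma.
print(x.mean(axis=0))           # ~ [1, -2]
print(np.cov(x, rowvar=False))  # ~ Sigma
```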

14 Random Vectors and Joint Distributions
Given $f(x_1,\dots,x_k)$:
- Marginal distribution: $f_{X_1}(x_1) = \int\!\cdots\!\int f(x_1,\dots,x_k)\, dx_2\, dx_3 \cdots dx_k$
- Expectation: $E[X] = \int\!\cdots\!\int (x_1,\dots,x_k)\, f(x_1,\dots,x_k)\, dx_1\, dx_2 \cdots dx_k$

15 Conditional Probability
Conditional pmf, conditional pdf. Given $f_{X_1 X_2}(x_1, x_2)$:
- Conditional density: $f_{X_1|X_2}(x_1|x_2) = f_{X_1 X_2}(x_1, x_2) / f_{X_2}(x_2)$
- Multiplication rule: $f_{X_1 X_2}(x_1, x_2) = f_{X_1|X_2}(x_1|x_2)\, f_{X_2}(x_2)$
- Bayes' rule: $f_{X_1|X_2}(x_1|x_2) = \dfrac{f_{X_2|X_1}(x_2|x_1)\, f_{X_1}(x_1)}{\int f_{X_2|X_1}(x_2|x_1)\, f_{X_1}(x_1)\, dx_1}$
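A minimal numerical sketch of Bayes' rule for densities, assuming NumPy: inferring a Gaussian mean from one observation by discretizing the parameter on a grid (all numbers illustrative).

```python
import numpy as np

# Prior on the unknown mean: mu ~ N(0, 1), discretized on a grid.
mu = np.linspace(-5.0, 5.0, 1001)
dx = mu[1] - mu[0]
prior = np.exp(-mu**2 / 2)

# Likelihood of a single observation y ~ N(mu, 1).
y = 1.5
likelihood = np.exp(-(y - mu)**2 / 2)

# Bayes' rule: posterior is proportional to likelihood * prior; the
# denominator (total probability) is approximated by the grid sum.
posterior = likelihood * prior
posterior /= posterior.sum() * dx

print("posterior mean:", (mu * posterior).sum() * dx)  # ~ y/2 = 0.75
```

The grid sum stands in for the integral in the denominator of Bayes' rule; for this conjugate Gaussian case the exact posterior is $N(y/2, 1/2)$, which the sketch reproduces.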

16 Conditional Probability
Given $f_{X_1 X_2}(x_1, x_2)$:
- Conditional expectation: $E[X_1 | x_2] = \int x_1\, f_{X_1|X_2}(x_1|x_2)\, dx_1$
- Law of total expectation: $E[X_1] = \int E[X_1 | x_2]\, f_{X_2}(x_2)\, dx_2$

17 Independence and Conditional Independence
- Independence: $f_{X_1 X_2}(x_1, x_2) = f_{X_1}(x_1)\, f_{X_2}(x_2)$
- Conditional independence: $f_{X_1 X_2 | X_3}(x_1, x_2 | x_3) = f_{X_1|X_3}(x_1|x_3)\, f_{X_2|X_3}(x_2|x_3)$

18 Covariance
- Covariance: $\mathrm{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])]$
- Correlation coefficient: $\rho_{X,Y} = \dfrac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)}\,\sqrt{\mathrm{Var}(Y)}}$
- Covariance matrix of a random vector $X$: $\mathrm{Cov}(X) = E[(X - E[X])(X - E[X])^T]$
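NumPy provides both quantities directly; a minimal check on synthetic correlated data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 2 * x + rng.normal(size=100_000)  # y is correlated with x

print(np.cov(x, y))       # 2x2 covariance matrix
print(np.corrcoef(x, y))  # correlation; rho ~ 2/sqrt(5) ~ 0.894 here
```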

19 Central Limit Theorem
$N$ i.i.d. random variables $X_i$ with mean $\mu$ and variance $\sigma^2$. Let
$S_N = \sum_i X_i$ and $Z_N = \dfrac{S_N - N\mu}{\sigma \sqrt{N}}$.
As $N$ increases, the distribution of $Z_N$ approaches the standard normal distribution.
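A quick simulation of the theorem, assuming NumPy: sums of uniform variables, standardized as above, behave like a standard normal.

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 100, 20_000

# X_i ~ U(0, 1): mu = 1/2, sigma^2 = 1/12.
mu, sigma = 0.5, np.sqrt(1 / 12)
S = rng.random((trials, N)).sum(axis=1)  # S_N for each trial
Z = (S - N * mu) / (sigma * np.sqrt(N))  # standardized sums

# Z should look standard normal: mean ~ 0, std ~ 1, P(Z <= 1) ~ 0.841.
print(Z.mean(), Z.std(), (Z <= 1).mean())
```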

20 Notions from Information Theory
- Entropy: $H(X) = -\sum_k P(X=k) \log_2 P(X=k)$
- KL divergence: $KL(p \,\|\, q) = \sum_k p(k) \log \dfrac{p(k)}{q(k)}$
- Mutual information: $I(X,Y) = KL\big(p(X,Y) \,\|\, p(X)\,p(Y)\big) = \sum_{x,y} p(x,y) \log \dfrac{p(x,y)}{p(x)\,p(y)}$
- Pointwise mutual information: $\mathrm{PMI}(x,y) = \log \dfrac{p(x,y)}{p(x)\,p(y)}$
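A minimal sketch of entropy and KL divergence for discrete distributions, assuming NumPy; the example distributions are illustrative:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits: H = -sum p log2 p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # convention: 0 log 0 = 0
    return -(p * np.log2(p)).sum()

def kl(p, q):
    """KL(p || q) in bits; assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return (p[mask] * np.log2(p[mask] / q[mask])).sum()

p = [0.5, 0.25, 0.25]
q = [1/3, 1/3, 1/3]
print(entropy(p))  # 1.5 bits
print(kl(p, q))    # >= 0, and zero iff p == q
```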

21 Jensen's Inequality
For a convex function $f$ and a random variable $X$:
$f(E[X]) \le E[f(X)]$
Equality holds if $f$ is linear.
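A short numerical check of the inequality for the convex function $f(x) = x^2$, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(size=1_000_000)  # Exp(1): E[X] = 1, E[X^2] = 2

# f(x) = x^2 is convex, so f(E[X]) <= E[f(X)].
print(np.mean(x)**2, np.mean(x**2))  # ~1 <= ~2
```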

