Probability for Machine Learning

1 Probability for Machine Learning
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya

2 Probabilistic Machine Learning
Not all machine learning models are probabilistic, but most of them have probabilistic interpretations.
Predictions need to have an associated confidence; confidence = probability.
Arguments for the probabilistic approach:
- Complete framework for machine learning
- Makes assumptions explicit
- Recovers most non-probabilistic models as special cases
- Modular: easily extensible

3 References
- “Introduction to Probability Models”, Sheldon Ross
- “Introduction to Probability and Statistics for Engineers and Scientists”, Sheldon Ross
- “Introduction to Probability”, Dimitri P. Bertsekas, John N. Tsitsiklis

4 Basics
- Random experiment $E$, outcome $\omega \in \Omega$, events $F$, sample space $(\Omega, F)$
- Probability measure $P : F \to \mathbb{R}$
- Axioms of probability, basic laws of probability
- Discrete sample space, discrete probability measure
- Continuous sample space, continuous probability measure
- Conditional probability, multiplication rule, theorem of total probability, Bayes' theorem (see the worked example below)
- Independence: pairwise, mutual, and conditional independence
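A minimal worked example of Bayes' theorem together with the theorem of total probability, as a Python sketch; the diagnostic-test numbers below are illustrative assumptions, not taken from the slides.

```python
# A worked example of Bayes' theorem on a made-up diagnostic test.
# All numbers here are illustrative assumptions.
p_disease = 0.01            # prior P(D)
p_pos_given_disease = 0.95  # likelihood P(+ | D)
p_pos_given_healthy = 0.05  # false-positive rate P(+ | not D)

# Theorem of total probability: P(+) = P(+|D)P(D) + P(+|not D)P(not D)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(D | +) = P(+|D) P(D) / P(+)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive test) = {p_disease_given_pos:.3f}")  # ~0.161
```

Even with a fairly accurate test, the low prior keeps the posterior small; this is the standard base-rate effect that Bayes' theorem makes explicit.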

5 Random Variables
A random variable is a function $X : \Omega \to \mathbb{R}$.
Example: the experiment is tossing two coins, scoring 1 for heads and 0 for tails; the random variable $X$ is the sum of the two scores. Then
$\{X = 2\} \equiv \{\omega : \text{sum of scores} = 2\} = \{(1,1)\}$
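A minimal simulation of this random variable, assuming NumPy is available and encoding heads as 1 and tails as 0 as above:

```python
import numpy as np

rng = np.random.default_rng(0)
# Sample outcomes omega = (coin1, coin2), heads = 1, tails = 0.
omega = rng.integers(0, 2, size=(100_000, 2))
x = omega.sum(axis=1)  # the random variable X = sum of scores

# P(X = 2) should be close to 1/4 for fair coins.
print("estimated P(X = 2):", (x == 2).mean())
```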

6 Discrete Random Variables
Probability mass function: $p(x) = P(X = x)$, with $p(x) \ge 0$ and $\sum_x p(x) = 1$.

7 Example distributions: Discrete
- Bernoulli: $x \sim \mathrm{Ber}(p)$, $x \in \{0,1\}$: $p(x) = p^x (1-p)^{1-x}$
- Binomial: $x \sim \mathrm{Bin}(n,p)$, $x \in \{0,\dots,n\}$: $p(x) = \binom{n}{x} p^x (1-p)^{n-x}$
- Poisson: $x \sim \mathrm{Poisson}(\lambda)$, $x \in \{0,1,\dots\}$: $p(x) = \dfrac{e^{-\lambda} \lambda^x}{x!}$
- Geometric: $x \sim \mathrm{Geo}(p)$, $x \in \{1,2,\dots\}$: $p(x) = (1-p)^{x-1} p$
- Empirical distribution: given $D = \{x_1,\dots,x_N\}$, $p_{\mathrm{emp}}(A) = \frac{1}{N} \sum_i \delta_{x_i}(A)$, where $\delta_{x_i}$ is the Dirac measure at $x_i$
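As a quick sanity check, these pmf formulas can be compared against scipy.stats; a sketch assuming SciPy is available:

```python
import numpy as np
from math import comb, exp, factorial
from scipy import stats

n, p, lam = 10, 0.3, 2.0
x = 4

# Binomial pmf: C(n, x) p^x (1-p)^(n-x)
assert np.isclose(comb(n, x) * p**x * (1 - p)**(n - x),
                  stats.binom.pmf(x, n, p))

# Poisson pmf: e^(-lam) lam^x / x!
assert np.isclose(exp(-lam) * lam**x / factorial(x),
                  stats.poisson.pmf(x, lam))

# Geometric pmf on support {1, 2, ...}: (1-p)^(x-1) p
assert np.isclose((1 - p)**(x - 1) * p, stats.geom.pmf(x, p))
```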

8 Continuous Random Variables
Probability density function $f$: $P(a \le X \le b) = \int_a^b f(x)\,dx$, with $f(x) \ge 0$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

9 Example density functions
- Uniform: $x \sim U(a,b)$: $f(x) = \dfrac{1}{b-a}$ for $x \in [a,b]$
- Exponential: $x \sim \mathrm{Exp}(\lambda)$: $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$
- Standard normal: $x \sim N(0,1)$: $f(x) = \dfrac{1}{\sqrt{2\pi}} e^{-x^2/2}$
- Gaussian: $x \sim N(\mu,\sigma)$: $f(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2 / 2\sigma^2}$
- Laplace: $x \sim \mathrm{Lap}(\mu,b)$: $f(x) = \dfrac{1}{2b} e^{-|x-\mu|/b}$
- Gamma: $x \sim \mathrm{Gam}(\alpha,\beta)$: $f(x) = \dfrac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}$ for $x \ge 0$
- Beta: $x \sim \mathrm{Beta}(\alpha,\beta)$: $f(x) = \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha-1} (1-x)^{\beta-1}$ for $x \in [0,1]$
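Each density should integrate to 1 over its support; a minimal numerical check of the Gamma and Beta formulas above, assuming SciPy is available:

```python
import numpy as np
from scipy import integrate
from scipy.special import gamma as G

alpha, beta = 2.5, 1.5

# Gamma(alpha, beta) density on [0, inf).
f = lambda x: beta**alpha / G(alpha) * x**(alpha - 1) * np.exp(-beta * x)
total, _ = integrate.quad(f, 0, np.inf)
print(total)  # ~1.0

# Beta(alpha, beta) density on [0, 1].
g = lambda x: (G(alpha + beta) / (G(alpha) * G(beta))
               * x**(alpha - 1) * (1 - x)**(beta - 1))
total, _ = integrate.quad(g, 0, 1)
print(total)  # ~1.0
```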

10 Random Variables
Cumulative distribution function: $F(x) = P(X \le x)$. For a continuous random variable, $F(x) = \int_{-\infty}^{x} f(t)\,dt$ and $f(x) = F'(x)$.
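A small sketch, assuming NumPy and SciPy, of how the empirical CDF of samples tracks the analytic CDF:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
samples = rng.normal(size=100_000)

# Empirical CDF at x: fraction of samples <= x.
for x in (-1.0, 0.0, 1.0):
    print(x, (samples <= x).mean(), stats.norm.cdf(x))
```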

11 Moments
Mean: $E[X] = \int x\, f(x)\,dx$
Variance: $\mathrm{Var}(X) = E[(X - E[X])^2]$
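Both moments can be estimated by Monte Carlo averaging over samples; a minimal sketch with NumPy, using the exponential distribution because its moments are known in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0
x = rng.exponential(scale=1 / lam, size=1_000_000)

# For Exp(lambda): E[X] = 1/lambda, Var(X) = 1/lambda^2.
print("mean:", x.mean(), "expected:", 1 / lam)
print("var :", x.var(), "expected:", 1 / lam**2)
```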

12 Random Vectors and Joint Distributions
- Discrete random vector: joint pmf $p(x_1,\dots,x_k) = P(X_1 = x_1,\dots,X_k = x_k)$
- Continuous random vector: joint pdf $f(x_1,\dots,x_k)$

13 Example multi-variate distributions
- Multivariate Gaussian: $x \sim N(\mu, \Sigma)$: $f(x) = (2\pi)^{-k/2} |\Sigma|^{-1/2} \exp\!\left(-\tfrac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$
- Multinomial: $x \sim \mathrm{Mult}(n; p_1,\dots,p_k)$: $f(x_1,\dots,x_k) = \dfrac{n!}{x_1! \cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k}$, with $\sum_i x_i = n$
- Dirichlet: $x \sim \mathrm{Dir}(\alpha_1,\dots,\alpha_k)$: $f(x_1,\dots,x_k) = \dfrac{\Gamma(\sum_i \alpha_i)}{\prod_i \Gamma(\alpha_i)} \prod_i x_i^{\alpha_i - 1}$
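A sketch of sampling from a multivariate Gaussian with NumPy and checking the empirical mean and covariance against the parameters; the values of $\mu$ and $\Sigma$ below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

x = rng.multivariate_normal(mu, Sigma, size=200_000)

# Empirical mean and covariance should approach mu and Sigma.
print(x.mean(axis=0))           # ~ [1, -2]
print(np.cov(x, rowvar=False))  # ~ Sigma
```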

14 Random Vectors and Joint Distributions
Given $f(x_1,\dots,x_k)$:
- Marginal distribution: $f_{X_1}(x_1) = \int\!\cdots\!\int f(x_1,\dots,x_k)\, dx_2\, dx_3 \cdots dx_k$
- Expectation: $E[X] = \int\!\cdots\!\int (x_1,\dots,x_k)\, f(x_1,\dots,x_k)\, dx_1\, dx_2 \cdots dx_k$

15 Conditional Probability
Conditional pmf, conditional pdf. Given $f_{X_1 X_2}(x_1, x_2)$:
- Conditional density: $f_{X_1|X_2}(x_1|x_2) = f_{X_1 X_2}(x_1, x_2) / f_{X_2}(x_2)$
- Multiplication rule: $f_{X_1 X_2}(x_1, x_2) = f_{X_1|X_2}(x_1|x_2)\, f_{X_2}(x_2)$
- Bayes' rule: $f_{X_1|X_2}(x_1|x_2) = \dfrac{f_{X_2|X_1}(x_2|x_1)\, f_{X_1}(x_1)}{\int f_{X_2|X_1}(x_2|x_1)\, f_{X_1}(x_1)\, dx_1}$
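A minimal numerical sketch of Bayes' rule for densities, assuming NumPy: inferring a Gaussian mean from one observation by discretizing the parameter on a grid (all numbers illustrative).

```python
import numpy as np

# Prior on the unknown mean: mu ~ N(0, 1), discretized on a grid.
mu = np.linspace(-5.0, 5.0, 1001)
dx = mu[1] - mu[0]
prior = np.exp(-mu**2 / 2)

# Likelihood of a single observation y ~ N(mu, 1).
y = 1.5
likelihood = np.exp(-(y - mu)**2 / 2)

# Bayes' rule: posterior is proportional to likelihood * prior; the
# denominator (total probability) is approximated by the grid sum.
posterior = likelihood * prior
posterior /= posterior.sum() * dx

print("posterior mean:", (mu * posterior).sum() * dx)  # ~ y/2 = 0.75
```

The grid sum stands in for the integral in the denominator of Bayes' rule; for this conjugate Gaussian case the exact posterior is $N(y/2, 1/2)$, which the sketch reproduces.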

16 Conditional Probability
Given $f_{X_1 X_2}(x_1, x_2)$:
- Conditional expectation: $E[X_1 | x_2] = \int x_1\, f_{X_1|X_2}(x_1|x_2)\, dx_1$
- Law of total expectation: $E[X_1] = \int E[X_1 | x_2]\, f_{X_2}(x_2)\, dx_2$

17 Independence and Conditional Independence
- Independence: $f_{X_1 X_2}(x_1, x_2) = f_{X_1}(x_1)\, f_{X_2}(x_2)$
- Conditional independence: $f_{X_1 X_2 | X_3}(x_1, x_2 | x_3) = f_{X_1|X_3}(x_1|x_3)\, f_{X_2|X_3}(x_2|x_3)$

18 Covariance
- Covariance: $\mathrm{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])]$
- Correlation coefficient: $\rho_{X,Y} = \dfrac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)}\,\sqrt{\mathrm{Var}(Y)}}$
- Covariance matrix of a random vector $X$: $\mathrm{Cov}(X) = E[(X - E[X])(X - E[X])^T]$
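NumPy provides both quantities directly; a minimal check on synthetic correlated data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 2 * x + rng.normal(size=100_000)  # y is correlated with x

print(np.cov(x, y))       # 2x2 covariance matrix
print(np.corrcoef(x, y))  # correlation; rho ~ 2/sqrt(5) ~ 0.894 here
```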

19 Central Limit Theorem
$N$ i.i.d. random variables $X_i$ with mean $\mu$ and variance $\sigma^2$. Let
$S_N = \sum_i X_i$ and $Z_N = \dfrac{S_N - N\mu}{\sigma \sqrt{N}}$.
As $N$ increases, the distribution of $Z_N$ approaches the standard normal distribution.
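A quick simulation of the theorem, assuming NumPy: sums of uniform variables, standardized as above, behave like a standard normal.

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 100, 20_000

# X_i ~ U(0, 1): mu = 1/2, sigma^2 = 1/12.
mu, sigma = 0.5, np.sqrt(1 / 12)
S = rng.random((trials, N)).sum(axis=1)  # S_N for each trial
Z = (S - N * mu) / (sigma * np.sqrt(N))  # standardized sums

# Z should look standard normal: mean ~ 0, std ~ 1, P(Z <= 1) ~ 0.841.
print(Z.mean(), Z.std(), (Z <= 1).mean())
```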

20 Notions from Information Theory
- Entropy: $H(X) = -\sum_k P(X=k) \log_2 P(X=k)$
- KL divergence: $KL(p \,\|\, q) = \sum_k p(k) \log \dfrac{p(k)}{q(k)}$
- Mutual information: $I(X,Y) = KL\big(p(X,Y) \,\|\, p(X)\,p(Y)\big) = \sum_{x,y} p(x,y) \log \dfrac{p(x,y)}{p(x)\,p(y)}$
- Pointwise mutual information: $\mathrm{PMI}(x,y) = \log \dfrac{p(x,y)}{p(x)\,p(y)}$
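A minimal sketch of entropy and KL divergence for discrete distributions, assuming NumPy; the example distributions are illustrative:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits: H = -sum p log2 p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # convention: 0 log 0 = 0
    return -(p * np.log2(p)).sum()

def kl(p, q):
    """KL(p || q) in bits; assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return (p[mask] * np.log2(p[mask] / q[mask])).sum()

p = [0.5, 0.25, 0.25]
q = [1/3, 1/3, 1/3]
print(entropy(p))  # 1.5 bits
print(kl(p, q))    # >= 0, and zero iff p == q
```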

21 Jensen's Inequality
For a convex function $f$ and a random variable $X$:
$f(E[X]) \le E[f(X)]$
Equality holds if $f$ is linear.
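A short numerical check of the inequality for the convex function $f(x) = x^2$, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(size=1_000_000)  # Exp(1): E[X] = 1, E[X^2] = 2

# f(x) = x^2 is convex, so f(E[X]) <= E[f(X)].
print(np.mean(x)**2, np.mean(x**2))  # ~1 <= ~2
```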

