Probability for Machine Learning


Probability for Machine Learning Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya

Probabilistic Machine Learning
- Not all machine learning models are probabilistic, but most of them have probabilistic interpretations
- Predictions need to have an associated confidence; confidence = probability
- Arguments for the probabilistic approach:
  - Complete framework for machine learning
  - Makes assumptions explicit
  - Recovers most non-probabilistic models as special cases
  - Modular: easily extensible

References
- "Introduction to Probability Models", Sheldon Ross
- "Introduction to Probability and Statistics for Engineers and Scientists", Sheldon Ross
- "Introduction to Probability", Dimitri P. Bertsekas, John N. Tsitsiklis

Basics
- Random experiment E, outcome ω ∈ Ω, events F, sample space Ω
- Probability measure P: F → [0, 1]
- Axioms of probability, basic laws of probability
- Discrete sample space, discrete probability measure
- Continuous sample space, continuous probability measure
- Conditional probability, multiplicative rule, theorem of total probability, Bayes theorem
- Independence: pairwise, mutual, conditional independence
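Bayes theorem combined with the theorem of total probability can be sketched on the classic diagnostic-test example (the prior, sensitivity, and false-positive numbers below are hypothetical, chosen only for illustration):

```python
# Bayes theorem on a diagnostic-test example (all numbers hypothetical).
p_disease = 0.01          # prior P(D)
p_pos_given_d = 0.95      # sensitivity P(+|D)
p_pos_given_not_d = 0.05  # false-positive rate P(+|~D)

# Theorem of total probability: P(+) = P(+|D)P(D) + P(+|~D)P(~D)
p_pos = p_pos_given_d * p_disease + p_pos_given_not_d * (1 - p_disease)

# Bayes theorem: P(D|+) = P(+|D)P(D) / P(+)
p_d_given_pos = p_pos_given_d * p_disease / p_pos
print(round(p_d_given_pos, 3))  # 0.161
```

Note how a rare prior dominates: even a fairly accurate test yields a posterior of only about 16%.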

Random Variables
- X: Ω → R
- Example: Experiment: tossing of two coins, scoring each outcome as 0 or 1. Random variable X: sum of the two scores.
- {X = 2} ≡ {ω : sum of scores = 2} = {(1, 1)}
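The definition of a random variable as a map X: Ω → R can be made concrete by enumerating this example's sample space (assuming, as the slide's event {(1, 1)} suggests, that tails scores 0 and heads scores 1):

```python
from itertools import product

# Sample space for tossing two coins, scoring tails as 0 and heads as 1
# (an assumed encoding consistent with the slide's {(1,1)} event).
omega = list(product([0, 1], repeat=2))  # [(0,0), (0,1), (1,0), (1,1)]

# Random variable X: Omega -> R, the sum of the two scores.
X = {w: sum(w) for w in omega}

# The event {X = 2} is the preimage {w : X(w) = 2}.
event = [w for w in omega if X[w] == 2]
print(event)  # [(1, 1)]
```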

Discrete Random Variables
- Probability mass function p(x) = P(X = x)

Example distributions: Discrete
- Bernoulli: x ∼ Ber(p), x ∈ {0, 1} ≡ p(x) = p^x (1 − p)^(1−x)
- Binomial: x ∼ Bin(n, p), x ∈ {0, …, n} ≡ p(x) = C(n, x) p^x (1 − p)^(n−x)
- Poisson: x ∼ Poisson(λ), x ∈ {0, 1, …} ≡ p(x) = e^(−λ) λ^x / x!
- Geometric: x ∼ Geo(p), x ∈ {1, 2, …} ≡ p(x) = (1 − p)^(x−1) p
- Empirical distribution: given D = {x_1, …, x_N}, p_emp(A) = (1/N) Σ_i δ_{x_i}(A), where δ_{x_i}(A) is the Dirac measure
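The pmfs above can be hand-rolled and sanity-checked (a pmf must sum to 1 over its support); n, p, λ below are arbitrary example values:

```python
from math import comb, exp, factorial

n, p, lam = 10, 0.3, 2.0  # example parameters

def binom_pmf(x):      # C(n, x) p^x (1-p)^(n-x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x):    # e^{-lam} lam^x / x!
    return exp(-lam) * lam**x / factorial(x)

def geom_pmf(x):       # (1-p)^{x-1} p, support {1, 2, ...}
    return (1 - p)**(x - 1) * p

# Each pmf sums to 1 (truncating the infinite supports far into the tail).
print(round(sum(binom_pmf(x) for x in range(n + 1)), 6))     # 1.0
print(round(sum(poisson_pmf(x) for x in range(100)), 6))     # 1.0
print(round(sum(geom_pmf(x) for x in range(1, 200)), 6))     # 1.0
```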

Continuous Random Variables
- Probability density function f(x), with P(a ≤ X ≤ b) = ∫_a^b f(x) dx

Example density functions
- Uniform: x ∼ U(a, b) ≡ f(x) = 1 / (b − a), a ≤ x ≤ b
- Exponential: x ∼ Exp(λ) ≡ f(x) = λ e^(−λx), x ≥ 0
- Standard Normal: x ∼ N(0, 1) ≡ f(x) = (1/√(2π)) e^(−x²/2)
- Gaussian: x ∼ N(μ, σ²) ≡ f(x) = (1/(√(2π) σ)) e^(−(x−μ)²/(2σ²))
- Laplace: x ∼ Lap(μ, b) ≡ f(x) = (1/(2b)) e^(−|x−μ|/b)
- Gamma: x ∼ Gam(α, β) ≡ f(x) = (β^α / Γ(α)) x^(α−1) e^(−βx)
- Beta: x ∼ Beta(α, β) ≡ f(x) = (Γ(α+β) / (Γ(α) Γ(β))) x^(α−1) (1−x)^(β−1)
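A density, unlike a pmf, is checked by integration rather than summation. A minimal sketch, using a crude Riemann sum on the Gaussian density with example parameters μ = 1, σ = 2:

```python
from math import exp, pi, sqrt

# Gaussian density f(x) = (1/(sqrt(2*pi)*sigma)) * exp(-(x-mu)^2 / (2*sigma^2))
mu, sigma = 1.0, 2.0  # example parameters

def gauss_pdf(x):
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)

# A density integrates to 1; approximate the integral over [mu-10s, mu+10s]
# with a Riemann sum (the tails beyond 10 sigma are negligible).
dx = 0.001
total = sum(gauss_pdf(mu - 10*sigma + i*dx) * dx
            for i in range(int(20 * sigma / dx)))
print(round(total, 3))  # 1.0
```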

Random Variables
- Cumulative distribution function F(x) = P(X ≤ x)

Moments
- Mean: E[X] = ∫ x f(x) dx
- Variance: Var(X) = E[(X − E[X])²]
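These two integrals can be evaluated numerically for a concrete density. A sketch for Exp(λ) with λ = 2, whose closed forms are E[X] = 1/λ and Var(X) = 1/λ²:

```python
from math import exp

lam = 2.0                         # example rate parameter
f = lambda x: lam * exp(-lam * x)  # Exp(lam) density

# E[X] = int x f(x) dx and Var(X) = int (x - E[X])^2 f(x) dx,
# approximated by Riemann sums over [0, 30/lam] (the tail is negligible).
dx = 1e-4
xs = [i * dx for i in range(int(30 / lam / dx))]
mean = sum(x * f(x) * dx for x in xs)
var = sum((x - mean)**2 * f(x) * dx for x in xs)
print(round(mean, 3), round(var, 3))  # 0.5 0.25
```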

Random Vectors and Joint Distributions
- Discrete random vector: joint pmf p(x_1, …, x_k)
- Continuous random vector: joint pdf f(x_1, …, x_k)

Example multivariate distributions
- Multivariate Gaussian: x ∼ N(μ, Σ) ≡ f(x) = (2π)^(−k/2) |Σ|^(−1/2) exp(−(1/2) (x−μ)^T Σ^(−1) (x−μ))
- Multinomial: x ∼ Mult(n; p_1, …, p_k) ≡ f(x_1, …, x_k) = (n! / (x_1! … x_k!)) p_1^(x_1) … p_k^(x_k)
- Dirichlet: x ∼ Dir(α_1, …, α_k) ≡ f(x_1, …, x_k) = (Γ(Σ_i α_i) / Π_i Γ(α_i)) Π_i x_i^(α_i − 1)
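These multivariate distributions are easy to explore by sampling (NumPy assumed available; μ, Σ, and the Dirichlet concentration parameters below are example values):

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample from N(mu, Sigma) and check the empirical moments.
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = rng.multivariate_normal(mu, Sigma, size=200_000)
print(np.round(x.mean(axis=0), 2))   # close to mu
print(np.round(np.cov(x.T), 1))      # close to Sigma

# Dirichlet samples live on the probability simplex: entries sum to 1.
d = rng.dirichlet([2.0, 3.0, 5.0], size=5)
print(np.allclose(d.sum(axis=1), 1.0))  # True
```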

Random Vectors and Joint Distributions
Given f(x_1, …, x_k):
- Marginal distribution: f_{X_1}(x_1) = ∫∫… f(x_1, …, x_k) dx_2 dx_3 …
- Expectation: E[X] = ∫∫… (x_1, …, x_k) f(x_1, …, x_k) dx_1 dx_2 …

Conditional Probability
- Conditional pmf / conditional pdf: given f_{X_1 X_2}(x_1, x_2), f_{X_1|X_2}(x_1 | x_2) = f_{X_1 X_2}(x_1, x_2) / f_{X_2}(x_2)
- Multiplication rule: f_{X_1 X_2}(x_1, x_2) = f_{X_1|X_2}(x_1 | x_2) f_{X_2}(x_2)
- Bayes rule: f_{X_1|X_2}(x_1 | x_2) = f_{X_2|X_1}(x_2 | x_1) f_{X_1}(x_1) / ∫ f_{X_2|X_1}(x_2 | x_1) f_{X_1}(x_1) dx_1

Conditional Probability
Given f_{X_1 X_2}(x_1, x_2):
- Conditional expectation: E[X_1 | x_2] = ∫ x_1 f_{X_1|X_2}(x_1 | x_2) dx_1
- Law of total expectation: E[X_1] = ∫ E[X_1 | x_2] f_{X_2}(x_2) dx_2
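The law of total expectation can be verified directly on a small discrete joint pmf (the table below is made up for illustration), computing E[X_1] once directly and once by averaging the conditional expectations:

```python
# Joint pmf p(x1, x2) over x1, x2 in {0, 1} (made-up example values).
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

# Marginal p(x2) and conditional expectation E[X1 | x2].
p_x2 = {x2: sum(p for (a, b), p in joint.items() if b == x2) for x2 in (0, 1)}
e_x1_given = {x2: sum(a * p for (a, b), p in joint.items() if b == x2) / p_x2[x2]
              for x2 in (0, 1)}

# E[X1] directly vs. via the tower rule E[X1] = sum_x2 E[X1|x2] p(x2).
direct = sum(a * p for (a, b), p in joint.items())
via_tower = sum(e_x1_given[x2] * p_x2[x2] for x2 in (0, 1))
print(round(direct, 10), round(via_tower, 10))  # 0.6 0.6
```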

Independence and Conditional Independence
- Independence: f_{X_1 X_2}(x_1, x_2) = f_{X_1}(x_1) f_{X_2}(x_2)
- Conditional independence: f_{X_1 X_2 | X_3}(x_1, x_2 | x_3) = f_{X_1|X_3}(x_1 | x_3) f_{X_2|X_3}(x_2 | x_3)

Covariance
- Covariance: Cov(X, Y) = E[(X − E[X])(Y − E[Y])]
- Correlation coefficient: ρ(X, Y) = Cov(X, Y) / (√Var(X) √Var(Y))
- Covariance matrix for a random vector X: Cov(X) = E[(X − E[X])(X − E[X])^T]
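The defining expectation for Cov(X, Y) can be estimated from samples (NumPy assumed; the linear dependence y = 0.8x + noise below is a made-up example whose true covariance is 0.8):

```python
import numpy as np

rng = np.random.default_rng(1)

# X ~ N(0,1), Y = 0.8 X + noise, so Cov(X, Y) = 0.8 * Var(X) = 0.8.
x = rng.normal(size=100_000)
y = 0.8 * x + rng.normal(scale=0.6, size=100_000)

# Cov(X, Y) = E[(X - E[X])(Y - E[Y])], estimated by a sample average.
cov_manual = np.mean((x - x.mean()) * (y - y.mean()))
rho = cov_manual / (x.std() * y.std())  # correlation coefficient

print(round(float(cov_manual), 2))  # close to 0.8
print(-1.0 <= rho <= 1.0)           # True: rho always lies in [-1, 1]
```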

Central Limit Theorem
- N i.i.d. random variables X_i with mean μ and variance σ²
- S_N = Σ_i X_i
- Z_N = (S_N − Nμ) / (σ√N)
- As N increases, the distribution of Z_N approaches the standard normal distribution
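A quick simulation sketch of the CLT (NumPy assumed): standardized sums of i.i.d. Uniform(0, 1) variables, which are nothing like Gaussian individually, behave like a standard normal.

```python
import numpy as np

rng = np.random.default_rng(2)

N = 100                              # summands per sum S_N
mu, sigma = 0.5, (1 / 12) ** 0.5     # mean and std of Uniform(0, 1)

# 20,000 replicates of S_N, each a sum of N i.i.d. uniforms.
S = rng.uniform(size=(20_000, N)).sum(axis=1)
Z = (S - N * mu) / (sigma * np.sqrt(N))  # Z_N = (S_N - N*mu) / (sigma*sqrt(N))

# Z should look standard normal: mean ~0, std ~1, P(Z <= 1.96) ~ 0.975.
print(round(float(Z.mean()), 2), round(float(Z.std()), 2))
print(round(float((Z <= 1.96).mean()), 2))
```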

Notions from Information Theory
- Entropy: H(X) = − Σ_k P(X = k) log₂ P(X = k)
- KL divergence: KL(p ‖ q) = Σ_k p(k) log (p(k) / q(k))
- Mutual information: I(X, Y) = KL(p(X, Y) ‖ p(X) p(Y)) = Σ_{x,y} p(x, y) log (p(x, y) / (p(x) p(y)))
- Pointwise mutual information: PMI(x, y) = log (p(x, y) / (p(x) p(y)))
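These quantities are one-liners for small discrete pmfs. The sketch below checks three standard facts: a fair coin carries exactly 1 bit of entropy, KL(p ‖ p) = 0, and mutual information vanishes for independent variables.

```python
from math import log2

def entropy(p):
    # H = -sum p_i log2 p_i (terms with p_i = 0 contribute nothing)
    return -sum(pi * log2(pi) for pi in p if pi > 0)

def kl(p, q):
    # KL(p || q) = sum p_i log2(p_i / q_i)
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

print(entropy([0.5, 0.5]))         # 1.0 bit for a fair coin
print(kl([0.9, 0.1], [0.9, 0.1]))  # 0.0: a distribution diverges 0 from itself

# I(X;Y) = KL(p(x,y) || p(x)p(y)); for independent X, Y it is 0.
joint = [0.25, 0.25, 0.25, 0.25]   # p(x, y), flattened 2x2, independent
indep = [0.5 * 0.5] * 4            # p(x) * p(y)
print(kl(joint, indep))            # 0.0
```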

Jensen's Inequality
- For a convex function f and a random variable X: f(E[X]) ≤ E[f(X)]
- Equality holds if f is linear
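For the convex function f(x) = x², Jensen's inequality is easy to see numerically: E[X²] − (E[X])² is exactly the (sample) variance, which is never negative.

```python
import random

# Jensen's inequality for the convex f(x) = x^2: f(E[X]) <= E[f(X)],
# with gap E[X^2] - (E[X])^2 = Var(X) >= 0.
random.seed(0)
xs = [random.gauss(0, 1) for _ in range(100_000)]

mean = sum(xs) / len(xs)
f_of_mean = mean ** 2                       # f(E[X])
mean_of_f = sum(x * x for x in xs) / len(xs)  # E[f(X)]

print(f_of_mean <= mean_of_f)  # True
```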