Of Probability & Information Theory

Presentation transcript:

Of Probability & Information Theory Alexander G. Ororbia II The Pennsylvania State University IST 597: Foundations of Deep Learning

Probability Mass Function Example: uniform distribution:
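For reference, the standard form for a discrete variable x with k equally likely states is P(\mathrm{x} = x_i) = \frac{1}{k} for every state x_i, so the probabilities sum to one: \sum_i P(\mathrm{x} = x_i) = \sum_i \frac{1}{k} = \frac{k}{k} = 1.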

Probability Density Function Example: uniform distribution:
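For reference, the standard form for a continuous variable uniform on the interval [a, b] is u(x; a, b) = \frac{1}{b - a} for x \in [a, b] and 0 elsewhere, so \int u(x; a, b)\, dx = 1.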

Computing Marginal Probability with the Sum Rule Summation → discrete random variables! Integration → continuous random variables!
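In symbols: P(\mathrm{x} = x) = \sum_{y} P(\mathrm{x} = x, \mathrm{y} = y) in the discrete case, and p(x) = \int p(x, y)\, dy in the continuous case.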

Conditional Probability In probability theory, conditional probability is a measure of the probability of an event given that (by assumption, presumption, assertion or evidence) another event has occurred
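In symbols: P(\mathrm{y} = y \mid \mathrm{x} = x) = \frac{P(\mathrm{y} = y, \mathrm{x} = x)}{P(\mathrm{x} = x)}, defined only when P(\mathrm{x} = x) > 0.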

Chain Rule of Probability In probability theory, the chain rule (also called the general product rule) permits the calculation of any member of the joint distribution of a set of random variables using only conditional probabilities. The rule is useful in the study of Bayesian networks, which describe a probability distribution in terms of conditional probabilities.
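In symbols: P(x^{(1)}, \ldots, x^{(n)}) = P(x^{(1)}) \prod_{i=2}^{n} P(x^{(i)} \mid x^{(1)}, \ldots, x^{(i-1)}). For example, P(a, b, c) = P(a \mid b, c)\, P(b \mid c)\, P(c).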

Independence In probability theory, two events are independent, statistically independent, or stochastically independent if the occurrence of one does not affect the probability of occurrence of the other. Similarly, two random variables are independent if the realization of one does not affect the probability distribution of the other.
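In symbols, x \perp y if and only if p(\mathrm{x} = x, \mathrm{y} = y) = p(\mathrm{x} = x)\, p(\mathrm{y} = y) for all x and y.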

Conditional Independence In probability theory, two events R and B are conditionally independent given a third event Y precisely if the occurrence of R and the occurrence of B are independent events in their conditional probability distribution given Y. In other words, R and B are conditionally independent given Y if and only if, given knowledge that Y occurs, knowledge of whether R occurs provides no information on the likelihood of B occurring, and knowledge of whether B occurs provides no information on the likelihood of R occurring.
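In symbols, x \perp y \mid z if and only if p(\mathrm{x} = x, \mathrm{y} = y \mid \mathrm{z} = z) = p(\mathrm{x} = x \mid \mathrm{z} = z)\, p(\mathrm{y} = y \mid \mathrm{z} = z) for all x, y, and z.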

Linearity of expectations:
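For reference, the expectation of f(x) under P is E_{x \sim P}[f(x)] = \sum_x P(x)\, f(x) in the discrete case (an integral against p(x) in the continuous case), and it is linear: E_{x \sim P}[\alpha f(x) + \beta g(x)] = \alpha\, E_{x \sim P}[f(x)] + \beta\, E_{x \sim P}[g(x)] for constants \alpha and \beta.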

Variance and Covariance In probability theory and statistics, covariance is a measure of the joint variability of two random variables. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the lesser values, i.e., the variables tend to show similar behavior, the covariance is positive. In the opposite case, when the greater values of one variable mainly correspond to the lesser values of the other, i.e., the variables tend to show opposite behavior, the covariance is negative. The sign of the covariance therefore shows the tendency in the linear relationship between the variables. Covariance matrix:
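For a random vector \mathbf{x} \in \mathbb{R}^n, the covariance matrix has entries \mathrm{Cov}(\mathbf{x})_{i,j} = \mathrm{Cov}(x_i, x_j), with the variances \mathrm{Var}(x_i) on its diagonal. The underlying scalar definitions are \mathrm{Var}(f(x)) = E\big[(f(x) - E[f(x)])^2\big] and \mathrm{Cov}(f(x), g(y)) = E\big[(f(x) - E[f(x)])\,(g(y) - E[g(y)])\big].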

Bernoulli Distribution Can prove/derive each of these properties!
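With parameter \phi \in [0, 1], the standard properties are: P(\mathrm{x} = 1) = \phi, P(\mathrm{x} = 0) = 1 - \phi, P(\mathrm{x} = x) = \phi^{x} (1 - \phi)^{1 - x}, E[\mathrm{x}] = \phi, and \mathrm{Var}(\mathrm{x}) = \phi (1 - \phi).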

Gaussian Distribution Parametrized by variance: Parametrized by precision:
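By variance \sigma^2: N(x; \mu, \sigma^2) = \sqrt{\frac{1}{2 \pi \sigma^2}}\, \exp\!\left(-\frac{(x - \mu)^2}{2 \sigma^2}\right). By precision \beta = 1 / \sigma^2: N(x; \mu, \beta^{-1}) = \sqrt{\frac{\beta}{2 \pi}}\, \exp\!\left(-\frac{\beta (x - \mu)^2}{2}\right).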

Gaussian Distribution Figure 3.1

Multivariate Gaussian Parametrized by covariance matrix: Parametrized by precision matrix:
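By covariance matrix \Sigma: N(\mathbf{x}; \boldsymbol{\mu}, \Sigma) = \sqrt{\frac{1}{(2 \pi)^n \det(\Sigma)}}\, \exp\!\left(-\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right). By precision matrix \beta = \Sigma^{-1}: N(\mathbf{x}; \boldsymbol{\mu}, \beta^{-1}) = \sqrt{\frac{\det(\beta)}{(2 \pi)^n}}\, \exp\!\left(-\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^\top \beta\, (\mathbf{x} - \boldsymbol{\mu})\right).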

More Distributions Exponential: Laplace: Dirac: The density of an idealized point mass or point charge, as a function that is equal to zero everywhere except at zero and whose integral over the entire real line is equal to one.
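For reference: Exponential: p(x; \lambda) = \lambda\, \mathbf{1}_{x \geq 0} \exp(-\lambda x). Laplace: \mathrm{Laplace}(x; \mu, \gamma) = \frac{1}{2 \gamma} \exp\!\left(-\frac{|x - \mu|}{\gamma}\right). Dirac: p(x) = \delta(x - \mu), which places all probability mass at the single point x = \mu.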

Laplace Distribution

Empirical Distribution An empirical distribution function is the distribution function associated with the empirical measure of a sample.
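For a sample x^{(1)}, \ldots, x^{(m)}, the empirical distribution is \hat{p}(\mathbf{x}) = \frac{1}{m} \sum_{i=1}^{m} \delta(\mathbf{x} - \mathbf{x}^{(i)}) in the continuous case; in the discrete case it simply assigns each value a probability equal to its relative frequency in the sample.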

Mixture Distributions In probability and statistics, a mixture distribution is the probability distribution of a random variable that is derived from a collection of other random variables as follows: first, a random variable is selected by chance from the collection according to given probabilities of selection, and then the value of the selected random variable is realized. Gaussian mixture with three components Figure 3.2
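With a latent component identity c, the mixture is P(\mathrm{x}) = \sum_i P(\mathrm{c} = i)\, P(\mathrm{x} \mid \mathrm{c} = i); a Gaussian mixture uses Gaussian components P(\mathrm{x} \mid \mathrm{c} = i) = N(\mathrm{x}; \boldsymbol{\mu}^{(i)}, \Sigma^{(i)}). Below is a minimal sketch of ancestral sampling from a three-component Gaussian mixture, assuming NumPy; the weights, means, and standard deviations are illustrative values, not taken from the slides:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical mixture parameters: mixing weights P(c = i), component means,
    # and component standard deviations (illustrative values only).
    weights = np.array([0.5, 0.3, 0.2])
    means = np.array([-2.0, 0.0, 3.0])
    stds = np.array([0.5, 1.0, 0.8])

    def sample_mixture(n):
        # Ancestral sampling: draw the component identity c ~ P(c) first,
        # then draw x | c from the selected Gaussian component.
        c = rng.choice(len(weights), size=n, p=weights)
        return rng.normal(loc=means[c], scale=stds[c])

    samples = sample_mixture(10000)
    print(samples.mean(), samples.std())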

Logistic Sigmoid Commonly used to parametrize Bernoulli distributions
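\sigma(x) = \frac{1}{1 + \exp(-x)}. Its output lies in (0, 1), so \sigma(x) can be used directly as the Bernoulli parameter \phi; a useful identity is 1 - \sigma(x) = \sigma(-x).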

Softplus Function
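\zeta(x) = \log(1 + \exp(x)). It is a smoothed version of the positive-part function x^{+} = \max(0, x), its range is (0, \infty), and its derivative is the logistic sigmoid: \frac{d}{dx} \zeta(x) = \sigma(x).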

Bayes’ Rule Fundamental to statistical learning!! Memorize this rule!
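P(\mathrm{x} \mid \mathrm{y}) = \frac{P(\mathrm{x})\, P(\mathrm{y} \mid \mathrm{x})}{P(\mathrm{y})}, where the denominator can be computed as P(\mathrm{y}) = \sum_{x} P(\mathrm{y} \mid x)\, P(x), so knowing P(\mathrm{y}) separately is usually unnecessary.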

In English Please? What does Bayes’ formula help us to find? Helps us to find: the posterior P(x | y). By having already known: the likelihood P(y | x), the prior P(x), and the evidence P(y).
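As a purely hypothetical numerical illustration (the numbers are invented for this example): suppose a condition has prior P(\text{disease}) = 0.01 and a test has P(\text{positive} \mid \text{disease}) = 0.9 and P(\text{positive} \mid \text{no disease}) = 0.05. Then P(\text{disease} \mid \text{positive}) = \frac{0.9 \times 0.01}{0.9 \times 0.01 + 0.05 \times 0.99} = \frac{0.009}{0.0585} \approx 0.15; the posterior stays small because the prior is small.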

Why do we care? Deep generative models!

Sparse Coding

(Probabilistic) Sparse Coding

Preview: Information Theory Entropy: KL divergence:
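In symbols: H(\mathrm{x}) = -E_{x \sim P}[\log P(x)], and D_{\mathrm{KL}}(P \,\|\, Q) = E_{x \sim P}\!\left[\log \frac{P(x)}{Q(x)}\right] = E_{x \sim P}[\log P(x) - \log Q(x)].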

The KL Divergence is Asymmetric Mean-seeking! Mode-seeking! Figure 3.6
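In general D_{\mathrm{KL}}(P \,\|\, Q) \neq D_{\mathrm{KL}}(Q \,\|\, P), so the direction matters: fitting an approximation q to p by minimizing D_{\mathrm{KL}}(p \,\|\, q) tends to be mean-seeking (q spreads out to cover all of p's modes), while minimizing D_{\mathrm{KL}}(q \,\|\, p) tends to be mode-seeking (q concentrates on a single mode of p).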

Directed Model Figure 3.7
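In general, a directed model factorizes the joint distribution into one conditional per variable, given that variable's parents in the graph \mathcal{G}: p(\mathbf{x}) = \prod_i p(x_i \mid \mathrm{Pa}_{\mathcal{G}}(x_i)).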

Undirected Model Figure 3.8
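In general, an undirected model factorizes the joint distribution into non-negative clique potentials \phi^{(i)} over cliques \mathcal{C}^{(i)} of the graph, normalized by a partition function Z: p(\mathbf{x}) = \frac{1}{Z} \prod_i \phi^{(i)}\!\big(\mathcal{C}^{(i)}\big).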

References This presentation is a variation of Ian Goodfellow’s slides for Chapter 3 of Deep Learning (http://www.deeplearningbook.org/lecture_slides.html)