Machine Learning 10-701/15-781, Spring 2008 Tutorial on Basic Probability Eric Xing Lecture 2, September 15, 2006 Reading: Chap. 1&2, CB & Chap 5,6, TM.

Slides:



Advertisements
Similar presentations
Random Variables ECE460 Spring, 2012.
Advertisements

Random Variable A random variable X is a function that assign a real number, X(ζ), to each outcome ζ in the sample space of a random experiment. Domain.
Sampling Distributions (§ )
Continuous Random Variables. L. Wang, Department of Statistics University of South Carolina; Slide 2 Continuous Random Variable A continuous random variable.
Probability.
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 1 Basic Ideas in Probability and Statistics for Experimenters: Part I: Qualitative Discussion Shivkumar.
Probability Distributions Finite Random Variables.
Chapter 6 Continuous Random Variables and Probability Distributions
Probability and Statistics Review
1 Engineering Computation Part 5. 2 Some Concepts Previous to Probability RANDOM EXPERIMENT A random experiment or trial can be thought of as any activity.
CHAPTER 6 Statistical Analysis of Experimental Data
Probability Distributions Random Variables: Finite and Continuous A review MAT174, Spring 2004.
Probability Distributions Random Variables: Finite and Continuous Distribution Functions Expected value April 3 – 10, 2003.
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 6-1 Chapter 6 The Normal Distribution and Other Continuous Distributions.
Copyright © 2014 by McGraw-Hill Higher Education. All rights reserved.
QMS 6351 Statistics and Research Methods Probability and Probability distributions Chapter 4, page 161 Chapter 5 (5.1) Chapter 6 (6.2) Prof. Vera Adamchik.
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 6-1 Chapter 6 The Normal Distribution and Other Continuous Distributions.
Probability Distributions and Frequentist Statistics “A single death is a tragedy, a million deaths is a statistic” Joseph Stalin.
Lecture II-2: Probability Review
Continuous Probability Distributions A continuous random variable can assume any value in an interval on the real line or in a collection of intervals.
Chapter 21 Random Variables Discrete: Bernoulli, Binomial, Geometric, Poisson Continuous: Uniform, Exponential, Gamma, Normal Expectation & Variance, Joint.
Continuous Probability Distribution  A continuous random variables (RV) has infinitely many possible outcomes  Probability is conveyed for a range of.
Chapter 4 Continuous Random Variables and Probability Distributions
Recitation 1 Probability Review
Prof. SankarReview of Random Process1 Probability Sample Space (S) –Collection of all possible outcomes of a random experiment Sample Point –Each outcome.
Advanced Algorithms and Models for Computational Biology -- a machine learning approach Introduction to cell biology, genomics, development, and probability.
Tch-prob1 Chap 3. Random Variables The outcome of a random experiment need not be a number. However, we are usually interested in some measurement or numeric.
Dept of Bioenvironmental Systems Engineering National Taiwan University Lab for Remote Sensing Hydrology and Spatial Modeling STATISTICS Random Variables.
TELECOMMUNICATIONS Dr. Hugh Blanton ENTC 4307/ENTC 5307.
Random Variables & Probability Distributions Outcomes of experiments are, in part, random E.g. Let X 7 be the gender of the 7 th randomly selected student.
PROBABILITY & STATISTICAL INFERENCE LECTURE 3 MSc in Computing (Data Analytics)
Statistics for Engineer Week II and Week III: Random Variables and Probability Distribution.
Dr. Gary Blau, Sean HanMonday, Aug 13, 2007 Statistical Design of Experiments SECTION I Probability Theory Review.
Theory of Probability Statistics for Business and Economics.
QBM117 Business Statistics Probability and Probability Distributions Continuous Probability Distributions 1.
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 6-1 Chapter 6 The Normal Distribution and Other Continuous Distributions.
1 Lecture 4. 2 Random Variables (Discrete) Real-valued functions defined on a sample space are random vars. determined by outcome of experiment, we can.
Lecture 2 Review Probabilities Probability Distributions Normal probability distributions Sampling distributions and estimation.
STA347 - week 31 Random Variables Example: We roll a fair die 6 times. Suppose we are interested in the number of 5’s in the 6 rolls. Let X = number of.
Risk Analysis & Modelling Lecture 2: Measuring Risk.
Lecture V Probability theory. Lecture questions Classical definition of probability Frequency probability Discrete variable and probability distribution.
Stats Probability Theory Summary. The sample Space, S The sample space, S, for a random phenomena is the set of all possible outcomes.
Probability Review-1 Probability Review. Probability Review-2 Probability Theory Mathematical description of relationships or occurrences that cannot.
Random Variable The outcome of an experiment need not be a number, for example, the outcome when a coin is tossed can be 'heads' or 'tails'. However, we.
CY1B2 Statistics1 (ii) Poisson distribution The Poisson distribution resembles the binomial distribution if the probability of an accident is very small.
Probability (outcome k) = Relative Frequency of k
Review of Chapter
Review of Probability. Important Topics 1 Random Variables and Probability Distributions 2 Expected Values, Mean, and Variance 3 Two Random Variables.
CONTINUOUS RANDOM VARIABLES
1 Probability and Statistical Inference (9th Edition) Chapter 5 (Part 2/2) Distributions of Functions of Random Variables November 25, 2015.
Probability and Distributions. Deterministic vs. Random Processes In deterministic processes, the outcome can be predicted exactly in advance Eg. Force.
Random Variables. Numerical Outcomes Consider associating a numerical value with each sample point in a sample space. (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc.. Chap 6-1 Chapter 6 The Normal Distribution and Other Continuous Distributions Basic Business.
ELEC 303, Koushanfar, Fall’09 ELEC 303 – Random Signals Lecture 8 – Continuous Random Variables: PDF and CDFs Farinaz Koushanfar ECE Dept., Rice University.
1 Review of Probability and Random Processes. 2 Importance of Random Processes Random variables and processes talk about quantities and signals which.
Probability & Statistics Review I 1. Normal Distribution 2. Sampling Distribution 3. Inference - Confidence Interval.
Random Variables By: 1.
Evaluating Hypotheses. Outline Empirically evaluating the accuracy of hypotheses is fundamental to machine learning – How well does this estimate accuracy.
4. Overview of Probability Network Performance and Quality of Service.
MECH 373 Instrumentation and Measurements
Probability and Information Theory
STAT 311 REVIEW (Quick & Dirty)
STATISTICS Random Variables and Distribution Functions
Where are we in CS 440? Now leaving: sequential, deterministic reasoning Entering: probabilistic reasoning and machine learning.
CONTINUOUS RANDOM VARIABLES
Probability Review for Financial Engineers
CHAPTER – 1.2 UNCERTAINTIES IN MEASUREMENTS.
M248: Analyzing data Block A UNIT A3 Modeling Variation.
Continuous Random Variables: Basics
CHAPTER – 1.2 UNCERTAINTIES IN MEASUREMENTS.
Presentation transcript:

Machine Learning /15-781, Spring 2008 Tutorial on Basic Probability Eric Xing Lecture 2, September 15, 2006 Reading: Chap. 1&2, CB & Chap 5,6, TM

What is this? Classical AI and ML research ignored this phenomena Another example you want to catch a flight at 10:00am from Pitt to SF, can I make it if I leave at 8am and take a 28X at CMU? partial observability (road state, other drivers' plans, etc.) noisy sensors (radio traffic reports) uncertainty in action outcomes (flat tire, etc.) immense complexity of modeling and predicting traffic

Basic Probability Concepts A sample space S is the set of all possible outcomes of a conceptual or physical, repeatable experiment. ( S can be finite or infinite.) E.g., S may be the set of all possible outcomes of a dice roll: E.g., S may be the set of all possible nucleotides of a DNA site: E.g., S may be the set of all possible time-space positions of a aircraft on a radar screen: An event A is any subset of S : Seeing "1" or "6" in a dice roll; observing a "G" at a site; UA007 in space-time interval An event space E is the possible worlds the outcomes can happen All dice-rolls, reading a genome, monitoring the radar signal

Probability A probability P(A) is a function that maps an event A onto the interval [0, 1]. P(A) is also called the probability measure or probability mass of A. Worlds in which A is true Worlds in which A is false P(a) is the area of the oval Sample space of all possible worlds. Its area is 1

Kolmogorov Axioms All probabilities are between 0 and 1 0 ≤ P(A) ≤ 1 P( E ) = 1 P(Φ)=0 The probability of a disjunction is given by P(A ∨ B) = P(A) + P(B) − P(A ∧ B) A B A∧BA∧B ¬A ∧ ¬B A ∨ B ?

Why use probability? There have been attempts to develop different methodologies for uncertainty: Fuzzy logic Qualitative reasoning (Qualitative physics) … “Probability theory is nothing but common sense reduced to calculation” — Pierre Laplace, In 1931, de Finetti proved that it is irrational to have beliefs that violate these axioms, in the following sense: If you bet in accordance with your beliefs, but your beliefs violate the axioms, then you can be guaranteed to lose money to an opponent whose beliefs more accurately reflect the true state of the world. (Here, “betting” and “money” are proxies for “decision making” and “utilities”.) What if you refuse to bet? This is like refusing to allow time to pass: every action (including inaction) is a bet

Random Variable A random variable is a function that associates a unique numerical value (a token) with every outcome of an experiment. (The value of the r.v. will vary from trial to trial as the experiment is repeated) Discrete r.v.: The outcome of a dice-roll The outcome of reading a nt at site i: Binary event and indicator variable: Seeing an "A" at a site  X =1, o/w X =0. This describes the true or false outcome a random event. Can we describe richer outcomes in the same way? (i.e., X=1, 2, 3, 4, for being A, C, G, T) --- think about what would happen if we take expectation of X. Continuous r.v.: The outcome of recording the true location of an aircraft: The outcome of observing the measured location of an aircraft  S X()X()

Discrete Prob. Distribution A probability distribution P defined on a discrete sample space S is an assignment of a non-negative real number P( s ) to each sample s  S such that  s  S P( s )=1. (0  P( s )  1) intuitively, P( s ) corresponds to the frequency (or the likelihood) of getting a particular sample s in the experiments, if repeated multiple times. call  s = P( s ) the parameters in a discrete probability distribution A discrete probability distribution is sometimes called a probability model, in particular if several different distributions are under consideration write models as M 1, M 2, probabilities as P( X | M 1 ), P( X | M 2 ) e.g., M 1 may be the appropriate prob. dist. if X is from "fair dice", M 2 is for the "loaded dice". M is usually a two-tuple of {dist. family, dist. parameters}

Bernoulli distribution: Ber( p ) Multinomial distribution: Mult(1,  ) Multinomial (indicator) variable: Discrete Distributions

Multinomial distribution: Mult( n,  ) Count variable: Discrete Distributions

Continuous Prob. Distribution A continuous random variable X is defined on a continuous sample space: an interval on the real line, a region in a high dimensional space, etc. X usually corresponds to a real-valued measurements of some property, e.g., length, position, … It is meaningless to talk about the probability of the random variable assuming a particular value --- P( x ) = 0 Instead, we talk about the probability of the random variable assuming a value within a given interval, or half interval, or arbitrary Boolean combination of basic propositions.

Probability Density If the prob. of x falling into [x, x+dx] is given by p(x)dx for dx, then p(x) is called the probability density over x. If the probability P(x) is differentiable, then the probability density over x is the derivative of P(x). The probability of the random variable assuming a value within some given interval from x 1 to x 2 is equivalent to the area under the graph of the probability density function between x 1 and x 2. Probability mass: note that Cumulative distribution function (CDF): Probability density function (PDF): Car flow on Liberty Bridge (cooked up!)

The intuitive meaning of p(x) If p(x 1 ) = a and p(x 2 ) = b, then when a value X is sampled from the distribution with density p(x), you are a/b times as likely to find that X is “very close to” x 1 than that X is “very close to” x 2. That is:

Uniform Density Function Normal (Gaussian) Density Function The distribution is symmetric, and is often illustrated as a bell-shaped curve. Two parameters,  (mean) and  (standard deviation), determine the location and shape of the distribution. The highest point on the normal curve is at the mean, which is also the median and mode. Exponential Distribution Continuous Distributions

Gaussian (Normal) density in 1D If X ∼ N(µ, σ 2 ), the probability density function (pdf) of X is defined as We will often use the precision λ = 1/σ 2 instead of the variance σ 2. Here is how we plot the pdf in matlab xs=-3:0.01:3; plot(xs,normpdf(xs,mu,sigma)); Note that a density evaluated at a point can be larger than 1.

Gaussian CDF If Z ∼ N(0, 1), the cumulative density function is defined as This has no closed form expression, but is built in to most software packages (eg. normcdf in matlab stats toolbox).

More on Gaussian Distribution If X ∼ N(µ, σ 2 ), then Z = (X − µ)/σ ∼ N(0, 1). How much mass is contained inside the [-2σ,2σ] interval? Since p(Z ≤ −2) = normcdf(−2) = we have P(−2σ < X−µ < 2σ) ≈ 1 − 2 × = 0.95

Expectation: the centre of mass, mean value, first moment): Sample mean: Variance: the spreadness: Sample variance Statistical Characterizations

Central limit theorem If (X 1,X 2, … X n ) are i.i.d. continuous random variables Define As n  infinity,  Gaussian with mean E[X i ] and variance Var[X i ] Somewhat of a justification for assuming Gaussian noise is common