Visual Recognition Tutorial


236607 Visual Recognition Tutorial
Tutorial 2 – the outline

Example 1: linear algebra
Conditional probability
Example 2: Bernoulli distribution
Bayes' rule
Example 3
Example 4: the game of three doors

Example – 1: Linear Algebra
A line can be written as ax + by = 1. You are given a number of example points (x1, y1), …, (xN, yN). Let P be the N×2 matrix whose n-th row is (xn, yn), and let M = (a, b)^T.
(A) Write a single matrix equation that expresses the constraint that each of these points lies on a single line.
(B) Is it always the case that some M exists?
(C) Write an expression for M assuming it does exist.

Example – 1: Linear Algebra
(A) For all the points to lie on the line, a and b must satisfy the following set of simultaneous equations:
a x1 + b y1 = 1
⋮
a xN + b yN = 1
This can be written much more compactly in matrix form (the linear regression equation): PM = 1, where 1 is an N×1 column vector of 1's.
(B) An M satisfying the matrix equation in part (A) will not exist unless all the points are collinear (i.e., fall on the same line). In general, three or more points need not be collinear.
(C) If M exists, then we can find it via a left inverse of P; but since P is in general not a square matrix, P^{-1} may not exist, so we need the pseudo-inverse (P^T P)^{-1} P^T. Thus M = (P^T P)^{-1} P^T 1.
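A minimal numpy sketch of part (C); the three example points are made up for illustration and chosen to be collinear (they lie on x + 2y = 1):

    import numpy as np

    # Hypothetical example points, chosen to lie on the line x + 2y = 1
    P = np.array([[ 1.0,  0.0],
                  [ 3.0, -1.0],
                  [-1.0,  1.0]])         # N x 2 matrix whose rows are (x_n, y_n)
    ones = np.ones((P.shape[0], 1))      # N x 1 column vector of 1's

    # Pseudo-inverse solution M = (P^T P)^{-1} P^T 1
    M = np.linalg.inv(P.T @ P) @ P.T @ ones
    print(M.ravel())                     # -> approximately [1. 2.], i.e. a = 1, b = 2

In practice np.linalg.lstsq(P, ones) computes the same least-squares solution more stably.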

Conditional probability
When two variables are statistically dependent, knowing the value of one of them lets us get a better estimate of the value of the other. This is expressed by the conditional probability of x given y:
P(x | y) = P(x, y) / P(y)   (defined when P(y) > 0).
If x and y are statistically independent, then P(x, y) = P(x) P(y), and therefore P(x | y) = P(x).
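A small sketch with a made-up 2×2 joint distribution, just to show the definition in action; the numbers are arbitrary:

    import numpy as np

    # Hypothetical joint distribution P(x, y): rows indexed by x, columns by y
    P_xy = np.array([[0.1, 0.3],
                     [0.2, 0.4]])

    P_y = P_xy.sum(axis=0)        # marginal P(y)
    P_x_given_y = P_xy / P_y      # P(x|y) = P(x, y) / P(y), one column per value of y

    print(P_x_given_y)            # each column sums to 1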

Bayes' Rule
The law of total probability: if event A can occur in m different, mutually exclusive ways A1, A2, …, Am, then the probability of A occurring is the sum of the probabilities of A1, A2, …, Am:
P(A) = P(A1) + P(A2) + … + P(Am).
From the definition of conditional probability,
P(A, B) = P(A | B) P(B) = P(B | A) P(A).

Bayes' Rule
P(B | A) = P(A | B) P(B) / P(A),
or, expanding the denominator with the law of total probability over mutually exclusive and exhaustive events B1, …, Bm,
P(Bi | A) = P(A | Bi) P(Bi) / Σj P(A | Bj) P(Bj).
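A small numeric sketch applying the two forms above; the diagnostic-test numbers are invented for illustration:

    # Hypothetical priors and likelihoods
    P_B = [0.01, 0.99]            # P(disease), P(no disease)
    P_A_given_B = [0.95, 0.05]    # P(positive | disease), P(positive | no disease)

    # Law of total probability: P(positive test)
    P_A = sum(l * p for l, p in zip(P_A_given_B, P_B))

    # Bayes' rule: P(disease | positive test)
    posterior = P_A_given_B[0] * P_B[0] / P_A
    print(P_A, posterior)         # ~0.059 and ~0.161

Note how the posterior (about 16%) is far below the 95% test sensitivity because the prior is small.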

Bayes' rule – continuous case
For continuous random variables we refer to densities rather than probabilities; in particular,
p(x | y) = p(x, y) / p(y).
The Bayes' rule for densities becomes:
p(x | y) = p(y | x) p(x) / p(y) = p(y | x) p(x) / ∫ p(y | x') p(x') dx'.

Bayes' formula – importance
Call x a 'cause' and y an 'effect'. Assuming the cause x is present, we know the likelihood p(y | x) of observing y.
Bayes' rule allows us to determine the likelihood of a cause x given an observation y. (Note that there may be many causes producing y.)
Bayes' rule shows how the probability of x changes from the prior p(x), before we observe anything, to the posterior p(x | y), once we have observed y.

Example – 2: Bernoulli Distribution
A random variable X has a Bernoulli distribution with parameter q if it can assume the value 1 with probability q and the value 0 with probability (1 − q). The random variable X is also known as a Bernoulli variable with parameter q and has the following probability mass function:
P(X = x) = q^x (1 − q)^(1−x),  x ∈ {0, 1}.
The mean of a random variable X that has a Bernoulli distribution with parameter q is
E(X) = 1·q + 0·(1 − q) = q.
The variance of X is
Var(X) = E(X²) − (E(X))² = q − q² = q(1 − q).
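A quick simulation sketch confirming these moments; q and the sample size are chosen arbitrarily:

    import numpy as np

    rng = np.random.default_rng(0)
    q = 0.3
    x = rng.binomial(1, q, size=100_000)   # Bernoulli(q) samples

    print(x.mean(), x.var())               # close to q = 0.3 and q(1-q) = 0.21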

Example – 2: Bernoulli Distribution
A random variable whose value represents the outcome of a coin toss (1 for heads, 0 for tails, or vice versa) is a Bernoulli variable with parameter q, where q is the probability that the outcome corresponding to the value 1 occurs. For an unbiased coin, where heads and tails are equally likely, q = 0.5.
For a Bernoulli random variable xn the probability mass function is:
P(xn) = q^(xn) (1 − q)^(1−xn),  xn ∈ {0, 1}.
For N independent Bernoulli trials we have the random sample x = (x1, …, xN).

Example – 2: Bernoulli Distribution
The distribution of the random sample is:
P(x) = Πn q^(xn) (1 − q)^(1−xn) = q^k (1 − q)^(N−k),  where k = Σn xn is the number of ones in the sample.
The distribution of the number of ones in N independent Bernoulli trials is binomial:
P(k) = C(N, k) q^k (1 − q)^(N−k).
The joint probability to observe the sample x and the number k is therefore
P(x, k) = q^k (1 − q)^(N−k)  if x contains exactly k ones, and 0 otherwise.

Example – 2: Bernoulli Distribution
The conditional probability of x given the number k of ones:
P(x | k) = P(x, k) / P(k) = q^k (1 − q)^(N−k) / [C(N, k) q^k (1 − q)^(N−k)] = 1 / C(N, k).
Thus, given k, every sample with exactly k ones is equally likely; the conditional distribution does not depend on q, so the count k carries all the information in the sample about q.
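A small sketch checking P(x | k) = 1/C(N, k) by enumeration; N, q and k are arbitrary illustration values:

    from itertools import product
    from math import comb

    N, q, k = 4, 0.3, 2

    def p_seq(x):
        # P(x) = q^(#ones) (1-q)^(#zeros) for a specific sequence x
        ones = sum(x)
        return q**ones * (1 - q)**(N - ones)

    # Binomial probability of observing exactly k ones
    p_k = comb(N, k) * q**k * (1 - q)**(N - k)

    for x in product([0, 1], repeat=N):
        if sum(x) == k:
            print(x, p_seq(x) / p_k)       # every line prints 1/C(4,2) = 1/6 ≈ 0.1667

The ratio does not involve q, which is exactly the sufficiency statement above.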

Example 3
Assume that X is distributed according to the Gaussian density with mean μ = 0 and variance σ² = 1.
(A) What is the probability that x = 0?
Assume that Y is distributed according to the Gaussian density with mean μ = 1 and variance σ² = 1.
(B) What is the probability that y = 0?
Given the distribution Pr(Z = z) = ½ Pr(X = z) + ½ Pr(Y = z), known as a mixture (i.e., half of the data points are generated by the X process and half by the Y process):
(C) If Z = 0, what is the probability that the X process generated this data point?

Example 3 – solutions
(A) Since p(x) is a continuous density, the probability that x = 0 is zero (any single point has probability zero under a density).
(B) As in part (A), the probability that y = 0 is zero.
(C) Let w0 (w1) be the state where the X (Y) process generates a data point. We want Pr(w0 | Z = 0). Using Bayes' rule and working with the probability densities, with the total probability in the denominator:
Pr(w0 | Z = 0) = p(Z = 0 | w0) Pr(w0) / [ p(Z = 0 | w0) Pr(w0) + p(Z = 0 | w1) Pr(w1) ],

Example 3 – continued
where Pr(w0) = Pr(w1) = ½ and
p(Z = 0 | w0) = (1/√(2π)) e^(−0²/2) = 1/√(2π),
p(Z = 0 | w1) = (1/√(2π)) e^(−(0−1)²/2) = (1/√(2π)) e^(−1/2).
Substituting,
Pr(w0 | Z = 0) = 1 / (1 + e^(−1/2)) ≈ 0.62.
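A minimal sketch of the same computation with explicit Gaussian densities:

    import numpy as np

    def gauss_pdf(z, mean, var):
        return np.exp(-(z - mean)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

    z, prior = 0.0, 0.5
    num = gauss_pdf(z, 0.0, 1.0) * prior              # X process, w0
    den = num + gauss_pdf(z, 1.0, 1.0) * prior        # total probability for Z = 0
    print(num / den)                                  # ~0.622 = 1 / (1 + exp(-1/2))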

Example 4: The game of three doors
A game: there are 3 doors, and there is a prize behind one of them. You have to select one door. Then one of the other two doors is opened (not revealing the prize). At this point you may either stick with your door or switch to the other (still closed) door. Should you stick with your initial choice or switch, and does the choice make any difference at all?

The game of three doors
Let Hi denote the hypothesis "the prize is behind door i".
Assumption: the prize is placed uniformly at random, so Pr(H1) = Pr(H2) = Pr(H3) = 1/3.
Suppose, w.l.o.g., that the initial choice is door 1 and that door 3 is then opened. We can stick with 1 or switch to 2. Let D denote the door which is opened by the host. We assume the host never opens the chosen door or the door with the prize, and chooses at random between the two remaining doors when both are allowed:
Pr(D = 3 | H1) = 1/2,  Pr(D = 3 | H2) = 1,  Pr(D = 3 | H3) = 0.
By Bayes' formula:
Pr(Hi | D = 3) = Pr(D = 3 | Hi) Pr(Hi) / Pr(D = 3).

The game of three doors – the solution
The denominator is a normalizing factor:
Pr(D = 3) = Pr(D = 3 | H1) Pr(H1) + Pr(D = 3 | H2) Pr(H2) + Pr(D = 3 | H3) Pr(H3) = 1/2 · 1/3 + 1 · 1/3 + 0 · 1/3 = 1/2.
So we get
Pr(H1 | D = 3) = (1/2 · 1/3) / (1/2) = 1/3,  Pr(H2 | D = 3) = (1 · 1/3) / (1/2) = 2/3,
which means that we are more likely to win the prize if we switch to door 2.
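A quick Monte Carlo sketch of the game under the assumptions above; the trial count is arbitrary:

    import random

    def play(switch, trials=100_000):
        wins = 0
        for _ in range(trials):
            prize = random.randint(1, 3)
            choice = 1                                 # w.l.o.g. we always pick door 1
            # The host opens a door that is neither our choice nor the prize door
            opened = random.choice([d for d in (1, 2, 3) if d != choice and d != prize])
            if switch:
                choice = next(d for d in (1, 2, 3) if d != choice and d != opened)
            wins += (choice == prize)
        return wins / trials

    print(play(switch=False), play(switch=True))       # ~0.33 vs. ~0.67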

Further complication
A violent earthquake occurs just before the host has opened one of the doors; door 3 is opened accidentally, and there is no prize behind it. The host says "it is a valid door, let's let it stay open and go on with the game". What should we do now?
First, any number of doors might have been opened by the earthquake. There are 8 possible outcomes, which we assume to be equiprobable: d = (0,0,0), …, d = (1,1,1). A value of D now consists of the outcome of the quake and the visibility of the prize; e.g., <(0,0,1), NO>.
We have to compare Pr(H1 | D) vs. Pr(H2 | D).

The earthquake – continued
Pr(D | Hi) is hard to estimate, but we know that Pr(D | H3) = 0. Also, from
Pr(H3, D) = Pr(H3 | D) Pr(D) = Pr(D | H3) Pr(H3)
and from Pr(D) > 0, we have Pr(H3 | D) = 0.
Further, we have to assume that Pr(D | H1) = Pr(D | H2) (we don't know the values, but we assume they are equal). Now we have
Pr(H1 | D) = Pr(H2 | D) = 1/2.

The earthquake – continued (why are they ½?)
Because Pr(D | H1) = Pr(D | H2) and Pr(H1) = Pr(H2) = 1/3, Bayes' formula gives Pr(H1 | D) = Pr(H2 | D). Also, the posteriors must sum to one and Pr(H3 | D) = 0, so we get Pr(H1 | D) = Pr(H2 | D) = 1/2.
So, we might just as well stick with our choice. We get a different outcome than before because the data here is of a different nature (although it looks the same).
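A Monte Carlo sketch of the earthquake variant, under the assumptions above (the quake outcome is independent of the prize location, the 8 outcomes are equiprobable, and we condition on observing exactly door 3 open with no prize visible); the trial count is arbitrary:

    import random
    from itertools import product

    def earthquake_game(trials=200_000):
        stick = switch = kept = 0
        outcomes = list(product([0, 1], repeat=3))     # the 8 equiprobable quake outcomes
        for _ in range(trials):
            prize = random.randint(1, 3)
            quake = random.choice(outcomes)
            # Condition on the observed data: only door 3 is open and no prize is visible
            if quake != (0, 0, 1) or prize == 3:
                continue
            kept += 1
            stick += (prize == 1)                      # we initially chose door 1
            switch += (prize == 2)
        return stick / kept, switch / kept

    print(earthquake_game())                           # both close to 0.5, so switching gains nothing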