Short review of probabilistic concepts

Presentation transcript:

Probability theory plays a central role in statistics. This lecture gives a short review of the basic concepts of probability theory.

Contents of this lecture:
- Basic principles and definitions
- Conditional probabilities and independence
- Bayes's theorem and postulate
- Random variables and probability distributions
- Expectations and moments

Random experiment

A random experiment satisfies the following conditions:
- All possible distinct outcomes are known in advance.
- In any particular experiment the outcome is not known in advance.
- The experiment can be repeated under identical conditions.

The outcome space $\Omega$ is the set of all possible outcomes.

Example 1. Tossing a coin is a random experiment. The outcome space is {H, T} (head and tail).
Example 2. Rolling a die. The outcome space is the set {1, 2, 3, 4, 5, 6}.
Example 3. Drawing from an urn with N balls, M of which are red and N-M of which are white. The outcome space is {R, W} (red and white).
Example 4. Measuring temperature (in C or in K). What is the outcome space?

Something that might or might not happen, depending on the outcome of the experiment, is called an event. An event is a subset of the outcome space.

Example: Rolling a die. {1, 2, 3} and {2, 4, 6} are events.
Example: Measuring temperature in Celsius. Give an example of an event.

Classical definition of probability

If all the outcomes are equally likely, then the probability of an event A is the number of outcomes in A, M(A), divided by the number of all outcomes, M:

$$P(A) = \frac{M(A)}{M}$$

Example: If a coin is fair, then the probability of H is 1/2 and the probability of T is 1/2.
Example: If a die is fair, then the probability of {1} is 1/6.

If the outcome space is a set of real numbers or a region in space, then probability is measured as the ratio of the area of the event to that of the outcome space:

$$P(A) = \frac{M(A)}{M(\Omega)}$$

where M is the area (or the length, for an interval).

Example: The outcome space is the interval [0,2]. What is the probability of [0,1]?
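The counting rule for equally likely outcomes can be sketched in a few lines of Python. This is only an illustration: the helper name `classical_probability` is an assumption of this sketch, not part of the lecture, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def classical_probability(event, outcome_space):
    """Number of outcomes in the event divided by the number of all outcomes.

    Assumes a finite outcome space with equally likely outcomes.
    """
    event = set(event)
    outcome_space = set(outcome_space)
    if not event <= outcome_space:
        raise ValueError("an event must be a subset of the outcome space")
    return Fraction(len(event), len(outcome_space))


die = {1, 2, 3, 4, 5, 6}
p_one = classical_probability({1}, die)          # fair die, event {1}
p_even = classical_probability({2, 4, 6}, die)   # fair die, event {2,4,6}
p_head = classical_probability({"H"}, {"H", "T"})

print(p_one, p_even, p_head)
```

For the fair die this gives 1/6 for {1}, in agreement with the example above.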

Frequency definition of probability

Since random experiments can, in theory, be repeated as many times as we wish under identical conditions, we can measure the relative frequency of occurrences of an event. If the number of trials is m and the number of occurrences of A is m(A), then according to the frequency definition the probability of A is the limit:

$$P(A) = \lim_{m \to \infty} \frac{m(A)}{m}$$

According to the law of large numbers this limit exists. When the number of trials is small there may be strong fluctuations. As the number of trials increases, the fluctuations tend to decrease.
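The behaviour of the relative frequency can be illustrated with a small simulation. The sketch below assumes a fair coin simulated with Python's pseudo-random generator. The function name `relative_frequency` and the fixed seed are choices made for this illustration only.

```python
import random


def relative_frequency(num_trials, seed=0):
    """Relative frequency m(A)/m of the event A = 'head' in simulated tosses."""
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(num_trials))
    return heads / num_trials


# The fluctuations around 0.5 tend to shrink as the number of trials grows.
for m in (10, 100, 10_000, 1_000_000):
    print(m, relative_frequency(m))
```

A simulation can only suggest the limit; it does not prove that the limit exists.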

Other (subjective) definitions of probability

There are also other definitions of probability:
- Degree of belief: how strongly a person believes in the occurrence of an event. In that sense one person's probability may differ from another person's.
- Degree of knowledge: in many cases the exact value of a parameter exists but we do not know it. By carrying out experiments we want to find this value. Since experiments are prone to errors, it is in general impossible to find the exact value, and we assign a probability to it instead. That is the purpose of most statistical procedures and techniques.

According to Jaynes, if proper rules are designed then exactly the same information would produce exactly the same probabilities (see Jaynes, Probability Theory: The Logic of Science). This definition reflects our state of knowledge about the parameters and can change as we update our knowledge.

Probability axioms

Probability is defined as a function from subsets of the outcome space $\Omega$ to the real line $\mathbb{R}$ that satisfies the following conditions:
- Non-negativity: $P(A) \ge 0$.
- Additivity: if $A \cap B = \emptyset$ then $P(A \cup B) = P(A) + P(B)$.
- The probability of the whole space is 1: $P(\Omega) = 1$.

All the definitions above obey these rules, so any property that can be derived from these axioms is valid for all of them.

Small exercise:
- Show that $P(\emptyset) = 0$ (hint: $\Omega \cup \emptyset = \Omega$).
- Show that $0 \le P(A) \le 1$ (hint: $A$ and $\tilde{A} = \Omega \setminus A$ do not intersect).

Example

a) Assume that the outcome space is a square with sides of 1 unit, and that the probability of an event A is the area of A. When A and B do not overlap, the probability of either A or B is the sum of the areas of A and B, and the probability of A and B is zero.

b) Same setting as in a): the probability of A is the area of A and the probability of B is the area of B. When A and B overlap, the probability of either A or B is not the sum of the areas of A and B:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

[Figure: a) two disjoint regions A and B; b) two overlapping regions A and B with intersection $A \cap B$.]
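The rule for overlapping events can be checked on a finite outcome space. The sketch below replaces the unit square with a fair die, which is an assumption made only so that the example can be counted exactly. The events {1,2,3} and {2,4,6} are the ones used earlier in the lecture.

```python
from fractions import Fraction

die = {1, 2, 3, 4, 5, 6}


def prob(event):
    """Classical probability of an event on a fair die."""
    return Fraction(len(set(event) & die), len(die))


A = {1, 2, 3}
B = {2, 4, 6}

p_union = prob(A | B)                              # P(A or B), counted directly
p_by_rule = prob(A) + prob(B) - prob(A & B)        # P(A) + P(B) - P(A and B)

print(p_union, p_by_rule)
```

Both numbers are 5/6, while the plain sum P(A) + P(B) would give 1 because the outcome 2 is counted twice.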

Conditional probability and independence

Consider the following case: an event B has occurred or will occur, and we want to know the probability of A. Knowing B may influence our knowledge about A, or the occurrence of B may influence the occurrence of A. The probability of A given B is called the conditional probability of A given B and is defined (for P(B) > 0) as:

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

In effect, the event B has become the new outcome space.

Events A and B are called independent if the occurrence of B does not influence the probability of A:

$$P(A|B) = P(A)$$

It can also be written as:

$$P(A \cap B) = P(A)\,P(B), \qquad P(B|A) = P(B)$$

Note that only one of the above equations is independent: each of them follows from any other (for P(A) > 0 and P(B) > 0).
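As a small illustration of both definitions, the sketch below computes conditional probabilities on a fair die. The die, the chosen events and the helper names `prob` and `conditional` are assumptions of this example.

```python
from fractions import Fraction

die = {1, 2, 3, 4, 5, 6}


def prob(event):
    """Classical probability of an event on a fair die."""
    return Fraction(len(set(event) & die), len(die))


def conditional(a, b):
    """P(A|B) = P(A and B) / P(B), defined only for P(B) > 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return prob(set(a) & set(b)) / p_b


A = {2, 4, 6}       # even number
B = {1, 2, 3}       # at most three
C = {1, 2, 3, 4}    # at most four

p_a_given_b = conditional(A, B)   # differs from P(A): A and B are dependent
p_a_given_c = conditional(A, C)   # equals P(A): A and C are independent

print(prob(A), p_a_given_b, p_a_given_c)
```

Here P(A|B) = 1/3 while P(A) = 1/2, so B carries information about A. For C the product rule holds, since P(A and C) = 1/3 = P(A)P(C).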

Example

The conditional probability of A given B is the area of $A \cap B$ divided by the area of B. This makes sense because we take it as a fact that B has certainly happened, so the probability of A given B is determined within the set B only. In a sense, we normalise the area of $A \cap B$ by the area of B.

[Figure: overlapping regions A and B with intersection $A \cap B$.]

The law of total probability

In many cases, when a probability cannot be calculated directly, it is easier to divide an event into smaller parts, calculate their probabilities and then take a weighted average of them. This can be done using the law of total probability. Let $B_1, B_2, \dots, B_n$ be a partition of $\Omega$, i.e. they are mutually exclusive ($B_i \cap B_j = \emptyset$ for $i \ne j$) and their union is $\Omega$ ($\bigcup_{i=1}^{n} B_i = \Omega$). Then from the axioms of probability:

$$P(A) = \sum_{i=1}^{n} P(A|B_i)\,P(B_i)$$

(Here we do the inverse of what we did before: we remove the normalisation of A by the set $B_i$ and then sum over all $B_i$. $P(A|B_i)P(B_i)$ is the probability of $A \cap B_i$ with respect to the original outcome space.)

This law is a useful tool for calculating probabilities. Consider a box with N balls, M of which are red and N-M white. We make two draws. We do not know the colour of the first ball. What is the probability that the second ball is red? (Hint: use the partition $\{R_1\}, \{W_1\}$, then apply the law of total probability to $\{R_2\}$. The subscript denotes the first or the second draw.)
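The hint for the box exercise can be followed step by step in code. The sketch below assumes that the first ball is not put back before the second draw. The lecture does not state this explicitly, and the function name `second_draw_red` is likewise only illustrative.

```python
from fractions import Fraction


def second_draw_red(n_balls, n_red):
    """P(R2) by the law of total probability over the partition {R1}, {W1}.

    Assumption: draws are made without replacement and n_balls >= 2.
    """
    n, m = n_balls, n_red
    p_r1 = Fraction(m, n)                 # first ball red
    p_w1 = Fraction(n - m, n)             # first ball white
    p_r2_given_r1 = Fraction(m - 1, n - 1)
    p_r2_given_w1 = Fraction(m, n - 1)
    return p_r2_given_r1 * p_r1 + p_r2_given_w1 * p_w1


print(second_draw_red(10, 3))
```

Under this assumption the weighted average simplifies to M/N: not knowing the first ball leaves the probability for the second draw the same as for the first.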

Bayes's theorem

Bayes's theorem is a tool that updates the probability of an event in the light of evidence. It is written in various forms, all of which are equivalent. Consider again a partition of the outcome space, $B_1, B_2, \dots, B_n$, so that the events are mutually exclusive and their union is equal to $\Omega$. Then for one of these events (say the j-th event) we can write:

$$P(B_j|A) = \frac{P(A|B_j)\,P(B_j)}{\sum_{i=1}^{n} P(A|B_i)\,P(B_i)}$$

Usually $P(B_j|A)$ is called the posterior probability, $P(B_j)$ the prior probability and $P(A|B_j)$ the likelihood. The theorem is widely used in statistical inference.

Example: A box contains four balls. There are two possibilities: a) all balls are white ($B_1$); b) two are white and two are red ($B_2$). A ball is drawn and it is white (event A). What is the probability that all balls are white? $B_1$ and $B_2$ are the two possibilities, each with prior probability 1/2. If $B_1$ is true then the probability of A is 1, and if $B_2$ is true then the probability of A is 1/2. Calculate $P(B_1|A)$. What is the probability $P(B_2|A)$?

Bayes's postulate: if there is no prior information available, then the prior probabilities should be assumed to be equal.
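The box example can be worked through numerically. The sketch below uses the priors and likelihoods stated in the example. The helper name `posterior` is an assumption of this illustration.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Bayes's theorem over a partition B_1..B_n.

    priors[i] = P(B_i), likelihoods[i] = P(A|B_i); returns the list P(B_i|A).
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)  # P(A), by the law of total probability
    if evidence == 0:
        raise ValueError("P(A) must be positive")
    return [j / evidence for j in joint]


priors = [Fraction(1, 2), Fraction(1, 2)]       # B1: all white, B2: two white and two red
likelihoods = [Fraction(1), Fraction(1, 2)]     # P(white ball | B1), P(white ball | B2)

p_b1, p_b2 = posterior(priors, likelihoods)
print(p_b1, p_b2)
```

Drawing a white ball raises the probability of the all-white box from 1/2 to 2/3 and lowers that of the mixed box to 1/3.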

Random variables

A random variable is a function from the outcome space to the real line, $X: \Omega \to \mathbb{R}$.

Example 1: Consider the random experiment of tossing a coin twice. The outcome space is $\Omega$ = {(H,H), (H,T), (T,H), (T,T)}. Define a random variable as X((T,T)) = 0, X((H,T)) = X((T,H)) = 1, X((H,H)) = 2.

Example 2: Rolling a die. The outcome space is {1, 2, 3, 4, 5, 6}. Define a random variable X(j) = j.

Probability distribution function

Discrete case (the number of elements in the outcome space is finite or countably infinite): the probability function p assigns to each possible realisation x of a random variable X a probability p(x) = P(X = x). Obviously $\sum_x p(x) = 1$.

Example: The number of heads turning up in two tosses is a random variable with probabilities p(0) = 1/4, p(1) = 1/2, p(2) = 1/4.

For a continuous random variable it is not possible to define a probability for each realisation, since the probability of a single value is usually 0. For such variables it is more convenient to define a distribution function:

$$F(x) = P(X \le x)$$

i.e. the probability that X is less than or equal to x. F(x) has the following properties: 1) $F(-\infty) = 0$; 2) F(x) is a monotonically non-decreasing function; 3) $F(+\infty) = 1$. This function is defined for discrete as well as continuous random variables.

If the derivative of F(x) exists, it is called the probability density function, f(x) = dF(x)/dx. Another relation between these two functions is:

$$F(x) = \int_{-\infty}^{x} f(t)\,dt$$
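For the two-toss example, the probability function and the distribution function can be built by enumerating the outcome space. The sketch assumes a fair coin, so that all four outcomes are equally likely. The names `pmf` and `cdf` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Outcome space of two tosses: (H,H), (H,T), (T,H), (T,T)
outcomes = list(product("HT", repeat=2))


def X(outcome):
    """Random variable: the number of heads in the outcome."""
    return outcome.count("H")


counts = Counter(X(w) for w in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}


def cdf(x):
    """F(x) = P(X <= x), obtained by summing p over all values not exceeding x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))


print(pmf)
print([cdf(x) for x in (-1, 0, 1, 2)])
```

F(x) is a step function here: it jumps by p(x) at each possible value and stays constant in between.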

Cumulative distribution and probability density

[Figure: a) cumulative distribution function of the uniform distribution on the interval [0,1]; b) probability density of the uniform distribution on the interval [0,1].]

Joint probability distributions

If there is more than one random variable, their joint probability distribution is defined similarly. For the discrete case:

$$p(x,y) = P((X,Y) = (x,y)) = P(X = x, Y = y)$$

Then $\sum_x \sum_y p(x,y) = 1$ and $p(x,y) \ge 0$. The marginal probability function of X is derived by summing over all possible values of y:

$$p_X(x) = \sum_y p(x,y)$$

The conditional probability function of X given Y = y is:

$$p(x|y) = \frac{p(x,y)}{p_Y(y)}$$

The definition of the joint probability distribution for continuous random variables is similar: $F(x,y) = P(X \le x, Y \le y)$. The probability density f(x,y) is the derivative of the distribution function with respect to its arguments, $f(x,y) = \partial^2 F(x,y) / \partial x\,\partial y$. It has the properties:

$$f(x,y) \ge 0, \qquad \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,dx\,dy = 1$$

Marginal and conditional probability densities are defined as for discrete random variables, with summation replaced by integration.
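The marginal and conditional probability functions can be computed directly from a joint table. The sketch below uses an illustrative pair of variables that is not taken from the lecture: for two tosses of a fair coin, X is 1 if the first toss is a head (0 otherwise) and Y is the total number of heads.

```python
from collections import defaultdict
from fractions import Fraction

quarter = Fraction(1, 4)
# Joint probability function p(x, y) for the illustrative pair (X, Y)
joint = {(0, 0): quarter, (0, 1): quarter, (1, 1): quarter, (1, 2): quarter}


def marginal(joint, axis):
    """Sum p(x, y) over the other variable; axis 0 gives p_X, axis 1 gives p_Y."""
    result = defaultdict(Fraction)
    for pair, p in joint.items():
        result[pair[axis]] += p
    return dict(result)


def conditional_x_given_y(joint, y):
    """p(x|y) = p(x, y) / p_Y(y) for a value y with p_Y(y) > 0."""
    p_y = marginal(joint, 1)[y]
    return {x: p / p_y for (x, y_value), p in joint.items() if y_value == y}


p_x = marginal(joint, 0)
p_y = marginal(joint, 1)
print(p_x, p_y)
print(conditional_x_given_y(joint, 1), conditional_x_given_y(joint, 2))
```

Given Y = 2 the first toss must have been a head, so p(1|2) = 1, whereas the marginal probability of a head on the first toss is 1/2.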

Joint probability distributions and independence

The random events {X = x} and {Y = y} are independent if

$$P(X = x, Y = y) = P(X = x)\,P(Y = y)$$

The random variables X and Y are independent if this relation holds for all pairs (x, y). It can also be written as $p(x,y) = p_X(x)\,p_Y(y)$, and then $p(x|y) = p_X(x)$ and $p(y|x) = p_Y(y)$.

For continuous random variables the definition is analogous, with p replaced by f everywhere:

$$f(x,y) = f_X(x)\,f_Y(y), \qquad f(x|y) = f_X(x), \qquad f(y|x) = f_Y(y)$$

Bayes's theorem then becomes:

$$f(x|y) = \frac{f_X(x)\,f(y|x)}{f_Y(y)}$$

where $f(x|y)$ is the posterior probability density, $f_X(x)$ is the prior probability density, $f(y|x)$ is the likelihood of y given x, and $f_Y(y)$ can be considered as a normalisation coefficient. Usually the subscripts X and Y are dropped.

Expectation values. Moments

If X is a random variable and h(X) is a function of it, then the expectation value (discrete case) is defined as:

$$E(h(X)) = \sum_x h(x)\,p(x)$$

If $h(x) = x$, it is called the first moment. If $h(x) = x^n$, it is called the n-th moment. If $h(x) = (x - E(X))^n$, it is called the n-th central moment:

$$E\big((X - E(X))^n\big) = \sum_x (x - E(X))^n\,p(x)$$

The second central moment is called the variance of the random variable. The first moment and the second central moment play an important role in statistics and have special symbols:

$$\mu = E(X), \qquad \sigma^2 = E\big((X - \mu)^2\big)$$

$\sigma$ is called the standard deviation.

When there is more than one random variable and their joint probability function is known, their mixed moments are also defined. The most important of them are the covariance and the correlation:

$$\mathrm{cov}(X,Y) = E\big((X - E(X))(Y - E(Y))\big), \qquad \rho(X,Y) = \frac{\mathrm{cov}(X,Y)}{\sigma_X\,\sigma_Y}$$

For continuous random variables, expectation values, moments, covariance and correlation are defined similarly, with summation replaced by integration.

If random variables are independent then their covariance is 0. The reverse is not true in general.
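The definitions of expectation, covariance and correlation translate directly into sums over a joint probability table. In the sketch below both tables are illustrative assumptions, not data from the lecture. The second one, with X taking the values -1, 0, 1 with equal probability and Y = X², is a standard example of dependent variables with zero covariance.

```python
import math


def expectation(joint, h):
    """E(h(X, Y)) = sum over all (x, y) of h(x, y) * p(x, y)."""
    return sum(h(x, y) * p for (x, y), p in joint.items())


def covariance(joint):
    mean_x = expectation(joint, lambda x, y: x)
    mean_y = expectation(joint, lambda x, y: y)
    return expectation(joint, lambda x, y: (x - mean_x) * (y - mean_y))


def correlation(joint):
    mean_x = expectation(joint, lambda x, y: x)
    mean_y = expectation(joint, lambda x, y: y)
    var_x = expectation(joint, lambda x, y: (x - mean_x) ** 2)
    var_y = expectation(joint, lambda x, y: (y - mean_y) ** 2)
    return covariance(joint) / math.sqrt(var_x * var_y)


# X = 1 if the first of two fair tosses is a head, Y = total number of heads
dependent = {(0, 0): 0.25, (0, 1): 0.25, (1, 1): 0.25, (1, 2): 0.25}
# X uniform on {-1, 0, 1} and Y = X**2: dependent, yet uncorrelated
uncorrelated = {(-1, 1): 1 / 3, (0, 0): 1 / 3, (1, 1): 1 / 3}

print(covariance(dependent), correlation(dependent))
print(covariance(uncorrelated))
```

In the second table Y is completely determined by X, but the covariance is still zero, which illustrates why zero covariance does not imply independence.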

Examples

Let us take the example of tossing a coin. The coin is fair (i.e. the probability of a head is 0.5 and that of a tail is 0.5). Define the random variable X(H) = 0, X(T) = 1. Then the expectation values are:

$$E(X) = 0 \cdot P(X=0) + 1 \cdot P(X=1) = 0 \cdot 0.5 + 1 \cdot 0.5 = 0.5$$

$$E(X^2) = 0^2 \cdot P(X=0) + 1^2 \cdot P(X=1) = 0 \cdot 0.5 + 1 \cdot 0.5 = 0.5$$

$$E\big((X - E(X))^2\big) = (0 - 0.5)^2 \cdot 0.5 + (1 - 0.5)^2 \cdot 0.5 = 0.25 \cdot 0.5 + 0.25 \cdot 0.5 = 0.25$$

The expectation value (first moment) is 0.5, the second moment is 0.5, the variance is 0.25 and the standard deviation is $\sqrt{0.25} = 0.5$.

Let us take another example. Assume that the density of the probability distribution has the form (it is the uniform distribution over the interval [0,1]):

$$f(x) = \begin{cases} 1, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$$

and the random variable is X(x) = x.
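The moments of the uniform example follow from the definitions with summation replaced by integration. The worked steps below are a sketch added for illustration and assume the density is 1 on [0,1] and 0 elsewhere:

$$E(X) = \int_0^1 x\,dx = \frac{1}{2}, \qquad E(X^2) = \int_0^1 x^2\,dx = \frac{1}{3}$$

$$\sigma^2 = E\big((X - E(X))^2\big) = E(X^2) - (E(X))^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}, \qquad \sigma = \frac{1}{2\sqrt{3}} \approx 0.289$$

The second step uses the identity $E((X - \mu)^2) = E(X^2) - \mu^2$, which follows by expanding the square and using the linearity of expectation.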

Further reading

- Berthold, M. and Hand, D. J. (2003) Intelligent Data Analysis
- Feller, W. (1968) An Introduction to Probability Theory and Its Applications, vol. 1
- Feller, W. (1971) An Introduction to Probability Theory and Its Applications, vol. 2
- Mardia, K. V., Kent, J. T. and Bibby, J. M. (2003) Multivariate Analysis
- Jaynes, E. (2003) Probability Theory: The Logic of Science