PROBABILITY AND STATISTICS FOR ENGINEERING
The Weak Law and the Strong Law of Large Numbers
Hossein Sameti, Department of Computer Engineering, Sharif University of Technology

Timeline
- Bernoulli: the weak law of large numbers (WLLN).
- Poisson: generalized Bernoulli's theorem.
- Tchebychev: discovered his method (the inequality now bearing his name).
- Markov: used Tchebychev's reasoning to extend Bernoulli's theorem to dependent random variables as well.
- Borel: the strong law of large numbers, which further generalizes Bernoulli's theorem.
- Kolmogorov: necessary and sufficient conditions for the strong law for a set of mutually independent random variables.

Weak Law of Large Numbers
- Let the X_i's be i.i.d. Bernoulli random variables with P(X_i = 1) = p and P(X_i = 0) = q = 1 − p,
- and let k = X_1 + X_2 + ... + X_n be the number of "successes" in n trials.
- Then the weak law due to Bernoulli states that for any ε > 0,
  P(|k/n − p| ≥ ε) ≤ pq/(nε²) → 0 as n → ∞,
- i.e., the ratio "total number of successes to the total number of trials" tends to p in probability as n increases.
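As an illustration (not part of the original slides), a short Monte Carlo sketch in Python makes the statement concrete: for growing n, the empirical estimate of P(|k/n − p| ≥ ε) shrinks toward zero and stays below the Chebyshev bound. The parameter values p = 0.3, ε = 0.05 and the seed are arbitrary demo choices.

```python
# Monte Carlo illustration of the weak law of large numbers.
# Demo parameters (arbitrary): p = 0.3, eps = 0.05, 10,000 repetitions per n.
import numpy as np

rng = np.random.default_rng(0)
p, eps, reps = 0.3, 0.05, 10_000

for n in (100, 1_000, 10_000):
    # k counts successes in n Bernoulli(p) trials, repeated `reps` times.
    k = rng.binomial(n, p, size=reps)
    tail = np.mean(np.abs(k / n - p) >= eps)       # empirical P(|k/n - p| >= eps)
    chebyshev = p * (1 - p) / (n * eps**2)          # bound pq/(n eps^2) from the slide
    print(f"n={n:6d}  empirical={tail:.4f}  Chebyshev bound={min(chebyshev, 1):.4f}")
```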

Strong Law of Large Numbers
- Borel and Cantelli showed that the ratio k/n tends to p not only in probability, but with probability 1:
  P( lim_{n→∞} k/n = p ) = 1.
- This is the strong law of large numbers (SLLN).
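The almost-sure statement is about a single realization: along one toss sequence, k/n settles down to p and stays there. A minimal sketch (truncating the infinite sequence at n = 100,000; p = 0.5 and the seed are arbitrary):

```python
# One sample path of k/n: under the SLLN it converges to p along the path.
import numpy as np

rng = np.random.default_rng(1)
p, n_max = 0.5, 100_000
x = rng.random(n_max) < p                          # one realization of n_max Bernoulli(p) trials
ratio = np.cumsum(x) / np.arange(1, n_max + 1)     # k/n for n = 1, ..., n_max
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n={n:6d}  k/n={ratio[n - 1]:.4f}")
```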

What is the difference?
- The SLLN states that if ε_n is a sequence of positive numbers converging to zero such that
  Σ_{n=1}^∞ P(|k/n − p| > ε_n) < ∞,
  then k/n → p with probability 1.
- From the Borel–Cantelli lemma, when this condition is satisfied the events {|k/n − p| > ε_n} can occur only for a finite number of indices n in an infinite sequence,
- or equivalently, the complementary events {|k/n − p| ≤ ε_n} occur for all but finitely many n, i.e., k/n converges to p almost surely.

Proof
- Since k/n − p = ((X_1 + X_2 + ... + X_n) − np)/n, we have
  E{(k/n − p)^4} = (1/n^4) E{[Σ_{i=1}^n (X_i − p)]^4},
- and hence, expanding the fourth power,
  E{(k/n − p)^4} = (1/n^4) Σ_i Σ_j Σ_k Σ_l E{(X_i − p)(X_j − p)(X_k − p)(X_l − p)},
- where each X_i − p has zero mean and factors with distinct indices are independent.

Proof – continued
- Since E{X_i − p} = 0 and the X_i are independent, a term in the sum is nonzero only if its indices are equal in pairs.
- This leaves n terms with all four indices equal, each contributing E{(X_i − p)^4} = pq(p³ + q³), and 3n(n − 1) terms forming two distinct pairs, each contributing [E{(X_i − p)²}]² = (pq)²; the factor 3 arises because i can coincide with j, k or l, and the second paired variable then takes (n − 1) values.
- So we obtain
  E{(k/n − p)^4} = [n pq(p³ + q³) + 3n(n − 1)(pq)²] / n^4 < 1/(2n²),
  since pq ≤ 1/4 and p³ + q³ ≤ 1.
- By Markov's inequality applied to (k/n − p)^4, for any ε > 0,
  P(|k/n − p| > ε) ≤ E{(k/n − p)^4}/ε^4 < 1/(2n²ε^4).

Proof – continued
- Let ε_n = 1/n^{1/8}, so that the above bound reads
  P(|k/n − p| > 1/n^{1/8}) < n^{1/2}/(2n²) = 1/(2n^{3/2}),
- and hence
  Σ_{n=1}^∞ P(|k/n − p| > ε_n) < Σ_{n=1}^∞ 1/(2n^{3/2}) < ∞,
- thus proving the strong law by exhibiting a sequence of positive numbers ε_n that converges to zero and satisfies the summability condition above.
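Both ingredients of the proof can be checked numerically: the fourth-moment bound E{(k/n − p)^4} < 1/(2n²) (the crude constant 1/2 is the one derived above), and the convergence of the bounding series Σ 1/(2n^{3/2}). A sketch with p = 0.4 as an arbitrary test value:

```python
# Check the fourth-moment bound and the summability used in the SLLN proof.
import numpy as np

rng = np.random.default_rng(2)
p, reps = 0.4, 200_000

for n in (10, 100, 1_000):
    k = rng.binomial(n, p, size=reps)
    m4 = np.mean((k / n - p) ** 4)                  # Monte Carlo estimate of E{(k/n - p)^4}
    print(f"n={n:5d}  E(k/n-p)^4 ~ {m4:.3e}  bound 1/(2n^2) = {1 / (2 * n * n):.3e}")

# Partial sums of the bound 1/(2 n^{3/2}) obtained with eps_n = n^{-1/8}:
ns = np.arange(1, 1_000_001)
print("partial sum of 1/(2 n^{3/2}) up to 10^6:", np.sum(1 / (2 * ns**1.5)))
```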

What is the difference?
- The weak law states that for every sufficiently large n, the ratio k/n is likely to be near p, with a probability that tends to 1 as n increases.
- It does not say that k/n is bound to stay near p if the number of trials is increased.
- Suppose |k/n − p| < ε is satisfied for a given ε in a certain number of trials n_0.
- If additional trials are conducted beyond n_0, the weak law does not guarantee that the new k/n is bound to stay near p for such trials.
- There can be events for which |k/n − p| > ε for some n > n_0, recurring in some regular manner.

What is the difference?
- The probability for such an event is the sum of a large number of very small probabilities,
- and the weak law is unable to say anything specific about the convergence of that sum.
- However, the strong law states that not only do all such sums converge, but the total number of all such events, i.e., the number of indices n for which |k/n − p| > ε, is in fact finite (with probability 1)!
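This contrast can be seen on one simulated path by counting how many indices n violate |k/n − p| > ε; under the SLLN this count is almost surely finite, with the violations concentrated at small n. A sketch with arbitrary demo values p = 0.5, ε = 0.02:

```python
# Count how many n in 1..n_max violate |k/n - p| > eps along one sample path.
import numpy as np

rng = np.random.default_rng(6)
p, eps, n_max = 0.5, 0.02, 1_000_000
x = rng.random(n_max) < p
ratio = np.cumsum(x) / np.arange(1, n_max + 1)
violations = np.sum(np.abs(ratio - p) > eps)
print("number of violating indices n:", violations)   # a.s. finite; early n dominate
```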

Bernstein's inequality
- This implies that the probability of the events {|k/n − p| > ε} becomes and remains small as n increases,
- since with probability 1 only finitely many violations of the above inequality take place as n → ∞.
- It is possible to arrive at the same conclusion using a powerful bound known as Bernstein's inequality that is based on the WLLN.

Bernstein's inequality
- Note that k/n − p ≥ ε implies k − np ≥ nε,
- and for any λ > 0 this gives e^{λ(k − np)} ≥ e^{λnε} on that event.
- Thus, by Markov's inequality and the independence of the trials,
  P(k/n − p ≥ ε) ≤ e^{−λnε} E{e^{λ(k − np)}} = e^{−λnε} [E{e^{λ(X_1 − p)}}]^n.

Bernstein's inequality
- Since for any real x, e^x ≤ x + e^{x²},
- we can obtain, using E{X_1 − p} = 0 and |X_1 − p| ≤ 1,
  E{e^{λ(X_1 − p)}} ≤ E{λ(X_1 − p)} + E{e^{λ²(X_1 − p)²}} ≤ e^{λ²},
- and therefore
  P(k/n − p ≥ ε) ≤ e^{−λnε + nλ²}.
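The elementary inequality e^x ≤ x + e^{x²} used in this step can be spot-checked numerically; a one-off sketch over a grid of real x (the range [-5, 5] is an arbitrary demo choice):

```python
# Spot-check e^x <= x + e^{x^2} on a grid of real x.
import numpy as np

x = np.linspace(-5, 5, 10_001)
assert np.all(np.exp(x) <= x + np.exp(x**2)), "inequality violated"
print("e^x <= x + e^{x^2} holds on [-5, 5]")
```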

Bernstein's inequality
- But −λnε + nλ² is minimized at λ = ε/2, and hence
  P(k/n − p ≥ ε) ≤ e^{−nε²/4}.
- Similarly, P(k/n − p ≤ −ε) ≤ e^{−nε²/4}, and hence we obtain Bernstein's inequality:
  P(|k/n − p| ≥ ε) ≤ 2e^{−nε²/4}.
- This is more powerful than Tchebychev's inequality, as it states that the chances for the relative frequency k/n deviating from its probability p by more than ε tend to zero exponentially fast as n → ∞.
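To see the exponential decay at work, one can compare Chebyshev's bound pq/(nε²), Bernstein's bound 2e^{−nε²/4}, and the empirical tail; a sketch with arbitrary demo values p = 0.5, ε = 0.1. Note that Bernstein's bound only overtakes Chebyshev's for moderately large n, but then decays exponentially fast (here the true tail is far below both bounds):

```python
# Chebyshev vs. Bernstein vs. empirical tail P(|k/n - p| >= eps).
import numpy as np

rng = np.random.default_rng(3)
p, eps, reps = 0.5, 0.1, 200_000

for n in (1_000, 2_000, 4_000, 8_000):
    k = rng.binomial(n, p, size=reps)
    emp = np.mean(np.abs(k / n - p) >= eps)
    cheb = p * (1 - p) / (n * eps**2)
    bern = 2 * np.exp(-n * eps**2 / 4)
    print(f"n={n:5d}  empirical={emp:.2e}  Chebyshev={cheb:.2e}  Bernstein={bern:.2e}")
```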

- Chebyshev's inequality gives the probability that k/n lies between p − ε and p + ε for a specific n.
- We can use Bernstein's inequality to estimate the probability that k/n lies between p − ε and p + ε for all large n.
- Towards this, let
  A_n = { |k/n − p| ≤ ε },
- so that the complement of A_n is A_n^c = { |k/n − p| > ε }.
- To compute the probability of ∩_{n=m}^∞ A_n, i.e., that k/n stays near p for every n ≥ m, note that its complement is given by ∪_{n=m}^∞ A_n^c.

- We have, from Bernstein's inequality, P(A_n^c) ≤ 2e^{−nε²/4}.
- This gives
  P(∪_{n=m}^∞ A_n^c) ≤ Σ_{n=m}^∞ 2e^{−nε²/4} = 2e^{−mε²/4}/(1 − e^{−ε²/4}) → 0 as m → ∞,
- or,
  P(∩_{n=m}^∞ A_n) ≥ 1 − 2e^{−mε²/4}/(1 − e^{−ε²/4}) → 1 as m → ∞.
- Thus k/n is bound to stay near p for all large enough n, in probability, as stated by the SLLN.
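The lower bound on P(∩_{n=m}^∞ A_n) can be evaluated directly to see how fast it approaches 1 in m; a minimal sketch with ε = 0.1 as an arbitrary demo value (for small m the bound can be negative, i.e., vacuous):

```python
# Lower bound: P(k/n within eps of p for ALL n >= m) >= 1 - 2 e^{-m eps^2/4} / (1 - e^{-eps^2/4}).
import math

eps = 0.1
a = eps**2 / 4                                     # per-trial exponent
for m in (1_000, 5_000, 10_000, 20_000):
    bound = 1 - 2 * math.exp(-m * a) / (1 - math.exp(-a))
    print(f"m={m:6d}  P(stay within {eps} for all n>=m) >= {bound:.6f}")
```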

Discussion
- Let ε = 0.1. If we toss a fair coin 1,000 times, from the weak law
  P(|k/n − 0.5| ≥ 0.1) ≤ pq/(nε²) = 0.25/(1,000 × 0.01) = 1/40.
- Thus on the average, 39 out of 40 such events, each with 1,000 or more trials, will satisfy the inequality |k/n − 0.5| < 0.1.
- It is quite possible that one out of 40 such events may not satisfy it.
- Continuing the experiment for 1,000 more trials, with k successes out of n trials for n between 1,000 and 2,000, it is quite possible that for a few such n the above inequality may be violated.
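The 1-in-40 figure follows directly from Chebyshev's bound and can be sanity-checked by simulating 40 independent blocks of 1,000 tosses; a sketch (the numbers come from the slide, the simulation itself is illustrative and in practice shows no violations, since the bound is very loose here):

```python
# Chebyshev: P(|k/1000 - 0.5| >= 0.1) <= 0.25/(1000 * 0.1**2) = 1/40.
import numpy as np

rng = np.random.default_rng(4)
n, p, eps = 1_000, 0.5, 0.1
k = rng.binomial(n, p, size=40)                    # 40 independent blocks of 1,000 tosses
violations = np.sum(np.abs(k / n - p) >= eps)
print("Chebyshev bound:", p * (1 - p) / (n * eps**2))   # 0.025 = 1/40
print("violations among 40 blocks:", violations)        # true tail is ~1e-10, so ~0
```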

Discussion – continued
- This is still consistent with the weak law,
- but according to the strong law such violations can occur only a finite number of times, each with a finite probability, in an infinite sequence of trials;
- hence, almost always, the above inequality will be satisfied for all large n, i.e., k/n converges to p with probability 1 as n → ∞.

Example
- 2n red cards and 2n black cards (all distinct) are shuffled together to form a single deck of 4n cards, and then split into half.
- What is the probability that each half will contain n red and n black cards?

Solution
- From a deck of 4n cards, 2n cards can be chosen in C(4n, 2n) different ways.
- Consider the favorable draws, those consisting of n red and n black cards in each half.

Example – continued
- Among those 2n red cards, n of them can be chosen in C(2n, n) different ways;
- similarly, for each such draw there are C(2n, n) ways of choosing the n black cards.
- Thus the total number of favorable draws containing n red and n black cards in each half is C(2n, n)², among a total of C(4n, 2n) draws.
- This gives the desired probability to be
  p_n = C(2n, n)² / C(4n, 2n).

Example – continued
- For large n, using Stirling's formula we get
  p_n ≈ √(2/(πn)).
- For a full deck of 52 cards, we have n = 13, which gives p_13 = C(26, 13)²/C(52, 26) ≈ 0.2181.
- For a partial deck of 20 cards, we have n = 5, and p_5 = C(10, 5)²/C(20, 10) = 63,504/184,756 ≈ 0.3437.
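The exact probability and the Stirling approximation can be compared directly with Python's exact integer binomials; a minimal sketch:

```python
# Exact p_n = C(2n,n)^2 / C(4n,2n) vs. the Stirling approximation sqrt(2/(pi*n)).
import math

for n in (5, 13, 50):
    exact = math.comb(2 * n, n) ** 2 / math.comb(4 * n, 2 * n)
    approx = math.sqrt(2 / (math.pi * n))
    print(f"n={n:3d}  exact={exact:.4f}  Stirling={approx:.4f}")
```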

An Experiment
- 20 cards were given to a 5-year-old child to split into two equal halves;
- the outcome was declared a success if each half contained exactly 5 red and 5 black cards.
- With adult supervision (in terms of shuffling), the experiment was repeated 100 times that very same afternoon. The results are tabulated in the next slide.
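The child's experiment is easy to replicate in software: shuffle a 20-card deck, split it, and count successes over 100 trials; a sketch (the expected success count is about 100 × 0.3437 ≈ 34; the seed is arbitrary):

```python
# Simulate 100 trials of splitting a shuffled 20-card deck (10 red, 10 black).
import numpy as np

rng = np.random.default_rng(5)
deck = np.array([0] * 10 + [1] * 10)      # 0 = red, 1 = black
successes = 0
for _ in range(100):
    rng.shuffle(deck)
    if deck[:10].sum() == 5:              # first half has 5 black (hence 5 red)
        successes += 1
print("successes in 100 trials:", successes)   # expected ~34
```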

[Table: experiment number vs. number of successes for the 100 repetitions (five column pairs); the tabulated values did not survive this transcript.]

[Figure: results of an experiment of 100 trials, showing the convergence of k/n to p.]