Random Variables and Probability Distributions

Chapter 6 Random Variables and Probability Distributions Created by Kathy Fritz

Consider the chance experiment of randomly selecting a customer who is leaving a store. One numerical variable of interest to the store manager might be the number of items purchased by the customer. Let’s use the letter x to denote this variable. In this example, the values of x are isolated points. Another variable of interest might be y = number of minutes spent in a checkout line. The possible y values form an entire interval on the number line.

Random Variables

Probability Model A probability model describes

Random Variable and Probability Model

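Random Variable
A random variable is a numerical variable whose value depends on the outcome of a chance experiment. A random variable associates a numerical value with each outcome of a chance experiment.
A random variable is discrete if its possible values are isolated points along the number line. A random variable is continuous if its possible values are all points in some interval.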

Identify the following variables as discrete or continuous
The number of items purchased by each customer
The amount of time spent in the checkout line by each customer
The weight of a pineapple
The number of gas pumps in use

Probability Distributions for Discrete Random Variables
Properties

In Wolf City (a fictional place), regulations prohibit more than five dogs or cats per household.
Let x = the number of dogs or cats per household in Wolf City. Because no household may have more than five, the possible values of x are 0, 1, 2, 3, 4, and 5.

Discrete Probability Distribution
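The probability distribution of a discrete random variable x gives the probability associated with each possible x value. Each probability is the long-run proportion of the time that the corresponding x value occurs when the chance experiment is repeated a very large number of times. Common ways to display a probability distribution for a discrete random variable are a table, probability histogram, or formula.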

Properties of Discrete Probability Distributions
1) For every possible x value, 0 ≤ P(x) ≤ 1.
2) The sum of P(x) over all values of x is equal to 1.

Discrete Probability Distributions
If we can find a way to list all the possible outcomes for a random variable and assign probabilities to each one, we have a discrete random variable.

Suppose that each of four randomly selected customers purchasing a refrigerator at an appliance store chooses either an energy-efficient model (E) or one from a less expensive group of models (G) that do not have an energy-efficient rating. Assume that these customers make their choices independently of one another and that 40% of all customers select an energy-efficient model. Consider the next four customers. Let x = the number of energy-efficient refrigerators purchased by the four customers. The possible values of x are 0, 1, 2, 3, and 4.

Refrigerators continued . . .
x = the number of energy-efficient refrigerators purchased by the four customers

x      0       1       2       3       4
P(x)   0.1296  0.3456  0.3456  0.1536  0.0256

The probability distribution can be used to determine probabilities of various events involving x. For example, the probability that at least two of the four customers choose energy-efficient models is
P(x ≥ 2) = P(2) + P(3) + P(4) = 0.3456 + 0.1536 + 0.0256 = 0.5248

What is the probability that more than two of the four customers choose energy-efficient models?
From the probability distribution of x,
P(x > 2) = P(3) + P(4) = 0.1536 + 0.0256 = 0.1792
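The refrigerator distribution can be rebuilt by listing all 16 possible outcomes for the four customers and adding up the probabilities of the outcomes that share the same x value. The following is a minimal sketch of that idea, assuming only the 40% figure and the independence stated in the example. The names P_E, dist, at_least_two, and more_than_two are illustrative and do not come from the slides.

```python
from itertools import product

P_E = 0.4  # probability that one customer chooses an energy-efficient model (from the example)

# Build the distribution of x = number of E choices among four independent customers
# by enumerating every outcome (EEEE, EEEG, ..., GGGG) and summing outcome probabilities.
dist = {x: 0.0 for x in range(5)}
for outcome in product("EG", repeat=4):
    prob = 1.0
    for choice in outcome:
        prob *= P_E if choice == "E" else 1 - P_E
    dist[outcome.count("E")] += prob

# Event probabilities are sums of the relevant P(x) values.
at_least_two = sum(p for x, p in dist.items() if x >= 2)
more_than_two = sum(p for x, p in dist.items() if x > 2)

for x, p in dist.items():
    print(x, round(p, 4))
print("P(x >= 2) =", round(at_least_two, 4))
print("P(x > 2)  =", round(more_than_two, 4))
```

The same loop also checks the two properties of a discrete distribution: every entry of dist lies between 0 and 1, and the entries add to 1.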

Probability Distributions for Continuous Random Variables
Properties

Consider the random variable:
x = the weight (in pounds) of a full-term newborn child Suppose that weight is reported to the nearest pound. The following probability histogram displays the distribution of weights. Now suppose that weight is reported to the nearest 0.1 pound. This would be the probability histogram.

Probability Distributions for Continuous Variables
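A probability distribution for a continuous random variable x is specified by a curve called a density curve. The function that describes this curve is denoted by f(x) and is called the density function. The probability that x falls in any particular interval is the area under the density curve and above that interval.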

Properties of continuous probability distributions
1. f(x) ≥ 0 (the curve cannot dip below the horizontal axis).
2. The total area under the density curve is equal to 1.

Suppose x is a continuous random variable defined as the amount of time (in minutes) taken by a clerk to process a certain type of application form. Suppose x has a probability distribution with density function f(x), whose graph is the density curve.
[Figure: density curve f(x). Horizontal axis: Time (in minutes). Vertical axis: Density.]

Application Problem Continued . . .
What is the probability that it takes at least 5.5 minutes to process the application form? P(x ≥ 5.5) = the area under the density curve to the right of 5.5.

Application Problem Continued . . .
What is the probability that it takes exactly 5.5 minutes to process the application form? P(x = 5.5) = 0, because the region above the single point 5.5 is a line segment, which has no area.

Application Problem Continued . . .
What is the probability that it takes more than 5.5 minutes to process the application form? P(x > 5.5) = P(x ≥ 5.5), because P(x = 5.5) = 0. For a continuous random variable, including or excluding an endpoint does not change the probability.

x = package weight (in pounds)
Two hundred packages shipped using the Priority Mail rate for packages less than 2 pounds were weighed, resulting in a sample of 200 observations of the variable x = package weight (in pounds) from the population of all Priority Mail packages under 2 pounds. A histogram (using the density scale, where height = (relative frequency)/(interval width)) of 200 weights is shown below.

x = package weight (in pounds)
What proportion of the packages weigh over 1.5 pounds? h = 0.75, b = 1.5

Students at a university use an online registration system to register for courses. The variable
x = length of time (in minutes) required for a student to register was recorded for a large number of students using the system. The resulting values were used to construct a probability histogram (below).

Some density curves resemble the one below. Integral calculus is used to find the area under these curves.

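The probability that a continuous random variable x lies between a lower limit a and an upper limit b is
P(a < x < b) = (cumulative area to the left of b) - (cumulative area to the left of a) = P(x < b) - P(x < a)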

Mean and Standard Deviation of a Random Variable
Of Discrete Random Variables Of Continuous Random Variables

Means and Standard Deviations of Probability Distributions
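The mean value of a random variable x, denoted by μ_x, describes where the probability distribution of x is centered. The standard deviation of a random variable x, denoted by σ_x, describes variability in the probability distribution.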

Mean Value for a Discrete Random Variable
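The mean value of a discrete random variable x, denoted by μ_x, is computed by first multiplying each possible x value by the probability of observing that value and then adding the resulting quantities. Symbolically, μ_x = Σ x · p(x), where the sum is over all possible values of x. The term expected value is sometimes used in place of mean value, and E(x) is another way to denote μ_x.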

x = the number of attempts made by a randomly selected applicant
Individuals applying for a certain license are allowed up to four attempts to pass the licensing exam. Consider the random variable x = the number of attempts made by a randomly selected applicant. The probability distribution of x is as follows:

x      1     2     3     4
p(x)   0.10  0.20  0.30  0.40

Then x has mean value
μ_x = Σ x · p(x) = 1(0.10) + 2(0.20) + 3(0.30) + 4(0.40) = 3.00

Standard Deviation for a Discrete Random Variable
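The variance of a discrete random variable x, denoted by σ_x², is computed by subtracting the mean from each possible x value, squaring each deviation, multiplying the result by the probability of the corresponding x value, and adding these quantities: σ_x² = Σ (x - μ_x)² · p(x). The standard deviation of x, denoted by σ_x, is the square root of the variance.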

x = the number of attempts made by a randomly selected applicant
Revisit the license example . . . The probability distribution of x is as follows:

x      1     2     3     4
p(x)   0.10  0.20  0.30  0.40

With μ_x = 3.00, x has variance
σ_x² = (1 - 3)²(0.10) + (2 - 3)²(0.20) + (3 - 3)²(0.30) + (4 - 3)²(0.40) = 0.40 + 0.20 + 0 + 0.40 = 1.00
The standard deviation of x is σ_x = √1.00 = 1.00
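The mean and standard deviation of the license example can be checked numerically. This is a small sketch, assuming the four probabilities given in the example. The variable names p, mean, variance, and std_dev are illustrative.

```python
from math import sqrt

# Distribution from the license example: key = number of attempts x, value = p(x)
p = {1: 0.10, 2: 0.20, 3: 0.30, 4: 0.40}

# Mean: multiply each x by its probability and add.
mean = sum(x * px for x, px in p.items())

# Variance: probability-weighted sum of squared deviations from the mean.
variance = sum((x - mean) ** 2 * px for x, px in p.items())

# Standard deviation: square root of the variance.
std_dev = sqrt(variance)

print(round(mean, 2), round(variance, 2), round(std_dev, 2))
```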

Binomial and Geometric Distributions
Properties of Binomial Distributions Mean of Binomial Distributions Standard Deviation of Binomial Distributions Properties of Geometric Distributions

Suppose we decide to record the gender of the next 25 newborns at a particular hospital.
What is the chance that at least 15 are female? What is the chance that between 10 and 15 are female? Out of the 25 newborns, how many can we expect to be female?

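Binomial Settings
When the same chance process is repeated several times, we are often interested in whether a particular outcome does or doesn’t happen on each repetition. In some cases, the number of repeated trials is fixed in advance and we are interested in the number of times a particular event (called a “success”) occurs. If the trials in these cases are independent and each success has an equal chance of occurring, we have a binomial setting.
Definition: A binomial setting arises when we perform several independent trials of the same chance process and record the number of times that a particular outcome occurs. The four conditions for a binomial setting are:
B: Binary? The possible outcomes of each trial can be classified as “success” or “failure.”
I: Independent? Trials must be independent; that is, knowing the result of one trial must not have any effect on the result of any other trial.
N: Number? The number of trials n of the chance process must be fixed in advance.
S: Success? On each trial, the probability p of success must be the same.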

Binomial Random Variable
Consider tossing a coin n times. Each toss gives either heads or tails. Knowing the outcome of one toss does not change the probability of an outcome on any other toss. If we define heads as a success, then p is the probability of a head and is 0.5 on any toss. The number of heads in n tosses is a binomial random variable X. The probability distribution of X is called a binomial distribution. Definition: The count X of successes in a binomial setting is a binomial random variable. The probability distribution of X is a binomial distribution with parameters n and p, where n is the number of trials of the chance process and p is the probability of a success on any one trial. The possible values of X are the whole numbers from 0 to n. Note: When checking the Binomial condition, be sure to check the BINS and make sure you’re being asked to count the number of successes in a certain number of trials!

Binomial Probabilities
In a binomial setting, we can define a random variable (say, X) as the number of successes in n independent trials. We are interested in finding the probability distribution of X.
Each child of a particular pair of parents has probability 0.25 of having type O blood. Genetics says that children receive genes from each of their parents independently. If these parents have 5 children, the count X of children with type O blood is a binomial random variable with n = 5 trials and probability p = 0.25 of a success on each trial. In this setting, a child with type O blood is a “success” (S) and a child with another blood type is a “failure” (F).
What’s P(X = 2)?
P(SSFFF) = (0.25)(0.25)(0.75)(0.75)(0.75) = (0.25)²(0.75)³ ≈ 0.02637
However, there are a number of different arrangements in which 2 out of the 5 children have type O blood:
SSFFF SFSFF SFFSF SFFFS FSSFF FSFSF FSFFS FFSSF FFSFS FFFSS
Each of these 10 arrangements has the same probability, so P(X = 2) = 10(0.25)²(0.75)³ ≈ 0.26367
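The counting argument for the blood-type example can be reproduced by generating every sequence of five S/F results and keeping those with exactly two successes. This is a sketch under the example's stated assumptions (independent children, p = 0.25). The names arrangements, p_one_arrangement, and p_x_equals_2 are illustrative.

```python
from itertools import product

p = 0.25  # probability that a child has type O blood (from the example)

# All length-5 sequences of S (success) and F (failure) with exactly two S's.
arrangements = ["".join(seq) for seq in product("SF", repeat=5) if seq.count("S") == 2]

# Every such arrangement has the same probability: two successes and three failures.
p_one_arrangement = p ** 2 * (1 - p) ** 3

# P(X = 2) is the number of arrangements times the probability of each one.
p_x_equals_2 = len(arrangements) * p_one_arrangement

print(len(arrangements), round(p_one_arrangement, 5), round(p_x_equals_2, 5))
```

The count of arrangements is the binomial coefficient "5 choose 2", which is what the n!/(x!(n - x)!) factor in the binomial probability formula supplies without listing the sequences.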

Binomial Probability Formula:
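Let
n = number of independent trials in a binomial experiment
p = constant probability that any particular trial results in a success
Then
p(x) = P(x successes among the n trials) = [n! / (x!(n - x)!)] p^x (1 - p)^(n - x), for x = 0, 1, 2, …, n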

Sixty percent of all computers sold by a large computer retailer are laptops and 40% are desktop models. The type of computer purchased by each of the next 12 customers will be recorded. Define the random variable of interest as x = the number of laptops among these 12. The binomial random variable x counts the number of laptops purchased. The purchase of a laptop is considered a success and is denoted by S. The probability distribution of x is given by
p(x) = [12! / (x!(12 - x)!)] (0.6)^x (0.4)^(12 - x), for x = 0, 1, 2, …, 12

What is the probability that exactly four of the next 12 computers sold are laptops?
p(4) = [12! / (4! 8!)] (0.6)^4 (0.4)^8 = 495(0.6)^4(0.4)^8 ≈ 0.042
If many groups of 12 purchases are examined, about 4.2% of them include exactly four laptops.

What is the probability that between four and seven (inclusive) are laptops?
P(4 ≤ x ≤ 7) = p(4) + p(5) + p(6) + p(7) ≈ 0.042 + 0.101 + 0.177 + 0.227 = 0.547

What is the probability that between four and seven (exclusive) are laptops?
P(4 < x < 7) = p(5) + p(6) ≈ 0.277
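The three laptop questions differ only in which x values are summed. Below is a minimal sketch using the standard binomial probability formula, assuming n = 12 and p = 0.6 as in the example. The helper binom_pmf is a name introduced here for illustration, and math.comb requires Python 3.8 or later.

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 12, 0.6  # 12 customers, 60% of computers sold are laptops

exactly_four = binom_pmf(4, n, p)

# "Between four and seven (inclusive)" means x = 4, 5, 6, 7.
four_to_seven_inclusive = sum(binom_pmf(x, n, p) for x in range(4, 8))

# "Between four and seven (exclusive)" means x = 5, 6 only.
four_to_seven_exclusive = sum(binom_pmf(x, n, p) for x in range(5, 7))

print(round(exactly_four, 3))
print(round(four_to_seven_inclusive, 3))
print(round(four_to_seven_exclusive, 3))
```

For a discrete variable the endpoints matter: dropping x = 4 and x = 7 roughly halves the probability here.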

Formulas for mean and standard deviation of a binomial distribution
μ_x = np
σ_x = √(np(1 - p))

Let’s revisit the computer example:
Sixty percent of all computers sold by a large computer retailer are laptops and 40% are desktop models. The type of computer purchased by each of the next 12 customers will be recorded. Define the random variable of interest as x = the number of laptops among these 12. Compute the mean and standard deviation for the binomial distribution of x.
μ_x = np = 12(0.6) = 7.2
σ_x = √(np(1 - p)) = √(12(0.6)(0.4)) = √2.88 ≈ 1.697
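One way to see that the binomial shortcut formulas agree with the general definitions for a discrete random variable is to compute the mean and standard deviation both ways. This is an illustrative sketch for n = 12 and p = 0.6. The names ending in _formula and _direct are introduced here only to distinguish the two routes.

```python
from math import comb, sqrt

n, p = 12, 0.6

# Shortcut formulas for a binomial distribution.
mean_formula = n * p
sd_formula = sqrt(n * p * (1 - p))

# General definitions applied to the full binomial distribution p(0), ..., p(12).
pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]
mean_direct = sum(x * px for x, px in enumerate(pmf))
sd_direct = sqrt(sum((x - mean_direct) ** 2 * px for x, px in enumerate(pmf)))

print(round(mean_formula, 3), round(sd_formula, 3))
print(round(mean_direct, 3), round(sd_direct, 3))
```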

Sampling Without Replacement Condition
Binomial Distributions in Statistical Sampling
The binomial distributions are important in statistics when we want to make inferences about the proportion p of successes in a population.
Suppose 10% of CDs have defective copy-protection schemes that can harm computers. A music distributor inspects an SRS of 10 CDs from a shipment of 10,000. Let X = number of defective CDs. What is P(X = 0)?
Note, this is not quite a binomial setting. Why? Because the CDs are sampled without replacement, the trials are not independent: removing one CD changes the proportion of defective CDs among those that remain. The actual probability is
P(X = 0) = (9000/10000)(8999/9999)(8998/9998) … (8991/9991) ≈ 0.3485
Using the binomial distribution,
P(X = 0) = (0.10)^0 (0.90)^10 ≈ 0.3487
In practice, the binomial distribution gives a good approximation as long as we don’t sample more than 10% of the population.
Sampling Without Replacement Condition: When taking an SRS of size n from a population of size N, we can use a binomial distribution to model the count of successes in the sample as long as n ≤ (1/10)N.
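The size of the error introduced by treating the CD sample as binomial can be checked directly. The sketch below assumes the figures in the example (10,000 CDs, 10% defective, a sample of 10) and compares the exact without-replacement probability of no defectives with the binomial value. All variable names are illustrative.

```python
N = 10_000          # shipment size
defective = 1_000   # 10% of the shipment is defective
n = 10              # sample size

# Exact probability of drawing 10 good CDs in a row without replacement:
# (9000/10000) * (8999/9999) * ... * (8991/9991)
good = N - defective
exact = 1.0
for k in range(n):
    exact *= (good - k) / (N - k)

# Binomial approximation: treat each draw as an independent trial with p = 0.10.
binomial = (1 - defective / N) ** n

print(round(exact, 4), round(binomial, 4))
print("sample is at most 10% of the population:", n <= N / 10)
```

The two values differ only in the fourth decimal place because the sample is a tiny fraction (0.1%) of the shipment.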

Computers Revisited . . . Suppose we were NOT interested in the number of laptops purchased by the next 12 customers, but which of the next customers would be the first one to purchase a laptop.

Geometric Random Variable
In a geometric setting, if we define the random variable Y to be the number of trials needed to get the first success, then Y is called a geometric random variable. The probability distribution of Y is called a geometric distribution. Definition: The number of trials Y that it takes to get a success in a geometric setting is a geometric random variable. The probability distribution of Y is a geometric distribution with parameter p, the probability of a success on any trial. The possible values of Y are 1, 2, 3, …. Note: Like binomial random variables, it is important to be able to distinguish situations in which the geometric distribution does and doesn’t apply!

Properties of a Geometric Experiment
Suppose an experiment consists of a sequence of trials with the following conditions:
1. The trials are independent.
2. Each trial can result in one of two possible outcomes, success or failure.
3. The probability of success is the same for all trials.
A geometric random variable is defined as x = number of trials until the first success is observed (including the success trial). The probability distribution of x is called the geometric probability distribution.

Suppose that 40% of students who drive to campus at your school or university carry jumper cables.
Your car has a dead battery and you don’t have jumper cables, so you decide to stop students as they are headed to the parking lot and ask them whether they have a pair of jumper cables. Let: x = the number of students stopped before finding one with a pair of jumper cables This is an example of a geometric random variable.

Geometric Probability Distribution
If x is a geometric random variable with probability of success = p for each trial, then
p(x) = (1 - p)^(x - 1) p, where x = 1, 2, 3, …

Jumper Cables Continued . . .
Let x = the number of students stopped before finding one with a pair of jumper cables. Recall that p = 0.4.
What is the probability that the third student stopped will be the first student to have jumper cables?
p(3) = (0.6)²(0.4) = 0.144
What is the probability that three or fewer students are stopped before finding one with jumper cables?
P(x ≤ 3) = p(1) + p(2) + p(3) = 0.4 + (0.6)(0.4) + (0.6)²(0.4) = 0.4 + 0.24 + 0.144 = 0.784
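The jumper-cable probabilities follow directly from the geometric formula p(x) = (1 - p)^(x - 1) p. Here is a short sketch, assuming p = 0.4 as in the example. The helper geom_pmf is a name introduced for illustration.

```python
def geom_pmf(x, p):
    """Probability that the first success occurs on trial x (x = 1, 2, 3, ...)."""
    return (1 - p) ** (x - 1) * p

p = 0.4  # proportion of students who carry jumper cables

third_is_first = geom_pmf(3, p)

# "Three or fewer" includes x = 1, 2, and 3.
three_or_fewer = sum(geom_pmf(x, p) for x in (1, 2, 3))

# Equivalent shortcut: 1 minus the probability that the first three students all fail.
shortcut = 1 - (1 - p) ** 3

print(round(third_is_first, 3), round(three_or_fewer, 3), round(shortcut, 3))
```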

Normal Distributions Standard Normal Curve
Using a Table to Calculate Probabilities Other Normal Curves

Normal Distributions . . .
are continuous, bell-shaped distributions
approximate the distributions of many different variables
are distinguished from one another by their mean μ and standard deviation σ
are used in inferential procedures
have an area under the curve equal to 1

Normal Distributions . . .

Standard Normal Distribution . . .
The standard normal distribution is the normal distribution with μ = 0 and σ = 1.

Using the Table of Standard Normal Curve Areas
For any number z* from -3.89 to 3.89, rounded to two decimal places, Appendix Table 2 gives
(area under z curve to the left of z*) = P(z < z*) = P(z ≤ z*)
where the letter z is used to represent a random variable whose distribution is the standard normal distribution.

Suppose we are interested in the probability that z is less than 1.42.
P(z < 1.42) = .9222

z*    .00    .01    .02    .03
1.3   .9032  .9049  .9066  .9082
1.4   .9192  .9207  .9222  .9236
1.5   .9332  .9345  .9357  .9370

The entry in the 1.4 row and the .02 column is the area to the left of 1.42.

Suppose we are interested in the probability that z is less than 0.58.
P(z < 0.58) = .7190

z*    .07    .08    .09
0.4   .6808  .6844  .6879
0.5   .7157  .7190  .7224
0.6   .7486  .7517  .7549

The entry in the 0.5 row and the .08 column is the area to the left of 0.58.

Find the following probability:
P(-1.76 < z < 0.58) = P(z < 0.58) - P(z < -1.76) = .7190 - .0392 = .6798

Suppose we are interested in the probability that z is greater than 2.31.
P(z > 2.31) = 1 - P(z ≤ 2.31) = 1 - .9896 = .0104

z*    .00    .01    .02    .03
2.2   .9861  .9864  .9868  .9871
2.3   .9893  .9896  .9898  .9901
2.4   .9918  .9920  .9922  .9925

Suppose we are interested in finding the z* for the smallest 2%.
P(z < z*) = .02

z*     .04    .05    .06
-2.1   .0162  .0158  .0154
-2.0   .0207  .0202  .0197
-1.9   .0262  .0256  .0250

The table entry closest to .02 is .0202, in the -2.0 row and the .05 column, so z* = -2.05.
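The table lookups above can be cross-checked in software. The sketch below uses NormalDist from Python's standard statistics module (available in Python 3.8 and later) as a stand-in for the printed table; the variable names are illustrative. Small differences in the last digit relative to a four-decimal table are expected.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

left_of_142 = z.cdf(1.42)                  # area to the left of 1.42
left_of_058 = z.cdf(0.58)                  # area to the left of 0.58
between = z.cdf(0.58) - z.cdf(-1.76)       # area between -1.76 and 0.58
right_of_231 = 1 - z.cdf(2.31)             # area to the right of 2.31

# Working backward: the z* with area .02 to its left (the smallest 2%).
z_star = z.inv_cdf(0.02)

print(round(left_of_142, 4), round(left_of_058, 4))
print(round(between, 4), round(right_of_231, 4))
print(round(z_star, 2))
```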

Normal Distribution Calculations
How to Solve Problems Involving Normal Distributions State: Plan: Do: Conclude:

Finding Probabilities for Other Normal Curves
To find the probabilities for other normal curves, standardize the relevant values and then use the table of z areas. If x is a random variable whose behavior is described by a normal distribution with mean μ and standard deviation σ, then
P(x < b) = P(z < b*)
P(x > a) = P(z > a*)
P(a < x < b) = P(a* < z < b*)
where z is a variable whose distribution is standard normal and
a* = (a - μ)/σ and b* = (b - μ)/σ

Data on the length of time to complete registration for classes using an on-line registration system suggest that the distribution of the variable x = time to register for students at a particular university can well be approximated by a normal distribution with mean μ = 12 minutes and standard deviation σ = 2 minutes. What is the probability that it will take a randomly selected student less than 9 minutes to complete registration?
P(x < 9) = P(z < (9 - 12)/2) = P(z < -1.50) = .0668

Registration Problem Continued . . .
x = time to register, μ = 12 minutes and σ = 2 minutes
What is the probability that it will take a randomly selected student more than 13 minutes to complete registration?
P(x > 13) = P(z > (13 - 12)/2) = P(z > 0.50) = 1 - .6915 = .3085

Registration Problem Continued . . .
x = time to register, μ = 12 minutes and σ = 2 minutes
What is the probability that it will take a randomly selected student between 7 and 15 minutes to complete registration?
P(7 < x < 15) = P((7 - 12)/2 < z < (15 - 12)/2) = P(-2.50 < z < 1.50) = .9332 - .0062 = .9270
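The standardize-then-look-up procedure for the registration problem can be sketched in a few lines. The code assumes the stated mean of 12 minutes and standard deviation of 2 minutes and uses NormalDist from the standard statistics module (Python 3.8 and later) in place of the z table. The helper standardize and the variable names are illustrative.

```python
from statistics import NormalDist

MU, SIGMA = 12, 2          # registration time: mean and standard deviation in minutes
z = NormalDist()           # standard normal distribution

def standardize(value, mu=MU, sigma=SIGMA):
    """Convert an x value to a z value: (x - mean) / standard deviation."""
    return (value - mu) / sigma

less_than_9 = z.cdf(standardize(9))                          # P(x < 9)  = P(z < -1.5)
more_than_13 = 1 - z.cdf(standardize(13))                    # P(x > 13) = P(z > 0.5)
between_7_and_15 = z.cdf(standardize(15)) - z.cdf(standardize(7))  # P(-2.5 < z < 1.5)

print(round(less_than_9, 4), round(more_than_13, 4), round(between_7_and_15, 4))
```

Calling NormalDist(mu=12, sigma=2).cdf(9) directly would give the same result for the first question; the explicit standardize step is kept here only to mirror the hand calculation.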

Normal Approximation for Binomial Distributions
As n gets larger, something interesting happens to the shape of a binomial distribution. The figures below show histograms of binomial distributions for different values of n and p. What do you notice as n gets larger?
Suppose that X has the binomial distribution with n trials and success probability p. When n is large, the distribution of X is approximately Normal with mean μ_X = np and standard deviation σ_X = √(np(1 - p)).
As a rule of thumb, we will use the Normal approximation when n is so large that np ≥ 10 and n(1 - p) ≥ 10. That is, the expected numbers of successes and failures are both at least 10.

Example: Attitudes Toward Shopping
Sample surveys show that fewer people enjoy shopping than in the past. A survey asked a nationwide random sample of 2500 adults if they agreed or disagreed that “I like buying new clothes, but shopping is often frustrating and time-consuming.” Suppose that exactly 60% of all adult US residents would say “Agree” if asked the same question. Let X = the number in the sample who agree. Estimate the probability that 1520 or more of the sample agree.
1) Verify that X is approximately a binomial random variable.
B: Success = agree, Failure = don’t agree
I: Because the population of U.S. adults is greater than 25,000, it is reasonable to assume the sampling without replacement condition is met.
N: n = 2500 trials of the chance process
S: The probability of selecting an adult who agrees is p = 0.60
2) Check the conditions for using a Normal approximation.
Since np = 2500(0.60) = 1500 and n(1 - p) = 2500(0.40) = 1000 are both at least 10, we may use the Normal approximation.
3) Calculate P(X ≥ 1520) using a Normal approximation.
μ = np = 1500 and σ = √(2500(0.60)(0.40)) = √600 ≈ 24.49
P(X ≥ 1520) = P(z ≥ (1520 - 1500)/24.49) = P(z ≥ 0.82) = 1 - .7939 = .2061
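The three steps of the shopping example (check the rule of thumb, compute the mean and standard deviation, then find the normal area) can be sketched as follows. The code assumes n = 2500 and p = 0.60 from the example and applies no continuity correction, which matches a plain normal approximation. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 2500, 0.60

# Rule of thumb: expected successes and expected failures are both at least 10.
conditions_ok = n * p >= 10 and n * (1 - p) >= 10

# Mean and standard deviation of the approximating normal distribution.
mean = n * p
sd = sqrt(n * p * (1 - p))

# Standardize 1520 and take the area to its right under the standard normal curve.
z_value = (1520 - mean) / sd
prob_at_least_1520 = 1 - NormalDist().cdf(z_value)

print(conditions_ok, round(mean), round(sd, 2))
print(round(z_value, 2), round(prob_at_least_1520, 3))
```

Because the code keeps z unrounded (about 0.8165), its result is close to .207, whereas a table lookup with z rounded to 0.82 gives about .206. The difference comes only from rounding.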

Ways to Assess Normality
Normal Probability Plot Using Correlation Coefficient

Assessing Normality
The Normal Distribution provides a good model for some distributions of real data. Many statistical inference procedures are based on the assumption that the population is approximately Normal. So we need a strategy to assess normality.
A. Plot the data. Make a dotplot, stemplot, or histogram and see if the graph is approximately symmetric and bell-shaped.
B. Check whether the data follow the Empirical Rule. Count how many observations fall within one, two, and three standard deviations of the mean and check to see if these percents are close to 68%, 95%, and 99.7%.

Normal Probability Plot
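A normal probability plot is a scatterplot of the (normal score, observed value) pairs. A strong linear pattern in a normal probability plot suggests that population normality is plausible. On the other hand, systematic departure from a straight-line pattern (such as curvature in the plot) casts doubt on the assumption of a normal population distribution.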

The following data represent egg weights (in grams) for a sample of 10 eggs.

Using the Correlation Coefficient to Assess Normality
The correlation coefficient, r, can be calculated for the n (normal score, observed value) pairs. If r is too much smaller than 1, then normality of the underlying distribution is questionable.
Consider these points from the weight of eggs data:
(-1.539, 52.53) (-1.001, 52.66) (-.656, 52.86) (-.376, 53.00) (-.123, 53.04)
(.123, 53.07) (.376, 53.16) (.656, 53.23) (1.001, 53.26) (1.539, 53.50)
Calculate the correlation coefficient for these points.
r = .986

Values to Which r Can be Compared to Check for Normality
n           5     10    15    20    25    30    40    50    60    75
Critical r  .832  .880  .911  .929  .941  .949  .960  .966  .971  .976

Since r = .986 is larger than the critical value .880 for n = 10, it is plausible that the egg weights come from a normal distribution.
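The correlation coefficient for the egg-weight points can be computed from its definition. This is a minimal sketch using the ten (normal score, observed value) pairs listed in the example and the critical value for n = 10 from the table. The names pairs, CRITICAL_R_N10, and normality_plausible are illustrative.

```python
from math import sqrt

# (normal score, observed egg weight in grams) pairs from the example
pairs = [
    (-1.539, 52.53), (-1.001, 52.66), (-0.656, 52.86), (-0.376, 53.00), (-0.123, 53.04),
    (0.123, 53.07), (0.376, 53.16), (0.656, 53.23), (1.001, 53.26), (1.539, 53.50),
]

n = len(pairs)
mean_score = sum(s for s, _ in pairs) / n
mean_weight = sum(w for _, w in pairs) / n

# Sums of cross-products and squared deviations about the means.
s_sw = sum((s - mean_score) * (w - mean_weight) for s, w in pairs)
s_ss = sum((s - mean_score) ** 2 for s, _ in pairs)
s_ww = sum((w - mean_weight) ** 2 for _, w in pairs)

# Pearson correlation coefficient.
r = s_sw / sqrt(s_ss * s_ww)

CRITICAL_R_N10 = 0.880  # critical r for n = 10, from the table in the example
normality_plausible = r >= CRITICAL_R_N10

print(round(r, 3), normality_plausible)
```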
