Random Variables & Probability Distributions

Slides:



Advertisements
Similar presentations
Chapter 7 Discrete Distributions. Random Variable - A numerical variable whose value depends on the outcome of a chance experiment.
Advertisements

Note 7 of 5E Statistics with Economics and Business Applications Chapter 5 The Normal and Other Continuous Probability Distributions Normal Probability.
Linear Transformation and Statistical Estimation and the Law of Large Numbers Target Goal: I can describe the effects of transforming a random variable.
Chapter 6 The Normal Distribution and Other Continuous Distributions
12.3 – Measures of Dispersion
Chapter 7 Continuous Distributions. Continuous random variables Are numerical variables whose values fall within a range or interval Are measurements.
8.5 Normal Distributions We have seen that the histogram for a binomial distribution with n = 20 trials and p = 0.50 was shaped like a bell if we join.
Random Variables and Probability Distributions
Chapter 6: Probability Distributions
Chapter 7 Random Variables & Probability Distributions.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 1 PROBABILITIES FOR CONTINUOUS RANDOM VARIABLES THE NORMAL DISTRIBUTION CHAPTER 8_B.
Sullivan – Fundamentals of Statistics – 2 nd Edition – Chapter 11 Section 1 – Slide 1 of 34 Chapter 11 Section 1 Random Variables.
Chapter 7 Lesson 7.5 Random Variables and Probability Distributions
1 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Chapter 7 Random Variables & Probability Distributions.
Chapter 7 Continuous Distributions. Continuous random variables Are numerical variables whose values fall within a range or interval Are measurements.
DISCRETE PROBABILITY DISTRIBUTIONS
The Practice of Statistics Third Edition Chapter 8: The Binomial and Geometric Distributions 8.1 The Binomial Distribution Copyright © 2008 by W. H. Freeman.
Normal Probability Distributions Chapter 5. § 5.1 Introduction to Normal Distributions and the Standard Distribution.
Chapter 7 Lesson 7.3 Random Variables and Probability Distributions 7.3 Probability Distributions for Continuous Random Variables.
Random Variable The outcome of an experiment need not be a number, for example, the outcome when a coin is tossed can be 'heads' or 'tails'. However, we.
Probability Distributions. Statistical Experiments – any process by which measurements are obtained. A quantitative variable x, is a random variable if.
Basic Business Statistics
Continuous Distributions. Continuous random variables Are numerical variables whose values fall within a range or interval Are measurements Can be described.
Chapter 7 Random Variables and Discrete Distributions.
Chapter 7 Random Variables and Continuous Distributions.
Theoretical distributions: the Normal distribution.
Random Variables and Probability Distribution How can a sampling be used to determine critical information relating to the population when considering.
Chapter 6 The Normal Distribution and Other Continuous Distributions
13-5 The Normal Distribution
Discrete Distributions
Normal Probability Distributions
Normal Probability Distributions Chapter 5. § 5.1 Introduction to Normal Distributions and the Standard Distribution.
Continuous Distributions
Normal Probability Distributions
Random Variables and Probability Distributions
Chapter 7 Lesson 7.5 Random Variables and Probability Distributions
Section 7.3: Probability Distributions for Continuous Random Variables
Binomial and Geometric Random Variables
Normal Probability Distributions
Analysis of Economic Data
Continuous Distributions
Chapter 6. Continuous Random Variables
The Normal Distribution
Discrete Distributions
STAT 206: Chapter 6 Normal Distribution.
Chapter 5 Sampling Distributions
Chapter 5 Sampling Distributions
Chapter 6: Random Variables
The Normal Probability Distribution
Chapter 16.
Continuous Distributions
Chapter 5 Sampling Distributions
The normal distribution
6.1: Discrete and Continuous Random Variables
Random Variables Binomial Distributions
CONTINUOUS RANDOM VARIABLES AND THE NORMAL DISTRIBUTION
Discrete Distributions
8.1 The Binomial Distribution
Continuous Random Variables
Sampling Distributions
Discrete Distributions
Chapter 6: Random Variables
Continuous Distributions
Discrete Distributions.
Lecture 12: Normal Distribution
Discrete Distributions
The Normal Distribution
Chapter 5: Sampling Distributions
Presentation transcript:

Random Variables & Probability Distributions Chapter 7 Random Variables & Probability Distributions

Section 7.1 Random Variables

Definition of Random Variable A numerical variable whose value depends on the outcome of a chance experiment Associates a numerical value with each outcome of a chance experiment Two types of random variables Discrete Continuous

Examples of Random Variables A grocery store manager might be interested in the number of broken eggs in each carton (dozen of eggs). An environmental scientist might be interested in the amount of ozone in an air sample. Since these values change and are subject to some uncertainty, these are examples of random variables.

Two Types of Random Variables: Discrete – Its set of possible values is a collection of isolated points along a number line. This is typically a count of something. Continuous -- Its set of possible values includes an entire interval on a number line. This is typically a measure of something.

Examples Experiment: A fair die is rolled Random Variable: The number on the up face Type: Discrete Experiment: A pair of fair dice are rolled Random Variable: The sum of the up faces Random Variable: The smaller of the up faces

Examples 3. Experiment: A coin is tossed until the 1st head turns up Random Variable: The number of the toss that the 1st head turns up Type: Discrete Experiment: Choose and inspect a number of parts Random Variable: The number of defective parts

Examples Experiment: Measure the voltage in an outlet in your room Random Variable: The voltage Type: Continuous 6. Experiment: Observe the amount of time it takes a bank teller to serve a customer Random Variable: The time

Examples 7. Experiment: Measure the time until the next customer arrives at a customer service window Random Variable: The time Type: Continuous 8. Experiment: Inspect a randomly chosen circuit board from a production line Random Variable: 1, if the circuit board is defective 0, if the circuit board is not defective Type: Discrete

Probability Distributions for Discrete Random Variables Section 7.2 Probability Distributions for Discrete Random Variables

probability distribution--a model that describes the long-run behavior of a variable.

In Wolf City (a fictional place), regulations prohibit no more than five dogs or cats per household. Let x = the number of dogs and cats in a randomly selected household in Wolf City x 0 1 2 3 4 5 P(x) .26 .31 . 21 .13 .06 .03 This is called a discrete probability distribution. It can also be displayed in a histogram with the probability on the vertical axis. What do you notice about the sum of these probabilities? Is this variable discrete or continuous? What are the possible values for x? The Department of Animal Control has collected data over the course of several years. They have estimated the long-run probabilities for the values of x. Number of Pets Probability

Discrete Probability Distribution Gives the probabilities associated with each possible x value Each probability is the long-run relative frequency of occurrence of the corresponding x-value when the chance experiment is performed a very large number of times Usually displayed in a table, but can be displayed with a histogram or formula

Properties of Discrete Probability Distributions 1) For every possible x value, 0 < P(x) < 1 2) For all values of x, P(S) = P(x1) + P(x2) + ... + P(xk) = 1

Just add the probabilities for 0, 1, and 2 Dogs and Cats Revisited . . . Let x = the number of dogs or cats per household in Wolf City x 0 1 2 3 4 5 P(x) .26 .31 .21 .13 .06 .03 What is the probability that a randomly selected household in Wolf City has at most 2 pets? Just add the probabilities for 0, 1, and 2 What does this mean? P(x < 2) = .26 + .31 + .21 = .78

Notice that this probability does NOT include 2! Dogs and Cats Revisited . . . Let x = the number of dogs or cats per household in Wolf City x 0 1 2 3 4 5 P(x) .26 .31 .21 .13 .06 .03 What is the probability that a randomly selected household in Wolf City has less than 2 pets? Notice that this probability does NOT include 2! What does this mean? P(x < 2) = .26 + .31 = .57

Dogs and Cats Revisited . . . Let x = the number of dogs or cats per household in Wolf City x 0 1 2 3 4 5 P(x) .26 .31 .21 .13 .06 .03 What is the probability that a randomly selected household in Wolf City has more than 1 but no more than 4 pets? When calculating probabilities for discrete random variables, you MUST pay close attention to whether certain values are included (< or >) or not included (< or >) in the calculation. What does this mean? P(1 < x < 4) = .21 + .13 + .06 = .40

Salesman Example x 1 2 3 4 5 6 p(x) 0.20 0.35 0.15 0.12 0.10 0.05 0.03 The number of items a given salesman sells per customer is x. The probability distribution of X is given below: x 1 2 3 4 5 6 p(x) 0.20 0.35 0.15 0.12 0.10 0.05 0.03

Salesman - continued x 1 2 3 4 5 6 p(x) 0.20 0.35 0.15 0.12 0.10 0.05 1 2 3 4 5 6 p(x) 0.20 0.35 0.15 0.12 0.10 0.05 0.03 The probability that he sells at least three items to a randomly selected customer is P(X ≥ 3) = 0.12 + 0.10 + 0.05 + 0.03 = 0.30 The probability that he sells at most three items to a randomly selected customer is P(X ≤ 3) = 0.20 + 0.35 +0.15 + 0.12 = 0.82 The probability that he sells between 2 and 4 (inclusive) items to a randomly selected customer is P(2 ≤ X ≤ 4) = 0.15 + 0.12 + 0.10 = 0.37

Probability Histogram A probability histogram has its vertical scale adjusted in a manner that makes the area associated with each bar equal to the probability of the event that the random variable takes on the value describing the bar.

Probability Distributions for Continuous Random Variables Section 7.3 Probability Distributions for Continuous Random Variables

Probability Distribution for a Continuous Random Variable A probability distribution for a continuous random variable x is specified by a mathematical function denoted by f(x) which is called the density function. The graph of a density function is a smooth curve called the density curve. The probability that x falls in any particular interval is the area under the density curve that lies above the interval.

Properties of continuous probability distributions 1. f(x) > 0 (the curve cannot dip below the horizontal axis) 2. The total area under the density curve equals one.

Consider the random variable: x = the weight (in pounds) of a full-term newborn child Suppose that weight is reported to the nearest pound. The following probability histogram displays the distribution of weights. Now suppose that weight is reported to the nearest 0.1 pound. This would be the probability histogram. What type of variable is this? If weight is measured with greater and greater accuracy, the histogram approaches a smooth curve. What is the sum of the areas of all the rectangles? The area of the rectangle centered over 7 pounds represents the probability 6.5 < x < 7.5 This is an example of a density curve. Notice that the rectangles are narrower and the histogram begins to have a smoother appearance. The shaded area represents the probability 6 < x < 8.

Let x denote the amount of gravel sold (in tons) during a randomly selected week at a particular sales facility. Suppose that the density curve has a height f(x) above the value x, where The density curve is shown in the figure:

P(x < ½) = 1 – ½(0.5)(1) = .75 Gravel problem continued . . . What is the probability that at most ½ ton of gravel is sold during a randomly selected week? This area can be found by using the formula for the area of a trapezoid: P(x < ½) = 1 – ½(0.5)(1) = .75 The probability would be the shaded area under the curve and above the interval from 0 to 0.5. Or, more easily, by finding the area of the triangle, and subtracting that area from 1.

P(x = ½) = Gravel problem continued . . . What is the probability that exactly ½ ton of gravel is sold during a randomly selected week? P(x = ½) = How do we find the area of a line segment? The probability would be the area under the curve and above 0.5. Since a line segment has NO area, then the probability that exactly ½ ton is sold equals 0.

Does the probability change whether the ½ is included or not? Gravel problem continued . . . What is the probability that less than ½ ton of gravel is sold during a randomly selected week? P(x < ½) = P(x < ½) = 1 – ½(0.5)(1) = .75 Hmmm . . . This is different than discrete probability distributions where it does change the probability whether a value is included or not! Does the probability change whether the ½ is included or not?

The following is the graph of f(x), the density curve: Suppose x is a continuous random variable defined as the amount of time (in minutes) taken by a clerk to process a certain type of application form. Suppose x has a probability distribution with density function: The following is the graph of f(x), the density curve: Time (in minutes) Density

P(x > 5.5) = .5(.5) = .25 Application Problem Continued . . . What is the probability that it takes more than 5.5 minutes to process the application form? P(x > 5.5) = .5(.5) = .25 When the density is constant over an interval (resulting in a horizontal density curve), the probability distribution is called a uniform distribution. Find the probability by calculating the area of the shaded region (base × height). Time (in minutes) Density

Other Density Curves Some density curves resemble the one below. Integral calculus is used to find the area under the these curves. Don’t worry – we will use tables (with the values already calculated). We can also use calculators or statistical software to find the area.

Mean & Standard Deviation of a Random Variable Section 7.4 Mean & Standard Deviation of a Random Variable

Means and Standard Deviations of Probability Distributions The mean value of a random variable x, denoted by mx, describes where the probability distribution of x is centered. The standard deviation of a random variable x, denoted by sx, describes variability in the probability distribution. When sx is small, observed values of x will tend to be close to the mean value and When sx is large, there will be more variability in observed values.

Illustrations Two distributions with the same standard deviation with different means. Larger mean

Illustrations Two distributions with the same means and different standard deviation. Smaller standard deviation

Mean and Variance for Discrete Probability Distributions Mean is sometimes referred to as the expected value (denoted E(x)). Variance is calculated using Standard deviation is the square root of the variance.

Calculating Mean A professor regularly gives multiple choice quizzes with 5 questions. Over time, he has found the distribution of the number of wrong answers on his quizzes is as follows

Calculating Mean Multiply each x value by its probability and add the results to get mx. mx = 1.41

Calculating Variance & Standard Deviation

mx = 1.51 pets Dogs and Cats Revisited . . . Let x = the number of dogs and cats in a randomly selected household in Wolf City x 0 1 2 3 4 5 P(x) .26 .31 .21 .13 .06 .03 What is the mean number of pets per household in Wolf City? xP(x) 0 .31 .42 .39 .24 .15 First multiply each x-value times its corresponding probability. Next find the sum of these values. mx = 1.51 pets

Next multiply by the corresponding probability. Then add these values. Dogs and Cats Revisited . . . Let x = the number of dogs or cats per household in Wolf City x 0 1 2 3 4 5 P(x) .26 .31 .21 .13 .06 .03 What is the standard deviation of the number of pets per household in Wolf City? This is the variance – take the square root of this value to find the standard deviation. Next multiply by the corresponding probability. Then add these values. First find the deviation of each x-value from the mean. Then square these deviations. sx2 = (0-1.51)2(.26) + (1-1.51)2(.31) + (2-1.51)2(.21) + (3-1.51)2(.13) + (4-1.51)2(.06) + (5-1.51)2(.03) = 1.7499 sx = 1.323 pets

Mean and Variance for Continuous Random Variables For continuous probability distributions, mx and sx can be defined and computed using methods from calculus. The mean value mx locates the center of the continuous distribution. The standard deviation, sx, measures the extent to which the continuous distribution spreads out around mx.

mx = 4650 pounds/inch2 sx = 200 pounds/inch2 A company receives concrete of a certain type from two different suppliers. Let x = compression strength of a randomly selected batch from Supplier 1 y = compression strength of a randomly selected batch from Supplier 2 Suppose that mx = 4650 pounds/inch2 sx = 200 pounds/inch2 my = 4500 pounds/inch2 sy = 275 pounds/inch2 The first supplier is preferred to the second both in terms of mean value and variability. 4500 4300 4700 4900 my mx

The mean and standard deviation of the monthly salaries are What would happen to the mean and standard deviation if we had to deduct $100 from everyone’s salary because of business being bad? Suppose Wolf City Grocery had a total of 14 employees. The following are the monthly salaries of all the employees. The mean and standard deviation of the monthly salaries are mx = $1700 and sx = $603.56 Suppose business is really good, so the manager gives everyone a $100 raise per month. The new mean and standard deviation would be mx = $1800 and sx = $603.56 3500 1200 1900 1400 2100 1800 1300 1500 1700 2300 Let’s graph boxplots of these monthly salaries to see what happens to the distributions . . . What happened to the standard deviations? What happened to the means? We see that the distribution just shifts to the right 100 units but the spread is the same.

Notice that both the mean and standard deviation increased by 1.2. Wolf City Grocery Continued . . . mx = $1700 and sx = $603.56 Suppose the manager gives everyone a 20% raise - the new mean and standard deviation would be mx = $2040 and sx = $724.27 Let’s graph boxplots of these monthly salaries to see what happens to the distributions . . . Notice that multiplying by a constant stretches the distribution, thus, changing the standard deviation. Notice that both the mean and standard deviation increased by 1.2.

Mean and Standard Deviation of Linear functions If x is a random variable with mean, mx, and standard deviation, sx, and a and b are numerical constants, and the random variable y is defined by and

my = 50 + 1.8(318) = $622.40 What is the equation for y? Consider the chance experiment in which a customer of a propane gas company is randomly selected. Let x be the number of gallons required to fill a propane tank. Suppose that the mean and standard deviation is 318 gallons and 42 gallons, respectively. The company is considering the pricing model of a service charge of $50 plus $1.80 per gallon. Let y be the random variable of the amount billed. What is the equation for y? What are the mean and standard deviation for the amount billed? y = 50 + 1.8x my = 50 + 1.8(318) = $622.40 sy = 1.8(42) = $75.60

Sales Example Suppose x is the number of sales staff needed on a given day. If the cost of doing business on a day involves fixed costs of $255 and the cost per sales person per day is $110, find the mean cost (the mean of x or mx) of doing business on a given day where the distribution of x is given below. What is the equation for y?

Sales Example continued We need to find the mean of y = 255 + 110x

Sales Example continued We need to find the variance and standard deviation of y = 255 + 110x

List all the possible sums (A + B). 2 3 4 5 6 7 3 4 5 6 7 8 Find the mean and standard deviation for these sums. ? Move 1 s Suppose we are going to play a game called Stat Land! Players spin the two spinners below and move the sum of the two numbers. mA = 2.5 mB = 3.5 sA = 1.118 sB = 1.708 List all the possible sums (A + B). 2 3 4 5 6 7 3 4 5 6 7 8 4 5 6 7 8 9 5 6 7 8 9 10 1 2 3 4 Spinner A 5 6 Spinner B Not sure – let’s think about it and return in just a few minutes! Here are the mean and standard deviation for each spinner. Notice that the mean of the sums is the sum of the means! How are the standard deviations related? mA+B = 6 sA+B =2.041

List all the possible differences (B - A). ? Move 1 s Stat Land Continued . . . Suppose one variation of the game had players move the difference of the spinners. mA = 2.5 mB = 3.5 sA = 1.118 sB = 1.708 List all the possible differences (B - A). 0 -1 -2 -3 1 0 -1 -2 2 1 0 -1 3 2 1 0 4 3 2 1 5 4 3 2 1 2 3 4 Spinner A 5 6 Spinner B Find the mean and standard deviation for these differences. How do we find the standard deviation for the sums or differences? Notice that the mean of the differences is the difference of the means! WOW – this is the same value as the standard deviation of the sums! mB-A= 1 sB-A =2.041

Means and Variances for Linear Combinations If x1, x2,  , xn are random variables and a1, a2,  , an are numerical constants, the random variable y defined as y = a1x1 + a2x2 +  + anxn is a linear combination of the xi’s.

Mean and Standard Deviations for Linear Combinations If x1, x2, …, xn are random variables with means m1, m2, …, mn and variances s12, s22, …, sn2, respectively, and y = a1x1 + a2x2 + … + anxn then This result is true ONLY if the x’s are independent. This result is true regardless of whether the x’s are independent.

A commuter airline flies small planes between San Luis Obispo and San Francisco. For small planes the baggage weight is a concern. Suppose it is known that the variable x = weight (in pounds) of baggage checked by a randomly selected passenger has a mean and standard deviation of 42 and 16, respectively. Consider a flight on which 10 passengers, all traveling alone, are flying. The total weight of checked baggage, y, is y = x1 + x2 + … + x10

mx = m1 + m2 + … + m10 = 42 + 42 + … + 42 = 420 pounds Airline Problem Continued . . . mx = 42 and sx = 16 The total weight of checked baggage, y, is y = x1 + x2 + … + x10 What is the mean total weight of the checked baggage? mx = m1 + m2 + … + m10 = 42 + 42 + … + 42 = 420 pounds

To find the standard deviation, take the square root of this value. Airline Problem Continued . . . mx = 42 and sx = 16 The total weight of checked baggage, y, is y = x1 + x2 + … + x10 What is the standard deviation of the total weight of the checked baggage? Since the 10 passengers are all traveling alone, it is reasonable to think that the 10 baggage weights are unrelated and therefore independent. To find the standard deviation, take the square root of this value. sx2 = sx12 + sx22 + … + sx102 = 162 + 162 + … + 162 = 2560 pounds s = 50.596 pounds

Cereal Example Suppose “1 lb” boxes of Sugar Treats cereal have a weight distribution with a mean  T=1.050 lbs and standard deviation s T=0.051 lbs and “1 lb” boxes of Sour Balls cereal have a weight distribution with a mean B=1.090 lbs and standard deviation aB=0.087 lbs. If a promotion is held where the customer is sold a shrink wrapped package containing “1 lb” boxes of both Sugar Treats and Sour Balls cereals, what is the mean and standard deviation for the distribution of promotional packages?

Cereal Example continued Combining these values we get Combining these values we get

Fruit Example A distributor of fruit baskets is going to put 4 apples, 6 oranges and 2 bunches of grapes in his small gift basket. The weights, in ounces, of these items are the random variables x1, x2 and x3 respectively with means and standard deviations as given in the following table. Apples Oranges Grapes Mean m 8 10 7 Standard deviation s 0.9 1.1 2 Find the mean, variance and standard deviation of the random variable y = weight of fruit in a small gift basket.

Fruit Example continued It is reasonable in this case to assume that the weights of the different types of fruit are independent. Apples Oranges Mean m 8 10 7 Standard deviation s 0.9 1.1 2 Grapes

Binomial & Geometric Distributions Section 7.5 Binomial & Geometric Distributions

Special Distributions Two Discrete Distributions: Binomial Geometric One Continuous Distribution: Normal Distributions

These questions can be answered using a binomial distribution. Suppose we decide to record the gender of the next 25 newborns at a particular hospital. What is the chance that at least 15 are female? What is the chance that between 10 and 15 are female? Out of the 25 newborns, how many can we expect to be female? These questions can be answered using a binomial distribution.

Properties of a Binomial Experiment There are a fixed number of trials Each trial results in one of two mutually exclusive outcomes. (success/failure) Outcomes of different trials are independent The probability that a trial results in success is the same for all trials The binomial random variable x is defined as x = the number of successes observed when a binomial experiment is performed We use n to denote the fixed number of trials.

Are these binomial distributions? Toss a coin 10 times and count the number of heads Yes Deal 10 cards from a shuffled deck and count the number of red cards No, probability does not remain constant The number of tickets sold to children under 12 at a movie theater in a one hour period No, no fixed number

Binomial Probability Formula: Let n = number of independent trials in a binomial experiment p = constant probability that any trial results in a success Where: Appendix Table 9 can be used to find binomial probabilities. Technology, such as calculators and statistical software, will also perform this calculation.

Is this a binomial experiment? Instead of recording the gender of the next 25 newborns at a particular hospital, let’s record the gender of the next 5 newborns at this hospital. Is this a binomial experiment? Yes, if the births were not multiple births (twins, etc). Define the random variable of interest. x = the number of females born out of the next 5 births What are the possible values of x? x 0 1 2 3 4 5 What is the probability of “success”? What will the largest value of the binomial random variable be? Will a binomial random variable always include the value of 0?

Newborns Continued . . . What is the probability that exactly 2 girls will be born out of the next 5 births? What is the probability that less than 2 girls will be born out of the next 5 births?

What is the mean number of girls born in the next five births? Newborns Continued . . . Let’s construct the discrete probability distribution table for this binomial random variable: What is the mean number of girls born in the next five births? x 1 2 3 4 5 p(x) .03125 .15625 .3125 Notice that this is the same as multiplying n × p Since this is a discrete distribution, we could use: mx = 0(.03125) + 1(.15625) + 2(.3125) + 3(.3125) + 4(.15625) + 5(.03125) =2.5

Jury Example The adult population of a large urban area is 60% African-American. If a jury of 12 is randomly selected from the adults in this area, what is the probability that precisely 7 jurors are African-American? Clearly, n=12 and p = 0.60, so 0.2270 ) 4 (. 6 ! 5 7 12 ( p =

Jury Example - continued The adult population of a large urban area is 60% African-American. If a jury of 12 is randomly selected from the adults in this area, what is the probability that less than 3 are African-American? Clearly, n = 12 and p = 0.60, so P(x < 3) = P(0) + P(1) + P(2)

Solicitation Example On the average, 1 out of 19 people will respond favorably to a certain telephone solicitation. If 25 people are called, a) What is the probability that exactly two will respond favorably to this sales pitch? n = 25, p = 1/19

Solicitation Example continued On average, 1 out of 19 people will respond favorably to a certain telephone sales pitch. If 25 people are called, b) What is the probability that at least two will respond favorably to this sales pitch? = 1 – 0.2588 – 0.3595 = 0.3817

Formulas for mean and standard deviation of a binomial distribution

Quiz Example A professor routinely gives quizzes containing 50 multiple choice questions with 4 possible answers, only one being correct. Occasionally he just hands the students an answer sheet without giving them the questions and asks them to guess the correct answers. Let x be a random variable defined by x = number of correct answers on such an exam Find the mean and standard deviation for x

Quiz Example - solution The random variable is clearly binomial with n = 50 and p = ¼. The mean and standard deviation of x are

Newborns Continued . . . How many girls would you expect in the next five births at a particular hospital? What is the standard deviation of the number of girls born in the next five births?

Remember, in binomial distributions, trials should be independent. However, when we sample, we typically sample without replacement, which would mean that the trials are not independent. . . In this case, the number of successes observed would not be a binomial distribution but rather hypergeometric distribution. But when the sample size, n, is small and the population size, N, is large, probabilities calculated using binomial distributions and hypergeometric distributions are VERY close! When sampling without replacement if n is at most 5% of N, then the binomial distribution gives a good approximation to the probability distribution of x. The calculation for probabilities in a hypergeometric distribution are even more tedious than the binomial formula!

How is this question different from a binomial distribution? Newborns Revisited . . . Suppose we were not interested in the number of females born out of the next five births, but which birth would result in the first female being born? How is this question different from a binomial distribution?

Properties of Geometric Distributions: There are two mutually exclusive outcomes that result in a success or failure Each trial is independent of the others The probability of success is the same for all trials. A geometric random variable x is defined as x = the number of trials UNTIL the FIRST success is observed ( including the success). x 1 2 3 4 So what are the possible values of x To infinity How far will this go? . . .

Probability Formula for the Geometric Distribution Let p = constant probability that any trial results in a success Where x = 1, 2, 3, …

Suppose that 40% of students who drive to campus at your school or university carry jumper cables. Your car has a dead battery and you don’t have jumper cables, so you decide to stop students as they are headed to the parking lot and ask them whether they have a pair of jumper cables. Let x = the number of students stopped before finding one with a pair of jumper cables Is this a geometric distribution? Yes

Jumper Cables Continued . . . Let x = the number of students stopped before finding one with a pair of jumper cables p = .4 What is the probability that third student stopped will be the first student to have jumper cables? What is the probability that at most three students are stopped before finding one with jumper cables? P(x = 3) = (.6)2(.4) = .144 P(x < 3) = P(1) + P(2) + P(3) = (.6)0(.4) + (.6)1(.4) + (.6)2(.4) = .784

Deposit Example Over a very long period of time, it has been noted that on Fridays 25% of the customers at the drive-in window at the bank make deposits. What is the probability that it takes 4 customers at the drive-in window before the first one makes a deposit?

Deposit Example - Solution This problem is a geometric distribution problem with p = 0.25. Let x = number of customers at the drive-in window before a customer makes a deposit. The desired probability is P(4) = (0.75)4-1(0.25) = 0.1055

Section 7-6 Normal Distributions

Normal Distributions Continuous probability distribution Symmetrical bell-shaped (unimodal) density curve defined by m and s (mean & standard deviation) Area under the curve equals 1 Probability of observing a value in a particular interval is calculated by finding the area under the curve As s increases, the curve flattens & spreads out As s decreases, the curve gets taller and thinner

A B 6 s s Do these two normal curves have the same mean? If so, what is it? Which normal curve has a standard deviation of 3? Which normal curve has a standard deviation of 1? YES B A

Standard Normal Distribution Is a normal distribution with m = 0 and s = 1 It is customary to use the letter z to represent a variable whose distribution is described by the standard normal curve (or z curve).

Using the Table of Standard Normal (z) Curve Areas For any number z*, from -3.89 to 3.89 and rounded to two decimal places, the Appendix Table 2 gives the area under the z curve and to the left of z*. P(z < z*) = P(z < z*) Where the letter z is used to represent a random variable whose distribution is the standard normal distribution. To use the table: Find the correct row and column (see the following example) The number at the intersection of that row and column is the probability

Normal Tables pg. 762

Normal Tables pg. 763

Suppose we are interested in the probability that z. is less than -1 P(z < -1.62) = In the table of areas: Find the row labeled -1.6 Find the column labeled 0.02 Find the intersection of the row and column .0526 … z* .00 .01 .02 -1.7 .0446 .0436 .0427 .0418 -1.6 .0548 .0537 .0526 .0516 -1.5 .0668 .0655 .0643 .0618

Suppose we are interested in the probability that z* is less than 2.31. P(z < 2.31) = .9896 … z* .00 .01 .02 2.2 .9861 .9864 .9868 .9871 2.3 .9893 .9896 .9898 .9901 2.4 .9918 .9920 .9922 .9925

Using the Normal Tables Find P(z < 0.46) Column labeled 0.06 Row labeled 0.4 P(z < 0.46) = 0.6772

Using the Normal Tables Find P(z < -2.74) Column labeled 0.04 P(z < -2.74) = 0.0031 Row labeled -2.7

Suppose we are interested in the probability that z. is greater than 2 P(z > 2.31) = The Table of Areas gives the area to the LEFT of the z*. To find the area to the right, subtract the value in the table from 1 1 - .9896 = .0104 … z* .00 .01 .02 2.2 .9861 .9864 .9868 .9871 2.3 .9893 .9896 .9898 .9901 2.4 .9918 .9920 .9922 .9925

… … Suppose we are interested in finding the z* for the smallest 2%. P(z < z*) = .02 To find z*: Look for the area .0200 in the body of the Table. Follow the row and column back out to read the z-value. Since .0200 doesn’t appear in the body of the Table, use the value closest to it. z* = -2.05 z* … z* .03 .04 .05 -2.1 .0162 .0158 .0154 -2.0 .0207 .0202 .0197 -1.9 .0262 .0256 .0250 …

Sample Calculations Using the Standard Normal Distribution Using the standard normal tables, find the proportion of observations (z values) from a standard normal distribution that satisfy each of the following: P(z < 1.83) = 0.9664 (b) P(z > 1.83) = 1 – P(z < 1.83) = 1 – 0.9664 = 0.0336

Sample Calculations Using the Standard Normal Distribution Using the standard normal tables, find the proportion of observations (z values) from a standard normal distribution that satisfies each of the following: (d) P(z > -1.83) = 1 – P(z < -1.83) = 1 – 0.0336= 0.9664 c) P(z < -1.83) = 0.0336

Symmetry Property Notice from the preceding examples it becomes obvious that P(z > z*) = P(z < -z*) P(z > -2.18) = P(z < 2.18) = 0.9854

… … Suppose we are interested in finding the z* for the largest 5%. P(z > z*) = .05 Since .9500 is exactly between .9495 and .9505, we can average the z* for each of these .95 z* = 1.645 z* Remember the Table of Areas gives the area to the LEFT of z*. 1 – (area to the right of z*) Then look up this value in the body of the table. … z* .03 .04 .05 1.5 .9382 .9398 .9406 1.6 .9495 .9505 .9515 1.7 .9591 .9599 .9608 …

Method of Probability Calculation The probability that a continuous random variable x lies between a lower limit a and an upper limit b is P(a < x < b) = (cumulative area to the left of b) – (cumulative area to the left of a) = P(x < b) – P(x < a) = - a b

Sample Calculations Using the Standard Normal Distribution Using the standard normal tables, find the proportion of observations (z values) from a standard normal distribution that satisfies -1.37 < z < 2.34, that is find P(-1.37 < z < 2.34). P(Z<2.34)=0.9904 P(-1.37 < z < 2.34) = 0.9904 = 0.9051 P(Z<-1.37)=0.0853 - 0.0853

Sample Calculations Using the Standard Normal Distribution Using the standard normal tables, find the proportion of observations (z values) from a standard normal distribution that satisfies 0.54 < z < 1.61, that is find P(0.54 < z < 1.61). P(0.54 < z < 1.61) = 0.9463 P(Z<1.61)=0.9463 = 0.2409 P(Z<.54)=0.7054 - 0.7054

Sample Calculations Using the Standard Normal Distribution Using the standard normal tables, find the proportion of observations (z values) from a standard normal distribution that satisfy -1.42 < z < -0.93, that is find P(-1.42 < z < -0.93). P(Z<-0.93)=0.1762 P(-1.42 < z < -0.93) = 0.1762 = 0.0984 P(Z<-1.42)=0.0778 - 0.0778

Example Calculation Using the standard normal tables, in each of the following, find the z values that satisfy : (a) The point z with 98% of the observations falling below it. The closest entry in the table to 0.9800 is 0.9798 corresponding to a z value of 2.05

Example Calculation Using the standard normal tables, in each of the following, find the z values that satisfy : (b) The point z with 90% of the observations falling above it. The closest entry in the table to 0.1000 is 0.1003 corresponding to a z value of -1.28

Finding Probabilities for Other Normal Curves To find the probabilities for other normal curves, standardize the relevant values and then use the table of z areas. If x is a random variable whose behavior is described by a normal distribution with mean m and standard deviation s , then P(x < b) = P(z < b*) P(x > a) = P(z > a*) P(a < x < b) = P(a* < z < b*) Where z is a variable whose distribution is standard normal and

Standard Normal Distribution Revisited If a variable X has a normal distribution with mean  and standard deviation , then the standardized variable has the normal distribution with mean 0 and standard deviation 1. This is called the standard normal distribution.

Data on the length of time to complete registration for classes using an online registration system suggest that the distribution of the variable x = time to register for students at a particular university can well be approximated by a normal distribution with mean m = 12 minutes and standard deviation s = 2 minutes.

Registration Problem Continued . . . x = time to register m = 12 minutes and s = 2 minutes What is the probability that it will take a randomly selected student less than 9 minutes to complete registration? Standardized this value. Look this value up in the table. P(x < 9) = .0668 9

Registration Problem Continued . . . x = time to register m = 12 minutes and s = 2 minutes What is the probability that it will take a randomly selected student more than 13 minutes to complete registration? Standardized this value. Look this value up in the table and subtract from 1. P(x > 13) = 1 - .6915 = .3085 13

Registration Problem Continued . . . x = time to register m = 12 minutes and s = 2 minutes What is the probability that it will take a randomly selected student between 7 and 15 minutes to complete registration? Standardized these values. Look these values up in the table and subtract (value for a*) – (value for b*) P(7 < x < 15) = .9332 - .0062 = .9270 7 15

Registration Problem Continued . . . x = time to register m = 12 minutes and s = 2 minutes Because some students do not log off properly, the university would like to log off students automatically after some time has elapsed. It is decided to select this time so that only 1% of students will be automatically logged off while still trying to register. What time should the automatic log off be set at? Look up the area to the left of a* in the table. Use the formula for standardizing to find x. P(x > a*) = .01 a* = 16.66 .99 .01 a*

Hot Sauce Example A company produces “20 ounce” jars of a picante sauce. The true amounts of sauce in the jars of this brand sauce follow a normal distribution. Suppose the company’s “20 ounce” jars follow a normal distribution curve with a true mean  = 20.2 ounces with a true standard deviation  = 0.125 ounces.

Hot Sauce Example Continued What proportion of the jars are under-filled (i.e., have less than 20 ounces of sauce)? Looking up the z value -1.60 on the standard normal table we find the value 0.0548. The proportion of the sauce jars that are under-filled is 0.0548 (i.e., 5.48% of the jars contain less than 20 ounces of sauce.

Hot Sauce Example Continued What proportion of the sauce jars contain between 20 and 20.3 ounces of sauce? The resulting difference 0.7881-0.0548 = 0.7333 is the proportion of the jars that contain between 20 and 20.3 ounces of sauce. Looking up the z values of -1.60 and 0.80 we find the areas (proportions) 0.0548 and 0.7881

Hot Sauce Example Continued 99% of the jars of this brand of picante sauce will contain more than what amount of sauce? When we try to look up 0.0100 in the body of the table we do not find this value. We do find the following The entry closest to 0.0100 is 0.0099 corresponding to the z value -2.33 Since the relationship between the real scale (x) and the z scale is z=(x-)/ we solve for x getting x =  + z x=20.2+(-2.33)(0.125)= 19.90875=19.91

Cereal Example The weight of the cereal in a box is a normally distributed random variable with mean 12.15 ounces, and standard deviation 0.2 ounce. What percentage of the boxes have contents that weigh under 12 ounces?

Printer Example The time it takes to first encounter the failure of a unit of a brand of ink jet printer is approximately normally distributed with a mean of 1,500 hours and a standard deviation of 225 hours. a) What proportion of these printers will fail before 1,200 hours?

Printer Example Continued b) What proportion of these printers will not fail within the first 2,000 hours?

Checking for Normality and Normalizing Transformations Section 7.7 Checking for Normality and Normalizing Transformations

Ways to Assess Normality What should happen if our data set is normally distributed? Some of the most frequently used statistical methods are valid only when x1, x2, …, xn has come from a population distribution that at least is approximately normal. One way to see whether an assumption of population normality is plausible is to construct a normal probability plot of the data. A normal probability plot is a scatterplot of (normal score, observed values) pairs.

Consider a random sample with n = 5. To find the appropriate normal scores for a sample of size 5, divide the standard normal curve into 5 equal-area regions. Each region has an area equal to 0.2. Why are these regions not the same width? Each region should have 20% of the area in them – the higher the curve the less width needed for 20%.

Consider a random sample with n = 5. Next – find the median z-score for each region. These are the normal scores that we would plot our data against. Why is the median not in the “middle” of each region? We use technology (calculators or statistical software) to compute these normal scores. The median should divide the area into half. In this case, the first median would be at the 10th percentile, the second at the 30th percentile, etc. 1.28 -1.28 -.524 .524

Ways to Assess Normality Such as curvature which would indicate skewness in the data Some of the most frequently used statistical methods are valid only when x1, x2, …, xn has come from a population distribution that at least is approximately normal. One way to see whether an assumption of population normality is plausible is to construct a normal probability plot of the data. A normal probability plot is a scatterplot of (normal score, observed values) pairs. A strong linear pattern in a normal probability plot suggests that population normality is plausible. On the other hand, systematic departure from a straight-line pattern indicates that it is not reasonable to assume that the population distribution is normal. Or outliers

Sketch a scatterplot by pairing the smallest normal score with the smallest observation from the data set & so on The following data represent egg weights (in grams) for a sample of 10 eggs. 53.04 53.50 52.53 53.00 53.07 52.86 52.66 53.23 53.26 53.16 Let’s construct a normal probability plot. Since the values of the normal scores depend on the sample size n, the normal scores when n = 10 are below: -1.539 -1.001 -0.656 -0.376 -0.123 0.123 0.376 0.656 1.001 1.539 Since the normal probability plot is approximately linear, it is plausible that the distribution of egg weights is approximately normal.

Using the Correlation Coefficient to Assess Normality The correlation coefficient, r, can be calculated for the n (normal score, observed value) pairs. If r is too much smaller than 1, then normality of the underlying distribution is questionable. Consider these points from the weight of eggs data: (-1.539, 52.53) (-1.001, 52.66) (-.656,52.86) (-.376,53.00) (-.123, 53.04) (.123,53.07) (.376,53.16) (.656,53.23) (1.001,53.26) (1.539,53.50) Calculate the correlation coefficient for these points. r = .986 Since r > critical r, then it is plausible that the sample of egg weights came from a distribution that was approximately normal. Values to Which r Can be Compared to Check for Normality n 5 10 15 20 25 30 40 50 60 75 Critical r .832 .880 .911 .929 .941 .949 .960 .966 .971 .976 How smaller is “too much smaller than 1”?

Shut-In Example Ten randomly selected shut-ins were each asked to list how many hours of television they watched per week. The results are 82 66 90 84 75 88 80 94 110 91 Minitab obtained the normal probability plot on the following slide.

Normal Probability Plot Example Notice that the points all fall nearly on a line so it is reasonable to assume that the population of hours of tv watched by shut-ins is normally distributed.

Phone Solicitation Example A sample of times of 15 telephone solicitation calls (in seconds) was obtained and is given below. 5 10 7 12 35 65 145 14 3 220 11 6 85 6 16 Minitab obtained the normal probability plot on the next slide

Normal Probability Plot Example Clearly the points do not fall nearly on a line. Specifically the pattern has a distinct nonlinear appearance. It is not reasonable to assume that the population of the length of calls is normally distributed. One would most assuredly say that the distribution of lengths of calls is not normal.

Transforming Data to Achieve Normality When the data is not normal, it is common to use a transformation of the data. For data that shows strong positive skewness (long upper tail), a logarithmic transformation is usually applied. Square root, cube root, and other transformations can also be applied to the data to determine which transformation best normalizes the data.

Consider the data set in Table 7 Consider the data set in Table 7.4 (page 463) about plasma and urinary AGT levels. A histogram of the urinary AGT levels is strongly positively skewed. A logarithmic transformation is applied to the data. The histogram of the log urinary AGT levels is more symmetrical.

Using the Normal Distribution to Approximate a Discrete Distribution Section 7.8 Using the Normal Distribution to Approximate a Discrete Distribution

Using the Normal Distribution to Approximate a Discrete Distribution Suppose this bar is centered at x = 6. The bar actually begins at 5.5 and ends at 6.5. These endpoints will be used in calculations. This is called a continuity correction. Suppose the probability distribution of a discrete random variable x is displayed in the histogram below. Often, a probability histogram can be well approximated by a normal curve. If so, it is customary to say that x has an approximately normal distribution. The probability of a particular value is the area of the rectangle centered at that value. 6

Normal Approximation to a Binomial Distribution Let x be a random variable based on n trials and success probability p, so that: If n and p are such that: np > 10 and n (1 – p) > 10 then x has an approximately normal distribution.

Can the distribution of x be approximated by a normal distribution? Premature babies are born before 37 weeks, and those born before 34 weeks are most at risk. A study reported that 2% of births in the United States occur before 34 weeks. Suppose that 1000 births are randomly selected and that the number of these births that occurred prior to 34 weeks, x, is to be determined. np = 1000(.02) = 20 > 10 n(1 – p) = 1000(.98) = 980 > 10 Find the mean and standard deviation for the approximated normal distribution. Since both are greater than 10, the distribution of x can be approximated by a normal distribution Can the distribution of x be approximated by a normal distribution?

Premature Babies Continued . . . m = 20 and s = 4.427 What is the probability that the number of babies in the sample of 1000 born prior to 34 weeks will be between 10 and 25 (inclusive)? P(10 < x < 25) = Look up these values in the table and subtract the probabilities. To find the shaded area, standardize the endpoints. .8925 - .0089 = .8836