Continuous Distributions


Normal Distribution

The normal density is $n(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2/(2\sigma^2)}$, $-\infty < x < \infty$.

The curve is completely specified by two parameters: the mean μ and the standard deviation σ. The area under the curve must always equal 1, so the larger σ, the lower and wider the curve. The maximum occurs at x = μ, the curve is symmetric around μ, the inflection points are at μ ± σ, and the curve approaches the horizontal axis asymptotically.

The normal is a reasonable approximation for many real-life experimental outcomes, such as heights, weights, measurement errors, and reaction times. It is often a good approximation to the binomial and hypergeometric distributions (if p is not close to 0 or 1 and n is large, i.e., np > 5 and nq > 5). It is a good approximation to the Poisson if μ > 30.
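The properties listed above can be checked numerically. The following is a minimal sketch that codes n(x; μ, σ) directly from the formula. The function names are hypothetical and chosen only for illustration. A trapezoid rule confirms that the area stays 1 while the peak height 1/(√(2π)σ) drops as σ grows.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density n(x; mu, sigma)."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def area_under(mu: float, sigma: float, width: float = 10.0, steps: int = 20_000) -> float:
    """Trapezoid-rule area of n(x; mu, sigma) over mu +/- width*sigma.

    The tails beyond 10 standard deviations contribute a negligible amount.
    """
    a = mu - width * sigma
    b = mu + width * sigma
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    total += sum(normal_pdf(a + i * h, mu, sigma) for i in range(1, steps))
    return total * h


if __name__ == "__main__":
    for sigma in (0.5, 1.0, 2.0):
        # Same total area, lower peak for larger sigma.
        print(f"sigma={sigma}: area={area_under(0.0, sigma):.6f}, "
              f"peak={normal_pdf(0.0, 0.0, sigma):.4f}")
```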

Normal Probability

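The probability that X is between a and b is given by
$P(a < X < b) = \int_a^b n(x;\mu,\sigma)\,dx = \frac{1}{\sqrt{2\pi}\,\sigma}\int_a^b e^{-(x-\mu)^2/(2\sigma^2)}\,dx.$

In practice this integral is never calculated by hand: normal curve areas are tabulated in Table A3. To have ONE table for all μ and σ, transform X to Z = (X − μ)/σ, or x = σz + μ going back. Z is a random variable with mean 0 and variance 1. This transforms the integral to the standard normal distribution:
$P(a < X < b) = \frac{1}{\sqrt{2\pi}}\int_{z_1}^{z_2} e^{-z^2/2}\,dz = \Phi(z_2) - \Phi(z_1), \qquad z_1 = \frac{a-\mu}{\sigma},\quad z_2 = \frac{b-\mu}{\sigma}.$

[Figure: standard normal curve n(z; 0, 1); the shaded area is the cumulative probability Φ(z) = P(Z ≤ z).]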
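The standardization step can be sketched in a few lines. This is not how Table A3 was built. It is a sketch that relies on the standard identity Φ(z) = ½[1 + erf(z/√2)]. The helper names and the example values μ = 50 and σ = 10 are hypothetical and do not come from the slides.

```python
import math


def phi(z: float) -> float:
    """Standard normal CDF, Phi(z) = P(Z <= z), written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_between(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a < X < b) for X ~ N(mu, sigma^2), using z = (x - mu) / sigma."""
    z1 = (a - mu) / sigma
    z2 = (b - mu) / sigma
    return phi(z2) - phi(z1)


if __name__ == "__main__":
    # Illustrative values only: X ~ N(50, 10^2), P(45 < X < 62) = Phi(1.2) - Phi(-0.5)
    print(round(prob_between(45, 62, 50, 10), 4))
```

Because only z1 and z2 enter the final step, a single table of Φ(z) covers every choice of μ and σ.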

Standard Normal Table

P(Z ≤ 1.25) = Φ(1.25) = 0.894

P(0.25 ≤ Z ≤ 1.5) = P(Z ≤ 1.5) − P(Z ≤ 0.25) = Φ(1.5) − Φ(0.25) = 0.9332 − 0.5987 = 0.3345

[Figure: standard normal curve with the area between z = 0.25 and z = 1.5 shaded.]

A standard normal table tabulates the probability of a random variable Z being less than or equal to some stated value. These values have been calculated using the probability density function for the standard normal, shown on slide #5. To find a probability, locate the intersection of the row corresponding to the first two digits of the stated value (shown in the leftmost column) and the column corresponding to its final digit (shown in the topmost row). If the probability of a random variable being between two given values is of interest, subtract the probability for the smaller value from the probability for the larger value.

Standard Normal Table

P(Z ≥ 1.25) = 1 − P(Z ≤ 1.25) = 1 − 0.894 = 0.106

By symmetry, P(Z ≤ −1.0) = P(Z ≥ 1.0) = 1 − P(Z ≤ 1.0) = 1 − 0.841 = 0.159

[Figure: standard normal curves with the upper tail beyond z = 1.25 shaded, and with the tails beyond z = −1.0 and z = 1.0 shaded.]

The area under any probability density function, including the standard normal, equals one. So the probability of a random variable being greater than a stated value is 1 minus the probability of it being less than or equal to that value.

The standard normal density is also symmetric about zero, and many standard normal tables do not tabulate negative values. So the probability of a random variable being less than a negative stated value equals the probability of it being greater than the corresponding positive value: P(Z ≤ −z) = P(Z ≥ z) = 1 − Φ(z).
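As a cross-check on the table lookups in the last two slides, Python's standard-library `statistics.NormalDist` can reproduce them. The sketch below covers the direct lookup, the difference of two lookups, the complement rule, and the symmetry rule. The variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

p_le_125 = Z.cdf(1.25)                 # P(Z <= 1.25), table value 0.894
p_between = Z.cdf(1.5) - Z.cdf(0.25)   # P(0.25 <= Z <= 1.5): larger minus smaller
p_ge_125 = 1 - Z.cdf(1.25)             # complement: P(Z >= 1.25)
p_le_neg1 = Z.cdf(-1.0)                # direct lookup of a negative value
p_ge_1 = 1 - Z.cdf(1.0)                # the same area, obtained by symmetry

for label, value in [
    ("P(Z <= 1.25)", p_le_125),
    ("P(0.25 <= Z <= 1.5)", p_between),
    ("P(Z >= 1.25)", p_ge_125),
    ("P(Z <= -1.0)", p_le_neg1),
    ("P(Z >= 1.0)", p_ge_1),
]:
    print(f"{label} = {value:.4f}")
```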

Normal Distribution Example 6.13 Walpole
Average motor life is μ = 10 years with σ = 2 years, and lives follow a normal distribution. Motors that fail within the warranty period are replaced free. If you want to replace only 3% of the motors, how long should the warranty be?

Go to the table and find an area of 0.03; this corresponds to z = −1.88. Then x = σz + μ = 2(−1.88) + 10 = 6.24 years. Any warranty longer than 6.24 years would replace more than 3% of the motors, so round down to a 6-year warranty.
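The inverse lookup ("find 3% in the table, then go back to x") can be sketched with the standard library's `NormalDist.inv_cdf`. The sketch also shows what fraction of motors would fail within a 6-year or a 7-year warranty, which makes the effect of rounding visible.

```python
from statistics import NormalDist

life = NormalDist(mu=10, sigma=2)   # motor life in years

# Warranty length x such that P(X <= x) = 0.03, i.e. x = sigma * z + mu.
x = life.inv_cdf(0.03)
print(f"exact cutoff: {x:.2f} years")

# Fraction of motors replaced under whole-year warranties.
for years in (6, 7):
    print(f"{years}-year warranty replaces {life.cdf(years):.1%} of motors")
```

The table uses z = −1.88, while `inv_cdf` uses the unrounded quantile. The cutoffs therefore agree only to about two decimal places.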

Z Notation

Often we want the point at which the tail area to the right equals α. This point is denoted z_α:
$P(Z > z_\alpha) = \alpha = 1 - P(Z \le z_\alpha).$
So P(Z ≤ z) = 0.975 gives z_{0.025} = 1.96.

[Figure: standard normal curve with the area to the right of z_{0.025} = 1.96 shaded; P(Z > z_{0.025}) = 0.025.]

Interpretation: the probability of an N(0, 1) random variable having a value greater than 1.96 is 2.5%.

Normal Distribution

5th percentile = −1.645; 95th percentile = 1.645
25th percentile = −0.675; 75th percentile = 0.675
50th percentile = 0.00

Sometimes questions are asked in terms of percentiles. The pth percentile is the point below which p% of the distribution lies. By symmetry, the pth and (100 − p)th percentiles of the standard normal differ only in sign.
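Both z_α and percentiles are inverse lookups in the standard normal table. The sketch below uses `NormalDist.inv_cdf` from the standard library. The helper name `z_upper` is hypothetical. It simply applies P(Z ≤ z_α) = 1 − α.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal


def z_upper(alpha: float) -> float:
    """z_alpha: the point with right-tail area alpha, i.e. P(Z > z_alpha) = alpha."""
    return Z.inv_cdf(1.0 - alpha)


print(f"z_0.025 = {z_upper(0.025):.3f}")
for pct in (5, 25, 50, 75, 95):
    # pth percentile: the point below which pct% of the distribution lies.
    print(f"{pct}th percentile = {Z.inv_cdf(pct / 100):+.4f}")
```

To four places, the quartiles come out as ±0.6745.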

Normal Probability Plot
Example: 10 independent measurement errors. The normal probability plot suggests that the standard normal distribution is a reasonable probability model for measurement error.

The aspect of interest in a probability plot is whether or not the data follow a roughly straight line. If the data fall roughly on a straight line when plotted against the distribution being checked, the distributional parameters can be estimated from that line. For a normal distribution (ordered data plotted against standard normal quantiles), the intercept is an estimate of the mean and the slope is an estimate of the standard deviation.
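Here is a minimal sketch of reading μ and σ off a normal probability plot. It sorts the data, pairs each value with a standard normal quantile, and fits a least-squares line. The plotting positions (i − 3/8)/(n + 1/4) are one common choice and are an assumption here; other rules exist. The function name is hypothetical, and the 10 "errors" are simulated rather than the data from the slide.

```python
import random
from statistics import NormalDist, fmean


def normal_plot_fit(data):
    """Least-squares line through a normal probability plot.

    Ordered data (vertical axis) are plotted against standard normal
    quantiles (horizontal axis); returns (intercept, slope).
    """
    n = len(data)
    ys = sorted(data)
    z = NormalDist()
    # Plotting positions (i - 3/8) / (n + 1/4): an assumed, common choice.
    xs = [z.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


if __name__ == "__main__":
    random.seed(1)
    # Simulated for illustration only: 10 errors drawn from N(0, 1).
    errors = [random.gauss(0.0, 1.0) for _ in range(10)]
    b0, b1 = normal_plot_fit(errors)
    print(f"intercept (mean estimate) = {b0:.3f}, slope (sd estimate) = {b1:.3f}")
```

With only 10 points, the estimates scatter noticeably around 0 and 1. The straightness of the plot is the main thing to judge.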

Normal Approximation to Binomial
A normal distribution with μ = np and σ² = np(1 − p) is a good approximation to the binomial, even if n is small, provided p is close to ½. The approximation is most useful when n is large (note that binomial Table A1 only goes to n = 20). As a rule of thumb, the normal approximation to the binomial is adequate if np ≥ 5 and nq ≥ 5.
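Here is a minimal sketch comparing the exact binomial CDF with its normal approximation (μ = np, σ² = np(1 − p)). It also encodes the np ≥ 5, nq ≥ 5 rule of thumb. The function names are hypothetical. The `continuity` flag applies the x + 0.5 continuity correction that the airline example discusses.

```python
import math
from statistics import NormalDist


def binom_cdf(x: int, n: int, p: float) -> float:
    """Exact P(X <= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def normal_approx_cdf(x: int, n: int, p: float, continuity: bool = True) -> float:
    """Normal approximation to P(X <= x) with mu = np and sigma^2 = np(1 - p)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    upper = x + 0.5 if continuity else x   # continuity correction uses x + 0.5
    return NormalDist(mu, sigma).cdf(upper)


def rule_of_thumb_ok(n: int, p: float) -> bool:
    """The np >= 5 and nq >= 5 check."""
    return n * p >= 5 and n * (1 - p) >= 5


if __name__ == "__main__":
    # Illustrative case: n = 100, p = 0.5, P(X <= 45).
    print(binom_cdf(45, 100, 0.5), normal_approx_cdf(45, 100, 0.5))
```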

Example, Problem 6.36 Walpole

Airlines sell more tickets than they have seats for a flight because 2% of customers who buy tickets do not show up for the flight. For a flight with 197 seats, 200 tickets were sold. What is the probability that the airline overbooked this flight?

Once the problem is set up, the procedure for getting the answer is simple. This example, however, illustrates why it is important to set up the problem carefully.

The problem gives the proportion of no-shows: P(no-show) = 0.02. Should you use n = 200 or n = 197? Use n = 200, because 200 tickets were sold and the no-show percentage pertains to the people who buy tickets (not to the number of seats).

Next, figure out what is being asked, with X the number of no-shows: P(X > 3) or P(X < 3)? The flight is overbooked if there are fewer than 3 no-shows, so it is the latter, i.e., X = 0, 1, or 2.

Example, Problem 6.36 Walpole

P(no-show) = 0.02, n = 200
μ = np = 4
σ² = npq = 200(0.02)(0.98) = 3.92 ⇒ σ = 1.98
z = (x − μ)/σ = (2 − 4)/1.98 = −1.01
P(Z < −1.01) = 0.156, or a nearly 16% chance of an overbooked flight.

This was done without the continuity correction. Was this a good idea?

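[Spreadsheet: column A holds x; exact binomial probabilities =BINOMDIST(A2,200,0.02,FALSE); z values =(A2-4)/1.98; normal cumulative probabilities =NORMSDIST(E2); normal interval probabilities =F3-F2; a second z column =(A…)/1.98.]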

Example, Problem 6.36 Walpole

With the continuity correction (μ = 4, σ = 1.98), P(X ≤ 2) is approximated by the normal area to the left of 2.5:
z = (x − μ)/σ = (2.5 − 4)/1.98 = −0.76
P(Z < −0.76) = 0.224, or a 22% chance of an overbooked flight.

When np or n(1 − p) is small (< ~15), use the continuity correction.
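The overbooking probability can be checked three ways, in the spirit of the spreadsheet comparison: normal without correction, normal with the continuity correction, and the exact binomial sum over x = 0, 1, 2. This is a sketch using only the standard library. The variable names are illustrative.

```python
import math
from statistics import NormalDist

n, p = 200, 0.02                     # tickets sold, probability of a no-show
mu = n * p                           # 4
sigma = math.sqrt(n * p * (1 - p))   # about 1.98
Z = NormalDist()

# Overbooked when the number of no-shows X is 0, 1, or 2, i.e. X <= 2.
without_cc = Z.cdf((2 - mu) / sigma)
with_cc = Z.cdf((2.5 - mu) / sigma)
exact = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(3))

print(f"normal, no correction:   {without_cc:.3f}")
print(f"normal, with correction: {with_cc:.3f}")
print(f"exact binomial:          {exact:.3f}")
```

The exact binomial value is about 0.235, so the continuity correction brings the normal answer much closer. Note also that np = 4 here, below the np ≥ 5 rule of thumb.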

How likely is this number of fours?

n = 47; x = 14; p = 0.205

[Table: columns # Heads, # Times, Binomial, BINOMDIST, Expected; Total = 47.]

μ = np = 9.639
σ² = npq = 7.662 ⇒ σ = 2.768
z = (x − 9.639)/2.768

P(13.5 < X < 14.5) = ?
z1 = (14.5 − 9.639)/2.768 = 4.861/2.768 = 1.756
z2 = (13.5 − 9.639)/2.768 = 3.861/2.768 = 1.395
P(1.39 < Z < 1.76) = Φ(1.76) − Φ(1.39) = 0.9608 − 0.9177 = 0.0431

This can be compared with P(X = 14) computed directly from the binomial.
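Here is a sketch of the same comparison: the continuity-corrected normal area for X = 14 next to the exact binomial probability. It assumes p = 0.205 exactly, so its μ = 9.635 and σ differ slightly from the slide's values, and the printed numbers will not match the hand calculation digit for digit.

```python
import math
from statistics import NormalDist

n, x, p = 47, 14, 0.205          # p taken as exactly 0.205 (assumption)
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

dist = NormalDist(mu, sigma)
approx = dist.cdf(x + 0.5) - dist.cdf(x - 0.5)        # P(13.5 < X < 14.5)
exact = math.comb(n, x) * p**x * (1 - p) ** (n - x)   # binomial P(X = 14)
print(f"normal approx = {approx:.4f}, exact binomial = {exact:.4f}")
```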

Chi-Squared Distribution

Chi Squared

If X has a normal distribution with mean μ and variance σ², then the variable Y = Z² = ((X − μ)/σ)² has a chi-squared distribution with v = 1 degree of freedom.

Corollary 7.1: Suppose X_1, X_2, ..., X_n are independent random variables having normal distributions with means μ_1, μ_2, ..., μ_n and variances σ_1², σ_2², ..., σ_n². Then the random variable
$Y = \sum_{i=1}^{n} \left(\frac{X_i - \mu_i}{\sigma_i}\right)^2$
also has a chi-squared distribution, with v = n degrees of freedom.
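Corollary 7.1 can be illustrated by simulation. The sketch below standardizes and squares five independent normals with arbitrary (hypothetical) means and standard deviations, then sums them. It compares the sample mean and variance of Y with the chi-squared values v = 5 and 2v = 10. A simulation only illustrates the result; it does not prove it.

```python
import random
from statistics import fmean, pvariance

random.seed(42)

# Arbitrary means and standard deviations for n = 5 independent normals (illustrative).
mus = [1.0, -2.0, 5.0, 0.0, 3.0]
sigmas = [0.5, 2.0, 1.0, 3.0, 1.5]


def draw_y() -> float:
    """One draw of Y = sum(((X_i - mu_i) / sigma_i) ** 2)."""
    return sum(((random.gauss(m, s) - m) / s) ** 2 for m, s in zip(mus, sigmas))


samples = [draw_y() for _ in range(50_000)]
# A chi-squared variable with v = n = 5 has mean v = 5 and variance 2v = 10.
print(f"sample mean = {fmean(samples):.3f}, sample variance = {pvariance(samples):.3f}")
```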

Chi-Squared Distributions

$f(x;v) = \frac{1}{2^{v/2}\,\Gamma(v/2)}\, x^{v/2-1} e^{-x/2}, \quad x > 0$ (and 0 elsewhere).

A chi-squared random variable is a continuous random variable that results from squaring a standard normal random variable, or from summing the squares of independent standard normal random variables. The chi-squared distribution is specified by a single parameter v, called the degrees of freedom.

[Figure: chi-squared densities for several values of v. Images courtesy of Ewa Paszek.]

Chi Squared Distribution
The chi-squared distribution is one of the most widely used distributions in statistical tests. Its mean and variance are μ = v and σ² = 2v. Values of χ²_α are tabulated, where χ²_α is the χ² value above which there is an area of α. Small values of α indicate unlikely events.

[Figure: chi-squared density with the "unlikely" right-tail region shaded. If n = 5, v = 4 degrees of freedom.]
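A tabulated value χ²_α can be reproduced from the density f(x; v) by integrating numerically and searching for the right-tail area α. The sketch below uses Simpson's rule and bisection. The helper names are hypothetical, and the sketch assumes v > 2 so that the density is 0 at x = 0.

```python
import math


def chi2_pdf(x: float, v: int) -> float:
    """Chi-squared density f(x; v); assumes v > 2, so f(0; v) = 0."""
    if x <= 0:
        return 0.0
    return x ** (v / 2 - 1) * math.exp(-x / 2) / (2 ** (v / 2) * math.gamma(v / 2))


def chi2_cdf(x: float, v: int, steps: int = 2000) -> float:
    """P(chi^2 <= x) by Simpson's rule on [0, x]; steps must be even."""
    h = x / steps
    total = chi2_pdf(0.0, v) + chi2_pdf(x, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, v)
    return total * h / 3


def chi2_upper(alpha: float, v: int) -> float:
    """chi^2_alpha: the value with right-tail area alpha, found by bisection."""
    lo, hi = 0.0, 200.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - chi2_cdf(mid, v) > alpha:   # tail still too big: move right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(f"chi2_0.05 with v = 4: {chi2_upper(0.05, 4):.3f}")
```

A value of χ² beyond this point has less than a 5% chance of occurring, which is what the "unlikely" region in the figure marks.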

Gamma & Exponential Distributions

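Gamma Distribution

$f(x;\alpha,\beta) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta}, \quad x > 0;\ \alpha, \beta > 0$, where $\Gamma(\alpha) = \int_0^\infty x^{\alpha-1} e^{-x}\,dx$.
μ = αβ, σ² = αβ²

The gamma is analogous to the negative binomial. It gives the time (or space) required for a specified number of Poisson events to occur, with
α = number of events
β = 1/(average number of events per unit time or area) = mean time between events

[Diagram: waiting for the nth event. Bernoulli process → negative binomial; Poisson process → gamma.]

Study Example 6.18.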
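The "time until the αth Poisson event" reading can be sketched by simulation. In a Poisson process, the times between events are independent exponentials with mean β, so their sum over α events should show mean αβ and variance αβ². The values α = 3 and β = 2 are illustrative only.

```python
import random
from statistics import fmean, pvariance

random.seed(7)
alpha, beta = 3, 2.0   # illustrative: wait for the 3rd event, mean time between events 2


def time_to_alpha_events() -> float:
    """Sum alpha exponential waiting times, each with mean beta (rate 1/beta)."""
    return sum(random.expovariate(1.0 / beta) for _ in range(alpha))


times = [time_to_alpha_events() for _ in range(50_000)]
print(f"mean = {fmean(times):.3f} (alpha*beta = {alpha * beta})")
print(f"variance = {pvariance(times):.3f} (alpha*beta^2 = {alpha * beta**2})")
```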

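Gamma Distribution

Compare the gamma and the Poisson:
Gamma: $f(x) = \frac{1}{\beta^{\alpha}\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta}$, BUT x = time; α = number of events; β = mean time to/between events.
Poisson: $p(x;\lambda t) = \frac{e^{-\lambda t}(\lambda t)^x}{x!}$; x = number of occurrences; λt = average number of occurrences.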
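The two distributions answer linked questions. For integer α, the αth event has occurred by time t exactly when at least α events have occurred in [0, t]. So P(T ≤ t) for the gamma equals P(N ≥ α) for a Poisson count with mean λt = t/β. The sketch below checks this numerically. The function names are hypothetical, and the values α = 2, β = 0.2, t = 1 are illustrative.

```python
import math


def gamma_pdf(x: float, alpha: int, beta: float) -> float:
    """Gamma density f(x; alpha, beta) for x >= 0 (integer alpha >= 1 assumed)."""
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta ** alpha * math.gamma(alpha))


def gamma_cdf_numeric(t: float, alpha: int, beta: float, steps: int = 2000) -> float:
    """P(T <= t) by Simpson's rule on [0, t]; steps must be even."""
    h = t / steps
    total = gamma_pdf(0.0, alpha, beta) + gamma_pdf(t, alpha, beta)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * gamma_pdf(i * h, alpha, beta)
    return total * h / 3


def poisson_at_least(alpha: int, mean: float) -> float:
    """P(N >= alpha) for N ~ Poisson(mean)."""
    return 1.0 - sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(alpha))


if __name__ == "__main__":
    alpha, beta, t = 2, 0.2, 1.0   # on average 5 events per unit time (illustrative)
    print(gamma_cdf_numeric(t, alpha, beta), poisson_at_least(alpha, t / beta))
```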

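Gamma Distribution

$f(x;\alpha,\beta) = \frac{1}{\beta^{\alpha}\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta}, \quad x > 0;\ \alpha, \beta > 0$

Values are tabulated (Table A24), but as the incomplete gamma function, which is the cumulative distribution function. So we need to integrate f(x;α,β). To convert to the incomplete gamma function, substitute x = βy (y = x/β). This scales everything so that the integrand depends only on α, and the upper limit becomes x'/β:
$P(X \le x') = \int_0^{x'} f(x;\alpha,\beta)\,dx = \int_0^{x'/\beta} \frac{y^{\alpha-1} e^{-y}}{\Gamma(\alpha)}\,dy = F(x'/\beta;\alpha).$

So to find the probability of failure by a time x' (or survival past it, 1 − F), convert the limit to x'/β and integrate y up to that value, or look up F(x'/β; α) in the table. Study Example 6.19. Note that x is usually a time.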
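Here is a sketch of what Table A24 tabulates and how the substitution y = x/β is used. It computes F(y; α) from the standard power series for the regularized lower incomplete gamma function, then gets P(X ≤ x') as F(x'/β; α). The helper names are hypothetical. The series is adequate for moderate y, roughly up to 50 with the default 200 terms.

```python
import math


def incomplete_gamma(y: float, alpha: float, terms: int = 200) -> float:
    """F(y; alpha) = integral_0^y u^(alpha-1) e^(-u) / Gamma(alpha) du.

    Series: y^alpha * e^(-y) * sum_k y^k / Gamma(alpha + k + 1),
    adequate for moderate y (about y <= 50 with 200 terms).
    """
    if y <= 0:
        return 0.0
    term = 1.0 / math.gamma(alpha + 1)   # k = 0 term
    total = term
    for k in range(1, terms):
        term *= y / (alpha + k)          # Gamma(alpha+k+1) = (alpha+k) * Gamma(alpha+k)
        total += term
    return y**alpha * math.exp(-y) * total


def gamma_cdf(x: float, alpha: float, beta: float) -> float:
    """P(X <= x) for X ~ gamma(alpha, beta), via the substitution y = x / beta."""
    return incomplete_gamma(x / beta, alpha)


if __name__ == "__main__":
    # Illustrative values: alpha = 2, beta = 4, so P(X <= 8) = F(2; 2).
    print(f"P(X <= 8) = {gamma_cdf(8.0, 2, 4.0):.4f}")
```

Because only α remains inside F after the substitution, a single table indexed by α and y serves every β.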

Exponential Distribution
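$f(x;\beta) = \frac{1}{\beta} e^{-x/\beta}, \quad x > 0;\ \beta > 0$ (the gamma with α = 1)
μ = β, σ² = β²

The exponential gives the time required for the first Poisson event to occur. Note that β = 1/λ = mean time between events (failures).

Example 6.17 in Walpole: the failure time of a component is described by an exponential distribution with mean time to failure β = 5 years. If 5 of these components are installed, what is the probability that at least 2 of them are still functioning after 8 years?

First, find P(X > 8) using the exponential distribution:
$P(X > 8) = \frac{1}{5}\int_8^\infty e^{-x/5}\,dx = e^{-8/5} \approx 0.2.$

Then, with Y the number of the 5 components still functioning, go back to the binomial with n = 5 and p ≈ 0.2:
$P(Y \ge 2) = 1 - b(0;5,0.2) - b(1;5,0.2) = 1 - 0.7373 = 0.2627.$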
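The two-stage calculation (exponential survival probability, then a binomial count) can be sketched directly. This sketch keeps the unrounded p = e^(−1.6) instead of the slide's 0.2, so its answer comes out a little above 0.2627.

```python
import math

beta = 5.0   # mean time to failure (years)
t = 8.0      # time of interest (years)
n = 5        # components installed

p_survive = math.exp(-t / beta)   # P(X > 8) = e^(-8/5)

# Y = number of the 5 components still functioning ~ Binomial(5, p_survive)
p_at_least_2 = 1.0 - sum(
    math.comb(n, k) * p_survive**k * (1 - p_survive) ** (n - k) for k in range(2)
)
print(f"P(X > 8)  = {p_survive:.4f}")
print(f"P(Y >= 2) = {p_at_least_2:.4f}")
```

Rounding p to 0.2 reproduces the textbook's 0.2627. Carrying e^(−1.6) ≈ 0.2019 through gives about 0.267.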

Exponential Distribution
The exponential assumes that the system has no memory:
P(X > t) = P(X > t_0 + t | X > t_0).
In other words, it does not take wear into account; for that, you need the gamma or the Weibull. These two are often used as curve-fitting distributions, without considering the underlying physics. Note the wording in the example: "it is known from previous data that the time between complaints can be described by a gamma distribution with α = 2 and β = 4."

Use the exponential for things like telephone calls and car arrivals, where the no-memory assumption holds, or for failures that are due to random events rather than wear.
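The no-memory property can be checked directly from survival functions. The sketch below compares P(X > t) with P(X > t₀ + t | X > t₀) for an exponential and for a gamma with α = 2. For the gamma with α = 2, the survival function is e^(−x/β)(1 + x/β). The values β = 4, t₀ = 4, and t = 4 are illustrative.

```python
import math


def exp_survival(x: float, beta: float) -> float:
    """P(X > x) for an exponential with mean beta."""
    return math.exp(-x / beta)


def gamma2_survival(x: float, beta: float) -> float:
    """P(X > x) for a gamma with alpha = 2: e^(-x/beta) * (1 + x/beta)."""
    return math.exp(-x / beta) * (1 + x / beta)


beta, t0, t = 4.0, 4.0, 4.0   # illustrative values
for name, surv in [("exponential", exp_survival), ("gamma, alpha=2", gamma2_survival)]:
    unconditional = surv(t, beta)                       # P(X > t)
    conditional = surv(t0 + t, beta) / surv(t0, beta)   # P(X > t0 + t | X > t0)
    print(f"{name}: P(X>t) = {unconditional:.4f}, P(X>t0+t | X>t0) = {conditional:.4f}")
```

For the exponential, the two numbers agree. For the gamma, the conditional survival is lower, which is the "wear" the exponential cannot represent.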

Gamma and Exponential Distributions
Gamma: time (space) until a specified number of Poisson events occur.
Exponential: time (space) until the first Poisson event occurs (or time between events).

Other Distributions

Chi-Squared: We will return to this later.

Lognormal: If the random variable Y has a normal distribution and Y = ln(X), then X has a lognormal distribution. Find statistics by converting to/from the normal distribution.

Weibull: Also used to describe times to failure.
$F(x) = 1 - e^{-\alpha x^{\beta}}, \quad x \ge 0;\ \alpha, \beta > 0$
The Weibull is also called a stretched exponential.
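Here is a sketch of "converting to/from the normal" for the lognormal, together with the Weibull CDF as written above. It assumes the parameterization F(x) = 1 − e^(−αx^β), where β = 1 reduces to an exponential. The function names are hypothetical. The lognormal mean uses the standard result E[X] = e^(μ + σ²/2), where μ and σ are the mean and standard deviation of ln(X).

```python
import math
from statistics import NormalDist


def lognormal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) when ln(X) ~ N(mu, sigma^2): convert to the normal scale."""
    return NormalDist(mu, sigma).cdf(math.log(x))


def lognormal_mean(mu: float, sigma: float) -> float:
    """E[X] = exp(mu + sigma^2 / 2) for a lognormal X."""
    return math.exp(mu + sigma**2 / 2)


def weibull_cdf(x: float, alpha: float, beta: float) -> float:
    """F(x) = 1 - exp(-alpha * x**beta); beta = 1 gives an exponential."""
    return 1.0 - math.exp(-alpha * x**beta)


if __name__ == "__main__":
    # Illustrative values only.
    print(lognormal_cdf(math.exp(1.0), 1.0, 0.5))   # median of ln X maps to 0.5
    print(lognormal_mean(0.0, 1.0))
    print(weibull_cdf(2.0, 0.5, 1.0), 1 - math.exp(-0.5 * 2.0))
```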
