Published by Gabriel Ryan. Modified over 6 years ago.
1
BIOS 501 Lecture 3 Binomial and Normal Distribution
Roderick Little
2
The binomial and normal distributions
Density curves
Binomial distribution for counts
Normal distributions
The 68-95-99.7 rule
The standard normal distribution
Normal distribution calculations
Standardizing observations
Normal quantile plots
IPS Section 1.3
Biostat 501 Lecture 3
3
Inference for a population based on a sample
Statistical inference: the process of making inferences about parameters of a population based on sample data.
The distribution of values of X in the population (assumed large) is called the sampling distribution of X.
Two important sampling distributions are the binomial distribution and the normal distribution.
Sample: mean x̄, SD s. Population: mean μ, SD σ.
4
Sampling distributions
A random variable is a variable whose value is a numerical outcome of a random phenomenon. Outcomes can be made into random variables by coding outcomes as numerical values; e.g. in coin tossing define the random variable X = 1 for a head, X = 0 for a tail. Then the mean of X is the proportion of heads.
A discrete random variable X has a finite (or countable) number of possible values. The probability distribution of X lists the values and their probabilities.
A continuous random variable X takes all the values in an interval of numbers. The probability distribution of X is described by a probability density function (pdf). The probability of any event is the area under the pdf for the set of values of X that make up the event.
5
Binomial Distribution
Consider a discrete random variable with just two outcomes, success (S) and failure (F), with Pr(S) = 1 - Pr(F) = p.
Let the random variable x be the number of S's in n independent trials.
The sample space of outcomes is {0, 1, 2, …, n}.
x follows a binomial sampling distribution, and we write x ~ Bin(n, p).
Examples:
x = number of heads in 15 tosses of a fair coin: x ~ Bin(15, 0.5).
Crossover trial with 30 subjects, x = number of subjects for whom new treatment A is better than control treatment B: x ~ Bin(30, p), p = Pr(A better than B).
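The slide does not write out the binomial probabilities. For reference, the standard formula for x ~ Bin(n, p) is given below; the index k is notation introduced here and does not appear in the slides.

```latex
\Pr(x = k) = \binom{n}{k}\, p^{k} (1-p)^{n-k}, \qquad k = 0, 1, \dots, n,
\qquad\text{with mean } np \text{ and s.d. } \sqrt{np(1-p)}.
```

For the coin example, Bin(15, 0.5) has mean 15 × 0.5 = 7.5 heads.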
6
Two binomial distributions
[Figure: bar charts of Pr(x) against x for two binomial distributions, n=6, p=0.5 and n=6, p=0.7.]
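The two distributions named on this slide can be tabulated directly. The sketch below is only an illustration: it assumes the standard binomial probability formula, and the helper names `binomial_pmf` and `binomial_table` are made up for this example.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_table(n: int, p: float) -> list[float]:
    """Pr(x = k) for k = 0, 1, ..., n."""
    return [binomial_pmf(k, n, p) for k in range(n + 1)]


# The two cases shown on the slide.
symmetric = binomial_table(6, 0.5)
skewed = binomial_table(6, 0.7)

for k, (a, b) in enumerate(zip(symmetric, skewed)):
    print(f"x={k}   p=0.5: {a:.3f}   p=0.7: {b:.3f}")
```

With p = 0.5 the probabilities are symmetric about x = 3; with p = 0.7 the bulk of the probability shifts toward larger counts.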
7
Probability density function for a continuous random variable
As the sample size n increases, the histogram gets closer and closer to the density curve.
8
Probability density function
As the sample size increases, the histogram tends to a probability density function (density curve), reflecting the distribution in the population.
The density curve lies on or above the horizontal axis.
It has area exactly 1 underneath it.
The area under the curve for a given range is the probability of a random observation falling in that range.
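The two area properties can be checked numerically. The sketch below assumes the standard normal density as the example curve (any density would do) and approximates areas with the trapezoid rule; `area_under` is a helper name invented for this illustration.

```python
from statistics import NormalDist


def area_under(pdf, lo: float, hi: float, steps: int = 10_000) -> float:
    """Approximate the area under pdf between lo and hi (trapezoid rule)."""
    width = (hi - lo) / steps
    total = 0.5 * (pdf(lo) + pdf(hi))
    for i in range(1, steps):
        total += pdf(lo + i * width)
    return total * width


# Example density curve: standard normal (an assumption for this sketch).
density = NormalDist(mu=0.0, sigma=1.0).pdf

total_area = area_under(density, -8.0, 8.0)   # essentially the whole curve
prob_0_to_1 = area_under(density, 0.0, 1.0)   # Pr(0 < X < 1)

print(f"total area   = {total_area:.4f}")
print(f"Pr(0 < X < 1) = {prob_0_to_1:.4f}")
```

The total comes out very close to 1, and the area over a range is read as the probability of an observation falling in that range.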
9
Normal distribution
Good description of many real continuous variables (test scores, crop yields, height)
Symmetric, unimodal, bell-shaped
Characterized by mean μ and s.d. σ
Mean = median is the center; s.d. measures spread
Approximates many other distributions well
10
Normal distributions
11
Normal distribution
We write X ~ N(µ, σ) if X follows a normal distribution with mean µ and standard deviation σ.
N(0,1) is called the standard normal distribution (mean 0, s.d. 1).
Formula for density (not important in this class):
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x-\mu)^2 / (2\sigma^2)}$
12
The 68-95-99.7 rule for normal distribution / data
Approximately 68% of the observations fall within 1σ of the mean μ.
Approximately 95% of the observations fall within 2σ of the mean μ.
Approximately 99.7% of the observations fall within 3σ of the mean μ.
Suppose the mean is 0 and the standard deviation is 1.
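The three percentages can be checked against the standard normal distribution. This is a small sketch using Python's `statistics.NormalDist`; the helper name `within` is invented for the example.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, s.d. 1


def within(k: float) -> float:
    """Probability that a normal observation lies within k s.d. of the mean."""
    return Z.cdf(k) - Z.cdf(-k)


coverage = {k: within(k) for k in (1, 2, 3)}

for k, prob in coverage.items():
    print(f"within {k} s.d.: {100 * prob:.1f}%")
```

The computed values are close to 68%, 95% and 99.7%, which is why the rule is stated as approximate.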
14
Example
Heights of young women aged 18 to 24: mean μ = 64.5, s.d. σ = 2.5 (inches).
μ + σ = 67, μ - σ = 62: 68% in (62, 67)
μ + 2σ = 69.5, μ - 2σ = 59.5: 95% in (59.5, 69.5)
μ + 3σ = 72, μ - 3σ = 57: 99.7% in (57, 72)
16
How short are the shortest 2.5% of women?
Less than 59.5 inches.
How tall are the tallest 2.5% of women? Taller than 69.5 inches.
If the data are normal, the full distribution is determined by the mean and s.d.
How about this question: what percent of women are taller than 61 inches? This needs more detailed tables.
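Software can stand in for the more detailed tables. The sketch below assumes heights are exactly N(64.5, 2.5) in inches, as in the example, and uses Python's `statistics.NormalDist` rather than the textbook tables.

```python
from statistics import NormalDist

heights = NormalDist(mu=64.5, sigma=2.5)  # inches, as assumed in the example

# Percent of women taller than 61 inches.
z = (61 - heights.mean) / heights.stdev   # 1.4 s.d. below the mean
taller_than_61 = 1 - heights.cdf(61)

# Exact cut-offs for the shortest and tallest 2.5%.
shortest_cutoff = heights.inv_cdf(0.025)
tallest_cutoff = heights.inv_cdf(0.975)

print(f"z = {z:.2f}, Pr(height > 61) = {taller_than_61:.4f}")
print(f"2.5% cut-offs: {shortest_cutoff:.1f} and {tallest_cutoff:.1f} inches")
```

About 92% of women are taller than 61 inches under this model. The exact 2.5% cut-offs (about 59.6 and 69.4 inches) differ slightly from 59.5 and 69.5 because the 95% rule rounds 1.96 s.d. up to 2 s.d.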
17
Finding probabilities for normal data
Tables for the standard normal distribution, N(0,1), are available (see T-2 and T-3 at the back of the text).
We will first learn how to find probabilities for standard normal data, and then learn to find probabilities for a normal distribution with any mean and s.d.
21
Examples
What proportion of observations on a standard normal variable Z take values
less than 2.2?
greater than ?
Find the 25th percentile of the N(0,1) curve.
We will work on many examples other than the ones posted here.
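These table look-ups can be reproduced in software. The sketch below uses Python's `statistics.NormalDist` in place of the printed tables. The cut-off for the "greater than" question is missing from the transcript, so the complement is shown for the same value 2.2 purely as an illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

below_2_2 = Z.cdf(2.2)        # Pr(Z < 2.2)
above_2_2 = 1 - below_2_2     # Pr(Z > 2.2), by the complement rule
q25 = Z.inv_cdf(0.25)         # 25th percentile of N(0, 1)

print(f"Pr(Z < 2.2) = {below_2_2:.4f}")
print(f"Pr(Z > 2.2) = {above_2_2:.4f}")
print(f"25th percentile = {q25:.3f}")
```

The table value for Pr(Z < 2.2) is about 0.9861, and the 25th percentile is about -0.67.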
22
Standardizing
z-score: the standardized value of x (how many standard deviations x is from the mean).
Subtract the mean and divide by the standard deviation: z = (x - µ)/σ
23
Finding probabilities for normal data
The standardized values for any distribution always have mean 0 and standard deviation 1.
If the original distribution is normal, the standardized values have a normal distribution with mean 0 and standard deviation 1. This is called the standard normal distribution.
General normal: N(µ, σ). Standard normal: N(0,1).
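The first claim is easy to verify on a small data set. The numbers below are made up for illustration and are not normal; the sketch treats them as a whole population, so it uses the population s.d. (`pstdev`).

```python
from statistics import mean, pstdev

# Illustrative values only (not from the lecture).
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mu = mean(values)        # 5.0
sigma = pstdev(values)   # 2.0 (population standard deviation)

# Standardize: subtract the mean, divide by the standard deviation.
z_scores = [(v - mu) / sigma for v in values]

print(z_scores)
print(f"mean of z = {mean(z_scores):.3f}, s.d. of z = {pstdev(z_scores):.3f}")
```

The z-scores have mean 0 and s.d. 1 even though the data are not normal; standardizing changes the scale, not the shape.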
24
Standardizing to find probabilities for normal data
X ~ N(µ, σ); what proportion of the population is less than x*?
Convert x* to the standardized value z* = (x* - µ)/σ.
Find P = Pr(Z < z*) from the N(0,1) table.
25
Example
In 2000 (Y2K) the scores of students taking the SAT were approximately normal with mean 1019 and standard deviation 209. What percent of all students had SAT scores of at least 820 (the limit for Division I athletes to compete in their first college year)?
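One way to work this example is to standardize 820 and look up the standard normal probability. The sketch below uses Python's `statistics.NormalDist` instead of the printed table and treats the scores as exactly normal, which the slide only claims approximately.

```python
from statistics import NormalDist

sat = NormalDist(mu=1019, sigma=209)   # SAT scores, as stated in the example
threshold = 820

# Step 1: standardize the threshold.
z_star = (threshold - sat.mean) / sat.stdev

# Step 2: Pr(Z < z*) from the standard normal distribution.
below = NormalDist().cdf(z_star)

# Step 3: "at least 820" is the complement.
at_least = 1 - below

print(f"z* = {z_star:.2f}")
print(f"Pr(score >= 820) = {at_least:.3f}")
```

Under these assumptions roughly 83% of students scored at least 820. A table look-up with z* rounded to -0.95 gives nearly the same answer.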
26
Standardizing to find percentiles for normal data
27
Example
In 2000 (Y2K) the scores of students taking the SAT were approximately normal with mean 1019 and standard deviation 209. How high must a student score in order to place in the top 20% of all students taking the SAT?
Answer: the required value x* is the 80th percentile of the distribution.
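Finding a percentile reverses the standardizing step: find the z* with Pr(Z < z*) = 0.80, then unstandardize with x* = µ + z*σ (the rearrangement of z* = (x* - µ)/σ). The sketch below assumes exactly normal scores and uses `statistics.NormalDist` in place of the table.

```python
from statistics import NormalDist

sat = NormalDist(mu=1019, sigma=209)   # SAT scores, as stated in the example

# Step 1: z-score of the 80th percentile of N(0, 1).
z_star = NormalDist().inv_cdf(0.80)

# Step 2: unstandardize, x* = mu + z* * sigma.
x_star = sat.mean + z_star * sat.stdev

print(f"z* = {z_star:.3f}")
print(f"x* = {x_star:.0f}")
```

This gives z* of about 0.84 and a required score of about 1195.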
28
Normal quantile plots
Arrange the data from smallest to largest and record the corresponding sample percentiles.
Find z-scores for these percentiles (for example, the z-score for the 5th percentile is z = -1.645).
Plot each data point (Y) against the corresponding z (X).
If the data distribution is close to normal, the plotted points will lie close to a straight line.
Deviations from a straight line are evidence that the data are not normal.
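The steps above can be sketched in code. The slide does not say how sample percentiles are assigned, so this example assumes the common convention (i - 0.5)/n for the i-th smallest of n values; the data and the function name `normal_quantile_points` are made up for illustration.

```python
from statistics import NormalDist


def normal_quantile_points(data):
    """Return (z, y) pairs for a normal quantile plot.

    Assumes the i-th smallest of n values sits at percentile (i - 0.5) / n;
    other conventions exist.
    """
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()
    points = []
    for i, y in enumerate(ordered, start=1):
        percentile = (i - 0.5) / n
        points.append((std_normal.inv_cdf(percentile), y))
    return points


# Illustrative values only (not from the lecture).
sample = [66.0, 61.0, 64.5, 63.0, 68.0, 62.5, 65.0, 64.0, 66.5]
points = normal_quantile_points(sample)

for z, y in points:
    print(f"z = {z:6.3f}   y = {y}")
```

Plotting y against z with any plotting tool gives the normal quantile plot; points close to a straight line suggest the data are close to normal.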
29
Newcomb’s data
30
Newcomb’s data without outliers
31
IQ scores of seventh-grade students
32
Summary
Hypothesized mathematical models for distributions: probability density curves
Normal distribution: empirical rule
Evaluating probabilities for the standard normal distribution
Evaluating probabilities for a normal distribution with any mean and s.d. by using standardized z-scores
Normal probability plot