Probability Distributions
2014/04/07
Maiko Narahara
Probability density function (PDF)
A function that describes the distribution of a continuous variable. Because a continuous variable can take any value in a range, the probability of observing one exact value is zero. Instead, the area under the curve over a range is read as the probability of observing a value in that range. --> The total area under the curve must be 1.
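A small check of the "area = probability" idea, using the standard normal distribution as an example (the choice of distribution and of the range [-1, 1] is illustrative). Numerically integrating the density over a range gives the same probability as the cumulative distribution function, and integrating over the whole real line gives 1.

```r
# Probability that a standard normal value falls in [-1, 1]:
# integrate the density over that range (area under the curve).
area_range <- integrate(dnorm, lower = -1, upper = 1)$value
print(area_range)   # about 0.6827

# The same probability from the cumulative distribution function.
print(pnorm(1) - pnorm(-1))

# The total area under the density curve is 1.
area_total <- integrate(dnorm, lower = -Inf, upper = Inf)$value
print(area_total)
```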
Probability mass function (PMF)
A function that gives probabilities for a discrete variable. Unlike a continuous variable, the probability of observing an exact value is defined (y-axis = probability). The y values for all possible x values must sum to 1.
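As an illustration, here is the binomial PMF for 5 trials with success probability 0.3 (the same parameters used in the lower.tail note). Each value of dbinom() is a real probability, and the values over all possible outcomes 0 to 5 add up to 1.

```r
# Binomial PMF: number of successes in 5 trials, success probability 0.3
x <- 0:5
p <- dbinom(x, size = 5, prob = 0.3)
print(round(p, 4))

# P(X = 1) is a genuine probability, not a density
print(dbinom(1, size = 5, prob = 0.3))   # 0.36015

# The probabilities over all possible values sum to 1
print(sum(p))

# Vertical lines are a common way to plot a PMF
plot(x, p, type = "h", xlab = "x", ylab = "Probability")
```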
R functions for probability distributions
[rdpq]name_of_distribution(), e.g. rnorm(), dnorm(), pnorm(), qnorm()
– r: random generation; generates random numbers from the distribution
– d: density function; returns the density for a given value (for a discrete distribution, the probability of that value)
– p: cumulative distribution function; returns the cumulative probability for a given value, also used to calculate p-values
– q: quantile function; returns the value that corresponds to a given cumulative probability (the inverse of p)
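A quick tour of the four prefixes for the normal distribution, plus the same prefixes applied to the binomial distribution. The specific inputs (seed, 1.96, binomial parameters) are arbitrary choices for illustration; the point is that q undoes p.

```r
set.seed(1)                  # make the random draws reproducible
r_vals <- rnorm(3)           # r: three random draws from N(0, 1)
d_val  <- dnorm(0)           # d: density at x = 0
p_val  <- pnorm(1.96)        # p: P(X <= 1.96)
q_val  <- qnorm(p_val)       # q: value whose cumulative probability is p_val
print(r_vals)
print(d_val)
print(p_val)
print(q_val)                 # recovers 1.96

# The same prefixes work for other distributions, e.g. the binomial
print(rbinom(3, size = 5, prob = 0.3))
print(dbinom(2, size = 5, prob = 0.3))
print(pbinom(2, size = 5, prob = 0.3))
print(qbinom(0.5, size = 5, prob = 0.3))
```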
rnorm
x <- rnorm(n, mean=0, sd=1)   # n: number of random values to generate
hist(x)
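A runnable version of the rnorm example, assuming a sample size of 1000 (an illustrative choice) and a fixed seed so the result is reproducible. Plotting the histogram on the density scale (freq = FALSE) lets the theoretical curve be overlaid for comparison.

```r
set.seed(123)   # fixed seed so the draws are reproducible
n <- 1000       # sample size chosen for illustration only
x <- rnorm(n, mean = 0, sd = 1)

# Histogram on the density scale, with the N(0, 1) density overlaid
hist(x, breaks = 30, freq = FALSE, main = "1000 draws from N(0, 1)")
curve(dnorm, add = TRUE)

# Sample statistics should be close to the parameters mean = 0, sd = 1
print(mean(x))
print(sd(x))
```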
dnorm
dnorm(x=1, mean=0, sd=1)
gives the density that corresponds to the given value x.
[Figure: density curve; x-axis: X, y-axis: Density]
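To see what dnorm() returns, it can be compared with the normal density formula evaluated by hand. The second part shows that a density is not a probability: with a small sd (0.1 here, chosen for illustration) it can be larger than 1.

```r
# Density of N(0, 1) at x = 1, from dnorm and from the formula
# f(x) = exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
mu <- 0
sigma <- 1
d_builtin <- dnorm(1, mean = mu, sd = sigma)
d_formula <- exp(-(1 - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
print(c(d_builtin, d_formula))        # both about 0.242

# A density is not a probability: it can exceed 1 when sd is small
print(dnorm(0, mean = 0, sd = 0.1))   # about 3.99
```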
pnorm
pnorm(q=0, mean=0, sd=1)
gives the cumulative probability P(X <= q) for the given value q (0.5 in this example).
How to compute a p-value
Z-test statistic: 2.5
pnorm(2.5, lower.tail=FALSE)   # one-tailed test: P(Z >= 2.5)
[Figure: cumulative distribution curve; x-axis: X, y-axis: Cumulative probability]
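A sketch of turning the Z statistic of 2.5 into p-values. The one-tailed value uses lower.tail=FALSE; for a two-tailed test the upper-tail probability of |z| is doubled. The last lines show why lower.tail=FALSE is preferable to 1 - pnorm() far out in the tail: pnorm(10) rounds to exactly 1 in double precision.

```r
z <- 2.5                                          # Z-test statistic
p_one <- pnorm(z, lower.tail = FALSE)             # one-tailed: P(Z >= 2.5)
p_alt <- 1 - pnorm(z)                             # same value via the complement
p_two <- 2 * pnorm(abs(z), lower.tail = FALSE)    # two-tailed: P(|Z| >= 2.5)
print(c(one_tailed = p_one, two_tailed = p_two))  # about 0.0062 and 0.0124

# Far in the tail, 1 - pnorm() loses all precision
print(1 - pnorm(10))                   # 0
print(pnorm(10, lower.tail = FALSE))   # a tiny but nonzero probability
```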
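qnorm
qnorm(0.975)
returns the value x whose cumulative probability equals the given probability. This example calculates the upper critical value at alpha=0.05 (two-tailed test), about 1.96.
[Figure: cumulative distribution curve; x-axis: X, y-axis: Cumulative probability]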
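Writing the critical value in terms of alpha makes the 0.975 less magic: a two-tailed test at level alpha puts alpha/2 in each tail. The same code also gives the one-tailed critical value and confirms that pnorm() undoes qnorm().

```r
alpha <- 0.05
crit <- qnorm(1 - alpha / 2)                # upper critical value, two-tailed test
print(crit)                                 # about 1.96
print(qnorm(c(alpha / 2, 1 - alpha / 2)))   # lower and upper critical values

# q and p are inverses of each other
print(pnorm(crit))                          # 0.975

# One-tailed critical value at the same alpha
print(qnorm(1 - alpha))                     # about 1.645
```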
Tips 1
Handling vectors
# sampling from different distributions
rnorm(10, mean=1:10, sd=1:10)
rnorm(5, mean=c(1, 1, 2, 2, 2))
dnorm(0, mean=1:2)
dnorm(c(0, 1), mean=1:2)
# similarly, qnorm and pnorm can handle vectors
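To see how the vectorized calls pair up their arguments, the result of dnorm(c(0, 1), mean=1:2) can be compared with the two separate calls it stands for. When one argument is shorter, R recycles it, so dnorm(0, mean=1:2) evaluates x = 0 under both means.

```r
# Arguments are matched element by element
v <- dnorm(c(0, 1), mean = 1:2)
w <- c(dnorm(0, mean = 1), dnorm(1, mean = 2))
print(isTRUE(all.equal(v, w)))   # TRUE

# A shorter argument is recycled: x = 0 under mean 1 and under mean 2
print(dnorm(0, mean = 1:2))

# pnorm and qnorm accept vectors in the same way
print(pnorm(c(-1.96, 0, 1.96)))
print(qnorm(c(0.025, 0.5, 0.975)))
```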
Tips 2
Drawing the curve of a d/p function
Syntax: curve(function, from, to)
curve(dnorm, from=-3, to=3)   # draws a curve for the standard normal distribution
But how do you change the parameters of the distribution?
curve(dnorm, mean=1, sd=2)   # does not work: mean and sd are not passed on to dnorm
a <- function(x) dnorm(x, mean=1, sd=2)
curve(a, from=-3, to=5)
Similarly, you can draw a cumulative curve:
curve(pnorm, from=-3, to=3)
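An alternative to defining a helper function: curve() also accepts an expression written in terms of x, so the parameters can go directly inside the call. The add = TRUE argument overlays a second curve (here the standard normal, dashed) for comparison; the choice of curves is illustrative.

```r
# Expression in x: parameters can be set without a helper function
res <- curve(dnorm(x, mean = 1, sd = 2), from = -3, to = 5)

# add = TRUE overlays another curve on the existing plot
curve(dnorm(x, mean = 0, sd = 1), from = -3, to = 5, add = TRUE, lty = 2)

# curve() invisibly returns the x and y values it drew
print(head(res$y))
```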
Note about lower.tail=FALSE for discrete distributions
pbinom(1, 5, prob=0.3)                     # P(X <= 1): includes the probability of x=1
pbinom(1, 5, prob=0.3, lower.tail=FALSE)   # P(X > 1): does not include x=1
Note that setting lower.tail=FALSE gives the same value as 1 - pbinom(1, 5, prob=0.3).
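The two tails can be checked against the PMF: the lower tail sums dbinom() over 0 and 1, the upper tail over 2 to 5. To get an upper tail that does include x=1, that is P(X >= 1), the cut-off is shifted down by one.

```r
size <- 5
prob <- 0.3
lower <- pbinom(1, size, prob)                       # P(X <= 1), includes x = 1
upper <- pbinom(1, size, prob, lower.tail = FALSE)   # P(X > 1) = P(X >= 2)
print(c(lower = lower, upper = upper, total = lower + upper))   # total is 1

# Check against the PMF
print(sum(dbinom(0:1, size, prob)))   # same as lower
print(sum(dbinom(2:5, size, prob)))   # same as upper

# Upper tail including x = 1, P(X >= 1): use cut-off 0
print(pbinom(0, size, prob, lower.tail = FALSE))
```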