Lecture 3: Distribution of random variables

Presentation transcript:

Statistical Genomics, Lecture 3: Distribution of random variables. Zhiwu Zhang, Washington State University

Outline: Distributions (binomial, normal, χ², t, and F). Relationships. Characteristics (mean, variance, range, and symmetry).

Galton Board

Binomial distribution: A single trial has success probability p. Repeat the trial n times. The total number of successes is a random variable x, ranging from zero to n. The probability of x successes is C(n, x) p^x (1-p)^(n-x), where C(n, x) is the number of combinations of choosing x from n. Notation: B(n, p)
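The probability formula above can be checked numerically. The sketch below is an illustration rather than part of the lecture: `binom_prob` is a hypothetical helper name, and n = 5, p = 0.5 are example values chosen to match the Galton board slides.

```r
# Hypothetical helper (not from the lecture): binomial probability
# computed directly from C(n, x) p^x (1-p)^(n-x)
binom_prob <- function(x, n, p) {
  choose(n, x) * p^x * (1 - p)^(n - x)
}

n <- 5     # number of trials (example value)
p <- 0.5   # success probability (example value)

probs <- binom_prob(0:n, n, p)   # P(x = 0), ..., P(x = n)
print(probs)
print(sum(probs))                            # probabilities over 0..n sum to one
print(all.equal(probs, dbinom(0:n, n, p)))   # agrees with R's built-in dbinom()
```

In practice `dbinom()` is the function to use; the explicit formula is only there to show what it computes.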

Binomial distribution: Mean = np. Var = np(1-p). Range: 0 to n. Symmetric only if p = 0.5. When n is large, the binomial is close to a normal distribution.

Binomial distribution and Galton board: x ~ B(n, p), that is, n trials each with success probability p; the total number of successes is the random variable x. On the board p = 0.5: a ball falling left is a failure, a ball falling right is a success. Example: x=rbinom(10000, 5, 0.5). [Figure: Galton board with layers n=1 to n=5 and the outcome bins at the bottom]

Binomial in R
p=.5 #success probability
n=5 #number of layers/trials
k=10000 #number of balls
x=rbinom(k, n, p)
hist(x)

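Different probability and trials
p=.4 #success probability; not set on this slide, assumed from the later x=80 example (0.4^80 × 0.6^120)
n=200 #number of layers/trials
k=10000 #number of balls
x=rbinom(k, n, p)
hist(x)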

Standardization
mean=n*p #theoretical mean of B(n, p)
var=n*p*(1-p) #theoretical variance of B(n, p)
z=(x-mean)/sqrt(var) #center and scale
hist(z)

Plot of density: the area under the curve sums to one
d=density(z) #kernel density estimate of the standardized values
par(mfrow=c(2,1),mar = c(3,4,1,1))
plot(d)
polygon(d, col="red", border="blue") #shade the area under the curve
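The claim that the area sums to one can be checked by integrating the estimated density numerically. This is a minimal sketch under stated assumptions: n = 200 comes from the earlier slide, p = 0.4 is assumed from the x=80 example, and the trapezoid rule is a choice made here, not something the lecture prescribes.

```r
set.seed(1)
n <- 200     # number of trials (from the earlier slide)
p <- 0.4     # assumed success probability
k <- 10000   # number of balls

x <- rbinom(k, n, p)
z <- (x - n * p) / sqrt(n * p * (1 - p))   # standardize

d <- density(z)                            # returns a grid d$x and heights d$y
# Trapezoid rule: sum of (grid spacing) * (average of neighbouring heights)
heights <- (d$y[-1] + d$y[-length(d$y)]) / 2
area <- sum(diff(d$x) * heights)
print(area)                                # close to one
```

The result is approximate because `density()` evaluates the curve on a finite grid over a finite range.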

Normal distribution: the binomial distribution with large n. Bell shaped. The density is an exponential function. Notation: N(mean, var). Range: -infinity to +infinity. Symmetric.
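The "exponential function" on this slide presumably refers to the normal density, f(x) = exp(-(x - mean)^2 / (2 var)) / sqrt(2 π var). The sketch below writes that formula out and compares it with R's `dnorm()`; `normal_pdf` is a hypothetical helper name and the evaluation grids are arbitrary examples.

```r
# Hypothetical helper (not from the lecture): normal density written out,
# parameterized by mean and variance like the slide's N(mean, var) notation
normal_pdf <- function(x, mean, var) {
  exp(-(x - mean)^2 / (2 * var)) / sqrt(2 * pi * var)
}

x1 <- seq(-3, 3, by = 0.5)
print(all.equal(normal_pdf(x1, 0, 1), dnorm(x1, mean = 0, sd = 1)))

# dnorm() takes the standard deviation, so pass sqrt(var)
x2 <- seq(60, 100, by = 5)
print(all.equal(normal_pdf(x2, 80, 48), dnorm(x2, mean = 80, sd = sqrt(48))))
```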

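Standard normal distribution: mean of zero and variance of one. Notation: N(0,1). Map between deviation and probability: about 68% of data fall within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3. [Figure: bell curve with the axis marked from -3 to 3]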
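The map between deviation and probability can be read off the cumulative distribution function. A short sketch, assuming only base R's `pnorm()`; `within_sd` is a hypothetical helper name.

```r
# Hypothetical helper (not from the lecture): probability that a N(0,1)
# variable falls within m standard deviations of zero
within_sd <- function(m) pnorm(m) - pnorm(-m)

probs <- sapply(1:3, within_sd)
print(round(probs, 4))   # roughly 0.68, 0.95, 0.997
```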

Normal distribution in R
x=rnorm(k, mean=mean,sd=sqrt(var)) #rnorm takes the standard deviation, not the variance
hist(x)

Binomial vs. Normal
x=rbinom(k, n,p) #binomial sample
d=density(x)
plot(d)
mean=n*p
var=n*p*(1-p)
x=rnorm(k, mean=mean,sd=sqrt(var)) #normal sample with the same mean and variance
d=density(x)
plot(d)

What is the probability of x=80? Binomial: C(200, 80) × 0.4^80 × 0.6^120. Normal distribution: zero, because a continuous variable has zero probability at any single point.
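The two answers can be computed directly. The sketch below evaluates the binomial probability both from the formula and with `dbinom()`. For the normal case it uses the interval from 79.5 to 80.5, a common continuity correction that is an addition here and not part of the slide, since the probability at the single point 80 is zero.

```r
n <- 200
p <- 0.4
x <- 80

# Binomial: probability of exactly 80 successes
by_formula <- choose(n, x) * p^x * (1 - p)^(n - x)
by_dbinom  <- dbinom(x, n, p)
print(c(by_formula, by_dbinom))

# Normal with matching mean and variance: a single point has probability
# zero, so use the interval [79.5, 80.5] instead (assumed continuity correction)
m <- n * p
s <- sqrt(n * p * (1 - p))
by_interval <- pnorm(x + 0.5, m, s) - pnorm(x - 0.5, m, s)
print(by_interval)
```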

Poisson distribution: a limiting case of the binomial distribution, where p approaches zero and n approaches infinity such that λ = np stays constant. Mean = Var = λ. Range: >= 0.
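The limit can be illustrated by holding λ = np fixed while n grows and comparing the binomial and Poisson probabilities. This is a sketch with example values (λ = 5 and the listed n) chosen for illustration.

```r
lambda <- 5
x <- 0:15                       # outcomes at which to compare
pois <- dpois(x, lambda)

ns <- c(10, 100, 1000, 100000)  # increasing number of trials
diffs <- sapply(ns, function(n) {
  binom <- dbinom(x, n, lambda / n)   # p = lambda / n keeps np constant
  max(abs(binom - pois))
})
print(data.frame(n = ns, max_abs_difference = diffs))
```

The largest difference shrinks as n increases, which is the sense in which B(n, p) approaches P(λ).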

Poisson distribution in R
par(mfrow=c(2,2),mar = c(3,4,1,1))
for (lambda in c(.5, 1, 5, 10)){ #one histogram per lambda
x=rpois(k, lambda)
hist(x)
}

Approximation by binomial
par(mfrow=c(3,3),mar = c(3,4,1,1))
k=10000 #number of Galton boards
p=c(.5, .05, .005)
n=c(10,100,1000)
for (pj in p){ #pj rather than pi, to avoid masking R's built-in constant
for (ni in n){
x=rbinom(k, ni,pj)
hist(x)
}}
quartz() #opens a new graphics window on macOS; dev.new() is the portable equivalent
lambda=5
x=rpois(k, lambda)
hist(x)

Distribution derived from normal distribution: x follows a normal distribution; what is the distribution of its square, y?

x        y = x^2
-0.0827  0.0068
-0.5546  0.3076
-1.9096  3.6467
 1.0000  1.0000
-1.0723  1.1498
-0.0288  0.0008
-0.9039  0.8171
 0.5654  0.3197
 0.0130  0.0002
 1.0197  1.0398
…

k=10000
x=rnorm(k,0,1)
hist(x)
y=x^2
hist(y)

Two normal distribution variables

x1       x2       x1^2    x2^2    y = x1^2 + x2^2
-0.0827  -0.1874  0.0068  0.0351  0.0420
-0.5546  -1.0958  0.3076  1.2007  1.5083
-1.9096  -2.2470  3.6467  5.0488  8.6955
 1.0000  -0.0720  1.0000  0.0052  1.0052
-1.0723  -0.8805  1.1498  0.7752  1.9251
-0.0288  -1.5238  0.0008  2.3219  2.3227
-0.9039  -0.4914  0.8171  0.2415  1.0586
 0.5654   0.4114  0.3197  0.1693  0.4890
 0.0130  -0.8994  0.0002  0.8089  0.8091
 1.0197  -0.0666  1.0398  0.0044  1.0442
…

k=10000
x1=rnorm(k,0,1)
x2=rnorm(k,0,1)
y=x1^2 + x2^2 #sum of n=2 squared normal variables
mean(y)
var(y)
hist(y)

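Chi-square (χ²) distribution: if x_i ~ N(0, 1) independently for i = 1, ..., n, then y = sum(x_i^2) ~ χ²(n). Mean = n. Var = 2n. Range: >= 0. Not symmetric. [Figure: histograms of y for n = 2, n = 5, and n = 100]
n=2
k=10000
x=rnorm(k*n,0,1)
x2=x^2
xm=matrix(x2,k,n) #each row holds n squared normal values
y=rowSums(xm)
mean(y)
var(y)
hist(y)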
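Beyond the mean and variance, the simulated sums of squares can be compared with R's chi-square distribution function. This sketch assumes n = 5 and a few arbitrary cut points; it compares the empirical proportion of y at or below each cut point with `pchisq()`.

```r
set.seed(1)
n <- 5       # degrees of freedom (example value)
k <- 10000   # number of simulated sums

x <- matrix(rnorm(k * n, 0, 1), k, n)   # k rows of n standard normal values
y <- rowSums(x^2)                       # sum of squares per row

q <- c(1, 3, 5, 10)                     # arbitrary cut points
empirical   <- sapply(q, function(v) mean(y <= v))
theoretical <- pchisq(q, df = n)
print(round(rbind(empirical, theoretical), 3))
print(c(mean = mean(y), var = var(y)))  # near n and 2n
```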

Chi-square distribution in R
par(mfrow=c(2,2),mar = c(3,4,1,1))
for (dfi in c(2, 5, 100, 1000)){ #one density plot per degrees of freedom
x=rchisq(k,dfi)
d=density(x)
plot(d)
}

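F distribution: if U ~ χ²(n1) and V ~ χ²(n2) are independent, then F = (U/n1)/(V/n2) ~ F(n1, n2). Mean = n2/(n2-2) for n2 > 2. Variance = 2 n2^2 (n1+n2-2) / (n1 (n2-2)^2 (n2-4)) for n2 > 4. Range: >= 0. Not symmetric.
par(mfrow=c(2,2),mar = c(3,4,1,1))
x=rf(k,1, 100)
hist(x)
x=rf(k,1, 10000)
hist(x)
x=rf(k,10, 10000)
hist(x)
x=rf(k,10000, 10000)
hist(x)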
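The definition can be tried out by building F values from two chi-square samples instead of calling `rf()`. A minimal sketch, assuming n1 = 5 and n2 = 20 as example degrees of freedom:

```r
set.seed(1)
k  <- 100000   # number of simulated values
n1 <- 5        # numerator degrees of freedom (example value)
n2 <- 20       # denominator degrees of freedom (example value)

u <- rchisq(k, n1)
v <- rchisq(k, n2)           # drawn independently of u
f <- (u / n1) / (v / n2)     # ratio of scaled chi-squares

print(c(simulated_mean = mean(f), expected_mean = n2 / (n2 - 2)))

q <- c(0.5, 1, 2, 3)         # arbitrary cut points
empirical   <- sapply(q, function(v0) mean(f <= v0))
theoretical <- pf(q, n1, n2)
print(round(rbind(empirical, theoretical), 3))
```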

t distribution: if z ~ N(0,1) and V ~ χ²(n) are independent, then t = z/sqrt(V/n) ~ t(n). Symmetric. Mean = 0. Variance = n/(n-2) for n > 2. Range: -infinity to +infinity. Introduced by William Sealy Gosset, known as "Student".
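As with F, the t variable can be assembled from its ingredients and checked against the stated mean and variance. This sketch assumes n = 10 degrees of freedom as an example value.

```r
set.seed(1)
k <- 100000   # number of simulated values
n <- 10       # degrees of freedom (example value)

z <- rnorm(k)              # standard normal
v <- rchisq(k, n)          # independent chi-square
t_sim <- z / sqrt(v / n)   # t = z / sqrt(V/n)

print(c(simulated_mean = mean(t_sim), expected_mean = 0))
print(c(simulated_var = var(t_sim), expected_var = n / (n - 2)))
```

The variance exceeds one because the tails are heavier than those of N(0,1); it approaches one as n grows.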

t distribution
par(mfrow=c(2,2),mar = c(3,4,1,1))
x=rt(k,2)
hist(x)

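Relationship between t and F: t^2 = z^2/(V/n) ~ F(1, n), because z^2 ~ χ²(1) and V ~ χ²(n).
par(mfrow=c(2,1),mar = c(3,4,1,1))
x=rf(k,1, 100) #F(1, 100) sample
hist(x)
x=rt(k,100) #t(100) sample
z=x^2 #its square should look like the F(1, 100) histogram
hist(z)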
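The same relationship shows up in the quantiles, without any simulation: the squared two-sided t quantile equals the one-sided F(1, n) quantile. A short sketch with n = 100 to match the slide and a few example significance levels:

```r
n <- 100                       # degrees of freedom (as on the slide)
alpha <- c(0.10, 0.05, 0.01)   # example significance levels

t_sq <- qt(1 - alpha / 2, df = n)^2        # squared two-sided t quantile
f_q  <- qf(1 - alpha, df1 = 1, df2 = n)    # upper F(1, n) quantile
print(cbind(alpha, t_sq, f_q))
```

This is why a two-sided t test and the corresponding F test with one numerator degree of freedom give the same p-value.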

Central Limit Theorem (CLT): averages of large samples are approximately normally distributed, whatever the distribution of the individual observations (given finite variance).
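One way to see the theorem at work is to watch skewness shrink when individual draws are replaced by sample averages. The sketch below is an illustration with assumed choices: a χ²(2) population, samples of size m = 30, and a hypothetical `skewness` helper based on the third standardized moment.

```r
set.seed(1)
k <- 10000   # number of sample averages
m <- 30      # observations per average (assumed sample size)

draws <- matrix(rchisq(k * m, df = 2), nrow = k, ncol = m)
xbar  <- rowMeans(draws)   # one average per row

# Hypothetical helper: third standardized moment (zero for a symmetric shape)
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

skew_raw  <- skewness(as.vector(draws))   # individual draws: strongly skewed
skew_mean <- skewness(xbar)               # averages: much closer to symmetric
print(c(raw = skew_raw, averages = skew_mean))
print(c(mean = mean(xbar), var = var(xbar)))   # near 2 and 4/m
```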

par(mfrow=c(5,1),mar = c(3,4,1,1))
#Binomial
p=.05
n=100 #number of balls
k=10000 #number of Galton boards
x=rbinom(k, n,p)
d=density(x)
plot(d,main="Binomial")
#Poisson
lambda=10
x=rpois(k, lambda)
d=density(x) #recompute the density for each new sample
plot(d,main="Poisson")
#Chi-Square
x=rchisq(k,5)
d=density(x)
plot(d,main="Chi-square")
#F
x=rf(k,10, 10000)
d=density(x)
plot(d,main="F dist")
#t
x=rt(k,5)
d=density(x)
plot(d,main="t dist")

Function to get mean of ten
i2mean=function(x,n=10){
k=length(x)
nobs=k/n #number of means; length(x) must be a multiple of n
xm=matrix(x,nobs,n) #each row holds n observations
y=rowMeans(xm)
return (y)
}

par(mfrow=c(5,1),mar = c(3,4,1,1))
#Binomial
p=.05
n=100 #number of balls
k=10000 #number of Galton boards
x=i2mean(rbinom(k, n,p))
d=density(x)
plot(d,main="Binomial")
#Poisson
lambda=10
x=i2mean(rpois(k, lambda))
d=density(x) #recompute the density for each new sample
plot(d,main="Poisson")
#Chi-Square
x=i2mean(rchisq(k,5))
d=density(x)
plot(d,main="Chi-square")
#F
x=i2mean(rf(k,10, 10000))
d=density(x)
plot(d,main="F dist")
#t
x=i2mean(rt(k,5))
d=density(x)
plot(d,main="t dist")

Distribution diagram
B(n,p) to P(λ): λ=np
N(0,1) to χ²(n): sum of x^2 over n
N(0,1) and χ²(n) to t(n): x/sqrt(χ²(n)/n)
χ²(n1) and χ²(n2) to F(n1,n2): (χ²(n1)/n1) / (χ²(n2)/n2)

Distribution features

Distribution  Mean       Range    Symmetry       Variance
B(n,p)        np         0 to n   only if p=0.5  np(1-p)
P(λ)          λ          >=0      N              λ
N(0,1)        0          (-∞,∞)   Y              1
χ²(n)         n          >=0      N              2n
F(n1,n2)      n2/(n2-2)  >=0      N              2 n2^2 (n1+n2-2) / (n1 (n2-2)^2 (n2-4))
t(n)          0          (-∞,∞)   Y              n/(n-2)

Highlight: Distributions (binomial, normal, χ², t, and F). Relationships. Characteristics (mean, variance, range, and symmetry). CLT.