Previous Lecture: Sequence Database Searching. Introduction to Biostatistics and Bioinformatics Distributions This Lecture By Judy Zhong Assistant Professor.

Presentation transcript:

Previous Lecture: Sequence Database Searching

Introduction to Biostatistics and Bioinformatics
This Lecture: Distributions
By Judy Zhong, Assistant Professor, Division of Biostatistics, Department of Population Health

Introduction
- Last lecture defined probability and introduced some basic tools used in working with probabilities
- This lecture discusses specific probability models
- Three specific probability distributions (models):
  - Binomial distribution
  - Poisson distribution
  - Normal distribution

Random variables
- A random variable is a function that assigns numeric values to different events in a sample space
- NOTE: (1) Randomness; (2) Numeric values
- Example 1: Randomly select a student from a class. X = the student's number of siblings. X could be 0, 1, 2, …
- Example 2: Randomly select a student from a class. X = the student's height. X could be any value greater than 0

Two types of random variables
1. A random variable for which there exists a discrete set of numeric values is a discrete random variable
2. A random variable whose possible values cannot be enumerated is a continuous random variable

Probability distribution function
- A probability distribution function (also called a probability mass function) is a mathematical relationship, or rule, that assigns to any possible value r of a discrete random variable X the probability Pr(X = r)

Expected value (expectation) of a discrete random variable
- The expected value (expectation) of a discrete random variable is defined as $E(X) = \mu = \sum_{i=1}^{R} x_i \Pr(X = x_i)$
- Here the x_i's are the values the random variable X assumes with positive probability
- The sum is over all R possible values. R may be finite (e.g., binomial distribution) or infinite (e.g., Poisson distribution)
- The expectation represents the "average" value of the random variable

Variance (population variance) of a discrete random variable
- The variance of a discrete random variable is defined by $Var(X) = \sigma^2 = \sum_{i=1}^{R} (x_i - \mu)^2 \Pr(X = x_i)$
- The standard deviation of a random variable is defined by $\sigma = \sqrt{Var(X)}$
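The two definitions above can be checked with a few lines of Python. This is only a sketch: the probabilities in `siblings_pmf` are made-up illustrative numbers for the "number of siblings" example, not values from the lecture, and the function names are hypothetical.

```python
import math

def expectation(pmf):
    """E(X): sum of x * Pr(X = x) over all values x with positive probability."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X): sum of (x - mu)^2 * Pr(X = x), where mu = E(X)."""
    mu = expectation(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# Hypothetical distribution of the number of siblings (illustrative only).
siblings_pmf = {0: 0.2, 1: 0.4, 2: 0.3, 3: 0.1}

mean = expectation(siblings_pmf)
var = variance(siblings_pmf)
sd = math.sqrt(var)  # standard deviation is the square root of the variance
print(f"E(X) = {mean:.2f}, Var(X) = {var:.2f}, SD = {sd:.2f}")
```

Representing the distribution as a dictionary from values to probabilities keeps the code close to the summation notation.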

An experiment (for binomial distribution)
- Common structure for the binomial distribution:
1. A sample of n independent trials
2. Each trial can have only two possible outcomes, which are denoted as "success" and "failure" (the term "success" is used in a general way, without specific meaning)
3. The probability of a success at each trial is assumed to be the same, with probability p (hence the probability of failure is 1 − p = q)
4. Let the random variable X = number of successes among n trials

How to fit a real problem into the binomial structure
- Here we concentrate on counting the number of neutrophils among 5 white blood cells
- Assume that the probability that a cell is a neutrophil is p
1. Number of trials n = 5
2. "Success" = "one cell being a neutrophil"
3. Pr("success") = p
4. X = number of successes among 5 cells

How to calculate the probability of an outcome from the binomial structure
- There are 5 white cells, and each cell is either a neutrophil (N) or other (O). What is the probability that the 2nd and the 5th cells considered will be neutrophils and the remaining cells are non-neutrophils? That is, what is the probability of the outcome "ONOON"?
- Assume that the outcomes for different cells are independent. Using the multiplication law of probability, $\Pr(\text{ONOON}) = q \cdot p \cdot q \cdot q \cdot p = p^2 q^3$
- Think about this question: What is the probability that any 2 cells out of 5 will be neutrophils?

Combinations play a role …
- Possible outcomes for 2 neutrophils among 5 cells: NNOOO, ONNOO, …
- How many such outcomes? $\binom{5}{2} = \frac{5!}{2!\,3!} = 10$
- Then the probability of obtaining 2 neutrophils in 5 cells is $\Pr(X = 2) = \binom{5}{2} p^2 q^3 = 10\, p^2 q^3$

Binomial distribution
- Let X = number of successes in n statistically independent trials, where the probability of success is p
- The distribution of the random variable X is known as the binomial distribution and has probability distribution function given by $\Pr(X = k) = \binom{n}{k} p^k q^{n-k}$, for k = 0, 1, …, n
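As a rough illustration of the neutrophil example, the binomial formula can be evaluated directly with the standard library. The value `p = 0.6` below is an assumption chosen for illustration, because the transcript does not preserve the probability used on the slide. `binomial_pmf` is a hypothetical helper name.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Pr(X = k) for X ~ binomial(n, p): C(n, k) * p^k * q^(n - k)."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

p = 0.6  # ASSUMED probability that a cell is a neutrophil (illustrative only)
q = 1 - p

# Probability of the specific ordered outcome "ONOON" (multiplication law).
single_order = q * p * q * q * p

# Number of orderings with exactly 2 neutrophils among 5 cells.
n_orders = comb(5, 2)

# Probability of exactly 2 neutrophils among 5 cells, in any order.
prob_two = binomial_pmf(2, 5, p)
print(single_order, n_orders, prob_two)
```

Multiplying `single_order` by `n_orders` gives the same number as `prob_two`, which is the counting argument from the previous slides.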

Using binomial tables
- Table 1 in the Appendix: for n = 2, 3, …, 20 and p = 0.05, 0.10, …,

Expected value and variance of the binomial distribution
- Result: The expected value and the variance of a binomial distribution are np and np(1 − p), respectively

Bernoulli distribution
- Look at a special case of a binomial random variable with n = 1 and success probability p. That is, conduct only one trial, with X = 1 if success and X = 0 if failure:
  - Pr(X = 1) = p
  - Pr(X = 0) = 1 − p = q
- Expectation of X: E(X) = 1*p + 0*q = p
- Variance of X: Var(X) = (1^2*p + 0^2*q) − p^2 = p*(1 − p) = pq

Write a binomial random variable in terms of Bernoulli random variables
- Conduct n independent trials, each trial having outcome either success or failure
- For each trial, the probability of success is p
- X = number of successes among n trials. It is known that the distribution of X is binomial with parameters n and p
- Now define the outcome of the ith trial as X_i (X_i = 1 if success and X_i = 0 if failure), then $X = X_1 + X_2 + \cdots + X_n = \sum_{i=1}^{n} X_i$

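Proof of expectation and variance of a binomial variable
- Fact 1: $X = \sum_{i=1}^{n} X_i$
- Fact 2: For any i, E(X_i) = p and Var(X_i) = pq
- Then
  - (1) $E\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} E(X_i) = np$, where the first equality always holds
  - (2) $Var\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} Var(X_i) = npq$, where the first equality only holds for independent variables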
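The result can also be checked empirically by simulating a binomial variable as a sum of Bernoulli trials. This is a sketch under assumed settings: `n = 10`, `p = 0.3`, the number of repetitions, and the fixed seed are arbitrary illustrative choices.

```python
import random
from statistics import fmean, pvariance

def simulate_binomial(n, p, rng):
    """Return X = X_1 + ... + X_n, where each X_i is 1 with probability p, else 0."""
    return sum(1 if rng.random() < p else 0 for _ in range(n))

n, p = 10, 0.3          # illustrative values, not from the lecture
rng = random.Random(0)  # fixed seed so the run is reproducible
draws = [simulate_binomial(n, p, rng) for _ in range(20000)]

sample_mean = fmean(draws)    # should be close to n * p
sample_var = pvariance(draws) # should be close to n * p * (1 - p)
print(f"mean {sample_mean:.3f} vs np = {n * p:.3f}")
print(f"variance {sample_var:.3f} vs npq = {n * p * (1 - p):.3f}")
```

The sample mean and variance will not match np and npq exactly, but they should be close with this many repetitions.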

Poisson distribution for rare events
- The Poisson distribution is the second most frequently used discrete distribution after the binomial distribution
- The Poisson distribution is usually associated with rare events (for example, rare diseases)

Examples
- Number of deaths attributed to typhoid fever over a year
  - Assuming the probability of a death from typhoid fever in any one day is very small and the numbers of cases reported on any two days are independent random variables, the number of deaths over a 1-year period will follow a Poisson distribution
- Number of bacterial colonies growing on an agar plate
  - Suppose we have a 100-cm^2 agar plate. The probability of finding any bacterial colonies in a small area is very small, and the events of finding bacterial colonies in any two areas are independent. The number of bacterial colonies over the entire agar plate will follow a Poisson distribution

Poisson distribution
- The probability of k events occurring for a Poisson distribution with parameter μ is $\Pr(X = k) = \frac{e^{-\mu} \mu^k}{k!}$, for k = 0, 1, 2, …

Use Poisson table (Table 2 in the Appendix)
- For μ = 0.5, 1.0, 1.5, …,

Expectation and variance of a Poisson random variable
- Result: For a Poisson distribution with parameter μ, the mean and variance are both equal to μ
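A short numerical check of this result, assuming the usual Poisson formula Pr(X = k) = e^(−μ) μ^k / k!. The parameter value 2.5 and the truncation of the infinite sum at k = 59 are illustrative choices; `poisson_pmf` is a hypothetical helper name.

```python
from math import exp, factorial

def poisson_pmf(k, mu):
    """Pr(X = k) for a Poisson distribution with parameter mu."""
    return exp(-mu) * mu**k / factorial(k)

mu = 2.5        # illustrative parameter value
ks = range(60)  # truncate the infinite sum; the tail beyond k = 59 is negligible here

total = sum(poisson_pmf(k, mu) for k in ks)
mean = sum(k * poisson_pmf(k, mu) for k in ks)
var = sum((k - mean) ** 2 * poisson_pmf(k, mu) for k in ks)
print(f"total = {total:.6f}, mean = {mean:.6f}, variance = {var:.6f}")
```

Both the mean and the variance come out equal to the parameter, up to floating-point error.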

[Figure: Poisson distributions with μ = 2.5, μ = 7.5, and μ = 15]

Binomial when n is large and p is very small
- X ~ bin(n, p)
- E(X) = np
- Var(X) = np(1 − p) = npq
- If n is large and p is very small, 1 − p = q ≈ 1
- Then np ≈ npq
- That is, E(X) ≈ Var(X)
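Since a Poisson variable also has equal mean and variance, this suggests comparing a binomial distribution with large n and small p against a Poisson distribution with μ = np. The sketch below does that for assumed values `n = 1000` and `p = 0.002`, which are illustrative and not from the lecture.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """Pr(X = k) for X ~ binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, mu):
    """Pr(X = k) for a Poisson distribution with parameter mu."""
    return exp(-mu) * mu**k / factorial(k)

n, p = 1000, 0.002  # large n, very small p (illustrative values)
mu = n * p          # matching Poisson parameter

rows = [(k, binomial_pmf(k, n, p), poisson_pmf(k, mu)) for k in range(8)]
max_diff = max(abs(b - q) for _, b, q in rows)

for k, b, q in rows:
    print(f"k={k}: binomial={b:.5f}  poisson={q:.5f}")
print(f"largest absolute difference: {max_diff:.6f}")
```

The two columns agree closely, which is why the Poisson distribution is often used as an approximation in this regime.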

Probability that a continuous random variable falls in range [a, b]
- For a discrete variable, the probability distribution gives the probability of each value that the variable takes on. Can we have the same kind of distribution for a continuous variable? The answer is: NO
- For a continuous variable such as diastolic blood pressure (DBP), the probability of any specific blood-pressure measurement value is 0, and thus the concept of a probability distribution (probability mass) function cannot be used
- Instead, we speak in terms of the probability that blood pressure X falls within a range of values, for example, the ranges 90 ≤ X < 100, or a ≤ X < b

Probability density function
- The probability density function (pdf) of the random variable X is a function such that the area under the density function curve between any two points a and b is equal to the probability that the random variable X falls between a and b. Thus, the total area under the density function curve over the entire range of possible values for the random variable is 1
- The pdf has large values in regions of high probability and small values in regions of low probability

Some remarks
- As discussed earlier, for a continuous random variable X, Pr(X = x) = 0 for any specific value x
- Generally, a distinction is not made between probabilities such as Pr(X < x) and Pr(X ≤ x), or Pr(a ≤ X ≤ b) and Pr(a < X < b), when X is continuous
- The pdf of a continuous random variable X is usually denoted as f(x)
- In mathematics, the probability of X in the interval [a, b] is equal to the integral (area) of its pdf over [a, b], that is, $\Pr(a \le X \le b) = \int_a^b f(x)\,dx$
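The "probability equals area" idea can be illustrated numerically. The sketch below uses a made-up density, f(x) = 2x on [0, 1], chosen only because its areas are easy to verify by hand (the area between a and b is b² − a²). It is not a distribution discussed in the lecture, and `integrate` is a hypothetical helper.

```python
def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the area under f between a and b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

def pdf(x):
    """Hypothetical density: f(x) = 2x on [0, 1], and 0 elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

prob = integrate(pdf, 0.2, 0.5)        # Pr(0.2 <= X <= 0.5) = 0.5^2 - 0.2^2
total_area = integrate(pdf, 0.0, 1.0)  # total area under a pdf must be 1
print(f"Pr(0.2 <= X <= 0.5) = {prob:.4f}, total area = {total_area:.4f}")
```

Shrinking the interval [a, b] toward a single point drives the area to 0, which matches the remark that Pr(X = x) = 0 for a continuous variable.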

Expectation and variance
- The expectation of a continuous random variable X, denoted by E(X) or μ, is the average value taken on by the random variable
- The variance of a continuous random variable X, denoted by Var(X) or σ², is the average squared distance of each value of the random variable from its expectation, which is given by $Var(X) = E\left[(X - \mu)^2\right]$
- The standard deviation, or σ, is the square root of the variance, that is, $\sigma = \sqrt{Var(X)}$

Normal distribution
- The normal distribution is also called the Gaussian distribution, after the well-known mathematician Karl Gauss ("the Prince of Mathematicians")
- The normal distribution is very useful:
  - Many variables are normally distributed
  - Many other distributions can be made approximately normal by transformation
  - The normal distribution is an approximation of other distributions, such as the binomial distribution and the Poisson distribution
  - Most statistical methods considered in this text are based on the normal distribution

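The pdf of the normal distribution
- The normal distribution is defined by its pdf, which is given as $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$, for $-\infty < x < \infty$, for some parameters μ and σ
- μ: Mean
- σ: Standard deviation
- π ≈ 3.14159 and e ≈ 2.71828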
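A minimal sketch of the normal density as a Python function, compared against `statistics.NormalDist` from the standard library. The parameter values μ = 80 and σ = 12 are borrowed from the hypertension example later in this lecture. `normal_pdf` is a hypothetical helper name.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

mu, sigma = 80.0, 12.0

peak = normal_pdf(mu, mu, sigma)       # the density is highest at the mean
left = normal_pdf(mu - 5, mu, sigma)   # 5 units below the mean
right = normal_pdf(mu + 5, mu, sigma)  # 5 units above the mean (same height by symmetry)

# Cross-check one value against the standard library implementation.
reference = NormalDist(mu, sigma).pdf(90.0)
print(peak, left, right, normal_pdf(90.0, mu, sigma), reference)
```

The equal heights at μ − 5 and μ + 5 reflect the symmetry of the curve around μ.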

An example of a normal pdf
- Bell-shaped, symmetric, with mode and center at μ
- A point of inflection is a point at which the slope of the curve changes direction. Imagine you are skiing on a mountain

Location is measured by μ
- In the graph, μ_2 > μ_1

Spread is measured by σ²
- In the graph, σ_2 > σ_1

Standard normal distribution N(0, 1)
- A normal distribution with mean 0 and variance 1 is called a standard normal distribution, denoted as N(0, 1)
- In the following, we will examine the standard normal distribution N(0, 1) in detail
- We will see that any information concerning a general normal distribution N(μ, σ²) can be obtained from appropriate manipulations of an N(0, 1) distribution

Density of N(0,1)

Properties of the standard normal N(0, 1)
- It can be shown that about 68% of the area under the standard normal density lies between −1 and +1, about 95% of the area lies between −2 and +2, and about 99% lies between −2.5 and +2.5
- NOTE: You will see that, more precisely, Pr(−1 < X < 1) = 0.6827, Pr(−1.96 < X < 1.96) = 0.95, Pr(−2.576 < X < 2.576) = 0.99
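These areas can be reproduced with the error function from Python's standard library, using the identity Pr(−z < X < z) = erf(z / √2) for X ~ N(0, 1). `central_area` is a hypothetical helper name used only for this sketch.

```python
from math import erf, sqrt

def central_area(z):
    """Pr(-z < X < z) for X ~ N(0, 1), computed with the error function."""
    return erf(z / sqrt(2))

for z in (1, 2, 2.5, 1.96, 2.576):
    print(f"Pr(-{z} < X < {z}) = {central_area(z):.4f}")
```

The rounded cutoffs 2 and 2.5 give areas slightly different from 95% and 99%, which is why the more precise values 1.96 and 2.576 are quoted.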

Some notation
- The cumulative distribution function (cdf) for a standard normal distribution is denoted by Φ(x) = Pr(X ≤ x), where X ~ N(0,1)
- The symbol ~ is used as shorthand for the phrase "is distributed as." Thus X ~ N(0,1) means that the random variable X is distributed as an N(0,1) distribution
- Generally, X ~ N(μ, σ²) means X is distributed as N(μ, σ²)

Normal table: Table 3 in Appendix

Using symmetry properties of N(0,1)
- From the symmetry property of N(0,1), Φ(−x) = Pr(X ≤ −x) = Pr(X ≥ x) = 1 − Pr(X ≤ x) = 1 − Φ(x)
- Example 5.12: Find Pr(X ≤ −1.96) if X ~ N(0,1)

Pr(a ≤ X ≤ b) = Pr(X ≤ b) − Pr(X ≤ a)
- Example 5.13: Find Pr(−1 ≤ X ≤ 1.5) if X ~ N(0,1)
- Solution: Pr(−1 ≤ X ≤ 1.5) = Pr(X ≤ 1.5) − Pr(X ≤ −1) = Pr(X ≤ 1.5) − Pr(X ≥ 1) = 0.9332 − 0.1587 = 0.7745
- NOTE: The best way to work on such problems is to draw a graph!
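Examples 5.12 and 5.13 can be worked without a printed table by using `statistics.NormalDist` (Python 3.8+), whose `cdf` method plays the role of Φ. This is a sketch of one way to do it, with `phi` as a hypothetical shorthand.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)
phi = std_normal.cdf  # phi(x) = Pr(X <= x) for X ~ N(0, 1)

# Example 5.12: Pr(X <= -1.96), directly and via symmetry.
direct = phi(-1.96)
by_symmetry = 1 - phi(1.96)

# Example 5.13: Pr(-1 <= X <= 1.5) = phi(1.5) - phi(-1).
interval = phi(1.5) - phi(-1.0)

print(f"Pr(X <= -1.96) = {direct:.4f} (symmetry gives {by_symmetry:.4f})")
print(f"Pr(-1 <= X <= 1.5) = {interval:.4f}")
```

Computing the first probability both ways is a quick check on the symmetry rule.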

The (100 × u)th percentile
- The (100 × u)th percentile of N(0,1) is denoted by z_u, such that Pr(X < z_u) = u, where X ~ N(0,1)

Example of finding percentiles
- Example 5.18: Compute z_0.975, z_0.95, z_0.5 and z
- (1) 1.96; (2) 1.645; (3) 0; (4)
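Percentiles of N(0,1) can be computed with the `inv_cdf` method of `statistics.NormalDist`, which inverts Φ. The sketch below covers only the three percentiles whose answers are listed above; `z` is a hypothetical helper name.

```python
from statistics import NormalDist

std_normal = NormalDist()  # defaults to mean 0, standard deviation 1

def z(u):
    """Return z_u, the (100 * u)th percentile of N(0, 1): Pr(X < z_u) = u."""
    return std_normal.inv_cdf(u)

for u in (0.975, 0.95, 0.5):
    print(f"z_{u} = {z(u):.3f}")
```

Applying `std_normal.cdf` to any of these results returns the original u, since the two functions are inverses.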

Now: from N(μ, σ²) to N(0,1)
- Now we have become familiar with N(0,1), but we want to work with any general normal N(μ, σ²)
- Example 5.20 (Hypertension): Suppose a mild hypertensive is defined as a person whose DBP is between 90 and 100 mm Hg inclusive, and the subjects are 35- to 40-year-old men whose blood pressures are normally distributed with mean 80 and variance 144. What is the probability that a randomly selected person from this population will be a mild hypertensive? This question can be stated more precisely: If X ~ N(80, 144), then what is Pr(90 < X < 100)?

How to standardize the normal distribution?

How to standardize the normal distribution?
- Then Z has a standard normal distribution, Z ~ N(0, 1)

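Standardization
- If X ~ N(μ, σ²) and Z = (X − μ)/σ, then Z ~ N(0,1)
- Then $\Pr(a < X < b) = \Pr\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right) = \Phi\left(\frac{b - \mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right)$, where the last two terms can be found from column A in the normal table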

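Use standardization for many problems
- Example 5.20 (Hypertension example continued): If X ~ N(80, 12^2), what is Pr(90 < X < 100)?
- Solution: $\Pr(90 < X < 100) = \Pr\left(\frac{90 - 80}{12} < Z < \frac{100 - 80}{12}\right) = \Phi(1.667) - \Phi(0.833) \approx 0.155$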
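The hypertension example can be computed in two equivalent ways: by standardizing to Z and using the N(0,1) cdf, or by asking `statistics.NormalDist` for the N(80, 12²) cdf directly. This is only a sketch, and the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 80.0, 12.0  # mean and standard deviation (variance 144) from Example 5.20
a, b = 90.0, 100.0      # DBP range that defines a mild hypertensive

# Route 1: standardize, then use the standard normal cdf.
phi = NormalDist().cdf
z_low = (a - mu) / sigma
z_high = (b - mu) / sigma
p_standardized = phi(z_high) - phi(z_low)

# Route 2: work with N(80, 12^2) directly.
dbp = NormalDist(mu, sigma)
p_direct = dbp.cdf(b) - dbp.cdf(a)

print(f"z range: ({z_low:.3f}, {z_high:.3f})")
print(f"Pr(90 < X < 100) = {p_standardized:.4f} (direct: {p_direct:.4f})")
```

Both routes give the same probability, which is the point of standardization: a single N(0,1) table is enough for any normal distribution.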

Always draw a graph…

Next Lecture: Estimation I
- Point Estimate
- Interval Estimate