DATA ANALYSIS Module Code: CA660 Lecture Block 3.

2 MEASURING PROBABILITIES – RANDOM VARIABLES & DISTRIBUTIONS (Primer) If a statistical experiment only gives rise to real numbers, the outcome of the experiment is called a random variable. If a random variable X takes values x_1, x_2, …, x_n with probabilities p_1, p_2, …, p_n, then the expected or average value of X is defined as E[X] = Σ_j p_j x_j and its variance is VAR[X] = E[X²] - (E[X])² = Σ_j p_j x_j² - (E[X])².
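The definitions above can be sketched directly in code; the fair-die example below is illustrative, not from the notes.

```python
# A minimal sketch of E[X] and VAR[X] for a discrete random variable
# given as a table of values and probabilities.

def expectation(values, probs):
    # E[X] = sum_j p_j x_j
    return sum(p * x for x, p in zip(values, probs))

def variance(values, probs):
    # VAR[X] = E[X^2] - (E[X])^2
    ex = expectation(values, probs)
    ex2 = sum(p * x * x for x, p in zip(values, probs))
    return ex2 - ex ** 2

# Illustrative example: a fair die, E[X] = 3.5, VAR[X] = 35/12
vals, ps = [1, 2, 3, 4, 5, 6], [1 / 6] * 6
print(expectation(vals, ps), variance(vals, ps))
```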

3 Random Variable PROPERTIES Sums and Differences of Random Variables. Define the covariance of two random variables to be COVAR[X, Y] = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]. If X and Y are independent, COVAR[X, Y] = 0. Lemmas: E[X ± Y] = E[X] ± E[Y]; VAR[X ± Y] = VAR[X] + VAR[Y] ± 2 COVAR[X, Y]; and E[kX] = k E[X], VAR[kX] = k² VAR[X] for a constant k.

4 Example: R.V. characteristic properties. Joint frequency table for B and R:

         B=1   B=2   B=3   Totals
R=1        8    10     9       27
R=2        5     7     4       16
R=3        6     6     7       19
Totals    19    23    20       62

E[B] = {1(19) + 2(23) + 3(20)} / 62 = 2.02
E[B²] = {1²(19) + 2²(23) + 3²(20)} / 62 = 4.69
VAR[B] = ?
E[R] = {1(27) + 2(16) + 3(19)} / 62 = 1.87
E[R²] = {1²(27) + 2²(16) + 3²(19)} / 62 = 4.23
VAR[R] = ?

5 Example Contd.
E[B+R] = {2(8) + 3(10) + 4(9) + 3(5) + 4(7) + 5(4) + 4(6) + 5(6) + 6(7)} / 62 = 3.89
E[(B+R)²] = {2²(8) + 3²(10) + 4²(9) + 3²(5) + 4²(7) + 5²(4) + 4²(6) + 5²(6) + 6²(7)} / 62 = 16.47
VAR[(B+R)] = ? *
E[BR] = {1(8) + 2(10) + 3(9) + 2(5) + 4(7) + 6(4) + 3(6) + 6(6) + 9(7)} / 62 = 3.77
COVAR(B, R) = ?
Alternative calculation to *: VAR[B] + VAR[R] + 2 COVAR[B, R]. Comment?
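A quick numerical check of this example; the joint table of (B, R) counts is read off from the nine terms in the sums above (total 62).

```python
# Joint counts for (B, R), reconstructed from the example's sums.
freq = {
    (1, 1): 8, (2, 1): 10, (3, 1): 9,
    (1, 2): 5, (2, 2): 7,  (3, 2): 4,
    (1, 3): 6, (2, 3): 6,  (3, 3): 7,
}
total = sum(freq.values())  # 62

def E(g):
    # Expectation of g(B, R) under the empirical joint distribution.
    return sum(c * g(b, r) for (b, r), c in freq.items()) / total

EB, ER = E(lambda b, r: b), E(lambda b, r: r)
VB = E(lambda b, r: b * b) - EB ** 2
VR = E(lambda b, r: r * r) - ER ** 2
cov = E(lambda b, r: b * r) - EB * ER
VBR = E(lambda b, r: (b + r) ** 2) - E(lambda b, r: b + r) ** 2
# The identity VAR[B + R] = VAR[B] + VAR[R] + 2 COVAR[B, R] should hold:
print(round(VBR, 6), round(VB + VR + 2 * cov, 6))
```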

6 DISTRIBUTIONS - e.g. MENDEL’s PEAS

7 P.D.F./C.D.F. If X is a R.V. with a finite or countable set of possible outcomes, {x_1, x_2, …}, then the discrete probability distribution of X is p(x_i) = P{X = x_i}, with D.F. or C.D.F. F(x) = Σ_{x_i ≤ x} p(x_i). While, similarly, for X a R.V. taking any value along an interval of the real number line, F(x) = P{X ≤ x}. So if the first derivative exists, then f(x) = dF(x)/dx is the continuous p.d.f., with ∫ f(x) dx = 1.

8 EXPECTATION/VARIANCE Clearly, E[X] = Σ_x x p(x) in the discrete case, E[X] = ∫ x f(x) dx in the continuous case, and VAR[X] = E[X²] - (E[X])².

9 Moments and M.G.F.'s For a R.V. X and any non-negative integer k, the kth moment about the origin is defined as the expected value of X^k. Central moments (about the mean): 1st = 0, i.e. E[X] = μ; second = variance, VAR[X]. To obtain moments, use the Moment Generating Function: if X has a p.d.f. f(x), the m.g.f. is the expected value of e^{tX}, M_X(t) = E[e^{tX}]. For a continuous variable, M_X(t) = ∫ e^{tx} f(x) dx; for a discrete variable, M_X(t) = Σ_x e^{tx} p(x). Generally: the rth moment of the R.V. is the rth derivative of M_X(t) evaluated at t = 0.
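As a minimal numerical sketch of "moments = derivatives of the m.g.f. at t = 0": a Poisson(λ) variable has M(t) = exp(λ(e^t - 1)), E[X] = λ and E[X²] = λ + λ² (λ = 3 below is an illustrative choice).

```python
import math

lam = 3.0

def M(t):
    # m.g.f. of a Poisson(lam) random variable
    return math.exp(lam * (math.exp(t) - 1.0))

# Approximate the first two derivatives at t = 0 by finite differences.
h = 1e-5
m1 = (M(h) - M(-h)) / (2 * h)              # M'(0)  = E[X]   = lam
m2 = (M(h) - 2 * M(0.0) + M(-h)) / h ** 2  # M''(0) = E[X^2] = lam + lam^2
print(m1, m2)  # ~3.0 and ~12.0
```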

10 PROPERTIES - Expectation/Variance etc. of Prob. Distributions (p.d.f.s) As for R.V.'s generally. For X a discrete R.V. with p.d.f. p(x), then for any real-valued function g, E[g(X)] = Σ_x g(x) p(x), e.g. E[aX + b] = a E[X] + b. Applies for more than 2 R.V.s also. Variance again has properties similar to previously, e.g. VAR[aX + b] = a² VAR[X].

11 MENDEL’s Example Let X record the no. of dominant A alleles in a randomly chosen genotype; then X is a R.V. with sample space S = {0, 1, 2}. Outcomes in S correspond to events: {X = 0} = {aa}, {X = 1} = {Aa, aA}, {X = 2} = {AA}. Note: further, any function of X is also a R.V., e.g. Z, where Z is a variable for seed character phenotype: Z = 1 (round) if X > 0, Z = 0 (wrinkled) if X = 0.

12 Example contd. So that, for Mendel's data, P{X = 0} = 1/4, P{X = 1} = 1/2 and P{X = 2} = 1/4. And with P{Z = 1} = P{X > 0} = 3/4 and P{Z = 0} = P{X = 0} = 1/4. Note: Z = 'dummy' or indicator. Could have chosen e.g. Q as a function of X s.t. Q = 0 round (X > 0), Q = 1 wrinkled (X = 0). Then probabilities for Q are opposite to those for Z, with P{Q = 0} = 3/4 and P{Q = 1} = 1/4.

13 JOINT/MARGINAL DISTRIBUTIONS Joint cumulative distribution of X and Y, marginal cumulative for X without regard to Y, and joint distribution (p.d.f.) of X and Y; then, respectively:
(1) F(x, y) = P{X ≤ x, Y ≤ y}
(2) F_X(x) = P{X ≤ x, Y < ∞}
(3) p(x, y) = P{X = x, Y = y}
where the marginal p.d.f. is p_X(x) = Σ_y p(x, y). Similarly for the continuous case, e.g. (2) becomes F_X(x) = ∫_{-∞}^{x} f_X(u) du, with f_X(x) = ∫ f(x, y) dy.

14 Example: Backcross 2-locus model (AaBb × aabb) Observed and Expected frequencies. Genotypic S.R. 1:1; Expected S.R. of crosses 1:1:1:1. Frequencies observed (expected), by cross and pooled:

Genotype    Cross 1    Cross 2   Cross 3    Cross 4     Pooled
AaBb        310(300)   36(30)    360(300)    74(60)     780(690)
Aabb        287(300)   23(30)    230(300)    50(60)     590(690)
aaBb        288(300)   23(30)    230(300)    44(60)     585(690)
aabb        315(300)   38(30)    380(300)    72(60)     805(690)
Marginal A
Aa          597(600)   59(60)    590(600)   124(120)   1370(1380)
aa          603(600)   61(60)    610(600)   116(120)   1390(1380)
Marginal B
Bb          598(600)   59(60)    590(600)   118(120)   1365(1380)
bb          602(600)   61(60)    610(600)   122(120)   1395(1380)
Sum         1200       120       1200       240        2760

15 CONDITIONAL DISTRIBUTIONS Conditional distribution of X, given that Y = y: P{X = x | Y = y} = p(x, y) / p_Y(y), for p_Y(y) > 0. For X and Y independent, P{X = x | Y = y} = P{X = x} and p(x, y) = p_X(x) p_Y(y). Example: Mendel's expt. Probability that a round seed (Z = 1) is a homozygote AA, i.e. (X = 2): P{X = 2 | Z = 1} = P{X = 2, Z = 1} / P{Z = 1} = (1/4) / (3/4) = 1/3. AND - i.e. joint or intersection as above - here the JOINT probability is P{X = 2, Z = 1} = P{X = 2} = 1/4.
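The conditional-probability calculation above can be sketched in a few lines, using the Aa × Aa cross probabilities P{X = 0, 1, 2} = (1/4, 1/2, 1/4) and Z = 1 (round) iff X > 0.

```python
# P{X = 2 | Z = 1} = P{X = 2, Z = 1} / P{Z = 1}
pX = {0: 0.25, 1: 0.5, 2: 0.25}
p_Z1 = sum(p for x, p in pX.items() if x > 0)  # P{Z = 1} = 3/4
p_joint = pX[2]                                # P{X = 2, Z = 1} = 1/4
p_cond = p_joint / p_Z1                        # P{X = 2 | Z = 1}
print(p_cond)  # 1/3
```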

16 Standard Statistical Distributions Importance: modelling practical applications; mathematical properties are known; described by few parameters, which have natural interpretations. Bernoulli Distribution. This is used to model a trial/expt. which gives rise to two outcomes: success/failure, male/female, 0/1, … Let p be the probability that the outcome is one and q = 1 - p that the outcome is zero. Then
P{X = 1} = p, P{X = 0} = 1 - p
E[X] = p(1) + (1 - p)(0) = p
VAR[X] = p(1)² + (1 - p)(0)² - (E[X])² = p(1 - p)

17 Standard distributions - Binomial Binomial Distribution. Suppose that we are interested in the number of successes X in n independent repetitions of a Bernoulli trial, where the probability of success in an individual trial is p. Then
P{X = k} = nCk p^k (1 - p)^(n - k), (k = 0, 1, …, n)
E[X] = np, VAR[X] = np(1 - p)
(Figure: p.d.f. for n = 4, p = 0.2.) This is the appropriate distribution to model e.g. the number of recombinant gametes produced by a heterozygous parent for a 2-locus model. The extension for ≥ 3 loci is the multinomial.
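A minimal check of the binomial moments quoted above against the p.m.f., using the slide's illustrative parameters n = 4, p = 0.2.

```python
from math import comb

def binom_pmf(k, n, p):
    # P{X = k} = nCk * p^k * (1 - p)^(n - k)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 4, 0.2
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(pmf))
var = sum(k * k * q for k, q in enumerate(pmf)) - mean ** 2
print(mean, var)  # theory: n*p = 0.8 and n*p*(1-p) = 0.64
```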

18 Standard distributions - Poisson Poisson Distribution. The Poisson distribution arises as a limiting case of the binomial distribution, where n → ∞ and p → 0 in such a way that np → λ (constant):
P{X = k} = exp(-λ) λ^k / k!, (k = 0, 1, 2, …)
E[X] = VAR[X] = λ
The Poisson is used to model the no. of occurrences of a certain phenomenon in a fixed period of time or space, e.g.
o particles emitted by a radioactive source in a fixed direction for interval ΔT
o people arriving in a queue in a fixed interval of time
o genomic mapping functions, e.g. cross-over as a random event
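The limiting argument above can be checked numerically: Binomial(n, λ/n) probabilities approach the Poisson(λ) p.m.f. as n grows (λ = 2 and k = 3 below are illustrative choices).

```python
import math

def poisson_pmf(k, lam):
    # P{X = k} = exp(-lam) * lam^k / k!
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam, k = 2.0, 3
for n in (10, 100, 10000):
    p = lam / n
    binom = math.comb(n, k) * p ** k * (1 - p) ** (n - k)
    print(n, binom)  # approaches poisson_pmf(3, 2.0) ~ 0.1804
```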

19 Other Standard examples: e.g. Hypergeometric, Exponential…. Hypergeometric: consider a population of M items, of which W are deemed to be successes. Let X be the number of successes that occur in a sample of size n, drawn without replacement from the finite population. Then
P{X = k} = WCk (M-W)C(n-k) / MCn, (k = 0, 1, 2, …)
E[X] = nW / M
VAR[X] = nW(M - W)(M - n) / {M²(M - 1)}
Exponential: a special case of the Gamma distribution (shape parameter = 1), used e.g. to model the inter-arrival time of customers or the time to arrival of the first customer in a simple queue, or fragment lengths in genome mapping etc. The p.d.f. is
f(x) = λ exp(-λx), x ≥ 0, λ > 0
     = 0 otherwise
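A minimal check of the hypergeometric mean formula E[X] = nW/M; the values of M, W and n below are illustrative, not from the notes.

```python
from math import comb

def hyper_pmf(k, M, W, n):
    # P{X = k} = WCk * (M-W)C(n-k) / MCn
    return comb(W, k) * comb(M - W, n - k) / comb(M, n)

M, W, n = 20, 8, 5
pmf = [hyper_pmf(k, M, W, n) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(pmf))
print(mean)  # theory: n * W / M = 2.0
```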

20 Standard p.d.f.'s - Gaussian/Normal A random variable X has a normal distribution with mean μ and standard deviation σ if it has density
f(x) = (1 / (σ√(2π))) exp{-(x - μ)² / (2σ²)}, -∞ < x < ∞
with E[X] = μ and VAR[X] = σ². Arises naturally as the limiting distribution of the average of a set of independent, identically distributed random variables with finite variances. Plays a central role in sampling theory and is a good approximation to a large class of empirical distributions. The default assumption in many empirical studies is that each observation is approx. ~ N(μ, σ²). Statistical tables of the Normal distribution are of great importance in analysing practical data sets. X is said to be a Standardised Normal variable if μ = 0 and σ = 1.
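As a quick sanity check that the density above is a proper p.d.f., a simple Riemann sum over ±8σ should give an area of ~1 (μ and σ below are arbitrary illustrative values).

```python
import math

def normal_pdf(x, mu, sigma):
    # f(x) = (1 / (sigma * sqrt(2*pi))) * exp(-(x - mu)^2 / (2*sigma^2))
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 1.5, 2.0
dx = 16 * sigma / 20000
area = sum(normal_pdf(mu - 8 * sigma + i * dx, mu, sigma) for i in range(20001)) * dx
print(area)  # ~1.0
```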

21 Standard p.d.f.'s: Student's t-distribution A random variable X has a t-distribution with n d.o.f. (t_n) if it has density
f(x) = [Γ((n + 1)/2) / (√(nπ) Γ(n/2))] (1 + x²/n)^{-(n+1)/2}, -∞ < x < ∞
= 0 otherwise.
Symmetrical about the origin, with E[X] = 0 and V[X] = n / (n - 2) for n > 2. For small n, the t_n distribution is very flat. For n ≥ 25, the t_n distribution ≈ the standard normal curve. Suppose Z is a standard Normal variable, W has a χ²_n distribution, and Z and W are independent; then the r.v. T = Z / √(W/n) has a t_n distribution. If x_1, x_2, …, x_n is a random sample from N(μ, σ²) and, with s² = Σ(x_i - x̄)² / (n - 1), we define t = (x̄ - μ) / (s/√n), then t has a t distribution with n - 1 d.o.f.

22 Chi-Square Distribution A r.v. X has a Chi-square distribution with n degrees of freedom (n a positive integer), written χ²_n, if it is a Gamma distribution with λ = 1/2 and shape parameter n/2, so its p.d.f. is
f(x) = x^{n/2 - 1} e^{-x/2} / (2^{n/2} Γ(n/2)), x > 0
E[X] = n; VAR[X] = 2n
Two important applications:
- If X_1, X_2, …, X_n is a sequence of independently distributed Standardised Normal random variables, then the sum of squares X_1² + X_2² + … + X_n² has a χ² distribution with n degrees of freedom.
- If x_1, x_2, …, x_n is a random sample from N(μ, σ²), then x̄ and s² are independent r.v.'s, and (n - 1)s²/σ² has a χ² distribution with n - 1 d.o.f.
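A minimal simulation sketch of the first application above: the sum of n squared standard normals should have mean ≈ n and variance ≈ 2n.

```python
import random

random.seed(1)
n, reps = 5, 20000
# Each draw is X_1^2 + ... + X_n^2 for independent standard normals X_i.
draws = [sum(random.gauss(0, 1) ** 2 for _ in range(n)) for _ in range(reps)]
mean = sum(draws) / reps
var = sum((d - mean) ** 2 for d in draws) / reps
print(mean, var)  # close to n = 5 and 2n = 10
```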

23 F-Distribution A r.v. X has an F distribution with m and n d.o.f. if it has density function
f(x) = [Γ((m + n)/2) / (Γ(m/2) Γ(n/2))] (m/n)^{m/2} x^{m/2 - 1} (1 + mx/n)^{-(m+n)/2} for x > 0
and = 0 otherwise. For X and Y independent r.v.'s, X ~ χ²_m and Y ~ χ²_n, then F = (X/m) / (Y/n) has an F distribution with m and n d.o.f. One consequence: if x_1, x_2, …, x_m (m ≥ 2) is a random sample from N(μ_1, σ_1²), and y_1, y_2, …, y_n (n ≥ 2) a random sample from N(μ_2, σ_2²), then (s_1²/σ_1²) / (s_2²/σ_2²) has an F distribution with m - 1 and n - 1 d.o.f.