BASICS ON DISTRIBUTIONS


1 BASICS ON DISTRIBUTIONS

2 Histograms Consider an experiment in which different outcomes are possible (e.g., dice tossing). The probability of each outcome can be represented in a histogram.
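As a minimal sketch (not from the slide, which only shows a plot), the probability histogram for the sum of two fair dice can be tabulated exactly:

```python
# Minimal sketch: probability histogram for the sum of two fair dice.
from collections import Counter
from itertools import product

counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
total = sum(counts.values())  # 36 equally likely outcomes
for value in sorted(counts):
    print(f"{value:2d}: {counts[value] / total:.3f} {'#' * counts[value]}")
```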

3 Distributions Probabilities are described with distributions.
Discrete variables: data take values on a discrete set: $x \in \{x_i, i \in I\}$
Probability distribution: $f(x_i)$ = probability of the value $x_i$
Cumulative distribution: $F(x_i)$ = probability of a value $\le x_i$
Normalization: $\sum_{i \in I} f(x_i) = 1$, $F(x_{highest}) = 1$

4 Examples Consider a coin with a tail probability equal to p.
Flip the coin three times. Plot the probability distribution for the number of tails.
Flip the coin until the first tail appears. Plot the probability distribution for the number of flips. (A sketch of both follows.)
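A minimal sketch of both distributions as exact tables (assuming a fair coin, p = 0.5; change p for a biased coin):

```python
# Minimal sketch (assumption: p = 0.5) of the two requested distributions.
from math import comb

p = 0.5

# Number of tails in three flips: binomial with n = 3.
for k in range(4):
    print(f"P({k} tails in 3 flips) = {comb(3, k) * p**k * (1 - p)**(3 - k):.4f}")

# Number of flips until the first tail: geometric, shown up to 6 flips.
for n in range(1, 7):
    print(f"P(first tail on flip {n}) = {(1 - p)**(n - 1) * p:.4f}")
```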

5 Empirical Distributions
Experimental data are represented with normalized histograms.
Discrete variables: data take values on a discrete set: $x \in \{x_i, i \in I\}$
Probability distribution: $f(x_i) = N(x = x_i)/N_{tot}$
Cumulative distribution: $F(x_i) = N(x \le x_i)/N_{tot}$
Normalization: $\sum_{i \in I} f(x_i) = 1$
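As a sketch (the data values here are invented for illustration), the empirical f and F can be computed directly from counts:

```python
# Minimal sketch: empirical probability and cumulative distributions.
from collections import Counter

data = [2, 3, 3, 5, 2, 3, 6, 5, 2, 2]   # made-up sample
n_tot = len(data)
counts = Counter(data)

F = 0.0
for x in sorted(counts):
    f = counts[x] / n_tot   # f(x_i) = N(x = x_i) / N_tot
    F += f                  # F(x_i) = N(x <= x_i) / N_tot
    print(f"x={x}  f={f:.2f}  F={F:.2f}")
```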

6 Characterization: Mode
The mode is the value that occurs most frequently in a probability distribution. For empirical distributions, the mode is the value that occurs most frequently in a data set. More than one mode can be present.

7 Characterization: Median
The median is the numeric value separating the higher half of a sample, a population, or a probability distribution from the lower half. The median of a finite list of numbers can be found by arranging all the observations from lowest value to highest value and picking the middle one. If there is an even number of observations, there is no single middle value; the median is then usually defined to be the mean of the two middle values.

8 Characterization: Median

9 Averages (The Median) The median is the middle value of a set of data once the data has been ordered. Example 1. Robert hits 11 balls at Grimsby driving range. The recorded distances of his drives, measured in yards, are given below. Find the median distance for his drives.
Data: 85, 125, 130, 65, 100, 70, 75, 50, 140, 95, 70
Ordered data: 50, 65, 70, 70, 75, 85, 95, 100, 125, 130, 140
Single middle value → median drive = 85 yards

10 Averages (The Median)
The median algorithm (a sketch in code follows):
1. Sort the data.
2. If the number of data points is odd, take the middle value; else take the average of the two central values.
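A minimal sketch of this algorithm, checked against the driving-range example above:

```python
# Minimal sketch of the median algorithm.
def median(values):
    ordered = sorted(values)                  # step 1: sort the data
    n, mid = len(ordered), len(ordered) // 2
    if n % 2 == 1:                            # odd count: middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: average of the two central values

print(median([85, 125, 130, 65, 100, 70, 75, 50, 140, 95, 70]))  # 85
```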

11 Measuring the Spread with Median

12 Finding the median, quartiles and inter-quartile range.
Example 1: Find the median and quartiles for the data below.
Data: 12, 6, 4, 9, 8, 4, 9, 8, 5, 9, 8, 10
Ordered data: 4, 4, 5, 6, 8, 8, 8, 9, 9, 9, 10, 12
Lower quartile (Q1) = 5½, median (Q2) = 8, upper quartile (Q3) = 9
Inter-quartile range = 9 − 5½ = 3½ (a sketch in code follows)
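A minimal sketch reproducing these numbers (assuming the median-of-each-half quartile convention the example appears to use; other conventions give slightly different values). It reuses the median() function from the sketch above:

```python
# Minimal sketch: quartiles as medians of the lower and upper halves.
def quartiles(values):
    ordered = sorted(values)
    n = len(ordered)
    lower, upper = ordered[: n // 2], ordered[(n + 1) // 2 :]
    return median(lower), median(ordered), median(upper)

q1, q2, q3 = quartiles([12, 6, 4, 9, 8, 4, 9, 8, 5, 9, 8, 10])
print(q1, q2, q3, q3 - q1)  # 5.5 8.0 9.0 3.5
```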

13 Expected Value For a given function g(x) and a probability distribution f(x), the expected value of a discrete variable is computed as: $E[g(x)] = \sum_{i \in I} g(x_i) f(x_i)$

14 Properties of Expected Values
1) E[X+Y] = E[X] + E[Y]
2) E[X+c] = E[X] + c
3) E[aX+bY] = aE[X] + bE[Y]
For example, if E[X] = 5 and E[Y] = 6, then:
E[X+5] = 5+5 = 10
E[2X+5] = 2·5+5 = 15
E[3X+2Y] = 3·5+2·6 = 27
Prove 1), 2) and 3).

15 Properties of Expected Values and Variances
1) Var(X+Y) = Var(X) + Var(Y), if X and Y are independent
2) Var(aX) = a²Var(X)
3) Var(X+c) = Var(X)
4) Var(aX+bY) = a²Var(X) + b²Var(Y), if X and Y are independent
Prove 4).

16 Characterization: Mean
$\mu = E[x] = \sum_{i \in I} x_i f(x_i)$
For empirical distributions: $\mu = E[x] = \sum_{i \in I} x_i \frac{N(x_i)}{N_{TOT}} = \frac{\sum_{j \in Data} x_j}{N_{TOT}}$

17 Mean and Variance
Variance: $Var(x) = E[(x-\mu)^2] = E[x^2 - 2x\mu + \mu^2] = E[x^2] - 2\mu \cdot E[x] + \mu^2 = E[x^2] - \mu^2$

18 Variance and Moments of a Random Variable
The 1st moment of a random variable X is E[X].
Definition: The k-th moment of a random variable X is $E[X^k]$.
The variance of a random variable X is defined as $Var(X) = E[(X - E[X])^2]$.
The standard deviation of a random variable X is $\sigma = \sqrt{Var(X)}$.

19 Variance and Moments of a Random Variable
Definition: The covariance of two random variables X and Y is $Cov(X,Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]$.
Theorem: For any two random variables X and Y, $Var(X+Y) = Var(X) + Var(Y) + 2\,Cov(X,Y)$.

20 Variance and Moments of a Random Variable
Proof: $Var(X+Y) = E[(X+Y)^2] - (E[X+Y])^2 = E[X^2] + 2E[XY] + E[Y^2] - (E[X])^2 - 2E[X]E[Y] - (E[Y])^2 = Var(X) + Var(Y) + 2(E[XY] - E[X]E[Y]) = Var(X) + Var(Y) + 2\,Cov(X,Y)$

21 Variance and Moments of a Random Variable
Theorem: If X and Y are two independent random variables, then $E[XY] = E[X] \cdot E[Y]$.

22 Variance and Moments of a Random Variable
Proof: using independence in the second step,
$E[XY] = \sum_i \sum_j x_i y_j \, P(X=x_i, Y=y_j) = \sum_i \sum_j x_i y_j \, P(X=x_i) P(Y=y_j) = \left(\sum_i x_i P(X=x_i)\right)\left(\sum_j y_j P(Y=y_j)\right) = E[X]E[Y]$

23 Variance and Moments of a Random Variable
Corollary: If X and Y are independent random variables, then $Cov(X,Y) = 0$ and $Var(X+Y) = Var(X) + Var(Y)$.

24 Variance and Moments of a Random Variable
Theorem: Let X1, X2, …, Xn be mutually independent random variables; then $Var\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} Var(X_i)$.

25 Binomial distribution
Compute the probability of obtaining exactly 3 tails tossing 5 coins. These are all 32 possibilities:
TTTTT
HTTTT THTTT TTHTT TTTHT TTTTH
HHTTT HTHTT HTTHT HTTTH THHTT THTHT THTTH TTHHT TTHTH TTTHH
TTHHH THTHH THHTH THHHT HTTHH HTHTH HTHHT HHTTH HHTHT HHHTT
THHHH HTHHH HHTHH HHHTH HHHHT
HHHHH
Exactly 3 tails occurs in 10 of them: f(3) = 10/32
And what if the probabilities of head and tail are different?

26 Binomial distribution
How can we count without enumerating?
Total number of configurations: 5 events with two possibilities each → 2⁵ = 32
Number of configurations with exactly 3 tails (two heads):
1) choose 3 of the 5 positions: 5 × (5−1) × (5−2) ordered choices
2) the order of choice is irrelevant: there are 3! = 6 possible permutations of 3 objects
5 × 4 × 3 / 6 = 5!/(3! × 2!) = 10
And what if the probabilities of head and tail are different?

27 Binomial distribution
Probability of having k positive events out of n independent experiments, where the probability of a positive event is p:
$f(k) = \binom{n}{k} p^k (1-p)^{n-k}$
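A minimal sketch of this formula, checked against the enumeration of slide 25:

```python
# Minimal sketch of the binomial probability mass function.
from math import comb

def binomial_pmf(k, n, p):
    """Probability of k positive events out of n independent experiments."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binomial_pmf(3, 5, 0.5))                          # 10/32 = 0.3125
print(sum(binomial_pmf(k, 5, 0.5) for k in range(6)))   # normalization check: 1.0
```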

28 Binomial distribution (mean)
$\mu = E[k] = \sum_{k=0}^{n} k \, f(k) = \sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k} = \sum_{k=1}^{n} k \, \frac{n!}{k!\,(n-k)!} \, p^k (1-p)^{n-k}$
$= \sum_{k=1}^{n} \frac{n\,(n-1)!}{(k-1)!\,(n-k)!} \, p \, p^{k-1} (1-p)^{n-k} = np \sum_{j=0}^{n-1} \frac{(n-1)!}{j!\,(n-1-j)!} \, p^j (1-p)^{n-1-j} = np$
where we define j = k − 1.

29 Binomial distribution (variance)
$Var(k) = E[(k-\mu)^2] = np(1-p)$

31 Maximum likelihood
$T^* = \arg\max_T P(D|T,M) = \arg\max_T \log P(D|T,M)$
D = data, M = model, T = model parameters

32 Evaluation of p Given n independent experiments with k positive events, how do we evaluate the best p? MAXIMUM LIKELIHOOD definition: the best p is the p that maximises the total probability of the experiment results. Example: which p maximises the probability of the outcome of …? What is the statistical error?

33 Evaluation of p
$f(k|n,p) = \binom{n}{k} p^k (1-p)^{n-k}$
$\left.\frac{df}{dp}\right|_{p=p_{ML}} = 0 \;\Rightarrow\; k\, p_{ML}^{k-1} (1-p_{ML})^{n-k} - p_{ML}^{k} (n-k) (1-p_{ML})^{n-k-1} = 0$
$k(1-p_{ML}) - p_{ML}(n-k) = 0 \;\Rightarrow\; p_{ML} = \frac{k}{n}$

34 Poisson distribution The Poisson distribution is a discrete probability distribution that expresses the probability of a number of events occurring in a fixed period of time, if these events occur with a known average rate (λ) and independently of the time since the last event. (The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume.)

35 Poisson distribution
$f(k) = \frac{\lambda^k}{k!} e^{-\lambda} \qquad E[k] = \lambda \qquad Var(k) = \lambda$
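A minimal sketch of this formula with a numerical check that E[k] = Var(k) = λ (the sum is truncated at k = 100, more than enough for λ = 4):

```python
# Minimal sketch: Poisson pmf plus a truncated-sum check of mean and variance.
from math import exp, factorial

def poisson_pmf(k, lam):
    return lam**k / factorial(k) * exp(-lam)

lam = 4.0
ks = range(100)
mean = sum(k * poisson_pmf(k, lam) for k in ks)
var = sum(k**2 * poisson_pmf(k, lam) for k in ks) - mean**2
print(mean, var)  # both ~= 4.0
```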

36 Continuous Distributions
Data are described with distributions.
Continuous variables: data take values on a continuous interval: $x \in I \subseteq \mathbb{R}$
Cumulative probability: $F(X) = N(x \le X)/N_{tot}$
$Prob(A < x \le B) = F(B) - F(A) = \int_A^B f(x)\,dx$
Probability density function (limit): $f(x) = \frac{dF(x)}{dx}$
Normalization: $\int_I f(x)\,dx = 1$

37 Mean and Variance
Mean: $\mu = E[x] = \int_I x \cdot f(x)\,dx$
Variance: $Var(x) = E[(x-\mu)^2] = \int_I (x-\mu)^2 \cdot f(x)\,dx$

38 The Uniform Probability Distribution
Uniform probability density function:
f(x) = 1/(b − a) for a < x < b; 0 elsewhere
where a = smallest value the variable can assume, b = largest value the variable can assume.
The probability of the continuous random variable assuming a specific value is 0: P(x = x1) = 0.

39 Example: Buffet Probability Density Function
Customers are charged for the amount of salad they take. Sampling suggests that the amount of salad taken is uniformly distributed between 5 ounces and 15 ounces.
Probability density function: f(x) = 1/10 for 5 < x < 15; 0 elsewhere
where x = salad plate filling weight

40 Example: Buffet What is the probability that a customer will take between 12 and 15 ounces of salad?
[Figure: f(x) = 1/10 over salad weight (oz.) from 5 to 15, with the region from 12 to 15 shaded]
P(12 < x < 15) = (1/10)(3) = .3

41 The Uniform Probability Distribution
P(8 < x < 12) = ?
[Figure: f(x) = 1/10 from 5 to 15, with the region from 8 to 12 shaded]
P(8 < x < 12) = (1/10)(12 − 8) = .4

42 The Uniform Probability Distribution
P(0 < x < 12) = ?
[Figure: f(x) = 1/10 from 5 to 15, with the region from 5 to 12 shaded]
P(0 < x < 12) = P(5 < x < 12) = (1/10)(12 − 5) = .7

43 The Uniform Probability Distribution
Uniform probability density function: f(x) = 1/(b − a) for a < x < b; 0 elsewhere
Expected value of x: E(x) = (a + b)/2
Variance of x: Var(x) = (b − a)²/12
where a = smallest value the variable can assume, b = largest value the variable can assume.
Prove it!! (A simulation check follows.)
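A minimal simulation sketch of these two results, using the salad example's a = 5, b = 15 (expected mean 10, variance 100/12 ≈ 8.33):

```python
# Minimal sketch: check E(x) = (a+b)/2 and Var(x) = (b-a)^2/12 by simulation.
import random

a, b = 5.0, 15.0
samples = [random.uniform(a, b) for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(mean, (a + b) / 2)         # ~10.0 vs 10.0
print(var, (b - a) ** 2 / 12)    # ~8.33 vs 8.33
```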

44 Normal distribution The normal distribution is considered the most “basic” continuous probability distribution. Specifically, by the central limit theorem, under certain conditions the sum of a number of random variables with finite means and variances approaches a normal distribution as the number of variables increases. For this reason, the normal distribution is commonly encountered in practice, and is used throughout statistics, natural sciences, and social sciences as a simple model for complex phenomena. For example, the observational error in an experiment is usually assumed to follow a normal distribution, and the propagation of uncertainty is computed using this assumption.

45 Normal distribution
$f(x) = N(x|\mu,\sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
μ: mean, σ²: variance

46 Normal distribution: continuous
$f(x) = N(x|\mu,\sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, μ: mean, σ²: variance
$E[x] = \int x \, N(x|\mu,\sigma^2)\,dx = \mu$
$E[x^2] = \int x^2 \, N(x|\mu,\sigma^2)\,dx = \mu^2 + \sigma^2$

47 Gaussian distribution in a D-dimensional space
$N(x|\mu,\Sigma) = \frac{1}{(2\pi)^{D/2} \, |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$
where μ is a D-valued vector (means) and Σ is a D×D symmetric matrix (covariance matrix), with determinant |Σ|.
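A minimal sketch of this density (the mu and sigma values are made-up examples), using numpy for the determinant and for applying Σ⁻¹ to (x − μ):

```python
# Minimal sketch: D-dimensional Gaussian density.
import numpy as np

def gaussian_density(x, mu, sigma):
    d = len(mu)
    diff = x - mu
    norm = 1.0 / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma)))
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff))  # (x-mu)^T Sigma^-1 (x-mu)

mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.5], [0.5, 2.0]])   # symmetric covariance matrix
print(gaussian_density(np.array([0.2, -0.1]), mu, sigma))
```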

48 Estimating Mean and Variance from sampled data
Consider a set of n independent and identically distributed data, following a normal distribution with mean μ and variance σ²: X = x1, x2, …, xn
How can we estimate the mean and variance?
Example: X = 0.2; 0.18; 0.24; 0.14; 0.05; 0.26; 0.15; 0.29; 0.49

49 Estimating Mean and Variance from sampled data
MAXIMUM LIKELIHOOD: estimate the parameters so as to maximize the probability of the sampled values:
$(\mu_{ML}, \sigma^2_{ML}) = \arg\max_{\mu,\sigma^2} f(x_1, x_2, \ldots, x_n | \mu, \sigma^2)$

50 ML Mean and Variance The probability of the sampled data is
$f(X|\mu,\sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}$
It has to be maximised over the variables μ and σ².

51 ML Mean and Variance Since the logarithm is a monotonic function,
$\arg\max_{\mu,\sigma} f(X|\mu,\sigma^2) = \arg\max_{\mu,\sigma} \ln f(X|\mu,\sigma^2) = \arg\max_{\mu,\sigma} \left[-\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\right]$

52 ML Mean and Variance
$\frac{\partial}{\partial\mu}\left[-\frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\right]_{\mu_{ML},\,\sigma^2_{ML}} = 0 \;\Rightarrow\; \sum_{i=1}^{n}(x_i - \mu_{ML}) = 0$
$\frac{\partial}{\partial\sigma^2}\left[-\frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\right]_{\mu_{ML},\,\sigma^2_{ML}} = 0 \;\Rightarrow\; -\frac{n}{2\sigma^2_{ML}} + \frac{1}{2\sigma^4_{ML}}\sum_{i=1}^{n}(x_i - \mu_{ML})^2 = 0$

53 ML Mean and Variance
$\mu_{ML} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad \sigma^2_{ML} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu_{ML})^2$
Are these the expressions that you usually apply?
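A minimal sketch applying these expressions to the example sample from slide 48, alongside the 1/(n−1) version discussed on the next slides:

```python
# Minimal sketch: ML mean and variance on the slide-48 sample.
data = [0.2, 0.18, 0.24, 0.14, 0.05, 0.26, 0.15, 0.29, 0.49]
n = len(data)

mu_ml = sum(data) / n
var_ml = sum((x - mu_ml) ** 2 for x in data) / n              # ML (biased) variance
var_unbiased = sum((x - mu_ml) ** 2 for x in data) / (n - 1)  # unbiased version (slide 57)
print(mu_ml, var_ml, var_unbiased)
```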

54 The μML is an unbiased estimate
Consider a set of data generated from a normal distribution N(μ, σ²). The estimate
$\mu_{ML} = \frac{1}{n}\sum_{i=1}^{n} x_i$
is an unbiased estimate of μ, since
$E[\mu_{ML}] = \int \frac{1}{n}\sum_{i=1}^{n} x_i \cdot \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} e^{-\frac{(x_1-\mu)^2}{2\sigma^2}} e^{-\frac{(x_2-\mu)^2}{2\sigma^2}} \cdots e^{-\frac{(x_n-\mu)^2}{2\sigma^2}} \, dx_1 dx_2 \cdots dx_n = \frac{1}{n}\sum_{i=1}^{n} E[x_i] = \mu$

55 The σML² is a biased estimate
$\sigma^2_{ML} = \frac{1}{n}\sum_{i=1}^{n}(x_i - M)^2$
is a biased estimate of σ², since
$E[\sigma^2_{ML}] = E\left[\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \frac{1}{n}\sum_{j=1}^{n}x_j\right)^2\right] = E\left[\frac{1}{n}\sum_{i=1}^{n}x_i^2\right] - \frac{1}{n^2}E\left[\sum_{i=1}^{n}\sum_{j=1}^{n}x_i x_j\right] = E[x^2] - \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left(\mu^2 + \delta_{ij}\sigma^2\right) = \mu^2 + \sigma^2 - \mu^2 - \frac{1}{n}\sigma^2 = \frac{n-1}{n}\sigma^2 \ne \sigma^2$

56 σML² underestimates σ²
[Figure: green = “real” distribution; red = sampled distributions]
σML² is evaluated with respect to the sample mean.

57 Unbiased estimation of μ and σ²
$M = \mu_{ML} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad S^2 = \frac{n}{n-1}\sigma^2_{ML} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - M)^2$

58 Distribution of the sample mean
Consider different sets of sampled data X_k (e.g. different levels of expression of genes in n different populations of individuals). A sample mean can be defined for each set:
$M_k = \frac{1}{n_k}\sum_{i=1}^{n_k} x_i^{(k)}$
How are they distributed?

59 Distribution of the sample means
$M = \frac{1}{n}\sum_{i=1}^{n} x_i$
M is a linear combination of independent, identically distributed normal variables: it is normally distributed.

60 Distribution of the sample means
$E[M] = \mu$
$var(M) = E[(M-\mu)^2] = E[M^2 + \mu^2 - 2\mu M] = E[M^2] + \mu^2 - 2\mu^2 = E[M^2] - \mu^2$
$E[M^2] = E\left[\frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}x_i x_j\right] = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left(\mu^2 + \delta_{ij}\sigma^2\right) = \mu^2 + \frac{\sigma^2}{n}$
$var(M) = \frac{\sigma^2}{n}$
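A minimal simulation sketch of var(M) = σ²/n (the parameter values are made-up examples):

```python
# Minimal sketch: spread of the means of many size-n normal samples.
import random

mu, sigma, n, trials = 10.0, 2.0, 25, 20_000
means = [sum(random.gauss(mu, sigma) for _ in range(n)) / n for _ in range(trials)]

m = sum(means) / trials
var_m = sum((x - m) ** 2 for x in means) / trials
print(var_m, sigma**2 / n)  # ~0.16 vs 0.16
```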

61 Distribution of the sample means

62 Distribution of the sample deviations
If Xi are n independent, normally distributed random variables with mean μ and variance σ², then the random variable
$Q = \sum_{i=1}^{n}\left(X_i - M\right)^2$
(with M the sample mean) is distributed according to the chi-square distribution with n−1 degrees of freedom. This is usually written as $Q \approx \sigma^2 \chi^2_{n-1}$.
The chi-square distribution has one parameter, n−1: a positive integer that specifies the number of degrees of freedom (i.e. the number of independent deviations).

63 Chi-square distribution
For k degrees of freedom: mean = k, variance = 2k

64 Distribution of the sample deviations
So the unbiased sample variance
$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - M)^2$
is distributed according to the chi-square distribution with n−1 degrees of freedom:
$S^2 \approx \frac{\sigma^2}{n-1}\chi^2_{n-1}$
It follows that $E[S^2] \approx \frac{\sigma^2}{n-1} E[\chi^2_{n-1}] = \frac{\sigma^2}{n-1}(n-1) = \sigma^2$ → unbiased. (A simulation check follows.)
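A minimal simulation sketch: if S² behaves as σ²/(n−1) · χ²_{n−1}, then (n−1)S²/σ² should have mean n−1 and variance 2(n−1), as stated on slide 63 (parameter values are made-up examples):

```python
# Minimal sketch: chi-square behaviour of the unbiased sample variance.
import random

mu, sigma, n, trials = 0.0, 1.5, 10, 20_000
qs = []
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / (n - 1)
    qs.append((n - 1) * s2 / sigma**2)

mean_q = sum(qs) / trials
var_q = sum((q - mean_q) ** 2 for q in qs) / trials
print(mean_q, var_q)  # ~9 and ~18, i.e. n-1 and 2(n-1)
```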

65 Distribution of the sample deviations: Proof for 1 degree of freedom
Let the random variable Y be defined as Y = X², where X has a normal distribution with mean 0 and variance 1. We can compute the cumulative function:
$F_Y(y) = P(Y \le y) = P(X^2 \le y) = P(-\sqrt{y} \le X \le \sqrt{y}) = \int_{-\sqrt{y}}^{\sqrt{y}} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx$

66 Distribution of the sample deviations: Proof for 1 degree of freedom
Differentiating, for y > 0:
$f_Y(y) = \frac{dF_Y(y)}{dy} = \frac{1}{\sqrt{2\pi y}} \, e^{-y/2}$
which is the chi-square density with 1 degree of freedom.

