Maximum likelihood estimation Michail Tsagris & Ioannis Tsamardinos
Histograms revisited Looks symmetric and unimodal.
Normal distribution $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
MLE of normal distribution Suppose we have collected some data, RNA expression measurements for example. We draw a histogram and see a nice bell-shaped distribution. How do we find the parameters $\mu$ and $\sigma^2$? Given what we see, and assuming that the sample comes from a normally distributed population, we estimate the parameter values that are most likely to have produced those data, that is, the values which maximise the likelihood of observing such data.
MLE of normal distribution So we will calculate the values that maximise the likelihood of having observed these data. Denote the $n$ values we have observed by $x_1, x_2, \ldots, x_n$. Each $x_i$ comes from a normal population with some mean $\mu$ and some variance $\sigma^2$. These two parameters are the same for every observation and, in addition, the observations are independent of each other (independent and identically distributed, iid).
MLE of normal distribution So, each $x_i$ can be plugged into the formula (probability density function) of the normal distribution: $f(x_i) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}$. Because the observations are independent, their joint density is the product of all the $f(x_i)$: $L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}$. The goal is to maximise this expression, and it is easier to maximise its logarithm. Since the logarithm is an increasing function, the pair $(\mu, \sigma^2)$ which maximises $L$ maximises $\log L$ as well.
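MLE of normal distribution
$$\ell(\mu, \sigma^2) = -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \sum_{i=1}^{n} \frac{(x_i-\mu)^2}{2\sigma^2}$$
$$\frac{\partial \ell}{\partial \mu} = \sum_{i=1}^{n} \frac{x_i-\mu}{\sigma^2} = 0 \;\Rightarrow\; \hat{\mu} = \frac{\sum_{i=1}^{n} x_i}{n} = \bar{x}$$
$$\frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \sum_{i=1}^{n} \frac{(x_i-\mu)^2}{2\sigma^4} = 0 \;\Rightarrow\; \hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (x_i-\hat{\mu})^2}{n}$$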
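The closed-form estimates can be checked numerically. The following is a minimal sketch, not part of the original slides: the simulated sample, the seed and the function names are illustrative assumptions. It computes $\hat{\mu}$ and $\hat{\sigma}^2$ from the formulas above and confirms that the log-likelihood is lower at a few nearby parameter values.

```python
import math
import random


def normal_log_likelihood(data, mu, sigma2):
    """Log-likelihood of iid normal data with mean mu and variance sigma2."""
    n = len(data)
    sum_sq = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - sum_sq / (2 * sigma2)


def normal_mle(data):
    """Closed-form MLEs: the sample mean and the variance with divisor n."""
    n = len(data)
    mu_hat = sum(data) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n
    return mu_hat, sigma2_hat


# Hypothetical data: 200 draws from N(10, 2^2), seeded for reproducibility.
rng = random.Random(1)
sample = [rng.gauss(10.0, 2.0) for _ in range(200)]

mu_hat, sigma2_hat = normal_mle(sample)
best = normal_log_likelihood(sample, mu_hat, sigma2_hat)

# Log-likelihood at four perturbed parameter pairs around the MLE.
nearby = [
    normal_log_likelihood(sample, mu_hat + d_mu, sigma2_hat * scale)
    for d_mu in (-0.1, 0.1)
    for scale in (0.9, 1.1)
]
print(mu_hat, sigma2_hat, best, max(nearby))
```

Note that the variance estimate divides by $n$, not $n-1$, exactly as in the derivation.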
MLE of mean, median and proportion So, the MLE of the mean is simply the sample mean of the data. The sample median serves as the MLE of the median (strictly, this holds under a Laplace, or double-exponential, model rather than a normal one). What about proportions? Suppose that a dose of 1 μL kills 5 of the 30 mice in a wet lab experiment. The estimated kill rate of this dose of this specific drug is 5/30 = 0.1667, or 16.67%.
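The proportion estimate can be justified in the same way as the normal case. A short sketch, assuming (this is not stated on the slide) that each mouse dies independently with the same probability $p$, so that the number of deaths out of 30 is binomial:

```latex
\begin{align*}
L(p) &= \binom{30}{5}\, p^{5} (1-p)^{25}, \\
\ell(p) &= \log\binom{30}{5} + 5\log p + 25\log(1-p), \\
\frac{d\ell}{dp} &= \frac{5}{p} - \frac{25}{1-p} = 0
\;\Rightarrow\; \hat{p} = \frac{5}{30} \approx 0.1667 .
\end{align*}
```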
MLE of mean and median The sample mean can also be seen as the quantity $\theta$ that minimises the sum of squared differences $\sum_{i=1}^{n} (x_i-\theta)^2$. The sample median, on the other hand, is the quantity $\theta$ that minimises the sum of absolute differences $\sum_{i=1}^{n} |x_i-\theta|$.
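Both minimisation properties are easy to verify by brute force. The sketch below uses a small made-up data set with one outlier and a grid of candidate values for $\theta$; the data and the grid resolution are illustrative assumptions.

```python
import statistics

# Hypothetical data with one large value, so that mean and median differ.
data = [2.0, 3.0, 5.0, 7.0, 30.0]


def sum_of_squares(theta):
    """Sum of squared differences between the data and theta."""
    return sum((x - theta) ** 2 for x in data)


def sum_of_absolute(theta):
    """Sum of absolute differences between the data and theta."""
    return sum(abs(x - theta) for x in data)


# Candidate values 0.00, 0.01, ..., 30.00.
grid = [i / 100 for i in range(3001)]

best_for_squares = min(grid, key=sum_of_squares)
best_for_absolute = min(grid, key=sum_of_absolute)

print(best_for_squares, statistics.mean(data))     # grid minimiser vs sample mean
print(best_for_absolute, statistics.median(data))  # grid minimiser vs sample median
```

The outlier pulls the minimiser of the squared criterion (the mean) upwards, while the minimiser of the absolute criterion (the median) stays with the bulk of the data.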
Confidence intervals Suppose one calculates the proportion of mice killed at a given drug dose. Is it enough to present just a number? What else can be reported? The uncertainty must be quantified via a range of most likely values, an interval of high confidence. The most common choice is the 95% confidence interval. The 95% level (that is, α = 5%) is the standard and the most popular choice, but not the only one.
Confidence intervals We performed a study, found some summary statistics and constructed a 95% confidence interval for the mean or the proportion. This means that with 95% probability the true mean lies within the computed range. Wrong!!!
Confidence intervals We performed a study, found some summary statistics and constructed a 95% confidence interval for the mean or the proportion. But we only did the analysis once. In other words, the method we used produces intervals that include the true mean, proportion, or any parameter in general, 95% of the time, and we expect (without being able to check) that our interval is one of those 95%.
Confidence intervals We performed a study, found some summary statistics and constructed a 95% confidence interval for the mean or the proportion. If we were to repeat this analysis many times, we would expect about 95% of the resulting intervals to include the true mean. Correct!!!
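The repeated-sampling interpretation can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' code: the population is normal with a known standard deviation, so the interval uses the normal quantile instead of a t quantile, and the true mean, sample size, number of repetitions and seed are arbitrary.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(true_mu=86.0, sigma=5.0, n=31, reps=2000, level=0.95, seed=0):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    half_width = z * sigma / math.sqrt(n)          # sigma is assumed known
    hits = 0
    for _ in range(reps):
        # Draw a fresh sample of size n and build its interval.
        xbar = fmean(rng.gauss(true_mu, sigma) for _ in range(n))
        if xbar - half_width <= true_mu <= xbar + half_width:
            hits += 1
    return hits / reps


rate = coverage()
print(f"Proportion of intervals covering the true mean: {rate:.3f}")
```

Each individual interval either contains the true mean or it does not; the 95% describes how often the procedure succeeds across repetitions.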
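(Student’s) t distribution (William Gosset) We have spoken of the normal distribution. Let us now see the t distribution. Normal density: $f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$. t density: $f(x; v) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\,\Gamma\left(\frac{v}{2}\right)} \left(1+\frac{x^2}{v}\right)^{-\frac{v+1}{2}}$.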
(Student’s) t distribution (William Gosset) As $v$ (the degrees of freedom) increases, the t distribution approaches the standard normal distribution: $t_v \to N(0, 1)$ as $v \to \infty$.
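The convergence can be seen directly from the density formula. The sketch below evaluates the t density at the centre, $x = 0$, for a few degrees of freedom and compares it with the standard normal density; the function name `t_pdf` and the chosen values of $v$ are illustrative assumptions. The log-gamma function is used to avoid overflow for large $v$.

```python
import math
from statistics import NormalDist


def t_pdf(x, v):
    """Density of Student's t distribution with v degrees of freedom."""
    log_const = (
        math.lgamma((v + 1) / 2)
        - math.lgamma(v / 2)
        - 0.5 * math.log(v * math.pi)
    )
    return math.exp(log_const - (v + 1) / 2 * math.log1p(x * x / v))


x = 0.0
normal_value = NormalDist().pdf(x)  # standard normal density at x

degrees = (1, 5, 30, 1000)
gaps = [abs(t_pdf(x, v) - normal_value) for v in degrees]

for v, gap in zip(degrees, gaps):
    print(f"v = {v:4d}: t density = {t_pdf(x, v):.5f}, gap to normal = {gap:.5f}")
```

With 30 or more degrees of freedom the two densities are already very close, which is why the normal quantile is often used as an approximation for large samples.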
Population and sample revisited Greek letters indicate population parameters; Latin letters correspond to sample estimates. Example for the mean, variance and standard deviation: $\mu, \sigma^2, \sigma$ (population) and $\bar{x}, s^2, s$ (sample).
100(1-α)% confidence interval for the mean Suppose we have estimated the average glucose concentration to be $\bar{x} = 86$ mg/dL with a variance equal to $s^2 = 25$ (mg/dL)² and we want to construct a 95% confidence interval for the true concentration. Our sample consists of $n = 31$ people. The 95% is called the confidence level.
100(1-α)% confidence interval for the mean The interval is $\left(\bar{x} - t_{1-\frac{\alpha}{2},\, n-1} \frac{s}{\sqrt{n}},\; \bar{x} + t_{1-\frac{\alpha}{2},\, n-1} \frac{s}{\sqrt{n}}\right)$. With $\bar{x} = 86$, $s = \sqrt{25} = 5$ and $n = 31$, we have everything but the term $t_{1-\frac{\alpha}{2},\, n-1}$.
t distribution tables
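100(1-α)% confidence interval for the mean $t_{1-\frac{\alpha}{2},\, n-1} = t_{1-\frac{0.05}{2},\, 31-1} = t_{0.975,\, 30} = 2.042$, so the interval is $\left(86 - 2.042 \sqrt{\frac{25}{31}},\; 86 + 2.042 \sqrt{\frac{25}{31}}\right) = (84.166,\, 87.834)$.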
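The arithmetic of the glucose example can be reproduced in a few lines. This is a minimal sketch: the function name is an assumption, and the critical value 2.042 is taken from the t table (it corresponds to 30 degrees of freedom) rather than computed, because the Python standard library has no t quantile function.

```python
import math


def mean_confidence_interval(xbar, s2, n, t_crit):
    """Interval xbar -/+ t_crit * s / sqrt(n), with s2 the sample variance."""
    half_width = t_crit * math.sqrt(s2 / n)
    return xbar - half_width, xbar + half_width


# Glucose example: mean 86 mg/dL, variance 25, n = 31, tabulated t value 2.042.
lower, upper = mean_confidence_interval(xbar=86.0, s2=25.0, n=31, t_crit=2.042)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

If SciPy is available, the critical value could be obtained with `scipy.stats.t.ppf(0.975, 30)` instead of a table lookup.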
100(1-α)% confidence interval for the proportion In a sample of 132 women who smoke, 58 were found to have increased chances of getting breast cancer, so $\hat{p} = \frac{58}{132} = 0.4394$, or 43.94%. The relevant 95% confidence interval is given by $\left(\hat{p} - Z_{1-\frac{\alpha}{2}} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\; \hat{p} + Z_{1-\frac{\alpha}{2}} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$. What is $Z_{1-\frac{\alpha}{2}}$?
100(1-α)% confidence interval for the proportion $Z_{1-\frac{\alpha}{2}} = Z_{1-\frac{0.05}{2}} = Z_{0.975} = 1.96$, the 0.975 quantile of the standard normal distribution. The interval is $\left(0.4394 - 1.96 \sqrt{\frac{0.4394(1-0.4394)}{132}},\; 0.4394 + 1.96 \sqrt{\frac{0.4394(1-0.4394)}{132}}\right) = (0.3547,\, 0.5241)$, or (35.47%, 52.41%).
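The same calculation in code, as a minimal sketch using only the Python standard library. The function name is an assumption; the normal quantile comes from `statistics.NormalDist`, which gives 1.95996 rather than the rounded 1.96, so the results agree with the slide to four decimals.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(successes, n, level=0.95):
    """Normal-approximation interval p_hat -/+ z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Example from the slide: 58 of 132 women.
lower, upper = proportion_confidence_interval(58, 132)
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

This interval relies on the normal approximation to the binomial, which is reasonable here because both the number of successes (58) and of failures (74) are large.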