DATA ANALYSIS Module Code: CA660 Lecture Block 6: Alternative estimation methods and their implementation.


2 MAXIMUM LIKELIHOOD ESTIMATION

Recall general points: estimation, and the definition of the Likelihood function $L(\theta \mid x)$ for a vector of parameters $\theta$ and a set of observed values $x$. The most likely value of $\theta$ maximises the Likelihood function. We also defined the Log-likelihood (Support function $S(\theta) = \ln L(\theta \mid x)$) and its derivative, the Score, together with the Information content per observation, which for a single-parameter likelihood is given by
$$I(\theta) = E\left[\left(\frac{\partial \ln L}{\partial \theta}\right)^{2}\right] = -E\left[\frac{\partial^{2} \ln L}{\partial \theta^{2}}\right]$$
Why MLE? (Need to know the underlying distribution.) Properties: consistency; sufficiency; asymptotic efficiency (linked to variance); unique maximum; invariance and, hence, the most convenient parameterisation; usually MVUE; amenable to conventional optimisation methods.
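To make the definitions concrete, here is a minimal Python sketch (illustrative only, not from the lecture; the binomial model and the function names are my choices) evaluating the support, Score and Information for a binomial sample:

```python
import numpy as np

def support(theta, k, n):
    """Log-likelihood (support) for k successes in n Bernoulli trials."""
    return k * np.log(theta) + (n - k) * np.log(1 - theta)

def score(theta, k, n):
    """First derivative of the support function w.r.t. theta (the Score)."""
    return k / theta - (n - k) / (1 - theta)

def information(theta, n):
    """Expected (Fisher) information: -E[d2 S / d theta2] = n / (theta(1-theta))."""
    return n / (theta * (1 - theta))

k, n = 7, 20
theta_hat = k / n                               # analytical MLE
print(score(theta_hat, k, n))                   # Score is zero at the MLE
print(1 / information(theta_hat, n))            # large-sample variance approximation
```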

3 VARIANCE, BIAS & CONFIDENCE

Variance of an estimator: the usual form is
$$\mathrm{Var}(\hat{\theta}) = E\big[(\hat{\theta} - E[\hat{\theta}])^{2}\big]$$
or, for $k$ independent estimates $\hat{\theta}_1, \dots, \hat{\theta}_k$,
$$s^{2}_{\hat{\theta}} = \frac{1}{k-1}\sum_{i=1}^{k}\big(\hat{\theta}_i - \bar{\hat{\theta}}\big)^{2}$$
For a large sample, the variance of the MLE can be approximated by the inverse of the Information,
$$\mathrm{Var}(\hat{\theta}) \approx \big[I(\hat{\theta})\big]^{-1} = \left[-E\frac{\partial^{2}\ln L}{\partial\theta^{2}}\right]^{-1}$$
and can also be estimated empirically, using re-sampling techniques (e.g. the bootstrap).
Variance of a linear function of several estimates (a common need in genomics analysis, e.g. heritability, and in risk analysis):
$$\mathrm{Var}\Big(\sum_i c_i\hat{\theta}_i\Big) = \sum_i c_i^{2}\,\mathrm{Var}(\hat{\theta}_i) + 2\sum_{i<j} c_i c_j\,\mathrm{Cov}(\hat{\theta}_i,\hat{\theta}_j)$$
Recall the Bias of the estimator, $B(\hat{\theta}) = E[\hat{\theta}] - \theta$; the Mean Square Error is then defined to be
$$\mathrm{MSE}(\hat{\theta}) = E\big[(\hat{\theta} - \theta)^{2}\big]$$
which expands to $\mathrm{Var}(\hat{\theta}) + \big[B(\hat{\theta})\big]^{2}$, so we have the basis for C.I.s and tests of hypotheses.
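The re-sampling remark can be illustrated with a short bootstrap sketch (my illustration, not the lecture's code; the exponential sample is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=200)        # sample; true mean = 2.0

# Bootstrap estimate of the variance of the sample-mean estimator
B = 2000
boot_means = np.array([rng.choice(x, size=x.size, replace=True).mean()
                       for _ in range(B)])
print("bootstrap variance :", boot_means.var(ddof=1))
print("analytical approx. :", x.var(ddof=1) / x.size)   # s^2 / n
```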

4 COMMONLY-USED METHODS of obtaining MLE

Analytical: solving $\partial L/\partial\theta = 0$ or $\partial S(\theta)/\partial\theta = 0$ when simple solutions exist.
Grid search or likelihood-profile approach.
Newton-Raphson iteration methods.
EM (expectation and maximisation) algorithm.
N.B. Work with the Log-likelihood, because it is maximised at the same $\theta$ value as the Likelihood, it is easier to compute, and there is a close relationship between the statistical properties of the MLE and the Log-likelihood.

5 MLE Methods in outline

Analytical: recall the Binomial example earlier.
Example: for the Normal, the MLEs of the mean and variance (taking derivatives of the log-likelihood w.r.t. the mean and the variance separately) are
$$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \hat{\sigma}^{2} = \frac{1}{N}\sum_{i=1}^{N}\big(x_i - \hat{\mu}\big)^{2}$$
i.e. equivalent to the sample mean and the "actual" variance (divisor $N$, not $N-1$): unbiased if the mean is known, biased if not.
Invariance: one-to-one relationships preserved.
Used: when the MLE has a simple solution.
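A quick simulation (my illustration; the parameter values are arbitrary) showing the bias of the Normal-variance MLE when the mean must itself be estimated:

```python
import numpy as np

rng = np.random.default_rng(1)
true_var, n, reps = 4.0, 10, 50_000

mle_vars = np.empty(reps)
for r in range(reps):
    x = rng.normal(loc=5.0, scale=np.sqrt(true_var), size=n)
    mu_hat = x.mean()                           # MLE of the mean
    mle_vars[r] = ((x - mu_hat) ** 2).mean()    # MLE of the variance (divisor N)

print("E[var_hat] approx. :", mle_vars.mean())  # about (n-1)/n * true_var = 3.6
print("bias-corrected     :", mle_vars.mean() * n / (n - 1))
```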

6 MLE Methods in outline contd.

Grid search (computational): plot the likelihood or log-likelihood vs the parameter. Various features:
Relative Likelihood = Likelihood / Maximum Likelihood (ML set = 1). The peak of the R.L. can be identified visually or sought algorithmically.
e.g. plotting the likelihood over the parameter-space range gives 2 peaks, symmetrical about $\theta = 0.5$ (the likelihood profile for, e.g., the well-known mixed-linkage-phase analysis problem, or for a similar example of populations following known proportion splits). If we now constrain $\theta \le 0.5$, the MLE solution is unique, e.g. $\theta$ = R.F. between genes (possible mixed linkage phase).
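A minimal grid-search sketch (illustrative; the binomial objective is a stand-in for the linkage likelihood):

```python
import numpy as np

def loglik(theta, k, n):
    """Binomial log-likelihood, used here as a stand-in objective."""
    return k * np.log(theta) + (n - k) * np.log(1 - theta)

k, n = 7, 20
grid = np.linspace(0.001, 0.999, 999)           # parameter-space range
ll = loglik(grid, k, n)
rel_lik = np.exp(ll - ll.max())                 # relative likelihood, peak = 1
theta_hat = grid[np.argmax(ll)]
print("grid-search MLE:", theta_hat)            # close to k/n = 0.35
print("relative likelihood at peak:", rel_lik.max())
```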

7 MLE Methods in outline contd.

Graphic/numerical implementation: start from an initial estimate of $\theta$. The direction of search is determined by evaluating the likelihood on both sides of $\theta$; the search takes the direction giving an increase, because we are looking for a maximum. Initial search increments are large, e.g. 0.1; then, when the likelihood change starts to decrease or become negative, stop and refine the increment.
Issues: multiple peaks (can miss the global maximum); computationally intensive. Multiple parameters: grid search. Interpretation of likelihood profiles can be difficult; see e.g. likelihood-estimation-in-sasiml/
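A sketch of the coarse-to-fine search just described (the helper `hill_climb`, the step sizes and the objective are my choices, not the lecture's):

```python
import numpy as np

def hill_climb(loglik, theta0, step=0.1, tol=1e-6, lo=1e-6, hi=1 - 1e-6):
    """Coarse-to-fine 1-D search: move in the uphill direction,
    halving the increment each time no neighbouring point improves."""
    theta = theta0
    while step > tol:
        left, right = max(lo, theta - step), min(hi, theta + step)
        best = max([left, theta, right], key=loglik)
        if best == theta:        # no improvement at this step size
            step /= 2            # refine the increment
        else:
            theta = best
    return theta

loglik = lambda t: 7 * np.log(t) + 13 * np.log(1 - t)
print(hill_climb(loglik, theta0=0.5))    # about 7/20 = 0.35
```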

8 Example in outline

Data e.g. used to show a linkage relationship (non-independence) between, say, a marker and a given disease gene, or between sex and purchase of computer games.
Escapes = individuals who are susceptible but show no disease phenotype under experimental conditions (express interest but have no purchase record). So define $\beta$ and $\theta$ as the proportion of escapes and the R.F. respectively; $1-\beta$ is then the penetrance for the disease trait, or of purchasing, i.e. P{individual with susceptible genotype has the disease phenotype}, P{individual of given sex who expresses interest actually buys}.
Purpose of the experiment: typically, to estimate the R.F. between marker and gene, or the proportion of a sex that purchases.
Use: Support function = Log-likelihood. Often quite complex; e.g. for the above example we might have a support function of the form
$$S(\theta, \beta) = \sum_i f_i \ln p_i(\theta, \beta)$$
where the $f_i$ are observed category counts and the $p_i(\theta,\beta)$ their expected proportions.

9 Example contd.

Set the 1st derivatives (Scores) w.r.t. $\theta$ and w.r.t. $\beta$ to zero. The expected value of the Score w.r.t. $\theta$ is zero (see analogies in classical sampling/hypothesis testing); similarly for $\beta$. Here, however, there is no simple analytical solution, so we cannot solve directly for either parameter. Using a grid search, the likelihood reaches its maximum at some point $(\hat{\theta}, \hat{\beta})$.
In general, this type of experiment tests H0: independence between the factors (marker and gene; sex and purchase) and H0: no escapes, using Likelihood Ratio Test statistics (the MLE equivalent of $\chi^{2}$).
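A two-parameter grid search in the spirit of this example. The category model `probs` below is a placeholder assumption (two independent binary factors), not the lecture's actual linkage/penetrance model; the counts are invented for illustration:

```python
import numpy as np

f = np.array([60, 25, 10, 5])                   # illustrative observed counts

def probs(theta, beta):
    """Placeholder model for the category probabilities (assumption)."""
    return np.array([(1 - theta) * (1 - beta),
                     (1 - theta) * beta,
                     theta * (1 - beta),
                     theta * beta])

def support(theta, beta):
    """Multinomial support: sum of f_i * log p_i(theta, beta)."""
    return np.sum(f * np.log(probs(theta, beta)))

thetas = np.linspace(0.01, 0.50, 50)            # constrain theta <= 0.5
betas = np.linspace(0.01, 0.99, 99)
S = np.array([[support(t, b) for b in betas] for t in thetas])
i, j = np.unravel_index(S.argmax(), S.shape)
print("grid MLE:", thetas[i], betas[j])         # about (0.15, 0.30) here
```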

10 MLE Methods in outline contd.

Newton-Raphson iteration. Have Score $\partial S(\theta)/\partial\theta = 0$ from previously. N-R consists of replacing the Score by the linear terms of its Taylor expansion, so if $\theta''$ is a solution and $\theta'$ the 1st guess:
$$\theta'' = \theta' - \left[\frac{\partial^{2} S(\theta')}{\partial\theta^{2}}\right]^{-1}\frac{\partial S(\theta')}{\partial\theta}$$
Repeat with $\theta''$ replacing $\theta'$. Each iteration fits a parabola to the Likelihood function.
Problems: multiple peaks; zero Information; extreme estimates.
Multiple parameters: need matrix notation, where the Score vector, e.g., has elements equal to the derivatives of $S(\theta, \beta)$ w.r.t. $\theta$ and $\beta$ respectively. Similarly, the Information matrix has terms of the form
$$I_{jk} = -E\left[\frac{\partial^{2} S}{\partial\theta_j\,\partial\theta_k}\right]$$
Variance estimates for the MLEs come from the inverse of the Information matrix, i.e. from the 2nd derivatives of the Log-likelihood $S(\theta)$.
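A minimal single-parameter Newton-Raphson (Fisher-scoring) sketch, using the binomial Score and expected Information from earlier (my illustration; for this model it lands on $k/n$ in a single step):

```python
def newton_raphson(score, info, theta0, tol=1e-8, max_iter=50):
    """Fisher scoring: theta'' = theta' + score(theta') / info(theta')."""
    theta = theta0
    for _ in range(max_iter):
        step = score(theta) / info(theta)
        theta += step
        if abs(step) < tol:
            break
    return theta

k, n = 7, 20
score = lambda t: k / t - (n - k) / (1 - t)     # dS/d theta
info = lambda t: n / (t * (1 - t))              # expected Information
print(newton_raphson(score, info, theta0=0.5))  # converges to k/n = 0.35
```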

11 MLE Methods in outline contd.

Expectation-Maximisation (E-M) algorithm: iterative; for incomplete data. (Much genomic, financial and other data fit this situation, e.g. linkage analysis with marker genotypes of F2 progeny: usually 9 categories are observed for the 2-locus, 2-allele model, but 16 constitute complete information, of which 14 give information on linkage. Some categories are hidden, but if the linkage parameter were known, the expected frequencies could be predicted and the complete data restored using expectation.)
Steps: (1) Expectation estimates the statistics of the complete data, given the observed incomplete data. (2) Maximisation uses the estimated complete data to give the MLE. Iterate until convergence (no further change).

12 E-M contd. Implementation

An initial guess $\theta'$ is chosen (e.g. $\theta' = 0.25$, say, for the R.F.). Taking this as "true", the complete data are estimated via distributional statements, e.g. P(individual is recombinant | observed genotype) for R.F. estimation. The MLE estimate $\theta''$ is then computed; for the R.F. this is the expected number of recombinants divided by $N$. Thus the MLE, for observed counts $f_i$, is
$$\theta'' = \frac{1}{N}\sum_i f_i\, P(\text{recombinant} \mid \text{genotype } i,\ \theta')$$
Convergence: $\theta'' = \theta'$, or $|\theta'' - \theta'|$ less than some small tolerance.
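A compact E-M sketch using the classic multinomial linkage example of Rao (a standard textbook stand-in, not necessarily the lecture's exact model): one observed category is a mixture of two hidden sub-categories, which the E-step splits in expectation before the M-step computes the complete-data MLE.

```python
# Cell probabilities: (1/2 + phi/4, (1-phi)/4, (1-phi)/4, phi/4)
y1, y2, y3, y4 = 125, 18, 20, 34      # observed category counts
phi = 0.25                             # initial guess phi'

for _ in range(100):
    # E-step: expected split of y1 into its two hidden sub-categories
    x = y1 * (phi / 4) / (0.5 + phi / 4)
    # M-step: complete-data MLE of phi
    phi_new = (x + y4) / (x + y2 + y3 + y4)
    if abs(phi_new - phi) < 1e-10:    # convergence: phi'' ~ phi'
        break
    phi = phi_new

print(phi)                             # about 0.6268
```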

13 LIKELIHOOD: C.I. and H.T.

Likelihood Ratio Tests: c.f. with $\chi^{2}$. The principal advantage of G is power, as unknown parameters are involved in the hypothesis test.
Have: the likelihood of $\theta$ taking the value $\theta_A$ which maximises it (i.e. its MLE), and the likelihood of $\theta$ under H0: $\theta = \theta_N$ (e.g. $\theta_N = 0.5$). Form of the L.R. test statistic:
$$\Lambda = \frac{L(\theta_N)}{L(\theta_A)} \qquad\text{or, conventionally,}\qquad G = -2\ln\Lambda = 2\big[\ln L(\theta_A) - \ln L(\theta_N)\big]$$
we choose the latter as it is easier to interpret. Distribution of G ~ approximately $\chi^{2}$ (d.o.f. = difference in dimension of the parameter spaces for $L(\theta_A)$, $L(\theta_N)$).
Goodness of fit: notation as for $\chi^{2}$, $G \sim \chi^{2}_{n-1}$. Independence: notation again as for $\chi^{2}$.
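A short sketch of the G statistic for the binomial case (illustrative; assumes SciPy is available for the $\chi^{2}$ tail probability):

```python
import numpy as np
from scipy.stats import chi2

k, n = 7, 20
theta_A = k / n                                  # MLE (alternative)
theta_N = 0.5                                    # H0 value

loglik = lambda t: k * np.log(t) + (n - k) * np.log(1 - t)
G = 2 * (loglik(theta_A) - loglik(theta_N))      # G = -2 ln Lambda
p_value = chi2.sf(G, df=1)                       # 1 d.o.f. difference
print(G, p_value)
```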

14 Likelihood C.I.s: graphical method

Example: consider the Likelihood function
$$L(\theta) = \theta^{b}(1-\theta)^{a}$$
where $\theta$ is the unknown parameter and $a$, $b$ are observed counts. Four data sets are observed: A: (a,b) = (8,2); B: (a,b) = (16,4); C: (a,b) = (80,20); D: (a,b) = (400,100).
Likelihood estimates can be plotted vs possible parameter values, with the MLE at the peak: e.g. MLE = 0.2 in each case, with $L_{max} \approx 0.0067$ for A, $L_{max} \approx 4.5\times10^{-5}$ for B, etc.
Set A: $\log L_{max} - \log L = \log(0.0067) - \log(0.00091) = 2$ gives the $\approx$ 95% C.I., so $\theta = (0.035, 0.496)$, corresponding to $L \approx 0.00091$, is the $\approx$ 95% C.I. for A. Similarly, manipulating this expression, the likelihood value corresponding to the $\approx$ 95% confidence interval is given as $L = (7.389)^{-1} L_{max}$, since $e^{2} = 7.389$.
Note: usually plot the Log-likelihood vs the parameter, rather than the Likelihood. As the sample size increases, the C.I. becomes narrower and more symmetric.
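The graphical method can be checked numerically; this sketch (my illustration) recovers the $\approx$ 95% intervals for all four data sets by keeping the $\theta$ values within 2 log-likelihood units of the maximum:

```python
import numpy as np

def loglik(theta, a, b):
    return b * np.log(theta) + a * np.log(1 - theta)

for label, (a, b) in {"A": (8, 2), "B": (16, 4),
                      "C": (80, 20), "D": (400, 100)}.items():
    grid = np.linspace(1e-4, 1 - 1e-4, 100_000)
    ll = loglik(grid, a, b)
    inside = grid[ll >= ll.max() - 2]        # drop of 2 log-units ~ 95% C.I.
    print(label, round(inside.min(), 3), round(inside.max(), 3))
```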

15 Maximum Likelihood Benefits

Strong estimator properties: sufficiency, efficiency, consistency, non-bias etc., as before.
Good confidence intervals: coverage probability realised and intervals meaningful; the MLE is a good basis for a C.I.
MSE consistent: absence of bias does not "stand alone"; minimum variance also important.
Asymptotically Normal: precise for large samples; inferences valid, ranges realistic.