Lecture 2: Parameter Estimation and Evaluation of Support.

Parameter Estimation
“The problem of estimation is of more central importance (than hypothesis testing)… for in almost all situations we know that the effect whose significance we are measuring is perfectly real, however small; what is at issue is its magnitude.” (Edwards, 1992, pg. 2)
“An insignificant result, far from telling us that the effect is non-existent, merely warns us that the sample was not large enough to reveal it.” (Edwards, 1992, pg. 2)

Parameter Estimation
- Finding Maximum Likelihood Estimates (MLEs)
  - Local optimization (optim)
    » Gradient methods
    » Simplex (Nelder-Mead)
  - Global optimization
    » Simulated Annealing (anneal)
    » Genetic Algorithms (rgenoud)
- Evaluating the strength of evidence (“support”) for different parameter estimates
  - Support Intervals
    » Asymptotic Support Intervals
    » Simultaneous Support Intervals
  - The shape of likelihood surfaces around MLEs

Parameter estimation: finding peaks on likelihood “surfaces”
The variation in likelihood across the set of possible parameter values defines a likelihood “surface”. The goal of parameter estimation is to find the peak of that surface, which is an optimization problem.

Local vs. Global Optimization
- “Fast” local optimization methods
  - A large family of methods, widely used for nonlinear regression in commercial software packages
- “Brute force” global optimization methods
  - Grid search
  - Genetic algorithms
  - Simulated annealing
(Figure: a likelihood surface with a local optimum and the global optimum.)

Local Optimization – Gradient Methods
- Derivative-based (Newton-Raphson) methods
General approach: vary the parameter estimates systematically and search for the point where the first derivative of the likelihood function is zero (i.e. zero slope), using numerical methods to estimate the derivative, and checking the second derivative to make sure the point is a maximum, not a minimum. (Figure: a likelihood surface.)
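
As an illustration (not from the original slides), here is a minimal R sketch of derivative-based local optimization: the data, the linear model with normal errors, and the parameter names are all invented for the example, and optim() with method = "BFGS" estimates the gradient numerically because none is supplied.

## minimal sketch: gradient-based local optimization (hypothetical data and model)
set.seed(42)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)                  # simulated data

## negative log-likelihood of a linear model with normal errors
negll <- function(p) {
  a <- p[1]; b <- p[2]; sdev <- exp(p[3])             # exp() keeps the sd positive
  -sum(dnorm(y, mean = a + b * x, sd = sdev, log = TRUE))
}

fit <- optim(c(0, 0, 0), negll, method = "BFGS")      # gradient estimated numerically
fit$par                                               # MLEs (third value is log(sd))
-fit$value                                            # maximized log-likelihood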

Local Optimization – No Gradient
- The Simplex (Nelder-Mead) method
  - Much simpler to program
  - Does not require calculation or estimation of a derivative
  - No general theoretical proof that it works (but lots of happy practitioners…)
Implemented as method = "Nelder-Mead" in the optim function in R (see the sketch below).
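
A minimal sketch of the simplex method on an invented normal model; the data and starting values are made up for illustration.

## minimal sketch: Nelder-Mead simplex via optim() (hypothetical data)
set.seed(1)
y <- rnorm(100, mean = 5, sd = 2)                     # simulated observations
negll <- function(p) -sum(dnorm(y, mean = p[1], sd = exp(p[2]), log = TRUE))
fit <- optim(c(0, 0), negll, method = "Nelder-Mead")  # no derivatives needed
fit$par[1]                                            # MLE of the mean
exp(fit$par[2])                                       # MLE of the sd (back-transformed)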

Global Optimization – Grid Searches
- The simplest form of optimization (and rarely used in practice): systematically search parameter space at a grid of points
- Can be useful for visualizing the broad features of a likelihood surface
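
A hedged sketch of a brute-force grid search for a two-parameter normal model; the data, grid ranges, and resolution are all invented, and the result is mainly useful for plotting the surface.

## minimal sketch: grid search over a two-parameter likelihood surface (hypothetical data)
set.seed(1)
y <- rnorm(100, mean = 5, sd = 2)
grid <- expand.grid(mean = seq(3, 7, length = 60),
                    sd   = seq(0.5, 4, length = 60))
grid$loglik <- apply(grid, 1, function(p)
  sum(dnorm(y, mean = p["mean"], sd = p["sd"], log = TRUE)))
grid[which.max(grid$loglik), ]                        # crude MLE, limited by grid resolution
## the surface itself can then be visualized, e.g.:
## contour(matrix(grid$loglik, nrow = 60))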

Global Optimization – Genetic Algorithms
- Based on a fairly literal analogy with evolution
  - Start with a reasonably large “population” of parameter sets
  - Calculate the “fitness” (likelihood) of each individual set of parameters
  - Create the next generation of parameter sets based on the fitness of the “parents”, using various rules for recombination of subsets of parameters (genes)
  - Let the population evolve until fitness reaches a maximum asymptote
Implemented as the genoud function in the rgenoud package in R: cool, but slow for large datasets with a large number of parameters.
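
A hedged sketch of calling genoud() from the rgenoud package on an invented two-parameter log-likelihood; the bounds, population size, and other settings are illustrative choices, not recommendations from the lecture.

## minimal sketch: genetic-algorithm optimization with rgenoud (illustrative settings)
library(rgenoud)
set.seed(1)
y <- rnorm(200, mean = 5, sd = 2)                     # simulated data
loglik <- function(p) sum(dnorm(y, mean = p[1], sd = p[2], log = TRUE))

dom <- matrix(c(-10, 10,                              # bounds for the mean
                0.01, 10),                            # bounds for the sd
              nrow = 2, byrow = TRUE)

fit <- genoud(fn = loglik, nvars = 2, max = TRUE,     # max = TRUE: maximize the log-likelihood
              Domains = dom, pop.size = 200,
              boundary.enforcement = 2,               # keep candidates inside the bounds
              print.level = 0)
fit$par                                               # fittest parameter set
fit$value                                             # its log-likelihood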

Global Optimization – Simulated Annealing
- Analogy with the physical process of annealing:
  - Start the process at a high “temperature”
  - Gradually reduce the temperature according to an annealing schedule
- Always accept uphill moves (i.e. an increase in likelihood)
- Accept downhill moves according to the Metropolis algorithm: p = exp(-Δlh / t), where
  p = probability of accepting the downhill move
  Δlh = magnitude of the change (drop) in likelihood
  t = temperature
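
The acceptance rule above can be written as a small helper function; this is a generic sketch of the Metropolis criterion, not code from the lecture, and the example values are made up.

## Metropolis rule: always accept uphill moves; accept a downhill move with
## probability exp(delta_lh / t), where delta_lh = ll_new - ll_old (negative downhill)
accept_move <- function(ll_new, ll_old, t) {
  if (ll_new >= ll_old) return(TRUE)                  # uphill: always accept
  runif(1) < exp((ll_new - ll_old) / t)               # downhill: probabilistic acceptance
}
## a 2-unit drop is accepted ~14% of the time at t = 1, but ~67% of the time at t = 5
accept_move(-102, -100, t = 1)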

Effect of temperature (t)

Simulated Annealing in practice...
References:
- Goffe, W. L., G. D. Ferrier, and J. Rogers. 1994. Global optimization of statistical functions with simulated annealing. Journal of Econometrics 60.
- Corana, A., et al. 1987. Minimizing multimodal functions of continuous variables with the “simulated annealing” algorithm. ACM Transactions on Mathematical Software 13.
A version with automatic adjustment of range... (Figure: the current parameter value and its search range (step size) between the lower and upper bounds.)

Effect of C on Adjusting Range...

Constraints – setting limits for the search...
- Biological limits: values that make no sense biologically (be careful...)
- Algebraic limits: values for which the model is undefined (e.g. dividing by zero...)
Bottom line: global optimization methods let you cast your net widely, at the cost of computer time...
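
For local optimization, one built-in way to impose such limits is optim()'s box-constrained method; a minimal sketch with invented data and invented bounds.

## minimal sketch: box constraints with optim(method = "L-BFGS-B")
set.seed(1)
y <- rnorm(100, mean = 5, sd = 2)
negll <- function(p) -sum(dnorm(y, mean = p[1], sd = p[2], log = TRUE))

fit <- optim(c(1, 1), negll, method = "L-BFGS-B",
             lower = c(-10, 0.001),                   # sd > 0: an algebraic limit
             upper = c( 10, 50))                      # hypothetical "biological" limits
fit$par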

Simulated Annealing – Initialization
Set (typical values in parentheses):
- Annealing schedule
  » Initial temperature (t) (3.0)
  » Rate of reduction in temperature (rt) (0.95)
  » Interval between drops in temperature (nt) (100)
  » Interval between changes in range (ns) (20)
- Parameter values
  » Initial values (x)
  » Upper and lower bounds (lb, ub)
  » Initial range (vm)
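
A heavily simplified, hedged sketch of how these pieces fit together in an annealing loop, using an invented two-parameter model; it omits the step-size adjustment (the ns interval) of the full Goffe et al. algorithm and is not the anneal routine used in the course.

## minimal sketch of a simulated-annealing loop (illustrative only)
set.seed(1)
y <- rnorm(100, mean = 5, sd = 2)
loglik <- function(p) sum(dnorm(y, mean = p[1], sd = p[2], log = TRUE))

t  <- 3.0                                   # initial temperature
rt <- 0.95                                  # rate of reduction in temperature
nt <- 100                                   # iterations between temperature drops
lb <- c(-10, 0.01); ub <- c(10, 10)         # lower and upper bounds
vm <- (ub - lb) / 10                        # search range (step size)
p  <- c(0, 1)                               # initial parameter values (x)
ll <- loglik(p)
best <- list(par = p, loglik = ll)

for (drop in 1:200) {                       # 200 temperature reductions
  for (i in 1:nt) {
    j <- sample(length(p), 1)               # perturb one parameter at a time
    cand <- p
    cand[j] <- min(max(p[j] + runif(1, -vm[j], vm[j]), lb[j]), ub[j])
    ll_cand <- loglik(cand)
    if (ll_cand >= ll || runif(1) < exp((ll_cand - ll) / t)) {   # Metropolis rule
      p <- cand; ll <- ll_cand
      if (ll > best$loglik) best <- list(par = p, loglik = ll)
    }
  }
  t <- t * rt                               # annealing schedule
}
best$par                                    # best parameter set found
best$loglik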

How many iterations?...
- Red maple leaf litterfall (6 parameters): 500,000 is way more than necessary!
- Logistic regression of windthrow susceptibility (188 parameters): 5 million is not enough!
What would constitute convergence?...

Optimization – Summary
- No hard and fast rules for any optimization: be willing to explore alternative options.
- Be wary of the initial values used in local optimization when the model is at all complicated.
- How about a hybrid approach? Start with simulated annealing, then switch to a local optimization (see the sketch below)…
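
One way to try the hybrid idea in base R, sketched with an invented model: optim() includes a simulated-annealing variant (method = "SANN") whose result can seed a Nelder-Mead polish. The iteration count and starting values are arbitrary.

## minimal sketch: global-then-local hybrid with optim()
set.seed(1)
y <- rnorm(100, mean = 5, sd = 2)
negll <- function(p) -sum(dnorm(y, mean = p[1], sd = exp(p[2]), log = TRUE))

rough <- optim(c(0, 0), negll, method = "SANN",
               control = list(maxit = 20000))             # crude global search
fit   <- optim(rough$par, negll, method = "Nelder-Mead")  # local refinement
fit$par[1]                                                # MLE of the mean
exp(fit$par[2])                                           # MLE of the sd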

Evaluating the strength of evidence for the MLE
Now that you have an MLE, how should you evaluate it? (Hint: think about the shape of the likelihood function, not just the MLE itself.)

Strength of evidence for particular parameter estimates – “Support”
- Likelihood provides an objective measure of the strength of evidence for different parameter estimates...
- Log-likelihood = “Support” (Edwards 1992)

Profile Likelihood
- Evaluate support (information) for a range of values of a given parameter by treating all other parameters as “nuisance” parameters and holding them at their MLEs…
(Figure: a likelihood surface over Parameter 1 and Parameter 2.)
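
A hedged sketch of the fixed-at-MLE calculation described above, for an invented two-parameter normal model; note that a full profile likelihood would re-optimize the other parameters at each value of the focal parameter rather than holding them at their MLEs.

## minimal sketch: support across a range of values of one parameter,
## with the other parameter held at its MLE (hypothetical data)
set.seed(1)
y  <- rnorm(100, mean = 5, sd = 2)
ll <- function(m, s) sum(dnorm(y, mean = m, sd = s, log = TRUE))

mle <- optim(c(5, 2), function(p) -ll(p[1], p[2]))$par   # joint MLEs of mean and sd

m_grid  <- seq(mle[1] - 1.5, mle[1] + 1.5, length = 200) # values of the focal parameter
support <- sapply(m_grid, function(m) ll(m, mle[2]))     # sd fixed at its MLE
plot(m_grid, support, type = "l",
     xlab = "mean", ylab = "log-likelihood (support)")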

Asymptotic vs. Simultaneous M-Unit Support Limits
- Asymptotic Support Limits (based on the profile likelihood): hold all other parameters at their MLE values, and systematically vary the remaining parameter until the likelihood declines by a chosen amount (m)...
What should “m” be? (2 is a good number, and is roughly analogous to a 95% confidence interval.)
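
Given a support curve like the one sketched above, the m-unit limits can be read off directly; a small self-contained illustration, reusing the same invented model so it runs on its own.

## minimal sketch: asymptotic 2-unit support limits for one parameter
set.seed(1)
y  <- rnorm(100, mean = 5, sd = 2)
ll <- function(m, s) sum(dnorm(y, mean = m, sd = s, log = TRUE))
mle <- optim(c(5, 2), function(p) -ll(p[1], p[2]))$par

m_grid  <- seq(mle[1] - 1.5, mle[1] + 1.5, length = 2000)
support <- sapply(m_grid, function(m) ll(m, mle[2]))     # other parameter held at its MLE

m <- 2                                                   # the chosen drop in support
range(m_grid[support >= max(support) - m])               # lower and upper 2-unit limits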

Asymptotic vs. Simultaneous M-Unit Support Limits
- Simultaneous (resampling method): draw a very large number of random sets of parameters and calculate the log-likelihood of each. The m-unit simultaneous support limits for parameter x_i are the upper and lower limits of x_i among all parameter sets whose support differs from the maximum by no more than m units...
In practice, this can require an enormous number of iterations if there are more than a few parameters.
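
A hedged sketch of the resampling idea for the same invented two-parameter model; the number of random draws and the sampling bounds are arbitrary and would need to be much larger in a real problem with many parameters.

## minimal sketch: simultaneous 2-unit support limits by random sampling
set.seed(1)
y  <- rnorm(100, mean = 5, sd = 2)
ll <- function(p) sum(dnorm(y, mean = p[1], sd = p[2], log = TRUE))

n    <- 50000                                            # number of random parameter sets
sets <- cbind(mean = runif(n, 3, 7),                     # arbitrary sampling bounds
              sd   = runif(n, 1, 4))
support <- apply(sets, 1, ll)

m    <- 2
keep <- support >= max(support) - m                      # sets within m units of the best
apply(sets[keep, ], 2, range)                            # simultaneous limits, per parameter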

Asymptotic vs. Simultaneous Support Limits
(Figure: a hypothetical likelihood surface for two parameters, showing a 2-unit drop in support and the asymptotic vs. simultaneous 2-unit support limits for Parameter 1.)

Other measures of strength of evidence for different parameter estimates
- Edwards (1992; Chapter 5) describes various measures of the “shape” of the likelihood surface in the vicinity of the MLE... How pointed is the peak?...

Evaluating Support for Parameter Estimates: A Frequentist Approach
- Traditional confidence intervals and standard errors of the parameter estimates can be generated from the Hessian matrix
  - Hessian = the matrix of second partial derivatives of the likelihood function with respect to the parameters, evaluated at the maximum likelihood estimates
  - Also called the “Information Matrix” by Fisher
  - Provides a measure of the steepness of the likelihood surface in the region of the optimum
  - Can be generated in R using optim (with hessian = TRUE) or fdHess (in the nlme package)
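
A minimal sketch (invented data and model) of both routes mentioned above: asking optim() for the Hessian, and recomputing it with fdHess() from the nlme package. Because a negative log-likelihood is minimized here, the returned Hessian has the opposite sign from the one on the next slide, where a log-likelihood was maximized.

## minimal sketch: obtaining the Hessian at the MLEs (hypothetical data and model)
library(nlme)                                   # provides fdHess()
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)
negll <- function(p) -sum(dnorm(y, mean = p[1] + p[2] * x, sd = exp(p[3]), log = TRUE))

res <- optim(c(0, 0, 0), negll, method = "BFGS", hessian = TRUE)
res$hessian                                     # Hessian of the negative log-likelihood
fdHess(res$par, negll)$Hessian                  # same matrix by finite differences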

Example from R
The Hessian matrix (when maximizing a log-likelihood) is a numerical approximation of Fisher's Information Matrix (i.e. the matrix of second partial derivatives of the likelihood function), evaluated at the point of the maximum likelihood estimates. Thus, it is a measure of the steepness of the drop in the likelihood surface as you move away from the MLE.
> res$hessian
(a 3 × 3 matrix with rows and columns a, b, and sd; sample output from an analysis that estimates two parameters and a variance term)

More from R
Now invert (“solve”, in R parlance) the negative of the Hessian matrix to get the variance-covariance matrix of the parameters:
> solve(-1*res$hessian)
(a 3 × 3 variance-covariance matrix with rows and columns a, b, and sd)
The square roots of the diagonal of the inverted negative Hessian are the standard errors*:
> sqrt(diag(solve(-1*res$hessian)))
(standard errors for a, b, and sd)
(* and the MLE ± 1.96 × S.E. gives an approximate 95% C.I.…)
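
To tie the steps together, here is a self-contained hedged sketch with invented data; because the objective below is a negative log-likelihood (minimized), the Hessian is inverted without the sign flip used on the slide, where a log-likelihood was maximized.

## minimal sketch: standard errors and approximate 95% CIs from the Hessian
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)
negll <- function(p) -sum(dnorm(y, mean = p[1] + p[2] * x, sd = p[3], log = TRUE))

res <- optim(c(0, 0, 1), negll, method = "L-BFGS-B",
             lower = c(-Inf, -Inf, 1e-4), hessian = TRUE)
vc  <- solve(res$hessian)            # variance-covariance matrix (no sign flip needed here)
se  <- sqrt(diag(vc))                # standard errors for a, b, and sd
cbind(estimate = res$par,
      lower95  = res$par - 1.96 * se,
      upper95  = res$par + 1.96 * se)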