Petr Kuzmič, Ph.D. BioKin, Ltd. WATERTOWN, MASSACHUSETTS, U.S.A. Binding and Kinetics for Experimental Biologists Lecture 7 Dealing with uncertainty: Confidence intervals

Presentation transcript:

Slide 1: Petr Kuzmič, Ph.D., BioKin, Ltd., WATERTOWN, MASSACHUSETTS, U.S.A. Binding and Kinetics for Experimental Biologists. Lecture 7: Dealing with uncertainty: Confidence intervals. INNOVATION LECTURES (INNOLEC)

BKEB Lec 7: Confidence Intervals, slide 2: "Hunches and intuitive impressions are essential for getting the work started, but it is only through the quality of the numbers at the end that the truth can be told." (Lewis Thomas) L. Thomas (1977) "Biostatistics in Medicine", Science 198, 675. But how much confidence can you have in that number? [Image: Gregor Mendel, Google, July 20, 2011; approximately 3:1, G / y]

Slide 3: Lecture outline
The problem: How much (or how little) can we trust our rate and equilibrium constants?
The solution: Always report at least some measure of parameter uncertainty:
- formal standard error
- confidence interval
  (a) by systematic search (profile-t method)
  (b) by stochastic simulations (Monte-Carlo method)
An implementation: Software DynaFit.
An example: The classic “Biological oxygen demand (BOD)” problem

Slide 4: Part 1: Confidence intervals by systematic searching (“Profile-t” method)

Slide 5: Example problem: Biological oxygen demand (B.O.D.). A CLASSIC DATA SET IN STATISTICAL LITERATURE. BOD = measure of organic pollution in environmental water. Bates, D. M. & Watts, D. G. (1988) Nonlinear Regression Analysis and its Applications, Wiley, New York, p. 270. BOD at t → infinity?

Slide 6: Theoretical model: Exponential growth. COMPARE ALGEBRAIC MODEL WITH DYNAFIT NOTATION. ALGEBRAIC MODEL: [equation image]. DYNAFIT MODEL: [mechanism] Oxygen ---> Bacteria : k. [Plot: BOD, mg/l vs. time, days; BOD_max = 19.1 mg/l]
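
The algebraic form did not survive in the transcript. A first-order mechanism such as Oxygen ---> Bacteria : k corresponds to BOD(t) = BOD_max (1 - exp(-k t)), which is also the form Bates and Watts use for this data set. The sketch below fits that model in plain Python. The six data points are an assumption: they are the values distributed as the `BOD` data set in R, taken here to be the data shown on the slide. The helper names (`saturation`, `best_bmax`, `golden_min`) are hypothetical, and the minimizer is a simple golden-section search, not DynaFit's algorithm.

```python
import math

# Assumed data (R's `BOD` data set); the transcript does not list the points.
TIME = [1.0, 2.0, 3.0, 4.0, 5.0, 7.0]        # days
BOD = [8.3, 10.3, 19.0, 16.0, 15.6, 19.8]    # mg/l

def saturation(k, t):
    """Fraction of the plateau reached at time t: 1 - exp(-k t)."""
    return 1.0 - math.exp(-k * t)

def best_bmax(k):
    """For a fixed k the model is linear in B_max, so B_max has a closed form."""
    g = [saturation(k, t) for t in TIME]
    return sum(y * gi for y, gi in zip(BOD, g)) / sum(gi * gi for gi in g)

def ssq(bmax, k):
    """Sum of squared residuals for the model B_max * (1 - exp(-k t))."""
    return sum((y - bmax * saturation(k, t)) ** 2 for t, y in zip(TIME, BOD))

def golden_min(f, lo, hi, tol=1e-8):
    """Golden-section search; assumes f has a single minimum on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - ratio * (b - a)
        else:
            a, c = c, d
            d = a + ratio * (b - a)
    return 0.5 * (a + b)

# Minimize over k only; B_max is eliminated by its closed-form optimum.
k_fit = golden_min(lambda k: ssq(best_bmax(k), k), 0.05, 3.0)
bmax_fit = best_bmax(k_fit)
ssq_min = ssq(bmax_fit, k_fit)
print(f"B_max = {bmax_fit:.1f} mg/l, k = {k_fit:.2f} per day, SSQ = {ssq_min:.1f}")
```

If the assumed data match the slide, the printed values should land close to the plateau of 19.1 mg/l quoted above.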

Slide 7: How much should we trust these model parameters? FIX B_max AT AN ARBITRARY VALUE, OPTIMIZE k (B_max = fixed parameter, k = optimized parameter):
B_max = 19.1, k = 0.53, sum of squares = 26.0
B_max = 25.0, k = 0.28, sum of squares = 41.9
B_max = 30.0, k = 0.20, sum of squares = 57.6
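
The experiment on this slide can be sketched as follows: hold B_max at a chosen value and re-optimize k alone. The model form B_max (1 - exp(-k t)) and the data points (the values of R's `BOD` data set) are assumptions, since neither is spelled out in the transcript. The function names are hypothetical, and the one-dimensional minimizer is a plain golden-section search.

```python
import math

# Assumed data (R's `BOD` data set); not listed in the transcript.
TIME = [1.0, 2.0, 3.0, 4.0, 5.0, 7.0]        # days
BOD = [8.3, 10.3, 19.0, 16.0, 15.6, 19.8]    # mg/l

def ssq(bmax, k):
    """Sum of squared residuals for the assumed model B_max * (1 - exp(-k t))."""
    return sum((y - bmax * (1.0 - math.exp(-k * t))) ** 2
               for t, y in zip(TIME, BOD))

def golden_min(f, lo, hi, tol=1e-8):
    """Golden-section search; assumes f has a single minimum on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - ratio * (b - a)
        else:
            a, c = c, d
            d = a + ratio * (b - a)
    return 0.5 * (a + b)

def fit_k_with_fixed_bmax(bmax):
    """Freeze B_max, optimize k only, and return (k, sum of squares)."""
    k = golden_min(lambda kk: ssq(bmax, kk), 0.01, 3.0)
    return k, ssq(bmax, k)

table = {bmax: fit_k_with_fixed_bmax(bmax) for bmax in (19.1, 25.0, 30.0)}
for bmax, (k, s) in table.items():
    print(f"B_max = {bmax:4.1f}  k = {k:.2f}  sum of squares = {s:.1f}")
```

The point of the exercise is that very different plateau values give fits whose sums of squares differ only moderately, which is why the plateau is poorly determined.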

Slide 8: A little better than nothing: Formal standard errors. THIS IS WHAT MOST PAPERS REPORT IN THE LITERATURE. BOD_max = (19.1 ± 2.5) mg/l implies the interval from 19.1 – 2.5 = 16.6 to 19.1 + 2.5 = 21.6 mg/l. [settings] {Output} InferenceBands = y
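
For readers who want to see where a formal standard error comes from, here is a minimal sketch of the usual linear approximation: SE_i = sqrt(s² [(JᵀJ)⁻¹]_ii), where J is the matrix of partial derivatives of the model at the best fit and s² is the residual variance. The data (R's `BOD` data set), the model form B_max (1 - exp(-k t)), and the hard-coded best-fit values are assumptions; DynaFit's own computation may differ in detail.

```python
import math

# Assumed data (R's `BOD` data set); not listed in the transcript.
TIME = [1.0, 2.0, 3.0, 4.0, 5.0, 7.0]        # days
BOD = [8.3, 10.3, 19.0, 16.0, 15.6, 19.8]    # mg/l

# Assumed least-squares estimates for these data (19.1 and 0.53 on the slides).
BMAX, K = 19.143, 0.5311

# Partial derivatives of B_max * (1 - exp(-k t)) with respect to B_max and k.
jacobian = [(1.0 - math.exp(-K * t), BMAX * t * math.exp(-K * t)) for t in TIME]
residuals = [y - BMAX * (1.0 - math.exp(-K * t)) for t, y in zip(TIME, BOD)]

n_points, n_params = len(TIME), 2
variance = sum(r * r for r in residuals) / (n_points - n_params)

# Elements of the 2x2 matrix J^T J, inverted by hand.
a = sum(g * g for g, _ in jacobian)
b = sum(g * h for g, h in jacobian)
d = sum(h * h for _, h in jacobian)
det = a * d - b * b

se_bmax = math.sqrt(variance * d / det)
se_k = math.sqrt(variance * a / det)
print(f"B_max = {BMAX:.1f} +/- {se_bmax:.1f} mg/l")
print(f"k     = {K:.2f} +/- {se_k:.2f} per day")
```

The symmetric ± interval is a consequence of the linearization, which is the limitation the following slides address.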

Slide 9: The correct way to do it: Approximate confidence intervals. VERY RARELY REPORTED IN THE LITERATURE (UNFORTUNATELY). BOD_max = (19.1 ± 2.5) [15.0 – 29.3] mg/l. [Plot: mean square vs. log [Oxygen]]

Slide 10: Confidence intervals: Profile-t method in DynaFit. A SEQUENCE OF SEVERAL INDEPENDENT LEAST-SQUARES FITS
INPUT: [mechanism] Oxygen ---> Bacteria : k [constants] k = 1 ? [concentrations] Oxygen = 10 ??
ALGORITHM:
1. Perform an initial fit with all parameters optimized
2. Perform a series of follow-up fits focusing on a given parameter
2a. “Freeze” the parameter at values progressively further away from optimal
2b. Optimize all remaining parameters
2c. Repeat (2a) and (2b) until sum of squares reaches a “critical value” above minimum
REFERENCE: Bates, D. M., and Watts, D. G. (1988) Nonlinear Regression Analysis and its Applications, Wiley, New York.
[Plot: mean square vs. log [Oxygen], marking SSQ_min, SSQ_crit, and the low and high limits]
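
The algorithm above can be sketched in a few lines of Python. Several things here are assumptions and not DynaFit's documented internals: the data (R's `BOD` data set), the model form B_max (1 - exp(-k t)), the critical value SSQ_crit = SSQ_min (1 + t²/(n - p)) with Student's t for a two-sided 90% interval at 4 degrees of freedom (2.132), and the multiplicative stepping followed by bisection. All function names are hypothetical.

```python
import math

# Assumed data (R's `BOD` data set); not listed in the transcript.
TIME = [1.0, 2.0, 3.0, 4.0, 5.0, 7.0]
BOD = [8.3, 10.3, 19.0, 16.0, 15.6, 19.8]
T_CRIT_90 = 2.132     # Student's t, two-sided 90%, n - p = 4 degrees of freedom

def ssq(bmax, k):
    return sum((y - bmax * (1.0 - math.exp(-k * t))) ** 2
               for t, y in zip(TIME, BOD))

def golden_min(f, lo, hi, tol=1e-8):
    """Golden-section search; assumes f has a single minimum on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - ratio * (b - a)
        else:
            a, c = c, d
            d = a + ratio * (b - a)
    return 0.5 * (a + b)

def profile_ssq(bmax):
    """Steps 2a and 2b: freeze B_max, re-optimize the remaining parameter k."""
    k = golden_min(lambda kk: ssq(bmax, kk), 0.001, 5.0)
    return ssq(bmax, k)

def find_limit(start, factor, crit, max_steps=200):
    """Step 2c: move away from the optimum until SSQ exceeds the critical value."""
    inside = start
    for _ in range(max_steps):
        outside = inside * factor
        if profile_ssq(outside) > crit:
            for _ in range(50):                  # refine the crossing by bisection
                mid = 0.5 * (inside + outside)
                if profile_ssq(mid) > crit:
                    outside = mid
                else:
                    inside = mid
            return 0.5 * (inside + outside)
        inside = outside
    return None                                  # no crossing: half-open interval

# Step 1: initial fit with both parameters optimized (1-D search over B_max).
bmax_fit = golden_min(profile_ssq, 10.0, 40.0, tol=1e-6)
ssq_min = profile_ssq(bmax_fit)
ssq_crit = ssq_min * (1.0 + T_CRIT_90 ** 2 / (len(TIME) - 2))

low = find_limit(bmax_fit, 1.0 / 1.05, ssq_crit)
high = find_limit(bmax_fit, 1.05, ssq_crit)
print(f"B_max = {bmax_fit:.1f}, 90% interval [{low:.1f}, {high:.1f}] mg/l")
```

Returning `None` when the sum of squares never reaches the critical value is one simple way to represent an interval whose limit cannot be determined.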

Slide 11: Confidence level (%) and the width of confidence intervals. HIGHER CONFIDENCE LEVEL = WIDER CONFIDENCE INTERVAL. [Plots at the 90%, 95%, and 99% confidence levels; the upper limit at 99% is marked “?”] Upper limit for BOD_max could not be determined at 99% confidence level.
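
One way to see why an upper limit can disappear at a high confidence level: as B_max grows without bound (with k shrinking accordingly), the assumed model B_max (1 - exp(-k t)) degenerates into a straight line through the origin, so the sum of squares cannot rise above the value for that line. If the critical sum of squares lies above this plateau, the search never finds an upper limit. The sketch below checks this under stated assumptions: the data are those of R's `BOD` data set, the critical value is SSQ_min (1 + t²/(n - p)), and the t values are standard table entries for 4 degrees of freedom.

```python
# Assumed data (R's `BOD` data set); not listed in the transcript.
TIME = [1.0, 2.0, 3.0, 4.0, 5.0, 7.0]
BOD = [8.3, 10.3, 19.0, 16.0, 15.6, 19.8]

SSQ_MIN = 26.0                                 # best-fit sum of squares from the slides
T_TABLE = {90: 2.132, 95: 2.776, 99: 4.604}    # two-sided Student's t, 4 d.f.

# Sum of squares of the best straight line through the origin (the limiting model).
sum_ty = sum(t * y for t, y in zip(TIME, BOD))
sum_tt = sum(t * t for t in TIME)
sum_yy = sum(y * y for y in BOD)
plateau = sum_yy - sum_ty ** 2 / sum_tt

dof = len(TIME) - 2
upper_limit_exists = {}
for level, t_crit in T_TABLE.items():
    ssq_crit = SSQ_MIN * (1.0 + t_crit ** 2 / dof)
    upper_limit_exists[level] = ssq_crit < plateau
    print(f"{level}%: critical SSQ = {ssq_crit:6.1f}, plateau = {plateau:.1f}, "
          f"upper limit {'exists' if upper_limit_exists[level] else 'cannot be determined'}")
```

Under these assumptions the outcome agrees with the slide: the upper limit is found at 90% and 95% but not at 99%.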

Slide 12: Example of a half-open confidence interval. UPPER LIMITS FOR BIMOLECULAR ASSOCIATION RATE CONSTANTS OFTEN CANNOT BE DETERMINED. Moss, Kuzmic, et al. (1996) Biochemistry 35. [Figure: MECHANISM] CONFIDENCE INTERVAL FOR k_4: k_4 = (5 ± 200) [3 – ∞] µM^-1 s^-1

Slide 13: Search for confidence intervals may diagnose “false minima”. AN OCCASIONAL SIDE-BENEFIT OF CONFIDENCE INTERVAL SEARCHES. [Plot: SUM OF SQUARES vs. PARAMETER, showing the initial estimate, the “false minimum”, the CI search, and the global minimum]

Slide 14: SUMMARY: Confidence intervals via profile-t method
- Confidence intervals are asymmetrical for all nonlinear parameters
- Frequently much wider (more realistic) than ± formal standard errors
- Sometimes half-open intervals: “better than nothing”, e.g. for bimolecular association
- Can have mechanistic implications (reversible / irreversible steps)
- Sometimes CI search helps in falling out of false minima
- In DynaFit scripts, CIs are requested by the “??” syntax
- Should always be reported with their corresponding confidence levels (%)
- CIs are wider at higher confidence levels
- Frequently used confidence levels: 90%, 95%, or 99%
- Computation can be time consuming (many repeated least-squares fits)
[settings] {Marquardt} ConfidenceLevel = 90

Slide 15: Part 2: Confidence intervals by stochastic simulations (Monte-Carlo method)

Slide 16: Monte-Carlo confidence intervals: Algorithm
1. Perform an initial fit as usual
2. Perform a large series (> 1000) of follow-up fits
2a. Simulate an artificial data set with random errors superimposed on ideal data
2b. Perform a fit of the artificial data
2c. Compile a histogram of the distribution of model parameters from many repeated fits
2d. Determine the range of plausible values for model parameters from the histograms
REFERENCE: Straume, M., and Johnson, M. L. (1992) “Monte-Carlo method for determining complete confidence probability distributions of estimated model parameters” Methods Enzymol. 210, 117–129.
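
The steps above can be sketched with the Python standard library alone. The assumptions are considerable and worth stating: the data are those of R's `BOD` data set, the model is B_max (1 - exp(-k t)), the simulated errors are Gaussian with a standard deviation estimated from the residuals of the initial fit, and the interval is read off as the central 90% of the sorted best-fit values. Function names and the random seed are hypothetical choices, and this is not DynaFit's implementation.

```python
import math
import random

# Assumed data (R's `BOD` data set); not listed in the transcript.
TIME = [1.0, 2.0, 3.0, 4.0, 5.0, 7.0]
BOD = [8.3, 10.3, 19.0, 16.0, 15.6, 19.8]
RUNS = 1000

def saturation(k, t):
    return 1.0 - math.exp(-k * t)

def golden_min(f, lo, hi, tol=1e-6):
    """Golden-section search; assumes f has a single minimum on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - ratio * (b - a)
        else:
            a, c = c, d
            d = a + ratio * (b - a)
    return 0.5 * (a + b)

def fit(y_values):
    """Least-squares fit; B_max is eliminated because the model is linear in it."""
    def bmax_for(k):
        g = [saturation(k, t) for t in TIME]
        return sum(y * gi for y, gi in zip(y_values, g)) / sum(gi * gi for gi in g)
    def ssq_for(k):
        bmax = bmax_for(k)
        return sum((y - bmax * saturation(k, t)) ** 2 for t, y in zip(TIME, y_values))
    k = golden_min(ssq_for, 0.05, 3.0)
    return bmax_for(k), k

def central_interval(values, level=0.90):
    """Range covering the central `level` fraction of the sorted values."""
    ordered = sorted(values)
    lo_index = round(len(ordered) * (1.0 - level) / 2.0)
    hi_index = round(len(ordered) * (1.0 + level) / 2.0) - 1
    return ordered[lo_index], ordered[hi_index]

# Step 1: initial fit, ideal curve, and noise level from the residuals.
bmax_fit, k_fit = fit(BOD)
ideal = [bmax_fit * saturation(k_fit, t) for t in TIME]
sigma = math.sqrt(sum((y - m) ** 2 for y, m in zip(BOD, ideal)) / (len(TIME) - 2))

# Step 2: simulate, refit, and collect the best-fit values.
rng = random.Random(1)          # fixed seed so the sketch is reproducible
results = [fit([m + rng.gauss(0.0, sigma) for m in ideal]) for _ in range(RUNS)]

bmax_low, bmax_high = central_interval([b for b, _ in results])
k_low, k_high = central_interval([k for _, k in results])
print(f"B_max: {bmax_fit:.1f} [{bmax_low:.1f}, {bmax_high:.1f}] mg/l")
print(f"k:     {k_fit:.2f} [{k_low:.2f}, {k_high:.2f}] per day")
```

The search range for k is bounded, so simulated data sets with no visible plateau end up at the lower bound of k instead of running off to infinity; that is a convenience of this sketch, not a statement about how such cases should be handled.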

Slide 17: Monte-Carlo confidence intervals: DynaFit input. A SINGLE LINE ADDED TO THE DYNAFIT SCRIPT
[task]
task = fit
data = progress
confidence = monte-carlo
[mechanism]
Oxygen ---> Bacteria : k
[constants]
k = 1 ??
[concentrations]
Oxygen = 10 ??
...
[settings] {MonteCarlo}
Runs =
ConcentrationErrorPercent = 0
plus a number of other advanced control parameters

Slide 18: Monte-Carlo confidence intervals: DynaFit output. HISTOGRAMS OF DISTRIBUTION PLUS CORRELATION PLOTS. [Plots: histogram with the confidence interval for [Oxygen]; histogram with the confidence interval for k; correlation plot of k vs. [Oxygen] with the joint confidence interval] Distribution of best-fit values from 1000 least-squares fits of simulated data

Slide 19: Monte-Carlo confidence intervals: Convex hull plots. CONVEX HULL = SHORTEST PATH COMPLETELY ENCLOSING A GROUP OF POINTS IN A PLANE. EPS (PostScript) file generated by DynaFit. Solid line: convex hull. Plot intensity of squares ~ frequency of best-fit values
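
For illustration, a convex hull of a cloud of (parameter 1, parameter 2) points can be computed with Andrew's monotone-chain algorithm. This is a generic textbook method chosen for the sketch; the transcript does not say which algorithm DynaFit uses, and the sample points are made up.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

# Hypothetical best-fit pairs: four corners, one interior point, one edge point.
sample = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
hull = convex_hull(sample)
print(hull)
```

Interior points and points lying on an edge are discarded, so only the corner points that define the enclosing path remain.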

Slide 20: Monte-Carlo and profile-t confidence intervals compared. MONTE-CARLO INTERVALS ARE ALMOST ALWAYS NARROWER THAN PROFILE-t AT 90% LEVEL. [Plot: [Oxygen] vs. time, days] [Table: low and high limits for k and B_max, MONTE-CARLO METHOD (n = 1000) vs. PROFILE-t METHOD (90% confidence level)] Good agreement between the two methods

Slide 21: Randomly varied concentrations: DynaFit input. REAGENT CONCENTRATIONS ARE ALWAYS AFFECTED BY RANDOM TITRATION ERRORS! Enzyme kinetics: Substrate conversion. Mechanism: Michaelis-Menten.
[mechanism]
E + S <==> ES : k ks
ES ----> E + P : kr
[constants]
k = 100
ks = 1000 ?
kr = 1 ?
...
[settings] {MonteCarlo}
ConcentrationErrorPercent = 10
[end]
[Plot: [product] vs. time]

Slide 22: Randomly varied concentrations: DynaFit output. JOINT CONFIDENCE INTERVAL (AS A CONVEX HULL). [Plot: convex hulls for 10% titration error and for error-free concentrations]

Slide 23: SUMMARY: Confidence intervals via Monte-Carlo method
PROS:
- Method makes no assumptions about the statistical distribution of model parameter errors
- Often uncovers “strange” effects such as half-open confidence intervals, with mechanistic implications (reversible / irreversible steps)
- Reveals special patterns in the statistical correlation between model parameters
- Does not require an arbitrary choice of confidence levels (%)
CONS:
- Method makes heavy assumptions about the statistical distribution of experimental errors; this could be overcome by the “shuffling” and “shifting” methods in DynaFit
- Can take a very long time to compute (multiple hours)
- Does not help in discovering false minima

Slide 24: Side comment: The issue of significant digits

Slide 25: Example of poor reporting: Hyperbolic fit in a student project. RESULTS FROM A SEMESTER-LONG RESEARCH PROJECT. Model: $y = B_{\max}\, x / (K_d + x)$. What is wrong with this result? 1. No measure of uncertainty. 2. Too many digits.

Slide 26: Software programs usually report too many digits. OUTPUT GENERATED BY SOFTWARE PACKAGE “ORIGIN”. Model: $y = B_{\max}\, x / (K_d + x)$, with parameters B_max and K_d.
DIRECT OUTPUT FROM SOFTWARE: K_d = ( ± ) nM. What is wrong with this result?
SENSIBLE WAY TO REPORT IT: K_d = (440 ± 70) nM
RECIPE:
1. Round the standard error to a single significant digit
2. Round the best-fit value to the same decimal place
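
The recipe translates directly into a few lines of Python. The function name `round_for_report` and the unrounded input values are hypothetical, since the raw software output is not legible in the transcript; the inputs were chosen so that the result matches the form (440 ± 70) nM shown on the slide.

```python
import math

def round_for_report(value, std_err):
    """Round std_err to one significant digit and value to the same decimal place."""
    if std_err <= 0:
        raise ValueError("standard error must be positive")
    # Position of the leading digit of the standard error (1 for tens, -1 for tenths).
    exponent = math.floor(math.log10(std_err))
    return round(value, -exponent), round(std_err, -exponent)

# Hypothetical raw output with too many digits.
kd, kd_err = round_for_report(437.2, 68.4)
print(f"K_d = ({kd:.0f} ± {kd_err:.0f}) nM")
```

Note that Python's `round` uses round-half-to-even, so a value sitting exactly on a rounding boundary may round differently from the schoolbook rule.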

Slide 27: Overall summary and conclusions
1. Always report at least some measure of statistical uncertainty for all nonlinear model parameters (rate and equilibrium constants).
2. At the very least report the formal ± standard errors.
3. Confidence intervals are more informative than standard errors.
4. DynaFit offers two different methods for confidence intervals:
a. Systematic search (profile-t method)
b. Stochastic simulation (Monte-Carlo method)
5. The two methods have their own merits and drawbacks. When in doubt, use both.
6. DynaFit is not a “silver bullet”: You must still use your brain a lot.
ANY NUMERICAL RESULT REPORTED WITHOUT SOME MEASURE OF UNCERTAINTY IS MEANINGLESS