Model selection and fitting

Model selection and fitting
13 May 2019
Local UW resources for help with statistical analysis: two options for on-campus support with data analysis, visualization, and data science:
- https://escience.washington.edu/office-hours/
- https://www.stat.washington.edu/consulting/

Outline
- Background
  - What is curve fitting?
  - How does it work?
- Model selection and assessing fit quality
  - Goodness-of-fit parameters
  - Residuals as diagnostics
- Fitting process and options
  - Constraints
  - Weights
  - Local vs. global fitting
- Fitting software
- GraphPad Prism demonstration

What is curve fitting?
[Figure: two fitted curves, EC50 = 1.96 ± 0.21 μM and EC50 = 13.3 ± 1.51 μM]
Curve fitting means using a mathematical model to approximate an experimental dataset.
Why bother to fit data?
- Extract simple parameters from complex datasets
- Compare datasets quantitatively

How does curve fitting work?
Choose a model (equation) and calculate the parameter values that give the best agreement between the data and the model, i.e., minimize the residual sum of squares (RSS):
$\text{Residual} = \text{Observed} - \text{Predicted}$
$\text{RSS} = \sum (\text{Observed} - \text{Predicted})^2$
Example model: $y = mx + b$, where m and b are the parameters to fit.
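
For the straight-line model, the RSS minimum has a closed-form solution. Below is a minimal plain-Python sketch. The data points are made up for illustration, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = m*x + b; returns the slope m and intercept b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Closed-form solution that minimizes the residual sum of squares.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_mean - m * x_mean
    return m, b


def residuals(xs, ys, m, b):
    """Residual = observed - predicted, one value per data point."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]


def rss(res):
    """Residual sum of squares."""
    return sum(r * r for r in res)


# Illustrative (made-up) data, roughly following y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

m, b = fit_line(xs, ys)
best = rss(residuals(xs, ys, m, b))
print(f"m = {m:.3f}, b = {b:.3f}, RSS = {best:.4f}")
# Any other parameter choice gives a larger RSS, e.g. nudging the slope:
print(rss(residuals(xs, ys, m + 0.1, b)) > best)
```

This prints `m = 1.970, b = 1.060, RSS = 0.0910` followed by `True`. Nonlinear models generally have no closed form and need an iterative minimizer, but the objective (the RSS) is the same.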

Assessing fit quality
- Minimize the differences between the data and the fit
- Maximize R² (maximum value 1)
- Adjusted R² is more useful when comparing models with different numbers of parameters, because R² never decreases when more parameters are added
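
A small sketch of both statistics follows, assuming the usual definitions: R² = 1 − RSS/TSS, and adjusted R² = 1 − (1 − R²)(n − 1)/(n − p), with n data points and p fitted parameters (intercept included). The data and the fitted line reuse made-up numbers.

```python
def r_squared(observed, predicted):
    """R^2 = 1 - RSS / TSS, where TSS is the spread of the data about its mean."""
    mean = sum(observed) / len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    tss = sum((o - mean) ** 2 for o in observed)
    return 1.0 - rss / tss


def adjusted_r_squared(r2, n_points, n_params):
    """Penalize R^2 for the number of fitted parameters (intercept included)."""
    return 1.0 - (1.0 - r2) * (n_points - 1) / (n_points - n_params)


# Illustrative data and a straight-line fit (y = 1.97x + 1.06, two parameters).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
predicted = [1.97 * x + 1.06 for x in xs]

r2 = r_squared(ys, predicted)
adj = adjusted_r_squared(r2, n_points=len(xs), n_params=2)
print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj:.4f}")
```

Adjusted R² is always at most R². The gap grows as p approaches n, which is what penalizes extra parameters.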

Residuals as fit diagnostics
What are desirable features of the residual distribution?
- Small residual values
- Symmetric distribution about zero (no systematic error)
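
One way to put numbers on these features is a sketch like the one below. It checks the mean residual and counts "runs" of same-sign residuals, comparing the count with the 1 + 2·n₊·n₋/(n₊ + n₋) runs expected if signs were randomly ordered (the Wald–Wolfowitz runs test). Both residual lists are hypothetical.

```python
def residual_summary(res):
    """Simple checks on a list of residuals (observed - predicted)."""
    mean = sum(res) / len(res)
    signs = [r > 0 for r in res if r != 0]  # ignore exact zeros
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    # A "run" is a stretch of consecutive residuals with the same sign.
    if signs:
        runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    else:
        runs = 0
    # Expected number of runs if the signs were in random order.
    if n_pos and n_neg:
        expected = 1 + 2 * n_pos * n_neg / (n_pos + n_neg)
    else:
        expected = float("nan")
    return {"mean": mean, "n_pos": n_pos, "n_neg": n_neg,
            "runs": runs, "expected_runs": expected}


# Residuals from a reasonable fit: signs alternate, mean near zero.
good = [0.04, -0.13, 0.20, -0.17, 0.06]
# Hypothetical U-shaped residuals, as from forcing a straight line through curved data.
systematic = [0.9, 0.4, -0.2, -0.6, -0.7, -0.6, -0.2, 0.4, 0.9]

for name, res in [("good", good), ("systematic", systematic)]:
    print(name, residual_summary(res))
```

The U-shaped residuals still average close to zero. However, they form only 3 runs where about 5.4 would be expected, which flags a systematic misfit that the mean alone hides.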

Choosing a model
[Figure: three fits of the same data, labeled "High error, simple model", "Balance between low error and simplicity", and "Low error, complex model". Source: https://medium.com/greyatom/what-is-underfitting-and-overfitting-in-machine-learning-and-how-to-deal-with-it-6803a989c76]
What are the primary considerations when deciding between a set of models?
- Simplest model possible: fewest parameters
- Lowest error possible: best agreement with the data
- (Physiological or experimental relevance)

When to favor simplicity
Candidate models of increasing complexity:
$y = a + bx$
$y = a + bx + cx^2$
$y = a + bx + cx^2 + dx^3 + ex^4$
Overfitting:
- Using an overly complex model with too many floating parameters
- Fitting noise rather than the experimental phenomenon of interest
- The relevance of the extracted parameters becomes questionable
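
To see the overfitting trap numerically, the sketch below fits the three polynomials above (degrees 1, 2 and 4) to made-up data with a quadratic trend. It uses the normal equations, solved in plain Python. R² can only rise as terms are added, so adjusted R² is the fairer comparison. The data, the noise values and the helper names are all hypothetical.

```python
def solve_linear(a, b):
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def poly_fit(xs, ys, degree):
    """Least-squares coefficients [a, b, c, ...] via the normal equations."""
    p = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(p)] for i in range(p)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(p)]
    return solve_linear(ata, aty)


def fit_scores(xs, ys, degree):
    """Return (R^2, adjusted R^2) for a polynomial fit of the given degree."""
    coeffs = poly_fit(xs, ys, degree)
    pred = [sum(c * x ** i for i, c in enumerate(coeffs)) for x in xs]
    mean = sum(ys) / len(ys)
    rss = sum((y - f) ** 2 for y, f in zip(ys, pred))
    tss = sum((y - mean) ** 2 for y in ys)
    n, p = len(ys), degree + 1
    r2 = 1 - rss / tss
    return r2, 1 - (1 - r2) * (n - 1) / (n - p)


# Illustrative data: a quadratic trend plus fixed, made-up "noise".
raw_x = list(range(10))
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1]
ys = [1 + 0.5 * x + 0.2 * x ** 2 + e for x, e in zip(raw_x, noise)]
# Rescale x to [-1, 1] for numerical conditioning; the fitted curves are unchanged.
xs = [(x - 4.5) / 4.5 for x in raw_x]

scores = {d: fit_scores(xs, ys, d) for d in (1, 2, 4)}
for d, (r2, adj) in scores.items():
    print(f"degree {d}: R^2 = {r2:.5f}, adjusted R^2 = {adj:.5f}")
```

The quartic's extra terms mostly chase the noise values. Its small gain in R² should be weighed against the parameters it costs.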

When to favor a more complex model
[Figure: free analyte binding to an immobilized ligand, described by either a one-to-one model or a bivalent analyte model. Source: https://www.sprpages.nl/data-fitting/models]

When to favor a more complex model
- One-to-one model: χ² = 4.17
- Bivalent analyte model: χ² = 0.36
Can the experiment be redesigned to allow for a simpler model? Immobilize the antibody instead of the antigen.

Constraining and fixing parameters
Fit parameters can be fixed to a known value or allowed to "float" (with or without constraints).
Parameter constraints:
- Bounds for a parameter, set prior to fitting
- Based on mathematical or experimental limits
- Examples? EC50 and KD > 0
Fixed parameters:
- Value known independently from other experiments
- Fixing a parameter can increase confidence in the remaining fitted parameters
Source: https://www.wavemetrics.com/products/igorpro/dataanalysis/curvefitting/constraints
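
A sketch of both ideas at once, under stated assumptions. It uses a hypothetical one-site binding model, y = Bmax·x/(KD + x), with Bmax fixed at a value assumed known. KD floats but is restricted to an interval. A golden-section search stands in for a general bounded optimizer, and because it never leaves the interval, the bounds act as a hard constraint. The data are made up and noise-free.

```python
import math


def one_site(x, bmax, kd):
    """Hypothetical one-site binding model: y = Bmax * x / (KD + x)."""
    return bmax * x / (kd + x)


def rss_for_kd(xs, ys, bmax, kd):
    return sum((y - one_site(x, bmax, kd)) ** 2 for x, y in zip(xs, ys))


def fit_kd_bounded(xs, ys, bmax, kd_min, kd_max, tol=1e-10):
    """Golden-section search for KD within [kd_min, kd_max]; Bmax is held fixed."""
    inv_phi = (math.sqrt(5) - 1) / 2
    lo, hi = kd_min, kd_max
    c, d = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    fc, fd = rss_for_kd(xs, ys, bmax, c), rss_for_kd(xs, ys, bmax, d)
    while hi - lo > tol:
        if fc < fd:  # minimum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = rss_for_kd(xs, ys, bmax, c)
        else:        # minimum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = rss_for_kd(xs, ys, bmax, d)
    return (lo + hi) / 2


# Made-up, noise-free data generated with Bmax = 10 and KD = 2.
xs = [0.5, 1, 2, 4, 8, 16]
ys = [one_site(x, 10.0, 2.0) for x in xs]

kd_free = fit_kd_bounded(xs, ys, bmax=10.0, kd_min=1e-6, kd_max=100.0)  # KD > 0
kd_edge = fit_kd_bounded(xs, ys, bmax=10.0, kd_min=3.0, kd_max=10.0)    # bounds exclude 2
print(f"{kd_free:.6f} {kd_edge:.6f}")
```

With loose bounds, the fit recovers KD ≈ 2. When the bounds exclude the best value, the estimate sits at the bound (≈ 3). An estimate pinned to a bound is a hint that the constraint, or the model, deserves a second look.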

Weighting datapoints differently
[Figure: one point has high error, so it is weighted less in the fit]
Weighting can be used to emphasize the datapoints with less relative error.
Common weighting methods:
- Weight points by 1/Y²: when error is proportional to signal
- Weight points by 1/SD²: when some points contain higher error
With multiple replicates, it is usually best to treat each replicate as a separate point (rather than fitting the average and weighting by SD).
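
Weighted least squares minimizes Σ w·(residual)² instead of the plain RSS. Below is a minimal sketch for a straight line, using 1/SD² weights on made-up data in which one point is both off the trend and very uncertain.

```python
def weighted_line_fit(xs, ys, weights):
    """Weighted least squares for y = m*x + b: minimizes sum(w * residual**2)."""
    sw = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / sw
    y_bar = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    m = sxy / sxx
    return m, y_bar - m * x_bar


# Made-up data on y = 2x + 1, except the last point, which is off and very uncertain.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 15.0]
sds = [0.1, 0.1, 0.1, 0.1, 10.0]

m_unweighted, _ = weighted_line_fit(xs, ys, [1.0] * len(xs))
# 1/SD^2 weighting; for error proportional to signal, use 1/y**2 instead.
m_weighted, _ = weighted_line_fit(xs, ys, [1.0 / sd ** 2 for sd in sds])
print(f"unweighted slope = {m_unweighted:.3f}, 1/SD^2-weighted slope = {m_weighted:.3f}")
```

The unweighted slope is pulled to 3.2 by the noisy point. The 1/SD² weighted slope stays close to 2.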

Local and global fitting
When fitting multiple datasets to the same model, some parameters can be fit globally (shared between datasets), e.g., binding kinetics measured at different ligand concentrations.
Advantage of global fitting: increased confidence in the globally fit parameters.

Parameter          Global value
koff (s⁻¹)         0.0784
kon (M⁻¹ s⁻¹)      649000
Bmax (mAU)         101.2
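
The slide's kinetics example needs a nonlinear fitter. As a simpler stand-in, the sketch below shows the same global/local split for straight lines: one slope shared by all datasets (global) and one intercept per dataset (local), chosen to minimize the total RSS across datasets. The datasets are made up, and this is not the kinetic model from the slide.

```python
def global_fit_shared_slope(datasets):
    """Fit y = m*x + b_k to several datasets with one shared (global) slope m
    and a separate (local) intercept b_k per dataset, minimizing the total RSS."""
    num = den = 0.0
    centers = []
    for xs, ys in datasets:
        x_mean = sum(xs) / len(xs)
        y_mean = sum(ys) / len(ys)
        centers.append((x_mean, y_mean))
        # Pool the within-dataset sums: every dataset informs the shared slope.
        num += sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        den += sum((x - x_mean) ** 2 for x in xs)
    slope = num / den
    intercepts = [y_mean - slope * x_mean for x_mean, y_mean in centers]
    return slope, intercepts


# Two made-up datasets that share a slope of 2 but have different intercepts.
data = [
    ([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]),
    ([0.0, 1.0, 2.0, 3.0], [5.0, 7.0, 9.0, 11.0]),
]
slope, intercepts = global_fit_shared_slope(data)
print(slope, intercepts)
```

Because the shared slope pools information from all datasets, it is estimated from more points than any single local fit would use. This is the source of the increased confidence noted above.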

Examples of fitting software
- Prism: intuitive, many built-in functions
- MATLAB, Mathematica: good for complex, custom models
- R: statistical emphasis

Summary
- Curve fitting allows for extraction of experimental parameters from datasets and facilitates data comparison
- Curve-fitting algorithms work by minimizing residuals
- Goodness of fit can be assessed numerically using statistics and graphically using residual plots
- Model selection should balance simplicity, error minimization, and experimental relevance
- Appropriate constraints and weighting promote good fits
- Global fitting increases confidence in shared parameters

Demonstration: fitting FCS data
Fluorescence correlation spectroscopy (FCS): monitor the diffusion of a fluorescently labeled particle as it moves across the focal volume of a confocal microscope.
We are most interested in the diffusion time (td) parameter, which reports on the particle's hydrodynamic radius.
3-dimensional diffusion model:
$G(\tau) = \frac{1}{N} \left(\frac{1}{1 + \tau/t_d}\right) \left(\frac{1}{1 + s^2 \tau/t_d}\right)^{0.5}$
- N: average number of particles in the focal volume
- td: diffusion (residence) time
- s: ratio of the radial to axial dimensions of the focal volume; independently known, so fix it at the known value
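
A sketch of fitting this model with s fixed, under stated assumptions. The values s = 0.2, N = 5 and td = 1 ms are invented for illustration, and the curve is noise-free. For any trial td, the amplitude 1/N enters linearly, so it can be solved in closed form; only td then needs searching. Here that search is a coarse grid over log10(td) followed by golden-section refinement.

```python
import math


def g_3d(tau, n, td, s):
    """One-component 3-D diffusion model: (1/N) / (1 + tau/td) / sqrt(1 + s^2 tau/td)."""
    x = tau / td
    return (1.0 / n) / (1.0 + x) / math.sqrt(1.0 + s * s * x)


def profile_rss(taus, gs, td, s):
    """For a trial td, solve the linear amplitude 1/N in closed form; return (RSS, 1/N)."""
    f = [g_3d(t, 1.0, td, s) for t in taus]
    amp = sum(fi * gi for fi, gi in zip(f, gs)) / sum(fi * fi for fi in f)
    return sum((gi - amp * fi) ** 2 for fi, gi in zip(f, gs)), amp


def fit_fcs(taus, gs, s, log_lo=-7.0, log_hi=0.0, steps=141):
    """Fit N and td with s fixed: grid over log10(td), then golden-section refinement."""
    def cost(log_td):
        return profile_rss(taus, gs, 10.0 ** log_td, s)[0]

    grid = [log_lo + i * (log_hi - log_lo) / (steps - 1) for i in range(steps)]
    best = min(range(steps), key=lambda i: cost(grid[i]))
    lo, hi = grid[max(best - 1, 0)], grid[min(best + 1, steps - 1)]
    inv_phi = (math.sqrt(5) - 1) / 2
    while hi - lo > 1e-12:
        c, d = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
        if cost(c) < cost(d):
            hi = d
        else:
            lo = c
    td = 10.0 ** ((lo + hi) / 2)
    rss, amp = profile_rss(taus, gs, td, s)
    return 1.0 / amp, td, rss


# Made-up, noise-free curve: N = 5, td = 1 ms, s = 0.2 (assumed known and fixed).
taus = [10.0 ** (-6 + 6 * i / 59) for i in range(60)]  # lag times from 1 µs to 1 s
gs = [g_3d(t, 5.0, 1e-3, 0.2) for t in taus]
n_fit, td_fit, rss = fit_fcs(taus, gs, s=0.2)
print(f"N = {n_fit:.3f}, td = {td_fit * 1e3:.4f} ms")
```

Solving the linear amplitude exactly and searching only over the nonlinear td keeps the search one-dimensional. A general-purpose fitter would usually float N and td together instead.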

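Free dye contamination
In the data, we observe diffusion of the labeled protein as well as diffusion of contaminating free dye.
[Figure: observable species, labeled protein + free dye]
Two-component model:
$G(\tau) = \frac{1}{N_1} \left(\frac{1}{1 + \tau/t_{d1}}\right) \left(\frac{1}{1 + s^2 \tau/t_{d1}}\right)^{0.5} + \frac{1}{N_2} \left(\frac{1}{1 + \tau/t_{d2}}\right) \left(\frac{1}{1 + s^2 \tau/t_{d2}}\right)^{0.5}$
Now there are 5 parameters: N1, N2, td1, td2, s.
Alternative to a more complex model: better sample cleanup.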
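
The sketch below shows why the second component matters. It builds a made-up, noise-free mixture curve: protein with N1 = 5 and td1 = 1 ms, plus free dye with N2 = 20 and td2 = 10 µs, with s = 0.2 fixed. All values are hypothetical. It then compares the best one-component and two-component fits over a grid of td values. The amplitudes 1/N are solved linearly at each grid point, and two-component solutions with a non-positive amplitude are rejected as a simple constraint.

```python
import math


def g_term(tau, td, s):
    """Unit-amplitude 3-D diffusion term; multiplying by 1/N gives one component."""
    x = tau / td
    return 1.0 / (1.0 + x) / math.sqrt(1.0 + s * s * x)


def g_two(tau, n1, n2, td1, td2, s):
    """Two-component model: labeled protein plus free dye, sharing the same s."""
    return g_term(tau, td1, s) / n1 + g_term(tau, td2, s) / n2


def best_rss_one(taus, gs, s, grid):
    """Best one-component fit over a grid of td values (1/N solved in closed form)."""
    best = math.inf
    for td in grid:
        f = [g_term(t, td, s) for t in taus]
        amp = sum(a * g for a, g in zip(f, gs)) / sum(a * a for a in f)
        best = min(best, sum((g - amp * a) ** 2 for a, g in zip(f, gs)))
    return best


def best_rss_two(taus, gs, s, grid):
    """Best two-component fit over (td1, td2) pairs; 1/N1, 1/N2 from a 2x2 solve."""
    basis = [[g_term(t, td, s) for t in taus] for td in grid]
    best = math.inf
    for i in range(len(grid)):
        for j in range(i + 1, len(grid)):
            f1, f2 = basis[i], basis[j]
            s11 = sum(a * a for a in f1)
            s22 = sum(b * b for b in f2)
            s12 = sum(a * b for a, b in zip(f1, f2))
            b1 = sum(a * g for a, g in zip(f1, gs))
            b2 = sum(b * g for b, g in zip(f2, gs))
            det = s11 * s22 - s12 * s12
            if det <= 1e-12 * s11 * s22:
                continue  # the two shapes are nearly indistinguishable
            a1 = (b1 * s22 - s12 * b2) / det
            a2 = (s11 * b2 - s12 * b1) / det
            if a1 <= 0 or a2 <= 0:
                continue  # constraint: N1 and N2 must be positive
            rss = sum((g - a1 * u - a2 * v) ** 2 for u, v, g in zip(f1, f2, gs))
            best = min(best, rss)
    return best


s = 0.2  # assumed known and fixed
taus = [10.0 ** (-6 + 6 * i / 39) for i in range(40)]
gs = [g_two(t, 5.0, 20.0, 1e-3, 1e-5, s) for t in taus]
grid = [10.0 ** (-6 + 0.1 * k) for k in range(51)]  # td from 1 µs to 100 ms
rss1 = best_rss_one(taus, gs, s, grid)
rss2 = best_rss_two(taus, gs, s, grid)
print(f"one-component RSS = {rss1:.2e}, two-component RSS = {rss2:.2e}")
```

The one-component model cannot absorb the fast dye term and leaves a clearly larger RSS. A real fit would refine td1 and td2 beyond the grid and would have to contend with noise.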

Initial values ("first guesses")
- For floating parameters, an initial guess can speed up the fit or increase the chance of a successful fit
- This is more important for complex models with many parameters
- For a robust fit, the parameters should converge to the same values regardless of the initial values chosen
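
The robustness check in the last point can be sketched directly: run an iterative fitter from several different starting guesses and confirm that it lands on the same parameters. The example below uses a damped Gauss–Newton (Levenberg-style) iteration on a hypothetical exponential decay, y = a·exp(−k·t), with made-up, noise-free data. It is an illustration of the idea, not the algorithm of any particular fitting package.

```python
import math


def rss(ts, ys, a, k):
    """Residual sum of squares for y = a * exp(-k * t); inf if the model overflows."""
    try:
        return sum((y - a * math.exp(-k * t)) ** 2 for t, y in zip(ts, ys))
    except OverflowError:
        return math.inf


def levenberg_fit(ts, ys, a, k, mu=1e-3, max_iter=500):
    """Damped Gauss-Newton fit of y = a*exp(-k*t), starting from the guess (a, k)."""
    current = rss(ts, ys, a, k)
    for _ in range(max_iter):
        # Accumulate J^T J (h00, h01, h11) and J^T r (g0, g1) for r = y - model.
        h00 = h01 = h11 = g0 = g1 = 0.0
        for t, y in zip(ts, ys):
            e = math.exp(-k * t)
            ja, jk = e, -a * t * e  # d(model)/da and d(model)/dk
            r = y - a * e
            h00 += ja * ja
            h01 += ja * jk
            h11 += jk * jk
            g0 += ja * r
            g1 += jk * r
        # Solve (J^T J + mu*I) step = J^T r; mu > 0 keeps the system invertible.
        m00, m11 = h00 + mu, h11 + mu
        det = m00 * m11 - h01 * h01
        da = (g0 * m11 - h01 * g1) / det
        dk = (m00 * g1 - h01 * g0) / det
        trial = rss(ts, ys, a + da, k + dk)
        if trial < current:  # accept the step and trust the linearization more
            a, k, current = a + da, k + dk, trial
            mu = max(mu / 10, 1e-15)
            if abs(da) + abs(dk) < 1e-12:
                break
        else:                # reject the step and damp harder
            mu *= 10
            if mu > 1e20:
                break
    return a, k


# Made-up, noise-free decay with a = 5 and k = 1.3.
ts = [0.2 * i for i in range(21)]
ys = [5.0 * math.exp(-1.3 * t) for t in ts]

for guess in [(1.0, 0.5), (5.0, 1.0), (20.0, 4.0)]:
    a_fit, k_fit = levenberg_fit(ts, ys, *guess)
    print(guess, "->", round(a_fit, 6), round(k_fit, 6))
```

All three starting points converge to a ≈ 5 and k ≈ 1.3. If different guesses had led to different answers, that would signal a poorly determined fit or multiple local minima.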