Model Selection in Parameterizing Cell Images and Populations


Gregory R. Johnson, MMBIOS, April 2015

[Diagram: CellOrganizer training and synthesis. Training images are decomposed into component models (cell shape, nuclear shape, microtubule distribution, object number, object appearance, and object position probability); synthesis samples model parameters to produce synthetic cell images.]

This slide illustrates the central concept of our work in generative modeling. We construct models for cell components learned from many cell instances and combine them into a statistical model, such that we can sample from that model to obtain new parameter values and use them to synthesize new cell instances.

CellOrganizer Models Cell Populations

- Learn how the spatial relationships of cell compartments vary across cell populations.
- Generate high-quality in silico representations (i.e., images) of cell shape and the relationships of compartments within them.

[Diagram: images X1…Xn are mapped to parameterizations p1…pn, which are fit with a cell morphology distribution P(p|Θ); sampled parameterizations p1*…pm* are then synthesized into images x1*…xm*.]

Pipeline notation: f(x) = p (parameterize an image), d({p1,…,pn}) = Θ (fit a distribution over parameterizations), b(Θ) = p* (sample a new parameterization), g(p*) = x* (synthesize an image).

CellOrganizer Models Cell Populations (contd.)

- Represent cell morphology and the organization of components in an invertible, compact manner.
- Learn a distribution over these compact parameterizations.

[Diagram repeats the pipeline above: f(x) = p, d({p1,…,pn}) = Θ, b(Θ) = p*, g(p*) = x*.]
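As a concrete (if toy) illustration of the four-function pipeline, here is a minimal Python sketch. Everything in it is hypothetical: the real CellOrganizer implementations of f, g, d, and b are far richer, whereas this version stands in a PCA-style linear projection for f/g and a single Gaussian for the morphology distribution.

```python
import numpy as np

# Hypothetical stand-ins for the four pipeline functions. Each "image" is
# a flat array; the parameterization is a linear projection onto a basis.

def f(x, basis, mean):
    """Image -> parameterization: project the image onto a learned basis."""
    return basis @ (x - mean)

def g(p, basis, mean):
    """Parameterization -> image: approximate inverse of f."""
    return basis.T @ p + mean

def d(params):
    """Parameterizations -> distribution: fit a Gaussian over {p1..pn}."""
    P = np.stack(params)
    return {"mu": P.mean(axis=0), "cov": np.cov(P, rowvar=False)}

def b(theta, rng):
    """Distribution -> sampled parameterization p*."""
    return rng.multivariate_normal(theta["mu"], theta["cov"])

# Usage: "train" on toy images X (n x d), then synthesize a new image.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 100))            # toy stand-in for real images
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
basis = Vt[:10]                           # keep 10 principal components
params = [f(x, basis, mean) for x in X]   # f(x_i) = p_i
theta = d(params)                         # d({p1,...,pn}) = theta
p_star = b(theta, rng)                    # b(theta) = p*
x_star = g(p_star, basis, mean)           # g(p*) = x*
```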

Image to Parameterization

- Represent cell morphology with a compact set of parameters.
- We also want the parameterization to be invertible, so that we can recover the original image: f(xi) = pi ⟺ g(pi) = xi, i.e., p1 ↔ x1.

[Diagram: image xi decomposes into component parameterizations pi,1 (cell), pi,2 (nucleus), and pi,3 (protein pattern).]

Image Parameterization Is Lossy

[Images: a LAMP2 protein pattern alongside Gaussian mixture fits with a full covariance matrix and a spherical covariance matrix.]

- A compact parameterization can be lossy; the GMM parameters only approximate the original pattern.
- There is a trade-off between parameterization fidelity and the number of parameters; the number of mixture components K can be chosen by AIC or BIC.
- If K is known, fitting becomes a likelihood maximization problem.
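A minimal sketch of that selection, assuming scikit-learn (not necessarily the tooling used here): fit GMMs over a range of K and over the two covariance structures shown on the slide, and keep the lowest BIC.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy 2-D "object positions" standing in for a punctate protein pattern.
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(loc=c, scale=0.3, size=(60, 2))
                 for c in [(0, 0), (2, 1), (1, 3)]])

# Sweep the number of components K and the covariance type; keep lowest BIC.
best = None
for k in range(1, 8):
    for cov in ("full", "spherical"):
        gmm = GaussianMixture(n_components=k, covariance_type=cov,
                              random_state=0).fit(pts)
        bic = gmm.bic(pts)
        if best is None or bic < best[0]:
            best = (bic, k, cov)

print(f"BIC selects K={best[1]} with {best[2]} covariance (BIC={best[0]:.1f})")
```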

Shape Space Modeling Pipeline

[Diagram, panels a-d: pairwise shape distances (e.g., 0.85, 0.63, 0.74, 0.90) are embedded into a low-dimensional shape space via multidimensional scaling (MDS).]

Image Parameterization Is Lossy (contd.)

[Fig. 2 from T. Peng et al., "Instance-based generative biological shape modeling," 2009: original shapes x1…x4 compared against their reconstructions g(p1)…g(p4).]

By whatever criterion you choose the model, it may be imperfect.

Multidimensional Scaling

MDS finds embedding coordinates that preserve the measured pairwise distances, minimizing the weighted stress

Stress(Z) = Σi,j wi,j (Di,j − di,j(Z))²

where Di,j = measured distance between shapes i and j; Z = Euclidean embeddings for all shapes; di,j(Z) = Euclidean distance between the embedding coordinates of shapes i and j; wi,j = indicator for whether Di,j is observed.
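A minimal sketch of this weighted-stress minimization, using plain gradient descent over NumPy arrays (the classic algorithm is SMACOF; this simpler version is only illustrative). The indicator matrix W zeroes out unobserved distances, and the final loop previews the next slide by sweeping the embedding dimension.

```python
import numpy as np

def mds_stress(D, W, Z):
    """Weighted raw stress: sum over observed pairs of (D_ij - ||z_i - z_j||)^2."""
    diff = Z[:, None, :] - Z[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1) + 1e-12)
    return (W * (D - d) ** 2).sum() / 2      # each pair counted once

def mds_embed(D, W, dim, steps=2000, lr=0.01, seed=0):
    """Gradient descent on the stress; returns embedding Z (n x dim)."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(D.shape[0], dim))
    for _ in range(steps):
        diff = Z[:, None, :] - Z[None, :, :]
        d = np.sqrt((diff ** 2).sum(-1) + 1e-12)
        np.fill_diagonal(d, 1.0)             # avoid divide-by-zero; W_ii = 0
        coef = W * (d - D) / d               # chain rule through ||z_i - z_j||
        Z -= lr * (coef[:, :, None] * diff).sum(axis=1)
    return Z

# Toy example: distances between random 5-D "shapes", ~30% unobserved.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 5))
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
W = (rng.random(D.shape) > 0.3).astype(float)
W = np.triu(W, 1); W = W + W.T               # symmetric, zero diagonal

for dim in (1, 2, 5):
    Z = mds_embed(D, W, dim)
    print(f"dim={dim}: stress={mds_stress(D, W, Z):.2f}")
```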

Shape Space Dimensionality vs. Reconstruction

Reconstruction quality depends on the number of observed distances and on the dimensionality of the embedding.

[Plot: blue = 1-dimensional embedding, red = "complete" embedding.]

Prediction of cell and nuclear dependency

The "Goodness" of a Cell Parameterization

There are many ways to measure it:
- Pixel-to-pixel mean squared error
- Sørensen-Dice coefficient, for binary images and shapes
- A likelihood function
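The first two criteria are easy to make concrete. A sketch of pixel-to-pixel MSE and the Sørensen-Dice coefficient for binary masks, on a toy disk-shaped "cell" versus a slightly shifted reconstruction:

```python
import numpy as np

def mse(x, x_hat):
    """Pixel-to-pixel mean squared error between an image and its reconstruction."""
    return np.mean((x.astype(float) - x_hat.astype(float)) ** 2)

def dice(a, b):
    """Sørensen-Dice coefficient for two binary masks: 2|A∩B| / (|A| + |B|)."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

# Toy example: a disk mask vs. a slightly shifted reconstruction.
yy, xx = np.mgrid[:64, :64]
cell = (xx - 32) ** 2 + (yy - 32) ** 2 < 20 ** 2
recon = (xx - 34) ** 2 + (yy - 32) ** 2 < 20 ** 2
print(f"MSE={mse(cell, recon):.4f}  Dice={dice(cell, recon):.3f}")
```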

Parameters to Distribution

Fit a distribution over the parameterizations, d({p1,…,pn}) = Θ, then sample new parameterizations from it, b(Θ) = p*.

Parameters to Distribution (contd.)

"Straightforward" distribution learning and model selection is not enough:
- Some parameterizations may overfit (i.e., collapse to a point mass).
- Many models cannot be learned via closed-form solutions.

Predictive maximum likelihood: evaluate candidate models on held-out data,

(1/N) Σn=1..N log P(xn | Θn)

where N is the number of hold-outs, xn is a hold-out subset, and Θn is the corresponding model trained without xn.
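A sketch of predictive maximum likelihood as K-fold cross-validated held-out log-likelihood, here applied to choosing the number of GMM components (scikit-learn is an assumption; any model exposing fit/score would do). Note how an over-parameterized model scores worse on hold-outs, the overfitting concern raised above.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import KFold

def predictive_ll(data, fit_fn, n_folds=5):
    """Average held-out log-likelihood: fit on each training split,
    score the corresponding hold-out subset."""
    total, count = 0.0, 0
    for tr, te in KFold(n_folds, shuffle=True, random_state=0).split(data):
        model = fit_fn(data[tr])
        total += model.score_samples(data[te]).sum()
        count += len(te)
    return total / count

rng = np.random.default_rng(3)
data = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])

for k in (1, 2, 5, 20):
    ll = predictive_ll(data, lambda d, k=k: GaussianMixture(k, random_state=0).fit(d))
    print(f"K={k}: held-out log-likelihood per point = {ll:.3f}")
```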

Distributions of Object Position

[Images: punctate object position patterns for three proteins: HIP1, ACBD5, SEC23B.]

Possible Models

- Puncta are dependent on organelles, but independent of each other: Poisson point process.
- Puncta are dependent on organelles and on each other: Fiksel point process.
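To make the first hypothesis concrete: a sketch of an organelle-dependent but puncta-independent model, i.e., an inhomogeneous Poisson process whose intensity decays with distance to a hypothetical organelle center, simulated by thinning. The Fiksel process would add a pairwise puncta-puncta interaction term on top of this; everything below (window, intensity form, constants) is illustrative.

```python
import numpy as np

# Inhomogeneous Poisson process on the unit square: puncta cluster near
# the organelle but are placed independently of one another.
rng = np.random.default_rng(4)
organelle = np.array([0.5, 0.5])     # hypothetical organelle center
lam_max = 400.0                      # peak intensity near the organelle

def intensity(xy):
    d = np.linalg.norm(xy - organelle, axis=-1)
    return lam_max * np.exp(-8.0 * d)

# Thinning: simulate a homogeneous process at rate lam_max, then keep
# each candidate point with probability intensity(point) / lam_max.
n = rng.poisson(lam_max)             # expected count for the unit square
candidates = rng.random((n, 2))
keep = rng.random(n) < intensity(candidates) / lam_max
puncta = candidates[keep]
print(f"simulated {len(puncta)} puncta, clustered around the organelle")
```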

Five-Fold Cross-Validation to Choose the Best Model

The model with no puncta-puncta spatial interaction yields the greater held-out likelihood!

Toward Spatial Network Models

- Colocalization forms a complex network with interdependencies.
- Simplify it by using one-directional dependencies (network -> DAG).

[Fig. 1: Representative image of a segmented Arabidopsis plant protoplast, a spatial network exhibiting negative colocalization. a) False-colored image, with green indicating the autofluorescent chloroplast channel and red the endoplasmic reticulum. b) Autofluorescent chloroplast channel. c) ER channel. Notice the high degree of negative colocalization.]

[Fig. 2: DAG of a simplified spatial interaction network over N protein patterns, with nodes such as dcell, dnuc, dprot, pprot, nprot, sprot, and iprot.]

Pattern Modeling (contd.)

Generative models: add parameters to account for the spatial dependency of arbitrary numbers of protein patterns.

[3D rendering of a protoplast.]

Independent factorization: P(Chloroplast | Cell) P(ER | Cell)
Dependent factorization: P(Chloroplast | Cell) P(ER | Cell, Chloroplast)
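A toy sketch of why the dependent factorization can win: synthetic per-cell chloroplast and ER amounts with negative coupling (mimicking negative colocalization), scored under P(ER | Cell) versus P(ER | Cell, Chloroplast). Both conditionals are simple Gaussians here; none of this is the authors' actual model.

```python
import numpy as np
from scipy.stats import norm

# Toy per-cell chloroplast and ER "amounts"; ER is negatively coupled
# to chloroplast, standing in for negative colocalization.
rng = np.random.default_rng(5)
chl = rng.normal(1.0, 0.3, 500)
er = 2.0 - 1.2 * chl + rng.normal(0, 0.1, 500)
train, test = slice(0, 400), slice(400, 500)

# Independent factorization: P(ER | Cell), ignoring chloroplast.
mu, sd = er[train].mean(), er[train].std()
ll_indep = norm.logpdf(er[test], mu, sd).sum()

# Dependent factorization: P(ER | Cell, Chloroplast), linear-Gaussian fit.
a, b = np.polyfit(chl[train], er[train], 1)
resid_sd = (er[train] - (a * chl[train] + b)).std()
ll_dep = norm.logpdf(er[test], a * chl[test] + b, resid_sd).sum()

print(f"held-out log-lik: independent={ll_indep:.1f}, dependent={ll_dep:.1f}")
```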

Big Picture…

- We want the most precise cell parameterization: f(x) = p, g(p) = x.
- And the best-generalizing distribution: d({p1,…,pn}) = Θ.

[Diagram repeats the pipeline: images X1…Xn -> parameterizations p1…pn -> cell morphology distribution P(p|Θ) -> sampled parameterizations p1*…pm* -> synthesized images x1*…xm*; f(x) = p, d({p1,…,pn}) = Θ, b(Θ) = p*, g(p*) = x*.]

Master Modeling Function

How do we build a master model-selection criterion?
- Choose g(pi) with the least error between xi and g(pi).
- Choose d({p1,…,pn}) = Θ with the greatest likelihood.
- Include the spatial relationship model.

Even if errtot is expressed as some sort of probabilistic model, it is not clear how to balance errtot against the likelihood of the model, especially because the choice of parameterization drastically changes the values of Θ.