Data Analysis – Part 1: The Initial Questions of the AFCS

Data Analysis – Part 1: The Initial Questions of the AFCS
Madhu Natarajan, Rama Ranganathan
AFCS Annual Meeting 2003

Data Analysis: The Initial Questions of the AFCS
What are the goals of data analysis right now? Our first general question: how complex is signaling in cells?
[Figure: ligands L1, L2, L3, …, Ln act on the signaling network, which produces outputs O1, O2, O3, …, Om.]

Data Analysis: The Initial Questions of the AFCS
What are the goals of the analysis?
(1) Quantitative measurement of the similarity (or dissimilarity) of the responses to different ligands.
(2) Quantitative evaluation of the interactions between pairs of ligand responses, and an estimate of the total interaction density. This is the subject of the next talk.

Data Analysis: The Initial Questions of the AFCS
Goal (1): quantitative measurement of the similarity (or dissimilarity) of the responses to different ligands. This experiment is designed to provide a response pattern for each ligand, sampled at several points in the signaling network. It may or may not reveal much about specific mechanisms.

Data Analysis: The Initial Questions of the AFCS
Goal (1): quantitative measurement of the similarity (or dissimilarity) of the responses to different ligands. The measured outputs include calcium time points and cAMP time points. Issues to solve:
(a) A way of combining all the multivariate output data into general parameters that represent signaling.
(b) A way of collapsing the non-independent outputs: how many independent variables are there in a calcium trace?
(c) A formalism for calculating the similarity of ligand responses.

Quantitative measurement of similarity in ligand screen data: merging different types of data
How can we put all the experimental variables on a common scale, and then create a unified representation of the complete dataset for each ligand?
[Figure: a single ligand L1 acting on the signaling network produces outputs O1, O2, O3, …, On.]
One approach is to build a Gaussian error model for the unstimulated (basal) value of each variable, and then convert each variable measured for a ligand into the statistical significance of observing that value, given the basal value and the error model. So we define a parameter S (for significance, or signaling).
[Equation defining S not preserved in the transcript.]
[Figure: Gaussian distribution of the basal value (1.5) with standard deviation σ, with the observed value (3.8) marked; shown for σ = 0.7 and σ = 1.4.]
For an observed value of 3.8, a basal value of 1.5, and a standard deviation of 0.7, S = 3.29. If the standard deviation were 1.4, S would be only 1.68.
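The defining equation for S is missing from the transcript. One reading consistent with the first worked example is a z-score, S = (x − μ_basal) / σ_basal, which gives (3.8 − 1.5) / 0.7 ≈ 3.29. The sketch below assumes that definition; the function name `signaling_score` is hypothetical, and the original may have used a different significance transform. Under the z-score assumption, the σ = 1.4 case works out to about 1.64 rather than the 1.68 quoted on the slide.

```python
def signaling_score(observed: float, basal_mean: float, basal_sd: float) -> float:
    """Hypothetical S: how many basal standard deviations the observed value
    lies from the basal mean (a z-score). Assumes the unstimulated value
    follows a Gaussian error model with the given mean and SD."""
    if basal_sd <= 0:
        raise ValueError("basal_sd must be positive")
    return (observed - basal_mean) / basal_sd


# The two worked examples from the slides: basal value 1.5, observed value 3.8.
print(round(signaling_score(3.8, 1.5, 0.7), 2))  # 3.29
print(round(signaling_score(3.8, 1.5, 1.4), 2))  # 1.64
```

Doubling the basal SD halves S, which is the point of the example: the same raw change counts for less when the basal measurement is noisier.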

Quantitative measurement of similarity in ligand screen data: merging different types of data
[Figure: the worked example rescaled to S units; the observed value sits at S = 3.29.]
The observed variable and the basal value are transformed into these new units of statistical significance. Why do this?
(a) Every data element we collect, regardless of type, time scale, or method of collection, can now be put on a common basis for comparison, clustering, and so on. The only assumption is that the basal value is normally distributed around its mean.
(b) S also provides a suitable measure for describing the interaction of two ligands, through the additivity of S.
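If S is treated as additive, one way to express an interaction between two ligands is the per-dimension deviation of the double-ligand response from the sum of the two single-ligand responses. This is a minimal sketch under that assumption only; the function name `interaction_residual` and the numbers are made up for illustration and are not AfCS data.

```python
from typing import List, Sequence


def interaction_residual(s_a: Sequence[float], s_b: Sequence[float],
                         s_ab: Sequence[float]) -> List[float]:
    """Hypothetical interaction measure: per-dimension deviation of the
    double-ligand S vector from the sum of the single-ligand S vectors,
    i.e. S_ab - (S_a + S_b). Near 0 means the responses simply add."""
    if not (len(s_a) == len(s_b) == len(s_ab)):
        raise ValueError("all S vectors must have the same length")
    return [ab - (a + b) for a, b, ab in zip(s_a, s_b, s_ab)]


# Made-up S values for two dimensions.
print(interaction_residual([2.0, 0.5], [1.0, 0.5], [3.0, 4.0]))  # [0.0, 3.0]
```

In this toy case the first dimension is purely additive, while the second departs from additivity by 3 units of S.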

Quantitative measurement of similarity in ligand screen data: merging different types of data
Now, what about all the experimental variables?

Quantitative measurement of similarity in ligand screen data: the experiment space
The experiment space is highly multidimensional, but it behaves geometrically just like ordinary three-dimensional space. Each variable gets its own independent dimension, so a complete single-ligand dataset is one vector in this space.
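Because every variable is expressed in S units, a ligand's complete dataset can be assembled by converting each raw measurement to S and placing it on its own axis in a fixed order. This sketch assumes the z-score form of S and per-variable basal means and SDs that are already known; the function name, the dictionary layout, the variable labels, and the numbers are all hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]                       # hypothetical (assay, variable) label
BasalModel = Dict[Key, Tuple[float, float]]  # key -> (basal mean, basal SD)


def experiment_vector(raw: Dict[Key, float], basal: BasalModel,
                      order: List[Key]) -> List[float]:
    """Convert each raw measurement to S = (x - mean) / sd (assumed z-score
    form) and place it on its own axis, following a fixed dimension order."""
    vec = []
    for key in order:
        mean, sd = basal[key]
        vec.append((raw[key] - mean) / sd)
    return vec


# Two calcium variables in fluorescence units and one cAMP variable in its own
# units: different raw scales, but all end up as dimensionless S values.
order = [("calcium", "t1"), ("calcium", "t2"), ("cAMP", "t1")]
basal = {order[0]: (100.0, 5.0), order[1]: (100.0, 5.0), order[2]: (2.0, 0.5)}
raw = {order[0]: 130.0, order[1]: 110.0, order[2]: 3.0}
print(experiment_vector(raw, basal, order))  # [6.0, 2.0, 2.0]
```

Fixing the dimension order once for all ligands is what makes their vectors directly comparable.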

Quantitative measurement of similarity in ligand screen data: the experiment space
What can we learn from this representation? The response profile for each ligand is its final S vector. Differences between ligand responses have a natural meaning, and this representation preserves the dimensions along which the differences occur. So let’s start constructing the experiment space for the B cell data.
[Figure: the difference vector ΔS1,2 between the S vectors of ligands 1 and 2.]
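The difference ΔS1,2 between two ligands can be kept as a vector, which shows the dimensions along which the responses differ, and summarized by its length. The slides say only that differences have a natural meaning in this space; using Euclidean length as the summary is an assumption, and the function names and example values are hypothetical.

```python
import math
from typing import List, Sequence


def difference_vector(s1: Sequence[float], s2: Sequence[float]) -> List[float]:
    """Delta S_{1,2}: per-dimension difference between two ligands' S vectors,
    which keeps track of the dimensions along which the responses differ."""
    if len(s1) != len(s2):
        raise ValueError("S vectors must have the same dimension")
    return [a - b for a, b in zip(s1, s2)]


def response_distance(s1: Sequence[float], s2: Sequence[float]) -> float:
    """Euclidean length of Delta S_{1,2}: one overall dissimilarity score."""
    return math.sqrt(sum(d * d for d in difference_vector(s1, s2)))


s1, s2 = [3.0, 0.0, 1.0], [0.0, 4.0, 1.0]  # made-up three-dimensional example
print(difference_vector(s1, s2))   # [3.0, -4.0, 0.0]
print(response_distance(s1, s2))   # 5.0
```

The vector says the ligands differ in the first two dimensions and agree in the third; the scalar distance alone would hide that.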

Converting raw data to S variables: LPA, BLC, and AIG
[Figure: for each ligand, the raw trace in fluorescence units vs. time (s), and the same trace converted to signaling units (S) vs. time (s).]
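The slides do not say how the basal error model was estimated at each time point. One plausible sketch, assuming several unstimulated control traces recorded on the same time grid, estimates a mean and sample SD per time point and then converts a ligand trace to S point by point with the assumed z-score form. The function names and the numbers are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def basal_model(control_traces: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-time-point mean and sample SD across unstimulated control traces."""
    if len(control_traces) < 2:
        raise ValueError("need at least two control traces to estimate an SD")
    columns = list(zip(*control_traces))  # one tuple of values per time point
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return means, sds


def trace_to_s(trace: Sequence[float], means: Sequence[float],
               sds: Sequence[float]) -> List[float]:
    """Convert a raw fluorescence trace to S units, one time point at a time.
    A time point with zero basal SD would raise ZeroDivisionError."""
    return [(x - m) / s for x, m, s in zip(trace, means, sds)]


controls = [[100.0, 101.0, 99.0],
            [102.0, 99.0, 101.0],
            [98.0, 100.0, 100.0]]
means, sds = basal_model(controls)
print(means, sds)                                 # [100.0, 100.0, 100.0] [2.0, 1.0, 1.0]
print(trace_to_s([100.0, 130.0, 100.0], means, sds))  # [0.0, 30.0, 0.0]
```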

Converting all the raw data for one experiment type to S variables
[Figure: S vs. time (s) over 600 s for one experiment type, with the BLC and LPA traces labeled.]
Using 200 separate variables for the calcium traces is clearly excessive, since neighboring time points are not independent.

Data reduction: a cluster-based approach
[Figure: the calcium time-point variables over 600 s, grouped into five clusters labeled 1 to 5.]
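The figure groups the calcium time points into five clusters, and the conclusions describe clustering variables over many ligand responses. The clustering method and distance are not specified in the slides; this sketch assumes average-linkage agglomerative clustering of variables, with 1 − Pearson correlation across ligands as the distance, and then collapses each cluster to the mean S of its members. All names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither variable is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cluster_variables(S: Sequence[Sequence[float]], n_clusters: int) -> List[List[int]]:
    """Average-linkage clustering of the columns of S (variables), where rows
    are ligands and distance is 1 - correlation across ligands."""
    cols = list(zip(*S))
    n = len(cols)
    dist = [[1.0 - pearson(cols[i], cols[j]) for j in range(n)] for i in range(n)]
    clusters = [[j] for j in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # merge the closest pair of clusters
        del clusters[b]             # safe: b > a, so index a is unaffected
    return clusters


def collapse(S: Sequence[Sequence[float]], clusters: List[List[int]]) -> List[List[float]]:
    """One reduced dimension per cluster: the mean S of its member variables."""
    return [[sum(row[j] for j in c) / len(c) for c in clusters] for row in S]


# Toy data: 10 ligands x 6 variables; variables 0-2 and 3-5 move together.
b1 = [float(i) for i in range(10)]
b2 = [0.0, 3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0]
S = [[x, 2 * x, x + 1, y, 3 * y, y - 2] for x, y in zip(b1, b2)]
clusters = cluster_variables(S, 2)
print(sorted(sorted(c) for c in clusters))  # [[0, 1, 2], [3, 4, 5]]
reduced = collapse(S, clusters)             # 10 ligands x 2 reduced dimensions
```

Correlation across ligands is used because variables that always rise and fall together carry little independent information, which is the over-parameterization concern raised in the conclusions.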

The first five dimensions: the reduced calcium response
[Figure: the calcium response over 600 s reduced to five dimensions, numbered 1 to 5.]

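All the dimensions (minus gene expression)
[Figure: the experiment-space dimensions other than gene expression; labels include time points of 2.5, 5.0, 15, and 30 min, the five calcium dimensions (1 to 5), and further labels .5, 1, 3, 8, and 20.]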

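Clustering the experiment space
[Figure: the clustered experiment space; dimension labels include 2.5, 5.0, 15, and 30 min, calcium dimensions 1 to 5, and .5, 1, 3, 8, and 20.]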
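The conclusions count ligands with a significant response in at least one dimension and then group the responders into distinct patterns. The slides do not give the significance cutoff or the grouping rule, so this sketch makes two labeled assumptions: a responder has |S| above a hypothetical threshold in some dimension, and responders are grouped by single linkage on Euclidean distance with a hypothetical cutoff. Ligand names and values are made up.

```python
import math
from typing import Dict, List, Sequence


def responders(profiles: Dict[str, Sequence[float]], threshold: float) -> List[str]:
    """Ligands whose |S| exceeds `threshold` in at least one dimension."""
    return [name for name, s in profiles.items() if any(abs(v) > threshold for v in s)]


def group_by_distance(profiles: Dict[str, Sequence[float]], names: List[str],
                      max_dist: float) -> List[List[str]]:
    """Single-linkage grouping: two ligands end up in the same group if a chain
    of ligands links them with Euclidean distances <= max_dist."""
    groups: List[List[str]] = []
    for name in names:
        linked = [g for g in groups
                  if any(math.dist(profiles[name], profiles[m]) <= max_dist for m in g)]
        merged = [name]
        for g in linked:  # every group this ligand touches is merged with it
            merged.extend(g)
            groups.remove(g)
        groups.append(merged)
    return groups


profiles = {"A": [0.5, 0.2], "B": [4.0, 0.1], "C": [4.5, 0.3], "D": [0.1, 5.0]}
hits = responders(profiles, threshold=3.0)
print(hits)  # ['B', 'C', 'D']
print(sorted(sorted(g) for g in group_by_distance(profiles, hits, max_dist=1.0)))
# [['B', 'C'], ['D']]
```

Both cutoffs control the final numbers directly, so any count of responders or patterns produced this way depends on how they are chosen.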

Conclusions:
(1) A simple transformation of raw data variables into dimensionless S variables (units of significance) permits construction of a unified experiment space of all data, regardless of source or of differences in intrinsic dynamic range and signal-to-noise ratio.
(2) A potentially serious danger is over-parameterization: using many non-independent variables to represent a single biological process (say, the inactivation of a calcium response). We suggest that this can be addressed by clustering variables over many ligand responses.
(3) Of the 32 ligands applied to B cells, 14 showed a significant response in at least one of the 54 experiment-space dimensions.
(4) Among the 14 ligands with measurable responses, we discern 8 distinct response patterns.
(5) The gene expression dataset will be integrated into the experiment space as soon as we clearly understand how to identify the gene clusters that should be collapsed into experiment-space dimensions.
(6) What predictions seem reasonable for the double ligand screen? Combinations of ligands with similar response patterns might be expected to interact, while combinations of ligands with very different responses might show little or no interaction.

Acknowledgements: Madhu Natarajan, Paul Sternweis, Elliott Ross, Mel Simon, Al Gilman