Data Analysis – Part1: The Initial Questions of the AFCS


1 Data Analysis – Part1: The Initial Questions of the AFCS
Madhu Natarajan, Rama Ranganathan. AFCS Annual Meeting 2003.

2 Data Analysis: The Initial Questions of the AFCS
What are the goals of data analysis right now? Our first general question: how complex is signaling in cells? [Figure: a signaling network with ligand inputs L1, L2, L3, …, Ln and outputs O1, O2, O3, …, Om]

3 Data Analysis: The Initial Questions of the AFCS
What are the goals of the analysis? Quantitative measurement of the similarity (or dissimilarity) of the responses to different ligands. [Figure: signaling network, ligands L1…Ln to outputs O1…Om]

4 Data Analysis: The Initial Questions of the AFCS
What are the goals of the analysis? (1) Quantitative measurement of the similarity (or dissimilarity) of the responses to different ligands. (2) Quantitative evaluation of the interactions between pairs of ligand responses, and an estimation of total interaction density (the next talk). [Figure: signaling network, ligands L1…Ln to outputs O1…Om]

5 Data Analysis: The Initial Questions of the AFCS
What are the goals of the analysis? Quantitative measurement of the similarity (or dissimilarity) of the responses to different ligands. This experiment is designed to provide a response pattern for a ligand, sampled at several points in the signaling network. It may or may not provide much information about specific mechanism. [Figure: signaling network, ligands L1…Ln to outputs O1…Om]

6 Data Analysis: The Initial Questions of the AFCS
What are the goals of the analysis? (1) Quantitative measurement of the similarity (or dissimilarity) of the responses to different ligands. The problems to solve: a way of combining all the multivariate output data into general parameters that represent signaling. [Figure: signaling network, ligands L1…Ln to outputs O1…Om; the outputs are calcium time points, cAMP time points, and so on]

7 Data Analysis: The Initial Questions of the AFCS
What are the goals of the analysis? (1) Quantitative measurement of the similarity (or dissimilarity) of the responses to different ligands. Issues here: a way of combining all the multivariate output data into general parameters that represent signaling; a way of collapsing the non-independent outputs (how many independent variables are there in a calcium trace?). [Figure: signaling network, ligands L1…Ln to outputs O1…Om; the outputs are calcium time points, cAMP time points, and so on]

8 Data Analysis: The Initial Questions of the AFCS
What are the goals of the analysis? (1) Quantitative measurement of the similarity (or dissimilarity) of the responses to different ligands. Issues here: a way of combining all the multivariate output data into general parameters that represent signaling; a way of collapsing the non-independent outputs; a formalism for calculating the similarity of ligand responses. [Figure: signaling network, ligands L1…Ln to outputs O1…Om; the outputs are calcium time points, cAMP time points, and so on]

9 Quantitative measurement of similarity in ligand screen data
Merging different types of data. How can we put all the experimental variables on a common scale? If we do that, then how do we create a sensible representation of the complete dataset for each ligand? [Figure: signaling network, one ligand L1 to outputs O1…On]

10 Quantitative measurement of similarity in ligand screen data
Merging different types of data. How can we put all the experimental variables on a common scale and then create a unified representation of the dataset for each ligand? One approach is to make a Gaussian error model for the unstimulated value of each variable. Then convert each variable for a ligand into the statistical significance of observing the value given the unstimulated value and error model. [Figure: Gaussian distribution of the basal value with standard deviation σ; the basal value and the observed value are marked]

11 Quantitative measurement of similarity in ligand screen data
Merging different types of data. How can we put all the experimental variables on a common scale and then create a unified representation of the dataset for each ligand? One approach is to make a Gaussian error model for the unstimulated value of each variable. Then convert each variable for a ligand into the statistical significance of observing the value given the unstimulated value and error model. So, we define a parameter S (for significance or signaling): [Equation: definition of S, not captured in this transcript] [Figure: Gaussian distribution of the basal value with standard deviation σ; the basal value and the observed value are marked]

12 Quantitative measurement of similarity in ligand screen data
Merging different types of data. How can we put all the experimental variables on a common scale and then create a unified representation of the dataset for each ligand? One approach is to make a Gaussian error model for the unstimulated value of each variable. Then convert each variable for a ligand into the statistical significance of observing the value given the unstimulated value and error model. So for an observed value of 3.8 given a basal value of 1.5 and a standard deviation of 0.7, you get an S value of 3.29. [Figure: Gaussian basal distribution with σ = 0.7, basal value 1.5, observed value 3.8]

13 Quantitative measurement of similarity in ligand screen data
Merging different types of data. How can we put all the experimental variables on a common scale and then create a unified representation of the dataset for each ligand? One approach is to make a Gaussian error model for the unstimulated value of each variable. Then convert each variable for a ligand into the statistical significance of observing the value given the unstimulated value and error model. So for an observed value of 3.8 given a basal value of 1.5 and a standard deviation of 0.7, you get an S value of 3.29. But if the standard deviation were 1.4, then S is only 1.68. [Figure: Gaussian basal distribution with σ = 1.4, basal value 1.5, observed value 3.8]
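The slide's formula for S is not preserved in this transcript. A minimal sketch, assuming S is simply the distance of the observed value from the basal mean measured in basal standard deviations (a z-score), is shown below. The function name `s_value` and its parameter names are hypothetical. This assumption reproduces the 3.29 quoted above; for a standard deviation of 1.4 it gives about 1.64 rather than the quoted 1.68, so the original definition may differ in detail.

```python
def s_value(observed, basal_mean, basal_sd):
    """Assumed definition: observed value relative to the basal mean,
    expressed in units of the basal standard deviation."""
    if basal_sd <= 0:
        raise ValueError("basal_sd must be positive")
    return (observed - basal_mean) / basal_sd


# The worked example from the slides: basal 1.5, observed 3.8.
print(round(s_value(3.8, 1.5, 0.7), 2))  # 3.29
print(round(s_value(3.8, 1.5, 1.4), 2))  # 1.64 under this assumption
```

Because the result is dimensionless, the same function could be applied to a fluorescence reading, a cAMP concentration, or any other variable with its own basal mean and spread.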

14 Quantitative measurement of similarity in ligand screen data
Merging different types of data. How can we put all the experimental variables on a common scale and then create a unified representation of the dataset for each ligand? Our observed variable and basal value get transformed into these new units of statistical significance. [Figure: the example value, S = 3.29]

15 Quantitative measurement of similarity in ligand screen data
Merging different types of data. How can we put all the experimental variables on a common scale and then create a unified representation of the dataset for each ligand? Our observed variable and basal value get transformed into these new units of statistical significance. Why do this? Every data element we collect (regardless of type, time scale, or method of collection) can now be put on a common basis for comparison, clustering, etc. The only assumption is that the basal value is normally distributed around its mean. [Figure: the example value, S = 3.29]

16 Quantitative measurement of similarity in ligand screen data
Merging different types of data. How can we put all the experimental variables on a common scale and then create a unified representation of the dataset for each ligand? Our observed variable and basal value get transformed into these new units of statistical significance. Why do this? Every data element we collect (regardless of type, time scale, or method of collection) can now be put on a common basis for comparison, clustering, etc. It also provides a suitable measure for talking about the interaction of two ligands: the additivity of S. [Figure: the example value, S = 3.29]

17 Quantitative measurement of similarity in ligand screen data
Merging different types of data. How can we put all the experimental variables on a common scale and then create a unified representation of the dataset for each ligand? Now what about all the experimental variables? [Figure: the example value, S = 3.29]

18 Quantitative measurement of similarity in ligand screen data
The Experiment Space: a high-dimensional space, but one that behaves just like three-dimensional space. Each variable gets an independent dimension, and so a complete single-ligand dataset is one vector in this space.

19 Quantitative measurement of similarity in ligand screen data
The Experiment Space: What can we learn from this representation?

20 Quantitative measurement of similarity in ligand screen data
The Experiment Space: What can we learn from this representation? The response profile for each ligand is the final S vector.

21 Quantitative measurement of similarity in ligand screen data
The Experiment Space: What can we learn from this representation? The response profile for each ligand is the final S vector. Differences between ligand responses have a natural meaning…

22 Quantitative measurement of similarity in ligand screen data
The Experiment Space: What can we learn from this representation? The response profile for each ligand is the final S vector. Differences between ligand responses have a natural meaning… [Figure: the difference vector ΔS1,2 between the S vectors of ligands 1 and 2]

23 Quantitative measurement of similarity in ligand screen data
The Experiment Space: What can we learn from this representation? The response profile for each ligand is the final S vector. Differences between ligand responses have a natural meaning, and this preserves the dimensions along which the differences occur. [Figure: the difference vector ΔS1,2 between the S vectors of ligands 1 and 2]

24 Quantitative measurement of similarity in ligand screen data
The Experiment Space: What can we learn from this representation? The response profile for each ligand is the final S vector. Differences between ligand responses have a natural meaning, and this preserves the dimensions along which the differences occur. So let’s start constructing the experiment space for the B cell data… [Figure: the difference vector ΔS1,2 between the S vectors of ligands 1 and 2]
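As an illustration of the difference vector between two ligand responses, here is a short sketch. It assumes each ligand's profile is a plain list of S values in a fixed variable order, and that the ordinary Euclidean length is an acceptable summary of the difference (the slides only say the space behaves like three-dimensional space). The function names and the example numbers are hypothetical, not data from the talk.

```python
import math


def delta_s(s1, s2):
    """Component-wise difference between two S vectors.
    Each component keeps the identity of its experimental variable."""
    if len(s1) != len(s2):
        raise ValueError("S vectors must have the same dimensions")
    return [a - b for a, b in zip(s1, s2)]


def distance(s1, s2):
    """Euclidean length of the difference vector (an assumed metric)."""
    return math.sqrt(sum(d * d for d in delta_s(s1, s2)))


# Hypothetical three-variable profiles for two ligands.
ligand_1 = [3.0, 0.0, 4.0]
ligand_2 = [0.0, 0.0, 0.0]

print(delta_s(ligand_1, ligand_2))   # [3.0, 0.0, 4.0]
print(distance(ligand_1, ligand_2))  # 5.0
```

The vector from `delta_s` shows along which variables the two responses differ, while `distance` collapses that into a single dissimilarity number.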

25 Converting raw data to S variables:
[Figure: raw LPA trace, fluorescence units vs. time (sec)]

26 Converting raw data to S variables:
[Figure: LPA trace in fluorescence units vs. time (sec), alongside the same trace in signaling units (S) vs. time (sec)]

27 Converting raw data to S variables:
[Figure: traces for LPA, BLC, and AIG, each shown in fluorescence units and in signaling units (S) vs. time (sec)]

28 Converting all the raw data for one experiment type to S variables:
[Figure: time axis (sec), tick at 600]

29 Converting all the raw data for one experiment type to S variables:
[Figure: BLC response in S units vs. time (sec), tick at 600]

30 Converting all the raw data for one experiment type to S variables:
[Figure: LPA response in S units vs. time (sec), tick at 600]

31 Converting all the raw data for one experiment type to S variables:
Now, 200 separate variables for the calcium traces is clearly idiotic… [Figure: LPA response in S units vs. time (sec), tick at 600]

32 Data reduction…a cluster-based approach:
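[Figure: time axis (sec), tick at 600]

33 Data reduction…a cluster-based approach:
[Figure: time points grouped into clusters labeled 1 to 5; time axis (sec), tick at 600]

34 The first five dimensions: the reduced calcium response
[Figure: the five reduced calcium dimensions, labeled 1 to 5; time axis (sec), tick at 600]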
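The slides do not say which clustering algorithm was used to collapse the calcium time points. The sketch below substitutes a deliberately simple greedy scheme to show the idea: time-point variables whose S values rise and fall together across ligands are grouped, and each group is averaged into one reduced dimension. The function names, the correlation threshold, and the small matrix are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cluster_variables(s_matrix, threshold=0.9):
    """Greedy grouping of columns (variables) of a ligand-by-variable S matrix.
    A column joins the first cluster whose first member it correlates with
    at or above the threshold; otherwise it starts a new cluster."""
    columns = list(zip(*s_matrix))
    clusters = []
    for j, col in enumerate(columns):
        for members in clusters:
            if pearson(col, columns[members[0]]) >= threshold:
                members.append(j)
                break
        else:
            clusters.append([j])
    return clusters


def reduce_matrix(s_matrix, clusters):
    """Replace each cluster of columns by its mean, one value per ligand."""
    return [
        [sum(row[j] for j in members) / len(members) for members in clusters]
        for row in s_matrix
    ]


# Hypothetical S matrix: 3 ligands (rows) by 4 time points (columns).
s_matrix = [
    [4.0, 4.2, 0.1, 0.0],
    [0.2, 0.0, 3.0, 3.3],
    [2.0, 2.1, 2.9, 3.1],
]
clusters = cluster_variables(s_matrix)
reduced = reduce_matrix(s_matrix, clusters)
print(clusters)  # [[0, 1], [2, 3]]
print(reduced)
```

Clustering over many ligand responses, rather than within one trace, is what lets correlated time points be recognized as a single dimension.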

35 All the dimensions (minus gene expression):
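[Figure: all experiment space dimensions except gene expression; axis labels include 2.5’, 5.0’, 15’, 30’; 1 to 5; and 0.5, 1, 3, 8, 20]

36 Clustering the experiment space:
[Figure: clustered experiment space; axis labels include 2.5’, 5.0’, 15’, 30’; 1 to 5; and 0.5, 1, 3, 8, 20]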
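The clustering method used for the experiment space is not given in this transcript. As a rough sketch of what grouping ligands by response pattern could look like, the code below links any two ligands whose S vectors lie within a chosen Euclidean distance and reports the connected groups. The distance cutoff, the function names, and the two-dimensional profiles are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two S vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def group_ligands(profiles, max_distance):
    """Single-linkage grouping: ligands end up in the same group when a chain
    of neighbors, each within max_distance of the next, connects them."""
    remaining = list(profiles)
    groups = []
    while remaining:
        group = [remaining.pop(0)]
        frontier = list(group)
        while frontier:
            current = frontier.pop()
            near = [
                name for name in remaining
                if euclidean(profiles[current], profiles[name]) <= max_distance
            ]
            for name in near:
                remaining.remove(name)
            group.extend(near)
            frontier.extend(near)
        groups.append(group)
    return groups


# Hypothetical S vectors for four ligands in a two-dimensional space.
profiles = {
    "L1": [4.0, 0.0],
    "L2": [4.5, 0.5],
    "L3": [0.0, 3.0],
    "L4": [0.2, 0.1],
}
groups = group_ligands(profiles, max_distance=1.0)
print(groups)  # [['L1', 'L2'], ['L3'], ['L4']]
```

The number of groups depends strongly on the cutoff, so a count of distinct response patterns obtained this way is only as meaningful as the choice of `max_distance`.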

37 Conclusions: A simple transformation of raw data variables into dimensionless S variables (units of significance) permits construction of a unified experiment space of all data, regardless of source or differences in intrinsic dynamic range and signal-to-noise ratio.

38 Conclusions: A simple transformation of raw data variables into dimensionless S variables (units of significance) permits construction of a unified experiment space of all data, regardless of source or differences in intrinsic dynamic range and signal-to-noise ratio. A potentially serious danger is over-parameterization: the use of many non-independent variables to represent a biological process (say, the inactivation of a calcium response). We suggest that this can be addressed by clustering variables over many ligand responses.

39 Conclusions: A simple transformation of raw data variables into dimensionless S variables (units of significance) permits construction of a unified experiment space of all data, regardless of source or differences in intrinsic dynamic range and signal-to-noise ratio. A potentially serious danger is over-parameterization: the use of many non-independent variables to represent a biological process (say, the inactivation of a calcium response). We suggest that this can be addressed by clustering variables over many ligand responses. 14 of the 32 ligands applied to the B cell showed a significant response in at least one of the 54 experiment space dimensions.

40 Conclusions: A simple transformation of raw data variables into dimensionless S variables (units of significance) permits construction of a unified experiment space of all data, regardless of source or differences in intrinsic dynamic range and signal-to-noise ratio. A potentially serious danger is over-parameterization: the use of many non-independent variables to represent a biological process (say, the inactivation of a calcium response). We suggest that this can be addressed by clustering variables over many ligand responses. 14 of the 32 ligands applied to the B cell showed a significant response in at least one of the 54 experiment space dimensions. Of the 14 with measurable responses, we discern 8 distinct patterns of response.

41 Conclusions: A simple transformation of raw data variables into dimensionless S variables (units of significance) permits construction of a unified experiment space of all data, regardless of source or differences in intrinsic dynamic range and signal-to-noise ratio. A potentially serious danger is over-parameterization: the use of many non-independent variables to represent a biological process (say, the inactivation of a calcium response). We suggest that this can be addressed by clustering variables over many ligand responses. 14 of the 32 ligands applied to the B cell showed a significant response in at least one of the 54 experiment space dimensions. Of the 14 with measurable responses, we discern 8 distinct patterns of response. The gene expression dataset will be integrated into the experiment space as soon as we clearly understand how to identify the gene clusters that should be collapsed into experiment space dimensions.

42 Conclusions: A simple transformation of raw data variables into dimensionless S variables (units of significance) permits construction of a unified experiment space of all data, regardless of source or differences in intrinsic dynamic range and signal-to-noise ratio. A potentially serious danger is over-parameterization: the use of many non-independent variables to represent a biological process (say, the inactivation of a calcium response). We suggest that this can be addressed by clustering variables over many ligand responses. 14 of the 32 ligands applied to the B cell showed a significant response in at least one of the 54 experiment space dimensions. Of the 14 with measurable responses, we discern 8 distinct patterns of response. The gene expression dataset will be integrated into the experiment space as soon as we clearly understand how to identify the gene clusters that should be collapsed into experiment space dimensions. What predictions seem reasonable for the double ligand screen? Combinations of ligands that show similar response patterns might be expected to show interaction; combinations of ligands that are very different in response might show less or no interaction.

43 Acknowledgements: Madhu Natarajan, Paul Sternweis, Elliott Ross, Mel Simon, Al Gilman

