
A comparative approach for gene network inference using time-series gene expression data
Guillaume Bourque* and David Sankoff
*Centre de Recherches Mathématiques, Université de Montréal
October 2003

DNA Microarrays (http://www.sri.com/pharmdisc/cancer_biology/laderoute.html)
Experiment design, Noise reduction, Normalization, …, Data analysis

Gene Expression Data

Beyond Clustering…
[Figure: time series for genes x1, x2, x3 and x4, and the unknown (?) gene network linking them with activating (+) and repressing (−) edges.]

Comparative Framework
[Figure: related gene networks for Species A, Species B and Species C.]

Harder Problem?
This new problem seems more ambitious and harder to solve. But we will show that, for closely related species (or samples), the comparative framework can actually improve the quality of the recovered solutions: the redundancy across the related datasets can be used to filter out some of the noise and resolve some of the ambiguity.

Outline
– Gene network model
– Single network inference: algorithm, simulations
– Multiple network inference: algorithm, simulations
– Conclusions

Gene Network Model
We use linear differential equations to model the gene trajectories (Chen et al. ’99, D’haeseleer et al. ’99):
$$\frac{dx_i(t)}{dt} = a_{i,0} + a_{i,1}\,x_1(t) + a_{i,2}\,x_2(t) + \dots + a_{i,m}\,x_m(t), \qquad i = 1, \dots, m,$$
where m is the number of genes. Several reasons motivate this choice:
– It takes advantage of the continuous nature of the data.
– It allows for feedback loops.
– Its low number of parameters makes it less likely to overfit the data.
– It is sufficient to model complex interactions between genes.
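To make the model concrete, here is a minimal sketch that integrates the linear system with a forward-Euler scheme in plain Python. The step size, number of steps and the 2-gene coefficients are hypothetical, chosen only so the steady state is easy to check (x1 tends to 2, x2 tends to 3); the source does not say how its trajectories were integrated.

```python
# Sketch (assumption: forward Euler; coefficients below are hypothetical).
# Model: dx_i/dt = a_{i,0} + sum_j a_{i,j} * x_j(t)

def derivative(const, coef, x):
    """Return dx/dt for every gene from constants a_{i,0} and matrix a_{i,j}."""
    return [const[i] + sum(coef[i][j] * x[j] for j in range(len(x)))
            for i in range(len(x))]

def simulate(const, coef, x0, dt, steps):
    """Integrate with forward Euler; returns all states, starting with x0."""
    x = list(x0)
    traj = [x]
    for _ in range(steps):
        dx = derivative(const, coef, x)
        x = [x[i] + dt * dx[i] for i in range(len(x))]
        traj.append(x)
    return traj

# Hypothetical 2-gene example: gene 1 relaxes toward 2.0,
# gene 2 is activated by gene 1 and decays on its own.
const = [1.0, 0.0]
coef = [[-0.5, 0.0],
        [0.3, -0.2]]
traj = simulate(const, coef, x0=[0.0, 0.0], dt=0.1, steps=200)
print(traj[-1])
```

A smaller `dt` (or an adaptive ODE solver) would be the natural choice for stiffer systems; Euler keeps the sketch dependency-free.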

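Small Network Example
$$\begin{aligned}
\frac{dx_1(t)}{dt} &= 0.491 - 0.248\,x_1(t)\\
\frac{dx_2(t)}{dt} &= -0.473\,x_3(t) + 0.374\,x_4(t)\\
\frac{dx_3(t)}{dt} &= -0.427 + 0.376\,x_1(t) - 0.241\,x_3(t)\\
\frac{dx_4(t)}{dt} &= 0.435\,x_1(t) - 0.315\,x_3(t) - 0.437\,x_4(t)
\end{aligned}$$
[Figure: the corresponding signed network over x1, x2, x3 and x4, with 3 activating (+) and 5 repressing (−) edges.]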

In these equations, every coefficient that multiplies an expression level x_j(t) (for example, −0.473 in dx_2(t)/dt) is an interaction coefficient.

Every term that involves no x_j(t) (for example, 0.491 in dx_1(t)/dt and −0.427 in dx_3(t)/dt) is a constant coefficient.

Problem Revisited
Gene   a_i,0   a_i,1   a_i,2   a_i,3   a_i,4
x1      .431   -.248    0       0       0
x2      0       0       0      -.473    .374
x3     -.427    .376    0      -.241    0
x4      0       .435    0      -.315   -.437
Given the time-series data, can we find the interaction coefficients?
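The table can be read directly as a signed network: every non-zero entry a_{i,j} with j ≥ 1 is an edge from x_j to x_i, and the constant column a_{i,0} carries no edge. A minimal Python sketch of that reading, using the values from the table; the dictionary layout and the helper name `signed_edges` are illustrative choices, not from the source.

```python
# Coefficients copied from the table (rows: target genes x1..x4;
# index 0: constant a_{i,0}; indices 1..4: a_{i,j}, effect of x_j on x_i).
table = {
    "x1": [0.431, -0.248, 0, 0, 0],
    "x2": [0, 0, 0, -0.473, 0.374],
    "x3": [-0.427, 0.376, 0, -0.241, 0],
    "x4": [0, 0.435, 0, -0.315, -0.437],
}

def signed_edges(table):
    """Hypothetical helper: (regulator, target, sign) for each non-zero interaction."""
    edges = []
    for target, row in table.items():
        for j, a in enumerate(row[1:], start=1):  # skip the constant column
            if a != 0:
                edges.append((f"x{j}", target, "+" if a > 0 else "-"))
    return edges

edges = signed_edges(table)
nonzero = sum(1 for row in table.values() for a in row if a != 0)
print(len(edges), nonzero)  # 8 edges; 10 non-zero coefficients including constants
```

The 10 non-zero coefficients are the 8 edges plus the 2 non-zero constants.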

Linear Differential Equations
Even under the simplest linear model, there are m(m+1) unknown parameters to estimate for m genes:
– m(m−1) directional effects,
– m self effects,
– m constant effects.
With n time points, the number of data points is mn, and we typically have n << m (few time points). To avoid overfitting, extra constraints must be incorporated into the model, such as:
– smoothness of the equations (D’haeseleer et al. ’99);
– sparseness of the network, i.e. few non-zero interaction coefficients (Yeung et al. ’02, De Hoon et al. ’02).
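The parameter count is simple arithmetic, and a few lines of Python show how quickly the unknowns outgrow the data as m increases while n stays small. The (m, n) pairs below are hypothetical examples, not datasets from the source.

```python
def count_unknowns(m):
    """Unknowns of the linear model for m genes."""
    directional = m * (m - 1)  # effect of gene j on gene i, j != i
    self_effects = m           # a_{i,i}
    constants = m              # a_{i,0}
    return directional + self_effects + constants  # equals m * (m + 1)

def count_data_points(m, n):
    """One measurement per gene per time point."""
    return m * n

# Hypothetical sizes: with n fixed at 10 time points, unknowns soon exceed data.
for m, n in [(4, 10), (10, 10), (100, 10)]:
    print(m, n, count_unknowns(m), count_data_points(m, n))
```

For m = 100 genes and n = 10 time points, 10100 unknowns face only 1000 measurements, which is why sparseness or smoothness constraints are needed.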

Algorithm for Network Inference
To recover the interaction coefficients, we use stepwise multiple linear regression. Why?
– The procedure only adds coefficients that significantly improve the fit of the regression. It therefore limits the number of non-zero coefficients (i.e. it finds sparse networks), a feature we were seeking.
– It is highly flexible and provides p-values that can be interpreted easily.
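The slides do not say how the regression target dx_i/dt is obtained from sampled data. One common option, assumed here, is a forward finite difference between consecutive time points, regressed on an intercept plus all gene levels. A minimal sketch of that setup for one gene; `finite_difference` and `regression_problem` are hypothetical names.

```python
def finite_difference(series, times):
    """Assumption: forward-difference estimates of dx/dt at times[0..n-2]."""
    return [(series[k + 1] - series[k]) / (times[k + 1] - times[k])
            for k in range(len(series) - 1)]

def regression_problem(data, times, i):
    """Build (X, y) for gene i so that y[k] ~ a_{i,0} + sum_j a_{i,j} x_j(t_k).

    data[j][k] is the expression of gene j at times[k]; the leading 1.0
    in each row of X is the intercept column for the constant a_{i,0}.
    """
    y = finite_difference(data[i], times)
    X = [[1.0] + [data[j][k] for j in range(len(data))]
         for k in range(len(times) - 1)]
    return X, y

# Usage: gene 0 rises linearly, gene 1 is flat.
X, y = regression_problem([[0, 1, 2, 3], [5, 5, 5, 5]], [0, 1, 2, 3], 0)
```

Finite differences amplify measurement noise, so smoothing the series first (or fitting the integrated form) may be preferable in practice.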

Partial F Test
The procedure finds the interaction coefficients iteratively for each gene x_i. A partial F test compares the total squared error of the predicted gene trajectory with and without a specific subset of coefficients. A subset is added when the p-value of the test falls below an entry cutoff (its contribution to the fit is significant), and removed when its p-value exceeds a removal cutoff. The procedure iterates until no subset of coefficients is added or removed.
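A minimal sketch of stepwise selection with partial F tests for a single gene, in plain Python and moving one coefficient at a time. To stay dependency-free it compares the F statistic against fixed entry and removal thresholds (`f_in`, `f_out`, hypothetical values) instead of converting it to a p-value; a real implementation would use the F distribution (for example `scipy.stats.f.sf`). The intercept a_{i,0} is always kept, and X holds only the gene levels.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def rss(X, y, cols):
    """Residual sum of squares of the least-squares fit of y on intercept + cols."""
    rows = [[1.0] + [x[c] for c in cols] for x in X]
    p = len(rows[0])
    A = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    rhs = [sum(r[a] * yk for r, yk in zip(rows, y)) for a in range(p)]
    beta = solve(A, rhs)  # normal equations
    return sum((yk - sum(bj * rj for bj, rj in zip(beta, r))) ** 2
               for r, yk in zip(rows, y))

def partial_f(X, y, small, big):
    """Partial F statistic for the extra columns of `big` relative to `small`."""
    rss_s, rss_b = rss(X, y, small), rss(X, y, big)
    df_extra = len(big) - len(small)
    df_resid = len(y) - (len(big) + 1)  # +1 for the intercept
    return ((rss_s - rss_b) / df_extra) / (rss_b / df_resid)

def stepwise(X, y, f_in=10.0, f_out=5.0):
    """Add the best column while F > f_in; drop a column whose F < f_out."""
    selected = []
    changed = True
    while changed:
        changed = False
        remaining = [c for c in range(len(X[0])) if c not in selected]
        if remaining:
            best = max(remaining, key=lambda c: partial_f(X, y, selected, selected + [c]))
            if partial_f(X, y, selected, selected + [best]) > f_in:
                selected.append(best)
                changed = True
        for c in list(selected):
            rest = [s for s in selected if s != c]
            if partial_f(X, y, rest, selected) < f_out:
                selected.remove(c)
                changed = True
                break
    return sorted(selected)
```

Choosing `f_in` larger than `f_out` keeps a freshly added coefficient from being removed immediately, which helps the loop terminate.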

Simulations
It is difficult to find coefficients that produce realistic gene trajectories. We select coefficients such that the resulting trajectories satisfy 3 conditions:
– they are bounded;
– the correlation between any pair of trajectories is not too high;
– they are not too stable.
We added Gaussian noise to model errors.
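The three acceptance conditions and the noise model can be sketched as simple checks on sampled trajectories. The thresholds (`bound`, `max_corr`, `min_range`) and the reading of "not too stable" as "still varies in the second half of the series" are assumptions, since the slides give no exact criteria.

```python
import math
import random

def pearson(u, v):
    """Pearson correlation; a constant series is treated as fully correlated."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 1.0

def acceptable(trajs, bound=10.0, max_corr=0.9, min_range=0.1):
    """trajs[i][k]: level of gene i at time point k. Thresholds are hypothetical."""
    # 1. Bounded: no value leaves [-bound, bound].
    if any(abs(v) > bound for t in trajs for v in t):
        return False
    # 2. No pair of trajectories is too highly correlated.
    for i in range(len(trajs)):
        for j in range(i + 1, len(trajs)):
            if abs(pearson(trajs[i], trajs[j])) > max_corr:
                return False
    # 3. Not too stable (assumed meaning): each series still moves in its second half.
    for t in trajs:
        tail = t[len(t) // 2:]
        if max(tail) - min(tail) < min_range:
            return False
    return True

def add_noise(trajs, sigma, seed=0):
    """Add i.i.d. Gaussian noise with standard deviation sigma to every value."""
    rng = random.Random(seed)
    return [[v + rng.gauss(0.0, sigma) for v in t] for t in trajs]
```

Using a seeded `random.Random` keeps each noisy replicate reproducible.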

Gaussian Noise

Network Inference
[Figure: simulated data from the 4-gene example (coefficients as in Problem Revisited) are passed to the regression procedure, which outputs the signed network over x1, x2, x3 and x4.]
The procedure perfectly recovers this network with 4 genes and 10 non-zero coefficients.

10 Genes
The procedure also perfectly recovers this network with 10 genes and 22 interaction coefficients.

Multiple Networks
[Figure: related gene networks for Species A, Species B and Species C.]

Types of Problems
Multiple networks related by a graph or a tree can arise in various situations:
– different species;
– different developmental stages;
– different tissues.
The goal is now not only to maximize the fit (with as few interactions as possible) but also to minimize an evolutionary cost on the graph of the networks.

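Evolutionary Cost
[Figure: a graph of networks whose vertices are labeled with sets of predicted regulators ({1, 2}, {1, 2, 3}, {1}, {1, 3}, {1, 2, 3}); changes between adjacent vertices are marked as evolutionary events. Evolutionary cost = 3.]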
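One reading of the figure, assumed here rather than stated in the slides, is that each regulator gained or lost along an edge counts as one evolutionary event, so the cost of an edge is the size of the symmetric difference of the two regulator sets. The tree topology below is hypothetical: it is simply one arrangement of the figure's five labels that yields the stated total of 3.

```python
def evolutionary_cost(edges, regulators):
    """Assumed cost: number of regulators gained or lost, summed over edges."""
    return sum(len(regulators[u] ^ regulators[v]) for u, v in edges)

# Hypothetical tree using the figure's labels (vertex names are made up).
regulators = {
    "root": {1, 2},
    "anc": {1, 2, 3},
    "A": {1},
    "B": {1, 3},
    "C": {1, 2, 3},
}
edges = [("root", "A"), ("root", "anc"), ("anc", "B"), ("anc", "C")]
print(evolutionary_cost(edges, regulators))  # 3
```

Here root to A loses regulator 2, root to anc gains 3, anc to B loses 2, and anc to C has no change.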

Multiple Network Inference
The stepwise regression algorithm is modified to add or remove subsets of regulators directly on the edges of the graph. Partial F tests are computed on the vertices affected by such a change to evaluate the change in fit. The resulting p-values are then adjusted according to the change in evolutionary cost, and finally combined into a scoring function using a Kolmogorov-Smirnov test. The algorithm iteratively adds or removes the best-scoring move as long as its score is above (for additions) or below (for removals) a given threshold.
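The slides do not detail how the Kolmogorov-Smirnov test turns several p-values into one score. A plausible ingredient, assumed here, is the one-sample KS statistic D of the p-values against Uniform(0, 1): under the null hypothesis p-values are uniform, so a large D with p-values bunched near 0 signals a move that improves the fit at several vertices. A minimal sketch of that statistic only; the cost adjustment and thresholding are left out.

```python
def ks_uniform(pvalues):
    """One-sample Kolmogorov-Smirnov statistic D of p-values vs Uniform(0, 1)."""
    p = sorted(pvalues)
    n = len(p)
    # Largest gap above and below the uniform CDF F(x) = x.
    d_plus = max((i + 1) / n - x for i, x in enumerate(p))
    d_minus = max(x - i / n for i, x in enumerate(p))
    return max(d_plus, d_minus)

print(ks_uniform([0.01, 0.02, 0.03]))  # about 0.97: strong joint evidence
print(ks_uniform([0.25, 0.5, 0.75]))   # about 0.25: consistent with uniform
```

SciPy's `scipy.stats.kstest(pvalues, "uniform")` computes the same statistic and also returns a p-value, if an external dependency is acceptable.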

Simulation Example

Simulation Results

Conclusions
– The comparative framework actually simplifies the inference process, especially for instances of the problem with more genes, more noise or fewer time points.
– The procedure could also be used to revise gene networks.
– Different evolutionary models could be explored.
– The procedure still needs to be tried on real data.
