Time Series Gene Expression Prediction using Neural Networks with Hidden Layers
Michael R. Smith, Mark Clement, Tony Martinez, and Quinn Snell
Brigham Young University, Department of Computer Science
October 2010
Modeling Problem
Previous Modeling Work
  DNA microarray technology provides an effective and efficient way to measure gene expression
  Model the gene regulatory network
    Boolean networks
    Bayesian networks (dynamic BN)
    Electrical circuit analysis
    Differential equations
    Neural networks
  Constrained to be interpretable
Common NN Implementation
  Each node represents a gene
  Weights represent the effect of one gene on another
    Positive (activation)
    Negative (inhibition)
    Zero (no influence)
  Perceptron model
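The weight convention on this slide can be illustrated with a minimal sketch. It assumes a sigmoid node function and a discrete time step, neither of which the slide specifies, and the 3-gene network, its weights, and the name `perceptron_step` are hypothetical.

```python
import math


def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def perceptron_step(expression, weights, biases):
    """Predict each gene's next expression level from the current levels.

    weights[i][j] is the effect of gene j on gene i:
    positive = activation, negative = inhibition, zero = no influence.
    """
    return [
        sigmoid(sum(w_ij * x_j for w_ij, x_j in zip(row, expression)) + b_i)
        for row, b_i in zip(weights, biases)
    ]


# Hypothetical 3-gene network (illustrative values only).
weights = [
    [0.0, 2.0, 0.0],   # gene 0 is activated by gene 1
    [-2.0, 0.0, 0.0],  # gene 1 is inhibited by gene 0
    [0.0, 0.0, 0.0],   # gene 2 is influenced by no gene
]
biases = [0.0, 0.0, 0.0]

current = [0.9, 0.1, 0.5]
next_levels = perceptron_step(current, weights, biases)
```

Because there is no hidden layer, each weight can be read directly as a regulatory edge, which is the interpretability the slides refer to.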
NN Model Changes
  Training a recurrent neural network is difficult
    Backpropagation through time
    Genetic algorithms
  Modified the node's function
    Fuzzy logic
  Still a perceptron model
Challenges with Modeling a GRN
  Fundamental Issues
    Data are scarce, noisy, and high dimensional
    No definitive truth
    Models are constrained to be interpretable
  Perceptron Issues
    Chosen because it is interpretable
    Does not take higher order correlations into account
    Exclusive OR (XOR) problem
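The XOR problem named above can be made concrete with a small sketch. It uses hard-threshold units with hand-picked weights as an illustration, not the networks used in this work. A search over a coarse weight grid finds no single unit that reproduces XOR, while a network with one hidden layer does.

```python
from itertools import product


def step(z):
    """Hard-threshold activation: fire only when the weighted sum is positive."""
    return 1 if z > 0 else 0


XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def single_unit(x, w1, w2, b):
    """A perceptron with no hidden layer."""
    return step(w1 * x[0] + w2 * x[1] + b)


def two_layer(x):
    """One hidden layer with hand-picked weights: XOR = OR and not AND."""
    h_or = step(x[0] + x[1] - 0.5)
    h_and = step(x[0] + x[1] - 1.5)
    return step(h_or - h_and - 0.5)


# Search a coarse grid of weights and biases from -4.0 to 4.0 in steps of 0.5.
grid = [i / 2 for i in range(-8, 9)]
single_solutions = [
    (w1, w2, b)
    for w1, w2, b in product(grid, repeat=3)
    if all(single_unit(x, w1, w2, b) == y for x, y in XOR.items())
]

mlp_solves_xor = all(two_layer(x) == y for x, y in XOR.items())
```

The grid search is only a demonstration. The general result is that XOR is not linearly separable, so no choice of weights for a single unit can work.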
Revised Problem: Prediction
Significance of Prediction
  Determine the goodness of the model
  With a “good” model
    Use the model to infer the genetic regulatory network
    Generate additional data points for use in a simpler model
    Do experiments in silico rather than in vitro
Solution
  Data scarcity
    Create more data by combining data points
  Examine using multi-layer perceptrons (MLPs: NNs with hidden layers) for predicting gene expression levels
    MLPs are capable of modeling higher order correlations
Data Combination
Neural Network Models
  Perceptron: NN without a hidden layer
  Multi-Layer Perceptron: NN with a hidden layer
  Recurrent Neural Network
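As a rough sketch of how the hidden-layer model differs from the perceptron, the code below inserts one hidden layer between the input and output expression levels. It then rolls the prediction forward in time by feeding each output back in as the next input. This is an iterated feed-forward prediction, not a trained recurrent network. The activation functions, the 2-gene and 3-hidden-unit sizes, all weight values, and the names `mlp_predict` and `rollout` are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases, activation):
    """Apply one fully connected layer: activation(W @ inputs + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def mlp_predict(expression, params):
    """Predict next expression levels through one hidden layer."""
    hidden = layer(expression, params["w_hidden"], params["b_hidden"], math.tanh)
    return layer(hidden, params["w_out"], params["b_out"], sigmoid)


def rollout(initial, params, steps):
    """Feed each prediction back in as the next input."""
    trajectory = [initial]
    for _ in range(steps):
        trajectory.append(mlp_predict(trajectory[-1], params))
    return trajectory


# Hypothetical, untrained parameters: 2 genes, 3 hidden units.
params = {
    "w_hidden": [[1.5, -1.0], [-0.5, 2.0], [1.0, 1.0]],
    "b_hidden": [0.0, -0.5, -1.0],
    "w_out": [[1.0, -2.0, 0.5], [-1.0, 1.0, 1.5]],
    "b_out": [0.0, 0.0],
}

trajectory = rollout([0.2, 0.8], params, steps=5)
```

In practice the parameters would be fit to measured time series, for example by backpropagation. Unlike the perceptron, the hidden-layer weights no longer map one-to-one onto gene-to-gene edges.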
DREAM Results
SOS Results
Conclusions
  MLPs (NNs with hidden layers) are better able to model GRNs than NNs without hidden layers
    Shows that higher order correlations DO exist in modeling GRNs
    Could be beneficial in generating synthetic data
  Data combination for training produces smoother gene expression predictions
    Noise filtering
    Similar to Elman nets and BPTT
QUESTIONS?