Learning Regulatory Networks that Represent Regulator States and Roles
Keith Noto and Mark Craven


K. Noto and M. Craven, Learning Regulatory Network Models that Represent Regulator States and Roles. To appear in Lecture Notes in Bioinformatics.

Task
Given:
– gene expression data
– other sources of data, e.g. sequence data, transcription factor binding sites, transcription unit predictions
Do:
– construct a model that captures the regulatory interactions in a cell

Key Ideas: States and Roles
[Figure: diagram relating Effector, Cellular Condition, Regulator Expression, Regulator State, and Regulatee Expression]
– Regulator states
  – cannot be observed
  – depend on more than regulator expression
  – we use cellular conditions as surrogates for, or predictors of, regulation effectors
– Regulator roles
  – is a regulator an activator or a repressor?
  – we use sequence analysis to predict these roles

Network Variables and Structure
– Hidden regulator states: “activated” or “inactivated”
– Cellular conditions: “stationary growth phase”, “heat shock”, ...
– Regulatees: expression states represented as a mixture of Gaussians
– Regulators: expression states represented as a mixture of Gaussians
– Structure: connect nodes where we have evidence of regulation; select relevant parents
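To make the mixture-of-Gaussians representation concrete, here is a minimal Python sketch of how one measured expression level could be mapped to a probability distribution over discrete expression states. It assumes two states named Low and High; the `METJ_EXPRESSION` weights, means, and standard deviations are hypothetical values chosen for illustration, not parameters fitted in the study.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def state_posteriors(x, components):
    """Posterior P(state | x) for each mixture component.

    components maps a state name to (mixing weight, mean, standard deviation).
    """
    joint = {s: w * normal_pdf(x, m, sd) for s, (w, m, sd) in components.items()}
    total = sum(joint.values())
    return {s: p / total for s, p in joint.items()}


# Hypothetical two-state mixture for one gene's (log-scale) expression values.
METJ_EXPRESSION = {
    "Low": (0.5, 6.0, 0.8),
    "High": (0.5, 9.0, 0.8),
}

post = state_posteriors(6.2, METJ_EXPRESSION)  # mostly "Low"
```

With equal weights and standard deviations, a value exactly halfway between the two means gets probability 0.5 for each state; in a real model the mixture parameters would be estimated from data.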

Network Parameters: Hidden Nodes Use CPD-Trees
– Parents are selected from regulator expression and cellular conditions
– Trees may contain context-specific independence
[Figure: example CPD-tree for metJ state. Candidate parents include Growth Medium, Heat Shock, Growth Phase, and metJ expression; internal nodes test Growth Phase (= Log Phase vs. ≠ Log Phase) and metJ expression (= Low vs. ≠ Low); leaves give P(metJ state = activated), with values including 0.994 and 0.004]
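A CPD-tree is evaluated by walking from the root to a leaf. The sketch below is a hypothetical Python illustration, not the authors' implementation: the `Split` class, the variable names `growth_phase` and `metJ_expression`, and the leaf probabilities are all invented for the example. It shows context-specific independence: once the tree finds growth phase = log, the metJ expression value is never consulted.

```python
from dataclasses import dataclass
from typing import Mapping, Union


@dataclass(frozen=True)
class Split:
    """Internal node: branch on whether `variable` equals `value`."""
    variable: str
    value: str
    if_equal: "Node"
    otherwise: "Node"


# A node is either a Split or a float leaf holding P(state = activated).
Node = Union[Split, float]


def p_activated(tree: Node, assignment: Mapping[str, str]) -> float:
    """Walk from the root to a leaf; only variables on that path are read."""
    node = tree
    while isinstance(node, Split):
        branch = assignment[node.variable] == node.value
        node = node.if_equal if branch else node.otherwise
    return node


# Hypothetical tree and probabilities, for illustration only.
METJ_TREE = Split(
    "growth_phase", "log",
    if_equal=0.05,
    otherwise=Split("metJ_expression", "Low", if_equal=0.30, otherwise=0.90),
)

# In log phase the metJ expression value is irrelevant (and may be absent).
p = p_activated(METJ_TREE, {"growth_phase": "log"})
```

P(state = inactivated) is simply 1 minus the leaf value. Compared with a full table, the tree needs only one parameter per leaf rather than one per joint parent assignment.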

Initializing Roles
[Figure: the metA transcription unit on the DNA, showing the predicted transcription start site*, the -35 region, the upstream and downstream directions, and the binding sites of metR and metJ]
– metR binds upstream; it is considered an activator
– metJ binds downstream; it is considered a repressor
– CPT for regulatee metA: one row for each joint value of metR state and metJ state (activated/activated, activated/inactivated, inactivated/activated, inactivated/inactivated), with columns P(Low) and P(High)
*Predicted transcription start sites from Bockhorst et al., ISMB '03
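The role-based initialization can be sketched in Python as two steps: call each regulator an activator or a repressor from where its binding site lies relative to the transcription start site, then seed the regulatee's CPT so that activated activators favor High and activated repressors favor Low. The `-35` boundary, the site positions for metR and metJ, and the 0.8/0.2 starting probabilities are assumptions made for illustration; the slides do not give these values.

```python
from itertools import product


def predict_role(site_center: int, boundary: int = -35) -> str:
    """Hypothetical rule: a site centered upstream of `boundary` (bp relative
    to the transcription start site) is called an activator, else a repressor."""
    return "activator" if site_center < boundary else "repressor"


def initial_cpt(roles, p_high_up=0.8, p_high_down=0.2):
    """Role-based initial CPT for one regulatee.

    roles maps regulator name -> "activator" or "repressor".
    Returns (parent_names, cpt) where cpt maps a tuple of parent states
    (in parent_names order) to (P(Low), P(High)).
    """
    names = sorted(roles)
    cpt = {}
    for states in product(("activated", "inactivated"), repeat=len(names)):
        # +1 for each activated activator, -1 for each activated repressor.
        score = sum(
            (1 if roles[n] == "activator" else -1)
            for n, s in zip(names, states)
            if s == "activated"
        )
        if score > 0:
            p_high = p_high_up
        elif score < 0:
            p_high = p_high_down
        else:
            p_high = 0.5
        cpt[states] = (1.0 - p_high, p_high)
    return names, cpt


# Hypothetical site positions for the metA example.
roles = {"metR": predict_role(-60), "metJ": predict_role(10)}
names, cpt = initial_cpt(roles)
```

These values are only a starting point; EM then re-estimates every CPT entry from the data.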

Training the Model
– Initialize the parameters
  – activators tend to bind farther upstream than repressors
– Use an EM algorithm to set the parameters
  – E-step: determine the expected states of the regulators
  – M-step: update the CPDs
– Repeat until convergence
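The E-step/M-step loop can be illustrated on a deliberately simplified model: one hidden regulator state per experiment with a single prior (rather than a CPD-tree over cellular conditions), and binary regulatee expression (1 = High) that is conditionally independent given that state. This is a generic latent-class EM written for illustration, not the authors' algorithm; the Laplace pseudocounts and the 0.7/0.3 role-based starting values are assumptions.

```python
import math


def em_hidden_regulator(data, roles, max_iters=200, tol=1e-9):
    """EM for a single hidden binary regulator state.

    data:  list of experiments, each a list of 0/1 regulatee states (1 = High).
    roles: "activator" or "repressor" for each regulatee (initialization only).
    Returns (prior, p_high_if_active, p_high_if_inactive, loglik).
    """
    n, k = len(data), len(roles)
    prior = 0.5  # P(regulator state = activated)
    # Role-based initialization: activators favor High when the regulator is activated.
    p_act = [0.7 if r == "activator" else 0.3 for r in roles]
    p_inact = [1.0 - p for p in p_act]
    prev = -math.inf
    for _ in range(max_iters):
        # E-step: posterior P(activated | experiment) under the current parameters.
        post, loglik = [], 0.0
        for x in data:
            la = prior * math.prod(p if v else 1.0 - p for p, v in zip(p_act, x))
            li = (1.0 - prior) * math.prod(p if v else 1.0 - p for p, v in zip(p_inact, x))
            post.append(la / (la + li))
            loglik += math.log(la + li)
        # M-step: re-estimate from expected counts (pseudocounts keep values off 0 and 1).
        total = sum(post)
        prior = total / n
        p_act = [(sum(q * x[i] for q, x in zip(post, data)) + 1.0) / (total + 2.0)
                 for i in range(k)]
        p_inact = [(sum((1.0 - q) * x[i] for q, x in zip(post, data)) + 1.0) / (n - total + 2.0)
                   for i in range(k)]
        # Stop when the data log-likelihood from the E-step stops changing.
        if abs(loglik - prev) < tol:
            break
        prev = loglik
    return prior, p_act, p_inact, loglik


# Toy data: two activators High / two repressors Low, or the reverse.
data = [[1, 1, 0, 0]] * 10 + [[0, 0, 1, 1]] * 10
prior, p_act, p_inact, ll = em_hidden_regulator(
    data, ["activator", "activator", "repressor", "repressor"])
```

The role-based start matters: it decides which of the two symmetric solutions the hidden state is labeled "activated" in, so the learned state keeps its biological meaning.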

Experimental Data and Procedure
– Expression measurements from Affymetrix microarrays (Fred Blattner's lab, University of Wisconsin-Madison)
– Regulator binding-site predictions from TRANSFAC, EcoCyc, and cross-species comparison (McCue et al., Genome Research 12, 2002)
– The experimental data consist of:
  – 90 experiments
  – 6 cellular-condition variables (each taking between two and seven values)
  – 296 regulatees
  – 64 regulators
– Cross-validation
  – microarrays are held aside for testing
  – conditions from the test microarrays do not appear in the training set
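Holding out whole conditions is a grouped form of cross-validation. A minimal Python sketch, assuming each microarray carries one hashable condition label (in practice this could be the tuple of its cellular-condition values); the array IDs, labels, and the greedy fold-balancing rule are hypothetical choices for illustration.

```python
from collections import defaultdict


def condition_held_out_splits(arrays, n_folds):
    """Cross-validation splits in which every condition is confined to one fold.

    arrays: list of (array_id, condition) pairs; `condition` is any hashable label.
    Returns a list of (train_ids, test_ids), one per fold.
    """
    by_condition = defaultdict(list)
    for array_id, condition in arrays:
        by_condition[condition].append(array_id)
    # Greedily place whole condition groups, largest first, into the smallest fold.
    folds = [[] for _ in range(n_folds)]
    for condition in sorted(by_condition, key=lambda c: -len(by_condition[c])):
        target = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[target].extend(by_condition[condition])
    splits = []
    for fold in folds:
        test = set(fold)
        train = [a for a, _ in arrays if a not in test]
        splits.append((train, sorted(test)))
    return splits


# Hypothetical arrays labelled by a single condition value.
ARRAYS = [("a1", "heat shock"), ("a2", "heat shock"), ("a3", "log phase"),
          ("a4", "stationary phase"), ("a5", "log phase"), ("a6", "anaerobic")]
splits = condition_held_out_splits(ARRAYS, n_folds=3)
```

Because a test condition never appears during training, the model must predict regulator states for conditions it has not seen, which is a harder test than holding out random arrays.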

| Model | Log likelihood | Average squared error | Classification error |
|---|---|---|---|
| Our Model (3 iterations of adding missing TFs) | -12,… | … | …% |
| Baseline #2 (no hidden nodes, using cellular conditions) | -12,… | … | …% |
| Baseline #1 (no hidden nodes, no cellular conditions) | -13,… | … | …% |
| Random Initialization (3 iterations of adding missing TFs) | -11,… | … | …% |
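The three evaluation columns can be computed from a model's predicted distributions over the discrete expression states of held-out measurements. The slides do not define the metrics precisely, so this Python sketch assumes: log likelihood is the summed log probability of the observed states, average squared error is the mean Brier-style squared error over states, and classification error is the fraction of measurements whose most probable predicted state is wrong. The example predictions are made up.

```python
import math


def evaluate_held_out(predictions, truth):
    """Score predicted state distributions against observed states.

    predictions: list of dicts mapping state -> predicted probability.
    truth:       list of observed states, aligned with predictions.
    Returns (log likelihood, average squared error, classification error).
    """
    pairs = list(zip(predictions, truth))
    loglik = sum(math.log(p[t]) for p, t in pairs)
    # Squared distance between the predicted distribution and the one-hot truth.
    sq_err = sum(
        sum((prob - (1.0 if s == t else 0.0)) ** 2 for s, prob in p.items())
        for p, t in pairs
    ) / len(pairs)
    wrong = sum(max(p, key=p.get) != t for p, t in pairs)
    return loglik, sq_err, wrong / len(pairs)


# Made-up predictions for two held-out measurements whose true state is Low.
preds = [{"Low": 0.9, "High": 0.1}, {"Low": 0.4, "High": 0.6}]
ll, ase, err = evaluate_held_out(preds, ["Low", "Low"])
```

Under these definitions a higher (less negative) log likelihood and lower squared and classification errors are better.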