Predicting Gene Expression from Sequence. Michael A. Beer and Saeed Tavazoie. Cell 117, 185-198 (16 April 2004)

The Authors: Saeed Tavazoie, Professor, Dept. of Molecular Biology. Mike Beer, Postdoctoral Researcher; Ph.D., Princeton (1995).

The Question: Transcription factor (TF) binding sites are relatively well characterized in Saccharomyces cerevisiae. However, the presence of a TF binding site alone is not sufficient to predict the expression of a gene, because multiple regulatory factors are often involved. How do you identify the elaborate rules for gene regulation?

Simple regulatory structures: Each possible combination of TFs must be tested in the lab, which is a hugely time-consuming task.

Problems with predicting gene regulation: Numerous transcription factors can bind to any one motif. Regulatory motif sequences have low consensus; e.g., the well-known "TATA box" has a consensus of TATA(A/T)A(A/T)(A/G). Many genes have multiple known motifs upstream of the ATG start codon.
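To make the "low consensus" point concrete, here is a minimal Python sketch that turns the TATA box consensus quoted above into a regular expression and scans a sequence for it. The helper names and the example sequence are made up for illustration; nothing here comes from the paper itself.

```python
import re

def consensus_to_regex(consensus: str) -> str:
    """Rewrite slide-style alternatives such as (A/T) as regex classes such as [AT]."""
    return re.sub(
        r"\(([ACGT](?:/[ACGT])*)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        consensus,
    )

def find_sites(consensus: str, sequence: str) -> list:
    """Return 0-based start positions of all matches, including overlapping ones."""
    # The lookahead lets the scan report matches that overlap each other.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]

TATA = "TATA(A/T)A(A/T)(A/G)"
# Hypothetical upstream sequence, invented for this example.
upstream = "GGCTATAAAAGGCCTATATATACC"

print(consensus_to_regex(TATA))   # TATA[AT]A[AT][AG]
print(find_sites(TATA, upstream)) # [3, 14]
```

Two different 8-mers (TATAAAAG and TATATATA) satisfy the same consensus here, which is why a consensus match alone says little about which factor binds.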

Example of cis-regulatory logic: from Yuh et al. (1998), Science 279,

The Approach: 1. Using microarray expression data, the authors grouped genes with similar expression patterns into clusters. Illustration from brain expression data in Wen et al. (1998), PNAS 95,
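As a rough illustration of step 1, the sketch below groups genes whose expression profiles are highly correlated. The slides do not say which clustering algorithm the authors used, so the greedy seed-based grouping, the 0.9 threshold, and the toy profiles are all assumptions chosen only to keep the example short.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def greedy_clusters(profiles, threshold=0.9):
    """Assign each gene to the first cluster whose seed gene it correlates with.

    This is a deliberately simple stand-in for a real clustering method.
    """
    clusters = []  # each cluster is a list of gene names; element 0 is the seed
    for gene, profile in profiles.items():
        for cluster in clusters:
            if pearson(profile, profiles[cluster[0]]) >= threshold:
                cluster.append(gene)
                break
        else:
            clusters.append([gene])
    return clusters

# Hypothetical expression values across four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 4.0, 6.2, 7.9],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.1, 3.9, 2.0],
}
print(greedy_clusters(profiles))
```

Correlation ignores absolute scale, so geneB groups with geneA even though its values are roughly twice as large; only the shape of the profile matters.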

The Approach, cont.: 2. Within each cluster of genes with similar expression patterns, the authors searched for consensus sequence motifs in the 800 bp upstream of the ATG.
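A toy version of step 2 might look like the following: take the 800 bp upstream of each ATG and list the k-mers shared by the genes in a cluster. Real motif finders compare against a background model and search both strands; this sketch does neither, and the function names, k = 6, and the sequences are illustrative assumptions.

```python
from collections import Counter

def upstream_region(chromosome: str, atg_index: int, length: int = 800) -> str:
    """Return up to `length` bases immediately upstream of the ATG at atg_index."""
    return chromosome[max(0, atg_index - length):atg_index]

def shared_kmers(regions, k=6, min_fraction=1.0):
    """Return the k-mers present in at least min_fraction of the regions."""
    counts = Counter()
    for region in regions:
        # A set, so each k-mer is counted at most once per region.
        counts.update({region[i:i + k] for i in range(len(region) - k + 1)})
    needed = min_fraction * len(regions)
    return sorted(kmer for kmer, c in counts.items() if c >= needed)

# Hypothetical upstream regions for three genes in one expression cluster.
regions = ["AAGGCGCGATT", "TTGCGCGAACC", "CCGCGCGATAG"]
print(shared_kmers(regions))                      # ['GCGCGA']
print(upstream_region("ACGTACGTATGCC", 8, 5))     # 'TACGT'
```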

The Approach, cont.:
3. The authors built a Bayesian network using the TF sequence motifs as parent nodes and the expression data as the observed values.
4. The network can be applied to a gene of interest by identifying the TF motifs upstream of that gene and finding the model(s) that best fit those motifs.
5. If the gene's expression data fall within the parameters predicted by the model, there is a decent chance that the associated gene regulatory structure can be verified experimentally.
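The core idea of steps 3 and 4, motifs as parent nodes predicting an expression pattern, can be sketched as a single conditional probability table estimated from counts. This is a heavily simplified assumption, not the authors' learning procedure: the motif names M1 and M2 and the training data are hypothetical.

```python
from collections import defaultdict

def learn_cpt(genes):
    """Estimate P(gene is in the expression cluster | set of motifs present).

    genes: iterable of (frozenset of motif names, bool in_cluster) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # motif set -> [in-cluster count, total]
    for motifs, in_cluster in genes:
        counts[motifs][0] += int(in_cluster)
        counts[motifs][1] += 1
    return {motifs: hits / total for motifs, (hits, total) in counts.items()}

def predict(cpt, motifs, default=0.0):
    """Look up the probability for a gene's motif set; unseen sets get `default`."""
    return cpt.get(frozenset(motifs), default)

# Hypothetical training genes: M1 together with M2 usually implies cluster membership.
training = [
    (frozenset({"M1", "M2"}), True),
    (frozenset({"M1", "M2"}), True),
    (frozenset({"M1", "M2"}), True),
    (frozenset({"M1", "M2"}), False),
    (frozenset({"M1"}), False),
    (frozenset({"M1"}), False),
]
cpt = learn_cpt(training)
print(predict(cpt, {"M1", "M2"}))  # 0.75
print(predict(cpt, {"M1"}))        # 0.0
```

In this toy table M1 alone predicts nothing, while M1 with M2 predicts membership three times out of four, which is the kind of combinatorial rule a single-motif analysis would miss.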

Two examples from yeast: Both clusters contain at least 10 genes, and there is some confidence that genes with the same upstream TF binding sites will exhibit the same expression pattern as these clusters.

Constructing the models: Using expression data from 30 microarrays, the authors identified 5547 genes with "significant" expression levels in yeast, and these data were used to construct 49 models of expression patterns.

Predictive accuracy: These 49 models were applied to five test sets of expression data, using only the upstream 800 bp region as input. The expression pattern was correctly predicted for 1898 of the 2587 genes in the test set(s). This amounts to 73% accuracy (random assignment would give 1/49, or 2%).
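The percentages on this slide can be checked with a few lines of Python. The fold-improvement figure is simply derived from the two numbers on the slide and is not a value reported there.

```python
correct, total, n_patterns = 1898, 2587, 49

accuracy = correct / total   # fraction of genes assigned the right pattern
chance = 1 / n_patterns      # expected accuracy of a uniform random guess
fold = accuracy / chance     # how many times better than chance

print(f"accuracy = {accuracy:.1%}")      # accuracy = 73.4%
print(f"chance   = {chance:.1%}")        # chance   = 2.0%
print(f"fold over chance = {fold:.0f}x") # fold over chance = 36x
```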

Application to C. elegans: Given the larger amount of regulatory sequence in higher organisms, and the potential for more complex regulation, the authors had low expectations for applying this model to C. elegans. Using 2000 bp of upstream sequence and microarray expression data including Hill et al. (2000), they were surprised to find that they could predict expression patterns for roughly half of the genes in the C. elegans dataset.

An example from C. elegans

Is it really so simple? Gene regulation involves a complex combinatorial dance of numerous factors aside from the presence or absence of TF binding sites. The authors deliberately limited their scope to cis-acting upstream factors, ignoring regulatory elements in introns or downstream regions, as well as the effects of operons, alternative splicing, histone modifications, methylation, et cetera.

Model constraints: Several pieces of information were found to significantly improve the predictive accuracy of the models:
A. Motif orientation
B. Distance from the start codon
C. The particular order of the various TF binding sites
D. The presence of multiple copies of the same TF binding site
All of these factors were included in the model as priors.
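The four constraints above can be read off a sequence with a short scan. The following is a minimal sketch, assuming the upstream string ends immediately before the ATG; the function names, the example motif, and the sequence are hypothetical, and how the paper actually encodes these features is not shown in the slides.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def motif_features(upstream: str, motif: str) -> dict:
    """Collect orientation, distance to ATG, and copy number for one motif.

    Assumes `upstream` ends immediately before the ATG start codon.
    """
    patterns = [("+", motif)]
    rc = reverse_complement(motif)
    if rc != motif:  # avoid double-counting palindromic motifs
        patterns.append(("-", rc))

    hits = []
    for strand, pattern in patterns:
        start = upstream.find(pattern)
        while start != -1:
            distance = len(upstream) - start  # bases from motif start to ATG
            hits.append((distance, strand))
            start = upstream.find(pattern, start + 1)

    hits.sort(reverse=True)  # farthest from ATG first, i.e. 5' to 3' order
    return {"copies": len(hits), "hits": hits}

# Hypothetical 18 bp upstream region with one copy on each strand.
upstream = "AACTCATCAAGATGAGAA"
print(motif_features(upstream, "GATGAG"))
# {'copies': 2, 'hits': [(16, '-'), (8, '+')]}
```

Sorting the hits by distance gives the 5' to 3' order of sites, so running the same function for several motifs and merging the hit lists would recover constraint C as well.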

Why is distance from the start codon significant? From Harbison et al. (2004), Nature 431,

The number of copies of a TF binding site is relevant. From Molecular Biology of the Cell, 4th edition.

Motif combinatorics and predictive accuracy: The order of the TF binding sites is significant. Combinatorial models are more accurate than single-TF models (unless a gene is under the control of only one TF).

Future directions: Because the models are so sensitive, even a very small amount of ambiguity can yield junk results. For this reason, SAGE data are not particularly suitable: only unique SAGE tags can be considered unambiguous, which excludes a great deal of potentially useful data. However, we could use the microarray-based predictions to choose gene regulatory structures to investigate.