Combined analysis of ChIP-chip data and sequence data Harbison et al. CS 466 Saurabh Sinha



Outline
Transcription factors interpret the regulatory information encoded in DNA to induce or repress gene expression
Comparative genomics has been used to find the regulatory sites in the yeast genome
Looking at sequence alone does not reveal whether a putative site actually functions as a binding site
ChIP-chip data (also called “location data”) provides such information
Harbison et al. combine these two types of data

Source: Chip-on-chip

Data
Genome-wide “location analysis” using ChIP-on-chip
Each experiment done with one TF
203 TFs tested in “rich media conditions”
84 of these TFs also tested in at least one other condition
Why?
  – Binding is not just a function of the presence of the site; it is also a function of the presence of the TF
  – A TF may not be present in every condition

Data
How were the 84 TFs (to be tested in additional conditions) chosen?
A TF was chosen if there was prior evidence that it plays a role in that additional condition

ChIP-on-chip results
11,000 unique interactions between TFs and promoter regions identified
A matrix of size (m x n), where m is the number of TFs (203) and n is the number of yeast genes (~6000)
11,000 of the entries were “1”, meaning the binding was significant
  – The measured binding needs post-processing to assess whether it is statistically significant
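The post-processing step above can be sketched as thresholding a table of binding p-values into the 0/1 matrix. This is only an illustration: the toy p-values, the TF and gene names, and the cutoff `P_THRESHOLD = 0.001` are assumptions, not values given in these slides.

```python
# Hypothetical cutoff for calling a TF-promoter interaction significant.
P_THRESHOLD = 0.001


def binarize_binding(p_values, threshold=P_THRESHOLD):
    """Return a 0/1 matrix: 1 where the binding p-value passes the cutoff."""
    return [[1 if p <= threshold else 0 for p in row] for row in p_values]


def count_interactions(binding):
    """Count the entries equal to 1, i.e. the significant interactions."""
    return sum(sum(row) for row in binding)


# Toy data: rows are TFs, columns are genes (the real matrix is 203 x ~6000).
tfs = ["TF_A", "TF_B"]
genes = ["gene1", "gene2", "gene3"]
p_values = [
    [0.0002, 0.2, 0.0009],
    [0.5, 0.00005, 0.03],
]

binding = binarize_binding(p_values)
n_interactions = count_interactions(binding)
print(binding, n_interactions)
```

In the real data the count of ones in this matrix would correspond to the roughly 11,000 interactions reported on the slide.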

The next step: bring in the sequence
Genome-wide “location data” or “binding data” combined with sequence data
For each TF, collect all sequences bound by it
  – These are promoter-length sequences, not exact binding sites
Apply motif-finding programs to estimate what the binding motif is (and where the binding sites are)

Motif finding
Only consider TFs that bound >= 10 sequences
  – 147 such TFs
Run 6 different motif finders on the bound sequences
Many motifs discovered!
A large number of these motifs are “variants” of the same motif, i.e., similar to each other
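A minimal sketch of the selection step above: collect the promoters bound by each TF from a 0/1 binding matrix, then keep only TFs with at least 10 bound sequences. The function names and the toy matrix are hypothetical; only the cutoff of 10 comes from the slide.

```python
MIN_BOUND = 10  # cutoff stated on the slide


def bound_promoters(binding, tfs, genes):
    """Map each TF to the genes whose promoter it binds (matrix entry == 1)."""
    return {
        tf: [gene for gene, flag in zip(genes, row) if flag == 1]
        for tf, row in zip(tfs, binding)
    }


def select_tfs(bound, min_bound=MIN_BOUND):
    """Keep TFs with enough bound promoters to run motif finders on."""
    return sorted(tf for tf, gs in bound.items() if len(gs) >= min_bound)


# Toy data: TF_A binds 11 of 12 promoters, TF_B binds only 3.
genes = [f"gene{i}" for i in range(12)]
tfs = ["TF_A", "TF_B"]
binding = [
    [1] * 11 + [0],
    [1, 1, 1] + [0] * 9,
]

bound = bound_promoters(binding, tfs, genes)
selected = select_tfs(bound)
print(selected)
```

The promoter sequences for each selected TF would then be handed to the motif finders; that part is not sketched here.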

Motif finding
Using clustering of motifs and stringent statistical tests, identify high-confidence motifs from among these motifs
High-confidence motifs found for 116 of the 147 TFs whose bound sequences were analyzed
Now require that the motif also be conserved across other related yeast species
65 TFs with single, high-confidence, phylogenetically conserved motifs were found

Motif finding
The 65 motifs were a mix of “known” and novel motifs
  – That is, some of the motifs were similar to already known motifs
  – 21 TFs’ motifs were new
These 65 motifs, together with other known motifs from the literature, form a compendium of 102 motifs for further analysis

Source: Harbison et al. Nature 431 (2 September 2004)

Next step
We now have motifs for 102 TFs
Next step is to locate the binding sites of each TF in the whole genome
Equivalent to finding matches to each motif in the whole genome
Finding matches:
  – Require a high sequence similarity
  – Require phylogenetic conservation
  – Require high binding to that region by the TF
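The first requirement (high sequence similarity) can be illustrated by scanning a sequence with a position weight matrix and keeping windows whose log-odds score is a large fraction of the best possible score. This is a sketch under stated assumptions: the uniform background, the toy motif, and the 0.6 fraction are illustrative choices, and the conservation and binding requirements are not implemented.

```python
import math

# Assumed uniform background; a real analysis would estimate this from the genome.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds(pwm, background=BACKGROUND):
    """Convert per-column base probabilities into log2 odds against the background."""
    return [{b: math.log2(col[b] / background[b]) for b in "ACGT"} for col in pwm]


def scan(sequence, pwm, fraction=0.6):
    """Return (start, window, score) for windows scoring >= fraction of the max score.

    Assumes the sequence contains only A, C, G, T.
    """
    lod = log_odds(pwm)
    max_score = sum(max(col.values()) for col in lod)
    width = len(lod)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col[base] for col, base in zip(lod, window))
        if score >= fraction * max_score:
            hits.append((start, window, score))
    return hits


def column(preferred):
    """Toy PWM column: 0.85 for the preferred base, 0.05 for each other base."""
    return {b: (0.85 if b == preferred else 0.05) for b in "ACGT"}


# Hypothetical motif with consensus TGAC.
pwm = [column(b) for b in "TGAC"]
hits = scan("AATGACGGTGACA", pwm)
print([(start, window) for start, window, _ in hits])
```

A full site-mapping step would additionally check each hit for conservation in related species and for significant binding of the TF to that promoter.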

Mapping sites in the genome
The “map” gave 3353 sites (“interactions”) within 1296 promoters
This is different from simply locating matches to a motif, because TF binding information is also incorporated
Under different conditions, only a subset of the binding sites in the map are actually occupied


Does the map make sense?
The map tells us which TFs bind which actual sites in the genome, and hence which genes are being regulated
In many cases, the known functions of the genes predicted to be targeted by a TF are consistent with the known function of the TF

More insights from the map
Binding sites are not uniformly distributed over the promoter regions
Sharply peaked distribution
Very few sites in the 100 bp immediately upstream of the genes
Most sites (74%) are between 100 and 500 bp upstream of the gene
Source: Harbison et al. Nature 431 (2 September 2004)
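A statistic like the 74% figure above can be computed from a list of site-to-gene distances. The sketch below uses made-up distances and hypothetical function names, purely to show the calculation.

```python
def fraction_in_window(distances, low=100, high=500):
    """Fraction of sites whose upstream distance (bp) lies in [low, high]."""
    if not distances:
        return 0.0
    inside = sum(1 for d in distances if low <= d <= high)
    return inside / len(distances)


def histogram(distances, bin_width=100):
    """Count sites per distance bin, keyed by the bin's starting distance."""
    counts = {}
    for d in distances:
        start = (d // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


# Toy upstream distances in bp (not real data).
distances = [40, 120, 180, 250, 310, 460, 520, 640]

frac = fraction_in_window(distances)
bins = histogram(distances)
print(frac, bins)
```

Plotting the histogram bins against distance is what would show the peaked distribution described on the slide.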

Arrangements of sites
Specific arrangements of binding sites in a promoter
Simple arrangement: one binding site for one TF
Another arrangement: repeats of a particular binding site
  – Allows for “graded response”
  – Some TFs show a significant preference for repeated sites


Arrangements of sites
Another arrangement: binding sites for multiple TFs
  – “Combinatorial regulation”: in different conditions, different combinations of binding sites (and TFs) direct different gene expression
  – Genes whose promoters have such an arrangement of sites are required for multiple pathways and are regulated in an environment-specific fashion
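The arrangements described so far (one site for one TF, repeated sites for one TF, sites for multiple TFs) can be told apart by counting sites per TF in each promoter. The labels and the toy promoter map below are assumptions made for illustration.

```python
from collections import Counter


def classify_promoter(site_tfs):
    """Label a promoter from the list of TFs owning each mapped site in it."""
    counts = Counter(site_tfs)
    if not counts:
        return "no sites"
    if len(counts) > 1:
        return "multiple TFs"
    if next(iter(counts.values())) > 1:
        return "repeated sites"
    return "single site"


# Toy map: promoter -> TF name for each binding site found in it.
promoter_sites = {
    "geneA": ["TF_1"],
    "geneB": ["TF_2", "TF_2", "TF_2"],
    "geneC": ["TF_1", "TF_3"],
    "geneD": [],
}

labels = {gene: classify_promoter(sites) for gene, sites in promoter_sites.items()}
print(labels)
```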


Arrangements of sites
Another arrangement: binding sites for specific pairs of TFs occur more frequently in the same promoter than expected by chance
  – The two TFs perhaps interact physically in doing their job
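One common way to ask whether two TFs share promoters "more frequently than expected by chance" is a hypergeometric tail test. The slides do not say which test was used, so the choice of test, the function name, and the numbers below are assumptions.

```python
from math import comb


def cooccurrence_pvalue(n_promoters, n_a, n_b, n_both):
    """P(overlap >= n_both) if TF B's n_b promoters were drawn at random.

    n_promoters: total promoters considered
    n_a, n_b:    promoters with sites for TF A and for TF B
    n_both:      promoters with sites for both
    """
    upper = min(n_a, n_b)
    total = comb(n_promoters, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_promoters - n_a, n_b - k)
        for k in range(n_both, upper + 1)
    )
    return tail / total


# Toy numbers: 10 promoters, each TF has sites in 3, and all 3 are shared.
expected_overlap = 3 * 3 / 10  # overlap expected under independence
p = cooccurrence_pvalue(10, 3, 3, 3)
print(expected_overlap, p)
```

A small p-value for a pair would flag it as a candidate for the physical interaction suggested on the slide; testing many pairs would also call for a multiple-testing correction, which is not shown.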
