* only 17% of SNPs implicated in freshwater adaptation map to coding sequences Many, many mapping studies find prevalent noncoding QTLs.

Slides:



Advertisements
Similar presentations
Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome ECS289A.
Advertisements

PREDetector : Prokaryotic Regulatory Element Detector Samuel Hiard 1, Sébastien Rigali 2, Séverine Colson 2, Raphaël Marée 1 and Louis Wehenkel 1 1 Bioinformatics.
A Genomic Code for Nucleosome Positioning Authors: Segal E., Fondufe-Mittendorfe Y., Chen L., Thastrom A., Field Y., Moore I. K., Wang J.-P. Z., Widom.
Promoter and Module Analysis Statistics for Systems Biology.
Combined analysis of ChIP- chip data and sequence data Harbison et al. CS 466 Saurabh Sinha.
Finding regulatory modules from local alignment - Department of Computer Science & Helsinki Institute of Information Technology HIIT University of Helsinki.
Bioinformatics Motif Detection Revised 27/10/06. Overview Introduction Multiple Alignments Multiple alignment based on HMM Motif Finding –Motif representation.
Detecting DNA-protein Interactions Xinghua Lu Dept Biomedical Informatics BIOST 2055.
Gene regulatory network
Regulatory Motifs. Contents Biology of regulatory motifs Experimental discovery Computational discovery PSSM MEME.
Speaker: HU Xue-Jia Supervisor: WU Yun-Dong Date: 19/12/2013.
Genome-wide prediction and characterization of interactions between transcription factors in S. cerevisiae Speaker: Chunhui Cai.
Identification of a Novel cis-Regulatory Element Involved in the Heat Shock Response in Caenorhabditis elegans Using Microarray Gene Expression and Computational.
[Bejerano Fall10/11] 1 Thank you for the midterm feedback! Projects will be assigned shortly.
Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.
An analysis of “Alignments anchored on genomic landmarks can aid in the identification of regulatory elements” by Kannan Tharakaraman et al. Sarah Aerni.
The Hardwiring of development: organization and function of genomic regulatory systems Maria I. Arnone and Eric H. Davidson.
Promoter Analysis using Bioinformatics, Putting the Predictions to the Test Amy Creekmore Ansci 490M November 19, 2002.
CS 374: Relating the Genetic Code to Gene Expression Sandeep Chinchali.
[Bejerano Fall09/10] 1 Thank you for the midterm feedback!
[Bejerano Aut08/09] 1 MW 11:00-12:15 in Beckman B302 Profs: Serafim Batzoglou, Gill Bejerano TAs: Cory McLean, Aaron Wenger.
Finding Regulatory Motifs in DNA Sequences
Journal club 06/27/08. Phylogenetic footprinting A technique used to identify TFBS within a non- coding region of DNA of interest by comparing it to the.
“An integrated encyclopedia of DNA elements in the human genome” ENCODE Project Consortium. Nature 2012 Sep 6; 489: Michael M. Hoffman University.
BNFO 602/691 Biological Sequence Analysis Mark Reimers, VIPBG
Modeling Regulatory Motifs 3/26/2013. Transcriptional Regulation  Transcription is controlled by the interaction of tran-acting elements called transcription.
Identifying conserved promoter motifs and transcription factor binding sites in plant promoters Endre Sebestyén, ARI-HAS, Martonvásár, Hungary 26th, November,
Cis-regulatory element study in transcriptome Jin Chen CSE Fall
BNFO 602/691 Biological Sequence Analysis Mark Reimers, VIPBG
Epigenome 1. 2 Background: GWAS Genome-Wide Association Studies 3.
Location analysis of transcription factor binding sites Guy Naamati Andrei Grodzovky.
Mapping protein-DNA interactions by ChIP-seq Zsolt Szilagyi Institute of Biomedicine.
Current Topics in Genomics and Epigenomics – Lecture 2.
Genome-wide computational prediction of transcriptional regulatory modules reveal new insights into human gene expression Mathieu Blanchette et al. Presented.
Finish up array applications Move on to proteomics Protein microarrays.
ChIP-on-Chip and Differential Location Analysis Junguk Hur School of Informatics October 4, 2005.
Chromatin Immunoprecipitation DNA Sequencing (ChIP-seq)
Vidyadhar Karmarkar Genomics and Bioinformatics 414 Life Sciences Building, Huck Institute of Life Sciences.
Microarrays and Their Uses Brad Windle, Ph.D
TGCAAACTCAAACTCTTTTGTTGTTCTTACTGTATCATTGCCCAGAATAT TCTGCCTGTCTTTAGAGGCTAATACATTGATTAGTGAATTCCAATGGGCA GAATCGTGATGCATTAAAGAGATGCTAATATTTTCACTGCTCCTCAATTT.
Anatomy of a Genome Project A.Sequencing 1. De novo vs. ‘resequencing’ 2.Sanger WGS versus ‘next generation’ sequencing 3.High versus low sequence coverage.
CS5263 Bioinformatics Lecture 20 Practical issues in motif finding Final project.
Figure 2: over-representation of neighbors in the fushi-tarazu region of Drosophila melanogaster. Annotated enhancers are marked grey. The CDS is marked.
Computational Genomics and Proteomics Lecture 8 Motif Discovery C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E.
Pattern Matching Rhys Price Jones Anne R. Haake. What is pattern matching? Pattern matching is the procedure of scanning a nucleic acid or protein sequence.
Conservation and Evolution of Cis-Regulatory Systems Tal El-Hay Computational Biology Seminar חנוכה תשס"ו December 2005.
Localising regulatory elements using statistical analysis and shortest unique substrings of DNA Nora Pierstorff 1, Rodrigo Nunes de Fonseca 2, Thomas Wiehe.
Starting Monday M Oct 29 –Back to BLAST and Orthology (readings posted) will focus on the BLAST algorithm, different types and applications of BLAST; in.
MEME homework: probability of finding GAGTCA at a given position in the yeast genome, based on a background model of A = 0.3, T = 0.3, G = 0.2, C = 0.2.
Combining SELEX with quantitative assays to rapidly obtain accurate models of protein–DNA interactions Jiajian Liu and Gary D. Stormo Presented by Aliya.
How do we represent the position specific preference ? BID_MOUSE I A R H L A Q I G D E M BAD_MOUSE Y G R E L R R M S D E F BAK_MOUSE V G R Q L A L I G.
Local Multiple Sequence Alignment Sequence Motifs
Analysis of ChIP-Seq Data Biological Sequence Analysis BNFO 691/602 Spring 2014 Mark Reimers.
Motif Search and RNA Structure Prediction Lesson 9.
STAT115 STAT225 BIST512 BIO298 - Intro to Computational Biology.
Transcription factor binding motifs (part II) 10/22/07.
1 What forces constrain/drive protein evolution? Looking at all coding sequences across multiple genomes can shed considerable light on which forces contribute.
A high-resolution map of human evolutionary constraints using 29 mammals Kerstin Lindblad-Toh et al Presentation by Robert Lewis and Kaylee Wells.
Transcriptional Enhancers Looking out for the genes and each other Sridhar Hannenhalli Department of Cell Biology and Molecular Genetics Center for Bioinformatics.
Enhancers and 3D genomics Noam Bar RESEARCH METHODS IN COMPUTATIONAL BIOLOGY.
BIOBASE Training TRANSFAC ® Containing data on eukaryotic transcription factors, their experimentally-proven binding sites, and regulated genes ExPlain™
1 How do regulatory networks evolve? Module = group of genes co-regulated by the same regulatory system * Evolution of individual gene targets Gain or.
Last time … * Constraint on transcription factor binding sites Sites with the most ‘information content’ generally evolve slowest * Stabilizing selection.
Change in Pufs and their RNA InteractionsAnalogous change in transcription factors and their gene regulation Puf binding specificity tends to be conserved.
Regulation of Gene Expression
Mapping Global Histone Acetylation Patterns to Gene Expression
Presented by, Jeremy Logue.
Nora Pierstorff Dept. of Genetics University of Cologne
Presented by, Jeremy Logue.
BIOBASE Training TRANSFAC® ExPlain™
Presentation transcript:

* only 17% of SNPs implicated in freshwater adaptation map to coding sequences Many, many mapping studies find prevalent noncoding QTLs

2 Evolution and constraint on cis-regulatory motifs (focusing on TF binding sites) Many DNA binding proteins recognize specific (often short) DNA sequences. Often bind ‘degenerate’ sequences, since some bases more important for contact. Many work cooperatively with other factors to bind.

3 A G A T G G A T G G T G A T T G A T G T T G A T G G A T G G A G A T T G A T C G T G A T G G A T T G A G A T G G A T T G W G A T G G A T N G Site 1 Site 2 Site 3 Site 4 Site 5 Site 6 Site 7 IPUAC consensus: Representing the set of TF binding sites within a genome ORFsupstream

4 A G A T G G A T G G T G A T T G A T G T T G A T G G A T G G A G A T T G A T C G T G A T G G A T T G A G A T G G A T T G G A T C Site 1 Site 2 Site 3 Site 4 Site 5 Site 6 Site 7 PWM represents frequencies of each base at each position in the motif Position-weight matrices are a better representation

5 Web-logo: A graphical representation of PWMs Position bits Information Profile: Information content represents the frequency of each base at each position across ALL binding sites in an individual

6 To study the evolution of cis regulatory elements, we first need to identify them in genomes Identification of cis-regulatory elements 1. Scan genome for matches to known matrix/consensus problem is that there are many nonfunctional in the genome - poor predictor of function 2. Phylogenetic footprinting: overly-conserved sequences in multiple alignments Variation within element is typically lower than surrounding ‘nonfunctional’ DNA Computational predictions:

7 Simplest case: stretches of very highly conserved sequence Kellis et al “ Sequencing and comparison of yeast species to identify genes and regulatory elements ” Sequenced 4 closely related Saccharomyces genomes & identified conserved sequences in multiple alignments of orthologous sequences from the four species. Need species close enough to get reliable DNA alignment Position of elements has to be conserved for detection (keep this in mind when we get to stabilizing selection at the end …)

8 Identification of cis-regulatory elements 1. Scan genome for matches to known matrix/consensus problem is that there are many nonfunctional in the genome - poor predictor of function 2. Phylogenetic footprinting: overly-conserved sequences in multiple alignments Variation within element is typically lower than surrounding ‘nonfunctional’ DNA Computational predictions: 3. Network/module approach: Focus on groups of co-regulated genes to increase statistical power Look for statistically significant enrichment of sequences in the group of upstream regions from a group of co-regulated genes

9 Gasch et al PLoS Biol “Conservation and evolution of cis-regulatory systems in ascomycete fungi” * Many conserved elements are connected to similar gene groups over 100’s of millions of years. Results: * Some gene groups show show evidence of conserved co-regulation but evolved elements * One example of co-evolved TF binding specificity and upstream sequence elements

10 Identification of cis-regulatory elements 1. Scan genome for matches to known matrix/consensus problem is that there are many nonfunctional in the genome - poor predictor of function 2. Phylogenetic footprinting: overly-conserved sequences in multiple alignments Variation within element is typically lower than surrounding ‘nonfunctional’ DNA Computational predictions: 3. Network/module approach: Focus on groups of co-regulated genes to increase statistical power Look for statistically significant enrichment of sequences in the group of upstream regions from a group of co-regulated genes Experimental: 4. Chromatin immunoprecipitation (ChIP-chip or ChIP-seq) to identify binding loci genomewide can do ChIP analysis across species or in one species then compare computationally

Chromatin-immunoprecipitation coupled to deep sequencing: ChIP-Seq: 1.Add crosslinker to cells 2.Lyse & shear DNA 3.IP protein of interest with antibody 4.Process recovered DNA & sequence Lessons from ChIP Best/most DNA recovery usually means highest TF-DNA affinity Often TFs bind DNA despite no recognizable ‘binding site’ in the region (note ChIP identifies a region bound, not a site) Many “low-occupancy” (e.g. weakly recovered) sites may be real binding that is non-functional

12 What kinds of constraints act on TF binding sites? 1. Productive contact between protein-DNA (constraint on sequence of binding site)

13 Sites of contact evolve slower (under more constraint)

14 Variation within a site across species parallels variation across sites within a genome Open symbols: Information contentClosed symbols: Substitutions per site

15 1.Productive contact between protein-DNA (constraint on sequence of binding site) 2.Distance from transcription start site (constraint on position of the binding site) also may be restricted by placement of nucleosome-depleted regions What kinds of constraints act on TF binding sites?

16 1.Productive contact between protein-DNA (constraint on sequence of binding site) 2.Distance from transcription start site (constraint on position of the binding site) also may be restricted by placement of nucleosome-depleted regions 3. Spacing between elements if cooperative TF interactions (constraint of position) What kinds of constraints act on TF binding sites?

17 1.Conserved regulation but evolution of regulatory regions (stabilizing selection) Binding-site turnover: non-conserved sites but conserved regulation Seems to be very prevalent across many organisms What kinds of constraints act on TF binding sites? How do Regulatory Regions evolve? 1.Productive contact between protein-DNA (constraint on sequence of binding site) 2.Distance from transcription start site (constraint on position of the binding site) also may be restricted by placement of nucleosome-depleted regions 3. Spacing between elements if cooperative TF interactions (constraint of position)

18 Ludwig et al. Nature Eve stripe 2 expression highly conserved across species. Four TFs act combinatorially To determine Eve2 patterns None of 16 binding sites in stripe 2 enhancers is perfectly conserved across 13 species

19 Ludwig et al. Nature Evidence for stabilizing selection in a eukaryotic enhancer element. Native D. pseudoobscura enhancer works well in D. melanogaster lacZ gene

20 But hybrid enhancers (mel-pseudo or pseudo-mel from 5’ to 3’) are defective They argue for stabilizing selection and binding-site turnover across the enhancer lacZ gene Ludwig et al. Nature Evidence for stabilizing selection in a eukaryotic enhancer element. lacZ gene

21 Co-evolution of Rpn4 sites upstream proteosome genes & Rpn4 binding specificity

22 1.Conserved regulation but evolution of regulatory regions (stabilizing selection) Binding-site turnover: non-conserved sites but conserved regulation Seems to be very prevalent across many organisms Co-evolution between binding site and TF specificity What kinds of constraints act on TF binding sites? How do Regulatory Regions evolve? 1.Productive contact between protein-DNA (constraint on sequence of binding site) 2.Distance from transcription start site (constraint on position of the binding site) also may be restricted by placement of nucleosome-depleted regions 3. Spacing between elements if cooperative TF interactions (constraint of position)