Nora Pierstorff Dept. of Genetics University of Cologne

Presentation transcript:

Combined ab initio and comparative analysis of putative regulatory regions
Nora Pierstorff, Dept. of Genetics, University of Cologne, 30.8.2005

Outline
- Introduction
- Ab Initio Approach
- Datasets
- Comparative Analysis of Enhancers and Results
- Combination of Both Approaches and Results
- Discussion

Eukaryotic regulation model

Three approaches
1. Search for binding sites of known transcription factors using position weight matrices.
2. Search for conserved motifs in the upstream regions of homologous or coregulated genes.
3. Search for statistically overrepresented motifs.
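The first approach can be illustrated with a minimal sketch of a position weight matrix scan. The count matrix, the uniform background, the pseudocount and the score threshold below are all invented for illustration; they do not describe a real transcription factor or the tool used in this talk.

```python
import math

# Hypothetical 4-position count matrix (one dict per motif position).
# The counts are made up for illustration only.
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]
# Assumed uniform background base frequencies.
BACKGROUND = {base: 0.25 for base in "ACGT"}


def build_pwm(counts, background, pseudocount=1.0):
    """Convert counts into log2-odds scores against the background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background[base])
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, site, score) for every window scoring at least threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        site = sequence[start:start + width]
        if any(base not in "ACGT" for base in site):
            continue  # skip windows with ambiguous bases such as N
        score = sum(pwm[i][base] for i, base in enumerate(site))
        if score >= threshold:
            hits.append((start, site, score))
    return hits


pwm = build_pwm(COUNTS, BACKGROUND)
hits = scan("TTACGTAA", pwm, threshold=4.0)
```

Lowering the threshold reports more candidate sites at the cost of more false positives, which is the weakness of motif-only searches that the later slides address.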

Ab Initio Approach (overrepresented patterns)
- Overrepresented patterns are frequent in DNA, which leads to many false-positive predictions.
- The amount of available data is not large enough to derive additional reliable, universally valid rules.

Dataset (collected by Nazina and Papatsenko 2003)
Target species: Drosophila melanogaster
Reference species: D. yakuba, D. ananassae, D. pseudoobscura, D. virilis
Number of sequences: 39
Total bp: 1080200
Number of regulatory regions: 87
bp in enhancers: 158317
Enhancers per sequence: 2.462
Fraction of bp in enhancers: 0.14656
[Figure: Dorsal motif and Dorsal matches]

Are enhancers alignable?
Emberly et al. (2003):
- The overlap of binding sites and conserved sequence blocks is not much greater than expected by chance, but it is still statistically significant.
- Organisms compared: D. melanogaster and D. pseudoobscura
- Alignment methods: LAGAN, SMASH (construct chains of local alignments)

Assumptions about enhancer conservation
- Binding sites contain core sequences that are essential for binding the transcription factor.
- Core sequences are conserved among the binding sites within one species and between species.
- Binding sites are indicated by short, exactly conserved, overrepresented patterns.

Alignment of short exact matches
Input: chain of high-scoring fragments from the blastn alignment of each sequence pair
Output: regions containing a large number of short conserved stretches
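As a rough sketch of how short exact matches could be pulled out of one aligned fragment pair: the function below takes two gapped alignment strings of equal length and reports runs of identical, ungapped columns. The function name, the input representation and the minimum run length of 5 are assumptions made for illustration; the slide does not state how the fragments were processed.

```python
def conserved_stretches(aligned_a, aligned_b, min_len=5):
    """Return (start_in_a, length) for runs of exactly matching columns.

    aligned_a and aligned_b are gapped alignment rows of equal length,
    with '-' marking a gap. Start positions are ungapped coordinates in
    sequence A. min_len is an assumed cutoff, not a value from the talk.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("alignment rows must have the same length")

    stretches = []
    pos_a = 0        # ungapped coordinate in sequence A
    run_start = 0    # where the current run of matches began
    run_len = 0      # length of the current run of matches

    for a, b in zip(aligned_a, aligned_b):
        if a == b and a != "-":
            if run_len == 0:
                run_start = pos_a
            run_len += 1
        else:
            # A mismatch or gap closes the current run.
            if run_len >= min_len:
                stretches.append((run_start, run_len))
            run_len = 0
        if a != "-":
            pos_a += 1

    if run_len >= min_len:
        stretches.append((run_start, run_len))
    return stretches


row_a = "ACGTACGT-TTGGCCAA"
row_b = "ACGTACCTATTGGCCAT"
stretches = conserved_stretches(row_a, row_b)
```

Running this over every fragment in a chain, and over every species pair, would give the pool of short conserved stretches that the later scoring steps count.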

Result using only the comparative approach with 5 species
m8 region: score = number of short conserved stretches in a 200 bp window
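The window score can be sketched as a simple sliding count. The 200 bp window comes from the slide; the step size, the choice to assign each stretch to a window by its start position, and the function name are assumptions for illustration.

```python
def window_scores(stretch_starts, seq_len, window=200, step=50):
    """Count conserved stretches per window along a sequence.

    stretch_starts: start coordinates of short conserved stretches.
    Returns a list of (window_start, count). A stretch is counted in a
    window if its start falls inside it (an assumed convention).
    """
    scores = []
    last_start = max(seq_len - window, 0)
    for win_start in range(0, last_start + 1, step):
        win_end = win_start + window
        count = sum(1 for s in stretch_starts if win_start <= s < win_end)
        scores.append((win_start, count))
    return scores


scores = window_scores([10, 60, 120, 450], seq_len=500, window=200, step=100)
```

Plotting the count against the window start gives a score profile of the kind shown for the m8 region.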

Searching for overrepresented motifs in conserved regions
Input: all short conserved words
1. Count the occurrences of all 5 bp substrings of the word in the 1000 surrounding base pairs.
2. Calculate one observed/expected ratio for every species.
Output: conserved stretches containing at least one 5-mer that is overrepresented in each species
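The two steps above can be sketched as follows. The slide does not say how the expected count is obtained, so this sketch assumes an independent-base background estimated from the surrounding sequence itself; the ratio cutoff of 2.0 and all function names are likewise hypothetical.

```python
from collections import Counter


def kmers(word, k=5):
    """All distinct k-bp substrings of a conserved word."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}


def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps."""
    return sum(
        1 for i in range(len(text) - len(pattern) + 1)
        if text.startswith(pattern, i)
    )


def observed_expected(kmer, context):
    """Observed/expected ratio of kmer in context.

    Expected count assumes independent bases with frequencies taken
    from the context (an assumption, not the method of the talk).
    """
    n = len(context)
    freqs = Counter(context)
    prob = 1.0
    for base in kmer:
        prob *= freqs[base] / n
    expected = prob * (n - len(kmer) + 1)
    if expected <= 0:
        return 0.0
    return count_overlapping(context, kmer) / expected


def overrepresented_in_all(word, contexts, k=5, min_ratio=2.0):
    """Return the k-mers of word that are overrepresented in every species.

    contexts maps a species name to the sequence surrounding the word
    in that species (about 1000 bp in the talk). min_ratio is an
    assumed cutoff.
    """
    return [
        km for km in sorted(kmers(word, k))
        if all(observed_expected(km, ctx) >= min_ratio for ctx in contexts.values())
    ]


toy_contexts = {"species_a": "GGATC" * 20, "species_b": "TTGGATC" * 15}
kept = overrepresented_in_all("GGATCC", toy_contexts)
```

A conserved stretch would be retained when this list is non-empty, that is, when at least one of its 5-mers passes the cutoff in every species.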

Improvement by combination
m8 region: score = number of short conserved stretches in a 200 bp window
m8 region: score = number of short conserved overrepresented stretches in a 200 bp window

Improvement by combination

Discussion
- Using a combination of methods improves predictions.
- In the near future, regulatory regions can be found without knowing the binding transcription factors, provided enough related species are known.
- More features need to be found that distinguish conserved regulatory regions from other functional conserved regions.

References
E. Emberly, N. Rajewsky, E. Siggia (2003) Conservation of regulatory elements between two species of Drosophila. BMC Bioinformatics 4:57.
A. Nazina, D. Papatsenko (2003) Statistical extraction of Drosophila cis-regulatory modules using exhaustive assessment of local word frequency. BMC Bioinformatics 4:65.