Localising regulatory elements using statistical analysis and shortest unique substrings of DNA Nora Pierstorff 1, Rodrigo Nunes de Fonseca 2, Thomas Wiehe.

(1) Institute for Genetics, University of Cologne, Germany; (2) Institute for Developmental Biology, University of Cologne

ABSTRACT:
Several computational methods for predicting regulatory regions have been developed in the last few years. Most of them require transcription factor binding site matrices to achieve reasonable results. To avoid the need for prior biological information, we developed a program named SHUREG that predicts regulatory regions without any extrinsic information, using only the sequence itself. By calculating shustrings (shortest unique substrings) we find statistically overrepresented motifs, which are assumed to be indicators of regulatory elements. [3]

INTRODUCTION:
To localize regulatory regions, three basic computational approaches have been followed:
1. Search for binding sites of known transcription factors using Position Weight Matrices (PWMs). [1]
2. Search for conserved motifs in upstream regions of homologous or coregulated genes. [2]
3. Search for statistically overrepresented motifs. [3]
Our program SHUREG follows the third approach, which is supported by two hypotheses:
1. Degenerate binding sites guide the transcription factor to the binding site.
2. New binding sites can be created easily from degenerate binding sites through a few mutations, allowing the organism to adapt to environmental changes.

SHUREG - ALGORITHM:
1. Calculation of shustrings (shortest unique substrings) at every position, relative to a surrounding window, on the forward and reverse strands.
2. Counting of neighbours (exact repeats in the surrounding window).
3. Calculation of P-values for each shustring.
4. Smoothing of P-values.

WHY SHORTEST UNIQUE SUBSTRINGS?
Analysing the human (mouse) genome, we found 255 (293) global shustrings of length 11 bp. [4] Of these, 29 (22) are positioned in 1000-bp upstream regions. The probability of this distribution is 3.3 x [exponent lost] (5.0 x [exponent lost]).

RESULTS:
We applied our program to several well-studied regions of the Drosophila melanogaster genome. Our dataset includes segmentation and dorsal-ventral genes. We compare our predictions to the results of Ahab [1], a program that uses PWMs.
Figure 1 shows two predictions for the giant region: 1a is computed using SHUREG, and 1b is the result of Ahab applied to the same sequence.
Figure 2a shows the SHUREG prediction for the regulatory regions of the hairy gene; 2b shows the corresponding Ahab prediction.
Figure 3 is partitioned into three predictions. Figure 3a is the SHUREG prediction for the Dorsal-regulated enhancer of the sog gene. Figure 3b shows the Ahab prediction using only the PWM of the Dorsal binding site. Figure 3c shows the Ahab prediction using all known PWMs, for the hypothetical case in which we do not know the actual factors responsible for regulating this gene.

Figure 1a: SHUREG prediction in the giant region
Figure 1b: Ahab prediction in the giant region
Figure 2a: SHUREG prediction in the hairy region
Figure 2b: Ahab prediction in the hairy region
Figure 3a: SHUREG prediction in the sog region
Figure 3c: Ahab prediction in the sog region using all known PWMs

DISCUSSION:
Localizing regulatory regions without any extrinsic information is a hard problem. The number of overrepresented patterns in a region is a reasonable indicator of regulatory regions and can lead to reasonable results. However, it also produces many false-positive predictions, because we find additional overrepresented patterns that cannot be related to binding sites. To improve the predictions of our method, we need more features that distinguish true-positive from false-positive predictions. We are currently investigating the conservation of overrepresented motifs between species.

References:
[1] Rajewsky, N., Vergassola, M., Gaul, U., Siggia, E. D. (2002). Computational detection of genomic cis-regulatory modules, applied to body patterning in the early Drosophila embryo. BMC Bioinformatics, 3:30.
[2] Bussemaker, H., Li, H., Siggia, E. (2000). Building a dictionary for genomes: Identification of presumptive regulatory sites by statistical analysis. PNAS, Aug 2000; 97.
[3] Nazina, A., Papatsenko, D. (2003). Statistical extraction of Drosophila cis-regulatory modules using exhaustive assessment of local word frequency. BMC Bioinformatics, 4:65.
[4] Haubold, B., Pierstorff, N., Moeller, F., Wiehe, T. (2005). Genome comparison without alignment using shortest unique substrings. BMC Bioinformatics, 6:123.
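
As an illustration of the first two steps of the SHUREG algorithm (shustring calculation and neighbour counting), here is a minimal Python sketch. It is not the SHUREG implementation: the function names are hypothetical, the whole input string is treated as the window, only the forward strand is scanned, and the search is brute force. The reading of "neighbours" as further occurrences of the repeated prefix that precedes the shustring's last base is an assumption; the poster only describes them as exact repeats in the surrounding.

```python
def count_overlapping(text, word):
    """Count occurrences of word in text, allowing overlaps."""
    count = 0
    start = text.find(word)
    while start != -1:
        count += 1
        start = text.find(word, start + 1)
    return count


def shustring_lengths(seq):
    """For each start position, length of the shortest substring beginning
    there that occurs exactly once in seq (None if every such substring,
    including the whole suffix, is repeated)."""
    n = len(seq)
    lengths = []
    for i in range(n):
        found = None
        for k in range(1, n - i + 1):
            if count_overlapping(seq, seq[i:i + k]) == 1:
                found = k
                break
        lengths.append(found)
    return lengths


def neighbour_counts(seq):
    """Assumed definition: number of other occurrences of the repeated
    prefix of the shustring (the shustring without its last base)."""
    counts = []
    for i, k in enumerate(shustring_lengths(seq)):
        # If no unique substring exists, the whole suffix is the repeat.
        repeat = seq[i:] if k is None else seq[i:i + k - 1]
        counts.append(count_overlapping(seq, repeat) - 1 if repeat else 0)
    return counts


window = "ACGTACGA"
print(shustring_lengths(window))
print(neighbour_counts(window))
```

For genome-scale input, a suffix tree or suffix array would replace the brute-force search; the P-value calculation and smoothing steps are not sketched here because the poster does not specify the underlying model.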