Predicting Enhancers in Co-Expressed Genes Harshit Maheshwari Prabhat Pandey.

Slides:



Advertisements
Similar presentations
Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome ECS289A.
Advertisements

PREDetector : Prokaryotic Regulatory Element Detector Samuel Hiard 1, Sébastien Rigali 2, Séverine Colson 2, Raphaël Marée 1 and Louis Wehenkel 1 1 Bioinformatics.
Computational detection of cis-regulatory modules Stein Aerts, Peter Van Loo, Ger Thijs, Yves Moreau and Bart De Moor Katholieke Universiteit Leuven, Belgium.
A Novel Knowledge Based Method to Predicting Transcription Factor Targets
20,000 GENES IN HUMAN GENOME; WHAT WOULD HAPPEN IF ALL THESE GENES WERE EXPRESSED IN EVERY CELL IN YOUR BODY? WHAT WOULD HAPPEN IF THEY WERE EXPRESSED.
Finding regulatory modules from local alignment - Department of Computer Science & Helsinki Institute of Information Technology HIIT University of Helsinki.
Chapter 13: RNA and Protein Synthesis
CISC667, F05, Lec18, Liao1 CISC 467/667 Intro to Bioinformatics (Fall 2005) Gene Prediction and Regulation.
Promoter Panel Review. Background related Promoter In genetics, a promoter is a DNA sequence that enables a gene to be transcribed. It may be very long.
© 2006 W.W. Norton & Company, Inc. DISCOVER BIOLOGY 3/e
O AK R IDGE N ATIONAL L ABORATORY U.S. D EPARTMENT OF E NERGY 1 Identifying Regulatory Transcriptional Elements on Functional Gene Groups Using Computer-
The Model To model the complex distribution of the data we used the Gaussian Mixture Model (GMM) with a countable infinite number of Gaussian components.
Introduction to Bioinformatics - Tutorial no. 5 MEME – Discovering motifs in sequences MAST – Searching for motifs in databanks TRANSFAC – The Transcription.
Computational Biology, Part 2 Representing and Finding Sequence Features using Consensus Sequences Robert F. Murphy Copyright  All rights reserved.
Fuzzy K means.
Regulatory element detection using correlation with expression (REDUCE) Literature search WANG Chao Sept 14, 2004.
Promoter Analysis using Bioinformatics, Putting the Predictions to the Test Amy Creekmore Ansci 490M November 19, 2002.
Finding Regulatory Motifs in DNA Sequences
Detecting binding sites for transcription factors by correlating sequence data with expression. Erik Aurell Adam Ameur Jakub Orzechowski Westholm in collaboration.
2.7 DNA Replication, transcription and translation
Gene expression.
Molecular genetics of gene expression Mat Halter and Neal Stewart 2014.
Sequencing a genome and Basic Sequence Alignment
Motif finding: Lecture 1 CS 498 CXZ. From DNA to Protein: In words 1.DNA = nucleotide sequence Alphabet size = 4 (A,C,G,T) 2.DNA  mRNA (single stranded)
Searching for TFBSs with TRANSFAC - Hot topics in Bioinformatics.
Cis-regulatory element study in transcriptome Jin Chen CSE Fall
Introduction to gene expression Seema Zargar. Lecture outline Introduction to all terms used in Gene expression.
Making of Proteins: Transcription and Translation
A systems biology approach to the identification and analysis of transcriptional regulatory networks in osteocytes Angela K. Dean, Stephen E. Harris, Jianhua.
Detecting binding sites for transcription factors by correlating sequence data with expression. Erik Aurell Adam Ameur Jakub Orzechowski Westholm in collaboration.
Gary Stormo by Andrew Bardee. History Born 1950 in South Dakota Undergraduate in Biology from Caltech PhD in Molecular Biology from University of Colorado.
Sequence analysis – an overview A.Krishnamachari
Motif finding with Gibbs sampling CS 466 Saurabh Sinha.
Using Mixed Length Training Sequences in Transcription Factor Binding Site Detection Tools Nathan Snyder Carnegie Mellon University BioGrid REU 2009 University.
Sequencing a genome and Basic Sequence Alignment
Unraveling condition specific gene transcriptional regulatory networks in Saccharomyces cerevisiae Speaker: Chunhui Cai.
CS5263 Bioinformatics Lecture 20 Practical issues in motif finding Final project.
RNA: Structure & Function. What is RNA? RNA is a nucleic acid called Ribonucleic Acid Functions of RNA:  RNA transfers genetic information from the nucleus.
Motifs BCH364C/391L Systems Biology / Bioinformatics – Spring 2015 Edward Marcotte, Univ of Texas at Austin Edward Marcotte/Univ. of Texas/BCH364C-391L/Spring.
Exploring Alternative Splicing Features using Support Vector Machines Feature for Alternative Splicing Alternative splicing is a mechanism for generating.
Computational Genomics and Proteomics Lecture 8 Motif Discovery C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E.
Gene expression. The information encoded in a gene is converted into a protein  The genetic information is made available to the cell Phases of gene.
Localising regulatory elements using statistical analysis and shortest unique substrings of DNA Nora Pierstorff 1, Rodrigo Nunes de Fonseca 2, Thomas Wiehe.
Introduction to Bioinformatics Algorithms Finding Regulatory Motifs in DNA Sequences.
Introduction to Gene Expression
Algorithms in Bioinformatics: A Practical Introduction
Motif Detection in Yeast Vishakh Joe Bertolami Nick Urrea Jeff Weiss.
Nucleic Acids Comparing DNA and RNA. Both are made of nucleotides that contain  5-carbon sugar,  a phosphate group,  nitrogenous base.
Detecting binding sites for transcription factors by correlating sequence data with expression. Erik Aurell Adam Ameur Jakub Orzechowski Westholm in collaboration.
Alternative Splicing (a review by Liliana Florea, 2005) CS 498 SS Saurabh Sinha 11/30/06.
How can we find genes? Search for them Look them up.
. Finding Motifs in Promoter Regions Libi Hertzberg Or Zuk.
Sequence Alignment.
Enhanced Regulatory Sequence Prediction Using Gapped k-mer Features 王荣 14S
Intro to Probabilistic Models PSSMs Computational Genomics, Lecture 6b Partially based on slides by Metsada Pasmanik-Chor.
Introduction to Bioinformatics II
Pattern Discovery and Recognition for Understanding Genetic Regulation Timothy L. Bailey Institute for Molecular Bioscience University of Queensland.
BIOINFORMATICS Ayesha M. Khan Spring 2013 Lec-8.
BIOBASE Training TRANSFAC ® Containing data on eukaryotic transcription factors, their experimentally-proven binding sites, and regulated genes ExPlain™
PROTEIN SYNTHESIS.
A Very Basic Gibbs Sampler for Motif Detection
Babak Alipanahi1, Andrew Delong, Matthew T Weirauch & Brendan J Frey
Protein Synthesis Genetics.
Prediction of Regulatory Elements for Non-Model Organisms Rachita Sharma, Patricia.
Notes – Protein Synthesis: Transcription
13.1 RNA.
SEG5010 Presentation Zhou Lanjun.
Protein Structure Prediction by A Data-level Parallel Proceedings of the 1989 ACM/IEEE conference on Supercomputing Speaker : Chuan-Cheng Lin Advisor.
Nora Pierstorff Dept. of Genetics University of Cologne
Protein Synthesis.
Presentation transcript:

Predicting Enhancers in Co-Expressed Genes Harshit Maheshwari Prabhat Pandey

Outline  Transcription Factor  Enhances  Problem Statement  Predicting Transcription Factor Binding Sites  Predicting Enhancers in a Gene Sequence  Making use of Co-Expressed Genes  Results  Limitations and Future Work  References  Suggestions and Questions

Transcription Factor  Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product.  Transcription is the first step of gene expression, in which a particular segment of DNA is copied into RNA by the enzyme, RNA polymerase.  Transcription factors are proteins that bind to specific DNA sequences, thereby controlling the flow (or transcription) of genetic information from DNA to messenger RNA.

Enhancers  Enhancers are DNA sequence that control efficiency and rate of transcription of a gene  They tell when and where a gene promoter can be used and how much of the gene product to make.

Problem Statement  To find Transcription factor binding sites in DNA sequences  To predict putative enhancers in a co- expressed gene by clustering the TFBS

The ‘Workflow’  Step1: Prediction of TFBS  P-Match Algorithm  Step 2: Clustering of TFBS  Sliding Window Approach  Trie-Based Approach  Step 3: Filtering Process  Filter Enhancers by making use of Co- Expressed Genes

P-Match Algorithm  Combines both pattern matching and weight-matrix approach  How does it work ?  PWM are computed initially from nucleotide frequency matrix by using: where,

P-Match Algorithm(cont.)  D-Score is computed which estimates similarity between a TF and a gene sub- sequence  We have used two value of d-Scores:  d_matrix – 0.9  d_core – 0.9

Clustering of TFBS  Sliding Window Approach  A window of fixed size is used to scan dense regions of enhancers.  Regions are labeled as ‘dense’ if they exceed the MINTFBS number (in our implementation we have MINTFBS as 15)  Limitation: Dense clusters of TFBSs but with the size very small with respect to window size may miss out.

Clustering of TFBS(contd)  Trie-Based Approach  All clusters with number of TFBSs greater than a threshold(MINTFBS) is generated such that the distance between far-end TFBSs are not greater than maximum length allowed.  It starts with pairing up each of the TFBSs to both its neighbours. In the next iteration, the pairs are paired to both their neighbouring pairs and so on. The process builds up a trie in a bottom-up way.  During the process the clustered TFBSs that satisfy clustering criteria are added to the potential enhancers list and the clusters in which distance between farthest TFBSs cross the MAXLENGTH are discarded from clustering process.

Filtering Step  Studies have found that co-expressed genes are more likely to contain shared enhancers.  So, if an enhancer is present in most of the co-expressed genes, it has high probability of actually being an enhancer.  As the previous steps are error-prone, we don’t do an exact match, but based on the following score:

Results  Enhancers validation require a ChIP-seq based experiments.  RedFly is a database of cis-regulatory modules of Drosophila Melanogaster’s genes reported in biological research papers.  Validation on above step won’t capture correctness of filtering step.

Results (contd…) GeneNo. of Enhacers predicted No. of Enhancers matched with RedFly No. of Enhancers not matched with RedFly No. of True Negatives DPP9642 HOE-I12761 tadr wash15541 Rab Deaf Bace14752

Limitations  Does not work well if the set of Co- Expressed Genes is not correct.  May give potentially larger number of enhancers if a single gene is provided as input.  Quality of results depend on user interference.

References  Chekmenev, D. S., Haid, C., & Kel, A. E. (2005). P- Match: transcription factor binding site search by combining patterns and weight matrices. Nucleic acids research, 33(suppl 2), W432-W437.  Gallo, S. M., Li, L., Hu, Z., & Halfon, M. S. (2006). REDfly: a regulatory element database for Drosophila. Bioinformatics, 22(3),  Choi, D., Fang, Y., & Mathers, W. D. (2006). Condition-specific coregulation with cis-regulatory motifs and modules in the mouse genome.Genomics, 87(4),

Thank You. Suggestions? Questions?