A Quantitative Modeling of Protein-DNA interaction for Improved Energy Based Motif Finding Algorithm Junguk Hur School of Informatics April 25, 2005 L529.

Presentation transcript:

A Quantitative Modeling of Protein-DNA interaction for Improved Energy Based Motif Finding Algorithm
Junguk Hur, School of Informatics
April 25, 2005
L529 – Term Project

BACKGROUND
- Motif finding: an important challenge in computational biology.
- Current algorithms:
  - Many stochastic or combinatorial algorithms find motifs in a given set of sequences: MEME, Gibbs, CONSENSUS, etc.
  - They use no quantitative data.
- High-throughput, genome-wide quantitative data are now available:
  - ChIP-on-Chip: Chromatin ImmunoPrecipitation on Microarray (in vivo)
  - PBM: Protein-Binding Microarray (in vitro)
- EBMF (Energy-Based Motif Finding) algorithm:
  - Ratio → binding affinity → energy

ChIP-on-Chip (Ren et al.)
- Array of intergenic sequences from the whole genome

Energy-Based Motif Finding (EBMF) (Chin et al.)
- Let $e_i$ be the average binding energy between the TF and sequence $s_i$; then $e_i = -\ln(K_e)$.
- $K_e = [\mathrm{TF}\,s_i] / ([\mathrm{TF}][s_i])$; the color intensity ratio represents the value of $K_e$.
- Problem definition:
  - Solve $A X = B$ ($A$: matrix to be decomposed; $B$: total energy; $X$: new energy at each position, to be calculated).
  - Minimize the prediction error.
  - Iteratively improve the candidate matrix $M$.
- A $4 \times l$ energy matrix $M$ represents the motif ($l$ = motif length).
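The additive energy model above can be sketched in a few lines. This is a minimal illustration, not the original implementation (which was written in Perl). It assumes that each row of $A$ is an indicator (one-hot) encoding of a single candidate site and that $X$ is the $4 \times l$ energy matrix flattened position by position; the slides do not spell out either detail. The function names and the example energies are hypothetical.

```python
import math

BASES = "ACGT"

def energy_from_ratio(ratio):
    """e = -ln(K_e), using the intensity ratio as a stand-in for K_e."""
    return -math.log(ratio)

def one_hot_row(site):
    """Encode a length-l site as a 4*l indicator row (assumed layout:
    four entries per position, in the order A, C, G, T)."""
    row = [0] * (4 * len(site))
    for pos, base in enumerate(site):
        row[4 * pos + BASES.index(base)] = 1
    return row

def site_energy(site, x):
    """Additive model: sum the energy of the observed base at each position."""
    return sum(a * e for a, e in zip(one_hot_row(site), x))

# Hypothetical energies for a motif of length l = 2 (made-up values).
x_example = [-1.0, 0.0, 0.5, 0.2,   # position 1: A, C, G, T
             0.3, -2.0, 0.0, 0.1]   # position 2: A, C, G, T
predicted = site_energy("AC", x_example)  # energy of A at pos 1 + C at pos 2
```

Under these assumptions, one row of $A X = B$ is just `one_hot_row(site)` multiplied by the flattened energies, with the measured `energy_from_ratio(ratio)` on the right-hand side.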

Goals and Methods
- Ultimate goal: build a better model that represents the local and non-local correlations between nucleotides.
  - Based on the EBMF algorithm
  - Uses a quantitative measure of DNA-protein interaction
  - Potentially more accurate than Positional Weight Matrices (PWMs)
- First step: implement EBMF.
  - Solving linear equations
    - Matrix solution: QR-decomposition / LR-decomposition (also known as LU-decomposition)
    - Least-squares method: downhill simplex method
  - Programming language: Perl
  - Data set: yeast ChIP-on-Chip data (GAL4, GCN4, RAP1)
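As an illustration of the least-squares route, the sketch below minimizes the squared prediction error $\|AX - B\|^2$ with a simplified downhill simplex (Nelder-Mead) search. It is a toy under stated assumptions, not the project's Perl code: the matrix `counts` is a hypothetical base-count matrix for a motif of length 1, `x_true` is made up, and the simplex variant uses only reflection, expansion, inside contraction, and shrink steps.

```python
def sum_sq_error(A, b, x):
    """Squared prediction error ||A x - b||^2."""
    return sum((sum(a * xi for a, xi in zip(row, x)) - bi) ** 2
               for row, bi in zip(A, b))

def downhill_simplex(f, x0, step=0.5, tol=1e-14, max_iter=10000):
    """Simplified Nelder-Mead minimizer (illustrative variant)."""
    n = len(x0)
    # Initial simplex: x0 plus one point offset along each axis.
    simplex = [list(x0)]
    for i in range(n):
        p = list(x0)
        p[i] += step
        simplex.append(p)
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, second_worst, worst = simplex[0], simplex[-2], simplex[-1]
        if f(worst) - f(best) < tol:
            break
        # Centroid of all points except the worst one.
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        if f(reflected) < f(best):
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(second_worst):
            simplex[-1] = reflected
        else:
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every point halfway toward the best point.
                simplex = [best] + [[b + 0.5 * (v - b) for b, v in zip(best, p)]
                                    for p in simplex[1:]]
    return min(simplex, key=f)

# Hypothetical data: counts of A, C, G, T per sequence, and made-up energies.
counts = [[2, 1, 0, 1], [0, 2, 1, 1], [1, 0, 3, 0],
          [1, 1, 1, 1], [3, 0, 0, 1], [0, 1, 1, 2]]
x_true = [-1.0, 0.5, 0.0, 0.25]
b = [sum(a * e for a, e in zip(row, x_true)) for row in counts]

x_fit = downhill_simplex(lambda x: sum_sq_error(counts, b, x), [0.0] * 4)
```

This toy has only four unknowns. A real motif has $4l$ unknowns, and simplex searches generally need many more function evaluations as the dimension grows.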

Results
- The implemented EBMF failed to find the motif for any of the TFs, even when the initial matrix was taken from the TRANSFAC PSSM.
  - QR/LR-decomposition: resulted in Infinity
    → due to a nearly singular matrix (singular to within machine precision)
  - Downhill simplex method: too slow, and the result still deviated from the TRANSFAC result
  - MATLAB: same outcome as QR
- Tried modifying the matrix:
  - Add a small non-zero number to each zero element
  - Limit to only one TFBS per promoter
- These changes worked for random sets of short length but still did not work for the yeast TFs.
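One possible contributor to the singular behaviour can be checked directly. This is an assumption for illustration, not a diagnosis of the original data. If each row of $A$ is a one-hot encoding of a single site per promoter, the four columns for any one position always sum to the all-ones vector. The columns are therefore linearly dependent, and the rank is at most $3l + 1$ rather than $4l$, whatever the data. The sketch below computes the exact rank for every possible site of length $l = 3$; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

BASES = "ACGT"

def one_hot_row(site):
    """Encode a length-l site as a 4*l indicator row (A, C, G, T per position)."""
    row = [0] * (4 * len(site))
    for pos, base in enumerate(site):
        row[4 * pos + BASES.index(base)] = 1
    return row

def matrix_rank(rows):
    """Exact rank by Gauss-Jordan elimination over rational numbers."""
    m = [[Fraction(v) for v in r] for r in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a non-zero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * p for a, p in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank

l = 3
all_sites = ["".join(p) for p in product(BASES, repeat=l)]  # 4**l sites
A = [one_hot_row(s) for s in all_sites]
rank = matrix_rank(A)  # compare against the 4*l columns of A
```

If this is what happened, the usual remedies are to fix one reference base per position or to add a regularization term. Neither technique comes from the slides.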

Discussion
- Are the data singular? Is there another trick to work around this?
- Try other data sets.
- Another direction for using quantitative protein-DNA binding data → possible correlation among TFs

Acknowledgement
- I deeply thank Dr. Haixu Tang.