Transcription factor binding sites and gene regulatory network Victor Jin Department of Biomedical Informatics The Ohio State University.



Transcription in higher eukaryotes
Gene Expression
1. Chromatin structure
2. Initiation of transcription
3. Processing of the transcript
4. Transport to the cytoplasm
5. mRNA translation
6. mRNA stability
7. Protein activity stability

Transcriptional Regulation
(Figure label: nuclear membrane)

Transcriptional Regulation
(Figure labels: nuclear membrane; binding site/motif CCG__CCG; genome-wide mRNA transcript data, e.g. microarrays)

Transcriptional Regulation
(Figure labels: nuclear membrane; binding site/motif CCG__CCG)
Learning problems:
– Understand which regulators control which target genes
– Discover motifs representing regulatory elements

Some common approaches
Cluster-first motif discovery
– Cluster genes by expression profile, annotation, … to find potentially coregulated genes
– Find overrepresented motifs in promoter sequences of similar genes (algorithms: MEME, Consensus, Gibbs sampler, AlignACE, …)
(Spellman et al. 1998)
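The second step of the cluster-first approach can be illustrated with a deliberately simple stand-in for the listed algorithms. The sketch below is not MEME, Consensus, Gibbs sampling, or AlignACE. It only counts exact k-mers and ranks them by how much more often they occur in the promoters of a gene cluster than in background promoters. The function names, the default k of 6, and the pseudocount are assumptions made for illustration.

```python
from collections import Counter


def kmer_presence(seqs, k):
    """Count, for each k-mer, how many sequences contain it at least once."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        # A set ensures each k-mer is counted once per sequence.
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def enriched_kmers(cluster, background, k=6, pseudo=1.0):
    """Rank k-mers by presence rate in the cluster relative to the background.

    `pseudo` (an assumed smoothing constant) keeps the ratio finite when a
    k-mer never occurs in the background.
    """
    fg = kmer_presence(cluster, k)
    bg = kmer_presence(background, k)
    scores = {}
    for kmer, n in fg.items():
        fg_rate = (n + pseudo) / (len(cluster) + 2 * pseudo)
        bg_rate = (bg[kmer] + pseudo) / (len(background) + 2 * pseudo)
        scores[kmer] = fg_rate / bg_rate
    # Highest ratio first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy promoters (hypothetical data): CCGT is shared by the whole cluster.
cluster = ["AAACCGTTT", "GGCCGTAA", "TTCCGTGG"]
background = ["AAAAAAAA", "TTTTTTTT", "GGGGGGGG", "ACACACAC"]
top_kmer, top_score = enriched_kmers(cluster, background, k=4)[0]
```

Real motif finders model degenerate positions and assess significance, so a ranking like this is at best a first look at a cluster.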

Training data – Features
(Figure labels: label; promoter sequence; regulator expression; feature vector)

What is PWM?
– Transcription factor binding sites (TFBSs) are usually slightly variable in their sequences.
– A position weight matrix (PWM) specifies, for each position of the motif, how likely you are to see each base at that position.

PWM for ERE

Aligned ERE sites:
1. acggcagggTGACCc
2. aGGGCAtcgTGACCc
3. cGGTCGccaGGACCt
4. tGGTCAggcTGGTCt
5. aGGTGGcccTGACCc
6. cTGTCCctcTGACCc
7. aGGCTAcgaTGACGt
...
41. cagggagtgTGACCc
42. gagcatgggTGACCa
43. aGGTCAtaacgattt
44. gGAACAgttTGACCc
45. cGGTGAcctTGACCc
46. gGGGCAaagTGACTg

Position frequency matrix (PFM) (also known as raw count matrix)
Given N sequence fragments of fixed length, one can assemble a position frequency matrix (number of times a particular nucleotide appears at a given position). A normalized PFM, in which each column adds up to a total of one, is a matrix of probabilities for observing each nucleotide at each position.

Position weight matrix (PWM) (also known as position-specific scoring matrix)
The PFM should be converted to log-scale for efficient computational analysis. To eliminate null values before log-conversion, and to correct for small samples of binding sites, a sampling correction, known as pseudocounts, is added to each cell of the PFM.
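The counting and normalization steps can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the slides. The function names are assumptions, and the example uses only the first five aligned sites listed, so its counts differ from a matrix built on all 46.

```python
def position_frequency_matrix(sites, alphabet="ACGT"):
    """Count how often each nucleotide occurs at each position of aligned sites."""
    sites = [s.upper() for s in sites]
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pfm = {b: [0] * length for b in alphabet}
    for s in sites:
        for i, b in enumerate(s):
            pfm[b][i] += 1
    return pfm


def normalize(pfm):
    """Divide each column by its sum so that every column adds up to one."""
    length = len(next(iter(pfm.values())))
    totals = [sum(pfm[b][i] for b in pfm) for i in range(length)]
    return {b: [pfm[b][i] / totals[i] for i in range(length)] for b in pfm}


# First five aligned ERE sites from the list (a subset, for illustration only).
ere_sites = [
    "acggcagggTGACCc",
    "aGGGCAtcgTGACCc",
    "cGGTCGccaGGACCt",
    "tGGTCAggcTGGTCt",
    "aGGTGGcccTGACCc",
]
ere_pfm = position_frequency_matrix(ere_sites)
ere_probs = normalize(ere_pfm)
```

Each column of `ere_pfm` sums to the number of sites, which is the N used later as the column sum.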

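Position Weight Matrix for ERE
Converting a PFM into a PWM

For each matrix element do:

$$W_{b,i} = \log_2 \frac{p(b,i)}{p(b)}, \qquad p(b,i) = \frac{f_{b,i} + s(b)}{N + \sum_{b' \in \{A,C,G,T\}} s(b')}$$

where
$f_{b,i}$ – raw count (PFM matrix element) of nucleotide b in column i
$N$ – number of sequences used to create PFM (= column sum)
$s(b)$ – pseudocounts (correction for small sample size)
$p(b)$ – background frequency of nucleotide b

(Matrix figure: rows A, C, G, T.)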
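A minimal Python sketch of this conversion follows, assuming the common log-odds form: add the pseudocount to the raw count, divide by the column sum plus the total pseudocount, divide by the background frequency, and take log2. Splitting one total pseudocount across bases in proportion to the background, the default `pseudo=1.0`, and the uniform default background are all assumptions for illustration. The slides do not state which values were used.

```python
import math


def pfm_to_pwm(pfm, background=None, pseudo=1.0):
    """Convert raw counts into log2-odds weights, one column at a time.

    pfm        : dict base -> list of raw counts per position
    background : dict base -> background frequency p(b); uniform if omitted
    pseudo     : total pseudocount per column (assumed), shared out as
                 pseudo * p(b) for each base
    """
    bases = sorted(pfm)
    if background is None:
        background = {b: 1.0 / len(bases) for b in bases}
    length = len(pfm[bases[0]])
    pwm = {b: [] for b in bases}
    for i in range(length):
        n = sum(pfm[b][i] for b in bases)  # column sum N
        for b in bases:
            s_b = pseudo * background[b]                 # pseudocount for base b
            p_bi = (pfm[b][i] + s_b) / (n + pseudo)      # corrected probability
            pwm[b].append(math.log2(p_bi / background[b]))
    return pwm


# One-column toy matrix: four sites, all with A at this position.
toy_pwm = pfm_to_pwm({"A": [4], "C": [0], "G": [0], "T": [0]})
```

Because of the pseudocount, bases that never occur at a position get a finite negative weight instead of minus infinity.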

Scoring putative EREs by scanning the promoter with PWM
Candidate site: GGGTCAGCATGGCCA
Absolute score of the site = 11.57
(Matrix figure: rows A, C, G, T.)
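Scanning can be sketched as sliding a window of the motif width along the promoter and summing, for each window, the PWM entries selected by its bases. The sketch below is an illustration under stated assumptions. The function names and the toy two-column matrix are hypothetical, only the forward strand is scanned, and windows containing characters outside A, C, G, T raise a `KeyError`.

```python
def score_site(pwm, site):
    """Absolute score of one site: sum of the PWM entries selected by its bases."""
    return sum(pwm[b][i] for i, b in enumerate(site.upper()))


def scan(pwm, sequence, threshold):
    """Slide the PWM along the sequence and report windows scoring >= threshold."""
    width = len(next(iter(pwm.values())))
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = score_site(pwm, window)
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Hypothetical two-column PWM that favours the dinucleotide AC.
toy_pwm = {
    "A": [1.0, -1.0],
    "C": [-1.0, 1.0],
    "G": [-1.0, -1.0],
    "T": [-1.0, -1.0],
}
hits = scan(toy_pwm, "GACAC", threshold=2.0)
```

A fuller scan would also score the reverse complement of each window and choose the threshold from the score distribution of known sites.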

Yeast ESR: Biological Validation
(Figure labels: STRE element; universal stress repressor motif)

Previous work: “Structure learning”
Graphical models (and other methods)
– Learn structure of “regulatory network”, “regulatory modules”, etc.
– Fit interpretable model to training data
– Model small number of genes or clusters of genes
– Many computational and statistical challenges; often used for qualitative hypotheses rather than prediction
(Segal et al. 2003, 2004) (Pe’er et al. 2001)

Signaling networks in a cell

Network inference
Regulator-motif associations in nodes can have different meanings:
– Direct binding
– Indirect effect
– Co-occurrence
Need other data to confirm binding relationship between regulator and target (e.g. ChIP-chip).
Still, can determine statistically significant regulator-target relationships from regulation program.
(Figure labels: TF, M, P, Mp)

Example: oxygen sensing and regulatory network

Binding data for regulatory networks
ChIP-chip: genome-wide protein-DNA binding data, i.e. what promoters are bound by TF?
Investigate regulatory network model: use ChIP-chip data in place of motifs (no motif discovery)
– Features: (regulator, TF-occupancy) pairs
(Figure labels: TF, P1, P2)
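One way to picture (regulator, TF-occupancy) pair features is as binary indicators that switch on when a regulator is in a given expression state and a TF occupies the gene's promoter. The sketch below assumes that encoding. The three-level discretization of regulator expression (-1 down, 0 baseline, +1 up), the function name, and the identifiers `reg1`, `reg2`, `tf1`, `tf2` are all hypothetical and are not taken from the slides.

```python
def pair_features(regulator_state, occupancy):
    """Build binary (regulator, TF-occupancy) features for one gene in one experiment.

    regulator_state : dict regulator -> -1, 0 or +1 (assumed discretized expression)
    occupancy       : dict TF -> bool, True if ChIP-chip reports the TF on the promoter
    Returns a dict keyed by (regulator, state, TF) with 0/1 values.
    """
    features = {}
    for reg, state in regulator_state.items():
        for tf, bound in occupancy.items():
            for s in (-1, 1):
                # The feature fires only if the regulator is in state s
                # AND the TF occupies this gene's promoter.
                features[(reg, s, tf)] = int(bound and state == s)
    return features


# Hypothetical example: reg1 is up, reg2 is at baseline; only tf1 is bound.
example = pair_features({"reg1": 1, "reg2": 0}, {"tf1": True, "tf2": False})
```

A classifier trained on vectors like this can then be inspected for which regulator and TF pairs it relies on.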

Inferring regulatory networks from the combination of expression data and binding data

An extended ER regulatory network in MCF7 cells
(Network nodes: CCNL1, BRF1, ER, FOS, MYC, CEBP, XBP1, RXRA, HSF2, PNN, NRIP1, TXNDC, IVNS1ABP, BATF, HES1, CHAF1B, CSDE1, CUTL1, PURB, ADAR, C14ORF43, SP3, DDX20, ELF3, TXNIP, PAWR, BRIP1, FOXP4, ZNF394, BAZ1B, STRAP, ASCC3, MKL2, GTF2I, RUVBL1, RFC1, ZNF500, TTF2, RAB18, ZKSCAN1, MSX2, LASS2, HDAC1, ZBTB41, TBX2, THRAP1, VPS72, TLE3, BHLHB2, ZNF38, ZNF239, DNMT1, HIF1A, HEY2)

Signaling molecules -- Networks
Find all SMs that associate as regulators with a particular TF’s ChIP occupancy in ADT features
e.g. Hypothesis: Glc7 phosphatase complex interacts with Hsf1 in regulation of Hsf1 targets (interaction supported in literature)
(Figure labels: Hsf1; Gac1, Gip1, Sds22; Glc7 phosphatase complex; TF; SM; mRNA)

(Workflow figure)
Input Data: FASTA file; contact info; control data (optional)
Ab initio Motif Discovery Programs: Weeder, MaMF, MEME
Statistical Methods: bootstrap re-sampling; Fisher test
STAMP Matching: known or novel motifs
Results: SeqLogo, PWM, P-value
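The workflow names a Fisher test without showing how it is applied. One plausible use, assumed here for illustration, is to ask whether a discovered motif occurs in more of the input sequences than of the control sequences. The sketch below computes a one-sided Fisher exact P-value for a 2x2 table of such counts using only the standard library. The function name and the example counts are hypothetical.

```python
from math import comb


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher exact test P-value for the 2x2 table [[a, b], [c, d]].

    a: input sequences with the motif     b: input sequences without it
    c: control sequences with the motif   d: control sequences without it
    Returns P(X >= a) under the hypergeometric null of no enrichment.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    # Sum hypergeometric probabilities for tables at least as extreme as observed.
    tail = sum(
        comb(col1, x) * comb(total - col1, row1 - x)
        for x in range(a, upper + 1)
    )
    return tail / comb(total, row1)


# Hypothetical counts: motif found in 3 of 3 input and 0 of 3 control sequences.
p_value = fisher_right_tail(3, 0, 0, 3)
```

For large tables a library routine such as `scipy.stats.fisher_exact` with `alternative="greater"` computes the same one-sided test.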

Software Demo
– W-ChIPMotifs
– HRTargetDB