Transcription factor binding sites and gene regulatory network

Presentation transcript:

Transcription factor binding sites and gene regulatory network Victor Jin Department of Biomedical Informatics The Ohio State University

Transcription in higher eukaryotes. Gene expression is controlled at several levels: chromatin structure; initiation of transcription; processing of the transcript; transport to the cytoplasm; mRNA translation; mRNA stability; protein activity and stability.

Transcriptional regulation [Figure label: nuclear membrane]

Transcriptional regulation [Figure labels: nuclear membrane; binding site/motif CCG__CCG]. Data: genome-wide mRNA transcript data (e.g. microarrays).

Transcriptional regulation: learning problems. (1) Understand which regulators control which target genes. (2) Discover motifs representing regulatory elements. [Figure labels: nuclear membrane; binding site/motif CCG__CCG]

Some common approaches: cluster-first motif discovery. Cluster genes by expression profile, annotation, etc. to find potentially coregulated genes. Then find overrepresented motifs in the promoter sequences of similar genes (algorithms: MEME, Consensus, Gibbs sampler, AlignACE, ...) (Spellman et al. 1998).
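
The second step above (finding overrepresented motifs in a cluster's promoters) can be illustrated with a deliberately simple k-mer count. This is only a sketch: it is not what MEME, Consensus, the Gibbs sampler, or AlignACE do, and the function names, the toy sequences, and the add-one smoothing are assumptions made for illustration.

```python
from collections import Counter

def kmer_presence(promoters, k):
    """For each k-mer, count the number of promoters that contain it at least once."""
    counts = Counter()
    for seq in promoters:
        seq = seq.upper()
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts

def enriched_kmers(cluster, background, k=6, top=5):
    """Rank k-mers by (fraction of cluster promoters) / (smoothed fraction of background promoters)."""
    in_cluster = kmer_presence(cluster, k)
    in_background = kmer_presence(background, k)
    scored = []
    for kmer, n in in_cluster.items():
        frac_cluster = n / len(cluster)
        # Add-one smoothing (an assumption) avoids division by zero.
        frac_background = (in_background[kmer] + 1) / (len(background) + 1)
        scored.append((frac_cluster / frac_background, kmer))
    scored.sort(reverse=True)
    return [(kmer, ratio) for ratio, kmer in scored[:top]]

# Hypothetical toy data: three "coregulated" promoters sharing TGACC.
cluster = ["ttTGACCcaa", "gcaTGACCtt", "TGACCgatcg"]
background = ["acgtacgtac", "ttttggggaa", "cagtcagtca", "gatcgatcga"]
top = enriched_kmers(cluster, background, k=5)
```

A real analysis would use a proper statistical test and a motif model that tolerates mismatches; the ratio here only conveys the idea of overrepresentation.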

Training data: features. [Figure labels: regulator expression; promoter sequence; label; feature vector]

What is a PWM? Transcription factor binding sites (TFBSs) are usually slightly variable in their sequences. A position weight matrix (PWM) specifies the probability that you will see a given base at each index position of the motif.
[Table: example matrix with rows A, C, G, T and N, plus a consensus row (Con), indexed by position (Pos); the values are not recoverable from the transcript]

PWM for ERE
Position frequency matrix (PFM) (also known as raw count matrix)
Aligned sequence fragments:
acggcagggTGACCc
aGGGCAtcgTGACCc
cGGTCGccaGGACCt
tGGTCAggcTGGTCt
aGGTGGcccTGACCc
cTGTCCctcTGACCc
aGGCTAcgaTGACGt
cagggagtgTGACCc
gagcatgggTGACCa
aGGTCAtaacgattt
gGAACAgttTGACCc
cGGTGAcctTGACCc
gGGGCAaagTGACTg
Given N sequence fragments of fixed length, one can assemble a position frequency matrix (number of times a particular nucleotide appears at a given position). A normalized PFM, in which each column adds up to a total of one, is a matrix of probabilities for observing each nucleotide at each position.
Position weight matrix (PWM) (also known as position-specific scoring matrix)
PFM should be converted to log-scale for efficient computational analysis. To eliminate null values before log-conversion, and to correct for small samples of binding sites, a sampling correction, known as pseudocounts, is added to each cell of the PFM.
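
As a minimal sketch of the PFM construction described above, the following counts bases per position over the 13 fragments listed on the slide and then normalizes each column. The function names are illustrative assumptions, and upper and lower case are treated as the same base.

```python
# The 13 ERE-containing fragments from the slide (15 nt each).
SITES = [
    "acggcagggTGACCc", "aGGGCAtcgTGACCc", "cGGTCGccaGGACCt",
    "tGGTCAggcTGGTCt", "aGGTGGcccTGACCc", "cTGTCCctcTGACCc",
    "aGGCTAcgaTGACGt", "cagggagtgTGACCc", "gagcatgggTGACCa",
    "aGGTCAtaacgattt", "gGAACAgttTGACCc", "cGGTGAcctTGACCc",
    "gGGGCAaagTGACTg",
]
BASES = "ACGT"

def build_pfm(sites):
    """Count how often each base occurs at each position (case-insensitive)."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pfm = {b: [0] * length for b in BASES}
    for site in sites:
        for i, base in enumerate(site.upper()):
            pfm[base][i] += 1
    return pfm

def normalize(pfm):
    """Divide each column by its total so that every column adds up to one."""
    length = len(pfm["A"])
    totals = [sum(pfm[b][i] for b in BASES) for i in range(length)]
    return {b: [pfm[b][i] / totals[i] for i in range(length)] for b in BASES}

pfm = build_pfm(SITES)
probs = normalize(pfm)
```

In this alignment every fragment carries a G at position 11 (index 10), so `pfm["G"][10]` equals the number of sequences and the other three bases have a zero count there, which is exactly the case the pseudocount correction is meant to handle before taking logarithms.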

Converting a PFM into a PWM (position weight matrix for ERE)
For each matrix element, compute (standard form, consistent with the definitions below):
$$W_{b,i} = \log_2 \frac{p(b,i)}{p(b)}, \qquad p(b,i) = \frac{f_{b,i} + s(b)}{N + \sum_{b' \in \{A,C,G,T\}} s(b')}$$
where
$f_{b,i}$: raw count (PFM matrix element) of nucleotide b in column i
$N$: number of sequences used to create the PFM (= column sum)
$s(b)$: pseudocounts (correction for small sample size)
$p(b)$: background frequency of nucleotide b
$W_{b,i}$: resulting PWM value for nucleotide b in column i
[Matrix: PWM values for A, C, G and T at each ERE position; the rows are incomplete in the transcript and cannot be reconstructed]
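
The conversion can be sketched in a few lines: add a pseudocount to each count, turn the column into probabilities, divide by the background frequency, and take log2. The total pseudocount of 1.0 split according to the background, the uniform background of 0.25, and the toy counts are all assumptions chosen for illustration; they are not the settings used to produce the ERE matrix on the slide.

```python
import math

BASES = "ACGT"

def pfm_to_pwm(pfm, background=None, pseudocount=1.0):
    """Convert raw counts to log2-odds weights.

    pfm: dict mapping each base to a list of counts per position.
    background: dict of background frequencies p(b); uniform if omitted (assumption).
    pseudocount: total pseudocount per column, split among bases by p(b) (assumption).
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    length = len(pfm["A"])
    pwm = {b: [] for b in BASES}
    for i in range(length):
        n = sum(pfm[b][i] for b in BASES)          # column sum N
        for b in BASES:
            s_b = pseudocount * background[b]      # pseudocount for base b
            p_bi = (pfm[b][i] + s_b) / (n + pseudocount)
            pwm[b].append(math.log2(p_bi / background[b]))
    return pwm

# Hypothetical 3-column count matrix built from 8 sequences.
toy_pfm = {"A": [8, 0, 2], "C": [0, 0, 2], "G": [0, 8, 2], "T": [0, 0, 2]}
pwm = pfm_to_pwm(toy_pfm)
```

A column with uniform counts gets a weight of zero for every base, a strongly preferred base gets a positive weight, and a base never observed gets a finite negative weight rather than minus infinity, thanks to the pseudocount.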

Scoring putative EREs by scanning the promoter with the PWM
Example site: G G G T C A G C A T G G C C A
[Matrix: the ERE PWM, with rows A, C, G and T; the rows are incomplete in the transcript and cannot be reconstructed]
Absolute score of the site = 11.57
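
Scanning amounts to sliding a window of the motif's width along the promoter and summing, for each window, the PWM entry of the observed base at every position. The sketch below uses a hypothetical 3-column PWM and an arbitrary threshold, not the ERE matrix from the slide; it scans one strand only and skips windows containing characters other than A, C, G, T.

```python
BASES = "ACGT"

def score_site(pwm, site):
    """Sum the PWM weights of the bases observed at each position of the site."""
    return sum(pwm[base][i] for i, base in enumerate(site.upper()))

def scan(pwm, sequence, threshold):
    """Return (start, window, score) for every window scoring at or above threshold."""
    width = len(pwm["A"])
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(ch not in BASES for ch in window.upper()):
            continue  # skip ambiguous bases such as N
        s = score_site(pwm, window)
        if s >= threshold:
            hits.append((start, window, s))
    return hits

# Hypothetical PWM whose best-scoring site is "AGC".
toy_pwm = {
    "A": [1.0, -2.0, -2.0],
    "C": [-2.0, -2.0, 1.0],
    "G": [-2.0, 1.0, -2.0],
    "T": [-2.0, -2.0, -2.0],
}
hits = scan(toy_pwm, "ttAGCttagc", threshold=2.5)
```

To cover both strands, one would also scan the reverse complement of the sequence with the same matrix.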

Yeast ESR: biological validation. Universal stress repressor motif. Xbp1: universal stress repressor; Tbp1: TATA box; Hap1: hypoxia stress; Cbf1: cell cycle regulator; Gcn4: amino acid and nitrogen stress; STRE element.

Previous work: "structure learning". Graphical models (and other methods): learn the structure of a "regulatory network", "regulatory modules", etc.; fit an interpretable model to training data; model a small number of genes or clusters of genes. Many computational and statistical challenges; often used for qualitative hypotheses rather than prediction (Pe'er et al. 2001; Segal et al. 2003, 2004).

Signaling networks in a cell

Network inference. Regulator-motif associations in nodes can have different meanings: direct binding, indirect effect, or co-occurrence. Other data (e.g. ChIP-chip) are needed to confirm a binding relationship between regulator and target. Still, statistically significant regulator-target relationships can be determined from the regulation program. [Figure labels: P, Mp, TF, MTF, M]

Example: oxygen sensing and regulatory network

Binding data for regulatory networks. ChIP-chip: genome-wide protein-DNA binding data, i.e. which promoters are bound by a TF? Investigate the regulatory network model: use ChIP-chip data in place of motifs (no motif discovery). Features: (regulator, TF-occupancy) pairs. [Figure labels: P1, P2, TF]

Inferring regulatory networks from the combination of expression data and binding data

An extended ER regulatory network in MCF7 cells. [Figure node labels: FOS, MYC, CEBP, XBP1, RXRA, HSF2, PNN, NRIP1, TXNDC, IVNS1ABP, BATF, HES1, CHAF1B, CSDE1, CUTL1, PURB, ADAR, C14ORF43, SP3, DDX20, ELF3, TXNIP, PAWR, BRIP1, FOXP4, ZNF394, BAZ1B, STRAP, ASCC3, MKL2, GTF2I, RUVBL1, RFC1, ZNF500, TTF2, RAB18, ZKSCAN1, MSX2, LASS2, HDAC1, ZBTB41, TBX2, THRAP1, VPS72, TLE3, BHLHB2, ZNF38, ZNF239, DNMT1, HIF1A, HEY2, CCNL1, BRF1]

Signaling molecules: networks. Find all signaling molecules (SMs) that associate as regulators with a particular TF's ChIP occupancy in ADT features. Example hypothesis: the Glc7 phosphatase complex interacts with Hsf1 in the regulation of Hsf1 targets (interaction supported in the literature). [Figure labels: Hsf1; Gac1, Gip1, Sds22 (Glc7 phosphatase complex); TF; SM; mRNA]

http://motif.bmi.ohio-state.edu/ChIPMotifs/
Input data: FASTA file; control data (optional, FASTA file); contact info.
Ab initio motif discovery programs: Weeder, MaMF, MEME.
Statistical methods: bootstrap re-sampling; Fisher test.
Results: SeqLogo, PWM, P-value.
STAMP matching: known or novel motifs.
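
To give a sense of how a Fisher test can compare motif occurrence in input sequences against control sequences, here is a minimal one-sided Fisher exact test written with the standard library only. How the web server applies the test internally is not stated on the slide, so the setup below (counting sequences with at least one motif hit in each set) and the function name are assumptions.

```python
from math import comb

def fisher_enrichment_p(hits_fg, n_fg, hits_bg, n_bg):
    """One-sided Fisher exact p-value for enrichment in the foreground.

    hits_fg / n_fg: foreground sequences with a motif hit / total foreground sequences.
    hits_bg / n_bg: same for the control set.
    Returns P(X >= hits_fg) under the hypergeometric null.
    """
    total = n_fg + n_bg
    total_hits = hits_fg + hits_bg
    denom = comb(total, n_fg)
    upper = min(n_fg, total_hits)
    tail = sum(
        comb(total_hits, k) * comb(total - total_hits, n_fg - k)
        for k in range(hits_fg, upper + 1)
    )
    return tail / denom

# Hypothetical counts: motif found in 40 of 100 ChIP sequences and 10 of 100 controls.
p_value = fisher_enrichment_p(40, 100, 10, 100)
```

A small p-value suggests the motif occurs in the input set more often than expected by chance given the control set; bootstrap re-sampling, the other method named on the slide, would instead estimate significance from repeated random draws.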

http://motif.bmi-ohio-state.edu/HRTBLDb

Software demo: W-ChIPMotifs, HRTargetDB