Published by Drusilla Hancock. Modified over 9 years ago.
1
Transcription factor binding sites and gene regulatory network
Victor Jin
Department of Biomedical Informatics
The Ohio State University
2
Transcription in higher eukaryotes
Gene Expression
1. Chromatin structure
2. Initiation of transcription
3. Processing of the transcript
4. Transport to the cytoplasm
5. mRNA translation
6. mRNA stability
7. Protein activity stability
3
Transcriptional Regulation Nuclear membrane
4
Transcriptional Regulation
Figure labels: Nuclear membrane; Binding site/motif CCG__CCG; Genome-wide mRNA transcript data (e.g. microarrays)
5
Transcriptional Regulation
Figure labels: Nuclear membrane; Binding site/motif CCG__CCG
Learning problems:
– Understand which regulators control which target genes
– Discover motifs representing regulatory elements
6
Some common approaches
Cluster-first motif discovery
– Cluster genes by expression profile, annotation, … to find potentially coregulated genes
– Find overrepresented motifs in promoter sequences of similar genes (algorithms: MEME, Consensus, Gibbs sampler, AlignACE, …)
(Spellman et al. 1998)
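The second step (finding overrepresented motifs in the promoters of a gene cluster) can be illustrated with a very small k-mer enrichment sketch. This is not MEME, Consensus, a Gibbs sampler, or AlignACE; it is a simplified stand-in that only ranks k-mers by how much more often they occur in the cluster than in a background set. The function names, the pseudocount, and the toy promoter sequences are all hypothetical.

```python
from collections import Counter

def kmer_presence(seqs, k):
    """Count, for each k-mer, how many sequences contain it at least once."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts

def enriched_kmers(cluster, background, k=6, pseudo=1.0):
    """Rank k-mers by (cluster frequency) / (background frequency).

    The pseudocount (an assumption of this sketch) avoids division by zero
    for k-mers that never occur in the background.
    """
    fg = kmer_presence(cluster, k)
    bg = kmer_presence(background, k)
    scores = {}
    for kmer, n in fg.items():
        fg_rate = (n + pseudo) / (len(cluster) + 2 * pseudo)
        bg_rate = (bg[kmer] + pseudo) / (len(background) + 2 * pseudo)
        scores[kmer] = fg_rate / bg_rate
    # Highest ratio first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical toy promoters: the "cluster" shares a TGACC element.
cluster = ["ttaTGACCcga", "gcTGACCatta", "aacgTGACCtt"]
background = ["ttacgatcgga", "gcatgcaatta", "aacgtaggctt", "ccagtccgtaa"]

top = enriched_kmers(cluster, background, k=5)
print(top[0])
```

Real motif finders model degenerate positions and use proper statistics; a ratio of frequencies is only meant to show the idea of overrepresentation.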
7
Training data – Features label promoter sequence regulator expression feature vector
8
What is a PWM?
Transcription factor binding sites (TFBSs) are usually slightly variable in their sequences. A position weight matrix (PWM) specifies the probability that you will see a given base at each index position of the motif.
9
PWM for ERE
Aligned ERE sequences:
1. acggcagggTGACCc
2. aGGGCAtcgTGACCc
3. cGGTCGccaGGACCt
4. tGGTCAggcTGGTCt
5. aGGTGGcccTGACCc
6. cTGTCCctcTGACCc
7. aGGCTAcgaTGACGt
...
41. cagggagtgTGACCc
42. gagcatgggTGACCa
43. aGGTCAtaacgattt
44. gGAACAgttTGACCc
45. cGGTGAcctTGACCc
46. gGGGCAaagTGACTg
Position frequency matrix (PFM) (also known as raw count matrix): Given N sequence fragments of fixed length, one can assemble a position frequency matrix (the number of times a particular nucleotide appears at a given position). A normalized PFM, in which each column adds up to a total of one, is a matrix of probabilities for observing each nucleotide at each position.
Position weight matrix (PWM) (also known as position-specific scoring matrix): The PFM should be converted to log-scale for efficient computational analysis. To eliminate null values before log-conversion, and to correct for small samples of binding sites, a sampling correction, known as pseudocounts, is added to each cell of the PFM.
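As a minimal sketch of the PFM construction described on this slide, the following Python counts nucleotides per position and then normalizes each column. It uses only the 13 ERE sites that are visible on the slide (sites 8 to 40 are not shown), so the counts differ from the full 46-site matrix. The function names `build_pfm` and `normalize` are illustrative, not taken from the presentation.

```python
BASES = "ACGT"

# The 13 aligned ERE sites visible on the slide (sites 8 to 40 are not shown).
sites = [
    "acggcagggTGACCc", "aGGGCAtcgTGACCc", "cGGTCGccaGGACCt",
    "tGGTCAggcTGGTCt", "aGGTGGcccTGACCc", "cTGTCCctcTGACCc",
    "aGGCTAcgaTGACGt", "cagggagtgTGACCc", "gagcatgggTGACCa",
    "aGGTCAtaacgattt", "gGAACAgttTGACCc", "cGGTGAcctTGACCc",
    "gGGGCAaagTGACTg",
]

def build_pfm(sites):
    """Position frequency matrix: raw count of each base at each position."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pfm = {b: [0] * length for b in BASES}
    for s in sites:
        for i, ch in enumerate(s.upper()):
            pfm[ch][i] += 1
    return pfm

def normalize(pfm):
    """Divide each count by N so that every column sums to one."""
    n = sum(pfm[b][0] for b in BASES)
    return {b: [c / n for c in pfm[b]] for b in BASES}

pfm = build_pfm(sites)
ppm = normalize(pfm)

for b in BASES:
    print(b, pfm[b])
```

Every column of `pfm` sums to N (here 13), and every column of the normalized matrix sums to one, which matches the two definitions given on the slide.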
10
Position Weight Matrix for ERE
Converting a PFM into a PWM
For each matrix element, the conversion uses:
– the raw count (PFM matrix element) of nucleotide b in column i
– N: the number of sequences used to create the PFM (= column sum)
– pseudocounts (correction for small sample size)
– p(b): the background frequency of nucleotide b
Resulting PWM (rows: nucleotides; columns: motif positions 1 to 15):
Pos      1      2      3      4      5      6      7      8      9     10     11     12     13     14     15
A     0.58  -0.44  -0.98  -1.21  -2.29   1.22  -0.60      …      …  -2.96  -2.29   1.62  -2.29      …  -0.72
C    -0.44  -1.49      …  -0.30   1.39  -1.21   0.78   0.34   0.25  -2.96      …  -2.29   1.76   1.62   0.46
G     0.16   1.31   1.44  -0.30  -0.44  -0.17  -0.06   0.34   0.65  -1.21   1.79  -1.49  -2.96  -2.29  -0.64
T    -0.60  -1.21      …   0.96  -1.21  -1.49  -0.60  -0.30  -0.78   1.73  -2.29  -1.49  -1.84  -0.98   0.23
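A commonly used form of this conversion, consistent with the quantities listed on the slide, is written out below. The symbol names $f_{b,i}$ (raw count), $s(b)$ (pseudocount) and $W_{b,i}$ (PWM element) are assumptions of this sketch, as is the base-2 logarithm; the slide's own formula may differ in notation.

```latex
p(b,i) = \frac{f_{b,i} + s(b)}{N + \sum_{b' \in \{A,C,G,T\}} s(b')},
\qquad
W_{b,i} = \log_2 \frac{p(b,i)}{p(b)}
```

Here $p(b,i)$ is the pseudocount-corrected probability of nucleotide $b$ in column $i$, and $p(b)$ is the background frequency of $b$. A positive $W_{b,i}$ means the nucleotide is more frequent at that position than expected from the background; a negative value means it is less frequent.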
11
Scoring putative EREs by scanning the promoter with PWM
Site: G G G T C A G C A T G G C C A
Absolute score of the site = 11.57
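Scanning works by sliding the PWM along the sequence and summing, at each offset, the matrix entries that match the bases in the window. The sketch below illustrates this under several assumptions: the 5-bp count matrix is hypothetical (it merely resembles the TGACC half-site), the background is uniform, the total pseudocount is set to the square root of N, and all function names are illustrative.

```python
import math

BASES = "ACGT"

def pfm_to_pwm(pfm, background=None, total_pseudo=None):
    """Convert raw counts to log2-odds weights with a pseudocount correction."""
    n = sum(pfm[b][0] for b in BASES)            # N = column sum
    background = background or {b: 0.25 for b in BASES}
    if total_pseudo is None:
        total_pseudo = math.sqrt(n)              # assumption: sqrt(N) in total
    width = len(pfm["A"])
    pwm = {}
    for b in BASES:
        s_b = total_pseudo * background[b]       # pseudocount for base b
        pwm[b] = [
            math.log2(((pfm[b][i] + s_b) / (n + total_pseudo)) / background[b])
            for i in range(width)
        ]
    return pwm

def score_site(pwm, site):
    """Absolute score of one site: sum of the matching weight per position."""
    return sum(pwm[ch][i] for i, ch in enumerate(site.upper()))

def scan(pwm, seq):
    """Score every window of the motif width; returns (offset, score) pairs.

    Only A, C, G and T are handled; other characters raise KeyError.
    """
    width = len(pwm["A"])
    seq = seq.upper()
    return [(i, score_site(pwm, seq[i:i + width]))
            for i in range(len(seq) - width + 1)]

# Hypothetical counts from N = 10 sites for a TGACC-like motif.
pfm = {
    "A": [0, 1, 9, 0, 1],
    "C": [1, 0, 0, 9, 8],
    "G": [0, 9, 1, 0, 1],
    "T": [9, 0, 0, 1, 0],
}
pwm = pfm_to_pwm(pfm)
hits = scan(pwm, "ggcaTGACCtga")
best_pos, best_score = max(hits, key=lambda h: h[1])
print(best_pos, round(best_score, 2))
```

In practice a score threshold decides which windows are reported as putative sites, and the reverse strand is scanned as well; both are left out here to keep the sketch short.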
12
Yeast ESR: Biological Validation STRE element Universal stress repressor motif
13
Previous work: “Structure learning”
Graphical models (and other methods)
– Learn structure of “regulatory network”, “regulatory modules”, etc.
– Fit interpretable model to training data
– Model small number of genes or clusters of genes
– Many computational and statistical challenges; often used for qualitative hypotheses rather than prediction
(Segal et al. 2003, 2004) (Pe’er et al. 2001)
14
Signaling networks in a cell
15
Network inference
Regulator-motif associations in nodes can have different meanings:
– Direct binding
– Indirect effect
– Co-occurrence
Need other data to confirm binding relationship between regulator and target (e.g. ChIP-chip)
Still, can determine statistically significant regulator-target relationships from regulation program
16
Example: oxygen sensing and regulatory network
17
Binding data for regulatory networks
ChIP-chip: genome-wide protein-DNA binding data, i.e. which promoters are bound by a TF?
Investigate regulatory network model: use ChIP-chip data in place of motifs (no motif discovery)
– Features: (regulator, TF-occupancy) pairs
18
Inferring regulatory networks from the combination of expression data and binding data
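The slide does not say how the two data types are combined. One simple possibility, sketched here purely as an assumption, is to keep a TF-to-gene edge only when the gene's promoter is bound by the TF and the gene's expression also changes. The function name, the threshold, and all TF and gene names and values below are hypothetical placeholders, not data from the presentation.

```python
def infer_edges(bound, expression, min_abs_log2fc=1.0):
    """Return (TF, gene, direction) edges supported by both data types.

    bound:      dict mapping each TF to the set of genes it binds (e.g. ChIP-chip)
    expression: dict mapping each gene to its log2 fold change
    """
    edges = []
    for tf, targets in bound.items():
        for gene in sorted(targets):
            fc = expression.get(gene)
            # Keep the edge only if the gene is bound AND responds in expression.
            if fc is not None and abs(fc) >= min_abs_log2fc:
                edges.append((tf, gene, "up" if fc > 0 else "down"))
    return edges

# Hypothetical placeholder data.
bound = {
    "TF_A": {"gene1", "gene2", "gene3"},
    "TF_B": {"gene3", "gene4"},
}
expression = {"gene1": 2.1, "gene2": 0.2, "gene3": -1.5, "gene4": 0.4, "gene5": 3.0}

edges = infer_edges(bound, expression)
for edge in edges:
    print(edge)
```

Genes that change in expression without being bound (gene5 here) are left out; they may be indirect targets, which is the distinction between direct binding and indirect effects drawn earlier in the presentation.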
19
An extended ER regulatory network in MCF7 cells
Network nodes: CCNL1, BRF1, ER, FOS, MYC, CEBP, XBP1, RXRA, HSF2, PNN, NRIP1, TXNDC, IVNS1ABP, BATF, HES1, CHAF1B, CSDE1, CUTL1, PURB, ADAR, C140RF43, SP3, DDX20, ELF3, TXNIP, PAWR, BRIP1, FOXP4, ZNF394, BAZ1B, STRAP, ASCC3, MKL2, GTF2I, RUVBL1, RFC1, ZNF500, TTF2, RAB18, ZKSCAN1, MSX2, LASS2, HDAC1, ZBTB41, TBX2, THRAP1, VPS72, TLE3, BHLHB2, ZNF38, ZNF239, DNMT1, HIF1A, HEY2
20
Signaling molecules -- Networks
Find all SMs that associate as regulators with a particular TF’s ChIP occupancy in ADT features
e.g. Hypothesis: Glc7 phosphatase complex interacts with Hsf1 in regulation of Hsf1 targets (interaction supported in literature)
Figure labels: Hsf1, Gac1, Gip1, Sds22, Glc7 phosphatase complex; legend: TF, SM, mRNA
21
Input Data: FASTA file, Control data (optional), Contact Info
Ab initio Motif Discovery Programs: Weeder, MaMf, MEME
Statistical Methods: Bootstrap re-sampling, Fisher test
STAMP Matching: Known or novel motifs
Results: SeqLogo, PWM, P-value
http://motif.bmi.ohio-state.edu/ChIPMotifs/
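The Fisher test named among the statistical methods can be illustrated with a one-sided Fisher exact test for motif enrichment in the input sequences relative to control sequences. How the web server actually applies the test is not stated on the slide, so this is only a sketch: the function name, the 2x2 table layout, and the counts in the usage example are hypothetical.

```python
from math import comb

def fisher_enrichment_p(hits_fg, n_fg, hits_bg, n_bg):
    """One-sided Fisher exact P-value for motif enrichment in the foreground.

    hits_fg / n_fg: sequences with the motif / all sequences in the input set
    hits_bg / n_bg: the same counts for the control set
    Returns P(X >= hits_fg) under the hypergeometric null distribution.
    """
    total_hits = hits_fg + hits_bg
    total = n_fg + n_bg
    denom = comb(total, n_fg)
    upper = min(total_hits, n_fg)
    tail = sum(
        comb(total_hits, k) * comb(total - total_hits, n_fg - k)
        for k in range(hits_fg, upper + 1)
    )
    return tail / denom

# Hypothetical counts: motif found in 40 of 100 input sequences
# and in 10 of 100 control sequences.
p = fisher_enrichment_p(40, 100, 10, 100)
print(f"P-value = {p:.3g}")
```

A small P-value indicates that the motif occurs in the input set more often than expected by chance given the control set. Bootstrap re-sampling, the other method listed, is not covered by this sketch.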
22
http://motif.bmi-ohio-state.edu/HRTBLDb
23
Software Demo W-ChIPMotifs HRTargetDB