Transcription factor binding sites and gene regulatory network


1 Transcription factor binding sites and gene regulatory network
Victor Jin Department of Biomedical Informatics The Ohio State University

2 Transcription in higher eukaryotes
Gene expression is controlled at several levels: chromatin structure, initiation of transcription, processing of the transcript, transport to the cytoplasm, mRNA translation, mRNA stability, and protein activity and stability.

3 Transcriptional Regulation
Figure label: nuclear membrane.

4 Transcriptional Regulation
Figure labels: nuclear membrane; binding site/motif (CCG__CCG); genome-wide mRNA transcript data (e.g. microarrays).

5 Learning problems: Transcriptional Regulation
Understand which regulators control which target genes.
Discover motifs representing regulatory elements.
Figure labels: nuclear membrane; binding site/motif (CCG__CCG).

6 Cluster-first motif discovery
Some common approaches: cluster-first motif discovery
Cluster genes by expression profile, annotation, … to find potentially coregulated genes.
Find overrepresented motifs in the promoter sequences of similar genes (algorithms: MEME, Consensus, Gibbs sampler, AlignACE, …) (Spellman et al. 1998).
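The second step above, finding motifs that are overrepresented in the promoters of a gene cluster, can be illustrated with a deliberately simple k-mer enrichment ratio. This is only a sketch and not what MEME, Consensus, Gibbs sampling or AlignACE actually do; the function names, the pseudocount and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter

def kmer_presence(seqs, k):
    """For each k-mer, count how many sequences contain it at least once."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts

def enriched_kmers(cluster, background, k=6, pseudo=1.0):
    """Rank k-mers by (fraction of cluster promoters containing the k-mer)
    divided by (fraction of background promoters containing it).
    The pseudocount keeps the ratio finite when a k-mer is absent
    from the background set."""
    fg = kmer_presence(cluster, k)
    bg = kmer_presence(background, k)
    scores = {}
    for kmer, n in fg.items():
        f = (n + pseudo) / (len(cluster) + 2 * pseudo)
        b = (bg.get(kmer, 0) + pseudo) / (len(background) + 2 * pseudo)
        scores[kmer] = f / b
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy promoter sequences (hypothetical, not real data).
cluster = ["TTACCGAACCGTT", "GGCCGTTCCGAA", "ACCGATCCGTA"]
background = ["TTTTAAAATTTT", "GATTACAGATTA", "CCCCTTTTGGGG", "AATTGGCCAATT"]

ranking = enriched_kmers(cluster, background, k=3)
```

In this toy set, k-mers such as CCG that occur in every cluster promoter and in no background promoter receive the highest ratio. A real analysis would use a statistical test and a model that tolerates degenerate positions.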

7 Training data – Features
Figure labels: regulator expression; promoter sequence; label; feature vector.

8 What is PWM?
Transcription factor binding sites (TFBSs) are usually slightly variable in their sequences. A positional weight matrix (PWM) specifies the probability of seeing a given base at each index position of the motif.
[Table: rows labelled N, C, A, G, T and Con, indexed by position (Pos); the cell values are not recoverable from the transcript.]

9 PWM for ERE
Position frequency matrix (PFM) (also known as raw count matrix)
acggcagggTGACCc
aGGGCAtcgTGACCc
cGGTCGccaGGACCt
tGGTCAggcTGGTCt
aGGTGGcccTGACCc
cTGTCCctcTGACCc
aGGCTAcgaTGACGt
.
cagggagtgTGACCc
gagcatgggTGACCa
aGGTCAtaacgattt
gGAACAgttTGACCc
cGGTGAcctTGACCc
gGGGCAaagTGACTg
Given N sequence fragments of fixed length, one can assemble a position frequency matrix (the number of times a particular nucleotide appears at a given position). A normalized PFM, in which each column adds up to a total of one, is a matrix of probabilities for observing each nucleotide at each position.
Position weight matrix (PWM) (also known as position-specific scoring matrix)
The PFM should be converted to log-scale for efficient computational analysis. To eliminate null values before log-conversion, and to correct for small samples of binding sites, a sampling correction, known as pseudocounts, is added to each cell of the PFM.
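Assembling and normalizing a PFM as described above can be sketched in a few lines. The sites below are the 13 fragments visible on the slide (the slide elides others, so the counts are illustrative only); the function names are assumptions, and case is ignored when counting.

```python
BASES = "ACGT"

# The 13 aligned fragments visible on the slide (others are elided there).
sites = [
    "acggcagggTGACCc", "aGGGCAtcgTGACCc", "cGGTCGccaGGACCt",
    "tGGTCAggcTGGTCt", "aGGTGGcccTGACCc", "cTGTCCctcTGACCc",
    "aGGCTAcgaTGACGt", "cagggagtgTGACCc", "gagcatgggTGACCa",
    "aGGTCAtaacgattt", "gGAACAgttTGACCc", "cGGTGAcctTGACCc",
    "gGGGCAaagTGACTg",
]

def position_frequency_matrix(seqs):
    """Count how often each base occurs at each position of aligned sites."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sites must have the same length")
    pfm = {b: [0] * length for b in BASES}
    for s in seqs:
        for i, base in enumerate(s.upper()):
            pfm[base][i] += 1
    return pfm

def normalize(pfm):
    """Divide each count by the column sum so every column adds up to one."""
    n = sum(pfm[b][0] for b in BASES)
    return {b: [count / n for count in pfm[b]] for b in BASES}

pfm = position_frequency_matrix(sites)
ppm = normalize(pfm)
```

Each column of `pfm` sums to the number of sites, and each column of `ppm` sums to one, which is the normalized PFM the slide refers to.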

10 Converting a PFM into a PWM
Position Weight Matrix for ERE (values as extracted, row by row; the column alignment was lost):
A 0.58 -0.44 -0.98 -1.21 -2.29 1.22 -0.60 -2.96 1.62 -0.72
C -1.49 -0.30 1.39 0.78 0.34 0.25 1.76 0.46
G 0.16 1.31 1.44 -0.17 -0.06 0.65 1.79 -0.64
T 0.96 -0.78 1.73 -1.84 0.23
Converting a PFM into a PWM. For each matrix element, the conversion uses:
raw count (PFM matrix element) of nucleotide b in column i
N: number of sequences used to create the PFM (= column sum)
pseudocounts (correction for small sample size)
p(b): background frequency of nucleotide b
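The conversion formula itself is not present in the transcript. One widely used formulation that matches the quantities listed in the legend is given below as a sketch; the symbol names f_{b,i} (raw count), s(b) (pseudocount) and W_{b,i} (PWM element) are assumptions, and the slide may have used a different pseudocount scheme.

```latex
p(b,i) = \frac{f_{b,i} + s(b)}{N + \sum_{b' \in \{A,C,G,T\}} s(b')},
\qquad
W_{b,i} = \log_2 \frac{p(b,i)}{p(b)}
```

Here p(b,i) is the pseudocount-corrected probability of nucleotide b in column i, N is the column sum of the PFM, and p(b) is the background frequency of nucleotide b, so a positive W_{b,i} means the base is more frequent at that position than in the background.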

11 Scoring putative EREs by scanning the promoter with PWM
Sequence scanned: G G G T C A G C A T G G C C A
PWM for ERE (values as extracted, row by row; the column alignment was lost):
A 0.58 -0.44 -0.98 -1.21 -2.29 1.22 -0.60 -2.96 1.62 -0.72
C -1.49 -0.30 1.39 0.78 0.34 0.25 1.76 0.46
G 0.16 1.31 1.44 -0.17 -0.06 0.65 1.79 -0.64
T 0.96 -0.78 1.73 -1.84 0.23
Absolute score of the site = 11.57
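Scanning a promoter with a PWM means sliding a window of the motif's width along the sequence and summing the matrix entries selected by the bases in each window. The sketch below uses a small hypothetical 4-column PFM rather than the ERE matrix (whose columns did not survive in the transcript), and it assumes a pseudocount distributed according to the background frequencies; both choices, like the function names, are illustrative assumptions.

```python
import math

BASES = "ACGT"

def pfm_to_pwm(pfm, background, pseudo_total=1.0):
    """Convert raw counts to log2-odds weights against a background model.
    The pseudocount for base b is pseudo_total * background[b] (an assumption)."""
    width = len(pfm["A"])
    n = sum(pfm[b][0] for b in BASES)  # number of sites = column sum
    pwm = {b: [] for b in BASES}
    for b in BASES:
        for i in range(width):
            p_bi = (pfm[b][i] + pseudo_total * background[b]) / (n + pseudo_total)
            pwm[b].append(math.log2(p_bi / background[b]))
    return pwm

def score_site(pwm, site):
    """Absolute score of one site: sum of the weights of its bases."""
    return sum(pwm[base][i] for i, base in enumerate(site.upper()))

def scan(pwm, seq):
    """Score every window of motif width along one strand of seq."""
    width = len(pwm["A"])
    return [(i, score_site(pwm, seq[i:i + width]))
            for i in range(len(seq) - width + 1)]

# Hypothetical 4-column PFM built from 8 sites (not the ERE matrix).
toy_pfm = {"A": [0, 0, 8, 0], "C": [0, 0, 0, 7],
           "G": [1, 8, 0, 0], "T": [7, 0, 0, 1]}
uniform = {b: 0.25 for b in BASES}

toy_pwm = pfm_to_pwm(toy_pfm, uniform)
hits = scan(toy_pwm, "ccTGACgg")
best_start, best_score = max(hits, key=lambda h: h[1])
```

The best-scoring window starts where the sequence matches the consensus of the toy matrix. The sketch scores one strand only and assumes the sequence contains only A, C, G and T; a practical scanner would also score the reverse complement and apply a score threshold.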

12 Yeast ESR: Biological Validation
Universal stress repressor motif
Xbp1: universal stress repressor; Tbp1: TATA box; Hap1: hypoxia stress; Cbf1: cell cycle regulator; Gcn4: amino acid (aa)/nitrogen stress; STRE element

13 Graphical models (and other methods)
Previous work: “Structure learning”
Learn structure of “regulatory network”, “regulatory modules”, etc.
Fit interpretable model to training data.
Model small number of genes or clusters of genes.
Many computational and statistical challenges; often used for qualitative hypotheses rather than prediction (Pe’er et al. 2001; Segal et al. 2003, 2004).

14 Signaling networks in a cell

15 Network inference
Regulator-motif associations in nodes can have different meanings: direct binding, indirect effect, or co-occurrence.
Need other data to confirm the binding relationship between regulator and target (e.g. ChIP-chip).
Still, can determine statistically significant regulator-target relationships from the regulation program.
Figure labels: P, Mp, TF, MTF, M (panels: direct binding, indirect effect, co-occurrence).

16 Example: oxygen sensing and regulatory network

17 Binding data for regulatory networks
ChIP-chip: genome-wide protein-DNA binding data, i.e. which promoters are bound by a TF?
Investigate regulatory network model: use ChIP-chip data in place of motifs (no motif discovery).
Features: (regulator, TF-occupancy) pairs.
Figure labels: P1, P2, TF.
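One plausible reading of the "(regulator, TF-occupancy) pairs" is a binary feature that is on when a regulator is in a given expression state in an experiment and a TF occupies the promoter of the gene being predicted. The sketch below encodes that reading only; the data layout, the state coding (+1 up, -1 down, 0 unchanged), the function name and all identifiers such as REG1 and TF1 are hypothetical.

```python
def pair_features(gene, experiment, regulator_state, tf_occupancy):
    """Build binary (regulator, direction, TF) features for one
    (gene, experiment) example.

    regulator_state: {regulator: {experiment: +1 / 0 / -1}}
    tf_occupancy:    {tf: set of genes whose promoter the TF binds}
    """
    features = {}
    for regulator, state_by_exp in regulator_state.items():
        state = state_by_exp[experiment]
        for tf, bound_genes in tf_occupancy.items():
            occupied = gene in bound_genes
            for direction in (+1, -1):
                # Feature is on only if the TF binds the promoter AND
                # the regulator is in this expression state.
                features[(regulator, direction, tf)] = int(
                    occupied and state == direction)
    return features

# Made-up example data.
regulator_state = {
    "REG1": {"exp1": +1, "exp2": 0},
    "REG2": {"exp1": -1, "exp2": 0},
}
tf_occupancy = {"TF1": {"geneA", "geneB"}}

x = pair_features("geneA", "exp1", regulator_state, tf_occupancy)
```

A feature vector of this kind would be paired with a label such as the up- or down-regulation of the gene in that experiment before training a classifier.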

18 Inferring regulatory networks from the combination of expression data and binding data

19 An extended ER regulatory network in MCF7 cells
FOS MYC CEBP XBP1 RXRA HSF2 PNN NRIP1 TXNDC IVNS1ABP BATF HES1 CHAF1B CSDE1 CUTL1 PURB ADAR C14ORF43 SP3 DDX20 ELF3 TXNIP PAWR BRIP1 FOXP4 ZNF394 BAZ1B STRAP ASCC3 MKL2 GTF2I RUVBL1 RFC1 ZNF500 TTF2 RAB18 ZKSCAN1 MSX2 LASS2 HDAC1 ZBTB41 TBX2 THRAP1 VPS72 TLE3 BHLHB2 ZNF38 ZNF239 DNMT1 HIF1A HEY2 CCNL1 BRF1

20 Glc7 phosphatase complex
Signaling molecules -- Networks
Find all signaling molecules (SMs) that associate as regulators with a particular TF’s ChIP occupancy in ADT features.
E.g. hypothesis: the Glc7 phosphatase complex interacts with Hsf1 in regulation of Hsf1 targets (interaction supported in literature).
Figure labels: Hsf1; Gac1, Gip1, Sds22 (Glc7 phosphatase complex); TF; SM; mRNA.

21 http://motif.bmi.ohio-state.edu/ChIPMotifs/
Input data: FASTA file; control data (optional)
Ab initio motif discovery programs: Weeder, MaMF, MEME
Statistical methods: bootstrap re-sampling, Fisher test
STAMP matching
Results: SeqLogo, PWM, P-value, known or novel motifs
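The slide lists a Fisher test among the statistical methods without saying what it is applied to. A common use in this setting, assumed here, is to test whether a motif occurs in more ChIP sequences than control sequences. The sketch below computes a one-sided Fisher exact p-value from a 2x2 table using only the standard library; the function name and the example counts are hypothetical, and this is not a description of the ChIPMotifs implementation.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact test for the 2x2 table
           motif   no motif
    ChIP     a        b
    control  c        d
    Returns P(X >= a) under the hypergeometric null, i.e. the chance of
    seeing at least `a` motif-containing ChIP sequences if motif
    occurrence were independent of the sequence set."""
    chip_total = a + b
    motif_total = a + c
    n = a + b + c + d
    upper = min(chip_total, motif_total)
    numerator = sum(
        comb(chip_total, x) * comb(n - chip_total, motif_total - x)
        for x in range(a, upper + 1)
    )
    return numerator / comb(n, motif_total)

# Hypothetical counts: motif found in 40 of 100 ChIP sequences
# and in 15 of 100 control sequences.
p_value = fisher_exact_greater(40, 60, 15, 85)
```

A small p-value suggests the motif is enriched in the ChIP set relative to the control set under this assumed setup.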

22

23 Software Demo W-ChIPMotifs HRTargetDB

