Published by Thomasina Matthews. Modified over 9 years ago.
1
Identification of Transcription Factor Binding Sites
Presenting: Mira & Tali
March 03
2
Goal
Regulatory regions: motif (AGCCA). Is it a binding site?
3
Why bother?
UNDERSTAND:
- Gene expression regulation
- Co-regulation
4
Difficulties
- Multiple factors for a single gene
- Variability in binding sites
  - The nature of the variability is NOT well understood
  - Usually transitions
  - Insertions and deletions are uncommon
- Location, location, location…
5
Experimental methods
- EMSA: electrophoretic mobility shift assay
- Nuclease protection assay
NOT ENOUGH!
6
So, what can we do?
Find conserved sequences in regulatory regions:
1. Define what you want to find
2. Define what is a good result
3. Decide how to find it…
7
Principal methods: global optimum
- Enumerative methods
  - Go over ALL possibilities
  - Take the best one
- Advantage: certainty
- Disadvantage: limited to small search spaces
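The enumerative idea can be sketched in a few lines of Python. This is a minimal illustration, not the method of any of the papers below: the function names are made up for the example, and the raw occurrence count stands in for a real significance score. It also shows why the search space must stay small, since there are 4**k candidates of length k.

```python
from itertools import product

def count_occurrences(seq, kmer):
    """Count (possibly overlapping) occurrences of kmer in seq."""
    k = len(kmer)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer)

def best_kmer_by_enumeration(sequences, k):
    """Score every one of the 4**k candidate k-mers and keep the best one.

    Assumption: the score is the total occurrence count across all
    sequences, a placeholder for a proper over-representation measure.
    """
    best, best_count = None, -1
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        total = sum(count_occurrences(s, kmer) for s in sequences)
        if total > best_count:
            best, best_count = kmer, total
    return best, best_count

# Toy "promoter regions" that all contain AGCCA.
promoters = ["TTAGCCAT", "GGAGCCAA", "AGCCATTT"]
print(best_kmer_by_enumeration(promoters, 5))
```

Because every candidate is scored, the result is guaranteed to be the best under the chosen score, which is the "certainty" advantage listed above.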
8
Principal methods: local optimum
- Gibbs sampling, AlignACE
  - Start somewhere (arbitrary)
  - Next step direction: chosen with probability proportional to what we "gain" from it
  - We can get anywhere with some probability
- Advantage: basically good results, faster
- Disadvantage: you can never know…
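A stripped-down Gibbs sampler makes the three steps concrete. This is a sketch under simplifying assumptions (exactly one site per sequence, fixed motif width, uniform background, pseudocount of 1) and is not the AlignACE implementation; `gibbs_motif_search` and its parameters are names chosen for this example.

```python
import random

def gibbs_motif_search(seqs, w, iters=200, seed=0):
    """Return one sampled motif start position per sequence."""
    rng = random.Random(seed)
    # Start somewhere arbitrary: one random site per sequence.
    pos = [rng.randrange(len(s) - w + 1) for s in seqs]
    for _ in range(iters):
        for i, s in enumerate(seqs):
            # Column counts (pseudocount 1) built from all OTHER sequences.
            counts = [{b: 1.0 for b in "ACGT"} for _ in range(w)]
            for j, t in enumerate(seqs):
                if j != i:
                    for c in range(w):
                        counts[c][t[pos[j] + c]] += 1.0
            total = len(seqs) - 1 + 4.0
            # Weight every candidate start by how well it fits the profile.
            weights = []
            for start in range(len(s) - w + 1):
                p = 1.0
                for c in range(w):
                    p *= counts[c][s[start + c]] / total
                weights.append(p)
            # Move in proportion to the "gain": better fits are more likely,
            # but every position keeps a non-zero probability.
            pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos

promoters = ["TTAGCCAT", "GGAGCCAA", "AGCCATTT"]
print(gibbs_motif_search(promoters, 5))
```

The result depends on the random seed, which is the practical meaning of "you can never know": different runs can settle on different local optima, so samplers of this kind are usually restarted several times.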
9
Articles overview
- Identifying motifs
  - Expression patterns
  - Phylogenetic footprinting
- Identifying networks
  - Common motifs in expression clusters
  - Combinatorial analysis
10
Discovery of novel transcription factor binding sites by statistical overrepresentation
S. Sinha, M. Tompa
Goal: identify binding sites in yeast
- Use sets of co-regulated genes
- Identify over-represented upstream sequences
- Enumeration: YMF algorithm
11
What constitutes a motif? (tailored for S. cerevisiae)
- In S. cerevisiae, typically 6-10 conserved bases: the motif
- Spacers varying in length (1-11 bp)
- Usually located in the middle
- Example: ACCNNNNNNGTT
Taken from SCPD (S. cerevisiae promoter database)
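A spaced motif such as ACCNNNNNNGTT can be matched with a regular expression in which each N stands for any base. The helper below is an illustrative sketch (the name `find_sites` and the test sequence are invented for the example); it uses a lookahead so that overlapping sites are reported too.

```python
import re

def find_sites(seq, motif):
    """Return start positions where motif matches seq; 'N' matches any base."""
    body = "".join("[ACGT]" if ch == "N" else ch for ch in motif)
    # A zero-width lookahead lets overlapping matches be found as well.
    return [m.start() for m in re.finditer(f"(?=({body}))", seq)]

# ACC, six arbitrary bases, then GTT, starting at index 2.
print(find_sites("GGACCTTTAAAGTTCC", "ACCNNNNNNGTT"))  # [2]
```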
12
How do we measure motifs?
- Z-score: motif over-representation
- P max (X): probability of a Z-score >= X
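A z-score compares the observed number of motif occurrences with the number expected by chance, in units of standard deviations. The sketch below assumes a deliberately simple background in which each position matches independently with a fixed probability (a binomial model). That is an assumption made for illustration only: the YMF input listed on the next slide includes a transition matrix, which points to a Markov-chain background rather than this one.

```python
import math

def motif_z_score(observed, n_positions, p_site):
    """Z-score of an occurrence count under a binomial background.

    observed    -- number of times the motif was seen
    n_positions -- number of positions where it could occur
    p_site      -- assumed chance that a given position matches
    """
    expected = n_positions * p_site
    sigma = math.sqrt(n_positions * p_site * (1.0 - p_site))
    return (observed - expected) / sigma

# A 6-base motif under uniform base frequencies: p = 0.25 ** 6.
z = motif_z_score(observed=12, n_positions=10_000, p_site=0.25 ** 6)
print(round(z, 2))
```

A large positive value means the motif appears far more often than the background predicts; a value near zero means the count is unremarkable.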
13
YMF algorithm (Yeast Motif Finder)
INPUT:
- A set of promoter regions
- Motif length l (modest values)
- Maximum number of spacers allowed, w
- Transition matrix
14
YMF algorithm: post-processing
- FindExplanators: artificial over-representation, e.g. TCACGCT (motif) vs. CACGCTA (artifact)
- W-score
- Co-expression score
15
Experiments
- Validate YMF results
  - Run YMF on regulons with known binding sites (SCPD)
- Run YMF on MIPS catalogs (MIPS: Munich Information Center for Protein Sequences)
  - Functional
  - Mutant phenotype
16
Validation
17
New binding sites or false positives?
18
A novel site candidate
19
Further research
- Validation of novel binding sites and transcription factors
- Modification of the algorithm to make it applicable to other organisms
20
Systematic determination of genetic network architecture
Saeed Tavazoie, Jason D. Hughes, Michael J. Campbell, Raymond J. Cho, George M. Church
Goal: identify co-regulated networks of genes in yeast
- Cluster by expression patterns
- Identify upstream sequence patterns
- AlignACE: Aligns Nucleic Acid Conserved Elements
21
Clusters
- Cluster: a group of genes with a similar expression pattern
- A cluster's members
  - Tend to participate in common processes
  - Tend to be co-regulated
22
Clusters 10 -54
23
Identifying motifs
- Using AlignACE, 18 motifs from 12 clusters were found.
- 7 of the motifs found were identified experimentally.
And what about the others?
24
Scanning for more binding sites
- Once a significant motif was found, the whole genome was scanned for it
- Most motifs were cluster specific
25
Why so few motifs?
- Rules for defining a "significant" motif are too stringent
- Post-transcriptional regulation (mRNA stability)
- Some clusters represent "noise"
26
"Tightness"
- "Tightness" of a cluster: how close the members of a particular cluster are to its mean
- There is a strong correlation between the presence of significant motifs and the "tightness" of a cluster
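One way to put a number on "tightness" is the average Euclidean distance between each member's expression profile and the cluster mean. The slide does not give the exact formula used in the paper, so the measure below is an assumption chosen for illustration; with it, a smaller value means a tighter cluster.

```python
import math

def cluster_tightness(profiles):
    """Mean Euclidean distance of each expression profile to the cluster mean.

    profiles -- list of equal-length lists of expression values, one per gene.
    Lower values indicate a tighter cluster (assumed measure, see lead-in).
    """
    n = len(profiles)
    dims = len(profiles[0])
    mean = [sum(p[d] for p in profiles) / n for d in range(dims)]
    return sum(math.dist(p, mean) for p in profiles) / n

tight = [[1.0, 2.0, 3.0], [1.1, 2.0, 2.9], [0.9, 2.1, 3.0]]
loose = [[1.0, 2.0, 3.0], [3.0, 0.5, 1.0], [0.0, 4.0, 2.0]]
print(cluster_tightness(tight) < cluster_tightness(loose))  # True
```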
27
Things to remember
- Discovering regulons and motifs using expression-based clustering
- Minimal biases
- Validation as a methodology for new organisms
- Identifying the expected cis-regulatory motif EACH TIME!
28
Identifying regulatory networks by combinatorial analysis of promoter elements
Yitzhak Pilpel, Priya Sudarsanam & George M. Church
Goals:
- Understand the transcriptional network
- Identify motif combinations affecting expression patterns in yeast
29
Basic definitions
- Expression coherence (EC) score
- Synergistic motifs: EC(a&b) > EC(a\b) and EC(a&b) > EC(b\a), where a\b denotes genes that have motif a but not motif b
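The EC definition itself is not spelled out on the slide. The sketch below assumes a common formulation: EC is the fraction of gene pairs in a set whose expression profiles lie within a distance threshold of each other. The threshold value, the use of Euclidean distance, and the function names are all assumptions made for this example. The synergy test is the inequality from the slide.

```python
import math
from itertools import combinations

def expression_coherence(profiles, threshold):
    """Fraction of gene pairs whose profiles are within `threshold` (assumed EC)."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    close = sum(1 for x, y in pairs if math.dist(x, y) <= threshold)
    return close / len(pairs)

def is_synergistic(ec_both, ec_only_a, ec_only_b):
    """Slide criterion: EC(a&b) exceeds both EC(a\\b) and EC(b\\a)."""
    return ec_both > max(ec_only_a, ec_only_b)

# Three toy profiles: only the first two are close to each other.
genes = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
print(expression_coherence(genes, threshold=1.5))
print(is_synergistic(ec_both=0.4, ec_only_a=0.1, ec_only_b=0.2))  # True
```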
30
Methods:
- A database of motifs
- Gene sets
- Calculating the EC score
- Significant synergistic combinations
- Visualizing the transcriptional network
- Understanding the effect of individual motifs and of motif combinations
31
GMC
- GMC: Gene Motif Combination
- Motif numbers: (m1, m2, m3, m4, m5) = (1,0,1,1,0)
- Synergistic motif combination: EC(n motifs) > max(EC(n-1 motifs))
- GMC: what is it good for?
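A GMC can be represented as a tuple of 0/1 flags, one per motif, and genes sharing the same tuple can be grouped together. The gene names and motif labels below are hypothetical and serve only to reproduce the (1,0,1,1,0) example from the slide.

```python
from collections import defaultdict

def gene_motif_combination(gene_motifs, motif_order):
    """Encode which motifs a gene's promoter contains as a tuple of 0/1 flags."""
    return tuple(1 if m in gene_motifs else 0 for m in motif_order)

def group_by_gmc(genes, motif_order):
    """Map each GMC tuple to the list of genes that carry exactly that combination."""
    groups = defaultdict(list)
    for gene, motifs in genes.items():
        groups[gene_motif_combination(motifs, motif_order)].append(gene)
    return dict(groups)

motif_order = ["m1", "m2", "m3", "m4", "m5"]
genes = {
    "geneA": {"m1", "m3", "m4"},   # hypothetical gene names
    "geneB": {"m1", "m3", "m4"},
    "geneC": {"m2"},
}
print(group_by_gmc(genes, motif_order))
```

Each group of genes can then be scored for expression coherence, which is how a combination of n motifs can be compared against its (n-1)-motif subsets.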
32
Combinograms
Clustering GMCs
33
Combinograms: what are they good for?
- They help visualize the "single motif - specific expression pattern" connection
- They also show which motif is more critical in determining the expression pattern
34
Motif synergy map: visualizing transcription networks
35
Conclusion
- The importance of the combinogram
- The importance of the motif synergy map
36
Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes
Lee Ann McCue, William Thompson, C. Steven Carmack, Michael P. Ryan, Jun S. Liu, Victoria Derbyshire and Charles E. Lawrence
Goals:
- Identifying novel TF binding sites in E. coli
- Describing the transcription regulatory network
- Finding orthologs
- Identifying upstream sequence patterns
- Local optimum: Gibbs sampling algorithm
37
Methods:
- Data set: one E. coli gene and its orthologs
- Gibbs sampling algorithm
- Motif
- MAP score: a measure of over-representation of the motif
38
Applying the method on a small scale: validation
- Choosing 190 E. coli genes
- Creating 184 data sets
- Running the Gibbs sampling algorithm
- More than 67% success in predicting the most probable motif
39
Motif Model
40
Identification of the YijC binding sites
- A strongly predicted site was upstream of the fabA, fabB and yqfA genes
- Chromatography: identifying the factor
41
Identifying the YijC binding sites and predicting gene function
- Mass spectrometry identification: YijC
- Predicting a function for yqfA
[Figure labels: weight; fabA, fabB, yqfA, fadB]
42
Applying the method genome-wide
- Choosing 2113 E. coli ORFs
- For 2097 of them, a TF binding site was predicted
43
MAP scores: ortholog distribution
[Figure panels: study set; full set]
44
Adding binding sites for known TFs
- Building a TF binding site model for known TFs
- Scanning E. coli upstream regions
- 187 new probable sites
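Building a site model and scanning with it can be illustrated with a log-odds position weight matrix. This is a generic sketch, not the model used in the paper: the uniform 0.25 background, the pseudocount, the score threshold, and the example sites are all assumptions chosen for the illustration.

```python
import math

def build_log_odds(sites, pseudocount=1.0):
    """Build a log-odds matrix (one dict per column) from aligned known sites."""
    width = len(sites[0])
    matrix = []
    for c in range(width):
        col = {b: pseudocount for b in "ACGT"}
        for s in sites:
            col[s[c]] += 1.0
        total = sum(col.values())
        # Log-odds against an assumed uniform background of 0.25 per base.
        matrix.append({b: math.log2((col[b] / total) / 0.25) for b in "ACGT"})
    return matrix

def scan(seq, matrix, min_score):
    """Return (start, score) for every window scoring at least min_score."""
    width = len(matrix)
    hits = []
    for i in range(len(seq) - width + 1):
        score = sum(matrix[c][seq[i + c]] for c in range(width))
        if score >= min_score:
            hits.append((i, score))
    return hits

known_sites = ["TGACA", "TGACA", "TGACT"]   # made-up aligned sites
model = build_log_odds(known_sites)
print(scan("CCTGACAGG", model, min_score=5.0))
```

The choice of threshold controls the trade-off between missing real sites and reporting chance matches.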
45
Building a regulatory network
- Required steps:
  - Identifying motif models
  - Clustering the models
- Problem: specificity
46
Conclusion
What have we gained so far?
- A better prediction of gene function
- New possibilities for identifying TF binding sites and the TFs that bind them!