Published by Kianna Estabrook. Modified over 9 years ago.
1
Periodic clusters
2
Non-periodic clusters
3
That was only the beginning…
4
The human cell cycle: G1 phase, S phase, G2 phase, M phase
5
The proliferation cluster genes are cell cycle periodic. [Figure: gene expression across samples; labels G2/M, G1/S, CHR; proportion shown for all genes vs. proliferation genes.]
6
The cell cycle motifs are enriched among the periodic genes. [Figure: motifs NFY, E2F, ELK1, CDE, and CHR on an axis marked relative to the TSS; annotation: not in the cluster, mutated in cancer.] Tabach et al., Mol Sys Biol 2005
7
Potential regulatory motifs in 3’ UTRs: finding 3’ UTR elements associated with high/low transcript stability (in yeast). [Figure labels: AAGCTTCCCCTACAAC; entire genome.]
8
Reverse the inference flow. [Diagram: expression over time/tissues, clustering, motif finding; in reverse, diagnosing motifs using expression.]
9
Once we reverse the inference order we can: enumerate and score all possible k-mer motifs; examine the effect of “mutations” on motifs; examine the effect of motif location within the promoter; examine the effect of motif combinations and of distances within a combination. More?
10
…But the correlation between gene cluster and motifs is imprecise in both directions: there are genes in the cluster without the motif, and many genes with the motif do not respond. If gene control is multifactorial, groups of genes defined by a common motif will not be mutually disjoint, so partitioning the data into disjoint clusters will cause loss of information.
11
A k-mer enumeration method: score every possible k-mer for an association with expression level. A_g is the expression level of gene g. C is a basal expression level (the same for all genes g). The integer N_μg is the number of occurrences of motif μ in gene g. M is a set of motifs. F_μ is the increase/decrease in expression level caused by the presence of motif μ (the same for all genes g).
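The definitions on this slide suggest an additive model of the form A_g = C + Σ_{μ∈M} F_μ · N_μg; the formula itself is not in the transcript, so that form is an assumption here. The sketch below simplifies further by fitting one k-mer at a time with ordinary least squares and ranking k-mers by the fraction of variance explained. The function names and toy data are hypothetical.

```python
from itertools import product


def count_occurrences(motif, sequence):
    """Count (possibly overlapping) occurrences of motif in sequence."""
    width = len(motif)
    return sum(1 for i in range(len(sequence) - width + 1)
               if sequence[i:i + width] == motif)


def fit_single_motif(counts, expression):
    """Least-squares fit of A_g = C + F * N_g for one motif.

    Returns (C, F, r_squared). Assumed single-motif simplification.
    """
    n = len(counts)
    mean_n = sum(counts) / n
    mean_a = sum(expression) / n
    var_n = sum((x - mean_n) ** 2 for x in counts)
    var_a = sum((y - mean_a) ** 2 for y in expression)
    if var_n == 0 or var_a == 0:
        # Motif count (or expression) is constant: no association to fit.
        return mean_a, 0.0, 0.0
    cov = sum((x - mean_n) * (y - mean_a)
              for x, y in zip(counts, expression))
    slope = cov / var_n
    return mean_a - slope * mean_n, slope, cov * cov / (var_n * var_a)


def score_all_kmers(promoters, expression, k):
    """Score every possible k-mer; best-explaining k-mers come first."""
    results = []
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        counts = [count_occurrences(kmer, seq) for seq in promoters]
        _, effect, r_squared = fit_single_motif(counts, expression)
        results.append((kmer, effect, r_squared))
    return sorted(results, key=lambda item: item[2], reverse=True)


# Toy data (made up): two promoters share ACG/CGC and are expressed.
toy_promoters = ["ACGCGT", "TTACGC", "TTTTTT", "GGGGGG"]
toy_expression = [2.0, 1.0, 0.0, 0.0]
ranked = score_all_kmers(toy_promoters, toy_expression, 3)
```

A full treatment would fit several motifs jointly, since the slide defines a set M; the one-at-a-time fit is only meant to show how N_μg and F_μ relate to expression.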
12
Motif characterization through Expression Coherence (EC). [Figure: two panels of expression level vs. time, one with EC score = 0.05 and one with EC score = 0.5.] ScanACE (Hughes et al.)
13
Expression coherence score, intuition. [Figure: four example gene sets (1 to 4) drawn as points, with a threshold distance D; EC1 = 0, EC2 = 0.66, EC3 = 0.2, EC4 = 0.2.]
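One way to make the intuition concrete is to take the EC score of a gene set as the fraction of gene pairs whose expression profiles lie closer than the threshold distance D. The sketch below assumes that reading, along with Euclidean distance and per-gene normalization to zero mean and unit variance; none of these details are spelled out on the slide, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def normalize(profile):
    """Scale a profile to zero mean and unit variance (assumed step)."""
    mean = sum(profile) / len(profile)
    sd = sqrt(sum((x - mean) ** 2 for x in profile) / len(profile))
    if sd == 0:
        return [0.0] * len(profile)
    return [(x - mean) / sd for x in profile]


def expression_coherence(profiles, threshold):
    """Fraction of gene pairs whose normalized profiles are closer than threshold."""
    normalized = [normalize(p) for p in profiles]
    pairs = list(combinations(normalized, 2))
    if not pairs:
        return 0.0
    close = sum(
        1 for a, b in pairs
        if sqrt(sum((x - y) ** 2 for x, y in zip(a, b))) < threshold
    )
    return close / len(pairs)


# Toy profiles (made up): the first two rise together, the third falls.
toy_profiles = [[1, 2, 3], [2, 4, 6], [3, 2, 1]]
toy_ec = expression_coherence(toy_profiles, threshold=1.0)
```

With these toy profiles only one of the three pairs falls within the threshold, so the score is one third.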
14
Interaction of motifs. [Figure: expression level panels for genes with only M1, only M2, and M1 AND M2; label G2.]
15
Synergistic motifs: a combination of two motifs is called ‘synergistic’ if the expression coherence score of the genes that have both motifs is significantly higher than the scores of the genes that have either of the motifs. [Figure labels: SFF, Mcm1.]
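The definition can be illustrated with a small sketch that splits genes by motif content and then compares EC scores. The slide does not say how significance is assessed, so the fixed margin used here is a stand-in assumption for a proper statistical test, and the function names are hypothetical.

```python
def partition_by_motifs(genes_with_m1, genes_with_m2):
    """Split genes into those with only M1, only M2, or both motifs."""
    m1, m2 = set(genes_with_m1), set(genes_with_m2)
    return {"only_m1": m1 - m2, "only_m2": m2 - m1, "both": m1 & m2}


def is_synergistic(ec_both, ec_m1, ec_m2, margin=0.1):
    """True if the two-motif EC exceeds both single-motif ECs by margin.

    The margin is an assumed placeholder for a significance test.
    """
    return ec_both > max(ec_m1, ec_m2) + margin


# Toy gene lists (made up).
toy_groups = partition_by_motifs(["g1", "g2", "g3"], ["g2", "g3", "g4"])
```

The EC scores passed to `is_synergistic` would be computed separately on the expression profiles of each gene group.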
16
A global map of combinatorial expression control: high connectivity; hubs; alternative partners in various conditions. Pilpel et al., Nature Genetics 2001
17
Deduced network properties: TF-TF interaction, hierarchy, sufficiency, necessity. [Figure: correlation vs. expression coherence for G1 and G2; labels Mbp1, Ndt80, Ume6, Fkh1, Swi4, MCB, MSE, URS1, SCB, MCM1', SFF'.] Ho et al., Nature 2002
18
Detect the effect of mutations in a motif
19
Distance and orientation of motifs affect expression profiles. [Figure: motif positions relative to the ATG.]
20
Some typical expression patterns
21
A Bayesian approach (conditional probability). X_i could be “1” to denote: the presence of motif m; its distance from the TSS is < N; it is on the coding strand; it neighbors another motif m’. Or “0” otherwise. e_i = being expressed in pattern i.
22
Example: two rRNA processing motifs. The two motifs work together. The two motifs’ orientation matters.
23
The procedure. Given that P(N|D) = P(N) * P(D|N) / P(D): search the space of possible networks N for one that maximizes this probability. It is impossible to enumerate all possible networks. Use cross-validation: partition the data into 5 gene sets, learn the rules from all but one set, and test on the left-out set, each time.
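A minimal sketch of the two ingredients named on this slide: picking the candidate network with the largest P(N) * P(D|N) (P(D) is the same for every candidate, so it can be dropped), and a 5-fold split of the genes. The candidate networks, their probabilities, and the function names are hypothetical; a real search would explore the network space heuristically rather than score a fixed list.

```python
def most_probable(candidates):
    """Return the candidate maximizing prior * likelihood.

    candidates maps a network name to (P(N), P(D|N)).
    """
    return max(candidates, key=lambda n: candidates[n][0] * candidates[n][1])


def k_fold_splits(items, k=5):
    """Yield (train, test) lists; each item is tested exactly once."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Toy inputs (made up).
toy_candidates = {"network_a": (0.5, 0.1), "network_b": (0.5, 0.3)}
best = most_probable(toy_candidates)

toy_genes = ["g%d" % i for i in range(12)]
toy_splits = list(k_fold_splits(toy_genes, k=5))
```

In each round the rules would be learned on `train` and evaluated on `test`, so every gene is predicted by a model that never saw it.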
24
For example: what does it take to belong to expression pattern (4)? A gene needs to have both RRPE and PAC. If PAC is not within 140 bp of the ATG but RRPE is within 240 bp, then the probability of pattern 4 is 22%. If PAC is within 140 bp and RRPE is within 240 bp, then the probability is 100%.
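The two stated cases can be written as a small lookup. The sketch below encodes only what the slide gives and returns `None` for every combination it leaves unspecified; treating "within" as an inclusive bound is an assumption, and the function name is hypothetical.

```python
def pattern4_probability(has_rrpe, has_pac, rrpe_distance, pac_distance):
    """Probability of expression pattern 4 for the cases stated on the slide.

    Distances are in bp from the ATG. Returns None when the slide gives
    no number for the combination.
    """
    if not (has_rrpe and has_pac):
        return None  # both motifs are required; no probability is quoted
    rrpe_near = rrpe_distance <= 240  # inclusive bound is an assumption
    pac_near = pac_distance <= 140    # inclusive bound is an assumption
    if rrpe_near and pac_near:
        return 1.0
    if rrpe_near and not pac_near:
        return 0.22
    return None  # RRPE farther than 240 bp: not covered by the slide


close_both = pattern4_probability(True, True, rrpe_distance=200, pac_distance=100)
far_pac = pattern4_probability(True, True, rrpe_distance=200, pac_distance=300)
```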
25
Inferring various logical conditions (“gates”) on motif combinations
26
The Bayesian network predicts expression profiles very accurately
27
Can make useful predictions in worm
30
The modern synthetic approach
31
Motif discovery from evolutionary conservation data
32
Species compared: S. cerevisiae; S. mikatae, S. kudriavzevii, and S. bayanus; S. castellii; S. kluyveri. The intergenic sequences of S. mikatae, S. kudriavzevii, and S. bayanus average 59 to 67% identity to their S. cerevisiae orthologs in global alignments. S. castellii and S. kluyveri show ~40% identity to S. cerevisiae.
33
Nucleotide conservation in promoters is highest close to the TSS. [Figure: curves for TATA-containing genes and for all genes.]
35
A set of discovered motifs
36
NATURE | VOL 434 | 17 MARCH 2005
37
The data. Examined intergenic regions of human, mouse, rat, and dog for ~18,000 genes. “Promoters”: 4 kb centered on the TSS. 3’ UTRs: based on RNA annotations. Totals: 64 Mb and 15 Mb, respectively. Negative control: introns, ~120 Mb. Percentage of alignable sequence: promoters 51% (44% upstream and 58% downstream of the TSS), 3’ UTRs 73%, introns 34%, entire genome 28%.
38
The phylogenetic trees Questions: How would addition of species affect analyses? What if the sequences were not only mammalian?
39
An example: a known binding site of Err-α in the GABPA promoter. Question: what is the “meaning” of the other conserved positions?
40
Discovery of new motifs: exhaustive enumeration of all 6-mers
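Exhaustive enumeration means visiting all 4^6 = 4096 possible 6-mers. The sketch below tallies, for each 6-mer, how many of its occurrences in a reference sequence are identical at the same positions in every aligned ortholog. This conserved/total tally is an assumed scoring scheme for illustration, the alignments are taken to be gap-free strings of equal length, and the function names and toy sequences are hypothetical.

```python
from itertools import product


def all_kmers(k, alphabet="ACGT"):
    """Return every possible k-mer over the alphabet."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def conservation_counts(reference, orthologs, k=6):
    """Map each k-mer to [conserved occurrences, total occurrences].

    An occurrence is conserved if every ortholog has the same k-mer
    at the same aligned position (assumed gap-free alignment).
    """
    counts = {kmer: [0, 0] for kmer in all_kmers(k)}
    for i in range(len(reference) - k + 1):
        window = reference[i:i + k]
        if window not in counts:
            continue  # skip windows with N or other non-ACGT symbols
        counts[window][1] += 1
        if all(seq[i:i + k] == window for seq in orthologs):
            counts[window][0] += 1
    return counts


# Toy aligned sequences (made up); they differ at one position.
toy_reference = "ACGTACGTAA"
toy_orthologs = ["ACGTACTTAA"]
toy_counts = conservation_counts(toy_reference, toy_orthologs, k=6)
```

Dividing the conserved count by the total gives a per-motif conservation rate that can be compared against a background such as introns.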
42
Targets of new motifs showed defined expression patterns
43
Motifs often show clear positional bias – close to TSS
44
The same methods, applied to 3’ UTRs, reveal strand-specific motifs