Published by Jayson Casey. Modified over 9 years ago.
Modeling Promoter and Untranslated Regions in Yeast

Abstract
Transcriptional regulation is the primary form of gene regulation in eukaryotes. Approaches to identifying functional regions based on comparative genomics and microarray expression data have recently been applied to promoter and 3'-untranslated region (UTR) sequences in the yeast genome. Here we combine these approaches to construct a robust set of motifs active in the yeast genome. With this set we consider the combinatorial actions of these motifs and apply a linear model to explain observed expression. A deeper understanding of gene regulation in yeast is the first step toward understanding gene regulation and complex disease in higher organisms.

Data Set
7 yeast strains:
Saccharomyces cerevisiae
Saccharomyces bayanus
Saccharomyces castellii
Saccharomyces kudriavzevii
Saccharomyces mikatae
Saccharomyces kluyveri
Saccharomyces paradoxus
5769 promoters analyzed
1,730,700 DNA nucleotides analyzed per strain
Expression data come from a heat-shock microarray experiment (Stanford Microarray Database, http://smd.stanford.edu/)

Comparative Genomics

Expression Analysis

Purpose
Our goal is to understand how combinations of Transcription Factor Binding Sites (TFBS) on a gene affect its expression in different experimental conditions.

Linear Model
The model predicts the contributions of motifs to a gene's expression level.
Each gene contains zero or more motifs.
Each motif (assumed to be a TFBS) has an "expression factor" score (+/-) for each experiment.
The expression of a gene is the sum of the scores of the motifs it contains.

Calculating the Expression Factor
Gene   Expression Level   Motifs
Y01    0.456              M1 + M2 + M3
Y02    0.745              M2 + M4 + M16
Y03    0.834              M1 + M3 + M10
…
Each gene contributes one linear equation, so the unknowns (M1, M2, …) can be estimated with any linear regression technique, such as least squares.
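The least-squares step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `fit_motif_scores`, the gene names G1 to G4, and the toy scores are all hypothetical, and NumPy's `lstsq` stands in for whichever regression routine was actually used.

```python
import numpy as np

def fit_motif_scores(gene_motifs, expression):
    """Least-squares estimate of one score per motif (hypothetical helper).

    gene_motifs: dict mapping gene name -> set of motif ids it contains
    expression:  dict mapping gene name -> observed expression level
    """
    genes = sorted(gene_motifs)
    motifs = sorted({m for ms in gene_motifs.values() for m in ms})
    col = {m: j for j, m in enumerate(motifs)}

    # Design matrix: A[i, j] = 1 if gene i contains motif j, else 0.
    A = np.zeros((len(genes), len(motifs)))
    for i, g in enumerate(genes):
        for m in gene_motifs[g]:
            A[i, col[m]] = 1.0
    y = np.array([expression[g] for g in genes])

    # Minimise ||A x - y||^2. When there are fewer genes than motifs,
    # lstsq returns the minimum-norm solution.
    x, *_ = np.linalg.lstsq(A, y, rcond=None)
    return dict(zip(motifs, x))

# Toy data (assumed for illustration): expression is generated exactly
# from known scores, so the fit should recover them.
true_scores = {"M1": 0.5, "M2": -0.2, "M3": 0.3}
gene_motifs = {
    "G1": {"M1", "M2"},
    "G2": {"M2", "M3"},
    "G3": {"M1", "M3"},
    "G4": {"M1", "M2", "M3"},
}
expression = {g: sum(true_scores[m] for m in ms) for g, ms in gene_motifs.items()}
scores = fit_motif_scores(gene_motifs, expression)
print(scores)
```

With real data there are far more genes than motifs, so the system is overdetermined and the fitted scores are the ones that minimise the squared error between predicted and observed expression.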
Transcription factor binding sites are not distributed uniformly in promoter regions. The motif CGATGAG most frequently occurs between 60 and 100 nucleotides away from the transcription start site (where transcription of the gene begins).

Assumption
We assume that motifs act independently of one another. Every occurrence of a given motif is assumed to be bound by the same transcription factor and to have the same effect on expression.

Results
331 motifs were found. Linear regression on the heat-shock expression data identified 22 of them as significantly active.
Some motifs and their scores (reverse complement in parentheses):
M66: CCCCTT (AAGGGG), 1.2460824979780836
M218: CAGGGG, 1.209783124842816
M259: CCCTTAA (TTAAGGG), 1.1325379612649848
M264: TAGGGG (CCCCTA), 0.8571825629506061
…

Transcription Factor Binding Sites
The YKL182W gene promoter (the binding sites were highlighted in the original):
AAGTTATAGGGGAAAACTAAAAATATAAGAAAAAAAAAGGTATTGATTGATAAGGAAAAAGAACCAAGGGAAAAAT
ATAAAAAAGTACATTGGGCCTTTTCATACTTGTTATCACTTACATTACAAAGAAGAACAAACAACTTTTTTAAACG
AATTTTCTTTCTTCCTTTTTCAATTTATTAATTCTTTTTTTCCATACAATTCAAGGTCAAATATATTCTTATATGC
TCTTTGAATATTTCTGAAAAATATATAAAGAAAAGAAACTACAAGAACAT

The comparative genomics method uses aligned sequences of several closely related species to find patterns that are conserved across multiple genomes. A high rate of conservation implies that the pattern is functional and important.
Species1: TAATATCAAAATCAATCTCAAAATTACCACCGGTTAGAACTTGG
Species2: TAATGTCAAAATCAATCTCAAAGTTACCACCGGTTAGAACTTGG
Species3: TAATATCAAAATCAATCTCAAAATTACCACCAGTTAGAACCTGA
Species4: TAATATCGAAATCAATCTCAAAATTACCACCGGTTAGAACTTGG
Species5: TGATGTCAAAATCGATCTCGAAATTACCACCAGTCAGGACTTGG
Species6: TAATCTCAAAATCAATTTCAAAATTACCACCCGTCATAACTTGA
Species7: TAATTTCAAAGTCAATTTCAAAGTTACCACCGGTCAAGACTTGA

Positional Analysis

AARON STONESTROM, Division of Biology, University of California, San Diego
BORIS BABENKO, Computer Science and Engineering, University of California, San Diego
YUJING LIANG, Computer Science and Engineering, University of California, San Diego
ELEAZAR ESKIN, Computer Science and Engineering, University of California, San Diego
JAMAL BENHAMIDA, Computer Science and Engineering, University of California, San Diego

Limitations
The approach finds only transcription factors that are activated or deactivated in an experimental condition relative to the control.

Grouping Motifs
Some of the discovered motifs are minor variants or exact reverse complements of each other.
Thus, the motifs were grouped, and each group was assigned a unique id:
M0: CGGTGGCAA, GGTGGCAAG, CGTGGC
M1: AGCTCATCGC, AGCTCATAGC
M2: GCTCATCG, CGATGAGC
M3: AGCTCATCG
…

Annotating the Genes
We can now annotate the genes with the motif groups that were discovered:
Gene Name    Motif Groups
YPR111W      M248, M319, M74
YPR148C      M12, M153, M25
YPR194C      M127, M202, M41
YAL044W-A    M255, M27, M270, M49

Significant Motifs Found
Pattern CGGTGGCAA appeared 15 times and was conserved 15 times, MCS: 100.0
Pattern is conserved on: [YJL001W, YNL155W, YOR052C, YOR259C, YOR260W, YBL022C, YCL043C, YCL042W, YCR092C, YCR093W, YDL148C, YDL147W, YDL070W, YDR427W, YER012W]

Pattern AGCTCATCGC appeared 29 times and was conserved 27 times, MCS: 93.10344827586206
Pattern is conserved on: [YJL109C, YKL191W, YKR024C, YKR025W, YKR081C, YKR082W, YLR014C, YLR015W, YLR106C, YLR107W, YLR336C, YMR049C, YNL248C, YNL247W, YOL125W, YPL094C, YPL093W, YCR057C, YCR072C, YCR087C-A, YDR449C, YHR052W, YHR147C, YHR148W, YHR170W, YIL127C, YIL126W]
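The reverse-complement grouping and the MCS values above can be illustrated with a short Python sketch. It is not the authors' code: the function names are hypothetical, grouping of minor variants is not attempted, and the MCS formula (conserved occurrences as a percentage of all occurrences) is an assumption inferred from the two values reported above.

```python
# Hypothetical helpers; not taken from the original analysis.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(motif):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return motif.translate(_COMPLEMENT)[::-1]

def canonical(motif):
    """Pick one representative for a motif and its reverse complement,
    so that both strands map to the same group key."""
    return min(motif, reverse_complement(motif))

def mcs(conserved, appeared):
    """Assumed definition: percentage of occurrences that are conserved."""
    return 100.0 * conserved / appeared

# The two members of group M2 are reverse complements of each other.
print(reverse_complement("GCTCATCG"))
print(canonical("GCTCATCG") == canonical("CGATGAGC"))

# Counts reported for AGCTCATCGC: conserved 27 times out of 29.
print(mcs(27, 29))
```

Keying a dictionary on `canonical(motif)` is one simple way to merge a motif with its reverse complement before assigning group ids.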