Cis-regulatory evolution of duplicate genes in yeasts

1 Cis-regulatory evolution of duplicate genes in yeasts
Gaurav Moghe January-February 2009

2 Background
Post-WGD species: S. cerevisiae, S. bayanus, S. castellii, C. glabrata
Pre-WGD species: S. kluyveri, K. lactis, E. gossypii

3 Goal
[Figure: Scer and Scas compared with the pre-WGD species Klac and Ago]

4 Sequences used for the study
- Genome sequences downloaded from GenBank and SGD.
- ORF sequences for post-WGD species downloaded from SGD; upstream sequences extracted using the ORF location information.
- Upstream sequences for pre-WGD species downloaded from RSAT.
- PWMs for 124 TFs obtained from MacIsaac et al. (2006).
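
Upstream extraction from ORF coordinates can be sketched as follows. This is a minimal illustration, not the pipeline actually used: it assumes the genome is held as a dict of chromosome strings, ORF coordinates are 1-based and inclusive, and the upstream length (a hypothetical default of 1000 bp) is a free parameter, since the slide does not state the length used.

```python
from typing import Dict

# Complement table for plain DNA (A/C/G/T/N, either case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_sequence(genome: Dict[str, str], chrom: str, start: int, end: int,
                      strand: str, length: int = 1000) -> str:
    """Extract up to `length` bp immediately upstream of an ORF.

    Assumes 1-based inclusive coordinates with start <= end. For '-' strand
    ORFs the upstream region lies to the right of `end` and is returned
    reverse-complemented, so the result always reads 5'->3' toward the ORF.
    """
    chrom_seq = genome[chrom]
    if strand == "+":
        lo = max(0, start - 1 - length)  # clip at the chromosome start
        return chrom_seq[lo:start - 1]
    if strand == "-":
        hi = min(len(chrom_seq), end + length)  # clip at the chromosome end
        return reverse_complement(chrom_seq[end:hi])
    raise ValueError(f"unknown strand: {strand!r}")
```

Clipping at chromosome ends means ORFs near telomeres simply return shorter upstream regions rather than raising an error.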

5 Motif Searches
- Search the genome using MAST: 106 million sites.
- Are the MacIsaac sites being predicted? Each site has a confidence (p-value) associated with it.
- Generate p-value thresholds based on these sites.
- Filter the other predictions using these thresholds: 1.4 million sites.
- Map the filtered predictions to intergenic regions.
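
The thresholding step can be sketched like this. It is an assumption-laden illustration: the per-TF threshold is taken as the loosest p-value at which MAST still recovers a known MacIsaac site, hits are matched to known sites by exact position, and the `(tf, chrom, position, p_value)` record layout is hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical MAST hit record: (tf, chrom, position, p_value).
Hit = Tuple[str, str, int, float]


def thresholds_from_known_sites(hits: List[Hit],
                                known: Dict[str, Set[Tuple[str, int]]]) -> Dict[str, float]:
    """Per-TF threshold = largest p-value among hits that hit a known site.

    `known` maps TF -> set of (chrom, position) for MacIsaac sites.
    Exact-position matching is a simplification; real sites may be offset.
    """
    thresholds: Dict[str, float] = {}
    for tf, chrom, pos, p in hits:
        if (chrom, pos) in known.get(tf, set()):
            thresholds[tf] = max(thresholds.get(tf, 0.0), p)
    return thresholds


def filter_hits(hits: List[Hit], thresholds: Dict[str, float]) -> List[Hit]:
    """Keep hits at or below their TF's threshold.

    TFs with no recovered known site get no threshold, so their hits are dropped.
    """
    return [h for h in hits if h[0] in thresholds and h[3] <= thresholds[h[0]]]
```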

6 Motif Searches
[Figure: PWM used by MAST to scan genomes]
Are the MacIsaac sites being predicted?
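
How a PWM scores candidate sites can be illustrated with a plain log-odds scan. This is only a sketch of the general idea: MAST itself additionally converts scores to p-values, scans both strands and can combine several motifs, none of which is shown here. The pseudocount of 0.5 and the uniform background in the usage example are assumptions.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def log_odds_matrix(counts: List[Dict[str, float]],
                    background: Dict[str, float],
                    pseudocount: float = 0.5) -> List[Dict[str, float]]:
    """Convert per-position base counts into log2-odds scores against a background."""
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0.0) for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2((column.get(b, 0.0) + pseudocount) / total / background[b])
            for b in BASES
        })
    return matrix


def scan(seq: str, matrix: List[Dict[str, float]]) -> List[Tuple[int, float]]:
    """Score every forward-strand window; windows containing non-ACGT bases are skipped."""
    width = len(matrix)
    scores = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width].upper()
        if any(b not in BASES for b in window):
            continue
        scores.append((i, sum(col[b] for col, b in zip(matrix, window))))
    return scores
```

For example, a three-column matrix built from counts favouring `GAT` scores the window starting at index 2 of `CCGATCC` highest.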

7 Nature of PWMs
Split the PWMs into 6 groups based on their length and alphabet (length > 6: large; < 6: small):
Class          Example
Best large     gCATGTGAA
Best small     GATAA
Better large   tGCTGg..
OK large       .tCGG.YsWATGGRr
OK small       wGACkC
Poor large     wwwwsyGGGG
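
The two properties used for grouping (length and alphabet) can be summarised per consensus string as below. This sketch assumes uppercase letters mark strongly specified positions, lowercase letters weakly specified ones, and '.' unspecified ones; it deliberately does not encode the grouping rule itself, which the slide does not give.

```python
from typing import NamedTuple


class ConsensusFeatures(NamedTuple):
    length: int       # all positions, including '.' positions
    strong: int       # uppercase letters (assumed strongly specified)
    weak: int         # lowercase letters (assumed weakly specified)
    unspecified: int  # '.' positions


def consensus_features(consensus: str) -> ConsensusFeatures:
    """Summarise a consensus string by its length and letter case."""
    return ConsensusFeatures(
        length=len(consensus),
        strong=sum(c.isupper() for c in consensus),
        weak=sum(c.islower() for c in consensus),
        unspecified=consensus.count("."),
    )
```

For instance, `tGCTGg..` gives 8 positions, 4 strong, 2 weak and 2 unspecified.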

8 Does the size/nature of a PWM have anything to do with false positives?
TF     Consensus        Class
PHO2   AYTAAr           OK small
RCS1   tgCACCy          Better large
SWI6   rACGCG           Best small
MSN2   mAGGGG.          Best large
SUT1   .gCsGgg          OK large
SWI5   tGCTGg..         Better large
SKN7   kCyrgsCc         Poor large
YAP5   ARrCAT
CST6   tgCATTT.
SOK2   .cAGGmAm
- No. These 10 TFs account for ~1.1 million of the 1.4 million sites.
- There is no relation between PWM size/nature and false positives.
- The results are no good for many other TFs either.
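
The share of filtered predictions contributed by the most prolific TFs is a simple count. A minimal sketch, assuming one TF label per predicted site; with the slide's figures the share is about 1.1 / 1.4, roughly 79%.

```python
from collections import Counter
from typing import Iterable


def top_k_share(tf_labels: Iterable[str], k: int = 10) -> float:
    """Fraction of all predicted sites contributed by the k TFs with the most sites."""
    counts = Counter(tf_labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in counts.most_common(k))
    return top / total
```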

9 Then…
We decided to use only the MacIsaac sites for searching across species:
- Map the MacIsaac sites onto the intergenic regions.
- Look at their loss patterns in orthologous promoters in other species.
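
Mapping sites onto intergenic regions is an interval lookup. A sketch under stated assumptions: regions on a chromosome do not overlap, coordinates are 1-based inclusive, and a site counts only if it lies entirely inside a region (an overlap rule would be an equally valid choice).

```python
import bisect
from typing import Dict, List, Optional, Tuple

# An intergenic region: (start, end, name), 1-based inclusive coordinates.
Region = Tuple[int, int, str]
Index = Dict[str, Tuple[List[int], List[Region]]]


def build_index(regions: Dict[str, List[Region]]) -> Index:
    """Sort each chromosome's regions by start and keep the starts for bisection.

    Assumes regions on the same chromosome do not overlap.
    """
    index: Index = {}
    for chrom, regs in regions.items():
        ordered = sorted(regs)
        index[chrom] = ([r[0] for r in ordered], ordered)
    return index


def region_for_site(index: Index, chrom: str, start: int, end: int) -> Optional[str]:
    """Name of the intergenic region fully containing [start, end], or None."""
    if chrom not in index:
        return None
    starts, ordered = index[chrom]
    # With non-overlapping regions, the rightmost region starting at or
    # before the site is the only one that could contain it.
    i = bisect.bisect_right(starts, start) - 1
    if i < 0:
        return None
    r_start, r_end, name = ordered[i]
    return name if r_start <= start and end <= r_end else None
```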

10 Orthologous genes
- Orthologous genes obtained from the Yeast Gene Order Browser (YGOB).
- The YGOB gene names do not correspond to the SGD gene names for the sensu stricto species.
- BLASTp used to find which YGOB annotation corresponds to which SGD annotation.
- Some genes are lost in this process.
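
One way to turn BLASTp output into a YGOB-to-SGD name mapping is reciprocal best hits. The slide only says BLASTp was used, so the reciprocal-best-hit rule is an assumption here. The sketch reads BLAST+ tabular output (`-outfmt 6`, where column 12 is the bit score); genes without a reciprocal partner drop out, which is one way genes can be lost in this step.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def read_outfmt6(path: str) -> List[List[str]]:
    """Read BLAST tabular (-outfmt 6) rows as lists of strings."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))


def best_hits(rows: Iterable[List[str]]) -> Dict[str, Tuple[str, float]]:
    """Best subject per query, ranked by bit score (12th column)."""
    best: Dict[str, Tuple[str, float]] = {}
    for row in rows:
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def reciprocal_best_hits(ygob_vs_sgd: Iterable[List[str]],
                         sgd_vs_ygob: Iterable[List[str]]) -> Dict[str, str]:
    """YGOB name -> SGD name, kept only when each is the other's best hit."""
    forward = best_hits(ygob_vs_sgd)
    reverse = best_hits(sgd_vs_ygob)
    return {q: s for q, (s, _) in forward.items()
            if s in reverse and reverse[s][0] == q}
```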

12 Using MAST
[Figure: Scer MacIsaac sites vs. Sbay MAST predictions. Ideal case; also observed for some TFs]
Scer: Saccharomyces cerevisiae; Sbay: Saccharomyces bayanus

13 Using MAST
[Figure: Scer MacIsaac sites vs. Sbay MAST predictions]
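
Loss patterns in orthologous promoters can be tabulated as presence/absence per TF. This is a simplified sketch: it ignores site position and alignment, and only asks whether the orthologous promoter (for example in Sbay) carries any predicted site for the same TF. The input dictionaries are hypothetical structures.

```python
from typing import Dict, Set


def site_retention(scer_sites: Dict[str, Set[str]],
                   other_sites: Dict[str, Set[str]],
                   orthologs: Dict[str, str]) -> Dict[str, Dict[str, bool]]:
    """Per Scer gene and TF, whether the orthologous promoter also has a site.

    scer_sites:  Scer gene -> TFs with a MacIsaac site in its promoter
    other_sites: other-species gene -> TFs predicted in its promoter
    orthologs:   Scer gene -> orthologous gene in the other species
    Genes without an ortholog are skipped.
    """
    result: Dict[str, Dict[str, bool]] = {}
    for gene, tfs in scer_sites.items():
        partner = orthologs.get(gene)
        if partner is None:
            continue
        present = other_sites.get(partner, set())
        result[gene] = {tf: tf in present for tf in sorted(tfs)}
    return result
```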

14 Using phylogeny-based methods
Many programs are available: PhyloCon, Morph, PhyloGibbs, FootPrinter, Gibbs Sampler.
All are for motif discovery, not for motif search using phylogenetic principles.

15 Using phylogeny-based methods
- Conserved Regulatory Elements anchored Alignment (CONREAL)
- Monkey (Mike Eisen)
- PhyloScan
- Conditional Shadowing via Multi-resolution Evolutionary Trees (CSMET)

18 Plans for the next month
- Test MONKEY/PhyloScan on the intergenic elements.
- Estimate the false positive/false negative rates under the specified parameters, based on known TFBS.
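
Error rates against known TFBS can be estimated roughly as below. A hedged sketch: sites are matched by exact (chromosome, position), which is an assumption. Because the known-site set is incomplete, the second number (predictions not among known sites) is only an upper bound on the true false discovery proportion, not a false positive rate in the strict FP/(FP+TN) sense.

```python
from typing import Set, Tuple

Site = Tuple[str, int]  # (chromosome, position); exact matching is a simplification


def error_rates(predicted: Set[Site], known: Set[Site]) -> Tuple[float, float]:
    """Return (false negative rate, fraction of predictions not among known sites).

    The second value over-estimates false discoveries, since some
    predictions absent from the known set may be real but undocumented.
    """
    fn_rate = len(known - predicted) / len(known) if known else 0.0
    unsupported = len(predicted - known) / len(predicted) if predicted else 0.0
    return fn_rate, unsupported
```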

19 Novel RNA genes project
January-February 2009

20 Download EST sequences corresponding to PUTs
- 737,301 ESTs.
- Map them to the genome using GMAP (length > 50 bp, coverage > 70%, identity > 90%): 605,624 mapped.
- Mapped ESTs are checked against AT RNA genes, protein-coding regions and other AT features (counts shown at these branches: 1,353 and 7,357).
- BLASTn against all known CDS sequences, plus GeneWise to confirm alignments on translated CDS sequences (counts shown: 2,500 and 3,260). No match: 2,431.
- No match → BLASTn against a repetitive sequence database.
- No match → Coding Index to double-check the absence of protein-like sequence: 1,893.
- No match → BLASTx against all known proteins to verify the absence of any protein in the sequences: 1,867.
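
The GMAP cut-offs on this slide (L > 50 bp, Cov > 70, Idt > 90%) amount to a simple filter over alignment records. The sketch below uses a hypothetical `Alignment` record rather than parsing any particular GMAP output format, and whether "L" refers to EST length or aligned length is left as an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Alignment:
    """One EST-to-genome alignment; field names are hypothetical, not GMAP's own."""
    est_id: str
    length: int      # bp, as used by the "L > 50 bp" cut-off
    coverage: float  # percent
    identity: float  # percent


def passes_gmap_filter(a: Alignment, min_len: int = 50,
                       min_cov: float = 70.0, min_idt: float = 90.0) -> bool:
    """Strict inequalities, following the slide's L > 50 bp, Cov > 70, Idt > 90%."""
    return a.length > min_len and a.coverage > min_cov and a.identity > min_idt


def mapped_ests(alignments: Iterable[Alignment]) -> List[str]:
    """IDs of ESTs with at least one passing alignment, in first-seen order."""
    seen, out = set(), []
    for a in alignments:
        if a.est_id not in seen and passes_gmap_filter(a):
            seen.add(a.est_id)
            out.append(a.est_id)
    return out
```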

21 BLASTn against all RNA family sequences in Rfam
- 1,867 ESTs searched with BLASTn against all Rfam sequences: 1,837 with no match, 30 with a match.
- Manual filtering on NCBI by Andy, giving a ~13% false positive rate: ~1,600 novel ESTs.
- Follow-up analyses: conservation in Arabidopsis lyrata using GMAP (817 ESTs at 60% coverage and 75% identity, nhits <= 3); RNA structure prediction; expression conservation; wet lab confirmation; substitution rate; tiling array.
- Shan helping out with this.
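
The lyrata conservation cut-offs and the false positive arithmetic can both be checked with a few lines. This is a sketch under assumptions: each EST's GMAP hits against A. lyrata are summarised as a hypothetical list of (coverage %, identity %) pairs, "nhits" is taken to count all reported hits, and "at 60% coverage and 75% identity" is read as inclusive. Assuming the ~13% rate applies to the 1,837 ESTs without an Rfam match, about 1,837 × 0.87 ≈ 1,598 are expected to be genuine, consistent with the ~1,600 on the slide.

```python
from typing import Dict, List, Tuple

# Hypothetical summary: est_id -> list of (coverage %, identity %) per lyrata hit.
Hits = Dict[str, List[Tuple[float, float]]]


def conserved_in_lyrata(hits: Hits, min_cov: float = 60.0, min_idt: float = 75.0,
                        max_hits: int = 3) -> List[str]:
    """ESTs with 1..max_hits hits, at least one meeting both cut-offs."""
    keep = []
    for est_id, est_hits in hits.items():
        if not 0 < len(est_hits) <= max_hits:
            continue
        if any(cov >= min_cov and idt >= min_idt for cov, idt in est_hits):
            keep.append(est_id)
    return keep


def expected_true(n_candidates: int, fp_rate: float) -> float:
    """Candidates expected to be genuine, given an estimated false positive rate."""
    return n_candidates * (1.0 - fp_rate)


print(round(expected_true(1837, 0.13)))  # 1598
```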

