Peptide-assisted annotation of the Mlp genome
Philippe Tanguay, Nicolas Feau, David Joly, Richard Hamelin
Objective
Use peptide libraries to validate the in silico prediction of gene models
Mapping peptides on a translated genome sequence provides the « correct frames of translation »
Assumption: « if a peptide/protein is detected, then there must be a gene that encodes it »
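The mapping idea can be illustrated with a minimal Python sketch. It assumes the standard genetic code and exact string matching; the function names (`six_frames`, `map_peptide`) and the toy sequences are hypothetical and are not taken from the pipeline used in this work. It translates a DNA sequence in all six frames and reports the frame and amino-acid offset where a peptide is found.

```python
# Standard genetic code, built from the canonical TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to translations."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def map_peptide(peptide, dna):
    """Return (frame, amino-acid offset) for every exact match of the peptide."""
    hits = []
    for frame, protein in six_frames(dna).items():
        pos = protein.find(peptide)
        while pos != -1:
            hits.append((frame, pos))
            pos = protein.find(peptide, pos + 1)
    return hits


# Toy example: the peptide MAK is encoded on the forward strand, frame +1.
print(map_peptide("MAK", "ATGGCCAAGTAA"))
```

A real search engine scores spectra against candidate peptides rather than matching strings, so this sketch only shows why a mapped peptide pins down a strand and a reading frame.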
Methodology (hardware)
Workflow: Urediniospores (3729) → Protein extraction → 1D SDS-PAGE → Gel slicing (64) → Trypsin digestion → LC-MS/MS → Bioinformatics
Instruments: Waters MassPREP station; LTQ ThermoElectron
Step labels: Extraction, Slicing, Digestion, Elution, Peptide MS/MS data acquisition
Methodology (bioinformatics)
–Spectral identification by sequence database searching
–Statistical validation of peptide identifications
Protein databases built from…
–6 frames translation of the genome (Mascot, Sequest)
–Gene catalog (16694 GM) (Mascot, Sequest)
1- Comparison of results from both db
2- Comparison of peptides and GM (validation/correction of genome annotations)
Mlp proteomic results so far
MS/MS spectra obtained from the total proteins were searched against:
–Gene catalog: Mascot + Sequest
–6-frame translation: only Mascot
Unique peptides: false discovery rate below 1.6%
352 unique peptides obtained from the 6-frame translation db do not match any GM of the Gene catalog
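As an illustration of how peptides with no hit on the Gene catalog could be singled out, here is a minimal Python sketch. It assumes exact substring matching against the predicted protein sequences and ignores the isoleucine/leucine ambiguity of mass spectrometry; the function name and the toy sequences are hypothetical.

```python
def peptides_without_hit(peptides, catalog_proteins):
    """Return the unique peptides that occur in no predicted protein.

    peptides: iterable of peptide sequences (duplicates allowed)
    catalog_proteins: dict mapping a gene model id to its protein sequence
    """
    unique = sorted(set(peptides))
    return [
        pep for pep in unique
        if not any(pep in seq for seq in catalog_proteins.values())
    ]


# Toy data, not real Mlp sequences.
catalog = {"gm1": "MAKLVTRSE", "gm2": "MSSPQRK"}
identified = ["LVTR", "SPQR", "GHWK", "GHWK"]
print(peptides_without_hit(identified, catalog))
```

For a catalog of this size, a linear scan per peptide is slow; an index of the protein sequences would be the obvious refinement.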
Peptide frequency distribution on GM
[Figure: No. of peptides per gene model vs. No. of gene models]
Mean: 9 peptides covering 134 AA per GM
The peptides represent assignments for nearly 10% of the Gene catalog
e.g. GM
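The per-GM statistics above (peptides per gene model, amino acids covered) could be computed along the following lines. This is a sketch under stated assumptions: peptides are placed by exact string matching, overlapping peptides are counted once per residue, and the function names and toy data are hypothetical.

```python
def covered_residues(protein, peptides):
    """Count residues of the protein covered by at least one peptide."""
    covered = [False] * len(protein)
    for pep in set(peptides):
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return sum(covered)


def summarize(assignments):
    """Return (mean unique peptides per GM, mean covered residues per GM).

    assignments: dict mapping a gene model id to (protein sequence, peptides)
    """
    if not assignments:
        raise ValueError("no gene models with assigned peptides")
    n = len(assignments)
    mean_peptides = sum(len(set(peps)) for _, peps in assignments.values()) / n
    mean_covered = sum(
        covered_residues(seq, peps) for seq, peps in assignments.values()
    ) / n
    return mean_peptides, mean_covered


# Toy data, not real Mlp sequences.
toy = {
    "gm1": ("MAKLVTRSEQK", ["AKLV", "LVTR", "EQK"]),
    "gm2": ("MSSPQRKAAL", ["SPQR"]),
}
print(summarize(toy))
```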
Automated classification of peptides with no hit (352) on the Gene catalog
5’ extension of a predicted GM
–If peptide(s) located within the 1000 bp upstream of the predicted GM start codon
3’ extension of a predicted GM
–If peptide(s) located within the 1000 bp downstream of the predicted GM stop codon
5’ and 3’ extension of a predicted GM
–If peptides located within the 1000 bp upstream of the start codon and within the 1000 bp downstream of the predicted GM stop codon
Internal extension of a predicted GM
–If peptide(s) located in the GM
New GM
–If no predicted GM in the vicinity of the peptide(s)
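The classification rules above can be sketched in Python as follows. This is not the script used for the analysis: the `GeneModel` class, the function names, the coordinate convention (1-based, inclusive genomic positions) and the handling of peptides that straddle a GM boundary are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    start: int   # leftmost genomic coordinate (1-based, inclusive)
    end: int     # rightmost genomic coordinate (inclusive)
    strand: str  # "+" or "-"


def locate(pep_start, pep_end, gm, window=1000):
    """Place one peptide relative to a GM: 'internal', "5'", "3'" or None."""
    if pep_start >= gm.start and pep_end <= gm.end:
        return "internal"
    # Peptide begins left of the GM, within the window (assumed rule).
    if gm.start - window <= pep_start < gm.start:
        return "5'" if gm.strand == "+" else "3'"
    # Peptide ends right of the GM, within the window (assumed rule).
    if gm.end < pep_end <= gm.end + window:
        return "3'" if gm.strand == "+" else "5'"
    return None


def classify(peptide_spans, gm, window=1000):
    """Classify a group of unmatched peptides against the nearest GM."""
    if gm is None:
        return "new GM"
    sides = {locate(s, e, gm, window) for s, e in peptide_spans}
    sides.discard(None)
    if not sides:
        return "new GM"
    if {"5'", "3'"} <= sides:
        return "5' and 3' extension"
    if "5'" in sides:
        return "5' extension"
    if "3'" in sides:
        return "3' extension"
    return "internal extension"


gm = GeneModel(start=5000, end=8000, strand="+")
print(classify([(4500, 4530)], gm))
print(classify([(4500, 4530), (8100, 8130)], gm))
```

Upstream and downstream are swapped for GM on the minus strand, which is why the strand is carried on the gene model rather than inferred from coordinates.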
Corrections-Additions to the Gene catalog
Mapping of the peptides with no hit on the genome allowed the following modifications:

Modification               Number of GM
5’ extension               44
Internal exon extension    31
3’ extension               22
5’ and 3’ extension        5
New GM                     73
Total                      172
Manual curation- Internal extension
EuGene’s prediction is OK
Manual curation- New GM
Summary – Peptide-assisted genome annotation
With few resources (6000 $ worth of materials and services, and a few weeks' worth of labour), our proteomic analysis:
–Validated 10 % of the predicted GM
–Corrected/found > 170 GM
According to the manual curation accomplished so far, it appears that EuGene had predicted most of the corrected/found > 170 GM
Perspectives
–A quantitative proteomic approach (iTRAQ) will be used to compare urediniospores, germinated urediniospores and haustoria protein complexes
–Analysing the Sequest output obtained from the 6-frames translation: 5051 peptides identified with Mascot (352 with no hits on the Gene catalog); Sequest: ?
Available material
–Our set of peptide spectra from urediniospore proteins is available to validate new GM predictions
–The peptide GFF files will be made available to the Melampsora community
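For readers unfamiliar with the format, a peptide record in a GFF file might look like the output of the sketch below. It assumes GFF3 (nine tab-separated columns); the feature type `match`, the `Mascot` source label, the ID scheme and the scaffold name are illustrative choices, not a description of the files that will be distributed.

```python
def peptide_gff_line(seqid, start, end, strand, peptide,
                     source="Mascot", feature_type="match"):
    """Format one peptide as a GFF3 line (1-based, inclusive coordinates)."""
    if start > end:
        raise ValueError("start must not exceed end")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Hypothetical ID scheme: peptide sequence plus its genomic location.
    attributes = f"ID={peptide}_{seqid}_{start};Name={peptide}"
    columns = [seqid, source, feature_type, str(start), str(end),
               ".", strand, ".", attributes]
    return "\t".join(columns)


# A 10-residue peptide spans 30 bp when it lies within a single exon.
print("##gff-version 3")
print(peptide_gff_line("scaffold_1", 100, 129, "+", "MAKLVTRSEQ"))
```

A peptide that spans an intron would need several lines sharing one ID, which this sketch does not handle.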
Finding the peptides on the different model prediction sets

Model prediction set    Total GM    GM validated    %
Gene Catalog                                        ,9%
EuGene                                              ,9%
Genewise                                            ,9%
Genewise1Plus                                       ,4%
fgenesh1_pg                                         ,2%
fgenesh2_pg                                         ,7%

Do we need to perform a new spectra search on the whole model prediction sets?
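The "GM validated" and "%" columns could be derived per prediction set as in the sketch below. It assumes that a gene model counts as validated when at least one identified peptide occurs exactly in its predicted protein; the function name and the toy sequences are hypothetical.

```python
def validation_rate(peptides, proteins):
    """Return (validated GM, total GM, percent validated) for one set.

    peptides: iterable of identified peptide sequences
    proteins: dict mapping a gene model id to its predicted protein sequence
    """
    unique = set(peptides)
    validated = sum(
        1 for seq in proteins.values() if any(pep in seq for pep in unique)
    )
    total = len(proteins)
    percent = 100.0 * validated / total if total else 0.0
    return validated, total, percent


# Toy prediction set, not real Mlp sequences.
toy_set = {"a": "MAKLV", "b": "MSSPQ", "c": "MTTTT", "d": "MGGGG"}
print(validation_rate(["AKL", "SPQ"], toy_set))
```

Reusing peptides identified against one database only finds matches that database already allowed, which is the reason the question of a new spectra search arises.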