Plasmodium falciparum (3D7) - published in 2002. Draft coverage. No sequence updates for a year. No new annotation since? Leishmania major Friedlin - version.


1 Genome sequence update

Plasmodium falciparum (3D7) - published in 2002. Draft coverage. No sequence updates for a year. No new annotation since?

Leishmania major Friedlin - version 3.0 data release containing 32.436 Mb in 71 contigs. Release of version 4.0, which will contain each of the 36 chromosomes in a single contig, is anticipated within the next few weeks.

Trypanosoma cruzi CL-Brener - current assembly contains 8,780 contigs, which can be linked into 5,517 scaffolds, representing a total of 67.557 Mb.

T. brucei (TREU927) - latest official release (version 2.0) contained 125 contigs, representing 25.544 Mb. A new release containing essentially single sequence contigs for each chromosome (with the exception of sub-telomeric repeat regions) is expected within the next month.

Plasmodium vivax (Salvador I) - currently at 10x coverage. Genome closure is pending funding, which, if successful, will allow gap closure and finishing by Spring 2005. In the absence of funding, annotation of the genome will begin this autumn.

L. infantum (clone JPCM5 (MCAN/ES/98/LLM-877)) - 5x sequencing was completed in October 2003. Annotation of this sequence has not yet begun.

2 Annotation update

With the exception of P. vivax and L. infantum, these genome sequences have been annotated for protein-coding genes.

L. major - manual examination of predictions, carried out at both SBRI and WTSI, refined the number of likely protein-coding genes to 8,021 for the version 3.0 release. Addition of new sequence in version 3.1 has brought the current total in GeneDB (the "official" repository for LmjF annotation) to 8,151.

T. cruzi - AutoMAGI was used to predict probable protein-coding genes. Due to the complex organization of the T. cruzi genome discussed above, a total of 25,235 genes have been predicted. Automated annotation using a variety of approaches, such as BLASTP and Pfam analysis, has been carried out at TIGR.

T. brucei - 13,321 predicted protein-coding genes. This number is believed to be a significant over-prediction, and the sequencing centers are now working to exclude a relatively large number of small genes that are unlikely to be protein coding.

As is the case for both L. major and T. cruzi, the gene predictions in T. brucei are currently under refinement, based upon comparison between the gene predictions from each of the three organisms. For T. brucei and T. cruzi, a lower number of genes is anticipated in the next version release, expected next month, whilst the number is anticipated to be marginally higher for L. major.

3 Impact on SGPP

The importance of these continuing efforts for the SGPP project is clear: the number of possible target proteins has increased dramatically over the last year. However, the complement of putative protein-coding genes from each of these genomes is still being refined; the current datasets contain significant numbers of false positives (as well as a smaller, but still significant, number of false negatives). The sequencing centers involved in these projects are currently resolving a number of these issues in the trypanosomatid genomes, and it is expected that they will be resolved to a great degree within a matter of weeks.

4 Progress

Targets have been selected from all species under review (with the exception of the newly included L. infantum genome). The table below shows the number of proteins, for each species, flowing through each part of the process, from download from the relevant data source to confirmation as a viable target.

Species          Downloaded   Unique*   Confirmed
L. major         13,939       9,773     5,087
T. brucei        13,340       10,963    566
T. cruzi         25,070       13,982    6,014
P. falciparum    9,639        4,556     2,683
P. vivax         N/A                    169

* The number of unique genes identified through this methodology is an overestimate. For any given gene, previous versions of the annotation may well differ from later versions due to either changes in the underlying sequence or alterations in the prediction of the start codon. Such changes are likely to occur for both L. major and T. brucei genes, due to initial annotation of unfinished sequence followed by sequence alteration, and in T. brucei and P. falciparum due to initial automated annotation followed by manual inspection. We are currently implementing a system that aims to identify genes whose annotations have altered, so that a failure of the resulting protein to express can be correctly attributed to initial sequence errors.
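As an illustration of how altered annotations might be flagged, here is a minimal sketch in Python. It is not the system described on this slide. It assumes each annotation release is available as a mapping from gene ID to predicted protein sequence, and it treats a start-codon change as the case where one sequence is a suffix of the other. The function names, category labels and gene IDs used with it are hypothetical.

```python
from typing import Dict


def classify_change(old_seq: str, new_seq: str) -> str:
    """Classify how a predicted protein differs between two releases."""
    if old_seq == new_seq:
        return "unchanged"
    # A moved start codon lengthens or shortens the N-terminus only,
    # so the shorter protein is a suffix of the longer one.
    if len(old_seq) >= len(new_seq):
        longer, shorter = old_seq, new_seq
    else:
        longer, shorter = new_seq, old_seq
    if longer.endswith(shorter):
        return "start_codon_shift"
    # Anything else is assumed to reflect a change in the underlying sequence.
    return "sequence_altered"


def compare_releases(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, str]:
    """Report the status of every gene ID seen in either release."""
    report = {}
    for gene_id in sorted(set(old) | set(new)):
        if gene_id not in new:
            report[gene_id] = "removed"
        elif gene_id not in old:
            report[gene_id] = "added"
        else:
            report[gene_id] = classify_change(old[gene_id], new[gene_id])
    return report
```

Calling compare_releases on two releases returns one label per gene ID. Genes labelled "start_codon_shift" or "sequence_altered" would be the ones to re-examine before an expression failure is attributed to the protein itself.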

5 Selection criteria used to date

Selection for both soluble proteins and integral membrane proteins; TM prediction algorithms are used to differentiate between these two classes, and also to allow cleavage of predicted signal peptides and targeting signals.

Target selection generally applies a length threshold of 800 amino acids (although shorter boundaries have been used and are currently being used).

Parsing of large proteins from the Baker lab (David K).

Selection of Plasmodium targets based on interactions identified by the Fields lab (Marissa/Doug).

Multi-leish approach (Chris M/Peter M).

In order to concentrate on proteins that are likely to represent novel folds, a PDB search is used to quantify the identity and similarity of the target sequence to proteins of known structure (David K).

In order to strengthen our drug target strategy, we have also imposed a limit on the degree of identity to human proteins (currently set at 50%), beyond which proteins are excluded from further analysis (Frank).

Targeting of known and putative enzymes (COGs, EC numbers, BRENDA) (David K).

Targeting of proteins with possible medical relevance through selection of proteins with sequence identity to proteins under patent (Wes/Fred).

Identification of a set of P. vivax targets homologous to previously attempted P. falciparum proteins (David K).
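Two of the criteria above are stated numerically: the 800 amino acid length threshold and the 50% limit on identity to human proteins. A minimal Python sketch of applying them follows. It assumes the length threshold is an upper bound and that a best percent identity to any human protein has already been computed for each candidate (for example from a sequence similarity search). The Candidate record, its field names and the function names are hypothetical and are not taken from the SGPP pipeline.

```python
from dataclasses import dataclass
from typing import Iterable, List

MAX_LENGTH_AA = 800             # length threshold from the slide (assumed to be a maximum)
MAX_HUMAN_IDENTITY_PCT = 50.0   # identity to human proteins beyond which a protein is excluded


@dataclass(frozen=True)
class Candidate:
    gene_id: str                # hypothetical identifier
    length_aa: int              # protein length in amino acids
    human_identity_pct: float   # best percent identity to any human protein


def passes_filters(candidate: Candidate) -> bool:
    """True if the candidate satisfies both numeric criteria."""
    return (
        candidate.length_aa <= MAX_LENGTH_AA
        and candidate.human_identity_pct <= MAX_HUMAN_IDENTITY_PCT
    )


def select_targets(candidates: Iterable[Candidate]) -> List[Candidate]:
    """Keep only candidates that pass the length and human-identity filters."""
    return [c for c in candidates if passes_filters(c)]
```

Keeping the thresholds as named constants makes it simple to apply the shorter length boundaries mentioned above without touching the filter logic.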

