Published by Howard Clark. Modified over 9 years ago.
1
Functional Annotation of Proteins via the CAFA Challenge
Lee Tien, Duncan Renfrow-Symon, Shilpa Nadimpalli, Mengfei Cao
COMP150PBT | Fall 2010
2
What’s the problem?
1. Huge bottleneck: determining a protein’s function when given only its sequence
2. Incomplete, inaccurate, or inconsistent annotations are difficult to work with and can propagate
3. No good way to measure the accuracy of an annotation predictor
3
What is the CAFA Challenge?
4
What are Gene Ontology (GO) terms?
GO = controlled vocabulary of “gene ontologies”
Cover three domains:
▫Cellular component
▫Molecular function
▫Biological process
Hierarchy:
▫Broad/general (e.g. “catalytic activity”)
▫Specific (e.g. “leukotriene-C4-synthase activity”)
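The broad-to-specific hierarchy can be sketched as a small parent map. This is a toy illustration only: the two term names come from the slide, the direct edge between them is a simplification (the real ontology has intermediate terms), and `PARENTS` and `ancestors` are hypothetical names.

```python
from typing import Dict, List, Set

# Toy child -> parents map. The direct link below skips the intermediate
# terms that the real Gene Ontology places between these two.
PARENTS: Dict[str, List[str]] = {
    "leukotriene-C4-synthase activity": ["catalytic activity"],
    "catalytic activity": ["molecular_function"],
    "molecular_function": [],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Collect every broader term reachable from `term`."""
    seen: Set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


broader = ancestors("leukotriene-C4-synthase activity", PARENTS)
```

A protein annotated with a specific term is implicitly annotated with all of its broader terms, which is why a predictor can be partially right by naming an ancestor.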
5
Outline of Our Approach
Input: CAFA targets (FASTA sequences)
Tools:
▫BLAST
▫Pfam
▫SMURF? Betawrap Pro? Other Secondary Structure Predictor?
Output: GO ids for each CAFA target
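Since the pipeline starts from FASTA sequences, a minimal reader for that format might look like the sketch below. The function name `parse_fasta` and the sample records are hypothetical; a production pipeline would more likely use an established parser.

```python
from typing import Dict, Iterable, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {record_id: sequence} from FASTA-formatted lines."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token after '>' as the id.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may be wrapped, so concatenate them.
            records[current_id] += line
    return records


# Made-up records, for illustration only.
example = """>T00001 hypothetical target
MDLDMNGG
NKRVFQ
>T00002
ACDEFG
""".splitlines()

targets = parse_fasta(example)
```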
6
Pfam: Protein Family Database
Collection of protein families represented by:
▫Multiple sequence alignments
▫Hidden Markov Models
Two sections of Pfam:
▫A: high-quality, manually-curated
▫B: large, automatically-generated
[Figures: Sample Multiple Sequence Alignment; Sample Hidden Markov Model]
7
BLAST: Basic Local Alignment Search Tool
Goal: find homologous (i.e. derived from a common ancestor) sequences from a database
Various BLAST programs:
▫blastp = query: protein, database: protein
▫blastn = query: nucleotide, database: nucleotide
▫blastx = query: translated nucleotide, database: protein
▫tblastn = query: protein, database: translated nucleotide
▫tblastx = query: translated nucleotide, database: translated nucleotide
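The program list above amounts to a lookup on (query type, database type). A small sketch, where `BLAST_PROGRAMS` and `pick_blast_program` are hypothetical helper names and not part of BLAST itself:

```python
# (query type, database type) -> BLAST program, as listed on the slide.
BLAST_PROGRAMS = {
    ("protein", "protein"): "blastp",
    ("nucleotide", "nucleotide"): "blastn",
    ("translated nucleotide", "protein"): "blastx",
    ("protein", "translated nucleotide"): "tblastn",
    ("translated nucleotide", "translated nucleotide"): "tblastx",
}


def pick_blast_program(query: str, database: str) -> str:
    """Return the program name for a query/database pairing."""
    try:
        return BLAST_PROGRAMS[(query, database)]
    except KeyError:
        raise ValueError(f"no BLAST program for {query!r} vs {database!r}") from None


program = pick_blast_program("protein", "protein")
```

Because the CAFA targets are protein sequences searched against protein structures in PDB, the protein-versus-protein pairing is presumably the relevant one here.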
8
SMURF: Structural Motifs Using Random Fields
Determines whether a protein sequence contains one of the following supersecondary structures:
▫6-bladed propeller
▫7-bladed propeller
▫8-bladed propeller
▫Double blades (e.g. 6-6, 6-7, 6-8…)
Developed at Tufts!
Some propeller functions:
▫Often WD40 repeat: protein-protein interaction
▫Signaling, transcription, cell cycle
[Figure: Smurf! 7-bladed propeller]
9
Final Database Structure
INPUT
▫cafa_targets: cafa_id, uniprot_id, gi_access_id
RESULTS
▫blast_results: cafa_id, pdb_id, refseq_id, e_value_score
▫pfam_results: cafa_id, pfam_id
▫smurf_results: cafa_id, template_id, p_value_score
MAPPING
▫pdb_id, go_id
▫refseq_id, go_id
▫uniprot_id, go_id
▫pfam_id, go_id
▫template_id, go_id
OUTPUT
▫go_results: cafa_id, go_id, source, confidence
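One way to read this structure is that each results table is joined to its mapping table to populate go_results. The sketch below shows that for the Pfam branch only, using an in-memory SQLite database. The column names follow the slide; the mapping table name `pfam_to_go`, the placeholder GO identifiers, and leaving `confidence` empty are assumptions, since the slides do not give them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cafa_targets (cafa_id TEXT PRIMARY KEY, uniprot_id TEXT, gi_access_id TEXT);
CREATE TABLE pfam_results (cafa_id TEXT, pfam_id TEXT);
-- Hypothetical name: the slide shows this mapping table without a label.
CREATE TABLE pfam_to_go (pfam_id TEXT, go_id TEXT);
CREATE TABLE go_results (cafa_id TEXT, go_id TEXT, source TEXT, confidence REAL);
""")

# Toy rows. The GO identifiers are placeholders, not real GO ids.
conn.execute("INSERT INTO cafa_targets VALUES ('T38114', NULL, NULL)")
conn.execute("INSERT INTO pfam_results VALUES ('T38114', 'PF00642')")
conn.executemany(
    "INSERT INTO pfam_to_go VALUES (?, ?)",
    [("PF00642", "GO_ID_ZINC_ION_BINDING"),
     ("PF00642", "GO_ID_NUCLEIC_ACID_BINDING")],
)

# Join results to the mapping to produce output rows. Confidence is left
# NULL because the slides do not say how it was computed.
conn.execute("""
INSERT INTO go_results (cafa_id, go_id, source, confidence)
SELECT r.cafa_id, m.go_id, 'pfam', NULL
FROM pfam_results AS r
JOIN pfam_to_go AS m ON r.pfam_id = m.pfam_id
""")

rows = conn.execute(
    "SELECT cafa_id, go_id, source FROM go_results ORDER BY go_id"
).fetchall()
```

The BLAST and SMURF branches would follow the same pattern with their own mapping tables, each writing a different `source` value.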
10
Final Results Statistics
[Venn diagram: Distribution of sequence hits by method (PDB BLAST, SMURF, Pfam); region counts: 789, 69, 12, 19, 4, 3,445, 1,356]
Of 8,904 unknown sequences…
▫4,265 had at least one hit in PDB BLAST
▫4,824 had at least one hit in Pfam
▫104 had at least one hit in SMURF
In total, 5,694 unique sequences had at least one hit, a 63.9% success rate
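The seven Venn-diagram counts survive only as a flat list, but the per-method totals constrain which count belongs to which region. The sketch below tries every assignment and keeps those consistent with the stated totals; the region names are hypothetical labels, and the result is an inference from the arithmetic, not something read off the original figure.

```python
from itertools import permutations

counts = [789, 69, 12, 19, 4, 3445, 1356]           # region counts from the slide
totals = {"blast": 4265, "pfam": 4824, "smurf": 104}  # per-method totals from the slide
regions = ("blast_only", "pfam_only", "smurf_only",
           "blast_pfam", "blast_smurf", "pfam_smurf", "all_three")

solutions = []
for perm in permutations(counts):
    r = dict(zip(regions, perm))
    blast = r["blast_only"] + r["blast_pfam"] + r["blast_smurf"] + r["all_three"]
    pfam = r["pfam_only"] + r["blast_pfam"] + r["pfam_smurf"] + r["all_three"]
    smurf = r["smurf_only"] + r["blast_smurf"] + r["pfam_smurf"] + r["all_three"]
    if (blast, pfam, smurf) == (totals["blast"], totals["pfam"], totals["smurf"]):
        solutions.append(r)

# The regions are disjoint, so their sum is the number of sequences with any hit.
union = sum(counts)
success_rate = round(100 * union / 8904, 1)
```

Under these constraints only one assignment fits, and the region counts sum to the 5,694 sequences reported, which matches the 63.9% figure.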
11
Example Result
T38114
MDLDMNGGNKRVFQRLGGGSNRPTTDSNQKVCFHWRAGRCNRYPCPYLHRELPGPGSGPVAASSNKRVADESGFAGPSHR
RGPGFSGTANNWGRFGGNRTVTKTEKLCKFWVDGNCPYGDKCRYLHCWSKGDSFSLLTQLDGHQKVVTGIALPSGSDKLY
TASKDETVRIWDCASGQCTGVLNLGGEVGCIISEGPWLLVGMPNLVKAWNIQNNADLSLNGPVGQVYSLVVGTDLLFAGT
QDGSILVWRYNSTTSCFDPAASLLGHTLAVVSLYVGANRLYSGAMDNSIKVWSLDNLQCIQTLTEHTSVVMSLICWDQFL
LSCSLDNTVKIWAATEGGNLEVTYTHKEEYGVLALCGVHDAEAKPVLLCSCNDNSLHLYDLPSFTERGKILAKQEIRSIQ
IGPGGIFFTGDGSGQVKVWKWSTESTPILS
BLAST: matches with PDB structures 2OVP, 3MKS, 2CNX, 1P22, 1NEX, 3N0E
▫Transcription, mitosis, methylation, protein binding
Pfam: match to family PF00642
▫Zinc ion binding, nucleic acid binding
SMURF: match to 7-bladed β-propeller template
▫WD domain (protein binding)
12
Possible Future Directions
Improving functional annotation for β-propellers identified by SMURF
▫Analyze training set of propeller proteins with known function to build probabilistic model of protein function based on propeller type
Addition of other structural prediction tools for motifs with known function
▫G protein-coupled receptors, membrane-bound proteins
Expansion of BLAST search to include full nr database
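The first direction could start from something as simple as conditional frequencies of function given propeller type. The sketch below is one possible reading and not a method from the slides: the training pairs are invented placeholders, and `fit_function_model` is a hypothetical name.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Invented (propeller type, function label) pairs, for illustration only.
training = [
    ("7-bladed", "function_A"),
    ("7-bladed", "function_A"),
    ("7-bladed", "function_B"),
    ("6-bladed", "function_C"),
]


def fit_function_model(
    pairs: Iterable[Tuple[str, str]]
) -> Dict[str, Dict[str, float]]:
    """Estimate P(function | propeller type) by relative frequency."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for propeller_type, function in pairs:
        counts[propeller_type][function] += 1
    return {
        ptype: {func: n / sum(c.values()) for func, n in c.items()}
        for ptype, c in counts.items()
    }


model = fit_function_model(training)
```

A SMURF hit to a given template could then be translated into ranked function predictions whose probabilities fill a confidence score.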
13
Questions?