GO Annotation Import
Markus Krummenacker
Bioinformatics Research Group, SRI International
Q3 2014

Preparation
- The PGDB must have UniProt dblinks on its proteins.
- Create a working directory:
    mkdir ~/go-import/
- Download the UniProt GO annotation file (in GAF format):
    curl 'ftp://ftp.geneontology.org/pub/go/gene-associations/submission/gene_association.goa_uniprot.gz' -o gene_association.goa_uniprot.gz
- Find the NCBI Taxonomy ID of your PGDB.
- Filter the GO annotations by that NCBI Taxonomy ID:
    zgrep taxon:85962 gene_association.goa_uniprot.gz | cat > gene_association.goa_uniprot_85962
  (The UNIX command above should be all on one line.)
- The NCBI Taxonomy ID in the example is 85962 (HPY).
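
The zgrep filter matches the text taxon:85962 anywhere on a line, so it would also keep lines for a longer ID that merely starts with the same digits. The following is a minimal Python sketch of a stricter filter, assuming the standard tab-delimited GAF layout in which column 13 holds the taxon and comment lines start with "!". The function names filter_gaf_by_taxon and filter_gaf_file are hypothetical helpers, not part of Pathway Tools.

```python
import gzip


def filter_gaf_by_taxon(lines, taxon_id):
    """Yield GAF lines whose taxon column lists exactly taxon:<taxon_id>.

    Assumes tab-delimited GAF with the taxon in column 13 (index 12);
    multiple taxa in that column are separated by "|".
    """
    wanted = f"taxon:{taxon_id}"
    for line in lines:
        if line.startswith("!"):
            continue  # skip GAF header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 13:
            continue  # skip malformed or truncated lines
        if wanted in cols[12].split("|"):
            yield line


def filter_gaf_file(src_gz, dest, taxon_id):
    """Stream a gzipped GAF file and write only the matching lines to dest."""
    with gzip.open(src_gz, "rt") as src, open(dest, "w") as out:
        out.writelines(filter_gaf_by_taxon(src, taxon_id))
```

With the file names from the example, the call would be filter_gaf_file("gene_association.goa_uniprot.gz", "gene_association.goa_uniprot_85962", 85962). Unlike the zgrep command, this sketch drops the "!" header lines; whether the import needs them is not stated here.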

Running the GO Annotation Import
- Start Pathway Tools with:
    pathway-tools -lisp
- Open the target PGDB.
- At the Lisp prompt, run the following command:
    (update-uniprot-go-annots "~/go-import/")
- The import expects the GAF input file to have the specific name gene_association.goa_uniprot_85962 (the name ends in the NCBI Taxonomy ID).

Workflow
- Write the gene_association.ecocyc-before file.
- Read the input GAF and map UniProt IDs to PGDB protein monomer frames.
- Write the raw-go-terms-85962.lisp file.
- Run GO-stats.
- Add GO annotations to the protein frames.
- Prune redundant EV-COMP annotations (less specific parent GO terms).
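
The pruning step can be illustrated with a small Python sketch. It is not the Pathway Tools implementation: it assumes that a computational annotation is redundant when the same protein also carries an annotation to a more specific descendant term, regardless of that descendant's evidence code. The function names, the parents mapping, and the GO IDs in the usage example are hypothetical.

```python
def ancestors(term, parents):
    """Return every ancestor of term, given a map term -> direct parent terms."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def prune_redundant_comp(annots, parents):
    """Drop EV-COMP annotations whose term is an ancestor of another annotated term.

    annots is a set of (go_term, evidence_code) pairs for a single protein.
    Annotations with non-computational evidence are always kept.
    """
    implied = set()
    for term, _ in annots:
        implied |= ancestors(term, parents)
    return {
        (term, ev)
        for term, ev in annots
        if not (ev.startswith("EV-COMP") and term in implied)
    }
```

For example, with parents = {"GO:child": ["GO:parent"]}, the set {("GO:child", "EV-EXP"), ("GO:parent", "EV-COMP")} is reduced to the single annotation on GO:child, because the parent term adds no information beyond what the child already implies.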

Workflow (cont’d)
- Remove annotations that are no longer in UniProt.
- Run GO-stats.
- Import or download citations (PMID etc.).
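
Removing annotations that are no longer in UniProt amounts to comparing the annotations currently in the PGDB with those in the new GAF file. The sketch below shows that comparison with plain Python sets. It assumes each annotation is reduced to a (protein_frame, go_term) pair and that every annotation in current originally came from UniProt; how the real import identifies UniProt-derived annotations is not described here. The name plan_sync is hypothetical.

```python
def plan_sync(current, incoming):
    """Compare two sets of (protein_frame, go_term) pairs.

    Returns (to_add, to_remove, unchanged):
      to_add     - in the new GAF file but not yet in the PGDB
      to_remove  - in the PGDB but no longer in the GAF file
      unchanged  - present in both
    """
    to_add = incoming - current
    to_remove = current - incoming
    unchanged = current & incoming
    return to_add, to_remove, unchanged
```

Running the comparison before changing the PGDB makes it easy to review how many annotations would be removed, which complements the GO-stats runs before and after the update.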