Advanced PGDB Editing: Gene Ontology (GO) Terms
Ingrid M. Keseler
Bioinformatics Research Group, SRI International
keseler@ai.sri.com
Presented by Peter E. Midford
http://bioinformatics.ai.sri.com/ptools/tutorial/sessions/curation/Curation-of-Regulatory-Information-and-GO-Terms/Editing&ImportingGo.ppt
Gene Ontology (GO) Terms
Motivation: Why GO Terms?
For example:
- Standardization of annotation helps others:
  - Data mining across genomes
  - Genome annotation by similarity (e.g., via InterPro, Pfam, TIGRFAM, COG mappings)
- It helps you:
  - Omics data clustering
  - Cross-checking with different annotation sources
A Word (Or Two) About GO
- Learn what you can about using GO
  - Surf the geneontology.org web site
  - Attend a GO Annotation Camp
  - Ask questions on the GO mailing lists
- Request new GO terms if appropriate
  - Useful for everybody
  - Have input when it counts
- A new GO database can be incorporated into Pathway Tools; request help with setting up the process
- Computational GO term assignments may be available for your genome via UniProt
GO Classification Editor
- Accessible via the Protein Editor
- Expand/contract and select/deselect by clicking on +/- and on the terms themselves
- Selected items move to the “Selections” section
- Search feature: search by name/substring or by GO ID (in the full format only, e.g., “GO:0007165”, not just the number)
  - For example, a search for “arginine” yields many options
  - Click an option to highlight it in the hierarchy
- Hovering over a term in the hierarchy brings up its definition in the middle panel
  - You must still click on the entry in the hierarchy to select the term for annotating the protein
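
Because the search box only accepts the full ID format, it can help to normalize IDs copied from spreadsheets or papers before pasting them in. The following is a minimal sketch, not part of Pathway Tools: the helper names `to_full_go_id` and `is_full_go_id` are hypothetical. It relies only on the fact that a GO ID is the prefix "GO:" followed by seven digits.

```python
import re

# Full-format GO ID: "GO:" followed by exactly seven digits.
FULL_GO_ID = re.compile(r"GO:[0-9]{7}")

# Looser input: optional "GO:" prefix (any case) and 1 to 7 digits.
LOOSE_GO_ID = re.compile(r"(?:GO:)?([0-9]{1,7})", re.IGNORECASE)


def is_full_go_id(text):
    """Return True only for the full format, e.g. 'GO:0007165'."""
    return FULL_GO_ID.fullmatch(text) is not None


def to_full_go_id(text):
    """Hypothetical helper: turn '7165', '0007165' or 'go:7165' into
    'GO:0007165'. Returns None when the input is not a GO number."""
    match = LOOSE_GO_ID.fullmatch(text.strip())
    if match is None:
        return None
    # Zero-pad the numeric part to seven digits.
    return "GO:" + match.group(1).zfill(7)
```

For instance, `to_full_go_id("7165")` gives the searchable form "GO:0007165", while a name such as "arginine" returns None and should be searched by name instead.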
Caveats
- Implementation of GO in Pathway Tools is incomplete, e.g.:
  - Only one qualifier, “with”, is implemented
  - No annotation extensions
- If you are interested in a more complete implementation of the GO ontology and curation capabilities, please let us know
Importing GO Terms via UniProt IDs (1)
- Download the annotation file from geneontology.org: http://geneontology.org/page/download-annotations
- You probably want the UniProt [multispecies] file
- Use the ftp: address and either an FTP app (e.g., FileZilla, Cyberduck) or the command-line curl tool to download the UniProt file, e.g.:
    curl ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/goa_uniprot_all.gaf.gz > goa_uniprot_all.gaf.gz
Importing GO Terms via UniProt IDs (2)
- Unzip the file and use grep, or use zgrep directly, to extract the annotations for your organism by taxon ID:
    zgrep 'taxon:85962' goa_uniprot_all.gaf.gz > goa_uniprot_85962
- Beware: these are large files and may take a long time to download (e.g., 7 hrs), a long time to unzip, and a long time to grep through (i.e., don’t try this here)
Importing GO Terms
- May at some point be made into a GUI tool within Pathway Tools; at this time, it is a command-line operation
- At the Lisp prompt, with your database opened:
    (update-uniprot-go-annots "filepath")
  where filepath is replaced with, e.g., /Users/midford/Documents/demo/goa_uniprot_85962
- Some log files are written into the same directory; do some sanity checking of the files and the database before saving
- Import citations
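
The extraction and sanity-checking steps above can also be sketched in Python. This is an illustrative alternative to zgrep, not something Pathway Tools provides: the function names `open_gaf`, `filter_gaf_by_taxon` and `summarize_gaf` are hypothetical. It assumes the standard tab-separated GAF layout (column 2 = object ID, column 7 = evidence code, column 9 = aspect, column 13 = taxon). Unlike a plain text match, it compares the taxon column exactly, so 'taxon:85962' does not also pick up a longer ID that merely starts with the same digits.

```python
import gzip
from collections import Counter

# 0-based indexes of the GAF columns used here (assumed GAF 2.x layout).
OBJECT_ID_COLUMN = 1   # column 2: DB Object ID (e.g. UniProt accession)
EVIDENCE_COLUMN = 6    # column 7: evidence code (IEA, IDA, ...)
ASPECT_COLUMN = 8      # column 9: P, F or C
TAXON_COLUMN = 12      # column 13: taxon:<id>, optionally |taxon:<id>


def open_gaf(path):
    """Open a GAF file as text; .gz files are decompressed on the fly."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")


def filter_gaf_by_taxon(in_path, out_path, taxon_id):
    """Copy annotation lines for one organism; return how many were kept."""
    wanted = f"taxon:{taxon_id}"
    kept = 0
    with open_gaf(in_path) as src, open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.startswith("!"):          # header/comment lines
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) <= TAXON_COLUMN:   # malformed or blank line
                continue
            # The first taxon entry is the organism of the gene product.
            if fields[TAXON_COLUMN].split("|")[0] == wanted:
                dst.write(line)
                kept += 1
    return kept


def summarize_gaf(path):
    """Count annotations, distinct proteins, aspects and evidence codes."""
    proteins, aspects, evidence = set(), Counter(), Counter()
    total = 0
    with open_gaf(path) as src:
        for line in src:
            if line.startswith("!"):
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) <= TAXON_COLUMN:
                continue
            total += 1
            proteins.add(fields[OBJECT_ID_COLUMN])
            aspects[fields[ASPECT_COLUMN]] += 1
            evidence[fields[EVIDENCE_COLUMN]] += 1
    return {"annotations": total, "proteins": len(proteins),
            "aspects": aspects, "evidence": evidence}
```

Calling `filter_gaf_by_taxon` on the downloaded .gz file with taxon ID 85962, and then `summarize_gaf` on the result, gives a quick check before running the import: an empty file, or a protein count far from the size of your proteome, suggests the wrong taxon ID or a truncated download.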
Advanced PGDB Editing: Regulation
Ingrid M. Keseler
Bioinformatics Research Group, SRI International
keseler@ai.sri.com
Presented by Peter E. Midford
Motivation: Why Regulation?
For example:
- Genome and regulatory overview
- Global perspective
- Omics data
- Data sets for promoter prediction, etc.
Omics Viewer: Regulatory Overview
Data from: “A comprehensive proteomics and transcriptomics analysis of Bacillus subtilis salt stress adaptation.” J Bacteriol. 2010 Feb;192(3):870-82.
Defining a New Transcription Unit
- Gene > New Operon
- Key elements: gene names in order
  - PTools will prompt you for a citation for the TU
- Specify promoter
  - Can use the absolute or relative position of the transcription start site (TSS)
  - PTools will calculate the other value for you
  - PTools will prompt you for a citation for the TSS
- Specify sigma factor (if appropriate)
  - It may be necessary to first classify sigma factors under |Sigma-Factors|
Adding Transcription Factor Binding Sites
- Click on the TU name, then Edit > Create Regulatory Interaction
- Select the type of regulatory interaction
- Can put in a protein name, or select a defined TF
- Indicate whether it activates, represses, or both
- Define the relative distance from the transcription start site
- Draws the DNA footprint from the feature defined in the TF
- Can edit TF binding sites by clicking on the site name, then Edit > Regulatory Interaction Editor
- Can add summaries and citations
- This builds the transcriptional regulatory network
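
When binding-site positions come from the literature as genome coordinates, they need converting to a distance from the transcription start site before entry. The sketch below shows the arithmetic under stated assumptions, and is not a description of how Pathway Tools computes it: upstream distances are negative, downstream distances are positive, the TSS itself is offset 0, and the function names are hypothetical. Papers that number the TSS as +1 with no position 0 need their downstream values shifted by one first.

```python
def site_absolute_position(tss, relative_distance, strand):
    """Genome coordinate of a site given its distance from the TSS.

    Assumes negative = upstream, positive = downstream, TSS = offset 0.
    On the '-' strand, upstream means a larger genome coordinate.
    """
    if strand == "+":
        return tss + relative_distance
    if strand == "-":
        return tss - relative_distance
    raise ValueError("strand must be '+' or '-'")


def site_relative_distance(tss, absolute_position, strand):
    """Inverse of site_absolute_position under the same assumptions."""
    if strand == "+":
        return absolute_position - tss
    if strand == "-":
        return tss - absolute_position
    raise ValueError("strand must be '+' or '-'")
```

For example, a site centered 41.5 bp upstream of a TSS at coordinate 1000 lies at 958.5 on the forward strand and at 1041.5 on the reverse strand. Half-integer values arise when the center of an even-length site falls between two bases.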
More Regulatory Interactions
- Attenuation
- Regulation of translation
- RNA-mediated
- Protein-mediated
- Small molecule-mediated
- Regulated protein or mRNA degradation (planned)
- If you have suggestions for types of regulation you would like to represent, or for improvements on what is there, please let us know.
- Tools for genome-scale datasets?