From Functional Genomics to Physiological Model: Using the Gene Ontology
Fiona McCarthy, Shane Burgess, Susan Bridges
The AgBase Databases, Institute of Digital Biology, Mississippi State University
1. A user’s guide to the Gene Ontology (GO)
2. Finding GO for farm animal species
3. Adding GO to your dataset
4. GO-based tools for biological modeling
5. Examples: using GO for biological modeling
This presentation is available on the AgBase website; the websites mentioned are listed in the handout.
1. A User’s Guide to GO
What is the Gene Ontology?
Emily Dimmer (GOA, EBI): “a controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing”
- Assigns functions to gene products at different levels of detail, depending on how much is known about a gene product.
- Is used for a diverse range of species.
- Is structured to be queried at different levels, e.g. find all the chicken gene products in the genome that are involved in signal transduction, then zoom in on all the receptor tyrosine kinases.
- Is human-readable, and each GO term also has a unique digital tag (its GO:ID) that allows computational analysis of large datasets.
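Querying “at different levels” works because GO terms form a hierarchy: a gene product annotated to a specific term is implicitly annotated to that term’s more general ancestors too. The sketch below illustrates this with a small, simplified toy hierarchy and hypothetical gene names (geneA, geneB, geneC); it is not taken from a GO release, and the real ontology has more intermediate terms and relationship types.

```python
# Toy, simplified hierarchy: child term -> set of parent terms.
# Illustrative only; not copied from a GO release.
PARENTS = {
    "receptor tyrosine kinase signaling": {"enzyme-linked receptor signaling"},
    "enzyme-linked receptor signaling": {"signal transduction"},
    "signal transduction": {"cell communication"},
    "cell communication": {"biological_process"},
    "transport": {"biological_process"},
}

# Hypothetical annotations: gene product -> terms it is directly annotated to.
ANNOTATIONS = {
    "geneA": {"receptor tyrosine kinase signaling"},
    "geneB": {"signal transduction"},
    "geneC": {"transport"},
}


def ancestors(term: str) -> set[str]:
    """Return the term itself plus every term reachable through parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def genes_under(query: str) -> set[str]:
    """Gene products annotated to `query` or to any more specific term below it."""
    return {
        gene
        for gene, terms in ANNOTATIONS.items()
        if any(query in ancestors(term) for term in terms)
    }


print(sorted(genes_under("signal transduction")))                 # ['geneA', 'geneB']
print(sorted(genes_under("receptor tyrosine kinase signaling")))  # ['geneA']
```

Parents are stored as sets because a GO term can have more than one parent.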
GO Mapping Example: NDUFAB1 (UniProt P52505), bovine NADH dehydrogenase (ubiquinone) 1, alpha/beta subcomplex, 1, 8kDa
Biological Process (BP or P)
- GO: fatty acid biosynthetic process (TAS)
- GO: mitochondrial electron transport, NADH to ubiquinone (TAS)
- GO: lipid biosynthetic process (IEA)
Cellular Component (CC or C)
- GO: mitochondrial matrix (IDA)
- GO: mitochondrial respiratory chain complex I (IDA)
- GO: mitochondrion (IEA)
Molecular Function (MF or F)
- GO: fatty acid binding (IDA)
- GO: NADH dehydrogenase (ubiquinone) activity (TAS)
- GO: oxidoreductase activity (TAS)
- GO: acyl carrier activity (IEA)
Anatomy of a GO annotation: each entry gives the aspect (ontology), a unique GO:ID, the GO term name, and a GO evidence code.
GO Evidence Codes
Direct evidence codes
- IDA: inferred from direct assay
- IEP: inferred from expression pattern
- IGI: inferred from genetic interaction
- IMP: inferred from mutant phenotype
- IPI: inferred from physical interaction
Indirect evidence codes
Inferred from literature
- TAS: traceable author statement
- NAS: non-traceable author statement
- IC: inferred by curator
Inferred by computational analysis
- IGC: inferred from genomic context
- RCA: inferred from reviewed computational analysis
- ISS: inferred from sequence or structural similarity
- IEA: inferred from electronic annotation
Other
- NR: not recorded (historical)
- ND: no biological data available
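Because some browsers and tools include IEA annotations and others do not, it can help to tag each annotation with its evidence group and filter as needed. A minimal sketch, using the NDUFAB1 molecular function annotations from the example (term names and evidence codes only) and grouping IGC with the computational-analysis codes:

```python
# Evidence code groups; IGC is placed with the computational-analysis codes.
EVIDENCE_GROUPS = {
    "direct": {"IDA", "IEP", "IGI", "IMP", "IPI"},
    "literature": {"TAS", "NAS", "IC"},
    "computational": {"IGC", "RCA", "ISS", "IEA"},
    "other": {"NR", "ND"},
}


def evidence_group(code: str) -> str:
    """Return the group an evidence code belongs to."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    raise KeyError(f"unknown evidence code: {code}")


def without_iea(annotations):
    """Drop IEA annotations, e.g. to compare with a browser that omits them."""
    return [(term, code) for term, code in annotations if code != "IEA"]


# NDUFAB1 molecular function annotations: (term name, evidence code).
ndufab1_mf = [
    ("fatty acid binding", "IDA"),
    ("NADH dehydrogenase (ubiquinone) activity", "TAS"),
    ("oxidoreductase activity", "TAS"),
    ("acyl carrier activity", "IEA"),
]

for term, code in ndufab1_mf:
    print(f"{term}: {code} ({evidence_group(code)})")
print(len(without_iea(ndufab1_mf)), "of", len(ndufab1_mf), "annotations remain without IEA")
```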
Unknown Function vs. No GO
- ND (no biological data available): biocurators have tried to add GO, but no functional data is available. These annotations are made to the root terms:
  - previously: “process_unknown”, “function_unknown”, “component_unknown”
  - now: “biological process”, “molecular function”, “cellular component”
- No annotations (not even ND): biocurators have not yet annotated the gene product.
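The difference between “unknown function” and “no GO” matters when counting gene products. A minimal sketch that classifies one gene product from its list of (term name, evidence code) pairs; this input layout is an assumption for illustration, and the root terms are written with underscores as they appear in the ontology:

```python
ROOT_TERMS = {"biological_process", "molecular_function", "cellular_component"}


def go_status(annotations: list[tuple[str, str]]) -> str:
    """Classify one gene product from its (term name, evidence code) pairs."""
    if not annotations:
        # Nothing at all: curators have not annotated this gene product yet.
        return "no GO (not yet annotated)"
    if all(code == "ND" and term in ROOT_TERMS for term, code in annotations):
        # Curators looked, but found no functional data.
        return "unknown function (ND)"
    return "annotated"


print(go_status([]))
print(go_status([("biological_process", "ND"), ("molecular_function", "ND")]))
print(go_status([("mitochondrion", "IEA"), ("biological_process", "ND")]))
```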
2. Finding GO for Farm Animals
GO Browsers
- QuickGO Browser (EBI GOA Project): search by GO term or by UniProt ID; includes IEA annotations.
- AmiGO Browser (GO Consortium project): search by GO term or by UniProt ID; does not include IEA annotations.
Getting GO (browser screenshots): the annotations include farm animal species, and the results can be filtered.
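If you download annotation files rather than browsing, GO annotations are commonly distributed in the tab-delimited GAF format, where column 5 holds the GO:ID, column 7 the evidence code, column 9 the aspect (P, F or C) and column 13 the taxon. A minimal sketch, assuming GAF 2.x input, that keeps only rows for a chosen set of farm animal taxa (NCBI taxon 9913 cattle, 9031 chicken, 9823 pig). The sample rows are made up for illustration and built with a hypothetical helper, `demo_row`.

```python
# NCBI taxon IDs for a few farm animal species, as written in GAF column 13.
FARM_TAXA = {"taxon:9913": "cattle", "taxon:9031": "chicken", "taxon:9823": "pig"}


def parse_gaf_line(line: str) -> dict[str, str]:
    """Pick out the GAF 2.x columns used here (comments give 1-based column numbers)."""
    cols = line.rstrip("\n").split("\t")  # strip only the newline; empty columns matter
    return {
        "db_object_id": cols[1],  # column 2
        "symbol": cols[2],        # column 3
        "go_id": cols[4],         # column 5
        "evidence": cols[6],      # column 7
        "aspect": cols[8],        # column 9: P, F or C
        "taxon": cols[12],        # column 13
    }


def farm_animal_rows(lines):
    """Yield parsed rows whose (first) taxon is in FARM_TAXA."""
    for line in lines:
        if not line.strip() or line.startswith("!"):  # skip blank and header lines
            continue
        row = parse_gaf_line(line)
        taxon = row["taxon"].split("|")[0]  # column 13 may hold two taxa separated by "|"
        if taxon in FARM_TAXA:
            row["species"] = FARM_TAXA[taxon]
            yield row


def demo_row(obj_id, symbol, go_id, evidence, aspect, taxon):
    """Hypothetical helper: build a made-up 17-column GAF row, other columns empty."""
    cols = [""] * 17
    cols[0], cols[1], cols[2] = "UniProtKB", obj_id, symbol
    cols[4], cols[6], cols[8], cols[12] = go_id, evidence, aspect, taxon
    return "\t".join(cols)


sample = [
    "!gaf-version: 2.2",
    demo_row("P52505", "NDUFAB1", "GO:0005739", "IEA", "C", "taxon:9913"),
    demo_row("HYPOTHETICAL1", "GENE1", "GO:0005739", "IEA", "C", "taxon:9606"),
]
for row in farm_animal_rows(sample):
    print(row["symbol"], row["go_id"], row["evidence"], row["species"])
```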
3. Adding GO to your dataset
GO analysis of array data
- Probe data is linked to gene product identifiers (gene, cDNA, or EST IDs).
- For some arrays, the vendor provides GO data for these gene products (check how recently it was updated).
- Not all gene products have GO annotation, and those without it will not be included in modeling.
- To do biological modeling, you need to obtain as much GO data as possible.
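A quick way to see how much of an array can enter the model is to follow each probe to its gene product and then to GO, and list the probes that end without GO. A minimal sketch with hypothetical probe IDs, gene names and mapping tables; in practice these mappings would come from the vendor’s annotation file or from a GO retrieval tool.

```python
# Hypothetical mapping tables for illustration only.
probe_to_gene = {
    "probe_001": "geneA",
    "probe_002": "geneB",
    "probe_003": "geneC",
    "probe_004": None,  # probe with no linked gene product
}
gene_to_go = {
    "geneA": {"GO:0005739"},
    "geneC": set(),  # gene product known, but no GO annotation yet
}


def go_coverage(probes, probe_to_gene, gene_to_go):
    """Split probes into those that reach GO and those that do not."""
    with_go, without_go = {}, []
    for probe in probes:
        gene = probe_to_gene.get(probe)
        terms = gene_to_go.get(gene, set()) if gene else set()
        if terms:
            with_go[probe] = terms
        else:
            without_go.append(probe)
    return with_go, without_go


with_go, without_go = go_coverage(list(probe_to_gene), probe_to_gene, gene_to_go)
print(f"{len(with_go)}/{len(probe_to_gene)} probes have GO")  # 1/4 probes have GO
print("no GO:", without_go)  # no GO: ['probe_002', 'probe_003', 'probe_004']
```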
Example: NetAffx
Secondary source of GO annotation
GORetriever (plus many more tools)
GORetriever
GORetriever Results
Save the results as a text file for use with GOSlimViewer.
But what about IDs not supported by GORetriever?
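Before turning to other tools, it helps to list exactly which query IDs came back without GO. A minimal sketch, assuming the saved results are tab-delimited with one “ID, GO:ID, aspect” row per annotation; check the actual file layout before relying on this. The example IDs are made up.

```python
import csv
import io


def ids_without_go(query_ids, results_text):
    """Return query IDs that have no row in the results.

    Assumes one tab-delimited 'ID<TAB>GO:ID<TAB>aspect' row per annotation.
    """
    reader = csv.reader(io.StringIO(results_text), delimiter="\t")
    found = {row[0] for row in reader if row}
    return [qid for qid in query_ids if qid not in found]


# Made-up example: one ID returned GO, one (an EST ID) did not.
results_text = "P52505\tGO:0005739\tC\n"
print(ids_without_go(["P52505", "EST_0001"], results_text))  # ['EST_0001']
```

For a real file, pass the file’s contents, e.g. `open(path, encoding="utf-8").read()`.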
GOanna
GOanna Results
Query IDs are hyperlinked to the BLAST data (the files must be saved in the same directory).
- If there is a good alignment* to a protein with GO annotation, transfer the GO to your record.
- If there is no good alignment, or the matching record has no GO, search the literature.
*What is a good alignment?
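This decision can be written down explicitly so that it is applied consistently. In this sketch, the `BlastHit` structure, the E-value and percent-identity cut-offs, and the IDs are hypothetical placeholders, not recommended thresholds; what counts as a good alignment is a judgment you need to make for your own data. Transferred annotations are tagged ISS (inferred from sequence or structural similarity).

```python
from dataclasses import dataclass, field

# Placeholder cut-offs for illustration only, not recommendations.
MAX_EVALUE = 1e-20
MIN_PERCENT_IDENTITY = 50.0


@dataclass
class BlastHit:
    """Hypothetical record of the best BLAST hit for one query sequence."""
    query_id: str
    subject_id: str
    evalue: float
    percent_identity: float
    subject_go: list[tuple[str, str]] = field(default_factory=list)  # (GO:ID, aspect)


def is_good_alignment(hit: BlastHit) -> bool:
    """Apply the placeholder cut-offs."""
    return hit.evalue <= MAX_EVALUE and hit.percent_identity >= MIN_PERCENT_IDENTITY


def transfer_go(hit: BlastHit) -> list[tuple[str, str, str, str]]:
    """Return (ID, GO:ID, aspect, evidence) rows, or [] if the literature is needed."""
    if not is_good_alignment(hit) or not hit.subject_go:
        return []
    return [(hit.query_id, go_id, aspect, "ISS") for go_id, aspect in hit.subject_go]


strong = BlastHit("contig_12", "P52505", 1e-40, 85.0, [("GO:0005739", "C")])
weak = BlastHit("contig_13", "P52505", 1e-3, 28.0, [("GO:0005739", "C")])
print(transfer_go(strong))  # [('contig_12', 'GO:0005739', 'C', 'ISS')]
print(transfer_go(weak))    # [] -> check the literature instead
```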
For a good alignment, add the GO to the GO summary file (a tab-delimited text file containing ID, GO:ID, and aspect).
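A minimal sketch that writes a GO summary file in this three-column layout, checking that each GO:ID has the form GO: followed by seven digits and that the aspect is P, F or C. Whether the downstream tool expects a header row is not stated here, so none is written; check the tool’s documentation. The function names are hypothetical.

```python
import re

GO_ID_PATTERN = re.compile(r"GO:\d{7}")
ASPECTS = {"P", "F", "C"}  # biological process, molecular function, cellular component


def format_go_summary(rows) -> str:
    """Turn (ID, GO:ID, aspect) tuples into tab-delimited text, one row per line."""
    lines = []
    for gene_id, go_id, aspect in rows:
        if not GO_ID_PATTERN.fullmatch(go_id):
            raise ValueError(f"not a GO:ID: {go_id!r}")
        if aspect not in ASPECTS:
            raise ValueError(f"aspect must be P, F or C, got {aspect!r}")
        lines.append(f"{gene_id}\t{go_id}\t{aspect}")
    return "".join(line + "\n" for line in lines)


def write_go_summary(path, rows) -> None:
    """Write the GO summary file (no header row)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(format_go_summary(rows))


print(format_go_summary([("contig_12", "GO:0005739", "C")]), end="")
```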
Contact AgBase to request GO annotation of specific gene products.
GOSlimViewer: summarizing results
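GO slim summaries work by mapping each detailed annotation up the hierarchy to a small set of broad “slim” terms and counting gene products per slim term. The sketch below shows that idea with a toy, simplified hierarchy, a hypothetical slim and hypothetical genes; it is not GOSlimViewer’s actual implementation. One choice made here: a gene is counted under the root biological_process only if it maps to no more specific slim term.

```python
from collections import Counter

# Toy, simplified hierarchy: child term -> parent terms (not from a GO release).
PARENTS = {
    "signal transduction": {"cell communication"},
    "cell communication": {"biological_process"},
    "ion transport": {"transport"},
    "transport": {"biological_process"},
    "fatty acid biosynthetic process": {"lipid biosynthetic process"},
    "lipid biosynthetic process": {"metabolic process"},
    "metabolic process": {"biological_process"},
}
# Hypothetical slim: a few broad terms plus the root.
SLIM = {"cell communication", "transport", "metabolic process", "biological_process"}
ROOT = "biological_process"


def ancestors(term):
    """The term itself plus everything above it in the hierarchy."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def slim_counts(gene_terms):
    """Count gene products per slim term (each gene at most once per slim term)."""
    counts = Counter()
    for terms in gene_terms.values():
        slims = set()
        for term in terms:
            slims |= ancestors(term) & SLIM
        if len(slims) > 1:
            slims.discard(ROOT)  # keep the root only for genes with nothing more specific
        counts.update(slims)
    return counts


gene_terms = {  # hypothetical annotations
    "geneA": {"signal transduction"},
    "geneB": {"ion transport", "signal transduction"},
    "geneC": {"fatty acid biosynthetic process"},
    "geneD": {"biological_process"},  # e.g. an ND annotation to the root
}
print(sorted(slim_counts(gene_terms).items()))
# [('biological_process', 1), ('cell communication', 2), ('metabolic process', 1), ('transport', 1)]
```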
GOSlimViewer results
Biological process GO slim categories shown: response to stimulus; amino acid and derivative metabolic process; transport; behavior; cell differentiation; metabolic process; regulation of biological process; cell communication; nucleobase, nucleoside, nucleotide and nucleic acid metabolic process; cell death; cell motility; macromolecule metabolic process; multicellular organismal development; catabolic process; biological_process.
What does the biological_process category mean? It collects gene products annotated only to the root term, i.e. those formerly annotated as “process unknown” (the other aspects likewise collect the former “function unknown” and “component unknown”).
Pie graphs (B-cells vs. stroma) show the relative proportions of GO categories such as immune response, apoptosis, and cell-cell signaling: looking at function, not genes.
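Each pie slice is a category’s share of the total for one sample. A minimal sketch with made-up counts (not results from this study) that converts category counts into proportions for two samples:

```python
def proportions(counts: dict[str, int]) -> dict[str, float]:
    """Each category's share of all category assignments in one sample."""
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in counts}
    return {category: n / total for category, n in counts.items()}


# Made-up counts for illustration only (not data from this study).
b_cells = {"immune response": 30, "apoptosis": 10, "cell-cell signaling": 10}
stroma = {"immune response": 5, "apoptosis": 5, "cell-cell signaling": 15}

for name, counts in (("B-cells", b_cells), ("stroma", stroma)):
    shares = proportions(counts)
    print(name, {category: round(share, 2) for category, share in shares.items()})
```

Because one gene product can fall into several categories, this sketch divides by the total number of category assignments rather than by the number of gene products; whichever denominator you use, it is worth stating it alongside the graph.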
GOModeler: quantitative, hypothesis-driven modeling. Coming soon (contact AgBase).
McCarthy et al. “AgBase: a functional genomics resource for agriculture.” BMC Genomics. 2006 Sep 8;7:229.
4. GO based tools for biological modeling
However:
- many of these tools do not support farm animal species;
- the tools have different computing requirements;
- it may be difficult to determine how up to date the GO annotations are.
You need to evaluate the tools for your own system.
Evaluating GO tools
Some criteria for evaluating GO tools:
1. Does it include my species of interest (or do I have to “humanize” my list)?
2. What does it require to set up (local computer or online use)?
3. What was the source of the GO (primary or secondary), and when was it last updated?
4. Does it report the GO evidence codes (and are IEA annotations included)?
5. Does it report which of my gene products have no GO?
6. Does it report both over- and under-represented GO groups, and how does it evaluate this?
7. Does it allow me to add my own GO annotations?
8. Does it present my results in a way that facilitates discovery?
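For criterion 6, one common way to test whether a GO term is over- or under-represented in a study list relative to a background is the hypergeometric distribution (equivalent to a one-sided Fisher’s exact test); individual tools may use other statistics. A minimal sketch using only the standard library, with made-up counts in the example:

```python
from math import comb


def hypergeom_tails(k: int, n: int, K: int, N: int) -> tuple[float, float]:
    """One-sided hypergeometric p-values for a single GO term.

    N: gene products in the background (e.g. all annotated genes on the array)
    K: background gene products annotated to the GO term
    n: gene products in the study list (e.g. differentially expressed)
    k: study-list gene products annotated to the GO term
    Returns (p_over, p_under) = (P(X >= k), P(X <= k)).
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("need 0 <= k <= n <= N and 0 <= K <= N")
    total = comb(N, n)

    def pmf(i: int) -> float:
        # Ways to pick i annotated and n - i unannotated genes, over all picks of n.
        return comb(K, i) * comb(N - K, n - i) / total

    lowest, highest = max(0, n - (N - K)), min(n, K)
    p_over = sum(pmf(i) for i in range(k, highest + 1))
    p_under = sum(pmf(i) for i in range(lowest, k + 1))
    return p_over, p_under


# Made-up example: 3 of 4 study genes carry a term that 5 of 20 background genes carry.
p_over, p_under = hypergeom_tails(k=3, n=4, K=5, N=20)
print(f"over: {p_over:.4f}, under: {p_under:.4f}")
```

When many GO terms are tested at once, the resulting p-values also need a multiple-testing correction.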
5. Using GO for biological modeling
Using GO for biological modeling: hypothesis-generating and hypothesis-driven.