Getting GO: how to get GO for functional modeling Iowa State Workshop 11 June 2009.



1. GO for your species
   - GOProfiler: summarizing the available GO
2. GO browsers
   - QuickGO from EBI
   - AmiGO from the GO Consortium
3. gene association files
4. getting GO for your dataset
5. adding more GO
6. requesting GO
7. GO based tools for functional modeling

1. GO for your species

GOProfiler

GOProfiler allows you to get an overview of what GO annotation exists for the species you are interested in.

The number of proteins is based upon UniProtKB records for these species. Species with only IEA annotations do not have an active GO annotation project; their GO is provided automatically by the EBI GOA Project.
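The IEA-only check can be illustrated with a small sketch. It assumes annotations are already available as (species, evidence code) pairs; the species names and counts are invented for illustration, and this is not how GOProfiler itself is implemented.

```python
from collections import Counter, defaultdict


def summarize_evidence(annotations):
    """Count annotations per evidence code for each species."""
    summary = defaultdict(Counter)
    for species, evidence in annotations:
        summary[species][evidence] += 1
    return summary


def iea_only_species(summary):
    """Return the species whose annotations all carry the IEA evidence code."""
    return sorted(s for s, counts in summary.items() if set(counts) == {"IEA"})


# Hypothetical example data, not real annotation counts.
example = [
    ("species A", "IEA"), ("species A", "IDA"), ("species A", "IEA"),
    ("species B", "IEA"), ("species B", "IEA"),
]
flagged = iea_only_species(summarize_evidence(example))
```

Species returned by `iea_only_species` are the ones that, by the rule above, probably have no active GO annotation project.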

2. GO Browsers

Use GO Browsers for:
- searching for GO terms
- searching for gene product annotation
- filtering sets of annotations and downloading results
- creating/using GO slims

GO Browsers

QuickGO Browser (EBI GOA Project)
- Can search by GO Term or by UniProt ID
- Includes IEA annotations

AmiGO Browser (GO Consortium Project)
- bin/amigo/go.cgi
- Can search by GO Term or by UniProt ID
- Does not include IEA annotations

More information about these tools is available from the online workshop resources.

3. gene association files

The gene association (ga) file
- standard file format used to capture GO annotation data
- tab-delimited file containing 15* fields of information:
  - information about the gene product: database, accession, name, symbol, synonyms, species
  - information about the function: GO ID, ontology, reference, evidence, qualifiers, context (with/from)
  - data about the functional annotation: date, annotator

* 2 additional fields will soon be added to capture information about isoforms and other ontologies.
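As a minimal sketch of reading this format, the function below splits one tab-delimited line into the 15 fields, using the column order of the 15-field gene association format. The field names are labels chosen for this example, and the sample row is entirely made up (placeholder accession, reference and annotator).

```python
# Column order of the 15-field gene association format.
GA_FIELDS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "reference", "evidence", "with_from", "aspect", "db_object_name",
    "synonym", "db_object_type", "taxon", "date", "assigned_by",
]


def parse_ga_line(line):
    """Return a dict of the 15 fields, or None for blank and '!' comment lines."""
    if not line.strip() or line.startswith("!"):
        return None
    values = line.rstrip("\n").split("\t")
    if len(values) < len(GA_FIELDS):
        raise ValueError(f"expected at least 15 fields, got {len(values)}")
    # zip() ignores any extra trailing columns, such as the two planned additions.
    return dict(zip(GA_FIELDS, values))


# Hypothetical row for illustration only; none of these values are real records.
sample = "\t".join([
    "UniProtKB", "P00000", "EXAMPLE", "", "GO:0005515", "PMID:0000000",
    "IPI", "UniProtKB:P99999", "F", "Example protein", "EX1|EX2",
    "protein", "taxon:9031", "20090611", "ExampleGroup",
])
record = parse_ga_line(sample)
```

Here `record["go_id"]`, `record["evidence"]` and `record["aspect"]` hold the function information, while `record["date"]` and `record["assigned_by"]` hold the "when & who" metadata.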

[Figure: example gene association file (an additional column was added to this example). Callouts mark the gene product information, the function information, and the metadata (when & who).]

Gene association files

GO Consortium ga files
- many organism specific files
- also includes EBI GOA files

EBI GOA ga files
- UniProt file contains GO annotation for all species represented in UniProtKB

AgBase ga files
- organism specific files
- AgBase GOC file: submitted to GO Consortium & EBI GOA
- AgBase Community file: GO annotations not yet submitted or not supported
- all files are quality checked

4. Finding GO for your dataset

The AgBase GO annotation tools can be used separately or can be combined to rapidly provide an annotation file for functional modeling tools.

GORetriever
- Allows you to get GO annotations for a specific set of gene products.
- Accepts a text file of UniProt accessions or IDs, or GI numbers.
- Returns the GO annotations, a list of accessions that had no GO, and a GO Summary file.
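The lookup that GORetriever performs can be pictured with a short sketch. This is not AgBase's implementation: it assumes the annotations are already in memory as (accession, GO ID) pairs, and the per-accession count stands in for the GO Summary file, whose real contents are not described here. All accessions are placeholders.

```python
def retrieve_go(accessions, annotations):
    """Look up GO IDs for the requested accessions.

    Returns (found, missing, summary):
      found   - dict mapping accession to its list of GO IDs
      missing - requested accessions with no GO, in input order
      summary - dict mapping accession to its number of GO annotations
    """
    wanted = set(accessions)
    found = {}
    for accession, go_id in annotations:
        if accession in wanted:
            found.setdefault(accession, []).append(go_id)
    missing = [acc for acc in accessions if acc not in found]
    summary = {acc: len(go_ids) for acc, go_ids in found.items()}
    return found, missing, summary


# Hypothetical accessions and annotation pairs for illustration.
pairs = [
    ("ACC1", "GO:0005515"), ("ACC1", "GO:0005634"),
    ("ACC3", "GO:0016020"), ("ACC9", "GO:0005515"),
]
found, missing, summary = retrieve_go(["ACC1", "ACC2", "ACC3"], pairs)
```

The `missing` list corresponds to the accessions that need another route to GO, such as a sequence similarity search.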

GORetriever Results

Save the results as a text file for use with GOSlimViewer.

But what about IDs not supported by GORetriever?

5. Adding GO to your dataset

GORetriever:
- only returns existing GO
- only accepts limited accession types

GOanna:
- does a BLAST search against existing GO annotated products
- allows you to quickly transfer GO to gene products where they have similar sequences (ISS)
- accepts FASTA files

GOanna

GOanna Results

query IDs are hyperlinked to BLAST data (files must be in the same directory)

1. Manually inspect alignments and delete any lines where there is not a good alignment*.
2. Add this additional annotation to the annotations from GORetriever.

* What is a good alignment?
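Manual inspection remains the step the workshop asks for. As a hedged sketch of how hits might be sorted before that inspection, the code below reads the standard 12-column tabular BLAST output and flags hits that fail two cut-offs. Both the input format and the thresholds are assumptions: GOanna's own output may be laid out differently, and the cut-off values are arbitrary placeholders, not an answer to what counts as a good alignment.

```python
def parse_blast_tab(line):
    """Parse one line of 12-column tabular BLAST output into a dict."""
    fields = line.rstrip("\n").split("\t")
    return {
        "query": fields[0],
        "subject": fields[1],
        "pident": float(fields[2]),    # percent identity
        "length": int(fields[3]),      # alignment length
        "evalue": float(fields[10]),
        "bitscore": float(fields[11]),
    }


def needs_close_look(hit, max_evalue=1e-20, min_pident=70.0):
    """True if the hit fails either illustrative threshold (placeholder values)."""
    return hit["evalue"] > max_evalue or hit["pident"] < min_pident


# Hypothetical hits for illustration.
strong = parse_blast_tab("q1\ts1\t95.0\t200\t10\t0\t1\t200\t1\t200\t1e-100\t380")
weak = parse_blast_tab("q2\ts2\t35.5\t80\t50\t2\t1\t80\t5\t84\t0.001\t45.1")
```

Flagged hits are candidates for deletion, but the decision still rests on looking at the alignment itself.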

GOanna2ga

New to AgBase: an online script to convert your GOanna file to gene association file format. It allows you to add manually checked GOanna annotations to a GORetriever file. A link is available from the workshop resources.
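Once both sets of annotations are in gene association format, combining them is a simple merge. The sketch below is not the GOanna2ga script; it assumes each input is a list of tab-delimited ga lines and only drops comment lines, blank lines and exact duplicates. The rows shown are placeholders.

```python
def merge_ga_lines(retriever_lines, goanna_lines):
    """Concatenate two lists of ga lines, keeping order and removing exact duplicates."""
    seen = set()
    merged = []
    for line in list(retriever_lines) + list(goanna_lines):
        row = line.rstrip("\n")
        # Skip blank lines, '!' comment lines, and rows already added.
        if not row or row.startswith("!") or row in seen:
            continue
        seen.add(row)
        merged.append(row)
    return merged


# Hypothetical, shortened rows for illustration (real rows have 15 fields).
retriever = ["!comment\n", "DB\tACC1\tGO:0005515\n"]
goanna = ["DB\tACC2\tGO:0005634\n", "DB\tACC1\tGO:0005515\n"]
combined = merge_ga_lines(retriever, goanna)
```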

6. Requesting GO

7. GO based tools for biological modeling

GOSlimViewer: summarizing results

[Figure: GOSlimViewer summary chart. Categories shown: response to stimulus; amino acid and derivative metabolic process; transport; behavior; cell differentiation; metabolic process; regulation of biological process; cell communication; nucleobase, nucleoside, nucleotide and nucleic acid metabolic process; cell death; cell motility; macromolecule metabolic process; multicellular organismal development; catabolic process; biological_process.]

“process unknown”, “function unknown”, “component unknown”: ??
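The kind of summary a GO slim gives can be sketched as a count of annotations per slim category. This is only an illustration: the mapping from GO IDs to slim categories is assumed to be given (a real tool derives it from the ontology structure), and the term IDs are placeholders rather than real GO IDs.

```python
from collections import Counter


def slim_counts(annotated_go_ids, go_to_slim):
    """Count annotations per slim category; IDs without a mapping count as 'unmapped'."""
    counts = Counter()
    for go_id in annotated_go_ids:
        for category in go_to_slim.get(go_id, ["unmapped"]):
            counts[category] += 1
    return counts


# Hypothetical mapping and annotation list; "term1" etc. are placeholders.
mapping = {
    "term1": ["transport"],
    "term2": ["transport", "response to stimulus"],
}
counts = slim_counts(["term1", "term2", "term2", "term3"], mapping)
```

A term can map to more than one slim category, so the category counts can add up to more than the number of annotations.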

Tools for GO analysis of gene expression/microarray data

Many of these tools, however, do not support agricultural species, and the tools have different computing requirements. A list of the tools that can be used for agricultural species is available on the workshop website, at the "Expression analysis tools at the GO consortium website" link.

Evaluating GO tools

Some criteria for evaluating GO Tools:
1. Does it include my species of interest (or do I have to “humanize” my list)?
2. What does it require to set up (computer usage/online)?
3. What was the source for the GO (primary or secondary) and when was it last updated?
4. Does it report the GO evidence codes (and is IEA included)?
5. Does it report which of my gene products has no GO?
6. Does it report both over/under represented GO groups and how does it evaluate this?
7. Does it allow me to add my own GO annotations?
8. Does it represent my results in a way that facilitates discovery?
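Criterion 6 asks how a tool evaluates over- or under-representation. One common approach, though not necessarily the one any given tool uses, is a hypergeometric test. The sketch below computes both tails with the standard library; the parameter names are chosen for this example and the numbers are invented.

```python
from math import comb


def over_representation_p(k, n, K, N):
    """P(X >= k): chance of at least k annotated genes in a list of n.

    N = genes in the background, K = background genes with the GO term,
    n = genes in the study list, k = study genes with the GO term.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def under_representation_p(k, n, K, N):
    """P(X <= k): chance of at most k annotated genes in a list of n."""
    upper = min(k, n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(0, upper + 1))
    return hits / comb(N, n)


# Hypothetical numbers: 10 background genes, 5 with the term, 2 in the list, both annotated.
p_over = over_representation_p(2, 2, 5, 10)
```

When comparing tools, it is also worth checking whether and how they correct for testing many GO terms at once, which this sketch does not do.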