1
CACAO Training ASM-JGI 2012
2
Transferring information to new genomes
Figure labels: Database; Lists of genes; Known functions of homologs or subsets; New knowledge
This figure describes an ideal situation for how we obtain and transfer information to new genes and genomes. How well it works depends on the quantity and quality of what we can get out of the databases. The problem is…
3
Curation is rate limiting
Literature Database Biocurators (rate limiting) Datasets
4
CACAO is growing
5
CACAO biodiversity Spring 2012 Annotations
6
CACAO 2
CACAO changes the job of the professionals from primary curation to assessment
Growth in CACAO makes assessment rate limiting
Solution: Promote CACAO veterans to help with assessment
7
BIOCURATORS
8
The biocurator training …
9
What’s in it for you?
We hope you will:
  learn how we think about protein function
  gain skills that will help your future career
  enjoy contributing to a resource used by people all over the world
  have fun!
10
Annotation
Annotation: a note that is made while reading any form of text
For scientists:
  Nucleotide level: Where the genes are in the genome
  Protein level: What their functions are
From Wikipedia
12
Functional Annotation
Annotation: a note that is made while reading any form of text
Functional Annotation: a note in a specific format that is made based on evidence in a peer-reviewed paper about the attributes of a protein
13
Functional Annotation
Functional Annotation: a note in a specific format that is made based on evidence in a peer-reviewed paper about the attributes of a protein
Specific format = GO (Gene Ontology) Annotation
14
GO (Gene Ontology) Annotations
3 aspects (ontologies) for describing protein attributes:
  1. Biological Process
  2. Molecular Function
  3. Cellular Component
Controlled vocabulary
  Everyone uses the same terms
  Terms have 7-digit IDs that computers can understand
  Relationships between terms
GO:
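The point about 7-digit IDs that computers can understand can be illustrated with a minimal sketch. It assumes an ID is written as "GO:" followed by exactly seven digits, as in the GO:0004347 and GO:0009274 examples in this deck. The names GO_ASPECTS, GO_ID_PATTERN and is_valid_go_id are hypothetical, and the check only looks at the shape of the ID, not at whether the term exists.

```python
import re

# The three aspects (ontologies) listed on the slide.
GO_ASPECTS = ("Biological Process", "Molecular Function", "Cellular Component")

# Assumed shape of a GO term ID: the prefix "GO:" and exactly seven digits.
GO_ID_PATTERN = re.compile(r"GO:[0-9]{7}")


def is_valid_go_id(term_id: str) -> bool:
    """Return True when term_id has the GO:1234567 shape (format check only)."""
    return GO_ID_PATTERN.fullmatch(term_id) is not None


if __name__ == "__main__":
    for candidate in ("GO:0004347", "GO:0009274", "GO:4347", "hexokinase activity"):
        print(candidate, is_valid_go_id(candidate))
```

A format check like this catches a mistyped ID early; whether the term is the right one for the protein still has to be judged from the paper.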
15
GO:0004347 hexokinase activity
Molecular Function: activities or “jobs” of a gene product
GO: hexokinase activity
GO: Kinase activity
From PMID: , rndsystems.com
16
Biological Process: a commonly recognized series of events
GO: pathogenesis
GO: transcription, DNA dependent
GO: cell division
From ridge.icu.ac.jp, edtech.clas.pdx.edu, scielosp.org
17
GO:0009274 peptidoglycan-based cell wall
Cellular Component: where a gene product acts
GO terms: mitochondrion, mitochondrial membrane, mitochondrial matrix, mitochondrial intermembrane space
GO: peptidoglycan-based cell wall
GO: ribosome
GO: mitochondrion
From visualphotos.com, epmm.group.shef.ac.uk,
18
Where can you search for GO terms? GONUTS (gowiki.tamu.edu)
23
What do you actually need once you have found the correct term?
GO:
24
Functional Annotation
Functional Annotation: a note in a specific format that is made based on evidence in a peer-reviewed paper about the attributes of a protein
Specific format = GO (Gene Ontology) Annotation
Peer-reviewed paper
25
Finding a scientific paper
Has to be a scientific paper with experimental data in it. (Anything else is a valid reason to challenge!!)
No review articles, no books, no textbooks, no Wikipedia articles, no class notes…
  These are OK for inspiration, but not suitable as the reference.
You will need the PMID number.
Biocurators can annotate the same paper as previous semesters, but they have to make NOVEL annotations (just remaking the same annotations = no points!)
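The rules above can be sketched as two small checks. This is an illustration under stated assumptions, not how CACAO or GONUTS actually screens references: the source-type labels, the function names, and the idea of comparing (GO term, evidence code) pairs to detect repeats are all hypothetical, and the PMID in the usage note is a made-up placeholder.

```python
# Source types the slide rules out as references (labels are hypothetical).
EXCLUDED_SOURCE_TYPES = {"review", "book", "textbook", "wikipedia", "class notes"}


def is_usable_reference(pmid: str, source_type: str, has_experimental_data: bool) -> bool:
    """A usable reference has a numeric PMID, experimental data, and an allowed type."""
    has_pmid = pmid.isascii() and pmid.isdigit()
    allowed_type = source_type.strip().lower() not in EXCLUDED_SOURCE_TYPES
    return has_pmid and has_experimental_data and allowed_type


def novel_annotations(proposed, existing):
    """Keep only the proposed (go_id, evidence_code) pairs not already made from the paper."""
    seen = set(existing)
    return [pair for pair in proposed if pair not in seen]
```

For example, is_usable_reference("12345678", "review", True) is False because reviews are excluded, and novel_annotations drops any pair that an earlier semester already entered for the same paper.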
26
Functional Annotation
Functional Annotation: a note in a specific format that is made based on evidence in a peer-reviewed paper about the attributes of a protein
Specific format = GO (Gene Ontology) Annotation
Peer-reviewed paper
Protein
27
What can you annotate? Proteins.
Ways to find a protein to annotate:
  Search PubMed for papers on a specific topic or protein or GO term
  Search UniProt for something interesting (e.g. allergen) or a protein of interest (e.g. PcnB)
  Check the references in the paper you are currently reading
No matter what, you will need to:
  Find the protein’s accession on UniProt
  Use that accession to make a page for that protein on GONUTS
  Add your GO annotations to the protein’s page on GONUTS
Presenter note: Can either show this list after the students have run out of suggestions or write the items in as the students suggest them.
28
Why do you need an accession from UniProt (http://www.uniprot.org)?
UniProt is not editable by the community, but GONUTS, a community-contributed website, is. Given a protein’s UniProt accession, GONUTS can make a page for that protein and pull the existing annotations from UniProt into it. Correct and complete annotations at the end of the competition are submitted to the GO Consortium and from there get added to UniProt (which is part of the GO Consortium).
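Because everything downstream depends on the accession, a quick format check can catch a gene name pasted in by mistake. The sketch below assumes the accession pattern given in UniProt's help pages, reproduced here from memory, so it should be checked against the current UniProt documentation before being relied on. The function name is hypothetical, and the sample strings in the usage note are format examples only, not the proteins discussed in this deck.

```python
import re

# Assumed UniProt accession pattern (verify against UniProt's own documentation):
# 6 characters in the older style, or 6 or 10 characters in the newer style.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def looks_like_uniprot_accession(text: str) -> bool:
    """Return True when text has the shape of a UniProt accession (format check only)."""
    return UNIPROT_ACCESSION.fullmatch(text.strip().upper()) is not None
```

With this sketch, a string shaped like P12345 or A0A022YWF9 passes, while a gene name such as PcnB or a GO ID such as GO:0004347 does not.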
29
How do you make a new protein page in GONUTS?
GoPageMaker will:
  Check if the page exists in GONUTS & take you there if it does.
  Make a page if it does not exist in GONUTS already & pull all of the annotations from UniProt into a table that you can edit.
Make as many protein pages as you would like!
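The behaviour described for GoPageMaker is a get-or-create pattern, which can be sketched in a few lines. This is not GoPageMaker's actual code: the in-memory dictionary standing in for GONUTS, the function get_or_create_page, and the fetch_annotations callback standing in for the UniProt lookup are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Annotation = Dict[str, str]


def get_or_create_page(
    pages: Dict[str, List[Annotation]],
    accession: str,
    fetch_annotations: Callable[[str], List[Annotation]],
) -> Tuple[List[Annotation], bool]:
    """Return (annotation table, created) for the given accession.

    If a page already exists it is returned untouched; otherwise a new page
    is made and filled with the annotations supplied by fetch_annotations.
    """
    if accession in pages:
        return pages[accession], False
    pages[accession] = list(fetch_annotations(accession))
    return pages[accession], True
```

Calling the function twice with the same accession creates the page once and returns the existing table the second time, so edits already made to the table are not overwritten.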
30
Annotations edit table
33
Form for your annotation (when you edit the table)
34
4 REQUIRED parts of EVERY GO annotation
  Reference
  GO
  Notes (about evidence)
  Evidence code
There are 24 rows in the annotations table on this protein’s page. Each row is a separate annotation. Suzi added the ones with the white background. The annotation shown on the slide is from a different protein’s page. It is merely to show what an annotation is.
35
Summary of Evidence Codes for CACAO
Evidence codes describe the type of work or analysis done by the authors
  IDA: Inferred from Direct Assay
  IMP: Inferred from Mutant Phenotype
  IGI: Inferred from Genetic Interaction
  ISO: Inferred from Sequence Orthology
  ISA: Inferred from Sequence Alignment
  ISM: Inferred from Sequence Model
  IGC: Inferred from Genomic Context
If it’s not one of these 7, your annotation is incorrect!!!
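The "one of these 7" rule is easy to express as a lookup. The sketch below uses only the seven codes listed on this slide; the names CACAO_EVIDENCE_CODES and check_evidence_code are hypothetical, and the choice to accept lowercase input is an assumption.

```python
# The seven evidence codes accepted in CACAO, as listed on the slide.
CACAO_EVIDENCE_CODES = {
    "IDA": "Inferred from Direct Assay",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "ISO": "Inferred from Sequence Orthology",
    "ISA": "Inferred from Sequence Alignment",
    "ISM": "Inferred from Sequence Model",
    "IGC": "Inferred from Genomic Context",
}


def check_evidence_code(code: str) -> str:
    """Return the full name of an accepted code, or raise ValueError otherwise."""
    key = code.strip().upper()
    if key not in CACAO_EVIDENCE_CODES:
        allowed = ", ".join(CACAO_EVIDENCE_CODES)
        raise ValueError(f"{code!r} is not a CACAO evidence code; use one of: {allowed}")
    return CACAO_EVIDENCE_CODES[key]
```

A code such as IEA, which exists in GO but is not on this list, raises an error, which mirrors the rule that such an annotation would be marked incorrect.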
36
Functional Annotation
Functional Annotation: a note in a specific format that is made based on evidence in a peer-reviewed paper about the attributes of a protein
Specific format = GO (Gene Ontology) Annotation
Peer-reviewed paper
Protein
Evidence code
38
2 other parts that may rarely be required…
  Qualifier
  With/From
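The four required parts and the two rarely required parts can be pictured as one record per row of the annotations table. This is a minimal sketch, not the GONUTS data model: the class GoAnnotation, its field names, and the rule that a blank string counts as missing are all assumptions, and the PMID in the usage note is a made-up placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GoAnnotation:
    # The four parts required for every annotation.
    reference: str = ""       # e.g. "PMID:" followed by the paper's number
    go_term: str = ""         # e.g. "GO:0004347"
    notes: str = ""           # notes about the evidence in the paper
    evidence_code: str = ""   # e.g. "IDA"
    # The two parts that are only rarely required.
    qualifier: Optional[str] = None
    with_from: Optional[str] = None

    def missing_parts(self) -> List[str]:
        """List the required parts that are empty or blank."""
        required = {
            "Reference": self.reference,
            "GO": self.go_term,
            "Notes": self.notes,
            "Evidence code": self.evidence_code,
        }
        return [name for name, value in required.items() if not value.strip()]

    def is_complete(self) -> bool:
        """An annotation is complete when none of the four required parts is missing."""
        return not self.missing_parts()
```

For example, GoAnnotation(go_term="GO:0004347").missing_parts() returns ["Reference", "Notes", "Evidence code"]. Completeness is only half of the scoring rule; correctness still has to be judged against the paper.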
39
How is CACAO scored? Rounds
Points for a complete AND correct annotation (normally 1 week/round, today = 25 mins)
  4 necessary parts
  May be additional parts
  NOTE: We will take away points if the annotation is not correct when assessed by an experienced CACAO biocurator
Challenges are used to steal points for incorrect &/or incomplete annotations (normally 1 week/round, today = 20 mins)
  Identify a problem
  Suggest correct alternative
Refinements can be entered by any team (during any challenge week)
Important part: challenges. These are what make the difference every semester in the final standings!!!
Also: My rubrics require min 15 correct annotations for 4 rounds of CACAO for excellent grade for Rubric #2
3 ways to get those:
  Complete & correct annotations
  Refine & correct someone else’s annotation
  Refine an annotation of yours that was refined by another team but not corrected completely.
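The rubric threshold can be sketched as a simple tally. Only the minimum of 15 and the three ways of earning a correct annotation come from the slide; the event labels, the function names, and the assumption that each of the three ways counts equally are hypothetical, and the sketch does not model how CACAO points themselves are awarded.

```python
from collections import Counter

# Hypothetical labels for the three ways of earning a correct annotation.
CREDITED_KINDS = ("complete_and_correct", "refined_other_team", "refined_own")

# Minimum from the rubric on the slide: 15 correct annotations over 4 rounds.
MIN_CORRECT_FOR_EXCELLENT = 15


def credited_total(events) -> int:
    """Count the events that fall into one of the three credited kinds."""
    counts = Counter(kind for kind in events if kind in CREDITED_KINDS)
    return sum(counts.values())


def meets_excellent_threshold(events) -> bool:
    """True when the credited total reaches the rubric minimum."""
    return credited_total(events) >= MIN_CORRECT_FOR_EXCELLENT
```

A student with 10 complete and correct annotations, 3 refinements of other teams' annotations and 2 refinements of their own reaches 15 under this tally.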
40
Scoreboard & Challenges
41
Team & Individual Pages
challenge
42
Challenges
1. Enter the reason for your challenge here (i.e. what’s wrong).
2. Provide the fix(es) for it.
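The two-part shape of a challenge can be sketched as a small record with a completeness check. The class Challenge, its field names, and the wording of the messages are hypothetical; the real form on the team pages may ask for more.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Challenge:
    reason: str        # what is wrong with the annotation
    proposed_fix: str  # the suggested correct alternative

    def problems(self) -> List[str]:
        """List what is missing from the challenge; an empty list means it is usable."""
        issues = []
        if not self.reason.strip():
            issues.append("missing reason")
        if not self.proposed_fix.strip():
            issues.append("missing fix")
        return issues
```

A challenge that only says something is wrong, with no fix, comes back with ["missing fix"], matching the slide's point that a challenge needs both parts.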
43
Annotation discussion (aka argument)
44
Summary
UniProt (http://uniprot.org):
  Find your protein(s) here (UniProt accession required)
PubMed (http://pubmed.org):
  Find your papers about the protein’s attributes (molecular function, biological process, cellular component)
GONUTS:
  Search for GO terms
  Make page for your protein on GONUTS (using UniProt accession)
  Add your annotation to the protein’s Annotation table during first (Annotation) week of any round
  Review and challenge competitors’ annotations during the second (challenge) week of any round
45
ASM-JGI Competition! You now have 25 mins to:
Use the assigned paper for your group and …
  Find the correct UniProt accession
  Make the page for the protein on GONUTS
  Make at least one annotation
You will have 20 mins to challenge other teams’ annotations
  What fields are wrong & why?!