Published by Paula Lloyd. Modified over 6 years ago.
1
Manually curated and computationally predicted GO annotations at the Saccharomyces Genome Database
Eurie L. Hong, Ph.D.
Department of Genetics • Stanford University School of Medicine
2
Data from high-throughput experiments
[Figure: data from the scientific community (high-throughput experiments, traditional experiments, genome sequence) flowing into integrated SGD data and analysis tools]
A little bit about who we are and what we do. We are a community database that curates sequence, molecular biology, genetic, and biochemical information about the budding yeast S. cerevisiae. All the data available at SGD are generated by the scientific community: high-throughput studies, traditional small-scale experiments, and sequencing efforts. These data are incorporated and integrated into SGD, and we provide search and analysis tools that help the scientific community view the data others have published as well as analyze their own data.
3
CHS6/YJL099W Locus Summary Page
[Figure: Locus Summary Page with callouts for nomenclature, summary of published data, curated data from published literature, data from high-throughput experiments, sequence information, links to SGD tools, and links to other databases]
All the data at SGD are centrally organized around a chromosomal feature such as a gene, a telomere, or a centromere, and all the data associated with that feature are displayed on its Locus Summary Page. Here we are looking at the locus summary page for LEU4. From a Locus Summary Page you can view the nomenclature, summaries of published data, and sequence information, and you can access other databases. All these data are curated from the literature and updated as the body of literature expands. I'll highlight a few types of data from this page that might be of interest to you.
4
Accessing the data via files
ftp://ftp.yeastgenome.org/yeast/
Before I start, I want to emphasize that all our data are publicly available as downloadable files on our FTP site. We also have web interfaces, which I will point out later, that allow you to download data from searches.
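As a minimal sketch of fetching these files programmatically, the Python below uses the standard-library `ftplib` to list a directory after an anonymous login. The helper names `split_ftp_url` and `list_remote_directory` are hypothetical, and the server may have changed its layout or stopped accepting FTP connections since this talk, so treat the URL as the one given on the slide rather than a guaranteed endpoint.

```python
from ftplib import FTP
from urllib.parse import urlparse


def split_ftp_url(url: str) -> tuple[str, str]:
    """Split an ftp:// URL into (host, directory path)."""
    parts = urlparse(url)
    if parts.scheme != "ftp" or not parts.hostname:
        raise ValueError(f"not an ftp URL: {url}")
    return parts.hostname, parts.path or "/"


def list_remote_directory(url: str, timeout: float = 30.0) -> list[str]:
    """Return the entry names in a remote FTP directory (anonymous login)."""
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:  # connects on construction
        ftp.login()                          # no arguments: anonymous login
        ftp.cwd(path)
        return ftp.nlst()
```

For example, `list_remote_directory("ftp://ftp.yeastgenome.org/yeast/")` would return the top-level names if the server is still reachable; `FTP.retrbinary` could then fetch individual files.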
5
Display of GO Annotations
6
Status of GO Annotations at SGD
- All protein and RNA gene products have been annotated with GO terms.
- All GO annotations are manually curated from the literature (no IEA).
Genes without published characterization data (from Genome Snapshot, 8/23/2006):
- Cellular Component: 864 genes (13.7% of all genes)
- Biological Process: 1448 genes (23.0% of all genes)
- Molecular Function: 2112 genes (33.6% of all genes)
The scientific literature describing the biological role of a gene product is captured with Gene Ontology terms. The Gene Ontology is a controlled vocabulary that also encodes relationships between its terms, and these relationships allow further computational analysis of what is known about a gene. GO is also used by other model organism databases. Because these are controlled vocabulary terms with definitions, you know that FlyBase's use of "leucine biosynthesis" is the same as SGD's use of the term.
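The point that relationships between GO terms enable further computation can be illustrated with the "true path" idea: a gene annotated to a term is implicitly annotated to all of that term's ancestors. The sketch below assumes a hypothetical toy ontology (`term_A` to `term_D`) stored as a child-to-parents map and treats every relationship type alike, which is a simplification; real analyses may handle relationship types differently.

```python
# Hypothetical toy ontology: child term -> set of parent terms.
PARENTS = {
    "term_A": set(),        # root
    "term_B": {"term_A"},
    "term_C": {"term_B"},
    "term_D": {"term_A"},
}


def ancestors(term, parents):
    """All terms reachable by following parent links from `term` (excluding itself)."""
    seen, stack = set(), list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def propagate(direct, parents):
    """gene -> directly annotated terms  =>  gene -> those terms plus all ancestors."""
    return {gene: set(terms).union(*(ancestors(t, parents) for t in terms))
            for gene, terms in direct.items()}


def genes_per_term(propagated):
    """Number of genes annotated to each term, directly or via a descendant term."""
    counts = {}
    for terms in propagated.values():
        for term in terms:
            counts[term] = counts.get(term, 0) + 1
    return counts
```

Counting only after propagation is what lets a query for a general term also pick up genes annotated to its more specific descendants: here neither gene is annotated directly to `term_A`, yet both are counted under it.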
7
Sources of Computationally Predicted GO Annotations
- InterPro domain matches in S. cerevisiae proteins (source: GOA project)
- Integrated analysis of multiple datasets (source: publications, external databases)
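InterPro-based predictions rest on a mapping from InterPro entries to GO terms (interpro2go). As a minimal sketch, assuming the line format of the GO Consortium's external2go mapping files (`InterPro:IPR... description > GO:term name ; GO:id`) and a hypothetical table of InterPro matches per gene, the code below shows how such a mapping turns domain matches into predicted GO terms. The GOA project's actual pipeline is not described in these slides and may differ.

```python
def parse_interpro2go(lines):
    """Build {InterPro ID: set of GO IDs} from external2go-style mapping lines.

    Assumed line shape: "InterPro:IPR... <description> > GO:<term name> ; GO:<id>".
    Lines starting with "!" are header comments and are skipped.
    """
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("!"):
            continue
        left, right = line.split(" > ", 1)
        interpro_id = left.split()[0].removeprefix("InterPro:")
        go_id = right.rsplit(" ; ", 1)[1].strip()
        mapping.setdefault(interpro_id, set()).add(go_id)
    return mapping


def predict_go_terms(domain_matches, mapping):
    """Turn {gene: set of InterPro IDs} into {gene: set of predicted GO IDs}.

    Genes whose domains have no GO mapping are left out of the result.
    """
    predictions = {}
    for gene, domains in domain_matches.items():
        terms = set()
        for domain in domains:
            terms |= mapping.get(domain, set())
        if terms:
            predictions[gene] = terms
    return predictions
```

One InterPro entry can map to several GO terms, so a single domain match may yield more than one predicted annotation for a gene.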
8
CHS6/YJL099W Locus Summary Page
9
Identifying Types of GO Annotations
10
CHS6/YJL099W GO Annotation Page
[Figure: GO Annotation Page divided into three sections: Core GO Annotations; GO Annotations from Large Scale Experiments; Computationally Predicted GO Annotations]
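The three sections of the GO Annotation Page can be thought of as a partition of one gene's annotations. This is a minimal sketch only: it assumes that the IEA evidence code marks computational predictions (consistent with the "no IEA" statement for the manually curated set) and it uses a hypothetical per-annotation `high_throughput` flag for large-scale studies. How SGD actually assigns annotations to each section is not stated in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene: str
    go_id: str
    evidence: str          # GO evidence code, e.g. "IDA", "IMP", "IEA"
    high_throughput: bool  # hypothetical flag: does the source report a large-scale study?


def group_for_display(annotations):
    """Split annotations into core, large-scale, and computational sections."""
    sections = {"core": [], "large_scale": [], "computational": []}
    for a in annotations:
        if a.evidence == "IEA":           # assumed marker of computational prediction
            sections["computational"].append(a)
        elif a.high_throughput:
            sections["large_scale"].append(a)
        else:
            sections["core"].append(a)
    return sections
```

Checking IEA first means a computational prediction is never shown among the literature-based sections, whatever the value of the hypothetical flag.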
12
Changes to GO Term Finder
- Current functionality
- Specify background set
- Refine annotations used by annotation source or evidence codes
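GO Term Finder asks whether a GO term is over-represented in a query gene list. A minimal sketch of the two new options, assuming a one-sided hypergeometric test (the statistic used by the GO::TermFinder software) and annotations already propagated to ancestor terms: the background set changes N and K, and the evidence-code filter changes which annotations count toward K and k. The names `term_pvalue` and `allowed_evidence` are hypothetical, and no multiple-testing correction is applied here.

```python
from math import comb


def hypergeometric_pvalue(k, n, K, N):
    """One-sided P(X >= k): chance of at least k annotated genes when n genes
    are drawn without replacement from N background genes, K of them annotated."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def term_pvalue(term, query, background, annotations, allowed_evidence):
    """P-value for `term` with a user-chosen background and evidence filter.

    annotations: gene -> set of (GO ID, evidence code) pairs, assumed to be
    already propagated to ancestor terms.
    """
    background = set(background)
    query = set(query) & background  # only query genes inside the background count

    def annotated(gene):
        return any(go_id == term and code in allowed_evidence
                   for go_id, code in annotations.get(gene, ()))

    N = len(background)
    K = sum(1 for g in background if annotated(g))
    n = len(query)
    k = sum(1 for g in query if annotated(g))
    return hypergeometric_pvalue(k, n, K, N)
```

Excluding an evidence code can change the result sharply: in a four-gene background where two query genes carry IDA annotations, the p-value is 1/6 with IDA allowed but 1.0 when only IEA annotations are counted.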
13
Improving GO Annotations
[Figure: computationally predicted GO annotations compared with manually curated GO annotations]
- Computational predictions may indicate publications that were overlooked.
- Review inconsistencies between computationally predicted and manually curated GO annotations to improve mappings and manually curated annotations.
- Review inconsistencies between computationally predicted and manually curated GO annotations to improve the ontology.
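The review loop on this slide can be sketched as a per-gene comparison. This minimal sketch assumes that a gene with no curated annotation in an aspect is represented by an empty set (a hypothetical convention), and that an `ancestors_of` function for the ontology is supplied by the caller. It flags genes that have only predicted terms, as candidates for overlooked publications, and flags predicted terms that are neither equal to, an ancestor of, nor a descendant of any curated term, as candidates for review.

```python
def triage(predicted, curated, ancestors_of):
    """Return (genes with predictions but no curated terms, inconsistent (gene, term) pairs).

    predicted, curated: gene -> set of GO IDs for one aspect
    ancestors_of: function mapping a GO ID to the set of its ancestor GO IDs
    """
    no_curated = sorted(g for g in predicted if not curated.get(g))
    inconsistent = []
    for gene, pred_terms in predicted.items():
        cur_terms = curated.get(gene)
        if not cur_terms:
            continue  # already reported as a possible overlooked publication
        for p in pred_terms:
            related = any(p == c or p in ancestors_of(c) or c in ancestors_of(p)
                          for c in cur_terms)
            if not related:
                inconsistent.append((gene, p))
    return no_curated, sorted(inconsistent)
```

A flagged pair does not say which side is wrong: the prediction, the curated annotation, or the mapping between InterPro and GO may each need revision.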
14
Additional Annotations Using Interpro2GO
Information added to genes with no published characterization data (from gene_association.goa_uniprot, 7/2006):
- Molecular Function: 468 genes
- Biological Process: 316 genes
- Cellular Component: 207 genes
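Counts like these come from GO annotation files in the tab-delimited gene association (GAF) format. A minimal sketch, assuming the standard GAF column positions (object ID in column 2, qualifier in column 4, evidence code in column 7, aspect F/P/C in column 9) and a hypothetical precomputed set of uncharacterized genes per aspect. It does not reproduce the numbers above, and mapping the UniProt accessions in the GOA file to SGD genes is left out.

```python
from collections import defaultdict

ASPECT_NAMES = {"F": "Molecular Function", "P": "Biological Process", "C": "Cellular Component"}


def count_genes_gaining_iea(gaf_lines, uncharacterized):
    """Per aspect, count uncharacterized genes with at least one IEA annotation.

    gaf_lines: tab-separated GAF lines; lines starting with "!" are comments.
    uncharacterized: aspect letter ("F", "P", "C") -> set of gene IDs lacking
    published characterization data for that aspect (hypothetical input).
    """
    genes = defaultdict(set)
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        gene, qualifier, evidence, aspect = cols[1], cols[3], cols[6], cols[8]
        if evidence != "IEA" or "NOT" in qualifier.split("|"):
            continue  # keep only positive computational predictions
        if gene in uncharacterized.get(aspect, set()):
            genes[aspect].add(gene)
    return {ASPECT_NAMES[a]: len(ids) for a, ids in genes.items()}
```

Counting distinct genes rather than lines matters because a gene often receives several IEA annotations in the same aspect.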
15
Preliminary Comparison: Cellular Component Annotations
[Figure: pie chart comparing 5946 IEA (Interpro2go) cellular component annotations with 9059 curated annotations (IC, IDA, IEP, IGI, IMP, IPI, ISS, NAS, RCA, TAS). Categories: Interpro2go annotation matches curated annotation; Interpro2go annotation is ancestor of curated annotation; shared parent is child of root term; other shared parent term; shared parent is root term; Interpro2go annotation for an unknown; other.]
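The categories in this chart can be read as a classification of each (Interpro2go term, curated term) pair in the ontology graph. This is a minimal sketch of one reading of those categories, not SGD's published procedure: "shared parent" is taken to mean the lowest common ancestors of the two terms, "child of root" means the root is a direct parent, and a prediction more specific than the curated term falls into "Other". The function name and the toy ontology used to exercise it are hypothetical.

```python
def ancestors(term, parents):
    """All terms reachable by following parent links from `term` (excluding itself)."""
    seen, stack = set(), list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def classify_pair(predicted, curated, parents, root, unknown_terms=frozenset()):
    """Assign an (Interpro2go term, curated term) pair to one chart category."""
    if curated in unknown_terms:
        return "Interpro2go annotation for an unknown"
    if predicted == curated:
        return "Interpro2go annotation matches curated annotation"
    anc_pred, anc_cur = ancestors(predicted, parents), ancestors(curated, parents)
    if predicted in anc_cur:
        return "Interpro2go annotation is ancestor of curated annotation"
    shared = anc_pred & anc_cur
    if curated in anc_pred or not shared:
        return "Other"  # prediction more specific than curated term, or no common ancestor
    # Lowest shared ancestors: shared terms that are not an ancestor of another shared term.
    lowest = {s for s in shared
              if not any(s in ancestors(t, parents) for t in shared if t != s)}
    if any(s != root and root not in parents.get(s, ()) for s in lowest):
        return "Other shared parent term"
    if any(root in parents.get(s, ()) for s in lowest):
        return "Shared parent is child of root term"
    return "Shared parent is root term"
```

Ordering the checks from most to least specific means a pair is credited with the closest agreement the graph supports.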
16
Summary
- Currently, all GO annotations for S. cerevisiae gene products are manually curated from the literature.
- SGD will incorporate computationally predicted GO annotations, which will provide additional information about a gene product's role in biology.
- Computationally predicted GO annotations will be used to refine and improve manually curated GO annotations at SGD.