Building a community for genome and proteome annotation

Claire O’Donovan

Workshop: March 2015 at Georgetown University, Washington, DC, USA (funded by the NIH through the UniProt grant).
Aims: to discuss a shared vision for the future of annotation in the genome era, with a special focus on protein functional prediction, and to have a stimulating discussion about what we all do, what we would like to achieve in the future and how to build a community.

Participants
Institute | Name
University of California | Dr Patricia Babbitt
J. Craig Venter Institute | Dr Granger Sutton
Broad Institute | Dr Gustavo Cerqueira
Joint Genome Institute | Dr Nikos Kyrpides
Texas A&M University | Eric Rasche
The University of Maryland | Dr Michelle Giglio
Indiana University (CAFA SIG) | Dr Predrag Radivojac
Miami University (CAFA SIG) | Dr Iddo Friedberg
University of Florida | Dr Svetlana Gerdes
SRI International | Dr Peter Karp
NCBI | Dr Tatiana Tatusova
EMBL-EBI | Dr Maria Martin
University of Southern California | Dr Huaiyu Mi

Current status
Huge advances in genome sequencing technology, quality and standards. But what about sequence function? This is our challenge!

Aim: to go beyond the name
Ontologies, nomenclature, functional annotations, sequence features

We presented lightning talks about our perspectives and opinions, then broke into groups to brainstorm on:
Our vision of the future of annotation
What are the barriers?
What are the solutions?

Solutions / What is needed
Establishing the community
Bringing together experimentalists, experts in annotation pipelines, database providers, standards developers and computational researchers is critical to annotating genomes and proteomes successfully and accurately. Doing so ensures that we have the experimental data and domain knowledge needed to inform the computational pipelines and databases, and it creates a forum for establishing and exchanging data, code and best practices.

Defining comprehensive annotation requirements and standards
Requirements and standards for automatic annotation systems are needed to deliver consistent annotation across our resources and to make it available for the community to use. We need to develop and make available Standard Operating Procedures (SOPs) for the annotation pipelines to enable both accurate interpretation and reproducibility. It is imperative for the community to move away from the protein name as the primary piece of annotation information for the gene/protein product, as a name alone is simply insufficient to capture the relevant information available for each gene/protein.

The importance of a central resource of curated and experimental data and computational methods
Currently there is no exhaustive database of experimentally determined protein function, which hampers proper validation and generalization by computational methods. Sharing code and annotation pipelines on a central, publicly supported platform with well-defined modules and interfaces would enable smaller resources or individual researchers to contribute new methods and to utilise others’ efforts and expertise.

Interaction with journals to ensure standardized data submission that enables effective computational data gathering
While natural language processing is improving, the only reliable way to capture information computationally is via structured data. Authors are best positioned and most motivated to provide these data at the time of publication, so it is critical to engage with publishers to make this part of the submission process.

Engaging with funding agencies
We need to develop new proposals for annotation that involve both experimentalists and computational scientists, and funding agencies should mandate openness, e.g. of publications, software and data sharing. We also need to communicate better what annotation actually is:
1) gene calling
2) developing a controlled vocabulary for function
3) capturing/propagating functional annotation
4) discovering the function of uncharacterized genes/proteins
Building the collaborations needed will enable such comprehensive proposals to deliver on all of these aspects.

Engaging with experimentalists in developing experimental assays that address the questions we need answered
The annotation and prediction communities know there are many known unknowns, and it would be of great benefit to work together with experimentalists to direct research towards these experimental gaps.

Engaging with cutting-edge computational scientists and hardcoding methods identified as useful for our community
We need YOUR expertise to develop better computational approaches for functional prediction and to improve text mining for extracting data from the experimental literature. So get involved!
