Webinar Wheat Data Interoperability guidelines Esther Dzalé Yeumo Richard Fulss.

Slides:



Advertisements
Similar presentations
OMV Ontology Metadata Vocabulary April 10, 2008 Peter Haase.
Advertisements

DRIVER Step One towards a Pan-European Digital Repository Infrastructure Norbert Lossau Bielefeld University, Germany Scientific coordinator of the Project.
IUFRO International Union of Forest Research Organizations Eero Mikkola The Increasing Importance of Metadata in Forest Information Gathering NEFIS Symposium.
Near East Plant Protection Network for Regional Cooperation & Knowledge Sharing Food and Agriculture Organization of the United Nations An Overview on.
6 The Australian Centre for Plant Functional Genomics Pty Ltd Case-studies on past EU-Australian research collaborations.
1 Enriching UK PubMed Central SPIDER launch meeting, Wolfson College, Oxford Paul Davey, UK PubMed Central Engagement Manager.
Abstract The goal of the Conifer Translational Genomics Network (CTGN) project is to provide tree breeders across the United States with new tools to enhance.
We are developing a web database for plant comparative genomics, named Phytome, that, when complete, will integrate organismal phylogenies, genetic maps.
Agricultural Biotechnology Network for Regional Collaboration and Knowledge Sharing Food and Agriculture Organization of the United Nations An Overview.
Moving beyond free text. Authors Scientist does research Scientist publishes research results in journal article Old Paradigm:
Wheat Data Interoperability Esther DZALE YEUMO KABORE Richard FULSS.
RDA Wheat Data Interoperability Working Group Outcomes RDA Outputs P5 9 th March 2015, San Diego.
RDA Wheat Data Interoperability Working Group Outcomes RDA Outputs P5 9 th March 2015, San Diego.
Development of the Generation Challenge Program Ontology for Crops Elizabeth Arnaud (Bioversity International) and Rosemary Shrestha (CRIL-CIMMYT), Richard.
9/30/2004TCSS588A Isabelle Bichindaritz1 Introduction to Bioinformatics.
Near East Rural & Agricultural Knowledge and Information Network - NERAKIN Food and Agriculture Organization of the United Nations Near East and North.
Supporting open data & models in the agri-food sector: Experiences from the RDA Agriculture Data IG & Wheat Data Interoperability WG V. Protonotarios (Agro-Know),
European Life Sciences Infrastructure for Biological Information ELIXIR
Computational Biology and Informatics Laboratory Development of an Application Ontology for Beta Cell Genomics Based On the Ontology for Biomedical Investigations.
RDA Wheat Data Interoperability Cookbook and last developments 9 th March 2015, San Diego.
Gramene Objectives Develop a database and tools to store, visualize and analyze data on genetics, genomics, proteomics, and biochemistry of grass plants.
U.S. Dept. Agriculture Agricultural Research Service Rob Griesbach Technology Transfer Coordinator
Introducing NRSP10 Database Infrastructure for Specialty Crops Computer Applications in Horticulture/Teaching Methods Workshop ASHS Annual Conference 2015.
15/11/2011EVA Minerva Jerusalem1 Linked Heritage : Coordination of standards and technologies for the enrichment of Europeana Marie-Véronique Leroi Ministry.
Gramene’s Outreach Program. Outreach Components Workshops Website Improvements / Additions Public Announcements High School Outreach Collaborators and.
Phenotyping Clare Coyne & Melanie Harrison-Dunn, curators.
Save time. Reduce costs. Find and reuse interoperability solutions on Joinup for developing European public services Nikolaos Loutas
U.S. Department of the Interior U.S. Geological Survey CDI Webinar Sept. 5, 2012 Kevin T. Gallagher and Linda C. Gundersen September 5, 2012 CDI Science.
Wheat Data Interoperability. 2  Endorsed in March 2014  Focus:  Improve/reach semantic interoperability of Wheat data  The WG will focus first on.
1 Improving Statistics for Food Security, Sustainable Agriculture and Rural Development – Action Plan for Africa THE RESEARCH COMPONENT OF THE IMPLEMENTATION.
The Plant Ontology Consortium website: Contact Information for deliverables Lincoln Stein,
American Oat Workers Conference, Fargo 2006 Towards a Global Strategy for the Conservation of Oat Genetic Resources Federal Centre of Breeding Research.
The KOS interoperability in aquatic science field through mapping processes Carmen Reverté Reverté Aquatic Ecosystems Documentation Center. IRTA. (Sant.
The Functional Genomics Experiment Object Model (FuGE) Andrew Jones, School of Computer Science, University of Manchester MGED Society.
Gramene Objectives Provide researchers working on grasses and plants in general with a bird’s eye view of the grass genomes and their organization. Work.
The Plant Ontology Consortium Lincoln Stein 1, Susan McCouch 2, Elizabeth Kellogg 3, Seung Rhee 4, Pankaj Jaiswal 2, Doreen Ware 1, Peter Stevens 5 1 Cold.
Crop Ontology towards the semantic integration of open plant trait data Elizabeth Arnaud, Luca Matteis, Rosemary Shrestha, Milko Skofic,
Gramene: Interactions with NSF Project on Molecular and Functional Diversity in the Maize Genome Maize PIs (Doebley, Buckler, Fulton, Gaut, Goodman, Holland,
Building WormBase database(s). SAB 2008 Wellcome Trust Sanger Insitute Cold Spring Harbor Laboratory California Institute of Technology ● RNAi ● Microarray.
Valentina Di Francesco Senior Program Officer for Bioinformatics, Structural Genomics and Systems Biology Microbial Genomics.
TRPGR Gramene: A Platform for Comparative Plant Genomics Doreen Ware USDA ARS Cold Spring Harbor Laboratory Pankaj Jaiswal Oregon State University Joshua.
A L I M E N T A T I O N A G R I C U L T U R E E N V I R O N N E M E N T 1 G20 – 12th May 2011 An International Research Initiative for Wheat Improvement.
The Coordination of Agricultural Research in the Mediterranean and its e-infrastructure tool Presented by WP1 Leader Mipaaf-IT, Annamaria Marzetti & Serenella.
Wheat Data Interoperability Esther DZALE YEUMO KABORE Richard FULSS.
Third Project cycle of the Benefit- sharing Fund Window 3 Co-development and transfer of technologies projects
Phenotype Curation Susan R. McCouch Department of Plant Breeding Cornell University.
Gene Bank Biodiversity for Wheat Prebreeding
Phenotype And Trait Ontology (PATO) and plant phenotypes
The (IMG) Systems for Comparative Analysis of Microbial Genomes & Metagenomes: N America: 1,180 Europe: 386 Asia: 235 Africa: 6 Oceania: 81 S America:
Metadata Standards Directory Alex Ball, Jane Greenberg, Keith Jeffery, Rebecca Koskela.
Wheat Data Interoperability WG outputs. 2 The Wheat Initiative  Created in 2011 following endorsement by G20 Agriculture Ministries to improve food security.
European Life Sciences Infrastructure for Biological Information EGI 2015, Lisbon, 18 May 2015 Rafael C Jimenez, ELIXIR CTO ELIXIR.
Jing Yu 1, Sook Jung 1, Chun-Huai Cheng 1, Stephen Ficklin 1, Taein Lee 1, Ping Zheng 1, Don Jones 2, Richard Percy 3, Dorrie Main 1 1. Washington State.
Data Coordinating Center University of Washington Department of Biostatistics Elizabeth Brown, ScD Siiri Bennett, MD.
Cyril Pommier et al. / Feedback from the RDA and WheatIS recommendations for Wheat Data Interoperability Adoption of the Wheat Data Interoperability Guidelines.
Rice Data Interoperability Working Group Updates
Diversity Seek (DivSeek)
GEOSS Component and Service Registry (CSR)
C. Pommier, E. Dzalé Yeumo Kabore, E. Arnaud, P. Larmande, M. Alaux, H
DataNet Collaboration
Wheat Data Interoperability Esther DZALE YEUMO KABORE Richard FULSS
knowledge organization for a food secure world
Doron Goldfarb & Yann LE FRANC
Functional Annotation of the Horse Genome
2. An overview of SDMX (What is SDMX? Part I)
Where AIPL Fits In Agricultural Research Service (ARS) is the main research arm of USDA (8,000 employees with 2,000 scientists at >100 locations) Beltsville.
M-H Pinard-van der Laan
For more information contact:
Cultivating Semantics for Data in Agriculture and Nutrition
Presentation transcript:

Webinar Wheat Data Interoperability guidelines Esther Dzalé Yeumo Richard Fulss

Let us get to know each other  What is your area of research?  Agronomy  Breeding  Biochemistry  Bioinformatics  Functional genetics  Physiology  Population genetics  Quantitative genetics  Statistical genetics  Structural genomics  Other  What is your position  Manager  Data manager  Researcher  Software developer  Student  Other

3 The Wheat Initiative  Created in 2011 following endorsement by G20 Agriculture Ministries to improve food security  A framework to identify synergies and facilitate collaborations for wheat improvement at the international level  The Wheat Initiative members  Countries: Argentina, Australia, Brazil, Canada, China, France, Germany, Hungary, India, Ireland, Italy, Japan, Spain, Turkey, UK, USA  International organizations: CIMMYT, ICARDA  Private companies: Arvalis, Bayer CropScience, Florimond Desprez V&F, KWS UK, Limagrain, Monsanto Company, RAGT 2n Saateen Union Research, Syngenta Crop Protection

4 The Wheat Data Interoperability WG Contributors: Alaux Michael (INRA, France), Aubin Sophie (INRA, France), Arnaud Elizabeth (Bioversity, France), Baumann Ute (Adelaide Uni, Australia), Buche Patrice (INRA, France), Cooper Laurel (Planteome, USA), Fulss Richard (CIMMYT, Mexico), Hologne Odile (INRA, France), Laporte Marie-Angélique (Bioversity, France), Larmand Pierre (IRD, France), Letellier Thomas (INRA, France), Lucas Hélène (INRA, France), Pommier Cyril (INRA, France), Protonotarios Vassilis (Agro-Know, Greece), Quesneville Hadi (INRA, France), Shrestha Rosemary (INRA, France), Subirats Imma (FAO of the United Nations, Italy), Aravind Venkatesan (IBC, France), Whan Alex (CSIRO, Australia) Co-chairs: Esther Dzalé Yeumo Kaboré (INRA, France), Richard Allan Fulss (CIMMYT, Mexico)  Aims: contribute to the improvement of Wheat related data interoperability by  Building a common interoperability framework (metadata, data formats and vocabularies)  Providing guidelines for describing, representing and linking Wheat related data Contributors Sponsors

5 Question  According to you, which data types among those proposed hereafter appear to be the most important for wheat research in the next 5 years?  SNPs  Genomic annotations  Phenotypes  Genetic maps  physical maps  Germplasm  Other

6 The data types covered in the guidelines

7 Surveys Landscape of Wheat related standards and their use by the community Comprehensive overview of Wheat related ontologies and vocabularies Workshops Recommendations Mappings between different data formats Actions to conduct in order to improve the current level of Wheat related data interoperability Interoperability use cases Implementation Interactive cookbook: recommendations + guidelines A repository of Wheat related linked vocabularies (Bioportal) The methodology

Use case: sources of resistance to stem rust (UG99) and tolerance to drought conditions in bread wheat Top management or breeders What are the sources of resistance to stem rust (UG99) and tolerance to drought conditions in bread wheat? Data scientists, bioinformaticians How do I extract information from all these databases? Do I have well- documented metadata to make queries? How do I link the data to get smarter information? Data manager, data provider I want to make my data findable, reusable and linkable to other data: What ontologies and metadata elements are commonly used to describe the types of data I am dealing with? What data formats could I use to share my data Where could I deposit my data? Gene sequences Phenotypic data Passport information Phenotypic data Low density markers Data are Dispersed Heretogeneous Abundant

9  Description  Mapping between wheat genes and orthologs from other species (deduce function by seq. similarity);  Access to RNASeq data (genes that are not expressed in roots may be irrelevant);  mapping of wheat genes and information on their function based on literature  Implied data types  Annotated genes (Gene Ontology, PFam, and other functional annotation)  Implementation in AgroLD Use case: Identification of wheat genes that control root growth

10 g

11

12

 What do we mean by interoperable data?  Use shared vocabularies to name things and express relations between them  Use common data formats to represent the data  Use persistent and unique identifiers to identify things and allow their linkage over domains and information systems  Benefits  Efficient information integration / retrieval.  Capability to answer complex questions  Easier holistic understanding of a domain Interoperability is one of the keys

 Guidelines ( )  Data exchange formats  Example: VCF (Variant Call Format) for sequence variation data, GFF3 for genome annotation data, etc.  Data description best practices  Consistent use of ontologies, consistent use of external database cross references  Data sharing best practices  Share data matrices along with relevant metadata (example: trait along with method, units and scales or environmental ones)  Useful tools and use cases that highlight data formats and vocabularies issues  A portal of wheat related ontologies and vocabularies ( )  Allows the access to the ontologies and vocabularies through APIs.  A prototype  Implementation of use cases of wheat data integration within the AgroLD (Agronomic Linked Data) tool: The deliverables

Benefits for many target users As a data producer or manager Easily conform to the well-recognized data repositories and facilitate the deposit of your data within these repositories; Share common meanings of the words you utilize to describe your data and make your data more machine-readable and computable Contribute to foster the development of smarter search tools and make your data more visible and discoverable As a wheat related information system or tool developer Basing your tool or information system on the recommended data formats and vocabularies will make it easier to integrate data from various data sources, deliver smarter outputs for a wider audience As a wheat related ontology developer Share your ontologies through the WDI wheat ontologies portal and make them more visible to the community Reuse or link your ontologies to existing concepts and terms in wheat related ontologies to enrich them, make them more visible and in some cases save you time.

17 How You Can Endorse the guidelines on data formats  For legacy data  Please provide your data in at least one of the recommended data formats even if, for some reasons, you need to also keep them in other non-recommended formats  For future developments  Please consider using the recommended data formats from the beginning.  Example: provide your sequence variation data in the latest VCF file format  Please refer to the WDI guidelines for precise recommendations on each data typeWDI guidelines

18 How You Can Endorse the data description and sharing best practices  Describe your data following the WDI recommendations and with the recommended vocabularies.  Examples:  For genome annotation data in GFF3 format, use of ontologies for functional annotation in column 9, such as, Gene Ontology and Sequence Ontology.  For observation Variables (including trait and environment variables), use existing variables, listed in the following vocabularies and ontologies :  Wheat crop ontology Wheat crop ontology  INRA Wheat Ontology  Biorefinery ontology Biorefinery ontology  XEO, XEML Environment Ontology XEO, XEML Environment Ontology

Question  Which of the following vocabularies are you familiar with? 1.AGROVOC 2.Biorefinery 3.CAB Thesaurus (CABT) 4.Cell Ontology (CL) 5.Chemical Entities of Biological Interest (ChEBI) 6.Crop Ontology (CO) 7.Crop Research Ontology – part of Crop Ontology (CO_715) 8.Environment Ontology (ENVO) 9.Experimental Factor Ontology (EFO) 10.Feature Annotation Location Description Ontology (FALDO) 11.NAL Thesaurus (NALT) 12.Phenotype And Trait Ontology (PATO) 13.Plant Experimental Conditions Ontology (Plant Environment Ontology, EO, may be changing to PECO) 14.Plant Ontology (PO) 15.Plant Trait Ontology (TO) 16.Population and Community Ontology (PCO) 17.Protein Ontology (PRO) 18.Sequence Ontology (SO) 19.Variation Ontology (VariO) 20.Wheat Ontology INRA (Wheat_Ontology) 21.Wheat Anatomy and Development Ontology – part of Crop Ontology (CO_121) 22.Wheat Trait Ontology: Embedded in Crop Ontology (CO_321) 23.Wheat Phenotype (phenotypes and traits in Wheat)

Question  Which of the following vocabularies are you using? 1.AGROVOC 2.Biorefinery 3.CAB Thesaurus (CABT) 4.Cell Ontology (CL) 5.Chemical Entities of Biological Interest (ChEBI) 6.Crop Ontology (CO) 7.Crop Research Ontology – part of Crop Ontology (CO_715) 8.Environment Ontology (ENVO) 9.Experimental Factor Ontology (EFO) 10.Feature Annotation Location Description Ontology (FALDO) 11.NAL Thesaurus (NALT) 12.Phenotype And Trait Ontology (PATO) 13.Plant Experimental Conditions Ontology (Plant Environment Ontology, EO, may be changing to PECO) 14.Plant Ontology (PO) 15.Plant Trait Ontology (TO) 16.Population and Community Ontology (PCO) 17.Protein Ontology (PRO) 18.Sequence Ontology (SO) 19.Variation Ontology (VariO) 20.Wheat Ontology INRA (Wheat_Ontology) 21.Wheat Anatomy and Development Ontology – part of Crop Ontology (CO_121) 22.Wheat Trait Ontology: Embedded in Crop Ontology (CO_321) 23.Wheat Phenotype (phenotypes and traits in Wheat)

21 How You Can Endorse the WDI wheat related ontologies portal?  Share your wheat related ontologies within the WDI slice in AgroportalWDI slice in Agroportal  Before developing a new ontology  Make sure there is not an existing one within the WDI slice in Agroportal that covers your needs  When developing a new ontology  Please reuse or link to exiting concepts and terms in the ontologies within the WDI slice in Agroportal whenever possible.WDI slice in Agroportal  Whenever possible  Please align your ontologies to the existing ones within the WDI slice in Agroportal and share the mapping resultsWDI slice in Agroportal

22 Endorsements/Adopters LaboratoryContact NIAB, Professor Mario Caccamo Head of Crop Bioinformatics USDA ARS and Cold Spring Harbor Laboratory, Doreen Ware Adjunct Associate Professor Ph.D., Ohio State University Paul Kersey EMBL European Bioinformatics Institute, Paul Kersey Team Leader Non-vertebrate Genomics Australian Center for Plant Functional Genomics, Dr Baumann, Ute Bioinformatics Leader The Genome Analysis Center, Robert Davey Data Infrastructure & Algorithms Group Leader Munich Information Center for Protein Sequences (MIPS), Helmholtz Center Munich, muenchen.de/ muenchen.de/ Dr. Klaus Mayer Research Director MIPS INRA URGI, Michael Alaux, Deputy leader of "Information System and data integration" team Cyril Pommier, Deputy leader, Information System and Data integration team, Phenotype thematic leader Rothamsted Research, Christopher Rawlings Head of Department Computational & Systems Biology Harpenden James Hutton Institute, David Marshall Information and Computational Sciences The James Hutton Institute CIMMYT Wheat program, Allan James, Head of Knowledge Management Rosemary Shrestha, Data Coordinator

23 Acknowledgements WDI WG members: Fulss Richard, co-chair (CIMMYT), Alaux Michael (INRA), Aubin Sophie (INRA), Arnaud Elizabeth (Bioversity), Baumann Ute (Adelaide University), Buche Patrice (INRA), Cooper Laurel (Planteome), Hologne Odile (INRA), Laporte Marie-Angélique (Bioversity), Larmande Pierre (IRD), Letellier Thomas (INRA), Mohellibi Nacer (INRA) Pommier Cyril (INRA), Protonotarios Vassilis (Agro-Know), Shrestha Rosemary (CIMMYT), Subirats Imma (FAO of the United Nations), Aravind Venkatesan (IBC), Whan Alex (CSIRO), Jonquet Clément (Lirmm, Agroportal) And Lucas Hélène (INRA, International Wheat Initiative), Quesneville Hadi (INRA, co-chair WheatIS EWG)

Thank you!