The Generation Challenge Programme (GCP) Platform for Crop Research Richard Bruskiewich and the rest of …

Presentation transcript:

The Generation Challenge Programme (GCP) Platform for Crop Research Richard Bruskiewich and the rest of …

…The GCP SP4 team and Contributors
IRRI-CIMMYT Crop Research Informatics Laboratory: Graham McLaren Thomas Metz Martin Senger Ramil Mauleon Mylah Anacleto Michael Jonathan Mendoza Victor Jun Ulat Arllet Portugal Ryan Alamban Lord Hendrix Barboza Jeffrey Detras Kevin Manansala Jeffrey Morales Barry Peralta Rowena Valerio Nelzo Ereful
CIP: Reinhard Simon Edwin Rojas
ICRISAT: Jayashree Balaji
ICARDA: Akinnola Akintunde
NCGR: Andrew Farmer Gary Schiltz
SCRI: Jennifer Lee David Marshall
Cornell University: Terry Casstevens Pankaj Jaiswal Dave Matthews
ACGT: Ayton Meintjes Jane Morris
CIRAD: Manuel Ruiz Alexis Dereeper Matthieu Conte Brigitte Courtois
Bioversity: Mathieu Rouard Tom Hazekamp Milko Skofic Raj Sood
NIAS: Masaru Takeya Koji Doi Kouji Satoh Shoshi Kikuchi
EMBRAPA: Marcos Costa Natalia Martins Georgios Pappas Guy Davenport Trushar Shah Kyle Braak Sebastian Ritter Yi Zhang Sergio Gregorio Joseph Hermocilla Michael Echavez Roque Almodiel Samart Wanchana Supat Thongjuea
Theo van Hintum (WUR), GCP Subprogramme 4 Leader
University of British Columbia: Mark Wilkinson
GSC Bioinformatics Graduate Program, BC Cancer Agency: Benjamin Good James Wagner

Overview
Generation Challenge Programme crop informatics research and development
GCP platform architecture:
- Domain model & ontology
- Application development framework

Challenge Programme “I challenge the next generation to use new scientific tools and techniques to address the problems that plague the world’s poor” Dr. Norman Borlaug

What is it?
An international research programme established in 2003, projected to last 10 years, and hosted by the CGIAR with global partners from ARI and NARES.
Research themes directed to crop improvement:
- Genomics and comparative biology across species
- Characterization of genetic diversity for allele mining
- Gene transfer technologies
Five research subprogrammes, one of which is crop information systems development.

Challenge Programme
- Cornell University, USA
- Wageningen University, Netherlands
- John Innes Centre, UK
- NIAS, Japan
- Agropolis, France
- CIP, Peru
- CIAT, Colombia
- CIMMYT, Mexico
- Bioversity, Italy
- WARDA, Côte d'Ivoire
- IRRI, Philippines
- ICRISAT, India
- ICARDA, Syrian Arab Rep.
- IITA, Nigeria
- EMBRAPA, Brazil
- BioTec, Thailand
- ACGT, South Africa
- ICAR, India
- CAAS, China

GCP Research: from Genotype to Phenotype
[Diagram. Subprogrammes: SP1: Allelic Mining; SP2: Functional Assignment; SP3: Trait Synthesis. Other labels: Genetic Resources; Process; Product; Genebank Germplasm; Genotyping & Phenotyping; Genomic annotation, Forward and Reverse Genetics, Gene arrays/gels; Candidate genes; NILs, RILs, Mapping pop., Mutants; Beneficial alleles Linked to Traits; Marker-aided Selection/Transformation; Advanced breeding lines as vehicles; Value-added varieties.]

Integration across Diverse Crop Data
[Diagram relating Germplasm, Genotype, Phenotype and Environment through the relationships "has", "determines" and "affects". Data categories shown: Inventory; Identification (passport); Genealogy; Genetic Maps; Physical Maps; DNA Sequence; Functional Annotation; Molecular Variation (Natural or Induced); Location (GIS); Climate; Day Length; Ecosystem; Agronomy; Stresses; Anatomical; Developmental; Field Performance; Stress Response; Molecular Expression; Transcriptome; Proteome; Metabolome; Physiology.]

Crop Information Systems: the Next
- Large, globally distributed consortium
- Diverse research requiring a diversity of tools
- Large data sets with diverse data types
- Many legacy informatics systems and tools
- Global data integration required…
Key Issue: Interoperability

Some Basic GCP Research Objectives
- Compile a list of germplasm meeting specific passport data criteria
- Compile a list of genetic markers of interest from genetic and QTL maps
- Retrieve genotypes of specified markers, for specified germplasm
- Align gene expression data against QTL positional evidence to identify candidate gene loci for specified traits

A Generalized GCP Crop Research Integration Work Flow
[Diagram. Central component: Generation Challenge Programme Domain Model & Middleware.]
Tools shown:
- Comparative Map & Trait Viewer (NCGR/ISYS)
- Germplasm Passport/Phenotype/Genotype Querybuilder
- Comparative (Functional) Genomics Tools
- DIVA-GIS
Data sources shown:
- Genetic Map Data Source(s)
- Germplasm Data Source(s)
- Genomics Data Source(s)
- GIS Data Source(s)
Work flow steps shown:
- Get/analyse a genetic map
- Find germplasm genotyped with mapped markers
- Get genotype & phenotype of germplasm
- Get candidate genes in map interval
- Get functional information about genes
- Plot germplasm, genotype and phenotype on geographical maps
- Analyse source environment of germplasm
- Select “interesting” candidate genes; get alleles
- Select adapted germplasm with favorable phenotype & alleles for further evaluation

GCP Information Platform: User Perspective
An environment that provides improved access to data and analysis tools.
[Diagram labels: applications; integrated databases and tools]

GCP Information Platform – Developers’ Perspective
[Diagram labels: application layer; middleware; internet; Tapir, MOBY, etc.; Data Registry; local database layer]

Generation CP Platform

GCP Platform - General Architecture
- “Model Driven Architecture” based on “platform independent” GCP scientific domain models, parameterized with controlled vocabulary (“ontology”)
- GCP domain models mapped onto platform specific implementations
- Reference (Java) GCP platform application programming interface (API)

Semantics of the GCP Model Driven Architecture
GCP is trying to model the meaning (“semantics”) of the crop research world.
Semantics is found in the domain model at three distinct but interconnected levels:
- System architectural level: general scientific semantics in terms of high-level object concepts (“object types”) and their global inter-relationships.
- Entity level: attributes and behaviors internal to high-level object types.
- Attribute level: attribute values of objects that range over data types: simple (e.g. identifiers, numbers), complex (other classes of entities) or ontology (such as Gene Ontology (GO) terms, for a gene product).
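The three levels can be pictured with a small Java sketch. This is only an illustration of the idea, not the GCP domain model itself: the class names `OntologyTerm`, `Phenotype` and `Germplasm`, their fields, and the placeholder accession are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Attribute level: a value that ranges over an ontology (hypothetical type).
final class OntologyTerm {
    final String ontology;   // source vocabulary, e.g. "PO" for Plant Ontology
    final String accession;  // identifier of the term within that vocabulary
    final String label;      // human-readable name

    OntologyTerm(String ontology, String accession, String label) {
        this.ontology = ontology;
        this.accession = accession;
        this.label = label;
    }
}

// Entity level: attributes internal to one high-level object type.
final class Phenotype {
    final OntologyTerm observable; // ontology-valued attribute
    final String value;            // simple attribute

    Phenotype(OntologyTerm observable, String value) {
        this.observable = observable;
        this.value = value;
    }
}

// System architectural level: object types and the relationships between them.
final class Germplasm {
    final String id;                                      // simple attribute
    final List<Phenotype> phenotypes = new ArrayList<>(); // "Germplasm has a Phenotype"

    Germplasm(String id) {
        this.id = id;
    }

    Germplasm addPhenotype(Phenotype phenotype) {
        phenotypes.add(phenotype);
        return this;
    }
}

class SemanticsDemo {
    public static void main(String[] args) {
        // Placeholder accession, not a real Plant Ontology term.
        OntologyTerm term = new OntologyTerm("PO", "PO:0000000", "example plant structure");
        Germplasm sample = new Germplasm("GERMPLASM-1")
                .addPhenotype(new Phenotype(term, "present"));
        for (Phenotype p : sample.phenotypes) {
            System.out.println(sample.id + " " + p.observable.accession + " = " + p.value);
        }
    }
}
```

The point of the sketch is that only the relationship between `Germplasm` and `Phenotype` is fixed in code; what is actually observed is carried by an ontology term and can change without touching the classes.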

Layers of Semantics
[Diagram. Captions: 1 Object Model of the Scientific Domain… 2 3 …Parameterized with Ontology. Node labels: Germplasm; Phenotype; Observable; Attribute Value; Plant Ontology. Edge labels: "has a", "has an", "with a", "ranges over".]

GCP Domain Model Specification
- High-level object types are specified with Unified Modeling Language (UML) and associated text narratives.
- Major object classes are represented in the object model.
- More specialized object types are specified by subclassing major object types using ontology.
- The reference model is coded in the Eclipse Modeling Framework (EMF), managed with source code versioning, and automatically compiled into other representations.

Scope of GCP Domain Model & Ontology
Core models: generic concepts – identification, entities, features, organization, data management
- Models heavily parameterized by ontology (e.g. entity and feature “type” attributes)
Scientific models: extend the core model into specific scientific scopes relevant to GCP:
- Germplasm data (including genetic resources passport)
- Genomics including genotypes, maps, sequences and functional annotation
- Phenotype data
- Environmental data (including geographical location)

GCP Ontology
- Every attribute in the GCP domain model with data type SimpleOntologyTerm, or a subclass thereof, is an integration point for an external ontology.
- External public ontology (e.g. GO, PO, SO) is reused when available, and new ontology is developed within GCP to fill gaps.
- Ontology is consolidated into a GCP database based on GMOD Chado CV tables, indexed within the platform using a GCP formatted identifier (that retains the source’s identifier).
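As a rough illustration of an identifier that wraps, but still retains, the source ontology's own identifier, consider the sketch below. The slides do not give the actual GCP identifier format, so the `GCP/` prefix, the `GcpTermId` helper and the `TermIndex` class are invented for this example; only the Gene Ontology accession `GO:0008150` (biological_process) is a real term.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical identifier scheme: a GCP prefix wrapped around the unchanged source accession.
final class GcpTermId {
    static final String PREFIX = "GCP/"; // assumed prefix, for illustration only

    private GcpTermId() {}

    static String wrap(String sourceId) {
        if (sourceId == null || sourceId.isEmpty()) {
            throw new IllegalArgumentException("source identifier is required");
        }
        return PREFIX + sourceId;
    }

    static String sourceIdOf(String gcpId) {
        if (gcpId == null || !gcpId.startsWith(PREFIX) || gcpId.length() == PREFIX.length()) {
            throw new IllegalArgumentException("not a GCP-formatted identifier: " + gcpId);
        }
        return gcpId.substring(PREFIX.length());
    }
}

// A toy in-memory index keyed by the GCP-formatted identifier.
final class TermIndex {
    private final Map<String, String> labelById = new HashMap<>();

    String register(String sourceId, String label) {
        String gcpId = GcpTermId.wrap(sourceId);
        labelById.put(gcpId, label);
        return gcpId;
    }

    Optional<String> labelOf(String gcpId) {
        return Optional.ofNullable(labelById.get(gcpId));
    }
}

class TermIndexDemo {
    public static void main(String[] args) {
        TermIndex index = new TermIndex();
        String gcpId = index.register("GO:0008150", "biological_process");
        System.out.println(gcpId + " -> " + index.labelOf(gcpId).orElse("?"));
        System.out.println("source id: " + GcpTermId.sourceIdOf(gcpId));
    }
}
```

Because the source accession survives unchanged inside the wrapped identifier, a term can always be traced back to the public ontology it came from.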

GCP Domain Model Mappings onto Platform Specific Implementations
[Diagram. Source: GCP Domain Model (UML/EMF); GCP Ontology Database. Mapped implementations:]
- GCP Platform Java Middleware & Applications
- OWL/RDF Ontology: VPIN/SSWAP.info
- SOAP Web Services (BioMOBY, SoapLab, GDPC)
- XML Schemata: GCP Data Templates, BioCASE/Tapir

Reference GCP Platform API
PantheonBase: a relatively simple core Java Application Programming Interface (API) for software integration:
- DataSource: query data resources, using simple, ontology-driven SearchFilter specifications
- DataTransformer: computational input/output
- DataConsumer: communicate data to viewers
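The division of labour between the three interfaces might look roughly like the sketch below. The interface names and `SearchFilter` come from the slide, and `find()` is the method name used elsewhere in the deck, but every signature, field and sample record here is an assumption; the real PantheonBase API is not shown in the transcript and will differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical filter: an entity type plus attribute/value criteria.
final class SearchFilter {
    final String entityType;
    final Map<String, String> criteria = new LinkedHashMap<>();

    SearchFilter(String entityType) {
        this.entityType = entityType;
    }

    SearchFilter where(String attribute, String value) {
        criteria.put(attribute, value);
        return this;
    }
}

// Assumed signatures for the three roles named on the slide.
interface DataSource { List<Map<String, String>> find(SearchFilter filter); }
interface DataTransformer { List<Map<String, String>> transform(List<Map<String, String>> input); }
interface DataConsumer { void consume(List<Map<String, String>> data); }

// Toy data source holding records in memory.
final class InMemoryDataSource implements DataSource {
    private final List<Map<String, String>> records = new ArrayList<>();

    InMemoryDataSource add(String type, String name, String country) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("type", type);
        record.put("name", name);
        record.put("country", country);
        records.add(record);
        return this;
    }

    @Override
    public List<Map<String, String>> find(SearchFilter filter) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> record : records) {
            boolean typeMatches = filter.entityType.equals(record.get("type"));
            boolean criteriaMatch = record.entrySet().containsAll(filter.criteria.entrySet());
            if (typeMatches && criteriaMatch) {
                hits.add(record);
            }
        }
        return hits;
    }
}

class PipelineDemo {
    public static void main(String[] args) {
        // Made-up sample records.
        DataSource source = new InMemoryDataSource()
                .add("germplasm", "Sample A", "PH")
                .add("germplasm", "Sample B", "IN")
                .add("marker", "Marker 1", "PH");
        // Transformer that keeps only the "name" field of each record.
        DataTransformer namesOnly = input -> {
            List<Map<String, String>> out = new ArrayList<>();
            for (Map<String, String> record : input) {
                Map<String, String> slim = new LinkedHashMap<>();
                slim.put("name", record.get("name"));
                out.add(slim);
            }
            return out;
        };
        // Consumer standing in for a viewer.
        DataConsumer viewer = data -> data.forEach(System.out::println);

        SearchFilter filter = new SearchFilter("germplasm").where("country", "PH");
        viewer.consume(namesOnly.transform(source.find(filter)));
    }
}
```

Keeping the query, the computation and the display behind separate interfaces is what lets a tool be swapped in at one stage without changes to the others.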

GCP DataSource Interface

DataSource Interface

GCP Data Source Implementations
Direct integration of relational databases (Spring HttpInvoker, Hibernate, JPA):
- Developed for ICIS, GMOD Chado (beta)
Protocols:
- Generalized Java client to connect to BioMoby web services; Java support for GCP-compliant BioMoby web service provider development (beta)
- Support for BioCASE/Tapir data source integration (prototyped)
- GCP-compliant GDPC data source (prototyped)
- SSWAP/VPIN wrapper (under discussion)
Some other direct custom data source wrappers

Some GCP BioMOBY docs…

GCP BioMoby Support – a Synopsis
1. MoSES + Dashboard developed (M. Senger).
2. GCP model specific BioMoby datatypes specified.
3. Java libraries partly developed for interconversion of GCP BioMoby data types to/from GCP domain model Java objects (Barboza).
4. GCP DataSource Java implementation developed for client side of BioMoby that maps GCP DataSource find() use cases onto BioMoby web services using XML configuration files (no coding).
5. Java design pattern for modular implementation of BioMoby web services that get their data from any GCP-compliant DataSource that supports a given find() use case.
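Point 4, mapping find() use cases onto web services through configuration rather than code, can be sketched as a lookup table. This is a guess at the general shape only: the `UseCaseRouter` class, the way a use case is keyed, and the service name are hypothetical, and the XML parsing that would populate the table is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Maps a find() use case onto the name of a web service; the table is data, not code.
final class UseCaseRouter {
    private final Map<String, String> serviceByUseCase = new LinkedHashMap<>();

    // In practice these entries would be read from a configuration file.
    UseCaseRouter map(String entityType, String filterAttribute, String serviceName) {
        serviceByUseCase.put(key(entityType, filterAttribute), serviceName);
        return this;
    }

    // Returns the configured service, or empty if the use case is not supported.
    Optional<String> serviceFor(String entityType, String filterAttribute) {
        return Optional.ofNullable(serviceByUseCase.get(key(entityType, filterAttribute)));
    }

    private static String key(String entityType, String filterAttribute) {
        return entityType + "|" + filterAttribute;
    }
}

class RouterDemo {
    public static void main(String[] args) {
        // "getGermplasmByCountry" is an invented service name.
        UseCaseRouter router = new UseCaseRouter()
                .map("germplasm", "country", "getGermplasmByCountry");
        System.out.println(router.serviceFor("germplasm", "country").orElse("unsupported"));
        System.out.println(router.serviceFor("marker", "map").orElse("unsupported"));
    }
}
```

Adding support for a new use case then means adding a configuration entry, which matches the "no coding" claim on the slide.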

GCP BioMoby “Sandwich”

(Partial) Inventory of 3rd Party Data Resources targeted for wrapping as GCP Data Sources
Data Type: Description
- Microarray Data: MAXD database with microarray datasets from diverse GCP commissioned or competitive projects.
- Genetic and QTL Mapping Data: QTL data available in ICIS, TropGenes. Genomic Diversity and Phenotype Connector (GDPC) connecting to Gramene, Panzea, GrainGenes et al.
- Genomic Sequence Data and Annotation: NIAS KOME full length cDNA and RAP genome databases (?), connected to GCP web services by NIAS. OryzaSNP and GCP comparative genomic databases. Public sequence databases (via BioJava?)
- Functional Genomics: OryGenesDb mutant data (CIRAD); IR64 rice mutant database (IRRI); Tos17 database (NIAS).
- Germplasm Sample Characterization Data: Germplasm, passport, genotype and associated field data available in ICIS databases; TropGenes, MGIS, ICRIS.

GCP Platform Implementations
Standalone workbench (“GenoMedium”)
- Eclipse Rich Client Platform (RCP)
Web-based workbench (“Koios”)
- AJAX, PHP, Java (server side), Java Web Start
NCGR Integrated SYStem (ISYS)
Direct tool integration (e.g. GCP MaxdLoad)

GCP Web-Based Search Engine
[Screenshot callouts:]
- GCP semantics defined query
- Summary of query hits
- List of items matched
- View details at 3rd party web site or in locally invoked 3rd party data viewer

(Partial) Inventory of 3rd Party Analysis/Viewer Software being targeted for GCP Integration
Tool: Purpose
- SoapLab2: Remote computational services access
- Taverna: Bioinformatics work flow management
- Apollo: Genome sequence browser
- Cytoscape: Visualization of networks
- ATV: Phylogenetic tree visualization
- JalView: Comparative sequence alignments
- TMEV: Microarray data analysis
- EASE, Mapman: Gene functional annotation
- CMTV: Comparative mapping and QTL
- MAXDLoad & MAXDView: Microarray data management
- GDPC tools (Browser, Tassel): Genomic diversity analysis

GCP “Pantheon” Project in CropForge

Closing Perspective
The GCP is a global consortium of 22++ crop research partners who need to share diverse large data sets and tools, in a globally distributed manner.
Given the scope and duration of the GCP, developers within the consortium embraced the task of developing public global informatics standards for interoperability and integration.
The effort is an open source, global community building exercise. We welcome the participation of any and all interested scientists and developers who might wish to use and/or contribute to the further evolution and application of these standards.