An Ontology-centric Architecture for Extensible Scientific Data Management Systems. Gavin Kennedy1,2, Dr Yuan-Fang Li3. 2: School of ITEE, University of Queensland, St Lucia, QLD. 3: Clayton School of IT, Monash University, Clayton, VIC. Gavin.kennedy@csiro.au. Novel High Resolution tools at the HRPPC. Dr Xavier Sirault1, Dr Bob Furbank1. 1: CSIRO Plant Industry, Black Mountain, Cnr Clunies Ross St & Barry Drive, Canberra, ACT 2601. Xavier.sirault@csiro.au
What is Plant Phenomics? Phenome = Genome × Environment. Genomics is accelerating gene discovery, but how do we capitalise on these data sets to establish gene function and develop new genotypes for agriculture? High-throughput, high-resolution analysis capacity is now the factor limiting discovery of new traits and varieties. “In the next 50 years we must produce more food than we have consumed in the history of mankind.” Megan Clark, CSIRO CEO, 2009
Phenomics from the Leaf to the Field. Imagine a plant breeder walking his trials, logging plant performance from distributed sensors with his mobile phone, or logging on to Phenonet from home to view his wheat in real time.
HRPPC: Canberra node of the Australian Plant Phenomics Facility. Role: deep phenotyping; development of next generation tools to probe plant function and performance (come and see us). Infrastructure: 1500 m² lab space, 245 m² greenhouse, 260 m² growth cabinets. Analytical tools packaged in: 1- Model Plant Module (HTP); 2- Crop-Plant Shoot Module (MTP); 3- Crop-Plant Root Module (MTP); 4- Crop-Plant Field Module (HTP). Species: Brachypodium distachyon, Arabidopsis thaliana, Gossypium species, Triticum and Hordeum species, Vigna unguiculata (cowpea), Cicer arietinum (chickpea), Zea mays (maize), Sorghum bicolor, …
Capitalising on new imaging technologies (plant morphology and plant function). Visible imaging: plant area, biomass, structure; senescence, relative chlorophyll content, pathogenic lesions. Far infrared imaging: canopy / leaf temperature; water use / salt tolerance. Chlorophyll fluorescence imaging: physiological state of photosynthetic machinery. Near IR imaging: tissue water content; soil water content. FTIR imaging spectroscopy / hyperspectral imaging: cellular localisation of metabolites (sugars, protein, aromatics); carbohydrates, pigments and proteins.
Addressing issues with fluorescence and environmental control. PlantScan: next generation phenotyping platform for n-dimensional models. Light Detection and Ranging (LiDAR); micro-bolometer sensors (far-infrared); 4-CCD line scanner (NIR and visible split).
Automated feature extraction and quantification of n-dimensional models. Jurgen Fripp, CSIRO ICT E-Health, Brisbane. Automated segmentation: extracted stem. Bounding box extraction and Delaunay triangulation for convex 3D hull. Volume over time. Height and total volume extraction. Sirault, Fripp and Furbank (in preparation)
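The bounding box and height steps named on this slide can be illustrated with a minimal sketch. It assumes a plant scan is available as a list of (x, y, z) points with z pointing up; the function names and the sample coordinates are hypothetical and are not taken from the PlantScan pipeline. The box volume is only an upper bound on the convex hull volume; the Delaunay-based hull itself would need a computational geometry library and is not attempted here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def bounding_box(points: Sequence[Point]) -> Tuple[Point, Point]:
    """Return the (min corner, max corner) of the axis-aligned bounding box."""
    if not points:
        raise ValueError("point cloud is empty")
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def height_and_box_volume(points: Sequence[Point]) -> Tuple[float, float]:
    """Return plant height (extent along z) and the bounding-box volume."""
    lo, hi = bounding_box(points)
    dx, dy, dz = (h - l for l, h in zip(lo, hi))
    return dz, dx * dy * dz


def volumes_over_time(scans: Sequence[Sequence[Point]]) -> List[float]:
    """Bounding-box volume for each scan in a time series of point clouds."""
    return [height_and_box_volume(scan)[1] for scan in scans]


# Hypothetical point clouds from two successive scans of the same plant.
scan_day1 = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0), (0.5, 1.0, 1.0)]
scan_day2 = [(0.0, 0.0, 0.0), (2.0, 2.0, 4.0), (1.0, 1.0, 2.0)]

height, volume = height_and_box_volume(scan_day1)
growth = volumes_over_time([scan_day1, scan_day2])
```

Tracking the per-scan volume in a list is one simple way to obtain a "volume over time" curve for a single plant.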
An integrated phenotyping platform for Model Plants. PAM fluorescence imaging; far infrared imaging; visible imaging for growth. Climate controlled in equilibration chamber and imaging chambers. 2500 plants per day. Applications: 1001 Genomes Project (65 re-sequenced Arabidopsis thaliana ecotypes under analysis, with Detlef Weigel); USDA Brachypodium distachyon project.
www.phenonet.com Distributed Sensor Network for Phenomics. Measure and log a range of environmental factors on field trials. ZigBee wireless transmitters: thermopile temperature sensor, humidity, ambient temperature, soil moisture. Imaging: estimate biomass; greenness index for fertilization; detect flowering; estimate yield. Imaging constrained: develop smarter portable platforms.
Ontologies. An ontology is a set of formalised terms that allows us to represent knowledge about concepts and relationships in a domain. Annotating with ontologies means describing a domain object or process. Modelling with ontologies means classifying a domain object or process and its relationship to other domain concepts. This image shows that the wheat plant on the left has increased “salt tolerance (TO:0006001)”. OBI:0000050, “platform”: “A platform is an object_aggregate that is the set of instruments and software needed to perform a process.”
Ontologies. Evolutionary changes in domain, model & data. Expressed in OWL (& RDF Schema). Provides syntax & semantics, enabling reasoning; expressivity vs decidability; validation via reasoning. Designed to be open & interoperable; facilitates sharing, reuse & integration. Maturing technology stacks: APIs, reasoners, triple stores, query engines.
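To make "validation via reasoning" concrete, here is a deliberately small sketch. The class names, the `hasGenotype` property and the sample objects are all hypothetical. It also treats a property's declared domain as a closed-world constraint, which is a simplification: an OWL reasoner would normally infer the subject's type from the domain axiom and only report an inconsistency when, for example, the inferred class is disjoint with an asserted one.

```python
from typing import Dict, List

# Hypothetical subclass axioms: child class -> parent class.
SUBCLASS_OF: Dict[str, str] = {"WheatPlant": "Plant", "Plant": "Material"}

# Hypothetical domain declarations: property -> class its subject must belong to.
DOMAIN: Dict[str, str] = {"hasGenotype": "Material"}

# Hypothetical asserted types of individual objects.
TYPES: Dict[str, str] = {"plant42": "WheatPlant", "scan7": "Data"}


def superclasses(cls: str) -> List[str]:
    """Return cls followed by all of its ancestors (transitive subclass closure)."""
    chain = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        if cls in chain:  # guard against cyclic axioms
            break
        chain.append(cls)
    return chain


def is_valid(subject: str, prop: str) -> bool:
    """Check that the subject's inferred classes include the property's domain."""
    return DOMAIN[prop] in superclasses(TYPES[subject])


ok = is_valid("plant42", "hasGenotype")   # WheatPlant is inferred to be a Material
bad = is_valid("scan7", "hasGenotype")    # Data is not a Material in this toy model
```

The point of the sketch is that the check on `plant42` succeeds only because of an inferred fact (WheatPlant is a kind of Material), not an asserted one.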
PODD: The Phenomics Ontology Driven Data repository. A research data and metadata repository, managing phenomics data from multiple heterogeneous, high-volume, high-resolution data generation platforms. A methodology for managing and publishing research data outputs. A semantic web data resource. [Diagram labels: PlantScan, Phenonet, Phenomobile, TrayScan; Data; Metadata; PODD Metadata Repository; PODD Data Stores]
Putting the OD in PODD. Basics: ontologies as domain models for research data. Model domain objects as ontological objects. Base ontology: domain independent. Phenomics ontology: domain specific. Organizes data logically: represented as metadata objects; parent-child relationship; referential relationship. Drives all operations in the data lifecycle. Mapping: domain concepts -> OWL classes; attributes and relations -> OWL predicates; domain objects -> OWL individuals; comments, descriptions -> OWL annotations.
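The mapping on this slide can be sketched as plain subject-predicate-object triples. Only the rdf, rdfs and owl vocabulary IRIs below are standard; the `http://example.org/podd#` namespace, the class and property names and the sample objects are hypothetical stand-ins, not the actual PODD ontology.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

EX = "http://example.org/podd#"  # hypothetical namespace, for illustration only
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
OWL_OBJECT_PROPERTY = "http://www.w3.org/2002/07/owl#ObjectProperty"

triples: Set[Triple] = set()

# Domain concepts -> OWL classes
for concept in ("Project", "Investigation", "Platform"):
    triples.add((EX + concept, RDF_TYPE, OWL_CLASS))

# Attributes and relations -> OWL predicates
triples.add((EX + "hasInvestigation", RDF_TYPE, OWL_OBJECT_PROPERTY))  # parent-child
triples.add((EX + "refersToPlatform", RDF_TYPE, OWL_OBJECT_PROPERTY))  # referential

# Domain objects -> OWL individuals
triples.add((EX + "project1", RDF_TYPE, EX + "Project"))
triples.add((EX + "investigation1", RDF_TYPE, EX + "Investigation"))
triples.add((EX + "plantscan", RDF_TYPE, EX + "Platform"))
triples.add((EX + "project1", EX + "hasInvestigation", EX + "investigation1"))
triples.add((EX + "investigation1", EX + "refersToPlatform", EX + "plantscan"))

# Comments, descriptions -> OWL annotations (literal kept in quotes)
triples.add((EX + "project1", RDFS_COMMENT, '"Salt tolerance screen in wheat"'))


def individuals_of(cls: str) -> List[str]:
    """All individuals asserted to be instances of the given class."""
    return sorted(s for s, p, o in triples if p == RDF_TYPE and o == cls)


def children_of(parent: str) -> List[str]:
    """Objects linked from parent by the parent-child predicate."""
    return sorted(o for s, p, o in triples
                  if s == parent and p == EX + "hasInvestigation")
```

In this toy model, `children_of(EX + "project1")` follows the parent-child relationship, while `refersToPlatform` plays the role of a referential relationship that links objects without nesting them.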
The PODD Ontology. [Diagram labels, as extracted: Project Project Plan Investigation Platform Analysis Event Genotype Treatment Material Material Container Data Environment Design Gene Sex Observation/Phenotype Treatment Archive Data Sequence Measurement Measurement Parameter]
PODD Architecture. Objects represented semantically. Semantics (metadata) captured in RDF. Repository operations on RDF: ingestion, retrieval, update, query & search, export. Backend object management: Fedora Commons. Fedora objects mapped to Java objects for the Business Logic Layer and the Interface Layer.
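The repository operations listed on this slide can be sketched against a toy in-memory triple set. This is an illustration under stated assumptions, not PODD's implementation: the `TripleRepository` class, its method names and the example IRIs are hypothetical, the Fedora Commons backend and the Java object mapping are not modelled, and the N-Triples export does no literal escaping.

```python
from typing import Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleRepository:
    """Hypothetical in-memory stand-in for an RDF metadata repository."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def ingest(self, triples: Iterable[Triple]) -> None:
        """Ingestion: add new metadata triples."""
        self._triples.update(triples)

    def query(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """Query: match a triple pattern; None acts as a wildcard."""
        return sorted(t for t in self._triples
                      if (s is None or t[0] == s)
                      and (p is None or t[1] == p)
                      and (o is None or t[2] == o))

    def retrieve(self, subject: str) -> List[Triple]:
        """Retrieval: all metadata about one object."""
        return self.query(s=subject)

    def update(self, s: str, p: str, new_o: str) -> None:
        """Update: replace every existing value of (s, p) with new_o."""
        self._triples = {t for t in self._triples
                         if not (t[0] == s and t[1] == p)}
        self._triples.add((s, p, new_o))

    def export_ntriples(self) -> str:
        """Export: serialise as N-Triples; objects starting with a quote are literals."""
        lines = []
        for s, p, o in sorted(self._triples):
            obj = o if o.startswith('"') else f"<{o}>"
            lines.append(f"<{s}> <{p}> {obj} .")
        return "\n".join(lines)


EX = "http://example.org/podd#"  # hypothetical namespace
repo = TripleRepository()
repo.ingest([
    (EX + "project1", EX + "title", '"Draft title"'),
    (EX + "project1", EX + "hasInvestigation", EX + "investigation1"),
])
repo.update(EX + "project1", EX + "title", '"Wheat salt tolerance"')
exported = repo.export_ntriples()
```

Keeping every operation expressed over triples, as the slide describes, means the same store can serve ingestion, search and export without a separate schema per object type.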
Future Work. Annotation services: ontological tagging of PODD objects; annotation tools, search/discovery tools, browsers, etc. Virtual laboratory environment: support Phenome to Genome (and back) discovery processes; analyse linkages across data resources; workflows for statistical inferences & mathematical modelling; visualisation tools, etc.
Resources. Plant Phenomics Test Instance: http://poddtest.plantphenomics.org.au/ Plant Phenomics Production Instance: http://podd.plantphenomics.org.au/ Mouse Phenomics Production Instance: http://podd.australianphenomics.org.au PODD Project Website: http://projects.arcs.org.au/trac/podd Contact: Gavin.Kennedy@csiro.au, Ph: +61413 337 819. This work is part of a National eResearch Architecture Taskforce (NeAT) project, supported by the Australian National Data Service (ANDS) through the Education Investment Fund (EIF) Super Science Initiative, and the Australian Research Collaboration Service (ARCS) through the National Collaborative Research Infrastructure Strategy Program.
The Team. PODD Project Manager: Gavin Kennedy. University of Queensland eResearch Lab: Faith Davies (Developer), Simon McNaughton (Developer), Jane Hunter (eResearch Lab Leader). APPF/HRPPC/CSIRO: Xavier Sirault (Science Leader, HRPPC), Xueqin Wang (Tester, Documentor), Bob Furbank (APPF HRPPC Leader). APPF/Plant Accelerator/Uni of Adelaide: Bogdan Masznicz (Bioinformatician), Mark Tester (APPF TPA Leader). APN: Philip Wu (Developer), Martin Hamilton (Developer), Adrienne McKenzie (APN Head of Network Services). Monash University: Yuan-Fang Li (Designer). NeAT: Andrew Treloar (Deputy Director ANDS), Paul Coddington (Projects Manager, ARCS). ALA: Donald Hobern (Director, ALA).