
An On-line Collaborative Data Management System
Roger Curry (1), Cameron Kiddle (1), Rob Simmonds (1) and Gilberto Z. Pastorello Jr. (2)
(1) Grid Research Centre, University of Calgary
(2) Centre for Earth Observation Science, University of Alberta

Outline
- Data Challenges
- Related Work
- Data Management System
- Use Case: GeoChronos
- Summary and Future Work
GCE 2010 Nov. 14,

Data Challenges - I
- Data Acquisition
  - Much scientific data is stored on off-line media
  - Cumbersome and time-consuming to access
  - Making data available on-line is difficult
  - Insufficient storage and bandwidth
- Sharing of Data
  - Lack of willingness to share data
  - Proprietary data: need for controlled access

Data Challenges - II
- Usability of Data
  - Insufficient metadata to describe data
  - Some domains have several metadata standards, but many have none, so many scientists use their own metadata format
- Finding Data
  - Difficult to find the data you need
  - Different data are organized and stored differently
  - Tools to browse, search and visualize data are often lacking

Related Work - I
- Content Management Systems
  - e.g., Drupal, Joomla!, Microsoft SharePoint, Plone, ...
  - Offer a rich set of features but do not provide:
    - Meaningful support for specific data formats
    - Efficient association of metadata and ancillary files with data sets
    - Access to a variety of data processing tools
    - Uniform handling of outputs from processing tools
- Spectral Libraries
  - e.g., USGS, ASTER, Vegetation Spectral Library (VSL)
  - Are available on-line but lack:
    - The ability to dynamically restructure metadata for browsing
    - Collaboration features enabled by social networking

Related Work - II
- Spectral Library Tools
  - e.g., DLR-DFD Spectral Archive, SPECCHIO
  - Flexible in creating and handling metadata but:
    - Have a fixed metadata schema and do not support new metadata needs
- Data repositories for other domains
  - e.g., Astrophysics Data System, FLUXNET, European Bioinformatics Institute (EBI) databases
  - Offer a wide range of functionality but:
    - Primarily focus on data that is already validated and structured
    - Do not handle preliminary, intermediate or untested data (i.e., research in progress)
- Digital Libraries
  - e.g., Planetary Data System, NCore, SciPort
  - Have flexible functionality but:
    - Most focus on well-defined digital artefacts
    - Limited in handling collaboration on evolving data, metadata and schemas

Data Management System - Overview
- Supports the following functionality:
  - On-line access to data
  - Scientists can share data while keeping control over who sees it
  - Adding and editing metadata while working with multiple schemas
  - Collaborative creation of new schemas to facilitate consistent and accurate recording of metadata
  - Dynamic restructuring of the way data is browsed

Data Management System - Framework
- User & Data
  - User acquires data from a sensor and uploads it to the portal
  - Direct acquisition of data is also possible
- Elgg Portal
  - Built on top of Elgg, an open source social networking platform
  - Fine-grained access control
  - Flexible data model
- Data Storage
  - Currently local NFS storage
  - Working on a distributed iRODS-based system
- Data Ingestion Service
  - Creates records, parses metadata, establishes ancillary relationships
  - Deployed on a cloud-based Condor pool
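The three jobs of the ingestion service (create records, parse metadata, establish ancillary relationships) can be illustrated with a minimal Python sketch. It is not the GeoChronos implementation. The `key = value` header format, the `.asc` extension, the rule that files sharing a base name are ancillary to the data file, and the names `parse_header` and `ingest` are all assumptions made for illustration.

```python
from pathlib import PurePath

# Assumption: only these extensions hold primary data; everything else is ancillary.
DATA_EXTENSIONS = {".asc"}


def parse_header(text):
    """Parse 'key = value' lines into a metadata dict (hypothetical header format)."""
    metadata = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            metadata[key.strip().lower()] = value.strip()
    return metadata


def ingest(files):
    """Turn a mapping of file name -> text content into a list of record dicts."""
    records = {}
    # Pass 1: create one record per data file and parse its metadata.
    for name, content in files.items():
        path = PurePath(name)
        if path.suffix in DATA_EXTENSIONS:
            records[path.stem] = {
                "file": name,
                "metadata": parse_header(content),
                "ancillary": [],
            }
    # Pass 2: attach non-data files that share a base name with a record.
    for name in files:
        path = PurePath(name)
        if path.suffix not in DATA_EXTENSIONS and path.stem in records:
            records[path.stem]["ancillary"].append(name)
    return list(records.values())
```

For example, uploading `leaf01.asc` together with `leaf01.jpg` would yield one record whose ancillary list contains the photo, while an unrelated `notes.txt` would be left unattached.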

Data Management System - Data Model
Source:
- Arbitrary metadata can be assigned to any entity
- Annotations allow users to comment on entities not owned by them
- Data management system adds three new types of ElggObjects
  - Schema
  - Collection
  - Record
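The shape of this data model can be sketched in Python. This is an analogy only: Elgg itself is a PHP platform, and the class and field names below (`Entity`, `annotate`, `file_path` and so on) are hypothetical, not Elgg's API. The sketch shows that every entity carries free-form metadata and annotations, and that the three new object types build on that base.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    """Base entity: arbitrary metadata plus annotations from any user."""
    owner: str
    metadata: dict = field(default_factory=dict)
    annotations: list = field(default_factory=list)

    def annotate(self, user, text):
        # Annotations may come from users who do not own the entity.
        self.annotations.append((user, text))


@dataclass
class Schema(Entity):
    namespace: str = ""


@dataclass
class Record(Entity):
    # A record usually wraps one file, but the file is optional.
    file_path: Optional[str] = None


@dataclass
class Collection(Entity):
    name: str = ""
    records: list = field(default_factory=list)
```

A record owned by one user can then be commented on by another, e.g. `record.annotate("bob", "Check calibration")`, without changing the record's metadata.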

Data Management System - Schemas
- Create schemas
  - Custom or standards-based (e.g., Dublin Core)
  - Individually or as a collaborative team
- Schemas consist of
  - Namespace
  - Description
  - Read/write access permissions
  - Series of metadata keys
- Metadata keys consist of
  - Name
  - Description
  - Type (text, latlong, ancillary)
  - Optionality: required, recommended, optional
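A schema built from the parts listed above could be used to check a record's metadata. The Python sketch below is one possible reading, not the system's actual code. The names `MetadataKey`, `Schema.check` and `is_latlong`, and the `"lat,long"` string format, are assumptions; access permissions are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataKey:
    name: str
    description: str = ""
    type: str = "text"             # "text", "latlong" or "ancillary"
    optionality: str = "optional"  # "required", "recommended" or "optional"


def is_latlong(value):
    """Accept strings such as '53.5,-113.5' (assumed format for this sketch)."""
    try:
        lat, lon = (float(part) for part in value.split(","))
    except (ValueError, AttributeError):
        return False
    return -90 <= lat <= 90 and -180 <= lon <= 180


@dataclass
class Schema:
    namespace: str
    description: str = ""
    keys: list = field(default_factory=list)

    def check(self, metadata):
        """Return (errors, warnings) for a metadata dict."""
        errors, warnings = [], []
        for key in self.keys:
            value = metadata.get(key.name)
            if value is None:
                if key.optionality == "required":
                    errors.append(f"missing required key: {key.name}")
                elif key.optionality == "recommended":
                    warnings.append(f"missing recommended key: {key.name}")
            elif key.type == "latlong" and not is_latlong(value):
                errors.append(f"invalid latlong for key: {key.name}")
        return errors, warnings
```

Treating a missing recommended key as a warning rather than an error is a design choice of this sketch; it lets incomplete records be saved while still prompting for better metadata.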

Data Management System - Collections
- Group of related data
  - e.g., spectral library, set of satellite data
- Collection consists of
  - Name, description, read/write access permissions, metadata, records

Data Management System - Records
- Atomic unit of the data management system
  - Usually represents a single file, but does not need to be associated with a file
- Tabbed interface for viewing
  - Spectral plot, metadata, ancillary data, map, comments
  - Custom tabs based on data type

Data Management System - Virtual Directory Structure
- Dynamic restructuring of data for browsing purposes
- Folders based on metadata keys/values
- User can customize the metadata keys used to establish the directory hierarchy
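The virtual directory idea can be sketched as grouping records by an ordered list of metadata keys. The Python below is illustrative only. The function name `build_virtual_tree`, the record layout, the `"(unspecified)"` folder for missing values and the reserved `"_records"` entry are assumptions of the sketch.

```python
def build_virtual_tree(records, keys):
    """Nest records into folders, one folder level per metadata key, in order."""
    tree = {}
    for record in records:
        node = tree
        for key in keys:
            # Records lacking a key are grouped under a placeholder folder.
            folder = str(record["metadata"].get(key, "(unspecified)"))
            node = node.setdefault(folder, {})
        # "_records" holds the leaf entries; a real system would need to
        # guard against a metadata value that happens to equal this name.
        node.setdefault("_records", []).append(record["name"])
    return tree


records = [
    {"name": "a", "metadata": {"site": "A", "species": "lichen"}},
    {"name": "b", "metadata": {"site": "A", "species": "alfalfa"}},
    {"name": "c", "metadata": {"site": "B"}},
]

by_site = build_virtual_tree(records, ["site", "species"])
by_species = build_virtual_tree(records, ["species"])
```

The same records appear under `site/species` folders in `by_site` and under `species` folders in `by_species`. Nothing is moved or copied in storage; only the browsing hierarchy changes with the chosen keys.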

Use Case - GeoChronos

GeoChronos - Overview
- An on-line platform
  - For:
    - Earth observation scientists
  - Facilitating:
    - Collaboration between scientists
    - Data access, management and sharing
    - Application access, management and sharing
  - Leveraging:
    - Web 2.0 and social networking technologies
    - Cloud computing technologies
  - Funded by:
    - CANARIE - Network Enabled Platform (NEP-1) program
    - Cybera

GeoChronos - Project Team
- Principal Investigators
  - Dr. Arturo Sanchez-Azofeifa, University of Alberta
  - Dr. John Gamon, University of Alberta
  - Dr. Benoit Rivard, University of Alberta
  - Dr. Rob Simmonds, University of Calgary
- Project Coordination
- Platform Development
- Domain Scientists

GeoChronos - Virtual Organization

GeoChronos - Spectral Libraries
- Libraries created
  - Ingested some existing on-line libraries
    - USGS, ASTER, Vegetation Spectral Library (VSL)
    - Many enhanced features as part of the GeoChronos Spectral Library module: improved browsing, dynamic plotting, mapping, annotations, ...
  - Domain scientists have contributed libraries
    - Rock samples, tar sand samples, lichen samples, vegetation samples, alfalfa/barley field samples
- Data formats / parsers supported
  - ENVI, UNISPEC, ASD, several ASCII formats
- Schemas incorporated
  - Library specific: USGS, ASTER, VSL, ...
  - Sensor/format specific: UNISPEC, ENVI, ...
  - Other standards: Dublin Core
- Currently hosting (including MODIS data)
  - 10+ schemas, 20+ collections (libraries), 20,000+ records

GeoChronos - MODIS Satellite Data
- Developed an automated workflow service for mosaicking, subsetting, reprojecting and masking MODIS satellite data
- Significantly reduces the time scientists spend running such workflows manually
- Data management system used to store raw MODIS satellite data and data products derived from the workflow
- Parsers/schemas specific to MODIS data have been added to the system
- User is provided with the same powerful interface as Spectral Libraries for browsing, accessing and viewing data
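To make the workflow steps concrete, here is a toy Python sketch of three of them (mosaicking, subsetting and masking) on small grids held as lists of rows. It is not the GeoChronos workflow service. Reprojection is omitted because it needs a geospatial library, and the function names and the rule that a non-zero quality flag marks a bad cell are assumptions.

```python
def mosaic(left, right):
    """Join two tiles that have the same number of rows side by side."""
    if len(left) != len(right):
        raise ValueError("tiles must have the same number of rows")
    return [l_row + r_row for l_row, r_row in zip(left, right)]


def subset(grid, row0, row1, col0, col1):
    """Cut out rows row0..row1-1 and columns col0..col1-1."""
    return [row[col0:col1] for row in grid[row0:row1]]


def mask(grid, quality, fill=None):
    """Replace cells whose quality flag is non-zero with a fill value."""
    return [
        [value if flag == 0 else fill for value, flag in zip(v_row, q_row)]
        for v_row, q_row in zip(grid, quality)
    ]


# Chain the steps the way an automated workflow would.
tile_west = [[1, 2], [3, 4]]
tile_east = [[5, 6], [7, 8]]
merged = mosaic(tile_west, tile_east)
window = subset(merged, 0, 2, 1, 3)
product = mask(window, [[0, 1], [0, 0]])
```

Each step takes a grid and returns a new one, so the raw tiles and every derived product can be stored as separate records.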

GeoChronos - User Feedback
- Have developed the data management system in an interactive, iterative fashion
- Domain scientists on the project have provided much guidance, testing and feedback
- Have customized and enhanced the data management system based on feedback received

Summary
- Identified data-related challenges facing scientists
- Discussed some related efforts and shortcomings of these approaches
- Presented an on-line collaborative data management system addressing many data challenges
- Showed example usage of the data management system by GeoChronos

Next Steps
- Currently have a single local data repository
  - Working on extending the data management system to work with distributed data repositories using iRODS
- Currently have powerful browsing functionality
  - Need to add search functionality across collections and based on metadata values
- Currently support custom metadata schemas
  - Plan to make use of Semantic Web technologies to better relate data and provide ontological mapping between different metadata schemas / standards
- Currently work with spectral and MODIS satellite data
  - Plan to incorporate other data such as carbon flux data, other satellite data, meteorological data, phenology tower data
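The planned search across collections by metadata value is not described further in the slides. The Python sketch below shows one simple way it might work. The function name `search`, the dict-based collection layout and the case-insensitive exact matching are all assumptions.

```python
def search(collections, criteria):
    """Return (collection name, record name) pairs matching every criterion.

    criteria maps metadata keys to wanted values; comparison is
    case-insensitive on the string form of each value.
    """
    hits = []
    for collection in collections:
        for record in collection["records"]:
            metadata = record["metadata"]
            if all(
                key in metadata and str(metadata[key]).lower() == str(value).lower()
                for key, value in criteria.items()
            ):
                hits.append((collection["name"], record["name"]))
    return hits


collections = [
    {"name": "Lichen library", "records": [
        {"name": "r1", "metadata": {"species": "Lichen", "sensor": "ASD"}},
    ]},
    {"name": "Field samples", "records": [
        {"name": "r2", "metadata": {"species": "alfalfa", "sensor": "ASD"}},
        {"name": "r3", "metadata": {"species": "lichen", "sensor": "UNISPEC"}},
    ]},
]

asd_hits = search(collections, {"sensor": "asd"})
lichen_asd_hits = search(collections, {"species": "lichen", "sensor": "ASD"})
```

A linear scan like this only suits small repositories. Matching across different schemas would also need the key mapping that the slide attributes to Semantic Web technologies, which this sketch does not attempt.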

Contact Information