An Architecture for Creating Collaborative Semantically Capable Scientific Data Sharing Infrastructures Anuj R. Jaiswal, C. Lee Giles, Prasenjit Mitra,

An Architecture for Creating Collaborative Semantically Capable Scientific Data Sharing Infrastructures Anuj R. Jaiswal, C. Lee Giles, Prasenjit Mitra, James Z. Wang Presentation by Paulo Shakarian

Outline – Problem – Overall Goal – Contributions – Metadata – Implementation – Future Work – Comparison to SIBDATA Concept

Problem Researchers often reference the experimental results of their predecessors. However, the raw data behind those results is often not readily available. – Hence, results often cannot easily be reused or combined with other experiments.

Problem (cont.) Large repositories (e.g. NASA, NOAA) do collect experimental data – Often conform to a global schema (which may cause some data to be lost) – Or stored as flat files (requiring custom-built query applications) – Also, data labels in experiments may differ (e.g. Temp. vs. Temperature vs. Celsius)

Overall Goal Architecture for dissemination, sharing, querying, and searching of scientific data on the WWW – Schema not known a priori – Approach relies on sufficient metadata of two varieties: – Data about the experiment (conditions, source, when uploaded, etc.) – Semantics for columns/rows in experimental results (what they represent, what units, etc.)
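
A minimal sketch of the two metadata varieties as Python data classes. The class and field names are hypothetical and not taken from the paper; the sketch only shows experiment-level facts and per-column semantics kept apart, so that a tool can flag columns that still lack semantics.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeMetadata:
    """Attribute-level metadata: the semantics of one row/column of results."""
    column: str   # label as it appears in the results, e.g. "Temp"
    meaning: str  # what the column represents, e.g. "temperature"
    unit: str     # unit of measure; "" for dimensionless quantities


@dataclass
class AnnotatedDataset:
    """Dataset-level metadata plus per-attribute metadata (hypothetical layout)."""
    dataset: dict[str, list[str]]  # experiment metadata: source, conditions, upload date, ...
    columns: list[str] = field(default_factory=list)  # column labels found in the data
    attributes: list[AttributeMetadata] = field(default_factory=list)

    def unannotated_columns(self) -> list[str]:
        """Return the columns that still lack attribute-level semantics."""
        known = {a.column for a in self.attributes}
        return [c for c in self.columns if c not in known]
```

For example, a dataset with columns `["Temp", "Rate"]` and an annotation only for `"Temp"` reports `["Rate"]` as unannotated.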

Overall Goal (cont.) Two-part approach: – Annotation application for semi-automatic creation of annotations – Web portal for searchable storage of annotated scientific data.

Contributions of the Paper – Proposes an architecture for a semantically capable, collaborative infrastructure for data collection and sharing – A system that uses a two-level metadata scheme for document description and dataset attributes – Description of the current implementation

Dataset Metadata Dublin Core is a set of 15 elements for minimal resource description, intended to provide basic interoperability – OAI-PMH – IETF RFC 5013 – ANSI/NISO Standard Z – ISO Standard 15836:2009. Attributes listed on next 3 slides

Dataset Metadata The paper states it "uses Dublin Core 15 elements" but actually uses the following 17 (the 15 core elements plus References and Is Referenced By, which refine Relation): – Title – Creator – Subject – Description – Contributor – Publisher – Date – Type – Format – Identifier – Source – Relation – References – Is referenced by – Language – Rights – Coverage.
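
The element list above could be serialized as XML roughly as follows. This is a sketch, not the paper's file format: the `metadata` wrapper element and the `dataset_record` helper are hypothetical. It places the 15 core elements in the DCMES namespace `http://purl.org/dc/elements/1.1/` and References / Is Referenced By in the DCMI Terms namespace `http://purl.org/dc/terms/`, since those two are DCMI Terms properties rather than core elements.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)

# The 15 core Dublin Core elements (DCMES 1.1).
CORE = ["title", "creator", "subject", "description", "publisher",
        "contributor", "date", "type", "format", "identifier",
        "source", "language", "relation", "coverage", "rights"]
# DCMI Terms refinements of "relation" used alongside the core set.
REFINEMENTS = ["references", "isReferencedBy"]


def dataset_record(fields: dict[str, list[str]]) -> ET.Element:
    """Build a hypothetical <metadata> element from element name -> values."""
    root = ET.Element("metadata")
    for name, values in fields.items():
        if name in CORE:
            ns = DC
        elif name in REFINEMENTS:
            ns = DCTERMS
        else:
            raise ValueError(f"unknown element: {name}")
        for value in values:  # every element may repeat
            ET.SubElement(root, f"{{{ns}}}{name}").text = value
    return root
```

`ET.tostring(dataset_record({...}), encoding="unicode")` then yields the record as a string with `dc:` and `dcterms:` prefixes.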

Attribute Metadata Challenges: – Same attribute, different row/column name (e.g. Temp vs. Temperature) – Same row/column name, but different attribute (e.g. Temperature (in °C) vs. Temperature (in K)) – Row/column names may be ambiguous (e.g. Rate)
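
The three challenges can be illustrated with a small normalization step. This is a sketch under assumptions, not the paper's mechanism: the synonym table, the `ColumnAnnotation` type, and the unit codes (`"degC"`, `"K"`, `"degF"`) are hypothetical. Synonymous labels map to one canonical attribute, declared units are converted so that °C and K columns become comparable, and an ambiguous label such as "Rate" is rejected until it is annotated explicitly.

```python
from dataclasses import dataclass

# Hypothetical synonym table: lower-cased raw label -> canonical attribute.
# None marks labels too ambiguous to resolve without explicit annotation.
SYNONYMS = {"temp": "temperature", "temp.": "temperature",
            "temperature": "temperature", "rate": None}

# Converters from a declared temperature unit to kelvin.
TO_KELVIN = {
    "degC": lambda v: v + 273.15,
    "K": lambda v: v,
    "degF": lambda v: (v - 32.0) * 5.0 / 9.0 + 273.15,
}


@dataclass(frozen=True)
class ColumnAnnotation:
    label: str  # row/column name exactly as written in the spreadsheet
    unit: str   # unit declared in the attribute metadata


def canonical_attribute(col: ColumnAnnotation) -> str:
    """Map a raw label to its canonical attribute, refusing ambiguous ones."""
    key = col.label.strip().lower()
    if key not in SYNONYMS:
        raise KeyError(f"no annotation for {col.label!r}")
    name = SYNONYMS[key]
    if name is None:
        raise ValueError(f"{col.label!r} is ambiguous; annotate it explicitly")
    return name


def temperature_in_kelvin(col: ColumnAnnotation, value: float) -> float:
    """Normalize a temperature reading so columns in different units compare."""
    if canonical_attribute(col) != "temperature":
        raise ValueError(f"{col.label!r} is not a temperature column")
    return TO_KELVIN[col.unit](value)
```

With this, 25 in a "Temp" column declared in `degC` and 298.15 in a "Temperature" column declared in `K` normalize to the same value.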

Attribute Metadata Metadata tags for attributes (right). Note that these tags allow a dynamic collaboration ontology to be generated: – Equivalent To – Different From – Superset Of – Subset Of – Type Of
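
One way such tags could accumulate into a shared ontology is sketched below; the paper leaves the derivation algorithm to future work, so this is an illustration, not its method. The relation identifiers and the `CollaborationOntology` class are hypothetical. Equivalent To tags are merged with a union-find structure, Superset Of is stored as the inverse Subset Of edge, and Different From tags are checked against the merged classes to surface contradictory annotations.

```python
from collections import defaultdict

# The five relation tags from the slide, as hypothetical identifiers.
RELATIONS = {"equivalentTo", "differentFrom", "supersetOf", "subsetOf", "typeOf"}


class CollaborationOntology:
    """Toy store that grows as users tag attribute relations (hypothetical design)."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}  # union-find forest for equivalentTo
        self.edges: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def _find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def add(self, a: str, relation: str, b: str) -> None:
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation {relation!r}")
        if relation == "equivalentTo":
            ra, rb = self._find(a), self._find(b)
            self.parent[ra] = rb  # merge the two equivalence classes
        elif relation == "supersetOf":
            self.edges["subsetOf"].add((b, a))  # store only one direction
        else:
            self.edges[relation].add((a, b))

    def equivalent(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)

    def conflicts(self) -> list[tuple[str, str]]:
        """Pairs tagged differentFrom that other tags have made equivalent."""
        return sorted((a, b) for a, b in self.edges["differentFrom"]
                      if self.equivalent(a, b))
```

Union-find keeps equivalence transitive without re-scanning all tags: tagging Temp ≡ Temperature and Temperature ≡ T makes Temp ≡ T immediately.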

Submitting a Dataset Uses a "pull" technique – Author submits a URL – System pulls the annotated data. The pull method allows the following: – A moderator can check URLs from non-authorized submitters – Automatic tagging of provenance information for authorized users based on the URL – Better protection from DoS attacks: malicious users can be banned, and a round-robin policy can be used for fetching
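
The pull policy could be organized as a small scheduler, sketched below under assumptions: the `PullScheduler` class, its status strings, and the one-queue-per-submitter layout are hypothetical, and the actual HTTP download and provenance tagging are left out. URLs from authorized users are queued directly, other URLs wait for a moderator, banned users are rejected, and fetching rotates across submitters so that one user cannot monopolize the downloader.

```python
from collections import OrderedDict, deque
from typing import Optional, Tuple


class PullScheduler:
    """Round-robin fetch queue for submitted dataset URLs (hypothetical sketch)."""

    def __init__(self, authorized: set[str]) -> None:
        self.authorized = set(authorized)
        self.banned: set[str] = set()
        self.pending_review: list[tuple[str, str]] = []  # awaiting a moderator
        self.queues: OrderedDict = OrderedDict()  # user -> deque of approved URLs

    def submit(self, user: str, url: str) -> str:
        if user in self.banned:
            return "rejected"
        if user in self.authorized:
            self._enqueue(user, url)  # trusted submitter: no review needed
            return "queued"
        self.pending_review.append((user, url))
        return "pending"

    def approve(self, user: str, url: str) -> None:
        """Moderator has checked a URL from a non-authorized submitter."""
        self.pending_review.remove((user, url))
        self._enqueue(user, url)

    def ban(self, user: str) -> None:
        """Drop a malicious user's queued and pending URLs."""
        self.banned.add(user)
        self.queues.pop(user, None)
        self.pending_review = [p for p in self.pending_review if p[0] != user]

    def _enqueue(self, user: str, url: str) -> None:
        self.queues.setdefault(user, deque()).append(url)

    def next_fetch(self) -> Optional[Tuple[str, str]]:
        """Take one URL from the user at the head, then rotate that user to the back."""
        if not self.queues:
            return None
        user, urls = self.queues.popitem(last=False)
        url = urls.popleft()
        if urls:
            self.queues[user] = urls  # re-insert at the end of the rotation
        return user, url
```

Because each call serves one URL and then moves that submitter to the back, a user with many queued URLs is interleaved with everyone else instead of blocking them.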

Implementation: Metadata Used for chemical kinetics experiments – Experimental results in MS Excel – Metadata added through an MS Excel add-in

Implementation: Web Portal Three components – Web portal front-end – Data downloader and parser – Data analysis toolkit

Implementation: Web Portal Web Portal Front-End – Content management system – Dataset viewer – Data submission system – Uses the Mambo Server content-management system (open source, PHP-based) – Data submission system deployed using JSP on Apache Tomcat 5

Implementation: Web Portal Data downloader and parser – Scheduler – Downloader – Parser. The parser: – Creates metadata as XML files – Imports data in Excel files into a MySQL database – Creates a dataset index, linking each dataset with its dataset metadata and its attribute metadata with the data tables
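
The parser's last two steps might look like the sketch below. It makes several assumptions: Python's built-in `sqlite3` stands in for MySQL, rows are assumed to be already read out of the Excel sheet, and the `import_dataset` function, the `data_<id>` table naming, and the `dataset_index` columns are all hypothetical. Each sheet gets its own table, and one index row ties the dataset to its two metadata XML files and to that table.

```python
import sqlite3
from typing import Iterable, Sequence


def import_dataset(conn: sqlite3.Connection, dataset_id: int,
                   headers: Sequence[str], rows: Iterable[Sequence[float]],
                   metadata_xml: str, attribute_xml: str) -> str:
    """Store one parsed sheet in its own table and register it in the index.

    sqlite3 stands in for MySQL; table and column names are hypothetical.
    """
    if any('"' in h for h in headers):
        raise ValueError("column labels may not contain double quotes")
    table = f"data_{int(dataset_id)}"
    cols = ", ".join(f'"{h}" REAL' for h in headers)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    marks = ", ".join("?" for _ in headers)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', rows)
    # The index links each dataset to its metadata files and its data table.
    conn.execute("""CREATE TABLE IF NOT EXISTS dataset_index (
                        dataset_id    INTEGER PRIMARY KEY,
                        metadata_xml  TEXT NOT NULL,
                        attribute_xml TEXT NOT NULL,
                        data_table    TEXT NOT NULL)""")
    conn.execute("INSERT INTO dataset_index VALUES (?, ?, ?, ?)",
                 (dataset_id, metadata_xml, attribute_xml, table))
    conn.commit()
    return table
```

Keeping the index separate from the data tables lets a query first search metadata and only then open the matching tables.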

Implementation: Data Analysis Tools In addition to supporting queries, the web portal includes plotting and regression tools
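
The slide does not say which regression methods the portal offers; as an assumption, the simplest candidate is an ordinary least-squares line fit over two dataset columns, sketched here with the hypothetical helper `linear_fit`.

```python
from typing import Sequence


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

For example, `linear_fit([0, 1, 2, 3], [1, 3, 5, 7])` returns `(2.0, 1.0)`. For chemical kinetics data, a common use of such a fit is an Arrhenius plot of ln k against 1/T, whose slope is −Ea/R.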

Future Work – Develop algorithms to derive dynamic collaboration ontologies – Integrate query rewriting and semantic searching using attribute-level semantics – Automatic metadata generation using a user's previous experiments – Group, trust, and privacy mechanisms
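
Query rewriting is listed as future work, so no algorithm is given; the following is one possible reading, not the authors' design. It assumes each dataset's attribute metadata maps column labels to canonical attribute names, and that narrower relations (derived from Subset Of / Type Of tags) are available as a plain dictionary. The function `rewrite_query` and its argument shapes are hypothetical.

```python
from typing import List, Mapping, Optional, Sequence, Tuple


def rewrite_query(attribute: str,
                  annotations: Mapping[str, Mapping[str, str]],
                  narrower: Optional[Mapping[str, Sequence[str]]] = None,
                  ) -> List[Tuple[str, str]]:
    """Rewrite a query on one canonical attribute into concrete (dataset, column)
    pairs, following narrower links (e.g. Subset Of / Type Of) transitively."""
    narrower = narrower or {}
    targets: set = set()
    stack = [attribute]
    while stack:  # depth-first walk; the visited set guards against cycles
        a = stack.pop()
        if a in targets:
            continue
        targets.add(a)
        stack.extend(narrower.get(a, ()))
    return sorted((ds, col)
                  for ds, cols in annotations.items()
                  for col, attr in cols.items()
                  if attr in targets)
```

A query for "temperature" would then also reach a column annotated as "gas temperature" if that attribute is tagged as a subset of temperature, even though the two column labels share no text.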

Comparison to SIBDATA Concept – Relies on a central repository (as opposed to multiple repositories for SIBDATA) – Only useful for Excel-formatted experimental results – Annotations may be an interesting feature to include in SIBDATA or CDATA.

Questions