Data Catalog Project: A Browsable, Searchable, Metadata System


Data Catalog Project: A Browsable, Searchable, Metadata System. J. Stillerman, T. Fredian, M. Greenwald, G. Manduchi

The Problem: Modern experiments record very large sets of heterogeneous measurements. These measurements are hard to find, understand and access for a variety of consumers: local users other than the primary producers, new users, visitors, remote users, and users analyzing data from more than one experiment.

Local Users / New Users / Visitors: Modern experiments are so complicated that even experienced local users have trouble navigating the stored data. New users need to be able to find and understand the available measurements from an experiment: What measurements are available? What do they mean, and how are they defined? Who do I talk with about them? How do I access them? How do I display them? Visitors have these questions, and need these answers, even more. Off-site collaborators cannot even ‘just ask someone’.

The Solution: Create a data catalog and a corresponding data store. Associate a standard set of metadata with each measurement, using an ontology to make metadata names unambiguous and easily searchable. Associate usage/type tags with measurements, also based on an ontology, and associate additional metadata germane to these tags. Store these metadata in a relational database for browsing and searching. Include data access URIs in the catalog, specifying where and how to access the data referred to.
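
The relational storage described on this slide could be sketched as follows. This is only an illustration using Python's built-in sqlite3 module, not the project's actual schema (the project was still in a design phase); the table names, column names and helper functions are all assumptions made for the example.

```python
import sqlite3


def create_catalog(conn: sqlite3.Connection) -> None:
    """Create two hypothetical tables: measurements and their usage tags."""
    conn.executescript(
        """
        CREATE TABLE measurement (
            id          INTEGER PRIMARY KEY,
            name        TEXT NOT NULL UNIQUE,
            description TEXT NOT NULL,
            units       TEXT,
            uri         TEXT NOT NULL   -- where and how to access the data
        );
        CREATE TABLE usage_tag (
            measurement_id INTEGER NOT NULL REFERENCES measurement(id),
            tag            TEXT NOT NULL,
            PRIMARY KEY (measurement_id, tag)
        );
        """
    )


def add_measurement(conn, name, description, units, uri, tags):
    """Insert one catalog entry together with its usage tags."""
    cur = conn.execute(
        "INSERT INTO measurement (name, description, units, uri) "
        "VALUES (?, ?, ?, ?)",
        (name, description, units, uri),
    )
    conn.executemany(
        "INSERT INTO usage_tag (measurement_id, tag) VALUES (?, ?)",
        [(cur.lastrowid, tag) for tag in tags],
    )
    return cur.lastrowid


def find_by_tag(conn, tag):
    """Browse: list (name, uri) for every measurement carrying a usage tag."""
    rows = conn.execute(
        "SELECT m.name, m.uri FROM measurement AS m "
        "JOIN usage_tag AS t ON t.measurement_id = m.id "
        "WHERE t.tag = ? ORDER BY m.name",
        (tag,),
    )
    return rows.fetchall()


def search_description(conn, text):
    """Search: names of measurements whose description contains the text."""
    rows = conn.execute(
        "SELECT name FROM measurement WHERE description LIKE ? ORDER BY name",
        (f"%{text}%",),
    )
    return [row[0] for row in rows]
```

With an in-memory database (`sqlite3.connect(":memory:")`), a caller would create the tables once, add entries, and then browse by tag or search by description text. Keeping the URI as an opaque text column reflects the idea that the catalog only points at the data store.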

Use Restricted Vocabularies: Ontologies or ‘dictionaries’ define the namespaces for the metadata in the system: general metadata, usage tag names, and usage-specific metadata. These dictionaries provide the terms used to build user interfaces for searching and browsing. Fixed vocabularies reduce user confusion between similar terms.
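
As a minimal sketch of how a fixed vocabulary could be enforced, the check below reports any term that is not in a dictionary. The term lists are taken from the metadata and usage tags named elsewhere in this talk, where they are given as open-ended lists, so they are incomplete here; the function name `unknown_terms` is hypothetical.

```python
# Terms copied from the talk's examples; the real dictionaries would be larger.
GENERAL_METADATA = frozenset({
    "Name", "Owners", "Description", "URL", "Short Label",
    "Long Label", "Units", "Geometry", "View(s)",
})
USAGE_TAGS = frozenset({"Time-Series", "Profile", "Image", "Image-Sequence"})


def unknown_terms(terms, vocabulary):
    """Return, in sorted order, the terms that are not in the vocabulary."""
    return sorted(set(terms) - vocabulary)
```

For example, `unknown_terms(["Units", "Unit"], GENERAL_METADATA)` flags `"Unit"` as a near-duplicate that the dictionary does not allow, which is the kind of confusion a restricted vocabulary is meant to prevent.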

Catalog Entries Coupled to Datastore: The data catalog allows users to locate and understand recorded data from experiments. Each entry's URI details how to access the data described. A URI (Uniform Resource Identifier) is a character string that specifies how to retrieve the data. The coupling between the data catalog and the data store is loose. Examples: mdsplus://server/tree/shot/path-in-shot and hdfstore://myfile.h5::/data/path (the latter from http://odo.readthedocs.org/en/latest/hdf5.html).
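
To illustrate how an application might take such URIs apart, here is a small sketch that handles the two example forms above. The `DataLocation` type and `parse_data_uri` function are hypothetical, and the split into scheme, container and path is an assumption read off the examples, not a defined standard.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class DataLocation:
    scheme: str      # which kind of data store, e.g. "mdsplus"
    container: str   # server name or file name
    path: str        # location of the data inside the container


def parse_data_uri(uri: str) -> DataLocation:
    """Split a catalog URI into scheme, container and path."""
    scheme, sep, rest = uri.partition("://")
    if not sep or not scheme:
        raise ValueError(f"not a data URI: {uri!r}")
    if scheme == "hdfstore":
        # Form used in the example: hdfstore://<file>::<path inside file>
        filename, sep, datapath = rest.partition("::")
        if not sep:
            raise ValueError(f"missing '::' in hdfstore URI: {uri!r}")
        return DataLocation(scheme, filename, datapath)
    # Generic form: <scheme>://<server>/<path>
    parts = urlsplit(uri)
    return DataLocation(parts.scheme, parts.netloc, parts.path)
```

Because the catalog only stores the string, support for a new kind of data store can be added in the parser without changing any catalog entries.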

Measurement Details: Catalog entries describe collections of recorded data. These collections can be heterogeneous. Measurements are composites of one or more traces. Entries record who, what, and when. Metadata tags: Name, Owners, Description, URL, Short Label, Long Label, Units, Geometry, View(s),... Usage tags: Time-Series, Profile, Image, Image-Sequence,...

Trace Details: Traces have a similar list of metadata but refer to specific retrievable data. A URI specifies where and how to access the data. These URIs could refer to any persistent data store. They will likely contain references to MDSplus branches. The MDSplus branches could all be stored in a subtree constructed for the purpose.
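
The relationship between measurements and traces could be modelled roughly as below. This is a sketch under assumptions: the class and field names are illustrative, chosen to echo the metadata tags listed in the talk, and only a few of those tags are included.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Trace:
    """One specific, retrievable piece of data."""
    name: str
    uri: str          # where and how to access the data
    units: str = ""


@dataclass
class Measurement:
    """A catalog entry: a composite of one or more traces."""
    name: str
    owners: List[str]
    description: str
    usage_tags: List[str] = field(default_factory=list)
    traces: List[Trace] = field(default_factory=list)

    def uris(self) -> List[str]:
        """Return the access URIs of every trace in this measurement."""
        return [trace.uri for trace in self.traces]
```

A measurement holding two traces would return two URIs from `uris()`, which an application could then hand to whatever reader understands each URI.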

Homogeneous MDSplus Branches: Regularity of the metadata in the database facilitates searching, browsing, and understanding. Regularity of the metadata in the MDSplus data store facilitates data-driven application development. Each trace referred to by the database will be represented in MDSplus as a node with associated metadata in the tree. This requires a mechanism to refer to required and optional metadata in the database, and a mechanism to refer to required and optional tag-specific metadata. Metadata could be referenced as nodes under the referred-to URI, or by some other mechanism, and could be stored as values or expressions, as long as it is accessible as ‘properties’ of the trace node. This same scheme can be (and should be) used throughout the experiment.

Database and MDSplus Locations: Sites can host the database and corresponding MDSplus trees locally so that visitors and collaborators can search, browse and view their measurements. For consumers of data from sites not using MDSplus, the database and an MDSplus tree can be hosted on the consumer’s site. These ‘local’ trees can contain references to network APIs for the remote experiments, can contain data extracted from the remote experiment’s data store, and can be augmented with locally produced results.

Applications: The combination of the catalog (database) and a homogeneous data representation (MDSplus) makes it very easy to build data-driven, high-level display applications. The database can drive signal/image selection, and the data store (referenced by URIs in the database) provides the data to display.
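
The division of labour described here, where the catalog selects and the data store supplies, might look like the following sketch. Everything in it is hypothetical: the reader registry, the dictionary-shaped catalog rows and the function names are assumptions for illustration, and a real application would register readers that talk to MDSplus or another store.

```python
from typing import Callable, Dict, List

Reader = Callable[[str], List[float]]

# Maps a URI scheme to the function that knows how to read that store.
_READERS: Dict[str, Reader] = {}


def register_reader(scheme: str, reader: Reader) -> None:
    """Associate a URI scheme with a function that fetches its data."""
    _READERS[scheme] = reader


def fetch(uri: str) -> List[float]:
    """Fetch the data a catalog URI points at, using the matching reader."""
    scheme, sep, _ = uri.partition("://")
    if not sep:
        raise ValueError(f"not a data URI: {uri!r}")
    try:
        reader = _READERS[scheme]
    except KeyError:
        raise LookupError(f"no reader registered for {scheme!r}") from None
    return reader(uri)


def data_for_tag(catalog: List[dict], usage_tag: str) -> Dict[str, List[float]]:
    """Select catalog rows by usage tag, then fetch each row's data."""
    return {
        row["name"]: fetch(row["uri"])
        for row in catalog
        if usage_tag in row["usage_tags"]
    }
```

A display application written this way never needs to know which store a signal lives in; it asks the catalog for everything tagged, say, "Time-Series" and plots whatever comes back.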

MIT’s VIDEO_DISPLAY

General Atomics’ ReviewPlus

πScope: The πScope main screen is a development environment equipped with a Shell, Editor, Data Browser and Debugger. (Screenshot of MDSplus data scope.)

Implementation: This project is currently in a design phase. The database will likely be implemented using Django’s ORM, which abstracts the underlying database as a set of Python classes. The GUI for searching and browsing will likely be implemented using Django. There will be a RESTful API to the database, implemented using Django REST framework.

Related Project: The Metadata Provenance Ontology Project (MPO) [see presentation, Abla] has ‘data objects’ as one of its key constituents. These data objects are references to persistent data: files, database records, records in files, or MDSplus. Eventually the URIs that refer to results from experiments should be data catalog entries.

Conclusions: A browsable, searchable data catalog is being created. It will provide an index for homogeneous branches stored in MDSplus. New users and visitors will be able to use this to find, access and understand the recorded measurements from an experiment. This also applies to local users of diagnostics. This tool can be used to homogenize the APIs from remote experiments for users working as collaborators.

Thank you... Many of the ideas for this project come from: Matt Reinke (University of York); the JET JPF/PPF system; the ITER data archive assessment (2011), provided by (Tessella, Nakanshi) under contract for ITER; and local users at Alcator C-Mod. Questions?