SSDS: Data for Science
A Walkthrough of Proposed SSDS Capabilities
4 April 2002
John Graybeal



Topics
What you want to hear:
- What data is in SSDS
- How to access data
- How to display data
- How to command instruments
What else you should know:
- How easy to use is it?
- Are we sure the data’s OK?
  - Raw data always available?
  - Is it reliable? Is time right?
- What if there’s a problem?
  - Can we tell what happened?
  - Can we gracefully recover?
- Is data distributable/secure?
- What aren’t you getting?

What Data is Available?
1. All data produced by MOOS instruments
   - Data is available ‘right away’ if sent to shore, or
   - Data could be loaded later, directly from device
2. Other data which has been submitted to SSDS
   - Submitted data must follow basic ISI/SSDS guidelines
   - Can be brand new (e.g., calibrations), or derived (e.g., from other SSDS data)
3. “Metadata” (descriptive info) about the above
Notes
- SSDS should not replicate external data stores
- Someday could re-process existing MBARI data
- Operational data can also be sent to SSDS and ingested

Metadata “Explained”
Metadata is just “data about other data”
- My metadata may be your science data, or vice versa
4 metadata types MOOS will handle (≈static)
- Packet headers (source, timestamp, sequence)
- Packet descriptions (item 1=“Depth”, 2=“Lat”)
- Device (data source) descriptions
- Rich science metadata (status, calibration info)
Everything else is ‘just data’
Wherever possible, we’ll try to keep it simple
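
A minimal sketch of the first two metadata types (packet headers and packet descriptions), written in Python purely for illustration. The class names, field types, and the device name "ctd-01" are assumptions; the slides give only the field lists, not a concrete format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class PacketHeader:
    """Per-packet metadata: where it came from, when, and in what order."""
    source: str          # device identifier (hypothetical naming)
    timestamp: datetime  # time assigned to the packet
    sequence: int        # running packet counter from the source


@dataclass(frozen=True)
class PacketDescription:
    """Names for the positional items in a packet, e.g. item 1 = "Depth"."""
    item_names: tuple


def describe(payload, description):
    """Pair each raw value with its item name from the description."""
    if len(payload) != len(description.item_names):
        raise ValueError("payload does not match packet description")
    return dict(zip(description.item_names, payload))


# Hypothetical packet from an assumed device called "ctd-01".
header = PacketHeader("ctd-01", datetime(2002, 4, 4, 12, 0, tzinfo=timezone.utc), 17)
description = PacketDescription(("Depth", "Lat"))
record = describe((10.5, 36.8), description)
print(header.source, header.sequence, record)
```

Keeping the description separate from the packets matches the "≈static" note above: the description changes rarely, while headers arrive with every packet.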

How To Access the Data?
Ask (catalog) for data of interest (search by device, date, data item name, or combination)
Choose a data set (sets?) of interest, click to access
- Probably multiple text formats: what’s important? (ASCII CSV? ODV? netCDF? other?)
- Do you need to monitor or process ‘streaming’ data?
What more advanced features are needed? Desired?
- Displaying same item across multiple data sets?
- Selecting specific items or times within data set?
- Processed data products… Sub-setting or interpolating data by time or item? Averaging? Filtering? …?
- Combining 2 data sets using time as reference?
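
Two of the advanced features asked about above, sub-setting by time and combining two data sets using time as the reference, can be sketched as follows. This is one possible interpretation, not a proposed SSDS design: samples are assumed to be (time, value) pairs sorted by time, and the merge rule (take the most recent value at or before each reference time) is an assumption chosen for simplicity.

```python
from bisect import bisect_right


def subset_by_time(samples, start, end):
    """Keep samples whose time falls in the half-open range [start, end)."""
    return [(t, v) for t, v in samples if start <= t < end]


def merge_by_time(primary, secondary):
    """Attach to each primary sample the latest secondary value at or before it.

    Both inputs are lists of (time, value) pairs sorted by time. When no
    secondary sample exists yet, None is attached instead.
    """
    secondary_times = [t for t, _ in secondary]
    merged = []
    for t, value in primary:
        i = bisect_right(secondary_times, t)  # count of secondary samples <= t
        other = secondary[i - 1][1] if i > 0 else None
        merged.append((t, value, other))
    return merged


# Hypothetical data: times in seconds, depth in meters, temperature in deg C.
depth = [(0, 10.0), (60, 10.4), (120, 11.1)]
temperature = [(30, 12.5), (90, 12.3)]

merged = merge_by_time(depth, temperature)
print(merged)
print(subset_by_time(depth, 60, 180))
```

Interpolation or averaging would replace the "latest value" rule inside merge_by_time; which rule users actually need is exactly the open question the slide raises.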

How to Display the Data?
Basic plots will be available via web interface
- Quick look in the truest sense
- We don’t want to create yet another plotting program
Data will be available to existing tools
- Minimum capability is usable files (ASCII, netCDF, ?)
- Ideal is to embed SSDS data access directly into tools
  - In this model, software within Matlab (for example) can open anything in the archive
  - Browsing from within application would be a big plus
  - Some (many) tools may do this for free; others we can ‘help’
Before discussing further, you should understand the way we want SSDS (and MOOS) to work
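
The "minimum capability" of usable ASCII files could look roughly like the sketch below, which writes records as CSV text that existing tools can load. The column layout (a leading "time" column followed by the item names) and the function name to_csv are assumptions; the slides do not define a file layout.

```python
import csv
import io


def to_csv(item_names, rows):
    """Render (time, values) rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["time", *item_names])
    for time, values in rows:
        writer.writerow([time, *values])
    return buffer.getvalue()


# Hypothetical single record with the two example items "Depth" and "Lat".
text = to_csv(["Depth", "Lat"], [("2002-04-04T12:00:00Z", (10.5, 36.8))])
print(text)
```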

MOOS Data Architecture
[Figure: block diagram divided into Ocean Side and Shore Side. Labels: Devices; Observing Platform; Shore Side Data System; Communications; Cataloging; Archiving; Data Presentation; Applications/Interfaces; User Applications (User Tools).]

How to Access Instrument (by the way, it’s not an SSDS task)
[Figure: the MOOS data architecture block diagram, with the same labels: Devices; Observing Platform; Shore Side Data System; Communications; Cataloging; Archiving; Data Presentation; Applications/Interfaces; User Applications (User Tools).]

How Data Access Works
[Figure: the MOOS data architecture block diagram, annotated.]

How Data Access Works
1. SSDS automatically notified of instrument information
   - Instrument qualification and installation on MOOS
   - Instrument configuration (default settings, changes)
   - Data record descriptions (syntactic and semantic)
   - Arrival of new data records
2. SSDS automatically catalogs, archives all arriving data
3. Users search catalog for data of interest
   - References to archived data returned with search results
   - Source data can be accessed via the references
4. User can then view (or subscribe to?) the source data
   - Various formats provided, including basic plots
   - Connections to advanced presentation packages supported
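
The four steps above can be sketched with a toy in-memory model. Everything here is hypothetical: the class ShoreSideStore, its method names, and the reference string format are illustration only and do not describe the actual SSDS interfaces. The point is the shape of the flow: arrival triggers archiving and cataloging, a search returns references, and references lead back to the source data.

```python
from dataclasses import dataclass, field


@dataclass
class ShoreSideStore:
    """Toy stand-in for the catalog and archive (hypothetical, in memory)."""
    archive: dict = field(default_factory=dict)  # reference -> raw record
    catalog: list = field(default_factory=list)  # searchable entries

    def on_arrival(self, device, date, item_names, raw):
        """Steps 1-2: a new record arrives; archive it and catalog it."""
        reference = f"{device}/{date}/{len(self.archive)}"
        self.archive[reference] = raw
        self.catalog.append(
            {"device": device, "date": date, "items": set(item_names), "ref": reference}
        )
        return reference

    def search(self, device=None, date=None, item=None):
        """Step 3: return references matching any combination of criteria."""
        return [
            entry["ref"]
            for entry in self.catalog
            if (device is None or entry["device"] == device)
            and (date is None or entry["date"] == date)
            and (item is None or item in entry["items"])
        ]

    def fetch(self, reference):
        """Step 4: follow a reference back to the archived source data."""
        return self.archive[reference]


store = ShoreSideStore()
store.on_arrival("ctd-01", "2002-04-04", ["Depth", "Lat"], b"\x01\x02")
store.on_arrival("gps-02", "2002-04-04", ["Lat", "Lon"], b"\x03")

refs = store.search(date="2002-04-04", item="Lat")
print(refs)
print(store.fetch(refs[0]))
```

Returning references rather than the data itself keeps the catalog small and lets the same archived record be delivered in different formats later.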


How easy to use is it?
The Hard Part I: Providing ISI instrument drivers
- Templates should be available, useful for most devices
The Hard Part II: Describe your data streams
- Must define instrument data streams before deploying
- Even this can be easy (define your data as a “blob”; but…)
Steps to get data should be pretty easy (1-step?)
- Find it in catalog (may be many items with similar names)
- Ask for it in your favorite basic format
- Plug it in to your favorite application
MOOS/ISI/SSDS makes many things simple
- Timestamps: synchronous, reliable, available
- Data transfer, archive, backup all handled automatically
- Operational relationships (particularly location) tracked

Are we sure the data is OK?
Raw data always available?
- The system is designed around this core concept
- Even if SSDS dies, raw data won’t go away
Is data reliable (what you see is what was sent)?
- Same software for ALL data communication and management: excellent reliability, less work
Is time base correct for the data?
- Uniform time base for all MOOS/ISI components
- Of course, you have to send data via ISI data paths
  - If you keep it in the instrument, ISI can’t timestamp it

What if there’s a problem?
Can we tell what happened (and avoid it)?
- Certain systematic information will be available
  - Other data arrivals from device/platform/observatory
  - Indications of instrument events, reconfigurations
- Operational data can be sent and maintained
  - Transfer rates, connection reliability, power status
  - Systemic events and errors
Can we gracefully recover? Yes! (within reason)
- All the transferred raw data is kept in SSDS
- All the instrument’s raw data is saved on wet side
- System designed for graceful data (re)processing
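
Why keeping raw data makes recovery graceful can be shown with a small sketch: derived values are always recomputed from the untouched raw record, so a wrong calibration is fixed by re-running the derivation rather than by patching stored results. The linear calibration (scale and offset) and all numbers below are assumptions for illustration only.

```python
def derive(raw_counts, scale, offset):
    """Compute derived values from raw counts; the raw input is not modified."""
    return [round(scale * count + offset, 3) for count in raw_counts]


# Raw record as archived (a tuple, so it cannot be changed in place).
raw_counts = (100, 102, 105)

# First pass with the calibration believed correct at the time (hypothetical).
first_pass = derive(raw_counts, scale=0.1, offset=0.0)

# Later an offset error is found: reprocess from the same raw data.
corrected = derive(raw_counts, scale=0.1, offset=-0.2)

print(first_pass)
print(corrected)
```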

Is data distributable? Is data secure?
Request: Give colleagues access to ‘my’ data
- Model A: Everyone has access to all data (w/fuzz)
- Model B: MBARI Internal vs MBARI External
  - Option 1: Make ‘your’ data available externally
  - Option 2: Bring them to MBARI
  - Option 3: Send them a report of your data
- Model C: Configurable data access security
  - Notionally follow Unix (self, group, other) model
  - Note this model costs more (amount TBD) to implement
(Note: Access security is also central to confidently enforcing proprietary periods.)
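
Model C’s notional Unix-style (self, group, other) check might be sketched as below. The class and function names, the read-only scope, and the example users are all assumptions; the slides only say the model would notionally follow the Unix pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSetAccess:
    """Read permissions for one data set, in Unix order: self, group, other."""
    owner: str
    group: str
    owner_can_read: bool = True
    group_can_read: bool = False
    other_can_read: bool = False


def can_read(user, user_groups, access):
    """Apply the first matching class only, as Unix does: owner, group, other."""
    if user == access.owner:
        return access.owner_can_read
    if access.group in user_groups:
        return access.group_can_read
    return access.other_can_read


# Hypothetical data set still inside its proprietary period:
# readable by the owner and the owner's project group, nobody else.
access = DataSetAccess(owner="pi_smith", group="moos-science", group_can_read=True)

print(can_read("pi_smith", set(), access))
print(can_read("colleague", {"moos-science"}, access))
print(can_read("visitor", set(), access))
```

Ending a proprietary period would then amount to switching other_can_read on for the data set, which is one way access security could support enforcing those periods.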

What aren’t you getting?
Totally transparent way of doing business
- Some accommodation to infrastructure is required
Very low latencies in data streaming, archiving
- Latency may be from sensor to shore, and from shore to archival interface
- Total latency not to exceed 1 hour (?)
Domain-specific data (re-)processing
Advanced data merging and reprocessing
Sophisticated data plotting/analysis via web interface
(High-bandwidth, always-on access to device)
A perfect, fully functional system on day 1

Data Mgt Architecture

Conclusions
SSDS should improve data management for all users
- At minimum, easier access to your own data and plots
- Straightforward access to all MBARI data (references)
- More reliable data storage, time references, metadata links
- Better long-term usability (gives us more time)
Development will be incremental
- Full-featured release targeted for MSE 2003
- Prototypes will exist before then (soon!), but may evolve
- Features will grow with third-party solutions
Many questions about first-order science priorities
- Which general-purpose functions do you really need?
- What are most useful data formats? application interfaces?
- How important is fine-grained access security?