Data Management Components for a Research Data Archive

Slides:



Advertisements
Similar presentations
ASIAES Project Overview Satellite Image Network for Natural Hazard Management in ASEAN+3 region Pakorn Apaphant Geo-Informatics and Space Technology Development.
Advertisements

Prof. Natalia Kussul, PhD. Andrey Shelestov, Lobunets A., Korbakov M., Kravchenko A.
Data management in SCD Steven Worley General Categories –The Mass Storage System –NCAR user file services (home directories) –Computer attached storage.
1 CEOS/WGISS20 – Kyiv – September 13, 2005 Paul Kopp SIPAD New Generation: Dominique Heulet CNES 18, Avenue E.Belin Toulouse Cedex 9 France
Operational Dataset Update Functionality Included in the NCAR Research Data Archive Management System 1 Zaihua Ji Doug Schuster Steven Worley Computational.
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
System Design/Implementation and Support for Build 2 PDS Management Council Face-to-Face Mountain View, CA Nov 30 - Dec 1, 2011 Sean Hardman.
TPAC Digital Library Talk Overview Presenter:Glenn Hyland Tasmanian Partnership for Advanced Computing & Australian Antarctic Division Outline: TPAC Overview.
Introduction Downloading and sifting through large volumes of data stored in differing formats can be a time-consuming and sometimes frustrating process.
October 16-18, Research Data Set Archives Steven Worley Scientific Computing Division Data Support Section.
Research Data at NCAR 1 August, 2002 Steven Worley Scientific Computing Division Data Support Section.
Presented by The Earth System Grid: Turning Climate Datasets into Community Resources David E. Bernholdt, ORNL on behalf of the Earth System Grid team.
Updates from EOSDIS -- as they relate to LANCE Kevin Murphy LANCE UWG, 23rd September
ESP workshop, Sept 2003 the Earth System Grid data portal presented by Luca Cinquini (NCAR/SCD/VETS) Acknowledgments: ESG.
Scientific Investigations; Support from Research Data Archives for Joint Office for Science Support 26 February, 2002 Steven Worley SCD/DSS.
1 Use of SRMs in Earth System Grid Arie Shoshani Alex Sim Lawrence Berkeley National Laboratory.
Integrated Model Data Management S.Hankin ESMF July ‘04 Integrated data management in the ESMF (ESME) Steve Hankin (NOAA/PMEL & IOOS/DMAC) ESMF Team meeting.
Improved Access to RDA from the MSS OSD Executive Meeting April 28, 2009.
Using the Global Change Master Directory (GCMD) to Promote and Discover ESIP Data, Services, and Climate Visualizations Presented by GCMD Staff January.
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
Presented by Scientific Annotation Middleware Software infrastructure to support rich scientific records and the processes that produce them Jens Schwidder.
IODE Ocean Data Portal - ODP  The objective of the IODE Ocean Data Portal (ODP) is to facilitate and promote the exchange and dissemination of marine.
GO-ESSP Workshop, LLNL, Livermore, CA, Jun 19-21, 2006, Center for ATmosphere sciences and Earthquake Researches Construction of e-science Environment.
Web Portal Design Workshop, Boulder (CO), Jan 2003 Luca Cinquini (NCAR, ESG) The ESG and NCAR Web Portals Luca Cinquini NCAR, ESG Outline: 1.ESG Data Services.
The Earth System Grid (ESG) Computer Science and Technologies DOE SciDAC ESG Project Review Argonne National Laboratory, Illinois May 8-9, 2003.
RDA Data Support Section. Topics 1.What is it? 2.Who cares? 3.Why does the RDA need CISL? 4.What is on the horizon?
Presented by Jens Schwidder Tara D. Gibson James D. Myers Computing & Computational Sciences Directorate Oak Ridge National Laboratory Scientific Annotation.
Evolving toward a Coherent, Collaborative Framework for Earth Science Data, Tools and Services Christopher Lynnes, Kwo-Sen Kuo and Kevin Murphy Earth Science.
GCRC Meeting 2004 BIRN Coordinating Center Software Development Vicky Rowley.
May 6, 2002Earth System Grid - Williams The Earth System Grid Presented by Dean N. Williams PI’s: Ian Foster (ANL); Don Middleton (NCAR); and Dean Williams.
1-2-3 February 2006 –Page 1 Mersea Integrated System How to improve Access/Downloading services ? How far do we go in terms of standardization ?
29 March 2004 Steven Worley, NSF/NCAR/SCD 1 Research Data Stewardship and Access Steven Worley, CISL/SCD Cyberinfrastructure meeting with Priscilla Nelson.
1 Accomplishments. 2 Overview of Accomplishments  Sustaining the Production Earth System Grid Serving the current needs of the climate modeling community.
1 Overall Architectural Design of the Earth System Grid.
SCD User Briefing The Community Data Portal and the Earth System Grid Don Middleton with presentation material developed by Luca Cinquini, Mary Haley,
The Research Data Archive at NCAR: A System Designed to Handle Diverse Datasets Bob Dattore and Steven Worley National Center for Atmospheric Research.
Distributed Data Servers and Web Interface in the Climate Data Portal Willa H. Zhu Joint Institute for the Study of Ocean and Atmosphere University of.
RDA Data Support Section. Topics 1.What is it? 2.Who cares? 3.Why does the RDA need CISL? 4.What is on the horizon?
The Earth Information Exchange. Portal Structure Portal Functions/Capabilities Portal Content ESIP Portal and Geospatial One-Stop ESIP Portal and NOAA.
5-7 May 2003 SCD Exec_Retr 1 Research Data, May Archive Content New Archive Developments Archive Access and Provision.
1. Gridded Data Sub-setting Services through the RDA at NCAR Doug Schuster, Steve Worley, Bob Dattore, Dave Stepaniak.
Introduction What purpose does a data archive center serve if users can’t find or access the holdings they might need to facilitate their research discoveries?
GCI Architecture GEOSS Information System Meeting 20 September 2013, ESA/ESRIN (Frascati, Italy) M.Albani (ESA), D.Nebert (USGS/FGDC), S.Nativi (CNR)
High Performance Storage System (HPSS) Jason Hick Mass Storage Group HEPiX October 26-30, 2009.
Enhancements to Galaxy for delivering on NIH Commons
Compute and Storage For the Farm at Jlab
Architecture Review 10/11/2004
Simulation Production System
Web Site Development and Macromedia Dreamweaver 8
Netscape Application Server
Introduction to Data Management in EGI
Grid Portal Services IeSE (the Integrated e-Science Environment)
University of Technology
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 2 Database System Concepts and Architecture.
An Introduction to Computer Networking
DIGITAL LIBRARY.
Operational Dataset Update Functionality Included in the NCAR Research Data Archive Management System Zaihua Ji Doug Schuster Steven Worley Computational.
BlackBoard 5 A Definitive e-Learning Software Platform Ozgur Balsoy,
Research Data Archives at NCAR
Wide Area Workload Management Work Package DATAGRID project
Steven Worley, NSF/NCAR/SCD
Metadata The metadata contains
Google Sky.
CISL’s Research Data Archive (RDA) : Description and Methods
Digitization Standards: Issues & Updates
Comeaux and Worley, NSF/NCAR/SCD
Long-Lived Data Collections
Robert Dattore and Steven Worley
Successful Data Curation for Large Data Archives
Comeaux and Worley, NSF/NCAR/SCD
Presentation transcript:

Data Management Components for a Research Data Archive Steven Worley and Bob Dattore Scientific Computing Division Computational and Information Systems Laboratory NCAR 5/18/2019

Research Data Archive (RDA) definition Components Outline Research Data Archive (RDA) definition Components MSS Online Data Server - traditional service Databases Community Data Portal - evolving service SAN Media for I/O 5/18/2019

Research Data Archive (RDA) definition Collection of reference datasets used in atmospheric and related sciences Over 600 datasets 10-20 new datasets added annually First established about 40 years ago Basic metrics 548K files 100.5 TB 2-3K unique users annually 5/18/2019

What makes a dataset? Elements of a dataset Data files (1 ~ 20K) Syntactic and semantic metadata Publications Documentation Lineage Data preparation, QC, analysis methods, etc 5/18/2019

Component Schematic Diagram for RDA RDA Database RDA Server MSS SAN CDP Data I/O 5/18/2019

MSS Features Archive for all data Local users can access all data Including backup files Local users can access all data Local = anyone with SCD computing account Only need file name! Usage logs are generated When, what, who accessed the data 5/18/2019

Component Schematic Diagram for RDA RDA Database RDA Server MSS SAN CDP Data I/O 5/18/2019

Online Data Server - Traditional Features Exclusive dedication to the RDA Single point for all information Project web pages and catalogues Home web page for each dataset General Description MSS File Lists Search/Discovery Software Documentation Consultant contact Most readily needed data, (~ 15TB) FTP and Web access User request forms for one-off data requests 5/18/2019

Online Data Server - Traditional 5/18/2019

Component Schematic Diagram for RDA RDA Database RDA Server MSS SAN CDP Data I/O 5/18/2019

RDA Database RDA management tool Metadata server 5/18/2019

RDA Database (Management tool) Current Capabilities Future Capabilities DATA SOURCES SCD computer user account data MSS and Data Server file descriptions MSS and Data Server file usage logs RDA dataset - file relationships DATA SOURCES Expanded RDA metadata for datasets Syntactic metadata for files Individual data order request information Research Data Archive Database (RDADB) APPLICATIONS / SERVICES MSS and Data Server usage reports By time, dataset, user, file, … From command or web view MSS RDA file integrity audit: Dataset, password, retention, ... APPLICATIONS / SERVICES MSS filename assignment and dataset registration Data order request processing 5/18/2019

RDA Database (Metadata Server) Future Capabilities Research Data Archive Database (RDADB) USER UTILITIES File selection from search criteria Semantic and syntactic metadata Provide pointers to data location MSS, Data Server and CDP Provide pointers to documentation and software Support MSS file access Pre-form MSS access commands Account for blocking, compression, etc Receive and initiate data requests to DSS staff 5/18/2019

5/18/2019

RDA Server and MSS Example for One Dataset Note: 95 Unique users, total 33K files delivered 20.5 TB accessed 5/18/2019

Component Schematic Diagram for RDA RDA Database RDA Server MSS SAN CDP Data I/O 5/18/2019

Community Data Portal (CDP) Features Organization-wide facility RDA plus many other groups Standard metadata - minimum requirement CF and GCMD keyword compliant <XML> format Build catalogues Other optional elements Data files, images, movie clips, documentation, model codes, etc…. 5/18/2019

Community Data Portal (CDP) Objectives Dissolve cooperate structure from user view and facilitate one stop data discovery Enable: Client/server network data access OPENDAP, GDS, LAS interactive access Scientific collaborations between remote groups Easy to use environment A robust system that serves many Eliminate the need for individual groups similar systems 5/18/2019

Community Data Portal (CDP) Earth System Grid, a CDP subsystem Features Multi-organization (NCAR, DOE, LLNL) shared resources Now, data access only. Future to include computing. Very tight security High level authorization and authentication Advanced software, Globus Toolkit, GridFTP, etc Successful for current AR4 IPCC assessment U.S. contribution to global climate evaluation 5/18/2019

Component Schematic Diagram for RDA RDA Database RDA Server MSS SAN CDP Data I/O 5/18/2019

SAN Features New and growing area 32 TB ATA disk with ADIC software Current connections - two data servers RDA and CDP (same architecture, SUN) Future More ATA storage - target to 60-120TB Heterogeneous servers, e.g. LINUX cluster, SGI, etc 5/18/2019

Component Schematic Diagram for RDA RDA Database RDA Server MSS SAN CDP Data I/O 5/18/2019

Data I/O Objectives Network transfers I/O Media I/O - still important I, Build archive content O, Deliver data outside NCAR Network transfers I/O used most often Media I/O - still important Tapes: LTO, DLT, DAT, Exabyte Disks: CD-ROM, DVD Devices: USB mountable drives For data rescue from outside sources Still have 9 and 7 track tape drives 5/18/2019

Operational Schematic Diagram for RDA RDA Database Integrity Monitor RDA Server Some data Metadata MSS All RDA data SAN Top Collections CDP Metadata Data Data I/O Network Media 5/18/2019 Metadata Data

Many system component are necessary to manage a RDA Components Conclusion Many system component are necessary to manage a RDA Components MSS Online Data Server - traditional service Databases Community Data Portal - evolving service SAN Media for Data I/O 5/18/2019