RDA Data Support Section. Topics 1.What is it? 2.Who cares? 3.Why does the RDA need CISL? 4.What is on the horizon?

Presentation transcript:

RDA Data Support Section

Topics
1. What is it?
2. Who cares?
3. Why does the RDA need CISL?
4. What is on the horizon?

1. What is it?
- Research Data Archive (RDA)
- 600+ datasets that are significant to many NCAR and university scientists
- Archive work began over 40 years ago
- Branded as RDA in 2003
- Generally focused on atmospheric and oceanic environmental measurements or analyzed products derived from them
- Critical data for weather and climate studies

Who cares?
- Growth in user access via the web
  - Promoted with more online data and better interfaces
- Consistent user access from the MSS
  - Represents provision to NCAR computers
- 26-year record for filling one-off data requests
  - Decreasing as web access increases in recent years
- Over 6000 unique users in 2008

Why does RDA need CISL?
- Rely heavily on CISL infrastructure and experts:
  - Secure and reliable MSS/HPSS storage
  - Disk to support web services
  - Networks to bring data in and distribute out to users
  - Computing platforms to prepare and serve the RDA
  - DSS is geoscience educated; need technical advice/support
- Current metrics
  - Storage:
    - Primary: 400+ TB, 4+M files
    - All: 800+ TB (backup/working/etc.)
  - Disk: 40 TB on SAN
  - Servers and laptops
    - Servers (8): mix of SunOS & Linux
    - About 12 laptops/desktops
- Data movement and growth

Complete User Community Pros:
- Fast access to online data.
- Access to all RDA content metadata.
- Access to RDA data processing services.

Complete User Community Cons:
- Slow access to offline data.
- Have to create a separate RDA account and log in.
- Data processing requests take a long time to finish.
- Slow download speeds for some users.

HPC User Community Pros:
- Access to full RDA.
- Fast computing.
- No login required.

HPC User Community Cons:
- No access to online data.
- Forced to use MSS as a file server: access is too slow.
- No direct access to RDA metadata.
- No direct access to RDA data processing services.

Complete User Community Improvements:
- Fast access to full RDA.
- Expanded data processing services available.
- Faster turnaround on data processing requests.
- No need for separate RDA user account. Authenticate through Kerberos?
- Faster download speeds (future tools with proper data usage authorization: GRID FTP, etc.).
- Consistent “first point of contact” for user support?

HPC User Community Improvements:
- Fast access to full RDA.
- Access to all RDA content metadata.
- Access to RDA data processing services.
- No need for separate RDA user account.
- Consistent “first point of contact” for user support?

What is on the horizon?
- Transition off all SunOS to Linux
- Move SAN storage to GPFS GLADE
  - Put more data online in GLADE (O 130TB)
  - Fast access path internal and external
- Transition ALL RDA from MSS to HPSS
- Implement more on-demand products
  - Data extraction and computing across TB datasets
  - Must be successful in GLADE, with HPSS, and using a scalable DA compute environment

Questions
1. What is it?
2. Who cares?
3. Why does the RDA need CISL?
4. What is on the horizon?