

NWSC Planning for RDA – 21 Dec. 2011

- Describe the workflows used to maintain and provide the RDA to users; both are 24x7 operations
- Transition to the NWSC with zero downtime
- The NWSC is a new environment: processing adjustments and testing are needed
- Today: a starting point for an actionable plan, focused on NWSC DAV, HPC, and CFDS
- Baseline metrics:
  - 7,000 unique users annually
  - 1.4 PB of primary data in HPSS (2x in total)
  - 450 TB on GLADE: permanent data for users, plus areas for data preparation
  - Web servers and DB servers
  - DSG currently uses 6 DAV servers (mirage 0-5)

Requirements for RDA Data Processing at NWSC

- Homogeneous architecture and OS
- Common file system for RDA product development, NCAR access, and connection to DSS web servers
  - CFDS usage metrics for NCAR users at NWSC?
- Read/write connectivity to DB servers from Caldera, Geyser, and Yellowstone
- Dedicated and shared compute resources for the user-driven workload and burst DSS data-preparation needs
  - For example: a DSS-dedicated system or queues, with minimum restrictions?

NWSC RDA Systems Structure

RDA data processing examples and tools

Run Research Data Archive Management System (RDAMS) tools and daemons, executed as user "rdadata":
- dsarch: archives files from work disk spaces to HPSS and to CFDS
- gather-metadata: reads all incoming files to verify content and creates metadata records for the DBs
- dsrqst: manages delayed-mode user requests
  - subsetting: data extraction and re-dimensioning
  - format conversion, e.g. GRIB2 to netCDF
  - file staging: bulk data moves, HPSS files to CFDS /transfer
- dsupdt: complex DB-governed scripting to regularly download new data; routine growth for 150+ datasets
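The dsrqst-style delayed-mode flow described above can be sketched as a small pipeline of stages per request. This is an illustrative sketch only: the class, function, and stage names below are hypothetical, not the actual RDAMS code, and the dataset ID is just an example.

```python
from dataclasses import dataclass, field

# Hypothetical stages of a delayed-mode request, mirroring the dsrqst
# description above: subset/extract, convert format, stage output files.
STAGES = ("subset", "convert", "stage")

@dataclass
class Request:
    request_id: str
    dataset: str
    done: list = field(default_factory=list)  # stages completed so far

def process(req: Request) -> Request:
    """Run each stage in order; the real daemon works through a request
    queue asynchronously rather than in a single synchronous loop."""
    for stage in STAGES:
        # Placeholder for the real work (data extraction and
        # re-dimensioning, GRIB2-to-netCDF conversion, HPSS staging).
        req.done.append(stage)
    return req

req = process(Request("rqst-0001", "ds083.2"))
print(req.done)
```

A real implementation would persist request state in the DB between stages so a restarted daemon can resume partially processed requests.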

Daemon-managed data processing workflow
- A system-initialized daemon named "dsstart" checks on the dsrqst daemon status
- A cron job checks on the status of the "dsstart" daemon on each server
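The two-level watchdog above (cron checks dsstart, dsstart checks dsrqst) can be sketched as a check-and-restart routine. This is a minimal, Linux-only illustration (it scans /proc); the function names and restart mechanism are assumptions, not the actual DSS scripts.

```python
import os

def daemon_running(name: str) -> bool:
    """Return True if a process whose command name matches `name` is
    running. Scans /proc/[pid]/comm, so this works on Linux only."""
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/comm") as f:
                if f.read().strip() == name:
                    return True
        except OSError:
            continue  # process exited while we were scanning
    return False

def watchdog(daemon: str, restart) -> bool:
    """Cron-style check: if the daemon is down, invoke the restart
    callable and report False so the event can be logged."""
    if daemon_running(daemon):
        return True
    restart()
    return False
```

In production the `restart` callable would relaunch the daemon (and likely notify an operator), and the cron entry would run this check every few minutes on each server.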

Current Scale of Activity

The system works well, and demand is accelerating.
- Subsetting, format conversion, and file staging:
  - 166 user requests/week
  - 1-2 hours average execution time per request
  - 65 TB/week input data volume processed
  - 3 TB/week output data volume delivered to users
- 385 TB of data added to the RDA in FY 2011
  - In one case the processing was too large for the mirage servers; Lynx was used instead, over 3-4 weeks with 5-7 concurrent streams
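The weekly figures above imply per-request averages worth noting; a quick back-of-envelope check (using the slide's numbers, with TB treated as 1000 GB):

```python
# Weekly figures from the slide above.
requests_per_week = 166
input_tb_per_week = 65
output_tb_per_week = 3

input_per_request_gb = input_tb_per_week * 1000 / requests_per_week
output_per_request_gb = output_tb_per_week * 1000 / requests_per_week
reduction = input_tb_per_week / output_tb_per_week

print(f"{input_per_request_gb:.0f} GB in, "
      f"{output_per_request_gb:.0f} GB out per request")
print(f"~{reduction:.0f}x volume reduction from subsetting/conversion")
```

In other words, a typical request reads a few hundred GB but delivers only tens of GB, which is the core value of server-side subsetting and format conversion.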