Evolving Scientific Data Workflow CAS 2011 Pamela Gillman

Overview
- Traditional Data Workflow
- Evolving Scientific Data Workflow
- Design Technical Challenges
- GLobally Accessible Data Environment (GLADE)
- New Workflow Example
- NWSC Steps Forward

Traditional Workflow
[Diagram: a process-centric data model]

Traditional Data Workflow Challenges
Common data movement issues:
- moving data between systems is time consuming
- bandwidth to the archive system is insufficient
- disk space is insufficient
We need to evolve our data management techniques:
- workflow management systems
- standardized metadata
- user education
- effective methods for understanding and streamlining workflows
(A sketch of the copy-based pattern follows below.)
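To make the copy-centric pattern concrete, here is a minimal sketch, assuming hypothetical host names and paths, of the kind of script the process-centric model forces on users: every step is a full transfer of the data, so insufficient bandwidth and disk show up at every hop. The scp and htar commands are tools named later in this deck; everything else is invented for illustration.

```python
import subprocess

RUN_DIR = "/scratch/user/run042"      # hypothetical scratch path
ANALYSIS_HOST = "dav.example.edu"     # hypothetical analysis server

def stage_to_analysis(run_dir: str, host: str) -> None:
    """Copy model output to the analysis machine (full transfer #1)."""
    subprocess.run(["scp", "-r", run_dir, f"{host}:/data/incoming/"], check=True)

def archive_run(run_dir: str) -> None:
    """Bundle the run into the archive with htar (full transfer #2)."""
    subprocess.run(["htar", "-cvf", "/home/user/run042.tar", run_dir], check=True)

if __name__ == "__main__":
    stage_to_analysis(RUN_DIR, ANALYSIS_HOST)  # competes for limited bandwidth
    archive_run(RUN_DIR)                       # a second full pass over the data
```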

Evolving Scientific Workflow
[Diagram: an information-centric data model]

Design Technical Challenges
Determining actual workflow behaviors:
- a chicken-and-egg problem: the current environment potentially shapes behavior; if we change the environment, does behavior change?
Storage cost curves are steeper than compute cost curves:
- finding the right balance
The archive cost curve is unsustainable:
- we need a better balance between disk and archive use

GLADE: GLobally Accessible Data Environment
- A unified and consistent data environment for NCAR HPC: supercomputers, DAV (data analysis and visualization), and storage
- Shared transfer interfaces and support for projects
- Support for analysis of IPCC AR5 data
- Service gateways for ESG and RDA data sets
- Data is available at high bandwidth to any server or supercomputer within the GLADE environment
- Resources outside the environment can manipulate data through common interfaces
- The choice of interfaces supports current projects; the platform is flexible enough to support future ones
(An in-place access sketch follows below.)
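One way to see what "data available at high bandwidth to any server within the environment" buys: under a shared namespace like GLADE, the same path resolves on the supercomputer, the DAV cluster, and the gateways, so analysis opens files in place with no staging step. A minimal sketch, assuming a hypothetical path and variable name and the netCDF4 Python library (not named in the deck):

```python
from netCDF4 import Dataset  # assumed library; AR5-era model output is netCDF

# Hypothetical GLADE path; identical on HPC, DAV, and gateway nodes.
PATH = "/glade/scratch/user/run042/output.nc"

# No copy to local disk: the file is read in place over the shared file system.
with Dataset(PATH) as nc:
    tas = nc.variables["tas"][:]  # "tas" (surface air temperature) is illustrative
    print("mean:", tas.mean())
```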

GLADE Data Workflow Solutions
Information centric:
- data can stay in place through the entire workflow
- access from supercomputing, data post-processing, analysis, and visualization resources
- direct access to NCAR data collections
- availability of persistent, longer-term storage
- allows the entire workflow to complete prior to final storage of results, either at NCAR or offsite
- provides high-bandwidth data transfer services between NCAR and peer institutions (see the transfer sketch below)
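For the NCAR-to-peer-institution leg, GridFTP is the wide-area tool named on the workflow diagram. A hedged sketch of such a transfer, with invented endpoints and paths, using globus-url-copy's parallel streams to fill a high-bandwidth link:

```python
import subprocess

SRC = "gsiftp://gridftp.example.edu/glade/data/run042.tar"  # hypothetical source
DST = "gsiftp://peer.example.org/archive/run042.tar"        # hypothetical peer site

# -p 4 opens four parallel TCP streams; -vb reports throughput during the copy.
subprocess.run(["globus-url-copy", "-vb", "-p", "4", SRC, DST], check=True)
```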

New Workflow Example
[Diagram: GLADE at the center, holding scratch, project spaces, and data collections; supercomputers and data analysis/visualization systems access it directly; science gateways serve RDA/ESG data; a data transfer gateway reaches external sites via GridFTP, scp/sftp, and bbcp; HPSS is reached via hsi and htar]
(A tool-selection sketch follows below.)
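The diagram pairs each route with a tool: hsi and htar toward HPSS; GridFTP, bbcp, and scp/sftp toward external sites. A hypothetical helper, with all hosts and paths invented, that dispatches on the route:

```python
import subprocess

def transfer(src: str, dst: str, route: str) -> None:
    """Run the tool the workflow diagram names for each route."""
    if route == "hpss":
        cmd = ["hsi", f"put {src} : {dst}"]  # hsi put copies a file into HPSS
    elif route == "gridftp":
        cmd = ["globus-url-copy", src, dst]  # wide-area, parallel-capable
    elif route == "bbcp":
        cmd = ["bbcp", "-s", "8", src, dst]  # multi-stream peer-to-peer copy
    else:
        cmd = ["scp", src, dst]              # simple fallback for small files
    subprocess.run(cmd, check=True)

# Hypothetical usage: archive a result, then push a copy to a peer site.
transfer("/glade/proj/run042.nc", "run042.nc", "hpss")
transfer("file:///glade/proj/run042.nc",
         "gsiftp://peer.example.org/incoming/run042.nc", "gridftp")
```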

Scale of Data Environment Changing
Current NCAR data scale:
- HPC scratch and DAV space: 1 PB
- data collection space: 1 PB
- archive size: 14 PB
- HPC system: 77 teraflops
NWSC scale projections:
- global file system: 10-15 PB, with an ~80 GB/s burst I/O rate
- archive size: 20 PB initially, growing to >170 PB by 2016
- HPC system: ~1.5 petaflops
(The implied archive growth rate is worked out below.)
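To gauge how steep that archive curve is, the arithmetic below works out the growth rate implied by 20 PB growing to more than 170 PB by 2016, assuming the window runs from NWSC opening around 2012; the four-year span is an assumption, not stated on the slide.

```python
initial_pb, final_pb, years = 20, 170, 4  # four-year window is an assumption

compound = (final_pb / initial_pb) ** (1 / years)
print(f"compound growth: {compound:.2f}x per year (~{compound - 1:.0%} annually)")
print(f"linear equivalent: {(final_pb - initial_pb) / years:.1f} PB added per year")
```

Either way it is sliced, the archive must absorb more than twice the entire current 14 PB holding every year, which is the unsustainable cost curve flagged on the design-challenges slide.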

NWSC Conceptual Data Architecture
[Diagram: a 15 PB storage cluster (80 GB/s burst) holds data collections, project spaces, and scratch; data analysis, visualization, and computational clusters plus science gateways (RDA, ESG) attach over a high-bandwidth InfiniBand I/O network; an archive interface fronts a 170 PB HPSS; data transfer services connect partner sites, TeraGrid sites, and remote visualization over 10 Gb/40 Gb/100 Gb Ethernet]

Summary
Exciting times for data-intensive science!
Many unknowns at this scale, but we're working to prepare as much as possible.
Risk mitigation is at the forefront:
- mid-course corrections based on current efforts
- tools for observing changes in workflow behaviors
- phased procurement options
Preparing users between now and NWSC deployment:
- allocation and charging enhancements
- new workflow strategies

QUESTIONS?