Published by Karen Page. Modified over 9 years ago.
1
Evolving Scientific Data Workflow
CAS 2011
Pamela Gillman
pjg@ucar.edu
2
CAS 2011 Computational and Information Systems Laboratory
Overview
  Traditional Data Workflow
  Evolving Scientific Data Workflow
  Design Technical Challenges
  GLobally Accessible Data Environment
  New Workflow Example
  NWSC Steps Forward
3
Traditional Workflow
  Process-Centric Data Model
4
Traditional Data Workflow Challenges
  Common data movement issues
    Time consuming to move data between systems
    Bandwidth to archive system is insufficient
    Lack of sufficient disk space
  Need to evolve data management techniques
    Workflow management systems
    Standardize metadata information
  User education
    Effective methods for understanding workflow
    Effective methods for streamlining workflow
5
Evolving Scientific Workflow
  Information-Centric Data Model
6
Design Technical Challenges
  Determining actual workflow behaviors
    Chicken-and-egg problem
    Current environment potentially shapes behavior
    If the environment changes, does behavior change?
  Storage cost curves are steeper than compute cost curves
    Finding the right balance
  Archive cost curve is unsustainable
    Need a better balance between disk and archive use
7
GLADE: GLobally Accessible Data Environment
  Unified and consistent data environment for NCAR HPC
    Supercomputers, DAV, and storage
  Shared transfer interface and support for projects
  Support for analysis of IPCC AR5 data
  Service gateways for ESG & RDA data sets
  Data is available at high bandwidth to any server or supercomputer within the GLADE environment
  Resources outside the environment can manipulate data using common interfaces
  Choice of interfaces supports current projects; platform is flexible to support future projects
8
GLADE Data Workflow Solutions
  Information centric
    Data can stay in place through entire workflow
    Access from supercomputing, data post-processing, analysis and visualization resources
  Direct access to NCAR data collections
  Availability of persistent longer-term storage
    Allows completion of entire workflow prior to final storage of results either at NCAR or offsite
  Provides high-bandwidth data transfer services between NCAR and peer institutions
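The "data can stay in place" point can be illustrated with a small sketch. This is not from the presentation: the stage names, the system labels, the 50 TB dataset size, and the `full_copies` helper are all hypothetical, chosen only to show how a shared file system removes the copies that a process-centric workflow needs between systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    system: str  # hypothetical label for where the stage runs


def full_copies(stages: list[Stage], shared_storage: bool) -> int:
    """Count full-dataset copies needed between consecutive stages."""
    if shared_storage:
        # Information-centric: every stage reads the same file system.
        return 0
    # Process-centric: copy whenever the next stage runs on another system.
    return sum(1 for a, b in zip(stages, stages[1:]) if a.system != b.system)


# Hypothetical four-stage workflow, loosely following the resources on the slide.
workflow = [
    Stage("simulate", "supercomputer"),
    Stage("post-process", "analysis cluster"),
    Stage("analyze", "analysis cluster"),
    Stage("visualize", "visualization cluster"),
]

dataset_tb = 50  # hypothetical dataset size, not a figure from the slides

for shared in (False, True):
    copies = full_copies(workflow, shared)
    print(f"shared_storage={shared}: {copies} copies, {copies * dataset_tb} TB moved")
```

The sketch ignores the final write to the archive, which both models still need once the workflow completes.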
9
New Workflow Example
  [Diagram; labels recovered from the slide]
  GLADE: scratch, Project Space, Data Collection
  Connected resources: Supercomputers; Data Analysis, Visualization; Science Gateways (RDA/ESG); Data Transfer Gateway; HPSS
  Transfer tools shown: GridFTP, scp / sftp, bbcp, hsi, htar
10
Scale of Data Environment Changing
  Current NCAR Data Scale
    HPC Scratch and DAV Space: 1 PB
    Data Collection Space: 1 PB
    Archive Size: 14 PB
    HPC System: 77 Teraflops
  NWSC Scale Projections
    Global File System: 10-15 PB
    ~80 GB/s burst I/O rate
    Archive Size: 20 PB initial, growing to >170 PB by 2016
    HPC System: ~1.5 Petaflops
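A quick back-of-the-envelope calculation helps put these figures in proportion. The sketch below uses only the numbers on the slide; it assumes decimal units (1 PB = 1,000,000 GB) and, optimistically, that the quoted burst rate could be sustained. The helper names are illustrative.

```python
GB_PER_PB = 1_000_000  # assumption: decimal units


def hours_to_stream(capacity_pb: float, rate_gb_per_s: float) -> float:
    """Hours needed to read or write capacity_pb once at a constant rate."""
    seconds = capacity_pb * GB_PER_PB / rate_gb_per_s
    return seconds / 3600


def growth_factor(projected: float, current: float) -> float:
    """Ratio of a projected figure to the current one (same units)."""
    return projected / current


# Global file system: 10-15 PB at ~80 GB/s (burst rate treated as sustained).
for capacity_pb in (10, 15):
    hours = hours_to_stream(capacity_pb, 80)
    print(f"{capacity_pb} PB at 80 GB/s: {hours:.1f} hours")

# Compute: 77 Teraflops today versus ~1.5 Petaflops (1500 Teraflops) projected.
print(f"HPC growth: {growth_factor(1500, 77):.1f}x")

# Archive: 14 PB today versus >170 PB by 2016 (lower bound of the projection).
print(f"Archive growth: at least {growth_factor(170, 14):.1f}x")
```

Even at the full burst rate, one pass over the file system takes roughly 35 to 52 hours, and the projected archive is about twelve times its current size.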
11
NWSC Conceptual Data Architecture
  [Diagram; labels recovered from the slide]
  Storage Cluster: 15 PB, 80 GB/s burst
  Data Collections, Project Spaces, Scratch
  HPSS: 170 PB
  Archive Interface
  High Bandwidth I/O Network (InfiniBand), 10Gb/40Gb Ethernet
  Data Transfer Services
  Science Gateways: RDA, ESG
  Data Analysis, Visualization and Computational Clusters
  Partner Sites, TeraGrid Sites, Remote Vis: 10Gb/40Gb/100Gb Ethernet
12
Summary
  Exciting times for data-intensive science!
  Many unknowns at this scale, but we’re working to prepare as much as possible
  Risk mitigation is in the forefront
    Mid-course corrections based on current efforts
    Tools for observing changes in workflow behaviors
    Phased procurement options
  Preparing users between now and NWSC deployment
    Allocation, charging enhancements
    New workflow strategies
13
QUESTIONS?
pjg@ucar.edu