Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

Networked Clouds
Figure: cloud and network providers supporting science workflows (observatory and wind tunnel usage models)

ExoGENI Testbed

Computational/Data Science Projects on ExoGENI
ADAMANT – Building tools for enabling workflow-based scientific applications on dynamic infrastructure (RENCI, Duke, USC/ISI)
RADII – Building tools for supporting collaborative data-driven science (RENCI)
GENI ScienceShakedown – ADCIRC storm surge modeling on GENI
Goal of the presentation: demonstrate some of the things that are possible with GENI today

ADAMANT

Scientific Workflows – Dynamic Use Case

CC-NIE ADAMANT – Pegasus/ExoGENI
Network Infrastructure-as-a-Service (NIaaS) for workflow-driven applications
— Tools for workflows integrated with adaptive infrastructure
Workflows triggering adaptive infrastructure
— Pegasus workflows using ExoGENI
— Adapt to application demands (compute, network, storage)
— Integrate data movement into NIaaS (on-ramps)
Target applications
— Montage Galactic plane ensemble: astronomy mosaics
— Genomics: high-throughput sequencing
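To give a feel for the workflow side of this pairing, here is a minimal sketch of an abstract Pegasus workflow (DAX) in Python. It assumes the older Pegasus DAX3 API; the transformation name, file names, and arguments are illustrative placeholders, not the actual ADAMANT Montage or genomics workflows.

```python
# Hedged sketch: a tiny Pegasus abstract workflow (DAX3 API assumed).
# Job/file names are placeholders, not the real ADAMANT workflows.
from Pegasus.DAX3 import ADAG, Job, File, Link

dax = ADAG("montage-sketch")

raw = File("region.fits")                 # hypothetical input image
proj = File("region.projected.fits")      # hypothetical re-projected output

job = Job(name="mProjectPP")              # one re-projection task
job.addArguments(raw, proj)
job.uses(raw, link=Link.INPUT)
job.uses(proj, link=Link.OUTPUT)
dax.addJob(job)

# Pegasus later plans this abstract workflow onto concrete resources --
# in ADAMANT, ExoGENI slices provisioned to match the workflow's demands.
with open("montage-sketch.dax", "w") as f:
    dax.writeXML(f)
```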

ExoGENI: Enabling Features for Workflows
On-ramps / stitchports
— Connect ExoGENI to existing static infrastructure to import/export data
Storage slivering
— Networked storage: iSCSI target on the dataplane
— Neuca tools attach the LUN, format it, and mount the filesystem
Inter-domain links, multipoint broadcast networks
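Conceptually, the storage-slivering step reduces to standard iSCSI and filesystem plumbing inside the VM. The sketch below shows the rough equivalent with stock Linux tools; the portal address, target IQN, device, and mountpoint are made-up placeholders, and the real Neuca guest tools automate this rather than exposing these exact commands.

```python
# Hedged sketch of what "attach LUN, format, and mount" amounts to inside a VM.
# Portal, IQN, device, and mountpoint are placeholders, not real ExoGENI values.
import subprocess

PORTAL = "10.100.0.1:3260"                       # hypothetical iSCSI portal on the dataplane
IQN = "iqn.2014-01.net.exogeni:storage.sliver0"  # hypothetical target name
DEVICE = "/dev/sdb"                              # device that appears after login
MOUNTPOINT = "/mnt/sliver-storage"

def run(cmd):
    """Run a command, raising if it fails."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Discover and log in to the iSCSI target exposed on the slice dataplane.
run(["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", PORTAL])
run(["iscsiadm", "-m", "node", "-T", IQN, "-p", PORTAL, "--login"])

# Format the newly attached LUN and mount it for the workflow to use.
run(["mkfs.ext4", "-F", DEVICE])
run(["mkdir", "-p", MOUNTPOINT])
run(["mount", DEVICE, MOUNTPOINT])
```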

Computational Workflows in Genomics
Several versions as we scaled:
— Single machine
— Cluster based
— MapSeq: specialized code & Condor
— Pegasus & Condor
Applications: RNA-Seq, WGS (whole-genome sequencing)

Goal: learning to use NIaaS for biomedical research
Figure: user- or workflow-provisioned, isolated slices (Slice 1, Slice 2) of VMs spanning cloud providers (compute, data) and network providers

Goal: Management of data flows in NIaaS
Figure: iRODS Data Grid (iCAT, rule engine) spanning RENCI and UNC, with VMs in a slice connected by a layer 2 connection within the slice
Metadata control – example policies:
— Lab X can compute on Project Y data in the cloud
— User X can move data from Study A to the cloud
— Data from Study W cannot remain on cloud resources
Benefits: ease of access, control over access, auditing, provenance
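Policies like the ones above are typically keyed off iRODS metadata attached to collections and data objects. As a loose illustration only, the following sketch uses the python-irodsclient package to tag a data object with project/study attributes; the host, zone, credentials, and paths are placeholders, and the actual ADAMANT/RADII policy machinery is not shown in this deck.

```python
# Hedged sketch: tag an iRODS data object with metadata that placement and
# access policies (like those listed above) could key off of.
# Host, zone, credentials, and paths are placeholders.
from irods.session import iRODSSession

with iRODSSession(host="irods.example.org", port=1247,
                  user="labx_user", password="secret",
                  zone="renciZone") as session:
    obj = session.data_objects.get("/renciZone/home/projectY/sample001.bam")

    # Attributes a rule could check before allowing this object
    # to be moved onto or kept on cloud resources.
    obj.metadata.add("project", "ProjectY")
    obj.metadata.add("study", "StudyA")
    obj.metadata.add("cloud_eligible", "true")

    for avu in obj.metadata.items():
        print(avu.name, avu.value)
```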

Example ExoGENI requests (auto-generated)

Application to NIaaS - Architecture

RADII

RADII: Resource Aware Data-centric Collaboration Infrastructure
–Middleware to facilitate data-driven collaborations for domain researchers, offered as a commodity to the science community
–Reduces the large gap between procuring the required infrastructure and managing data transfers efficiently
Integration of data-grid (iRODS) and NIaaS (ORCA) technologies on ExoGENI infrastructure
–Novel tools to map data processes, computations, storage, and organizational entities onto infrastructure through an intuitive GUI-based application
–Novel data-centric resource management mechanisms for provisioning and de-provisioning resources dynamically throughout the lifecycle of a collaboration

Why iRODS in RADII?
–RADII policies map to the iRODS Rule Language
Easy to map policies to iRODS
Dynamic PEPs (policy enforcement points)
Reduced complexity for RADII
–Distributed and elastic data grid
–Resource monitoring framework
–Geo-aware resource hierarchy creation via composable iRODS resources
–Metadata tagging
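A dynamic PEP fires around an iRODS operation and is the natural place to enforce policies such as those RADII expresses. The sketch below only illustrates the general shape of such a hook, assuming the iRODS Python rule engine plugin and its conventional (rule_args, callback, rei) signature; the PEP name and logging call are given under that assumption and are not taken from the RADII codebase.

```python
# Hedged sketch (assumes the iRODS Python rule engine plugin, e.g. in core.py):
# a dynamic PEP that runs after a data object is put into the grid and simply
# records the event. Nothing here is RADII's actual code.
def pep_api_data_obj_put_post(rule_args, callback, rei):
    # A real policy could extend this into checks such as
    # "data from Study W cannot remain on cloud resources".
    callback.writeLine("serverLog", "RADII-style policy hook: data object ingested")
```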

Resource Awareness
The iRODS resource monitoring system (RMS) provides node-specific resource utilization
End-to-end parameters such as throughput and current network flow are important for judicious placement, replication, and retrieval decisions
RADII adds end-to-end monitoring of throughput, latency, and instantaneous RX/TX transfer rates per second
The best server is selected based on an end-to-end utility value (the utility formula is not reproduced in this transcript)
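Since the actual utility formula is not reproduced in the transcript, the sketch below only illustrates the general idea: score each candidate server from its measured end-to-end throughput and latency and pick the highest-utility one. The weighting and the sample measurements are arbitrary assumptions, not RADII's formula or data.

```python
# Hedged sketch: pick the "best" server from end-to-end measurements.
# The utility weighting and sample numbers are arbitrary stand-ins,
# NOT the RADII formula or measured values.
from dataclasses import dataclass

@dataclass
class ServerStats:
    name: str
    throughput_mbps: float   # measured end-to-end throughput
    latency_ms: float        # measured end-to-end latency

def utility(s: ServerStats, w_tput: float = 1.0, w_lat: float = 5.0) -> float:
    """Higher is better: reward throughput, penalize latency (assumed weights)."""
    return w_tput * s.throughput_mbps - w_lat * s.latency_ms

def best_server(candidates: list[ServerStats]) -> ServerStats:
    return max(candidates, key=utility)

if __name__ == "__main__":
    sites = [   # site names from the experiment; numbers are examples only
        ServerStats("UCD", 850.0, 62.0),
        ServerStats("SL",  920.0, 18.0),
        ServerStats("UH",  430.0, 35.0),
        ServerStats("FIU", 610.0, 27.0),
    ]
    print("Replicate to:", best_server(sites).name)
```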

Experiment Topology
Figure: experimental setup topology

Experimental Setup
Sites: UCD, SL, UH, FIU
Parallel and multithreaded file ingestion from each of the clients
Total of 400 GB ingested from each client
One copy stored at the edge node and another replica placed based on the utility value

Edge Put and Remote Replication Time
Figure: edge node put time
Figure: remote replication time

ScienceShakedown

Motivation Hurricane Sandy (2012)

Motivation
Real-time, on-demand computation of storm surge impacts
Hazards to coastal areas are a major concern; hazard/threat information is needed urgently
Critical need for detailed, high-spatial-resolution results, which require large compute resources
Federal forecast cycle runs every 6 hours; results must arrive well within the cycle to be relevant/useful – new information at 5:59 is already old!

Computing Storm Surge
ADCIRC storm surge model
–FEMA-approved for coastal flood insurance studies
–Very high spatial resolution (millions of triangles)
–A single real-time simulation typically requires a large number of cores
Figure: ADCIRC grid for coastal North Carolina

Tackling Uncertainty
One simulation is NOT enough – probabilistic assessment of hurricanes requires a "few" likely hurricanes
Research ensemble from the NSF Hazards SEES project: 22 members, Hurricane Floyd (1999), fully dynamic atmosphere (WRF)

Why GENI?
Current limitations: real-time demands for compute resources
–Large demand for real-time compute resources during storms
–Not enough demand to dedicate a cluster year-round
GENI enables
–Federation of resources
–Cloud bursting: urgent, on-demand computing
–High-speed data transfers to/from/between remote resources
–Replication of data/compute across geographic areas for resiliency and performance

Storm Surge Workflow
Each ensemble member is a high-performance parallel task (32-core MPI) that calculates one storm
Figure: one ensemble member mapped onto 32 compute cores
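To make the structure concrete, here is a hedged sketch of how such an ensemble could be dispatched: each member runs as an independent 32-core MPI job. The executable name, directory layout, and the use of plain mpirun are illustrative assumptions; the actual ScienceShakedown workflow dispatches members across GENI sites rather than in a local loop.

```python
# Hedged sketch: run an ensemble of storm-surge simulations, one MPI job per
# member, 32 cores each. "padcirc" and the directory layout are placeholders.
import subprocess
from pathlib import Path

CORES_PER_MEMBER = 32
ENSEMBLE = [f"member_{i:02d}" for i in range(1, 23)]   # e.g. a 22-member ensemble

def run_member(member: str) -> int:
    """Launch one ensemble member as a 32-core MPI task and wait for it."""
    workdir = Path("ensemble") / member                 # each member has its own inputs
    cmd = ["mpirun", "-np", str(CORES_PER_MEMBER), "./padcirc"]
    print(f"launching {member} on {CORES_PER_MEMBER} cores")
    return subprocess.run(cmd, cwd=workdir).returncode

if __name__ == "__main__":
    # Sequential loop for clarity; in practice members run concurrently
    # on compute nodes spread across the GENI slice.
    for member in ENSEMBLE:
        run_member(member)
```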

Slice Topology
11 GENI sites (1 ensemble manager, 10 compute sites)
Topology: 92 VMs (368 cores), 10 inter-domain VLANs, 1 TB iSCSI storage
HPC compute nodes: 80 compute nodes (320 cores) across the 10 compute sites

ADCIRC Results from GENI
Figure: storm surge for 6 simulations (members N11, N17, N01, N14, N16, N20), ranging from small threat to big threat

Conclusions
The GENI testbed is a kind of shared infrastructure suitable for prototyping solutions in some computational science domains
GENI technologies are a collection of enabling mechanisms that can provide a foundation for future federated science cyberinfrastructure
Different members of the GENI federation offer different capabilities to their users, suitable for a variety of problems

Thank you! Funders and partners