
1 Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

2 Networked Clouds (figure labels: cloud and network providers; observatory; wind tunnel; science workflows)

3 ExoGENI Testbed

4 Computational/Data Science Projects on ExoGENI
ADAMANT – Building tools for enabling workflow-based scientific applications on dynamic infrastructure (RENCI, Duke, USC/ISI)
RADII – Building tools for supporting collaborative data-driven science (RENCI)
GENI ScienceShakedown – ADCIRC storm surge modeling on GENI
The goal of this presentation is to demonstrate some of the things that are possible with GENI today.

5 ADAMANT

6 Scientific Workflows – Dynamic Use Case

7 CC-NIE ADAMANT – Pegasus/ExoGENI
Network Infrastructure-as-a-Service (NIaaS) for workflow-driven applications
— Tools for workflows integrated with adaptive infrastructure
Workflows triggering adaptive infrastructure (see the sketch below)
— Pegasus workflows using ExoGENI
— Adapt to application demands (compute, network, storage)
— Integrate data movement into NIaaS (on-ramps)
Target applications
— Montage Galactic plane ensemble: astronomy mosaics
— Genomics: high-throughput sequencing
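To make "workflows triggering adaptive infrastructure" concrete, here is a minimal, hypothetical Python sketch (not the ADAMANT code or the Pegasus API): a workflow-side loop estimates how many worker VMs the currently runnable tasks need and calls a placeholder adapt_slice() function, standing in for a NIaaS control call such as an ExoGENI slice modification, to grow or shrink the slice. Task, desired_workers, and adapt_slice are illustrative names introduced here.

from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    cores: int
    done: bool = False
    deps: list = field(default_factory=list)

def ready_tasks(tasks):
    # Tasks whose dependencies have all completed and that have not yet run.
    return [t for t in tasks if not t.done and all(d.done for d in t.deps)]

def desired_workers(tasks, cores_per_vm=4):
    # Worker VMs needed for the next wave of runnable tasks (ceiling division).
    need = sum(t.cores for t in ready_tasks(tasks))
    return -(-need // cores_per_vm)

def adapt_slice(current_vms, wanted_vms):
    # Placeholder for a NIaaS control call (e.g. an ExoGENI slice modify request);
    # here we only print the scaling decision.
    if wanted_vms > current_vms:
        print(f"grow slice by {wanted_vms - current_vms} VMs")
    elif wanted_vms < current_vms:
        print(f"shrink slice by {current_vms - wanted_vms} VMs")
    return wanted_vms

if __name__ == "__main__":
    stage_in = Task("stage-in", cores=1)
    mosaics = [Task(f"mProject-{i}", cores=2, deps=[stage_in]) for i in range(8)]
    vms = adapt_slice(1, desired_workers([stage_in] + mosaics))   # only stage-in is runnable: 1 VM suffices
    stage_in.done = True
    vms = adapt_slice(vms, desired_workers([stage_in] + mosaics)) # 8 tasks x 2 cores now runnable: grow to 4 VMs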

8 ExoGENI: Enabling Features for Workflows
On-ramps / stitchports
— Connect ExoGENI to existing static infrastructure to import/export data
Storage slivering (a rough sketch follows)
— Networked storage: iSCSI target on the dataplane
— Neuca tools attach the LUN, format it, and mount the filesystem
Inter-domain links, multipoint broadcast networks
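As a rough illustration of the storage-slivering step inside a VM, the Python sketch below runs the standard open-iscsi and filesystem commands that the Neuca tools automate: discover and log into the iSCSI target on the dataplane, then format and mount the LUN. The portal address, IQN, device name, and mount point are placeholder assumptions, and the commands require root.

import subprocess

PORTAL = "10.100.0.2:3260"                    # assumed dataplane address of the iSCSI target
IQN = "iqn.2014-01.net.exogeni:storage.lun0"  # assumed target IQN
DEVICE = "/dev/sdb"                           # assumed device node the LUN appears as
MOUNT = "/mnt/slice-storage"                  # assumed mount point

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def attach_format_mount():
    run(["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", PORTAL])  # find targets on the portal
    run(["iscsiadm", "-m", "node", "-T", IQN, "-p", PORTAL, "--login"])      # attach the LUN
    run(["mkfs.xfs", "-f", DEVICE])                                          # format it
    run(["mkdir", "-p", MOUNT])
    run(["mount", DEVICE, MOUNT])                                            # mount the filesystem

if __name__ == "__main__":
    attach_format_mount()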

9 Computational Workflows in Genomics
Several versions as we scaled: single machine; cluster-based; MapSeq (specialized code & Condor); Pegasus & Condor
Workflows: RNA-Seq and WGS (whole-genome sequencing)

10 Goal: learning to use NIaaS for biomedical research (figure: user- or workflow-provisioned, isolated slices of VMs, Slice 1 and Slice 2, spanning cloud providers for compute and data plus network providers)

11 Goal: Management of data flows in NIaaS
Figure: iRODS data grid spanning RENCI and UNC (iCAT and rule engine), with VMs in a slice linked by a Layer 2 connection within the slice
Metadata control, for example: Lab X can compute on Project Y data in the cloud; User X can move data from Study A to the cloud; data from Study W cannot remain on cloud resources
Benefits: ease of access, control over access, auditing, provenance

12 Example ExoGENI requests (auto-generated)

13 Application to NIaaS - Architecture

14 RADII

15 RADII
RADII: Resource Aware Data-centric Collaboration Infrastructure
–Middleware to facilitate data-driven collaborations for domain researchers, delivered as a commodity to the science community
–Reduces the large gap between procuring the required infrastructure and managing data transfers efficiently
Integration of data-grid (iRODS) and NIaaS (ORCA) technologies on the ExoGENI infrastructure
–Novel tools to map data processes, computations, storage, and organizational entities onto infrastructure through an intuitive GUI-based application
–Novel data-centric resource management mechanisms for provisioning and de-provisioning resources dynamically throughout the lifecycle of collaborations

16 Why iRODS in RADII?
–RADII policies map to the iRODS Rule Language: easy to express policies in iRODS, dynamic PEPs (policy enforcement points), reduced complexity for RADII
–Distributed and elastic data grid
–Resource monitoring framework
–Geo-aware resource hierarchy creation via composable iRODS resources
–Metadata tagging (a small sketch follows)
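For the metadata-tagging point, a small python-irodsclient sketch is shown below: it attaches AVU metadata that rules or dynamic PEPs could key on. The host, zone, credentials, paths, and attribute names are illustrative assumptions, not the RADII implementation.

from irods.session import iRODSSession

HOST, PORT, ZONE = "icat.example.org", 1247, "radiiZone"   # placeholder grid settings
USER, PASSWORD = "labx", "secret"                          # placeholder credentials

def tag_for_policy(path, project, study, cloud_ok):
    with iRODSSession(host=HOST, port=PORT, user=USER,
                      password=PASSWORD, zone=ZONE) as session:
        obj = session.data_objects.get(path)
        # AVU pairs that a rule or dynamic PEP can inspect when deciding
        # placement, replication, or access (attribute names are illustrative).
        obj.metadata.add("project", project)
        obj.metadata.add("study", study)
        obj.metadata.add("cloud_resident_allowed", "yes" if cloud_ok else "no")

if __name__ == "__main__":
    tag_for_policy(f"/{ZONE}/home/{USER}/sample.dat",
                   project="ProjectY", study="StudyA", cloud_ok=True)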

17 Resource Awareness
The iRODS RMS provides node-specific resource utilization
End-to-end parameters such as throughput and current network flow are important for judicious placement, replication, and retrieval decisions
Created end-to-end monitoring of throughput, latency, and instantaneous RX/TX transfer rates per second
The best server is selected based on an end-to-end utility value (an illustrative sketch follows)
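The slide's utility formula is not captured in this transcript, so the sketch below is only an illustrative stand-in: score each candidate server with a weighted mix of measured end-to-end throughput and latency, then pick the maximum. The weights and the sample measurements are made up.

def utility(throughput_mbps, latency_ms, w_tp=0.7, w_lat=0.3):
    # Higher is better: reward throughput, penalize latency (illustrative weights).
    return w_tp * throughput_mbps - w_lat * latency_ms

def best_server(measurements):
    # measurements: {server_name: (throughput_mbps, latency_ms)}
    return max(measurements, key=lambda name: utility(*measurements[name]))

if __name__ == "__main__":
    probes = {"UCD": (850.0, 62.0), "SL": (920.0, 18.0),
              "UH": (430.0, 35.0), "FIU": (610.0, 48.0)}   # made-up sample numbers
    print(best_server(probes))   # "SL" wins with these example weights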

18 Experiment Topology Figure: Experimental Setup Topology

19 Experimental Setup
The sites were UCD, SL, UH, and FIU
Parallel and multithreaded file ingestion from each of the clients (a sketch follows)
A total of 400 GB ingested from each client
One copy kept at the edge node and another replica placed based on the utility value
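A sketch of what parallel, multithreaded ingestion from a client could look like with python-irodsclient is shown below; the source directory, collection path, grid settings, and thread count are placeholder assumptions rather than the actual experiment harness. One session is opened per upload to keep the sketch simple and avoid sharing a session across threads.

import glob
import os
from concurrent.futures import ThreadPoolExecutor
from irods.session import iRODSSession

LOCAL_DIR = "/data/ingest"                  # placeholder source directory on the client
COLLECTION = "/radiiZone/home/labx/ingest"  # placeholder target iRODS collection
THREADS = 8                                 # placeholder degree of parallelism

def put_file(local_path):
    target = f"{COLLECTION}/{os.path.basename(local_path)}"
    # A fresh session per upload keeps the sketch simple (no session shared across threads).
    with iRODSSession(host="icat.example.org", port=1247, user="labx",
                      password="secret", zone="radiiZone") as session:
        session.data_objects.put(local_path, target)
    return target

def ingest_all():
    files = glob.glob(os.path.join(LOCAL_DIR, "*"))
    with ThreadPoolExecutor(max_workers=THREADS) as pool:
        for target in pool.map(put_file, files):
            print("ingested", target)

if __name__ == "__main__":
    ingest_all()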

20 Edge Put and Remote Replication Time
Figure: edge node put time
Figure: remote replication time

21 ScienceShakedown

22 Motivation Hurricane Sandy (2012)

23 Motivation
Real-time, on-demand computation of storm surge impacts
Hazards to coastal areas are a major concern, and hazard/threat information is needed urgently
Critical need: detailed results → high spatial resolution → large compute resources
The federal forecast cycle runs every 6 hours; results must arrive well within the cycle to be relevant and useful, i.e., new information at 5:59 is already old!

24 Computing Storm Surge
ADCIRC storm surge model
–FEMA-approved for coastal flood insurance studies
–Very high spatial resolution (millions of triangles)
–Typically uses 256–1024 cores for a real-time run (one simulation!)
Figure: ADCIRC grid for coastal North Carolina

25 Tackling Uncertainty
One simulation is NOT enough: probabilistic assessment of hurricanes requires simulating a “few” likely hurricanes with a fully dynamic atmosphere (WRF)
Research ensemble from the NSF Hazards SEES project: 22 members, Hurricane Floyd (1999)

26 Why GENI?
Current limitations: real-time demand for compute resources
–Large demand for real-time compute resources during storms
–Not enough demand to dedicate a cluster year-round

27 Why GENI?
Current limitations: real-time demand for compute resources
–Large demand for real-time compute resources during storms
–Not enough demand to dedicate a cluster year-round
GENI enables
–Federation of resources
–Cloud bursting: urgent, on-demand computing
–High-speed data transfers to/from/between remote resources
–Replication of data/compute across geographic areas for resiliency and performance

28 Storm Surge Workflow
Each ensemble member is a high-performance parallel task (32-core MPI) that calculates one storm (figure: 32 compute cores per member; a launcher sketch follows)
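As a rough illustration of how ensemble members map to parallel tasks, the Python sketch below launches each member as a 32-core MPI run. The executable name ("padcirc", ADCIRC's parallel binary), the per-member run directories, and the sequential loop are assumptions for illustration; in the actual setup the ensemble manager distributes members across GENI sites and runs them concurrently.

import subprocess

MEMBERS = ["N01", "N11", "N14", "N16", "N17", "N20"]   # member IDs taken from the results slide
CORES_PER_TASK = 32

def run_member(member):
    workdir = f"/scratch/ensemble/{member}"            # assumed per-member run directory
    cmd = ["mpirun", "-np", str(CORES_PER_TASK), "./padcirc"]
    print("launching", member, ":", " ".join(cmd))
    subprocess.run(cmd, cwd=workdir, check=True)

if __name__ == "__main__":
    for member in MEMBERS:   # sequential here; in practice members run in parallel on different sites
        run_member(member)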

29 Slice Topology
11 GENI sites (1 ensemble manager, 10 compute sites)
Topology: 92 VMs (368 cores), 10 inter-domain VLANs, 1 TB of iSCSI storage
HPC compute nodes: 80 (320 cores) from 10 sites

30 ADCIRC Results from GENI
Figure: storm surge for 6 simulations (ensemble members N11, N17, N01, N14, N16, N20), ranging from small threat to big threat

31 Conclusions
The GENI testbed represents a kind of shared infrastructure suitable for prototyping solutions for some computational science domains
GENI technologies represent a collection of enabling mechanisms that can provide a foundation for the future federated science cyberinfrastructure
Different members of GENI federations offer different capabilities to their users, suitable for a variety of problems

32 Thank you! Funders and partners

