Haiyan Meng and Douglas Thain


1 Umbrella: A Portable Environment Creator for Reproducible Computing on Clusters, Clouds, and Grids
Haiyan Meng and Douglas Thain, University of Notre Dame, Notre Dame, Indiana, USA. June 2015

2 Reproducible Computing
Your application works perfectly well today on your machine. Will your application still work next month? Will it still work next year? Will it still work 10 years later? Will it still work today on another machine? 11/17/2018

3 Run an Application on a New Machine
Copy the task and data to M2, then try to run the task on M2.

4 Possible Failure Reasons on M2
Incompatible hardware; mismatched kernel; different operating system; missing software dependencies; wrong software version; incorrect environment variables. In each case, the execution environment is incompatible.

5 Execution Environment Configuration
[Figure labels: Three hours; Machine 3, Machine 4, Machine 5, ..., Machine 1000] An application should be portable.

6 VM? Disk Cloning?
A VM or disk cloning is expensive if the local machine is only missing some input data, or if the problem lies only in environment variables. The overhead of constructing the execution environment should be as low as possible.

7 Everything is changing!
The underlying execution environment changes rapidly: software may have a new version; the OS may upgrade; the kernel may upgrade to a new version; the hardware architecture may become obsolete. What will happen 5 years later? 10 years later? How can you guarantee that an application which works today will still work normally in the future? Migration? Simulation? Hardware preservation? An application should be reproducible.

8 Portable and Reproducible
How can an application be both portable and reproducible? Specify execution environments for applications: allow users to specify the execution environment for their applications, from hardware all the way up to software and data. Materialize execution environments at runtime automatically: determine the matching degree between the specification and the execution node, and choose the minimal available mechanism to run the application. This is what Umbrella does.
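The idea of comparing a specification with an execution node and picking the lightest mechanism can be sketched as follows. This is a minimal illustration only: the layer names come from the slides, but the mapping from mismatched layer to mechanism, the function names, and the sample values (including the kernel version string) are assumptions made for this example, not Umbrella's actual logic.

```python
# Hypothetical sketch: layer names follow the slides; the mapping from
# mismatch to mechanism is an illustrative assumption, not Umbrella's logic.
LAYERS = ["hardware", "kernel", "os", "software"]  # ordered bottom to top

# Assumed mapping from the lowest mismatched layer to a mechanism.
MECHANISM = {
    None: "run directly",
    "software": "mount missing software",
    "os": "sandbox (Parrot, chroot, or Docker)",
    "kernel": "virtual machine",
    "hardware": "virtual machine",
}


def lowest_mismatch(spec, node):
    """Return the lowest layer where the node differs from the spec, or None."""
    for layer in LAYERS:
        if spec.get(layer) != node.get(layer):
            return layer
    return None


def choose_mechanism(spec, node):
    """Pick the assumed minimal mechanism for running spec on node."""
    return MECHANISM[lowest_mismatch(spec, node)]


# Sample values are made up for illustration.
spec = {"hardware": "x86_64", "kernel": "2.6.32", "os": "RHEL 6.5", "software": "CMSSW"}
node = {"hardware": "x86_64", "kernel": "2.6.32", "os": "RHEL 7.0", "software": None}
print(choose_mechanism(spec, node))
```

The lower the first mismatch sits in the stack, the heavier the mechanism needed, which is why the layers are checked from the bottom up.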

9 Umbrella Specification
Six sections: hardware, kernel, OS, software, data, environment variables.
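To make the six sections concrete, here is a minimal sketch of a specification held as a Python dictionary, with a check that every section is present. The key names, the nesting, and the `EXAMPLE_VAR` entry are hypothetical; the slides name the sections but do not show the actual file format. The sample values (X86_64, RHEL 6.5, 1 CPU, 1GB, 4GB, CMSSW) are taken from the requirements listed later in the deck.

```python
# Hypothetical layout: only the six section names come from the slides.
REQUIRED_SECTIONS = ("hardware", "kernel", "os", "software", "data", "environ")

example_spec = {
    "hardware": {"arch": "x86_64", "cores": 1, "memory": "1GB", "disk": "4GB"},
    "kernel": {"name": "linux"},
    "os": {"name": "RHEL", "version": "6.5"},
    "software": ["CMSSW"],
    "data": ["input LHE file"],
    "environ": {"EXAMPLE_VAR": "value"},  # placeholder variable
}


def missing_sections(spec):
    """Return the required sections that are absent from spec."""
    return [name for name in REQUIRED_SECTIONS if name not in spec]


print(missing_sections(example_spec))
print(missing_sections({"hardware": {}, "os": {}}))
```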

10 Workflow of Umbrella

11 Evaluation of Matching Degrees

12 Architecture of Umbrella

13 Remote Archive and Metadata Database
Metadata fields: description, location, fixity. Access protocols: http, https, cvmfs. Software preservation format: pre-built and configured (for example, the ROOT software).
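A fixity field is typically a checksum used to confirm that a retrieved item matches what was archived. The sketch below shows one way such a metadata entry could be built and verified. SHA-256, the field layout, and the example URL are assumptions; the slides do not state which checksum or schema Umbrella uses.

```python
import hashlib


def make_entry(description, locations, content):
    """Build a hypothetical metadata entry with the three fields from the slide."""
    return {
        "description": description,
        "location": list(locations),
        "fixity": hashlib.sha256(content).hexdigest(),  # assumed checksum algorithm
    }


def fixity_ok(content, entry):
    """Return True if content matches the checksum recorded in entry."""
    return hashlib.sha256(content).hexdigest() == entry["fixity"]


# Placeholder content and URL, for illustration only.
entry = make_entry("example archive", ["https://example.org/item.tar.gz"], b"abc")
print(fixity_ok(b"abc", entry))
print(fixity_ok(b"tampered", entry))
```

Checking fixity after download lets a cached copy be trusted without comparing it against the remote archive again.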

14 Local Cache - Mounting Mechanism

15 Sandbox Techniques
Application-level virtualization: trap the system calls of an application and replace the file access path with the desired path. Examples: Parrot, PTU, CDE. OS-level virtualization: chroot, LXC (Linux Containers), Docker.
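The path replacement at the heart of application-level virtualization can be illustrated with a small mount table. This sketch only rewrites path strings; it does not trap system calls, which is what tools such as Parrot actually do. The function name, the mount table, and the cache directories are hypothetical.

```python
import posixpath


def redirect(path, mounts):
    """Rewrite path using the longest mount point that is a prefix of it."""
    path = posixpath.normpath(path)
    for mount_point in sorted(mounts, key=len, reverse=True):
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            suffix = path[len(mount_point):].lstrip("/")
            target = mounts[mount_point]
            return posixpath.join(target, suffix) if suffix else target
    return path  # no mount applies: leave the path unchanged


# Hypothetical mount table: the application's view -> location in a local cache.
mounts = {"/": "/cache/rhel-6.5", "/cvmfs/cms": "/cache/cmssw"}
print(redirect("/cvmfs/cms/bin/run", mounts))
print(redirect("/etc/passwd", mounts))
```

Matching the longest mount point first lets a software tree be overlaid on top of an OS image without copying either one.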

16 Grid Integration
The Umbrella specification is translated into a Condor submit file. The job is submitted with condor_submit, and condor_wait is used to wait for the results.
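The translation from a specification to a Condor submit file might look like the sketch below. The submit commands used (`universe`, `executable`, `transfer_input_files`, `request_cpus`, `request_memory`, `requirements`, `log`, `queue`, and so on) are standard HTCondor submit description commands, but the spec layout, the wrapper script name, and the file names are assumptions for illustration; the slides do not show the file Umbrella actually generates.

```python
def condor_submit_text(spec, executable, input_files):
    """Build the text of an HTCondor submit description from a spec dict."""
    hw = spec["hardware"]
    requirements = 'Arch == "{}" && OpSys == "LINUX"'.format(hw["arch"])
    lines = [
        "universe = vanilla",
        "executable = {}".format(executable),
        "transfer_input_files = {}".format(", ".join(input_files)),
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        "request_cpus = {}".format(hw["cores"]),
        "request_memory = {}".format(hw["memory_mb"]),  # HTCondor default unit: MB
        "requirements = {}".format(requirements),
        "log = umbrella.log",
        "output = umbrella.out",
        "error = umbrella.err",
        "queue",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical spec layout, wrapper script, and input file names.
spec = {"hardware": {"arch": "X86_64", "cores": 1, "memory_mb": 1024}}
text = condor_submit_text(spec, "umbrella_wrapper.sh", ["spec.json", "input.lhe"])
print(text)
```

Writing this text to a file, passing it to `condor_submit`, and then running `condor_wait umbrella.log` would block until the job recorded in that log finishes.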

17 Cloud Integration
Umbrella on the local machine needs to communicate with the execution node directly, using scp and ssh.

18 EC2 Resource Database
Derive the Amazon EC2 AMI and instance type from the Umbrella specification.
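One plausible way to derive an AMI and instance type is a lookup keyed on the OS and architecture, followed by a search for the smallest instance type that meets the hardware request. In this sketch the resource table, the instance type names, the AMI identifier, and the spec layout are all placeholders; none of them come from the slides or from Amazon's catalogue.

```python
# Placeholder resource database: (name, cores, memory in GB), smallest first.
INSTANCE_TYPES = [
    ("example.small", 1, 2),
    ("example.medium", 2, 4),
    ("example.large", 4, 16),
]

# Placeholder mapping from (OS name, OS version, architecture) to an AMI id.
AMIS = {("RHEL", "6.5", "x86_64"): "ami-00000000"}


def select_ec2(spec):
    """Return (ami, instance_type) satisfying spec, or raise LookupError."""
    hw = spec["hardware"]
    key = (spec["os"]["name"], spec["os"]["version"], hw["arch"])
    if key not in AMIS:
        raise LookupError("no AMI for {}".format(key))
    for name, cores, memory_gb in INSTANCE_TYPES:
        if cores >= hw["cores"] and memory_gb >= hw["memory_gb"]:
            return AMIS[key], name
    raise LookupError("no instance type is large enough")


spec = {
    "hardware": {"arch": "x86_64", "cores": 1, "memory_gb": 1},
    "os": {"name": "RHEL", "version": "6.5"},
}
print(select_ec2(spec))
```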

19 CMS Application
Input: an LHE file, 18 MB. Output: a ROOT file, 64 MB. Time: ~5 minutes.

20 Time Overhead – Local Execution Engine
Sandbox Technique | Parrot | chroot | Docker
Application Execution | 5m 34s | 4m 33s | 4m 35s
Total Time | 7m 45s | 6m 44s | 8m 13s
Access Authority | any user | only root | docker group users
Single-value rows: Matching Evaluation < 1s; Software Preparation 2m 11s; Sandbox Construction 1m 24s; Post Processing 3s.
Execution node: CPUs: 4; Mem: 2GB; Free Disk: 12GB; Network: 1Gb/s; Arch: X86_64; Kernel: [not shown]; OS: RHEL 7.0; Software: no CMSSW.
[Figure labels: Software, OS, Kernel, Hardware]

21 Space Overhead – Local Execution Engine
Type | Description | Size
Input | Specification | < 1 KB
 | CMS script |
OS | RHEL 6.5 | 1.8 GB
Software | CMSSW | 327 MB
 | Parrot | 28 MB
Data | CMS event | 18 MB
Output | ROOT file | 64 MB
 | Analysis log | 2.1 MB

22 Time Overhead – Cloud and Grid
Subtask, Cloud (EC2) | Time
Start an EC2 Instance | 6s
Send Task to VM | 2s
Remote Execution | 6m 40s
Post Processing | 4s
Subtask, Grid (Condor) | Time
Submit File Construction | < 1s
Condor Job Submission |
Remote Execution | 6m 20s
Post Processing |
Condor: submit the Condor job and wait for the results.
EC2: babysit each step: find an AMI and instance type; start an EC2 instance; send the task to the instance; start the remote umbrella command; wait for the results and pull them back; terminate the instance.

23 Umbrella at Scale – ND Condor Pool
Machine number: 4157
Hardware architecture: X86_64, i386, i686
Kernel version: 25 kernels ( – )
OS: Linux, Mac
Linux distribution: RHEL, Debian, CentOS
RHEL versions: 5.5, 5.9, 5.10, 5.11, 6.4, 6.5, 6.6, 7.0
CPU number: 1, 2, 4, 8, 12, 16, 24, 32, 64
Memory size: max 1TB, min 984 MB
Disk size: max 1.7TB, min 5GB
Docker support: 50 out of 4157
CVMFS support: 2 out of 4157
Requirements: hardware architecture X86_64; kernel version >= [not shown]; OS Linux; distribution RHEL; RHEL version 6.5; CPU number 1; memory 1GB; disk 4GB; CVMFS needed.
Machines used: 165 for Parrot, 25 for Docker.

24 Umbrella at Scale – ND Condor Pool
1000 different instances of CMS applications, run with Parrot and with Docker.
Type | Total Time | Fastest | Slowest | Average
Parrot | 7158m | 4m 12s | 11m 53s | 7m 09s
Docker | 8589m | 4m 24s | 13m 58s | 8m 35s
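The Average column is consistent with the Total Time column divided by the 1000 runs. The short check below assumes that Total Time is the sum of the individual run times, which the slide does not state explicitly; the function name is illustrative.

```python
def average_run(total_minutes, runs):
    """Return the mean run time as (minutes, seconds), rounded to the second."""
    seconds = round(total_minutes * 60 / runs)
    return divmod(seconds, 60)


print(average_run(7158, 1000))  # Parrot
print(average_run(8589, 1000))  # Docker
```

On these figures, the Docker runs average roughly a minute and a half longer per instance than the Parrot runs.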

25 Umbrella at Scale – ND Condor Pool
Machine number: 4157
Hardware architecture: X86_64, i386, i686
Kernel version: 25 kernels ( – )
Linux distribution: RHEL, Debian, CentOS
Docker support: 50 out of 4157
CVMFS support: 2 out of 4157
Requirements: hardware architecture X86_64; kernel version >= [not shown]; distribution RHEL; CVMFS needed.
Without Umbrella: only 2 machines can be used to run the CMS app.
With Umbrella: Parrot: ~1000 machines can be used, 165 machines actually used. Docker: 50 machines can be used, 25 machines actually used.

26 Summary: Umbrella Makes Applications Portable and Reproducible
Specify the execution environment clearly: hardware, kernel, OS, software, data, environment variables. Materialize the execution environment at runtime automatically: no need to configure the environment manually; evaluate the match and choose the minimal mechanism. Loosely coupled with sandbox techniques: Parrot, chroot, VM, Docker. Construct the sandbox through mounting mechanisms without copying: multiple namespaces can be constructed concurrently. Utilize more computing resources: local machine, grid, cloud.

27 DASPOS (Data and Software Preservation for Open Science): https://daspos.crc.nd.edu
Cooperative Computing Lab. Our lab's GitHub. Name: Haiyan Meng. Questions?

28 Metadata DB – with id

29 EC2 Resource DB – with id

