ATLAS Cloud Operations

1 ATLAS Cloud Operations
Frank Berghaus (University of Victoria) On behalf of the ATLAS Cloud Computing Group

2 Overview
- Review of the Cloud Scheduler System
- Worker Node Virtual Machines
- Squid Discovery with Shoal
- Recommendations and procedure for adding clouds
Dec 4, 2014 ATLAS Facilities Jamboree

3 Cloud Job Flow (on the Grid)
[Diagram labels: User, User Job, Panda Queue, Pilot Factory, Pilot Job, HTCondor, Cloud Scheduler, Boot Instances, Clouds, Virtual Machine]
Easy to connect and use many clouds

4 CernVM 3 Worker Nodes
Features:
- Operating system and project software are made available over CVMFS
- Cloud-init contextualizes images on boot
- Same image works anywhere, on any hypervisor, and on any cloud type
- Dynamic Condor slot configuration

5 Shoal: Dynamic Squid Discovery
- Ready for larger-scale deployment: installation instructions
- Current server:
- Connected squids: UVic, TRIUMF, Oxford, CERN Cloud
- Included with CernVM since release 3.2
- Meets the requirements of the squid discovery task force

6 Recommendations for New Clouds
Reference from the ATLAS Cloud Ops wiki: AtlasCloudSiteGuide
- Use OpenStack (working on StratusLab as an option)
- Use CernVM
- Install the Shoal agent on your local squid
- Test booting and connecting to an instance
- Test your network connectivity to the CERN cloud scheduler: aiatlas009.cern.ch
- Expose your cloud interface to allow requests from the cloud scheduler
- Create a service account for the ATLAS cloud scheduler
- Contact the Cloud Ops Team:
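The connectivity test recommended above can be scripted. What follows is a minimal sketch, assuming that opening a plain TCP connection is a meaningful check for your site. The helper name `can_connect` is hypothetical, and the slides do not state which port the cloud scheduler listens on, so the port is left as a required argument to be confirmed with the cloud team.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within timeout.

    Hypothetical helper: the port to probe is not given in the slides and
    must be supplied by the caller.
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Calling `can_connect("aiatlas009.cern.ch", port)` from a machine inside your cloud, with the agreed port, returns `True` only if the TCP connection can be opened. A `False` result does not distinguish between a firewall block, a DNS failure, and a closed port.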

7 Procedure for Adding a New Site
Once the cloud team has the credentials and endpoint for the new cloud, we will:
- Create a PanDA queue or site to associate with your cloud
- Add single-core, multi-core, analysis, and/or high-memory queues
- Run a set of HammerCloud tests
- Start running jobs

8 Summary & Outlook
- ATLAS production and analysis are running on IaaS clouds
- Analysis usage is limited because of limited manpower
- Ready to add more cloud sites

9 Backup

10 Infrastructure-as-a-Service (IaaS) Clouds
IaaS Cloud: A pool of virtual machine hypervisors presenting a single controller interface
Run many instances of one virtual machine configured for ATLAS computing
Advantages:
- Isolate complex application software from site administration
- Minimize dependence on local system
- Flexible resource allocation
Examples:
- OpenStack
- Nimbus
- Commercial clouds: Amazon, Google, etc.
Running at labs (e.g., CERN), universities (e.g., Victoria), and research networks (e.g., GridPP)

11 Clouds boot Virtual Machines
[Diagram labels: User, Cloud Scheduler, Cloud Interface, Google, OpenStack, Virtual Machine, Condor Central Manager, Scheduler status communication]
- Clouds boot virtual machines
- Each VM contextualizes to attach to Condor and processes jobs
- Cloud Scheduler retires a VM when no jobs require that VM
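The boot and retire behaviour described on this slide can be illustrated with a toy model. This is a sketch under stated assumptions, not the Cloud Scheduler implementation: it assumes jobs and VMs are matched by a simple type label, that one VM serves one queued job, and the function name `plan_actions` and the labels used are hypothetical.

```python
from collections import Counter


def plan_actions(queued_job_types, running_vm_types):
    """Return (boot, retire) as dicts mapping VM type to a count.

    Toy model only: one VM per queued job, matched by a type label.
    """
    demand = Counter(queued_job_types)   # VMs the queued jobs would need
    supply = Counter(running_vm_types)   # VMs already running
    # Counter subtraction keeps only positive counts, i.e. unmet demand.
    boot = demand - supply
    # Retire every VM of a type that no queued job requires.
    retire = {vm_type: n for vm_type, n in supply.items() if demand[vm_type] == 0}
    return dict(boot), retire
```

For example, with two single-core jobs and one multi-core job queued, and one single-core VM plus one high-memory VM running, the sketch asks to boot one more single-core VM and one multi-core VM, and to retire the high-memory VM.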

12 ATLAS Cloud Production in 2014
- Over 1.2M ATLAS jobs completed
- Mostly single-core production
- New ATLAS requirements: multi-core, high memory
[Plot: Completed Jobs in 2014 (Jan 2014 to Sep 2014), total 1.2M; labels: CERN, North America, Australia, UK, Helix Nebula, Google, Amazon]

