Enabling Scientific Workflows on FermiCloud using OpenNebula
Steven Timm, Grid & Cloud Services Department, Fermilab
Work supported by the U.S. Department of Energy under contract No. DE-AC02-07CH11359

Outline
Introduction: Fermilab and Scientific Computing
FermiCloud Project and Drivers
Applying Grid Lessons to Cloud
FermiCloud Project Current and Future
Interoperability
Reframing the Cloud Discussion

Fermilab and Scientific Computing
Fermi National Accelerator Laboratory: lead United States particle physics laboratory, with ~60 PB of data on tape.
High Throughput Computing is characterized by:
-"Pleasingly parallel" tasks,
-A high CPU instruction / bytes-of-I/O ratio,
-But still lots of I/O. See Pfister, "In Search of Clusters".

Grid and Cloud Services Dept.
Operations: Grid authorization, Grid accounting, computing elements, batch submission.
-All require high availability,
-All require multiple integration systems to test, which also requires virtualization and login as root.
Solutions: development of authorization, accounting, and batch submission software; packaging and integration.
-Requires development machines not used all the time,
-Plus environments that are easily reset,
-And login as root.

HTC Virtualization Drivers
Large multi-core servers have evolved from 2 to 64 cores per box; a single "rogue" user/application can impact 63 other users/applications. Virtualization can provide mechanisms to securely isolate users/applications.
Typical "bare metal" hardware has significantly more performance than is usually needed for a single-purpose server; virtualization can provide mechanisms to harvest/utilize the remaining cycles.
Complicated software stacks are difficult to distribute on the Grid; distribution of preconfigured virtual machines, together with GlideinWMS and HTCondor, can aid in addressing this problem.
Large demand for transient development/testing/integration work; virtual machines are ideal for this work.
Science is increasingly turning to complex, multiphase workflows; virtualization coupled with cloud can provide the ability to flexibly reconfigure hardware "on demand" to meet the changing needs of science.
Legacy code: data and code preservation for recently completed experiments at the Fermilab Tevatron and elsewhere.
Burst capacity: systems are full all the time; more cycles are needed just before conferences.

FermiCloud – Initial Project Specifications
The FermiCloud Project was established in 2009 with the goal of developing and establishing Scientific Cloud capabilities for the Fermilab Scientific Program,
-Building on the very successful FermiGrid program that supports the full Fermilab user community and makes significant contributions as a member of the Open Science Grid Consortium,
-Reusing high availability, AuthZ/AuthN, and virtualization experience from the Grid.
In a (very) broad brush, the mission of the FermiCloud project is:
-To deploy a production-quality Infrastructure as a Service (IaaS) cloud computing capability in support of the Fermilab Scientific Program,
-To support additional IaaS, PaaS, and SaaS cloud computing capabilities based on the FermiCloud infrastructure at Fermilab.
The FermiCloud project is a program of work that is split over several overlapping phases; each phase builds on the capabilities delivered as part of the previous phases.

Overlapping Phases
Phase 1: "Build and Deploy the Infrastructure"
Phase 2: "Deploy Management Services, Extend the Infrastructure and Research Capabilities"
Phase 3: "Establish Production Services and Evolve System Capabilities in Response to User Needs & Requests"
Phase 4: "Expand the service capabilities to serve more of our user communities"
(The phases overlap in time.)

Current FermiCloud Capabilities
The current FermiCloud hardware capabilities include:
Public network access via the high-performance Fermilab network,
-This is a distributed, redundant network.
Private 1 Gb/sec network,
-This network is bridged across FCC and GCC on private fiber.
High-performance Infiniband network,
-Currently split into two segments.
Access to a high-performance FibreChannel-based SAN,
-This SAN spans both buildings.
Access to the high-performance BlueArc-based filesystems,
-The BlueArc is located on FCC-2.
Access to the Fermilab dCache and Enstore services,
-These services are split across FCC and GCC.
Access to the 100 Gbit Ethernet test bed in LCC (integration nodes),
-Intel 10 Gbit Ethernet converged network adapter X540-T1.

Typical Use Cases
Public network virtual machine:
-On the Fermilab network, open to the Internet,
-Can access dCache and BlueArc mass storage,
-Common home directory between multiple VMs.
Public/private cluster:
-One gateway VM on the public/private network,
-Cluster of many VMs on the private network,
-Data acquisition simulation.
Storage VM:
-VM with large non-persistent storage,
-Used for large MySQL or Postgres databases,
-Lustre/Hadoop/Bestman/xRootd/dCache/OrangeFS/iRODS servers.

FermiGrid-HA2 Experience
In 2009, based on operational experience and plans for redevelopment of the FCC-1 computer room, the FermiGrid-HA2 project was established to split the set of FermiGrid services across computer rooms in two separate buildings (FCC-2 and GCC-B).
-This project was completed on 7-Jun-2011 (and tested by a building failure less than two hours later),
-FermiGrid-HA2 worked exactly as designed.
Our operational experience with FermiGrid-HA and FermiGrid-HA2 has shown the benefits of virtualization and service redundancy:
-Benefits to the user community: increased service reliability and uptime,
-Benefits to the service maintainers: flexible scheduling of maintenance and upgrade activities.

Experience with FermiGrid = Drivers for FermiCloud
Access to pools of resources using common interfaces: monitoring, quotas, allocations, accounting, etc.
Opportunistic access: users can use the common interfaces to "burst" to additional resources to meet their needs.
Efficient operations: deploy common services centrally.
High availability services: flexible and resilient operations.

Additional Drivers for FermiCloud
The existing development and integration facilities (AKA the FAPL cluster) were technically obsolescent and could not be used effectively to test and deploy the current generation of Grid middleware; the hardware was over 8 years old and was falling apart.
The needs of the developers and service administrators in the Grid and Cloud Computing Department for reliable and "at scale" development and integration facilities were growing.
Operational experience with FermiGrid had demonstrated that virtualization could be used to deliver production-class services.

OpenNebula
OpenNebula was picked as the result of an evaluation of open-source cloud management software.
The OpenNebula 2.0 pilot system in GCC has been available to users since November; it began with 5 nodes and gradually expanded to 13 nodes. Virtual machines have been running on the pilot system for 3+ years.
The OpenNebula 3.2 production-quality system was installed in FCC in June 2012, in advance of the GCC total power outage; it now comprises 18 nodes.
The transition of virtual machines and users from the OpenNebula 2.0 pilot system to the production system is almost complete.
In the meantime OpenNebula has made five more releases; we will catch up shortly.

FermiCloud – Fault Tolerance
As we have learned from FermiGrid, having a distributed, fault-tolerant infrastructure is highly desirable for production operations.
We are actively working on deploying the FermiCloud hardware resources in a fault-tolerant infrastructure:
-The physical systems are split across two buildings,
-There is a fault-tolerant network infrastructure in place that interconnects the two buildings,
-We have deployed SAN hardware in both buildings,
-We have a dual head-node configuration with heartbeat for failover,
-We have GFS2 + CLVM for our multi-user filesystem and distributed SAN; the SAN is replicated between buildings using CLVM mirroring.
GOAL: If a building is "lost", automatically relaunch "24x7" VMs on the surviving infrastructure, then relaunch "9x5" VMs if there is sufficient remaining capacity, and perform notification (via Service-Now) when exceptions are detected.
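As an illustration only, the sketch below shows how a relaunch policy of this kind could be driven from the OpenNebula XML-RPC interface. The endpoint, session string, the "SERVICE_CLASS" labeling convention, and the placeholder parser are assumptions for this example; XML-RPC method names and argument order follow the OpenNebula documentation of that era and may differ between versions.

    import xmlrpc.client

    # Minimal sketch, not the FermiCloud relaunch code.
    ONE_ENDPOINT = "http://fermicloud-head.example.gov:2633/RPC2"   # hypothetical host
    SESSION = "oneadmin:password"                                   # user:password session string
    server = xmlrpc.client.ServerProxy(ONE_ENDPOINT)

    def vm_ids_with_class(pool_xml, service_class):
        """Placeholder parser: a real tool would parse the VM_POOL XML and return
        the IDs of VMs labeled with the given service class whose host is in the
        failed building."""
        return []

    def relaunch_by_class(service_class):
        # one.vmpool.info(session, filter_flag, range_start, range_end, state)
        resp = server.one.vmpool.info(SESSION, -2, -1, -1, -1)
        ok, pool_xml = resp[0], resp[1]
        if not ok:
            raise RuntimeError(pool_xml)
        for vm_id in vm_ids_with_class(pool_xml, service_class):
            # Action name ("resume", "boot", ...) depends on the OpenNebula version.
            server.one.vm.action(SESSION, "resume", vm_id)

    relaunch_by_class("24x7")   # critical VMs first
    relaunch_by_class("9x5")    # then best-effort VMs, if capacity remains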

FCC and GCC
[Site diagram: the FCC and GCC buildings.]
The FCC and GCC buildings are separated by approximately 1 mile (1.6 km).
FCC has UPS and generator. GCC has UPS.

Distributed Network Core Provides Redundant Connectivity
[Network diagram: Nexus 7010 core switches in FCC-2, FCC-3, GCC-A, and GCC-B; robotic tape libraries (4 and 3); FermiGrid and FermiCloud nodes; Grid worker nodes; disk servers; 20 Gigabit/s L3 routed network; 80 Gigabit/s L2 switched network; 40 Gigabit/s L2 switched networks; private networks over dedicated fiber. Intermediate-level switches and top-of-rack switches are not shown in this diagram.]
Deployment completed in June 2012.

Distributed Shared File System
Design:
-Dual-port FibreChannel HBA in each node,
-Two Brocade SAN switches per rack,
-Brocades linked rack-to-rack with dark fiber,
-60 TB Nexsan Satabeast in FCC-3 and GCC-B,
-Red Hat Clustering + CLVM + GFS2 used for the file system,
-Each VM image is a file in the GFS2 file system,
-LVM mirroring provides RAID 1 across buildings.
Benefits:
-Fast launch: almost immediate, as compared to 3-4 minutes with ssh/scp,
-Live migration: virtual machines can be moved from one host to another for scheduled maintenance, transparently to users,
-Persistent data volumes can move quickly with machines,
-Virtual machines can be relaunched in the surviving building in case of a building failure/outage.

FermiCloud – Network & SAN “Today” Private Ethernet over dedicated fiber Fibre Channel over dedicated fiber 25-Sep S. Timm, OpenNebulaConf Nexus 7010 Nexus 7010 Nexus 7010 Nexus 7010 Nexus 7010 Nexus 7010 FCC-2GCC-A FCC-3GCC-B Nexus 7010 Nexus 7010 fcl315 To fcl323 fcl315 To fcl323 FCC-3 Brocade Satabeast Brocade fcl001 To fcl013 fcl001 To fcl013 GCC-B Brocade Satabeast Brocade

FermiCloud-HA Head Node Configuration
[Diagram: two head nodes, fcl001 (GCC-B) and fcl301 (FCC-3), each running ONED/SCHED plus service VMs (fcl-ganglia1/2, fermicloudnis1/2, fermicloudrepo1/2, fclweb1/2, fcl-cobbler, fermicloudlog, fermicloudadmin, fermicloudrsv, fcl-lvs1/2, fcl-mysql1/2); the head nodes are linked by two-way rsync, live migration, multi-master MySQL, and CLVM/rgmanager.]

Cooperative R&D Agreement
Partners: the Grid and Cloud Computing Department (Fermilab) and the Global Science Experimental Data hub Center (KISTI).
Project title: "Integration and Commissioning of a Prototype Federated Cloud for Scientific Workflows".
Status: three major work items:
1. Virtual infrastructure automation and provisioning,
2. Interoperability and federation of cloud resources,
3. High-throughput fabric virtualization.

Virtual Machines as Jobs
OpenNebula (and all other open-source IaaS stacks) provides an emulation of Amazon EC2.
HTCondor developers added code to their "Amazon EC2" universe to support the X.509-authenticated protocol.
Currently testing in bulk; up to 75 VMs OK thus far.
Goal: submit a NOvA workflow to FermiCloud, Notre Dame, and Amazon EC2.
Smooth submission of many thousands of VMs is a key step to making the full infrastructure of a site into a science cloud.
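For concreteness, here is a minimal sketch (not the HTCondor code, and not using the X.509-authenticated protocol) of launching VMs through an EC2-compatible endpoint such as OpenNebula's "econe" service, using the boto 2 library. The host name, port, AMI id, instance type, and credentials are placeholders invented for this example.

    import boto
    from boto.ec2.regioninfo import RegionInfo

    # EC2-compatible endpoint; host and port are hypothetical.
    region = RegionInfo(name="fermicloud", endpoint="fcl-econe.example.gov")
    conn = boto.connect_ec2(
        aws_access_key_id="ACCESS_KEY",        # placeholder credentials
        aws_secret_access_key="SECRET_KEY",
        is_secure=True,
        port=8443,                             # hypothetical port
        path="/",
        region=region,
    )

    # Launch a batch of worker-node VMs; AMI id and instance type depend on the cloud.
    reservation = conn.run_instances("ami-00000001", min_count=1, max_count=75,
                                     instance_type="m1.small")
    for instance in reservation.instances:
        print(instance.id, instance.state)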

Grid Bursting
[Diagram: GlideinWMS factory and user frontend submit to the FermiGrid site gateway (CDF, D0, CMS, GP clusters and CLOUDGATE); vcluster reads the job queue and submits VMs as needed to Amazon EC2, FermiCloud (GCLOUD), and Notre Dame OpenStack.]

vCluster at SC
[Figure: vCluster demonstration at the SC conference.]

Cloud Bursting via GlideinWMS
[Diagram: GlideinWMS factory and user frontend burst directly to Amazon EC2, FermiCloud (GCLOUD), and Notre Dame OpenStack.]

Cloud Bursting via OpenNebula
[Diagram: GlideinWMS factory and user frontend; FermiCloud bursts to Amazon EC2 via OpenNebula.]

True Idle VM Detection
In times of resource need, we want the ability to suspend or "shelve" idle VMs in order to free up resources for higher-priority usage.
-This is especially important in the event of constrained resources (e.g. during a building or network failure),
-Shelving of "9x5" and "opportunistic" VMs allows us to use FermiCloud resources for Grid worker-node VMs during nights and weekends; this is part of the draft economic model.
Giovanni Franzini (an Italian co-op student) has written (extensible) code for an "Idle VM Probe" that can be used to detect idle virtual machines based on CPU, disk I/O, and network I/O.
Nick Palombo, a consultant, has written the communication system and the collector system to take rule-based actions based on the idle information.
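For illustration only, here is a minimal sketch of the kind of rule an idle-VM probe of this sort might apply; the thresholds, sample layout, and function name are assumptions, not the actual probe code.

    # Classify a VM as idle if CPU, disk I/O, and network I/O all stay below
    # thresholds for every sample in the observation window.  Thresholds and the
    # sample layout are invented for this sketch.
    CPU_THRESHOLD_PCT = 2.0        # percent of one core
    DISK_THRESHOLD_BPS = 10_000    # bytes/second
    NET_THRESHOLD_BPS = 5_000      # bytes/second

    def vm_is_idle(samples):
        """samples: list of dicts with 'cpu_pct', 'disk_bps', 'net_bps' keys,
        one per monitoring interval (e.g. one sample per minute for an hour)."""
        if not samples:
            return False
        return all(
            s["cpu_pct"] < CPU_THRESHOLD_PCT
            and s["disk_bps"] < DISK_THRESHOLD_BPS
            and s["net_bps"] < NET_THRESHOLD_BPS
            for s in samples
        )

    # Example: a VM that never exceeded any threshold over three samples.
    window = [
        {"cpu_pct": 0.5, "disk_bps": 1200, "net_bps": 300},
        {"cpu_pct": 0.7, "disk_bps": 800,  "net_bps": 250},
        {"cpu_pct": 0.4, "disk_bps": 950,  "net_bps": 410},
    ]
    print(vm_is_idle(window))   # True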

Idle VM Information Flow
[Diagram: on each host, a VM idle-data store is fed by the OpenNebula IM probe; an Idle VM Collector feeds the raw VM state database; Idle VM Logic produces an Idle VM List; in the Idle VM Management Process, an Idle VM Trigger initiates Idle VM Shutdown through the OpenNebula XML-RPC interface.]

Interoperability and Federation
Driver: global scientific collaborations such as the LHC experiments will have to interoperate across facilities with heterogeneous cloud infrastructure.
European efforts:
-EGI Cloud Federation Task Force: several institutional clouds (OpenNebula, OpenStack, StratusLab),
-HelixNebula: a federation of commercial cloud providers.
Our goals:
-Show proof of principle: a federation including FermiCloud + KISTI "G Cloud" + one or more commercial cloud providers + other research-institution community clouds if possible,
-Participate in existing federations if possible.
Core competency: the FermiCloud project can contribute to these cloud federations given our expertise in X.509 authentication and authorization, and our long experience in Grid federation.

Virtual Image Formats
Different clouds have different virtual machine image formats:
-File system,
-Partition table,
-LVM volumes,
-Kernel?
We have identified the differences and written a comprehensive step-by-step user manual, soon to be public.
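As a hedged illustration of one piece of this problem (not the procedure from the FermiCloud user manual), a raw disk image can be converted to formats expected by different clouds with qemu-img; the file names below are placeholders, and real images may also need partition-table, LVM, or kernel adjustments as the slide notes.

    # Convert a raw disk image into other common formats with qemu-img.
    # File names are placeholders for this sketch.
    import subprocess

    SRC = "worker-node.raw"   # hypothetical source image

    for fmt, ext in [("qcow2", "qcow2"), ("vmdk", "vmdk")]:
        subprocess.run(
            ["qemu-img", "convert", "-f", "raw", "-O", fmt, SRC, f"worker-node.{ext}"],
            check=True,
        )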

Interoperability/Compatibility of APIs
The Amazon EC2 API is not open source; it is a moving target that changes frequently.
Open-source emulations have various feature levels and accuracy of implementation:
-Compare and contrast OpenNebula, OpenStack, and commercial clouds,
-Identify the lowest common denominator(s) that work on all.

VM Image Distribution
Investigate existing image marketplaces (HEPiX, U. of Victoria).
Investigate whether we need an Amazon S3-like storage/distribution method for OS images:
-OpenNebula doesn't have one at present,
-A GridFTP "door" to the OpenNebula VM library is a possibility; this could be integrated with an automatic security-scan workflow using the existing Fermilab NESSUS infrastructure.

High-Throughput Fabric Virtualization
Followed up on earlier virtualized MPI work:
-Use it in real scientific workflows,
-Users can now define a set of Infiniband machines in OpenNebula on their own,
-DAQ system simulation (large multicast activity).
Experiments have also been done with virtualized 10 GbE on the 100 Gbit WAN testbed.

Security
Main areas of cloud security development:
Secure contextualization: secrets such as X.509 service certificates and Kerberos keytabs are not stored in virtual machines (see the following talk for more details).
X.509 authentication/authorization: X.509 authentication written by T. Hesselroth; code submitted to and accepted by OpenNebula, publicly available since January.
Security policy: a security task force met and delivered a report to the Fermilab Computer Security Board, recommending the creation of a new Cloud Computing Environment, now in progress.
We also participated in the HEPiX Virtualisation Task Force; we respectfully disagree with the recommendations regarding VM endorsement.

Pluggable Authentication
Some assembly required. Batteries not included.

OpenNebula Authentication
OpenNebula came with "pluggable" authentication, but few plugins were initially available.
OpenNebula 2.0 web services by default used an access key / secret key mechanism similar to Amazon EC2; no HTTPS was available.
There are four ways to access OpenNebula:
-Command line tools,
-Sunstone web GUI,
-"ECONE" web-service emulation of the Amazon RESTful (Query) API,
-OCCI web service.
The FermiCloud project wrote X.509-based authentication plugins:
-Patches to OpenNebula to support this were developed at Fermilab and submitted back to the OpenNebula project in Fall 2011 (generally available in OpenNebula V3.2 onwards),
-X.509 plugins are available for command-line and for web-services authentication.

X.509 Authentication: How It Works
Command line:
-The user creates an X.509-based token using the "oneuser login" command,
-This makes a base64 hash of the user's proxy and certificate chain, combined with a username:expiration-date string, signed with the user's private key.
Web services:
-The web-services daemon contacts the OpenNebula XML-RPC core on the user's behalf, using the host certificate to sign the authentication token,
-Apache mod_proxy is used to pass the grid certificate DN to the web services.
Limitation: with web services, one DN can map to only one user.
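To make the command-line token scheme concrete, here is a minimal sketch, not the actual "oneuser login" plugin: it signs a username:expiration string with a private key and base64-encodes it together with the certificate chain. The file paths, user name, token layout, and the assumption of an RSA key are all placeholders for this example.

    import base64
    import time
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import padding

    USERNAME = "timm"                                  # hypothetical user
    EXPIRATION = int(time.time()) + 3600               # token valid for one hour

    with open("userkey.pem", "rb") as f:               # user's private RSA key (placeholder path)
        key = serialization.load_pem_private_key(f.read(), password=None)
    with open("usercert_chain.pem", "rb") as f:        # proxy + certificate chain (placeholder path)
        chain = f.read()

    # Sign "username:expiration" with the user's private key.
    message = f"{USERNAME}:{EXPIRATION}".encode()
    signature = key.sign(message, padding.PKCS1v15(), hashes.SHA256())

    # Bundle message, signature, and chain into a base64 token (layout invented here).
    token = base64.b64encode(message + b"|" + signature + b"|" + chain).decode()
    print(token[:60], "...")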

Grid AuthZ Interoperability Protocol
Uses XACML 2.0 to specify DN, CA, hostname, FQAN, FQAN signing entity, and more.
Developed in 2007; it has been used in the Open Science Grid and other grids.
Java and C bindings are available for the client; the most commonly used C binding is LCMAPS.
Used to talk to GUMS, SAZ, and others.
Allows one user to be part of different Virtual Organizations and to have different groups and roles.
For cloud authorization we will configure GUMS to map back to individual user names, one per person; each personal account is created in OpenNebula in advance.
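As a rough, hedged sketch of the decision flow just described (not the LCMAPS, GUMS, or SAZ client code), the authorization step combines a GUMS-style mapping callout with a SAZ-style yes/no check; the two query functions below are hypothetical stand-ins for the real XACML clients, and the example DN and FQAN are invented.

    # Decision flow only: a GUMS-style callout returns a local uid/gid mapping for
    # a (DN, FQAN) pair, a SAZ-style callout returns yes/no.
    def query_gums(dn, fqan):
        # Placeholder: the real client sends an XACML request to the GUMS server.
        return {"uid": "timm", "gid": "fermilab"} if dn.startswith("/DC=") else None

    def query_saz(dn, fqan):
        # Placeholder: the real client asks the Site AuthoriZation (SAZ) service.
        return True

    def authorize(dn, fqan):
        mapping = query_gums(dn, fqan)
        if mapping is None:
            raise PermissionError("GUMS has no mapping for %s / %s" % (dn, fqan))
        if not query_saz(dn, fqan):
            raise PermissionError("SAZ denied %s" % dn)
        # The mapped account must already exist in OpenNebula (one account per person).
        return mapping["uid"], mapping["gid"]

    print(authorize("/DC=org/DC=example/OU=Fermilab/CN=Example User", "/fermilab/Role=Analysis"))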

"Authorization" in OpenNebula
Note: OpenNebula has pluggable "authorization" modules as well.
-These control access ACLs, namely which user can launch a virtual machine, create a network, store an image, etc.,
-They are not related to the grid-based notion of authorization at all.
Instead, we make our "authorization" additions in the authentication routines of OpenNebula.

X.509 Authorization
The OpenNebula authorization plugins are written in Ruby.
They use existing Grid routines to call out to external GUMS and SAZ authorization servers:
-Either a Ruby-C binding to call the C-based LCMAPS routines,
-Or a Ruby-Java bridge to call the Java-based routines from the Privilege project.
GUMS returns uid/gid; SAZ returns yes/no.
This works with the OpenNebula command line and non-interactive web services.
Much effort was spent trying to send user credentials with extended attributes through the web browser:
-Currently the ruby-java-bridge setup works for the CLI,
-For Sunstone we have shifted to a server-side callout to VOMS.
We are always interested in talking to anyone who is doing X.509 authentication in any cloud.

Reframing the Cloud Discussion
Purpose of Infrastructure-as-a-Service: on demand only? No: it is a whole new way to think about IT infrastructure, both internal and external.
The cloud API is just a part of rethinking IT infrastructure for data-intensive science (and MIS).
A cloud is only as good as the hardware and software it's built on; network fabric, storage, and applications are all crucial.
Buy or build? Both! We will always need some in-house capacity.
Performance hit? Most can be traced to badly written applications or a misconfigured OS.

FermiCloud Project Summary - 1
Science is directly and indirectly benefiting from FermiCloud: CDF, D0, Intensity Frontier, Cosmic Frontier, CMS, ATLAS, Open Science Grid, and more.
FermiCloud operates at the forefront of delivering cloud computing capabilities to support scientific research:
-By starting small, developing a list of requirements, and building on existing Grid knowledge and infrastructure to address those requirements, FermiCloud has managed to deliver a production-class Infrastructure-as-a-Service cloud computing capability that supports science at Fermilab,
-FermiCloud has provided FermiGrid with an infrastructure that has allowed us to test Grid middleware at production scale prior to deployment,
-The Open Science Grid software team used FermiCloud resources to support their RPM "refactoring" and is currently using it to support their ongoing middleware development/integration.

FermiCloud Project Summary
The FermiCloud collaboration with KISTI has leveraged the resources and expertise of both institutions to achieve significant benefits.
vCluster has demonstrated proof-of-principle "Grid Bursting" using FermiCloud and Amazon EC2 resources.
Using SR-IOV drivers on FermiCloud virtual machines, MPI performance has been demonstrated to be >96% of the native "bare metal" performance.
The future is mostly cloudy.

Acknowledgements
None of this work could have been accomplished without:
-The excellent support from other departments of the Fermilab Computing Sector, including Computing Facilities, Site Networking, and Logistics,
-The excellent collaboration with the open source communities, especially Scientific Linux and OpenNebula,
-The excellent collaboration with, and contributions from, KISTI,
-And talented summer students from the Illinois Institute of Technology.