FermiCloud Review: Response to Questions
Keith Chadwick, Steve Timm, Gabriele Garzoglio
6-Feb-2013
Work supported by the U.S. Department of Energy under contract No. DE-AC02-07CH11359

Red Pill or Blue Pill?

Question 1

Please provide a list of use cases (users, applications, required capabilities and capacities) for the immediate plans to move into production, and projected as supportable with the existing resources. What additional resources might be needed in the next year based on other known use cases/users?

Current Production Summary

User VMs (by SLA):
Server VM (24x7): 12
Persistent Integration VM (9x5): 42
Test VM (opportunistic): 129
User VM subtotal: 183

Management VMs (by SLA), i.e., the auxiliary services required to manage FermiCloud (Nagios, Ganglia, MySQL, LVS, etc.):
24x7: 7
9x5: 10
Opportunistic: 0
Management VM subtotal: 17

VM grand total: 200, about a 52% subscription level against the existing FermiCloud capacity of 384 VMs.

Server VMs

dzero (J. Boyd): 2 VMs, 4 cores and 120 GB storage; SAMGrid forwarding nodes for LCG submission.
IF (Votava): 9 VMs, 1 core each; Intensity Frontier GridFTP servers (9 currently, 2 more requested).
geant4 (Wenzel): 1 VM, 1 core; GEANT4 validation server.

Persistent Integration VMs

geant4 (Wenzel): 1; development GEANT4 validation server.
IRODS (Levshina): 2; iRODS server for OSG iRODS users.
minos (Tagg): 1; MINOS event display test.
FGS (Timm): 1; ITB Storage Element for the FermiGrid ITB site.
FGS (Timm): 7; FermiGrid services stress testing (SAZ, GUMS, MySQL) on Xen.
FGS (Timm): 2; FermiGrid GUMS stress test.
FGS (Timm): 2; FermiGrid KVM-based MySQL stress test.
FGS (Sharma): 6; dCache test stand / OSG software.
FGS (Timm, Sharma): 9; SHA-2 testing.
DOCS (Dykstra): 1; ExTENCI project, testing Lustre over the WAN.
DOCS (Levshina): 1; VOMRS testing.
DOCS (Dykstra): 1; gateway to the 100G ANI testbed.

Test VMs

Big categories: OSG Software Team, GlideinWMS project, Gratia development/integration, CVMFS testing, and FermiGrid "at scale" testing of Grid middleware (GUMS, SAZ, etc.).

(Known) Big Upcoming Use Cases

KISTI joint project: grid-bursting tests with virtual machines.
Mike Wang, NFS v4.1 testing: would like 25+ VMs.
SHA-2/IPv6 testing: could need 20+ VMs; can be accommodated on existing hardware.
It would be nice to have the idle VM detection/reclamation feature working (see the sketch below).
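To make the idle VM detection/reclamation idea above concrete, here is a minimal sketch that samples per-domain CPU time through the libvirt Python bindings and flags VMs whose utilization stays below a threshold. It assumes KVM hosts reachable at qemu:///system; the threshold, the sampling window, and the decision to merely report candidates (rather than reclaim them) are illustrative assumptions, not FermiCloud's actual mechanism.

```python
import time
import libvirt  # libvirt Python bindings

IDLE_CPU_THRESHOLD = 0.02  # assumed: under 2% average CPU counts as idle
SAMPLE_SECONDS = 300       # assumed sampling window

def idle_domains(uri="qemu:///system"):
    """Return the names of running domains whose CPU utilization stayed
    below IDLE_CPU_THRESHOLD over a SAMPLE_SECONDS window."""
    conn = libvirt.openReadOnly(uri)
    try:
        doms = [d for d in conn.listAllDomains() if d.isActive()]
        # dom.info() returns [state, maxMem, memory, nrVirtCpu, cpuTime(ns)]
        before = {d.name(): d.info()[4] for d in doms}
        time.sleep(SAMPLE_SECONDS)
        idle = []
        for dom in doms:
            try:
                _state, _, _, ncpu, cpu_ns = dom.info()
            except libvirt.libvirtError:
                continue  # domain went away during the window
            used_ns = cpu_ns - before[dom.name()]
            utilization = used_ns / (SAMPLE_SECONDS * 1e9 * ncpu)
            if utilization < IDLE_CPU_THRESHOLD:
                idle.append(dom.name())
        return idle
    finally:
        conn.close()

if __name__ == "__main__":
    for name in idle_domains():
        print("reclamation candidate:", name)
```

A production version would also look at network and disk activity before declaring a VM idle; CPU time alone is only a first approximation.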

Other Good Potential Use Cases

Possible MPI use case: G. Lukhanin started a NOvA DAQ simulation in spring 2012 but did not finish; MPI was run on an isolated private network to contain the heavy multicast activity.
DMS Department dCache testing: they were stakeholders of the old FAPL cluster but have not yet had a chance to start FermiCloud work.

Run II Data Preservation

Cloud technology offers the possibility of:
Running legacy/unpatched operating systems,
A dedicated private network (software-defined networking),
Hosting database servers,
Compute servers that are booted only on demand.
FermiCloud would need a new hardware purchase to absorb this capacity. A sketch of such a legacy VM definition follows.
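As an illustration of the first two points, below is a minimal, hypothetical OpenNebula-style VM template for a legacy Run II database server attached to an isolated private virtual network. The image name, network name, and sizing are assumptions made for the sketch, not an actual FermiCloud configuration.

```
# Hypothetical OpenNebula VM template: a legacy Run II database server
# on a dedicated private network, instantiated only on demand.
NAME   = "runii-legacy-db"
CPU    = 2
VCPU   = 2
MEMORY = 4096                              # MB

DISK = [ IMAGE       = "sl4-runii-db",     # assumed legacy-OS image
         IMAGE_UNAME = "oneadmin" ]

NIC  = [ NETWORK       = "runii-private",  # assumed isolated virtual network
         NETWORK_UNAME = "oneadmin" ]
```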

Data-intensive Science

DES & DarkSide-50: high I/O per compute instruction, large data sets.
These could be addressed in a cloud-like configuration, but we expect that we would need more hardware, particularly closely attached high-performance storage.

Budget

In the current FY2013 budget request, we have requested $28K + $90K (total $118K) of funding for additional FermiCloud "host" systems, locations TBD; FCC-3, FCC-2, GCC-A, and LCC are all possible locations. We also have another $64K in the FY2013 budget request that was targeted for GP Grid worker nodes and could be reprogrammed.
It is likely that a similar expansion will be needed in FY2014, with possibilities for significant additional expansion depending on stakeholder requirements. In FY2015, the first set of 23 systems will reach 5 years in service and are likely candidates for retirement.
As noted earlier, new stakeholders such as DES, DarkSide-50, or Run II data preservation may require additional hardware acquisitions, although if we know they are coming, we can work with them to ensure that our planned hardware acquisitions address at least some of their needs.

Question 2

We would like to learn the process used to choose the list of development features being proposed, and how prioritization is done based on the use cases or other drivers.

List of Development Features

The list of development features is determined as a combination of the following constraints and inputs:
Compliance with the policies of Fermilab,
The judgment of the FermiCloud project management team, based on their multiple years of supporting scientific computing,
Input from our collaborators, users, and stakeholders. Currently this is gathered at the weekly project management meetings, where potential development topics and priorities are discussed [Examples: SAN, InfiniBand, 100G, authorization]. We believe that a dedicated stakeholder forum (perhaps monthly), separate from the project management meetings, would serve this need more inclusively.
Based on this collective thinking, we believe that certain capabilities are going to be expected by our users [Example: cloud bursting].

Prioritization of Development Items

The priority is determined as a combination of the following constraints and inputs:
Compliance with the security policies of the open science and general computing environments [Example: X.509 authentication],
Operational needs of the administrative team [Examples: development of the X.509 AuthZ call-out module to allow central management of privileges; resource accounting; resource optimization via idle VM detection, to prioritize VM survival through building downtimes and to implement off-hours grid bursting],
Input from scientific stakeholders,
Priorities of the major collaborators, i.e., negotiations on what makes a collaboration possible and successful [Example: exchanging workloads with KISTI through a cloud federation],
In addition, we want to develop capabilities that position us strategically for certain high-profile initiatives, such as data preservation [Example: accepting VMs as jobs, to retain the computational environment without maintaining a central VM repository of all supported stakeholders; see the sketch below],
Finally, our priorities are informed by interactions with colleagues, program managers, task forces, standardization bodies, etc.
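The "accepting VMs as jobs" example can be illustrated with HTCondor's VM universe, one existing mechanism for submitting a whole virtual machine image as a batch job. The submit description below is a sketch only; the image name and sizing are assumptions, and this is not necessarily the interface FermiCloud would adopt.

```
# Hypothetical HTCondor VM-universe submit file: the "job" is an entire
# VM image, so the stakeholder's computational environment travels with it.
universe      = vm
vm_type       = kvm
vm_memory     = 2048                       # MB; assumed sizing
vm_networking = true
vm_disk       = runii-legacy.qcow2:hda:w   # assumed legacy-OS image
executable    = runii_legacy_vm            # a logical label in the VM universe
log           = runii_legacy_vm.log
queue
```

The attraction for data preservation is that the experiment's frozen software environment lives inside the image, so the batch system does not need to know anything about the legacy OS it hosts.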

Development vs. Operational Effort

The GCC Department is already organized into distinct operations and development groups:
Operations: FGS,
Development: DOCS.
Members of each group work very closely with the other to deliver solutions to our stakeholders.
The GCC Department does have a couple of very capable "switch hitters" (Neha Sharma and Hyunwoo Kim) who have shown that they can support both development and operations.
Later this month we will lose Doug Strain (he has taken a position with Google), so we have an opportunity to slightly rebalance the department personnel. At present we propose that the replacement have more exposure to cloud computing (Doug's effort was split across OSG Storage and GlideinWMS); we could also consider recasting this replacement toward operations.

Future FermiCloud Hardware

>= 64 cores,
>= 192 GB of memory,
Fibre Channel HBA,
RAID card with >= 8 high-speed disks,
Possibly a 10 Gb/s network interface,
2U chassis.

Summary

Today, the FermiCloud project is driven by the "best judgment" of the Grid & Cloud Computing Department management, together with those stakeholders who attend the weekly FermiCloud project meeting.
Allowing the FermiCloud project to formally engage the full set of FermiCloud stakeholders to collect requests and recommendations would greatly improve this state of affairs.
If we are going to be part of the Run II data preservation efforts, they will expect (and we would agree to give them) significant input into our future plans.

"Give me a place to stand, and I will move the Earth.” - Archimedes 6-Feb-2013FermiCloud Review - Response to Questions18

Cast of Characters

The Earth – Science
The Lever – Virtualization
The Clouds – Cloud Computing
Archimedes – Fermilab Scientists
The Fulcrum – FermiCloud and the GCC Department

Thank You

Any Questions?