Fermilab Site Report Spring 2012 HEPiX Keith Chadwick Fermilab Work supported by the U.S. Department of Energy under contract No. DE-AC02-07CH11359.



Outline
  Organization Changes
  Email & Calendar Migration
  Distributed Redundant Network Core
  Facilities
  Managed Services
  Scientific Linux
  Tape Drives, Robots & Storage
  HPC & Lattice QCD
  FermiGrid
  FermiCloud
23-Apr-2012 Fermilab Site Report

Organization Changes
  Rob Roser was appointed Scientific Computing Division Head.
  Lothar Bauerdick was elected by the OSG Council to be the Open Science Grid Executive Director.
  Ruth Pordes is now the head of the OSG Council.
  Organization chart:
    Vicky White: Associate Director for Computing, CIO
    Office of CIO (Deputy CIO vacant)
    CCD: Jon Bakken
    SCD: Rob Roser

Email & Calendar Migration
  The migration to Exchange 2010 was completed on Monday 26-Mar.
  A total of 2959 accounts were migrated from a combination of IMAP, Exchange 2007 & Lotus Notes.
  Calendar migration from MeetingMaker to Exchange was completed in late March.
  Site gateways are expected to migrate this month (or early next).

Distributed Network Core Provides Redundant Connectivity
  [Network diagram] Nexus 7010 switches at GCC-A, GCC-B, FCC-2 and FCC-3, connected by private networks over dedicated fiber.
  Attached systems: Robotic Tape Libraries (4), Robotic Tape Libraries (3), FermiGrid, FermiCloud, Disk Servers, Grid Worker Nodes.
  Link types: 20 Gigabit/s L3 routed network, 80 Gigabit/s L2 switched network, 40 Gigabit/s L2 switched networks.
  Note: intermediate level switches and top of rack switches are not shown in this diagram.
  Deployment completed in January 2012.

Fermilab Computing Facilities
  An engineering study of the GCC cooling issues encountered during 2011 has recommended the removal of the berm adjacent to GCC.
  The contract has been signed and the “notice to proceed” has been issued.
  Berm removal should be completed by ~mid-May.
  No supplemental cooling will be deployed this year.
  If removal of the berm does not address the cooling issue, then the next step would be to relocate the external heat exchangers.

Managed Services
  Fermilab has recently signed a contract with Dell to provide a set of managed computing services.
  The set of managed services includes:
    Service Desk Support Personnel,
    “Deskside” Support Services (including PC/Mac refresh),
    Printer Support & Services (including printer supplies and printer refresh),
    Logistic services (hardware service, moves, adds, changes),
    Network cable installation.

Scientific Linux
  Migration off of SL(F) 4 largely completed by 28-Feb-2012.
  A small number (<20) of baseline “exemptions” issued for systems to run SL(F) 4.
  Starting to see a (slow) increase in SL(F) 6 deployments.
  See Connie Sieh’s SL Update talk for more details!

Tape Robots, Drives & Storage
  T10KC tape drives are in full production.
  We found another bug in the small file accelerator microcode, and Oracle has delivered a fix that is being tested.
  The small file aggregation/cache for enStore was deployed on 19-Apr-2012.
  Pnfs -> Chimera:
    Was deployed on stken & enStore on 22-Feb-2012,
    Will be deployed for cdfen and d0en on 1-May-2012.
  BlueArc firmware update (6.1 -> 8.1) will be performed 1-May-2012.
  AFS server code update to support “compound” principals is scheduled to be deployed on 17-May-2012.

HPC & Lattice QCD
  Existing HPC & Lattice QCD [Ds, J/Psi, Kaon, Wilson] clusters are running well.
  New GPU-based HPC [Dsg] cluster deployed:
    Hardware delivered 9-Jan-2012,
    Had some teething pains with infant-mortality hardware failures and power controller microcode,
    Released to the user community the week of 5-Mar.

Current FermiGrid Statistics (as of April 2012)
Cluster(s)        Batch System   Job Slots   Raw Occupancy   Effective Utilization
CDF (Merged)      Condor         6,
CMS T1            Condor         7,
D0 (Merged)       PBS            8,
GP Grid           Condor         4,
Overall - Today                  27,
Last Year                        23,

FermiGrid Overall Usage [figure]

Usage by Community [figure]

FermiGrid Services
  The FermiGrid-HA2 services deployment continues to operate well. Significant events include:
  A major Gratia accounting service upgrade for both Fermilab and the Open Science Grid (OSG) was deployed in December 2011:
    Without any downtime!
  The latest VOMS/VOMS-Admin was deployed on 28-Mar-2012:
    We generated an unscheduled 8 hour outage of our (old) production VOMS servers while deploying changes to support the new production VOMS servers,
    We have identified several issues with the new production VOMS software and are in contact with the developers (the fixes are promised soon…),
    The current OSG Grid User Mapping Service (GUMS) also has an issue with the new VOMS (the GUMS developers have promised a fix soon; it is currently in testing).
  Our MyProxy deployment has had a few problems over the past six months:
    Triggered by incompatible SL(F)/DRBD yum upgrades taking down both the primary and secondary copy,
    We have revised our MyProxy upgrade procedures, and have deployed extra MyProxy monitoring.

FermiGrid Service Availability (measured over the past year)
Service                                 Raw Availability   HA Configuration   Measured HA Availability   Minutes of Downtime
VOMS – VO Management Service            %                  Active-Active      99.908%                    480
GUMS – Grid User Mapping Service        %                  Active-Active      %                          0
SAZ – Site AuthoriZation Service        %                  Active-Active      %                          0
Squid – Web Cache                       99.663%            Active-Active      %                          0
MyProxy – Grid Proxy Service            99.374%            Active-Standby     99.749%                    1,320
ReSS – Resource Selection Service       %                  Active-Active      %                          0
Gratia – Fermilab and OSG Accounting    %                  Active-Standby     %                          0
MySQL Database                          99.785%            Active-Active      %                          0
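
The relationship between the "Minutes of Downtime" and "Measured HA Availability" columns can be checked with a short calculation. The sketch below is illustrative only: it assumes a 365-day measurement window (the slide says only "the past year"), and the function and variable names are hypothetical, not part of any FermiGrid tooling. Under that assumption, the results for VOMS (480 minutes) and MyProxy (1,320 minutes) agree with the slide's figures to within about 0.001 percentage points.

```python
# Assumption: the "past year" is a 365-day window; the slide does not state
# the exact measurement period.
MINUTES_PER_YEAR = 365 * 24 * 60


def availability_percent(downtime_minutes, window_minutes=MINUTES_PER_YEAR):
    """Return availability as a percentage of the measurement window."""
    return 100.0 * (1.0 - downtime_minutes / window_minutes)


# Downtime minutes taken from the availability table in the slide.
downtime = {"VOMS": 480, "MyProxy": 1320, "GUMS": 0}

# Compute the availability implied by each downtime figure.
results = {name: availability_percent(minutes) for name, minutes in downtime.items()}

for name, pct in results.items():
    print(f"{name}: {pct:.3f}%")
```

The 480 minutes for VOMS corresponds to the unscheduled 8 hour outage reported on the FermiGrid Services slide.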

FermiGrid “Core” Service Metrics (measured over the past year)
Service                             Calls per Hour (Average / Peak)   Calls per Day (Average / Peak)
VOMS – VO Management Service        87 /                              K / 19K
GUMS – Grid User Mapping Service    17.3K / 114.1K                    415K / 1.25M
SAZ – Site AuthoriZation Service    14.6K / 150.3K                    350K / 1.23M
Squid – Web Cache                   -not measured-                    8.24M / 92M
MyProxy – Grid Proxy Service        867 / 8.5K                        18.1K / 83.7K

FermiCloud
  We are in the process of deploying distributed & replicated SAN hardware, together with the software deployments to allow VM live migration between buildings.
  We have some exciting results regarding cloud accounting and virtualized MPI.
  More in my “FermiCloud” update talk later this week.

Openings!
  Fermilab has a number of positions open:
    System Administrators,
    Computing Services Specialists,
      CSS III in the Grid and Cloud Computing Department
    Database Administrators.
  If you are interested, please visit:

Thank You! Any Questions?