Minos Computing Infrastructure – Arthur Kreymer – 2011-01-06

Minos Computing Infrastructure Arthur Kreymer 1
INFRASTRUCTURE
● Grid – going to SLF 5, doubled capacity in GPFarm
● Bluearc – performance good, expanding and consolidating Jan 20
● Enstore – have admin access, changed 2011 raw data owner to minosraw
● Nucomp – Soudan network upgraded, jobsub reunification underway
● GPCF – have minos-slf4 and minos-slf5
● Servers – stable, ongoing ganglia and aklog issues
● Control Room – remote shifts
● NDTF – Near Detector Task Force ( Minerva / Minos )

Minos Computing Infrastructure Arthur Kreymer 2
EXECUTIVE SUMMARY
● Young Minos issues:
  – Getting minos54 back from NOvA
  – Scaling to 5000 Grid jobs – will ramp up after the Condor 7.4 upgrade
● Raw data file ownership – 2011 files changed from buckley to minosraw, will do the rest soon
● Bluearc issues:
  – Moving to new disks, 170 TB capacity ( was 130 ) Jan 20
  – Initial copies are done, touch-up passes run in under 2 hours
  – This removes the Satabeast from Bluearc – it will move to Enstore or dCache
  – Will keep a /minos/data2 symlink to /minos/data, but stop using it ( see the sketch below )
  – Hoping to triple the disk/$ ratio via Bluearc Data Migration
  – Most of the 10 TB of mcimport/STAGE is archived, thanks Adam !
● Scheduled downtime 2011 Jan 20 all day for kernels, PNFS, Fermigrid, etc.
● Soudan network upgrade is complete
● Planning for the Condor 7.4 upgrade and jobsub script reunification with IF
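The /minos/data2 compatibility symlink mentioned above is a simple idea: keep the old path name resolving to the new consolidated area so existing scripts keep working while users migrate. The sketch below illustrates that check in Python; the path names come from the slide, but the script itself is hypothetical, not the tool actually used for the migration.

```python
#!/usr/bin/env python
"""Hypothetical helper: verify (or create) the /minos/data2 -> /minos/data
compatibility symlink described in the migration plan.  Illustrative only."""
import os
import sys

OLD = "/minos/data2"   # legacy mount point, to be kept only as a symlink
NEW = "/minos/data"    # consolidated Bluearc area

def ensure_compat_link(old=OLD, new=NEW):
    if os.path.islink(old):
        if os.path.realpath(old) == os.path.realpath(new):
            print("%s already points at %s" % (old, new))
            return
        sys.exit("%s is a symlink, but points elsewhere" % old)
    if os.path.exists(old):
        sys.exit("%s still exists as a real directory - migrate it first" % old)
    os.symlink(new, old)   # create old -> new
    print("created %s -> %s" % (old, new))

if __name__ == "__main__":
    ensure_compat_link()
```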

Minos Computing Infrastructure Arthur Kreymer 3
GRID
● We continue to reach peaks of 2000 running jobs, but have been quiet recently – we still want 5000, and will ask to scale up once we have Condor 7.4
● Still using a wrapper for condor_rm and condor_hold to prevent SAZ authentication overload ( see the sketch below )
  – The Condor 7.4 upgrade will fix this internally, soonish
● CPN million locks served to Minos
● Final Fermigrid workers are moving to SLF 5.4 on Jan 20
  – There will be no more SLF 4 Fermigrid workers
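The condor_rm / condor_hold wrapper exists to throttle how fast authentication requests hit SAZ. The sketch below shows the general shape of such a wrapper: act on job ids in small batches with a pause between batches. The batch size and delay are invented numbers, and the real Minos wrapper may well work differently.

```python
#!/usr/bin/env python
"""Illustrative throttled wrapper around condor_rm / condor_hold.
Batch size and pause are assumptions, not the production values."""
import subprocess
import sys
import time

BATCH = 20   # jobs per condor_rm / condor_hold invocation (assumed)
PAUSE = 30   # seconds between invocations (assumed)

def throttled(command, job_ids):
    """Run 'condor_rm' or 'condor_hold' over job_ids in small batches."""
    for i in range(0, len(job_ids), BATCH):
        chunk = job_ids[i:i + BATCH]
        subprocess.check_call([command] + chunk)
        if i + BATCH < len(job_ids):
            time.sleep(PAUSE)

if __name__ == "__main__":
    # usage: throttle.py condor_rm 123.0 123.1 124.0 ...
    throttled(sys.argv[1], sys.argv[2:])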

Minos Computing Infrastructure Arthur Kreymer 4
Bluearc
● /minos/data expansion 2011 Jan 20
  – Copying files from /minos/app, /minos/data, /minos/data2 to new disk ( see the copy sketch below )
  – Net gain of at least 30 TB
  – Retiring the original Satabeast ( data, app ) from Bluearc
  – New mounts will be /minos/data and /minos/app
● Temporary data2 and scratch compatibility symlinks
● Performance remains good
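The "initial copies are done, touch-up passes run in under 2 hours" pattern from the executive summary is the usual two-stage move: one bulk copy, then repeated incremental passes that only transfer what changed. A minimal sketch of such a touch-up pass is below; the source/destination layout and the staging path are assumptions, not taken from the real migration scripts.

```python
#!/usr/bin/env python
"""Rough sketch of an incremental 'touch-up' pass onto the new Bluearc disk.
The /minos-new staging paths are invented for illustration."""
import subprocess

# old area -> new area (assumed layout; the slide names only the mount points)
MOVES = [
    ("/minos/app/",   "/minos-new/app/"),
    ("/minos/data/",  "/minos-new/data/"),
    ("/minos/data2/", "/minos-new/data/"),
]

def touch_up(pairs=MOVES, dry_run=False):
    for src, dst in pairs:
        cmd = ["rsync", "-a", "--stats", src, dst]
        if dry_run:
            cmd.insert(1, "--dry-run")
        subprocess.check_call(cmd)

if __name__ == "__main__":
    touch_up(dry_run=True)   # report what would move, without copying
```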

Minos Computing Infrastructure Arthur Kreymer 5
Enstore
● Changed the owner of 2011 raw data from buckley to minosraw
● Will change ownership of all existing files this month ( see the sweep sketch below )
● New LTO4GS robot in GCC for non-CMS customers
  – Existing tapes are being moved there
  – Will keep raw data Vault copies separate
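The planned ownership change over the rest of the raw data amounts to a sweep over the namespace, re-owning anything still belonging to buckley. The sketch below shows the idea; the /pnfs path and the group name are placeholders, and a real pass would be run dry first and by an administrator with the appropriate access.

```python
#!/usr/bin/env python
"""Hedged sketch of an ownership sweep from 'buckley' to 'minosraw'.
The raw-data path and group name are placeholders for illustration."""
import os
import pwd
import grp

RAW_TOP = "/pnfs/minos/raw_data"              # placeholder raw-data tree
OLD_UID = pwd.getpwnam("buckley").pw_uid
NEW_UID = pwd.getpwnam("minosraw").pw_uid
NEW_GID = grp.getgrnam("e875").gr_gid         # hypothetical group name

def sweep(top=RAW_TOP, apply_changes=False):
    changed = 0
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_uid == OLD_UID:
                changed += 1
                if apply_changes:
                    os.chown(path, NEW_UID, NEW_GID)
    verb = "re-owned" if apply_changes else "would be re-owned"
    print("%d files %s" % (changed, verb))

if __name__ == "__main__":
    sweep(apply_changes=False)   # dry run by default
```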

Minos Computing Infrastructure Arthur Kreymer 6
NUCOMP
● GPCF hardware is installed and running
● Soudan networking upgrade hardware was installed
● 600 cores of new Fermigrid processing are deployed, 1/3 for Minos
● Dennis Box is working on a shared jobsub to replace minos_jobsub ( rough sketch below )
● REX will take over Condor support from Ryan Patterson soon ( at 7.4 )
● Next meeting around Jan 19 ? Not yet scheduled
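At its core, a shared jobsub front end takes a user script plus a few options and turns them into a Condor submit description, so all IF experiments can share one submission path. The sketch below is not the jobsub being developed; the template, option names, and defaults are invented purely to show the shape of the idea.

```python
#!/usr/bin/env python
"""Invented minimal 'jobsub'-style front end: build a Condor submit file
for a user script and hand it to condor_submit.  Illustration only."""
import subprocess
import tempfile

SUBMIT_TEMPLATE = """universe   = vanilla
executable = %(script)s
arguments  = %(args)s
output     = %(log)s.out
error      = %(log)s.err
log        = %(log)s.log
queue %(njobs)d
"""

def jobsub(script, args="", njobs=1, logbase="jobsub"):
    body = SUBMIT_TEMPLATE % dict(script=script, args=args,
                                  log=logbase, njobs=njobs)
    with tempfile.NamedTemporaryFile("w", suffix=".cmd", delete=False) as f:
        f.write(body)
        cmdfile = f.name
    subprocess.check_call(["condor_submit", cmdfile])

if __name__ == "__main__":
    jobsub("/path/to/analysis.sh", args="run 12345", njobs=10)
```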

Minos Computing Infrastructure Arthur Kreymer 7
GPCF
● Initial configuration for GPCF:
  – 14 batch hosts x 16 slots ( 224 slots total )
  – /scratch/minos is 10 TB, NFS served
  – Interactive systems are Oracle VM virtual machines
● gpsn01 – shared Grid submission node is available
● minos-slf5 = minosgpvm01 runs SLF 5.4, give it a try !
● minos-slf4 = minosgpvm02 runs a 32-bit SLF 4 kernel
● nova01/2 virtual interactive systems are having I/O performance issues ( see the throughput sketch below )
  – Bluearc data rates were originally under 10 MB/sec, still under 30
  – Being investigated
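The 10 MB/sec and 30 MB/sec figures above are the kind of numbers you get from a simple streaming-write test against the Bluearc mount. Below is a minimal sketch of such a spot check; the test directory and file size are assumptions, and a real measurement would also look at reads and concurrent clients.

```python
#!/usr/bin/env python
"""Minimal write-throughput spot check against a Bluearc-mounted area.
Test path and size are placeholders for illustration."""
import os
import time

TEST_DIR = "/minos/data/users/scratch_test"   # assumed writable test area
SIZE_MB  = 512
CHUNK    = 1024 * 1024                        # write in 1 MB chunks

def write_rate(test_dir=TEST_DIR, size_mb=SIZE_MB):
    if not os.path.isdir(test_dir):
        os.makedirs(test_dir)
    path = os.path.join(test_dir, "iotest.dat")
    buf = b"\0" * CHUNK
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(size_mb):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())                  # include flush-to-server time
    elapsed = time.time() - start
    os.remove(path)
    return size_mb / elapsed

if __name__ == "__main__":
    print("write rate: %.1f MB/sec" % write_rate())
```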

Minos Computing Infrastructure Arthur Kreymer 8
Minos Servers
● minos26 will be powered off soon – use minos-slf4 for access to a 32-bit SLF 4
● minos54 back from NOvA Jan 20 ? - INC
● Ganglia – still not available on ifora1/2, not supported by FEF
● Need aklog for SLF 5, this is coming soon ( see the wrapper sketch below )
  – aklog has been ported, Nagy is integrating it with kinit
  – Waiting for an RPM to test
● SLF 5 plans – will move to retire SLF 4 once we have stable kcron/aklog
  – The Control Room already runs SLF 5
● Will request a management ability to kill non-root processes on the Cluster
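For context on why stable kcron/aklog matters: unattended jobs need a Kerberos ticket from a keytab and then an AFS token before they can touch AFS areas. The sketch below shows that sequence as a generic cron wrapper; the keytab path and principal are placeholders, and the real kcron tooling handles this differently under the hood.

```python
#!/usr/bin/env python
"""Hedged sketch of a kcron/aklog-style wrapper: get a Kerberos ticket from a
keytab, convert it to an AFS token, run the job, then drop the credentials.
Keytab path and principal are placeholders."""
import subprocess
import sys

KEYTAB    = "/var/adm/krb5/mykeytab"           # placeholder keytab path
PRINCIPAL = "minos/cron/minos26.fnal.gov"      # placeholder cron principal

def run_with_tokens(command):
    subprocess.check_call(["kinit", "-k", "-t", KEYTAB, PRINCIPAL])
    subprocess.check_call(["aklog"])           # Kerberos ticket -> AFS token
    try:
        return subprocess.call(command)
    finally:
        subprocess.call(["kdestroy"])          # drop the Kerberos ticket
        subprocess.call(["unlog"])             # drop the AFS token as well

if __name__ == "__main__":
    sys.exit(run_with_tokens(sys.argv[1:]))
```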

Minos Computing Infrastructure Arthur Kreymer 9
Control Room
● First official remote shifts from BNL on 2011 Jan 3-6
  – Great effort by Brett and Mary especially
  – Ricardo Gomes will be at Fermilab later in January to work on this
  – Documentation will be coming
  – We should generate project principals for each remote control room
● Minerva needs to run remote shifts this spring
● I'd like to see a web-based Big Green Button
● I'd like to switch the ACNET consoles to SSH tunnels ( AD supports this ) – see the tunnel sketch below
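The SSH-tunnel idea for the ACNET consoles is plain local port forwarding: the remote console client connects to localhost, and ssh carries the traffic through an authenticated gateway. The sketch below shows that pattern; the host names and port numbers are invented, and the actual AD-supported setup may differ.

```python
#!/usr/bin/env python
"""Sketch of local port forwarding for a remote console connection.
Gateway, console host, and port are invented placeholders."""
import subprocess

GATEWAY     = "user@gateway.example.gov"    # hypothetical ssh gateway
CONSOLE     = "acnet-console.example.gov"   # hypothetical console host
REMOTE_PORT = 6510                          # hypothetical console port
LOCAL_PORT  = 6510

def open_tunnel():
    # -N: no remote command, just forward; -L: local port forwarding
    return subprocess.Popen([
        "ssh", "-N",
        "-L", "%d:%s:%d" % (LOCAL_PORT, CONSOLE, REMOTE_PORT),
        GATEWAY,
    ])

if __name__ == "__main__":
    tunnel = open_tunnel()
    print("tunnel up: point the console client at localhost:%d" % LOCAL_PORT)
    tunnel.wait()
```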

Minos Computing Infrastructure Arthur Kreymer 10
Near Detector Task Force
● Planning handover of the Near Detector to Minerva by 2012
● I drafted a document for offline and Control Room issues
● Minerva will need to take over:
  – Calibration
  – Batch processing
  – Data Quality
  – Data Handling
  – Simulation
  – Code maintenance
  – and more – we are just getting started
● Task force met with the Directorate 2010 Dec 2
● See DocDB Minerva 5600, Minos 7787
● Minerva is counting on doing remote shifts when they are in charge