Batch Software at JLAB Ian Bird Jefferson Lab CHEP2000 7-11 February, 2000.

Batch Software at JLAB
Ian Bird, Jefferson Lab
CHEP2000, 7-11 February 2000

Introduction
Environment
– Farms
– Data flows
– Software
Batch systems
– JLAB software
– LSF vs. PBS
Scheduler
Tape software
– File pre-staging / caching

Environment
Computing facilities were designed to:
– Handle a data rate of close to 1 TB/day
– Do 1st-level reconstruction only (2 passes), matching the average data rate
– Support some local analysis, but mainly export of vastly reduced summary DSTs
Originally estimated requirements:
– ~1000 SI95
– 3 TB online disk
– 300 TB tape storage
– 8 Redwood drives

Environment - real
After 1 year of production running of CLAS (the largest experiment):
– The detector is far cleaner than anticipated, which means:
  – Data volume is less: ~500 GB/day
  – Data rate is 2.5x the anticipated rate (2.5 kHz)
  – The fraction of good events is larger
  – DST sizes are the same as the raw data (!)
– Per-event processing time is much longer than the original estimates
– Most analysis is done locally; no one is really interested in huge data exports
Other experiments also have large data rates (for short periods)

Computing implications
The CPU requirement is far greater
– The current farm is 2650 SI95 and will double this year
The farm has a big mixture of work
– Not all production: "small" analysis jobs too
– We make heavy use of LSF hierarchical scheduling
Data access demands are enormous
– DSTs are huge; many people access them frequently
– Analysis jobs want many files
Tape access became a bottleneck
– The farm can no longer be kept fed with data

JLab Farm Layout (diagram, plan for FY 2000): farm systems (dual PII 300/400/450 MHz and dual PIII 500/650 MHz Linux nodes with local SCSI disk), mass storage servers (quad Sun E4000s with STK Redwood and STK 9840 tape drives plus staging disk), work file servers (MetaStor SH7400, 3 TB UWD each), and cache file servers (dual Sun Ultra2, 400 GB UWD each), interconnected by Cisco Catalyst 5500 and 2900 switches over Fast and Gigabit Ethernet.

Other farms
Batch farm
– 180 nodes, growing to 250
Lattice QCD
– 20-node Alpha (Linux) cluster
– Parallel application development
– Plans (proposal) for a large 256-node cluster
  – Part of a larger collaboration
  – The group wants a "meta-facility": jobs run on the least loaded cluster (wide-area scheduling)

Additional requirements
Ability to handle and schedule parallel (MPI) jobs
Allow collaborators to "clone" the batch systems and software
– Allow inter-site job submission
– LQCD is particularly interested in this
Remote data access

Components
Batch software
– Interface to the underlying batch system
Tape software
– Interface to OSM, overcoming its limitations
Data caching strategies
– Tape staging
– Data caching
– File servers

Batch software
A layer over the batch management system:
– Allows replacement of the batch system: LSF, PBS (DQS)
– Constant user interface no matter what the underlying system is
– The batch farm can still be managed by the management system itself (e.g. LSF)
– Builds in a security infrastructure (e.g. GSI), particularly to allow secure remote access
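
The idea of a thin layer with a constant user interface can be pictured with a minimal sketch of a batch-system abstraction, given below in Java (the language of the existing client). The interface, class and method names are hypothetical, not those of the actual JLab software; LSF- or PBS-specific calls would live entirely inside the implementing classes.

    // Minimal sketch of a batch-system abstraction layer (hypothetical names).
    // User tools depend only on BatchSystem; LSF or PBS specifics stay hidden
    // inside the implementations.
    import java.util.List;

    interface BatchSystem {
        String submit(JobRequest job);       // returns a site-wide job id
        JobStatus query(String jobId);       // status independent of the back end
        void cancel(String jobId);
    }

    enum JobStatus { PENDING, HELD, RUNNING, DONE, FAILED }

    final class JobRequest {
        final String command;                // what to run
        final List<String> inputFiles;       // e.g. files to pre-stage from tape
        final String queue;                  // logical queue name
        JobRequest(String command, List<String> inputFiles, String queue) {
            this.command = command;
            this.inputFiles = inputFiles;
            this.queue = queue;
        }
    }

    // An LSF-backed implementation would wrap bsub/bjobs/bkill (or the LSF API);
    // a PBS-backed one would wrap qsub/qstat/qdel.  Users never see the difference.
    class LsfBatchSystem implements BatchSystem {
        public String submit(JobRequest job) { /* invoke bsub ... */ return "lsf-0"; }
        public JobStatus query(String jobId) { /* invoke bjobs ... */ return JobStatus.PENDING; }
        public void cancel(String jobId)     { /* invoke bkill ... */ }
    }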

Batch system - schematic (diagram): user processes perform submission, queries and statistics retrieval through the submission and query interfaces of the job submission system, which records jobs in a database and passes them to the underlying batch control system (LSF, PBS, DQS, etc.) running on the batch processors.

Existing batch software
Has been running for 2 years
– Uses LSF
– Multiple jobs: parameterized jobs (LSF now has job arrays; PBS does not have this)
– The client is trivial to install on any machine with a JRE; no need to install LSF, PBS, etc.
  – Eases licensing issues
  – Simple software distribution
  – Remote access
– Standardized statistics and bookkeeping outside of LSF (MySQL based)
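
As an illustration of the "parameterized jobs" idea (what LSF job arrays give natively, and what the layer must emulate on back ends without them), a toy sketch: one command template plus a parameter list expands into N ordinary submissions. The %d token, file paths and class names are made up for this example.

    // Toy sketch of parameterized-job expansion (illustrative only).
    import java.util.ArrayList;
    import java.util.List;

    class ParameterizedJob {
        /** Replaces the %d token in the command template with each parameter. */
        static List<String> expand(String template, List<String> params) {
            List<String> commands = new ArrayList<>();
            for (String p : params) {
                commands.add(template.replace("%d", p));
            }
            return commands;
        }

        public static void main(String[] args) {
            List<String> runs = List.of("run1001", "run1002", "run1003");
            for (String cmd : expand("reconstruct /mss/clas/%d.data", runs)) {
                System.out.println("submit: " + cmd);   // one batch job per run
            }
        }
    }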

Existing software (cont.)
The farm can be managed by LSF
– Queues, hosts, scheduler, etc.
A rewrite is in progress to:
– Add a PBS interface (and DQS?)
– Add a security infrastructure to permit authenticated remote access
– Clean up

PBS as an alternative to LSF
PBS (Portable Batch System, from NASA)
– Actively developed
– Open, freely available
– Handles MPI (PVM)
– User interface very familiar to NQS/DQS users
– The problem (for us) was the lack of a good scheduler
  – PBS provides only a trivial scheduler, but
  – provides a mechanism to plug in another
  – We were using hierarchical scheduling in LSF

PBS scheduler
Multiple stages (6), each of which can be used or not as required, in arbitrary order:
– Match making: matches job requirements to system resources
– System priority (e.g. data available)
– Queue selection (which queue runs next)
– User priority
– User share: which user runs next, based on user and group allocations and usage
– Job age
The scheduler has been provided to the PBS developers for comments and is under test.
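
The staged decision flow can be pictured with the short sketch below. It is written in Java purely for illustration (a real PBS scheduler module is plugged in as compiled code, not shown here) and covers only a subset of the six stages; the Job record and its fields are assumptions.

    // Illustrative sketch of a staged scheduling pass (not the actual plug-in).
    import java.util.Comparator;
    import java.util.List;
    import java.util.stream.Collectors;

    class StagedScheduler {
        record Job(String id, String user, int cpusNeeded, boolean dataOnDisk,
                   double shareUsed, long submitTime) {}

        /** Each stage filters or reorders the candidates; stages could be
         *  enabled, disabled or reordered independently. */
        static List<Job> schedule(List<Job> pending, int freeCpus) {
            return pending.stream()
                // 1. match making: drop jobs whose resource requests cannot be met
                .filter(j -> j.cpusNeeded() <= freeCpus)
                // 2. system priority: jobs whose input data is already on disk first
                .sorted(Comparator.comparing((Job j) -> !j.dataOnDisk())
                // 3. user share: favour users below their allocation
                        .thenComparingDouble(Job::shareUsed)
                // 4. job age: oldest submission first as the tie-breaker
                        .thenComparingLong(Job::submitTime))
                .collect(Collectors.toList());
        }
    }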

Mass storage
Silo: 300 TB Redwood capacity
– 8 Redwood drives
– 5 (+5) 9840 drives
– Managed by OSM
Bottleneck:
– Limited to a single data mover
– That node has no capacity for more drives
1 TB of tape staging RAID disk
5 TB of NFS work areas / caching space

Solving tape access problems
Add new drives (9840s)
– Requires a 2nd OSM instance, transparent to the user
Eventual replacement of OSM, transparent to the user
File pre-staging to the farm
Distributed data caching (not NFS)
Tools to allow user optimization
Charge for (prioritize) mounts

OSM
OSM has several limitations (and is no longer supported)
– The single mover node is the most serious
– No replacement is possible yet
Local tape-server software solves many of these problems for us
– Simple remote clients (Java based); OSM is not needed except on the server

Tape access software
Simple put/get interface
– Handles multiple files, directories, etc.
Can have several OSM instances behind a unique file catalog, transparent to the user
– The system fails over between servers
– The only way to bring the 9840s online
Data transfer is a network (socket) copy in Java
Allows a scheduling / user-allocation algorithm to be added to tape access
Will permit "transparent" replacement of OSM
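
Since the data transfer is a plain socket copy in Java, the "get" side of such a client can be sketched roughly as below. The request format, port number and host name are assumptions for illustration; the real client also consults the file catalog and handles failover between servers.

    // Rough sketch of the "get" side of a socket-based tape client
    // (protocol, port and host are assumed, not the real JLab ones).
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.io.PrintWriter;
    import java.net.Socket;

    class TapeGet {
        static void get(String server, String tapeFile, String localPath)
                throws IOException {
            try (Socket s = new Socket(server, 5501);                  // assumed port
                 PrintWriter req = new PrintWriter(s.getOutputStream(), true);
                 InputStream in = s.getInputStream();
                 OutputStream out = new FileOutputStream(localPath)) {
                req.println("GET " + tapeFile);                        // assumed request format
                byte[] buf = new byte[64 * 1024];
                int n;
                while ((n = in.read(buf)) != -1) {                     // stream the file to disk
                    out.write(buf, 0, n);
                }
            }
        }

        public static void main(String[] args) throws IOException {
            get("tapeserver.example.org", "/mss/clas/raw/run1001.data", "run1001.data");
        }
    }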

Data pre-fetching & caching
Currently:
– Tape -> stage disk -> network copy to farm node local disk
– Tape -> stage disk -> NFS cache -> farm
  – But this can cause NFS server problems
Plan:
– Dual Solaris nodes with ~350 GB disk (RAID 0), Gigabit Ethernet
  – Provides a large cache for farm input
– Stage entire tapes out to the cache
  – Cheaper than staging space, better performance than NFS
  – Scalable as the farm grows
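
As a rough picture of the planned flow (path layout and helper names are assumed, purely illustrative): a farm job first looks for its input on the cache file servers and only triggers a tape stage, of a whole tape's worth of files, on a miss.

    // Illustrative sketch of the planned cache-then-tape lookup.
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    class CacheLookup {
        static final Path CACHE_ROOT = Path.of("/cache");   // assumed cache mount point

        /** Returns a readable path for the requested file, staging it on a miss. */
        static Path locate(String tapeFile) throws IOException {
            Path cached = CACHE_ROOT.resolve(tapeFile.substring(1));  // drop leading '/'
            if (Files.exists(cached)) {
                return cached;                 // already on a cache file server
            }
            // Miss: stage the whole tape's worth of files into the cache
            // (cheaper per file than individual staging), then use the copy.
            stageTapeContaining(tapeFile);
            return cached;
        }

        static void stageTapeContaining(String tapeFile) {
            // placeholder: would call the put/get tape client sketched earlier
        }
    }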

JLab Farm Layout (same diagram as the earlier slide; see above).

File pre-staging
Scheduling for pre-staging is done by the job server software:
– Splits/groups jobs by tape (could be done by the user)
– Makes a single tape request
– Holds jobs while files are staged
– Implemented by batch jobs that release the held jobs
– Released jobs with data available get high priority
– Reduces the number of job slots blocked by jobs waiting for data
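
A toy sketch of the grouping step, assuming a file-to-tape catalog lookup is available (all names here are illustrative): the held jobs' input files are grouped by tape so that each tape is mounted once, and the jobs are released once their files arrive on disk.

    // Toy sketch of grouping held jobs' input files by tape (illustrative names).
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    class PreStager {
        record HeldJob(String jobId, List<String> files) {}

        /** Maps tape label -> files needed from that tape (one mount per tape). */
        static Map<String, List<String>> groupByTape(List<HeldJob> jobs,
                                                     Map<String, String> fileToTape) {
            Map<String, List<String>> byTape = new HashMap<>();
            for (HeldJob job : jobs) {
                for (String f : job.files()) {
                    byTape.computeIfAbsent(fileToTape.get(f), t -> new ArrayList<>()).add(f);
                }
            }
            return byTape;
        }

        public static void main(String[] args) {
            Map<String, String> catalog = Map.of(
                "/mss/clas/run1.data", "TAPE01",
                "/mss/clas/run2.data", "TAPE01",
                "/mss/clas/run3.data", "TAPE02");
            List<HeldJob> held = List.of(
                new HeldJob("job-1", List.of("/mss/clas/run1.data")),
                new HeldJob("job-2", List.of("/mss/clas/run2.data", "/mss/clas/run3.data")));
            // One stage request per tape; once staged, the held jobs are released
            // with raised priority so they run as soon as slots free up.
            groupByTape(held, catalog).forEach(
                (tape, files) -> System.out.println("stage " + files + " from " + tape));
        }
    }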

Conclusions
PBS is a sophisticated and viable alternative to LSF
The interface layer permits:
– Use of the same jobs on different systems (user migration)
– Adding features to the batch system