Update on SixDesk Scripts


Update on SixDesk Scripts
R. De Maria, M. Giovannozzi, E. McIntosh, A. Mereghetti, I. Zacharov
5th Sep 2016, presented by A. Mereghetti

Outline
- SixDesk;
- BOINC;
- Motivations for the update;
- Updates currently implemented;
- Conclusions and outlook.

SixDesk (R. De Maria, E. McIntosh, F. Schmidt)
- Run environment of SixTrack for DA studies;
- Scripts for pre-/post-processing: from MadX .mask files to final plots;
- Structured user space, containing scripts and templates;
- Collection of standard .mask files;
- Main parameter scan: amplitude/angle in the x-y plane; Qx/Qy; WISE errors (magnetic errors);
- Studies imply massive tracking campaigns:
  - High CPU demands: 100k / 1M turns through LHC / HL-LHC / FCC lattices;
  - Low I/O per simulation: input files are generally small; output is basically limited to the final coordinates of the particles in phase space;
  - Massive number of jobs: a single job yields results for only a few points in the phase/parameter space.
These characteristics make DA studies suitable for running not only on LSF but also on BOINC.

BOINC: Berkeley Open Infrastructure for Network Computing
- A system that uses the idle time of private computers to make CPUs available to scientists. The computing nodes are typically not machines in a lab or university, but desktops in private homes and offices!
- Work flow:
  - Submission: a user submits a set of WUs (work units) to the BOINC server, which distributes them (together with the executable) to the volunteers' PCs;
  - Computing: the volunteers' PCs perform the calculations and, at the end, return the results to the BOINC server;
  - Returning results: the BOINC server performs a validation (math stability) and returns the results to the user.
[Diagram: users <-> BOINC server <-> volunteers]

BOINC: Berkeley Open Infrastructure for Network Computing (II)
- The BOINC server takes responsibility for the flow of WUs:
  - User's side: upload/download to/from a shared AFS volume (the spooldir);
  - Volunteer's side: an upload/download area;
  - Internal DB, keeping track of WUs (e.g. submitted/done/re-sent).

BOINC: Berkeley Open Infrastructure for Network Computing (III)
- The BOINC manager on the volunteer's PC is responsible for:
  - Downloading new WUs / uploading results;
  - Scheduling tasks and allocating the necessary resources (CPU, RAM, HD, …);
- Many configuration options are available to the volunteer: choice of projects, number of CPUs, when to run, pause/resume, …
- …CERN desktops can be configured as volunteers' PCs!

Motivations for the Update
- Main drivers (BE-ABP):
  - Possibility to run collimation studies on BOINC:
    - Relevant for eLens cleaning simulations;
    - Requirement: online aperture check (work in progress);
  - Extend the parameter space (e.g. dp/p, chroma/emittance, collimator scans, …);
- Contingency (IT infrastructure):
  - AFS work volume giving problems on a daily basis;
  - AFS being phased out and EOS rising (the "FS" behind CERNbox);
  - HTCondor coming along (in production within 18 months);
- At the same time:
  - Decouple the user space from scripts and templates: when scripts are updated, users no longer have to go through all their workspaces to update every local copy!
  - Centralize the scripts, to ensure that all users receive updates (and use them!);
  - (A bit of) code refactoring (to enhance maintainability).
Since the AFS volume gives problems and the IT infrastructure will undergo relevant changes in the near future, the present set-up must keep working! This has been the main concern since ~Dec. 2015.

Issues with the AFS work.boinc Volume: Work Flow
- User's HOME dir:
  - WU submission, for every WU: fs setacl -dir $spooldir/<studyname>; gunzip/gzip of MadX files (WISE); *.zip (all input files); *.desc (BOINC-specific settings);
  - Retrieval of results: run_six; find $spooldir/<studyname>/results -name …
- AFS spool dir: /afs/cern.ch/work/b/boinc/boinc/
    <studyname>/
    |_ work/
    |_ results/
    |_ owner
- BOINC server:
  - cron job: find $spooldir -maxdepth 2 -type d -name "work"; for every WU found, call create_work (actual submission to the volunteers' PCs…);
  - Bitwise validator, for math stability (local disk);
  - Assimilator: cp results back to the AFS spooldir.
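To make the cost of this flow concrete, here is a minimal sketch of the per-WU submission pattern described above. It is an illustration only: the paths, the study name, and the ACL rights are assumptions, not the actual SixDesk code.

    #!/bin/bash
    # Illustrative sketch, NOT the actual SixDesk scripts: every WU triggers
    # its own ACL change and its own gunzip/gzip cycle, multiplying the
    # metadata load on the AFS work.boinc volume.

    spooldir=/afs/cern.ch/work/b/boinc/boinc   # shared AFS spool dir
    study=myStudy                              # hypothetical study name

    for wu in "$HOME/$study/wus/"*; do
        # per-WU ACL setting on the spooldir (costly AFS metadata operation)
        fs setacl -dir "$spooldir/$study" -acl "$USER" rlidwk

        # per-WU decompression of the WISE MadX file, re-compressed afterwards
        gunzip "$wu/madx.mask.gz"
        # ... generate the WU input from the uncompressed mask file ...
        gzip "$wu/madx.mask"

        # ship the WU input (*.zip) and BOINC settings (*.desc) to the spooldir
        cp "$wu"/*.zip "$wu"/*.desc "$spooldir/$study/work/"
    done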

Example of the Load on the BOINC Server
[Plot of server load, ~1 h refresh rate; vertical bars: restarts of the server where the work.boinc volume is mounted; horizontal bars: work.boinc volume unresponsive]
- Initial guess: the problems are at the submission level:
  - Expiration of AFS tokens → scripts hanging;
  - Too high a metadata load (e.g. recursive setacl commands);
- Monitoring of recent activity: problems on the work.boinc volume can also start independently of submission…
- Mitigation actions taken at every step involving the AFS work.boinc volume, i.e. at job submission / results retrieval, and on both the user and the BOINC server side!

Improving the Situation: Changes at Submission
- fs setacl limited to the setting up of the spooldir (no longer run for every WU);
- Gunzip/gzip of the MadX files done only once, i.e. before/after a WISE configuration is used to generate all the dependent WUs;
- Instead of a 'find' looking for new WUs, submission goes through a file list: the user generates the list of WUs to be submitted and saves it to a .txt file; the cron job (on the BOINC server) parses the .txt file and submits the listed WUs.
(Work-flow diagram as on the previous slide, with the steps above updated.)
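A minimal sketch of the file-list mechanism on the server side, under the assumption that the list is a plain .txt file with one WU name per line; the list file name and the create_work arguments are illustrative, not the actual project configuration.

    #!/bin/bash
    # Illustrative sketch of the file-list-based cron job: instead of
    # crawling the whole spooldir with 'find', only the WUs listed in a
    # .txt file are submitted. Paths and arguments are assumptions.

    spooldir=/afs/cern.ch/work/b/boinc/boinc
    submitList="$spooldir/wu_submission_list.txt"   # hypothetical list file

    [ -s "$submitList" ] || exit 0                  # nothing to submit

    while read -r wuName; do
        # create_work is the standard BOINC server tool for queueing a WU;
        # the exact templates and input-file arguments depend on the project.
        create_work --appname sixtrack --wu_name "$wuName" "${wuName}.zip"
    done < "$submitList"

    # truncate the list, so the same WUs are not submitted twice
    : > "$submitList"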

Improving the Situation: Changes at Returning Data
- Assimilator: introduce a trash-bin mechanism (in collaboration with I. Zacharov):
  - Avoid orphan results;
  - Avoid multiple attempts at delivering results to an AFS dir with too many files (AFS load);
- Retrieval of results from the spooldir: simplify the find instruction;
- Use a zip mechanism: save results locally on the BOINC server; zip them per study; periodically dispatch the zipped files to the spooldir.
(Work-flow diagram as on the previous slides.)
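A minimal sketch of the zip mechanism on the server side; the local results area and the one-zip-per-study layout are assumptions.

    #!/bin/bash
    # Illustrative sketch: validated results are kept on the server's local
    # disk, zipped per study, and only the zip files are periodically
    # dispatched to the AFS spooldir (one large copy instead of thousands
    # of small writes). Paths are assumptions, not the actual server layout.

    localRes=/data/boinc/results            # hypothetical local results area
    spooldir=/afs/cern.ch/work/b/boinc/boinc

    for studyDir in "$localRes"/*/; do
        [ -d "$studyDir" ] || continue
        study=$(basename "$studyDir")
        zipFile="$localRes/$study.zip"
        # one zip per study, built entirely on local disk
        ( cd "$studyDir" && zip -q -r "$zipFile" . )
        # a single large copy loads AFS far less than many small files
        cp "$zipFile" "$spooldir/$study/results/" \
            && rm -r "$studyDir" "$zipFile"
    done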

Updating Scripts
SixDesk (GitHub repo):
|_ sixjobs/          user space + original scripts (3 copies!!)
|_ sixutils/         original scripts + other exes
|_ utilities/        updated scripts (branch: isolateScripts)
|_ boinc_software/   useful BOINC software (branch: add_boinc_soft)

utilities/
|_ awk/
|_ bash/
|_ bats/
|_ exes/
|_ fortran/
|_ gnuplot/
|_ perl/
|_ python/
|_ sed/
|_ templates/
|_ tex/

- All existing scripts and templates have been collected in a folder (utilities/) separate from the user space (sixjobs/);
- Scripts are grouped by coding language;
- ksh and sh scripts have all been moved to bash; any (bash) script called by the user on the command line automatically detects the full path where it is located and hands it on to any sub-script (see the sketch below);
- No removal of 'old' stuff, so as not to break compatibility at this stage.
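The self-locating behaviour mentioned above can be sketched with the standard bash idiom below; the variable and file names are illustrative, not necessarily those of the SixDesk scripts.

    #!/bin/bash
    # Sketch of the self-locating pattern: the top-level script works out
    # the directory it lives in and passes it on, so every sub-script it
    # calls runs from the same copy of the tree. SCRIPTDIR is hypothetical.

    SCRIPTDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
    export SCRIPTDIR

    # sub-scripts are invoked relative to the detected location
    "$SCRIPTDIR/bash/some_sub_script.sh" "$@"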

Updating Scripts (II)
- Two scripts have already been reworked quite deeply; we started with these since the initial guess was that the problems sit at the submission level (expiration of AFS tokens → scripts hanging; too high a metadata load, e.g. recursive setacl commands):
  - mad6t.sh: generation of fort.2/.8/.16;
  - run_six.sh: generation of fort.3 and submission;
- Main features:
  - All in bash;
  - Actions / input parameters are handled via getopts (giving no options triggers the help); see the sketch below;
  - Instead of doing everything in one go, the user triggers single actions:
    - mad6t.sh: generation of input files vs consistency check;
    - run_six.sh: generation of input files vs consistency check vs actual submission;
    - Reason: reduce the running time of a single action, to decrease the probability of AFS token expiration;
  - All the code related to a given step is collected in a single script, segmented into functions (replacing, e.g., the original run_six and the scripts it called);
  - General refactoring of the code, to improve readability and reduce the number of lines:
    - mad6t.sh (329 lines, -33%) replaces run_mad6t (38) + dorun_mad6t (197) + check_mad6t (253);
    - run_six.sh (1624 lines, -25%) replaces run_six (1350) + dot_boinc (269) + dot_bsub (229) + uploadWorkunit (168) + prepare_fort3 (220).
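A minimal sketch of the getopts-based action dispatch; the option letters follow the -g/-c/-s usage reported on a later slide, and the action bodies are placeholders.

    #!/bin/bash
    # Illustrative sketch of the getopts-driven dispatch: each option
    # triggers exactly one action, keeping single runs short so that AFS
    # tokens do not expire mid-run. The function bodies are placeholders.

    how_to_use() {
        echo "usage: $(basename "$0") [-g] [-c] [-s]"
        echo "  -g  generate input files"
        echo "  -c  check input files"
        echo "  -s  submit (a check is run beforehand)"
        exit 1
    }

    generate() { echo "generating input files..."; }
    check()    { echo "checking input files...";   }
    submit()   { check; echo "submitting...";      }

    # no options triggers the help
    [ $# -eq 0 ] && how_to_use

    while getopts "gcs" opt; do
        case $opt in
            g) generate   ;;
            c) check      ;;
            s) submit     ;;
            *) how_to_use ;;
        esac
    done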

Updating Scripts (III)
- A common space where the scripts (actually, git clones) are kept has been created:
  /afs/cern.ch/project/sixtrack/SixDesk_utilities/
  |_ dev/   updated scripts (tests ongoing)
  |_ pro/   empty for the time being
  |_ old/   empty for the time being
- Some preliminary documentation (until the changes go into a release) is available on the SixDesk twiki.

Updating Scripts (IV)
- mad6t.sh -s: submission to LSF; 60 WISE seeds: 3 m for the submission, 15 m of LSF jobs;
- Timings of run_six.sh for two scan sizes (1.2 kWUs = 60 seeds x 4 amplitudes x 5 angles; 12 kWUs = 60 seeds x 4 amplitudes x 50 angles):

    Command         Action                                  1.2 kWUs   12 kWUs      Rate / notes
    run_six.sh -g   generation of input (+check)            15 m       1 h (x4)     check: ~10%
    run_six.sh -c   check of input                          1.5 m      16 m (x10)   ~800 WUs/m
    run_six.sh -s   actual submission (+check beforehand)   3 m 40 s   45 m (x12)   ~300 WUs/m; check: 30-40%

  Next test: 120 kWUs…
- acrontab_retrieve_boinc_results.sh (crontab job):
  - bash script used as an acrontab job (AFS);
  - based on a simple example by E. McIntosh (acrontab.entry);
  - retrieves results in series (to avoid overloading the AFS work.boinc volume);
  - enhanced user interface, i.e. the user edits a list of studies;
  - meant to be run periodically (even if no studies are running), to avoid WUs being rejected due to AFS limits; a sketch follows below.
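A minimal sketch of the serial retrieval implemented by acrontab_retrieve_boinc_results.sh; the study-list file, the unzip step, and the destination layout are assumptions.

    #!/bin/bash
    # Illustrative sketch of the acrontab retrieval job: the studies listed
    # by the user are processed strictly one after the other, so the AFS
    # work.boinc volume is never hit by parallel accesses. File names and
    # layout are assumptions.

    spooldir=/afs/cern.ch/work/b/boinc/boinc
    studyList="$HOME/sixdesk_studies.txt"    # hypothetical user-edited list

    while read -r study; do
        [ -d "$spooldir/$study/results" ] || continue
        mkdir -p "$HOME/$study/results"
        # move zipped results one study at a time (in series)
        for zipRes in "$spooldir/$study/results/"*.zip; do
            [ -e "$zipRes" ] || continue
            unzip -q -o "$zipRes" -d "$HOME/$study/results/" && rm "$zipRes"
        done
    done < "$studyList"

Scheduled via acrontab, a job like this fires even when no studies are active, matching the requirement above to run periodically.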

Conclusions and Outlook
The SixDesk scripts are being updated:
- Short-term changes are meant to keep the present set-up working reliably, mitigating the issues with the AFS work.boinc volume:
  - Limit the impact of setacl at submission;
  - Split the submission into steps, to avoid long processes (expiration of the AFS token);
  - Handle orphan results;
  - Improve run_results;
  - Implement submission to BOINC by file list;
  - Implement zipping of results and automatic dispatching of the results back;
- Longer-term changes are meant to:
  - Include the use of SixDB (R. De Maria);
  - Run collimation studies on BOINC;
  - Extend the scanning parameters; to be noted:
    - Crossing angle (by D. Pellegrini) to be imported into the updated scripts;
    - FCC tunes to be tested (J. Barranco) and then imported into the updated scripts;
- Changes are also required to accommodate future modifications of the IT infrastructure:
  - HTCondor as queue system (collaboration with L. Field);
  - EOS as a possible system hosting the spooldirs (chosen by the user);
- In parallel: code refactoring (e.g. BNL flag, sussix, …) and documentation.